Flexible software profiling of GPU architectures (English)

Stephenson, Mark

In: Computer architecture news ; 43, 3 ; 185-197 ; 2016

ISSN: 0163-5964

Article (Journal) / Print

ACM Digital Library: http://dl.acm.org/citation.cfm?id=2750375

To aid application characterization and architecture design space exploration, researchers and engineers have developed a wide range of tools for CPUs, including simulators, profilers, and binary instrumentation tools. With the advent of GPU computing, GPU manufacturers have developed similar tools leveraging hardware profiling and debugging hooks. To date, these tools are largely limited by the fixed menu of options provided by the tool developer and do not offer the user the flexibility to observe or act on events not in the menu. This paper presents SASSI (NVIDIA assembly code "SASS" Instrumentor), a low-level assembly-language instrumentation tool for GPUs. Like CPU binary instrumentation tools, SASSI allows a user to specify instructions at which to inject user-provided instrumentation code. These facilities allow strategic placement of counters and code into GPU assembly code to collect user-directed, fine-grained statistics at hardware speeds. SASSI instrumentation is inherently parallel, leveraging the concurrency of the underlying hardware. In addition to the details of SASSI, this paper provides four case studies that show how SASSI can be used to characterize applications and explore the architecture design space along the dimensions of instruction control flow, memory systems, value similarity, and resilience.

Title: Flexible software profiling of GPU architectures
Contributors: Stephenson, Mark (author) / Sastry Hari, Siva / Lee, Yunsup / Ebrahimi, Eiman / Johnson, Daniel / Nellans, David / O'Connor, Mike / Keckler, Stephen
Published in: Computer architecture news ; 43, 3 ; 185-197
Publisher: ACM
Place of publication: New York, NY
Publication date: 2016
ISSN: 0163-5964
ZDBID: 1860124
DOI: https://doi.org/10.1145/2872887.2750375
Type of media: Article (Journal)
Type of material: Print
Language: English

Keywords: Computer architecture, Periodical

Classification:
BKL: 54.30 System architecture: General / 54.00 Computer science: General
Local classification TIB: 770/3155

Source: Online Contents

Table of contents – Volume 43, Issue 3

The tables of contents are generated automatically and are based on the data records of the individual contributions available in the index of the TIB portal. The display of the Tables of Contents may therefore be incomplete.

1: BlueDBM
Jun, Sang-Woo et al. | 2016
2: 10x10
Chien, Andrew A et al. | 2015
10: Internet Nuggets
Thorson, Mark et al. | 2015
14: Towards sustainable in-situ server systems in the big data era
Li, Chao et al. | 2016
27: DjiNN and Tonic
Hauswald, Johann et al. | 2016
41: A case for core-assisted bottleneck acceleration in GPUs
Vijaykumar, Nandita et al. | 2016
54: Harmonia
Paul, Indrani et al. | 2016
66: Redundant memory mappings for fast access to large memories
Karakostas, Vasileios et al. | 2016
79: Page overlays
Seshadri, Vivek et al. | 2016
92: ShiDianNao
Du, Zidong et al. | 2016
105: A scalable processing-in-memory accelerator for parallel graph processing
Ahn, Junwhan et al. | 2016
118: Efficient execution of memory access phases using dataflow specialization
Ho, Chen-Han et al. | 2016
131: Data reorganization in memory using 3D-stacked DRAM
Akin, Berkin et al. | 2016
144: Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8
Nakaike, Takuya et al. | 2016
158: Profiling a warehouse-scale computer
Kanev, Svilen et al. | 2016
170: Computer performance microscopy with Shim
Yang, Xi et al. | 2016
185: Flexible software profiling of GPU architectures
Stephenson, Mark et al. | 2016
198: BEAR
Chou, Chiachen et al. | 2016
211: A fully associative, tagless DRAM cache
Lee, Yongjun et al. | 2016
223: Multiple clone row DRAM
Choi, Jungwhan et al. | 2016
235: Flexible auto-refresh
Bhati, Ishwar et al. | 2016
247: Cost-effective speculative scheduling in high performance processors
Perais, Arthur et al. | 2016
260: LaZy superscalar
Aşılıoğlu, Görkem et al. | 2016
272: The load slice core microarchitecture
Carlson, Trevor et al. | 2016
285: Semantic locality and context-based prefetching using reinforcement learning
Peled, Leeor et al. | 2016
298: Exploring the potential of heterogeneous von Neumann/dataflow execution models
Nowatzki, Tony et al. | 2016
311: SHRINK
Lopes, Bruno et al. | 2016
323: Branch vanguard
McFarlin, Daniel et al. | 2016
336: PIM-enabled instructions
Ahn, Junwhan et al. | 2016
349: SLIP
Das, Subhasis et al. | 2016
362: CloudMonatt
Zhang, Tianwei et al. | 2016
375: Reducing world switches in virtualized environment with flexible cross-world calls
Li, Wenhao et al. | 2016
388: ArMOR
Lustig, Daniel et al. | 2016
401: Clean
Segulja, Cedomir et al. | 2016
414: MiSAR
Liang, Ching-Kai et al. | 2016
427: Callback
Ros, Alberto et al. | 2016
439: Thermal time shifting
Skach, Matt et al. | 2016
450: Heracles
Lo, David et al. | 2016
463: HEB
Liu, Longjun et al. | 2016
476: Architecting to achieve a billion requests per second throughput on a single key-value store server platform
Li, Sheng et al. | 2016
489: A variable warp size architecture
Rogers, Timothy et al. | 2016
502: Warped-compression
Lee, Sangpil et al. | 2016
515: CAWA
Lee, Shin-Ying et al. | 2016
528: Dynamic thread block launch
Wang, Jin et al. | 2016
541: DynaSpAM
Liu, Feng et al. | 2016
554: Rumba
Khudia, Daya et al. | 2016
567: Manycore network interfaces for in-memory rack-scale computing
Daglis, Alexandros et al. | 2016
580: Unified address translation for memory-mapped SSDs with FlashMap
Huang, Jian et al. | 2016
592: FASE
Callan, Robert et al. | 2016
604: Probable cause
Rahmati, Amir et al. | 2016
616: PrORAM
Yu, Xiangyao et al. | 2016
629: MBus
Pannuto, Pat et al. | 2016
642: Accelerating asynchronous programs through event sneak peek
Chadha, Gaurav et al. | 2016
655: VIP
Nachiappan, Nachiappan et al. | 2016
668: FaultHound
Nitin et al. | 2016
682: COP
Palframan, David et al. | 2016
694: Hi-fi playback
Zhang, Chao et al. | 2016
707: Stash
Komuravelli, Rakesh et al. | 2016
720: Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures
Alvarez, Lluc et al. | 2016
733: Fusion
Kumar, Snehasish et al. | 2016
