Flexible software profiling of GPU architectures (English)
Stephenson, Mark / Sastry Hari, Siva / Lee, Yunsup / Ebrahimi, Eiman / Johnson, Daniel / Nellans, David / O'Connor, Mike / Keckler, Stephen
In: Computer architecture news; 43, 3; 185-197; 2016
Article (Journal) / Print
Title: Flexible software profiling of GPU architectures
Contributors: Stephenson, Mark (author) / Sastry Hari, Siva / Lee, Yunsup / Ebrahimi, Eiman / Johnson, Daniel / Nellans, David / O'Connor, Mike / Keckler, Stephen
Published in: Computer architecture news; 43, 3; 185-197
Publisher: ACM
Place of publication: New York, NY
Publication date: 2016
Type of media: Article (Journal)
Type of material: Print
Language: English
Basic classification: 54.30 / 54.00
770/3155
Table of contents – Volume 43, Issue 3
The tables of contents are generated automatically from the records of the individual contributions in the TIB portal index, so this table of contents may be incomplete.
- 1: BlueDBM / Jun, Sang-Woo et al. | 2016
- 2: 10x10 / Chien, Andrew A et al. | 2015
- 10: Internet Nuggets / Thorson, Mark et al. | 2015
- 14: Towards sustainable in-situ server systems in the big data era / Li, Chao et al. | 2016
- 27: DjiNN and Tonic / Hauswald, Johann et al. | 2016
- 41: A case for core-assisted bottleneck acceleration in GPUs / Vijaykumar, Nandita et al. | 2016
- 54: Harmonia / Paul, Indrani et al. | 2016
- 66: Redundant memory mappings for fast access to large memories / Karakostas, Vasileios et al. | 2016
- 79: Page overlays / Seshadri, Vivek et al. | 2016
- 92: ShiDianNao / Du, Zidong et al. | 2016
- 105: A scalable processing-in-memory accelerator for parallel graph processing / Ahn, Junwhan et al. | 2016
- 118: Efficient execution of memory access phases using dataflow specialization / Ho, Chen-Han et al. | 2016
- 131: Data reorganization in memory using 3D-stacked DRAM / Akin, Berkin et al. | 2016
- 144: Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8 / Nakaike, Takuya et al. | 2016
- 158: Profiling a warehouse-scale computer / Kanev, Svilen et al. | 2016
- 170: Computer performance microscopy with Shim / Yang, Xi et al. | 2016
- 185: Flexible software profiling of GPU architectures / Stephenson, Mark et al. | 2016
- 198: BEAR / Chou, Chiachen et al. | 2016
- 211: A fully associative, tagless DRAM cache / Lee, Yongjun et al. | 2016
- 223: Multiple clone row DRAM / Choi, Jungwhan et al. | 2016
- 235: Flexible auto-refresh / Bhati, Ishwar et al. | 2016
- 247: Cost-effective speculative scheduling in high performance processors / Perais, Arthur et al. | 2016
- 260: LaZy superscalar / Aşılıoğlu, Görkem et al. | 2016
- 272: The load slice core microarchitecture / Carlson, Trevor et al. | 2016
- 285: Semantic locality and context-based prefetching using reinforcement learning / Peled, Leeor et al. | 2016
- 298: Exploring the potential of heterogeneous von Neumann/dataflow execution models / Nowatzki, Tony et al. | 2016
- 311: SHRINK / Lopes, Bruno et al. | 2016
- 323: Branch vanguard / McFarlin, Daniel et al. | 2016
- 336: PIM-enabled instructions / Ahn, Junwhan et al. | 2016
- 349: SLIP / Das, Subhasis et al. | 2016
- 362: CloudMonatt / Zhang, Tianwei et al. | 2016
- 375: Reducing world switches in virtualized environment with flexible cross-world calls / Li, Wenhao et al. | 2016
- 388: ArMOR / Lustig, Daniel et al. | 2016
- 401: Clean / Segulja, Cedomir et al. | 2016
- 414: MiSAR / Liang, Ching-Kai et al. | 2016
- 427: Callback / Ros, Alberto et al. | 2016
- 439: Thermal time shifting / Skach, Matt et al. | 2016
- 450: Heracles / Lo, David et al. | 2016
- 463: HEB / Liu, Longjun et al. | 2016
- 476: Architecting to achieve a billion requests per second throughput on a single key-value store server platform / Li, Sheng et al. | 2016
- 489: A variable warp size architecture / Rogers, Timothy et al. | 2016
- 502: Warped-compression / Lee, Sangpil et al. | 2016
- 515: CAWA / Lee, Shin-Ying et al. | 2016
- 528: Dynamic thread block launch / Wang, Jin et al. | 2016
- 541: DynaSpAM / Liu, Feng et al. | 2016
- 554: Rumba / Khudia, Daya et al. | 2016
- 567: Manycore network interfaces for in-memory rack-scale computing / Daglis, Alexandros et al. | 2016
- 580: Unified address translation for memory-mapped SSDs with FlashMap / Huang, Jian et al. | 2016
- 592: FASE / Callan, Robert et al. | 2016
- 604: Probable cause / Rahmati, Amir et al. | 2016
- 616: PrORAM / Yu, Xiangyao et al. | 2016
- 629: MBus / Pannuto, Pat et al. | 2016
- 642: Accelerating asynchronous programs through event sneak peek / Chadha, Gaurav et al. | 2016
- 655: VIP / Nachiappan, Nachiappan et al. | 2016
- 668: FaultHound / Nitin et al. | 2016
- 682: COP / Palframan, David et al. | 2016
- 694: Hi-fi playback / Zhang, Chao et al. | 2016
- 707: Stash / Komuravelli, Rakesh et al. | 2016
- 720: Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures / Alvarez, Lluc et al. | 2016
- 733: Fusion / Kumar, Snehasish et al. | 2016