AlphaZ: A System for Design Space Exploration in the Polyhedral Model . . . 2
Automatic Scaling of OpenMP Beyond Shared Memory . . . 1
Just in Time Load Balancing . . . 1
A Methodology for Fine-Grained Parallelism in JavaScript Applications . . . 16
Evaluation of Power Consumption at Execution of Multiple Automatically Parallelized and Power Controlled Media Applications on the RP2 Low-Power Multicore . . . 31
Compiler Optimizations: Machine Learning versus O3 . . . 32
Double Inspection for Run-Time Loop Parallelization . . . 46
The STAPL Parallel Graph Library . . . 46
A Hybrid Approach to Proving Memory Reference Monotonicity . . . 61
Set and Relation Manipulation for the Sparse Polyhedral Framework . . . 61
OpenCL as a Programming Model for GPU Clusters . . . 76
Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction . . . 76
CellCilk: Extending Cilk for Heterogeneous Multicore Platforms . . . 91
OmpSs-OpenCL Programming Model for Heterogeneous Systems . . . 96
OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures . . . 106
Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs . . . 112
Optimizing the Concurrent Execution of Locks and Transactions . . . 124
UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores . . . 127
A Study of the Usefulness of Producer/Consumer Synchronization . . . 141
A Study on the Impact of Compiler Optimizations on High-Level Synthesis . . . 143
Lock-Free Resizeable Concurrent Tries . . . 156
FlowPools: A Lock-Free Deterministic Concurrent Dataflow Abstraction . . . 158
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation . . . 171
Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages . . . 174
A Mutable Hardware Abstraction to Replace Threads . . . 185
A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies . . . 190
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System . . . 203
A Software-Based Method-Level Speculation Framework for the Java Platform . . . 205
A Code Merging Optimization Technique for GPU . . . 218
Ant: A Debugging Framework for MPI Parallel Programs . . . 220
Compiler Automatic Discovery of OmpSs Task Dependencies . . . 234
Static Compilation Analysis for Host-Accelerator Communication Optimization . . . 237
Beyond Do Loops: Data Transfer Generation with Convex Array Regions . . . 249
Scheduling Support for Communicating Parallel Tasks . . . 252
Finish Accumulators: An Efficient Reduction Construct for Dynamic Task Parallelism . . . 264
FlashbackSTM: Improving STM Performance by Remembering the Past . . . 266
Kaira: Generating Parallel Libraries and Their Usage with Octave . . . 268
Polytasks: A Compressed Task Representation for HPC Runtimes . . . 268
Language and Architecture Independent Software Thread-Level Speculation . . . 270
Abstractions for Defining Semi-Regular Grids Orthogonally from Stencils . . . 273
Detecting False Sharing in OpenMP Applications Using the DARWIN Framework . . . 283