Toward OpenCL Automatic Multi-Device Support (English)

Henry, Sylvain / Denis, Alexandre / Barthou, Denis / Counilh, Marie-Christine / Namyst, Raymond

In: Euro-Par 2014 Parallel Processing : 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings ; Chapter: 65 ; 776-787 ; 2014

ISBN: 978-3-319-09873-9, 978-3-319-09872-2

ISSN: 1611-3349, 0302-9743

Article/Chapter (Book) / Electronic Resource

To fully tap into the potential of today's heterogeneous machines, offloading parts of an application on accelerators is no longer sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. In this paper we present SOCL, an OpenCL implementation that improves and simplifies the programming experience on heterogeneous architectures. SOCL enables applications to dynamically dispatch computation kernels over processing devices so as to maximize their utilization. OpenCL applications can incrementally make use of light extensions to automatically schedule kernels in a controlled manner on multi-device architectures. We demonstrate the relevance of our approach by experimenting with several OpenCL applications on a range of heterogeneous architectures. We show that performance portability is enhanced by using SOCL extensions.

Title: Toward OpenCL Automatic Multi-Device Support
Contributors: Silva, Fernando (editor) / Dutra, Inês (editor) / Santos Costa, Vítor (editor) / Henry, Sylvain (author) / Denis, Alexandre (author) / Barthou, Denis (author) / Counilh, Marie-Christine (author) / Namyst, Raymond (author)
Conference: European Conference on Parallel Processing ; 2014 ; Porto, Portugal
Published in: Euro-Par 2014 Parallel Processing : 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings ; Chapter: 65 ; 776-787
    Lecture Notes in Computer Science ; 8632 ; 776-787
Publisher: Springer International Publishing
Place of publication: Cham
Publication date: 2014-01-01
Size: 12 pages
ISBN: 978-3-319-09873-9, 978-3-319-09872-2
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-319-09873-9_65
Type of media: Article/Chapter (Book)
Type of material: Electronic Resource
Language: English
Keywords: Runtime System, Memory Transfer, Host Memory, Heterogeneous Architecture, OpenCL Kernel
    Computer Science, Programming Languages, Compilers, Interpreters, Operating Systems, System Performance and Evaluation, Computer Communication Networks, Algorithm Analysis and Problem Complexity
Source: Springer Verlag

Table of contents eBook

The tables of contents are generated automatically and are based on the data records of the individual contributions available in the index of the TIB portal. The display of the Tables of Contents may therefore be incomplete.

1: MPI Trace Compression Using Event Flow Graphs
Aguilar, Xavier / Fürlinger, Karl / Laure, Erwin et al. | 2014
2: ScalaJack: Customized Scalable Tracing with In-situ Data Analysis
Ananthakrishnan, Srinath Krishna / Mueller, Frank et al. | 2014
3: Performance Measurement and Analysis of Transactional Memory and Speculative Execution on IBM Blue Gene/Q
Jiang, Jie / Philippen, Peter / Knobloch, Michael / Mohr, Bernd et al. | 2014
4: c-Eclipse: An Open-Source Management Framework for Cloud Applications
Sofokleous, Chrystalla / Loulloudes, Nicholas / Trihinas, Demetris / Pallis, George / Dikaiakos, Marios D. et al. | 2014
5: Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures
Stanisic, Luka / Thibault, Samuel / Legrand, Arnaud / Videau, Brice / Méhaut, Jean-François et al. | 2014
6: Modeling the Impact of Reduced Memory Bandwidth on HPC Applications
Tiwari, Ananta / Gamst, Anthony / Laurenzano, Michael A. / Schulz, Martin / Carrington, Laura et al. | 2014
7: ParaShares: Finding the Important Basic Blocks in Multithreaded Programs
Kambadur, Melanie / Tang, Kui / Kim, Martha A. et al. | 2014
8: Multi-Objective Auto-Tuning with Insieme: Optimization and Trade-Off Analysis for Time, Energy and Resource Usage
Gschwandtner, Philipp / Durillo, Juan J. / Fahringer, Thomas et al. | 2014
9: Performance Prediction and Evaluation of Parallel Applications in KVM, Xen, and VMware
Hong, Cheol-Ho / Kim, Beom-Joon / Kim, Young-Pil / Park, Hyunchan / Yoo, Chuck et al. | 2014
10: DReAM: Per-Task DRAM Energy Metering in Multicore Systems
Liu, Qixiao / Moreto, Miquel / Abella, Jaume / Cazorla, Francisco J. / Valero, Mateo et al. | 2014
11: Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation
Laurenzano, Michael A. / Tiwari, Ananta / Jundt, Adam / Peraza, Joshua / Ward, William A. / Campbell, Roy / Carrington, Laura et al. | 2014
12: On Interactions among Scheduling Policies: Finding Efficient Queue Setup Using High-Resolution Simulations
Klusáček, Dalibor / Tóth, Šimon et al. | 2014
13: ProPS: A Progressively Pessimistic Scheduler for Software Transactional Memory
Rito, Hugo / Cachopo, João et al. | 2014
14: A Queueing Theory Approach to Pareto Optimal Bags-of-Tasks Scheduling on Clouds
Dumitru, Cosmin / Oprescu, Ana-Maria / Živković, Miroslav / van der Mei, Rob / Grosso, Paola / de Laat, Cees et al. | 2014
15: SPAGHETtI: Scheduling/Placement Approach for Task-Graphs on HETerogeneous archItecture
Barthou, Denis / Jeannot, Emmanuel et al. | 2014
16: Energy-Aware Multi-Organization Scheduling Problem
Cohen, Johanne / Cordeiro, Daniel / Raphael, Pedro Luis F. et al. | 2014
17: Energy Efficient Scheduling of MapReduce Jobs
Bampis, Evripidis / Chau, Vincent / Letsios, Dimitrios / Lucarelli, Giorgio / Milis, Ioannis / Zois, Georgios et al. | 2014
18: Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs
Huang, Dafei / Wen, Mei / Xun, Changqing / Chen, Dong / Cai, Xing / Qiao, Yuran / Wu, Nan / Zhang, Chunyuan et al. | 2014
19: Switchable Scheduling for Runtime Adaptation of Optimization
Bagnères, Lénaïc / Bastoul, Cédric et al. | 2014
20: A New GCC Plugin-Based Compiler Pass to Add Support for Thread-Level Speculation into OpenMP
Aldea, Sergio / Estebanez, Alvaro / Llanos, Diego R. / Gonzalez-Escribano, Arturo et al. | 2014
21: Improving Read Performance with Online Access Pattern Analysis and Prefetching
Tang, Houjun / Zou, Xiaocheng / Jenkins, John / Boyuka, David A. / Ranshous, Stephen / Kimpe, Dries / Klasky, Scott / Samatova, Nagiza F. et al. | 2014
22: Robust and Efficient Large-Large Table Outer Joins on Distributed Infrastructures
Cheng, Long / Kotoulas, Spyros / Ward, Tomas E / Theodoropoulos, Georgios et al. | 2014
23: Top-k Item Identification on Dynamic and Distributed Datasets
Guerrieri, Alessio / Montresor, Alberto / Velegrakis, Yannis et al. | 2014
24: Applying Selectively Parallel I/O Compression to Parallel Storage Systems
Filgueira, Rosa / Atkinson, Malcolm / Tanimura, Yusuke / Kojima, Isao et al. | 2014
25: Ultra-Fast Load Balancing of Distributed Key-Value Stores through Network-Assisted Lookups
De Cesaris, Davide / Katrinis, Kostas / Kotoulas, Spyros / Corradi, Antonio et al. | 2014
26: Virtual Machine Consolidation in Cloud Data Centers Using ACO Metaheuristic
Ferdaus, Md Hasanul / Murshed, Manzur / Calheiros, Rodrigo N. / Buyya, Rajkumar et al. | 2014
27: Workflow Scheduling on Federated Clouds
Durillo, Juan J. / Prodan, Radu et al. | 2014
28: Locality-Aware Cooperation for VM Scheduling in Distributed Clouds
Pastor, Jonathan / Bertier, Marin / Desprez, Frédéric / Lebre, Adrien / Quesnel, Flavien / Tedeschi, Cédric et al. | 2014
29: Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized Infiniband Clusters?
Zhang, Jie / Lu, Xiaoyi / Jose, Jithin / Shi, Rong / Panda, Dhabaleswar K. (DK) et al. | 2014
30: Power-Aware L₁ and L₂ Caches for GPGPUs
Atoofian, Ehsan / Manzak, Ali et al. | 2014
31: Power Consumption Due to Data Movement in Distributed Programming Models
Jana, Siddhartha / Hernandez, Oscar / Poole, Stephen / Chapman, Barbara et al. | 2014
32: Spanning Tree or Gossip for Aggregation: A Comparative Study
Nyers, Lehel / Jelasity, Márk et al. | 2014
33: Shades: Expediting Kademlia’s Lookup Process
Einziger, Gil / Friedman, Roy / Kantor, Yoav et al. | 2014
34: Analysis and Comparison of Truly Distributed Solvers for Linear Least Squares Problems on Wireless Sensor Networks
Prikopa, Karl E. / Straková, Hana / Gansterer, Wilfried N. et al. | 2014
35: High-Performance Computer Algebra: A Hecke Algebra Case Study
Maier, Patrick / Livesey, Daria / Loidl, Hans-Wolfgang / Trinder, Phil et al. | 2014
36: Generic Deterministic Random Number Generation in Dynamic-Multithreaded Platforms
Mor, Stefano / Roch, Jean-Louis / Maillard, Nicolas et al. | 2014
37: Implementation and Performance Analysis of SkelGIS for Network Mesh-Based Simulations
Coullon, Hélène / Limet, Sébastien et al. | 2014
38: GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics
Simmhan, Yogesh / Kumbhare, Alok / Wickramaarachchi, Charith / Nagarkar, Soonil / Ravi, Santosh / Raghavendra, Cauligi / Prasanna, Viktor et al. | 2014
39: Resolving Semantic Conflicts in Word Based Software Transactional Memory
Sharp, Craig / Blewitt, William / Morgan, Graham et al. | 2014
40: Automatic Tuning of the Parallelism Degree in Hardware Transactional Memory
Rughetti, Diego / Romano, Paolo / Quaglia, Francesco / Ciciani, Bruno et al. | 2014
41: A Distributed CPU-GPU Sparse Direct Solver
Sao, Piyush / Vuduc, Richard / Li, Xiaoye Sherry et al. | 2014
42: Parallel Computation of Echelon Forms
Dumas, Jean-Guillaume / Gautier, Thierry / Pernet, Clément / Sultan, Ziad et al. | 2014
43: Time-Domain BEM for the Wave Equation: Optimization and Hybrid Parallelization
Bramas, Berenger / Coulaud, Olivier / Sylvand, Guillaume et al. | 2014
44: Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators
Gogolenko, Sergiy / Bai, Zhaojun / Scalettar, Richard et al. | 2014
45: High-Throughput Maps on Message-Passing Manycore Architectures: Partitioning versus Replication
Shahmirzadi, Omid / Ropars, Thomas / Schiper, André et al. | 2014
46: A Fast Sparse Block Circulant Matrix Vector Product
Romero, Eloy / Tomás, Andrés / Soriano, Antonio / Blanquer, Ignacio et al. | 2014
47: Scheduling Data Flow Program in XKaapi: A New Affinity Based Algorithm for Heterogeneous Architectures
Bleuse, Raphaël / Gautier, Thierry / Lima, João V. F. / Mounié, Grégory / Trystram, Denis et al. | 2014
48: Delegation Locking Libraries for Improved Performance of Multithreaded Programs
Klaftenegger, David / Sagonas, Konstantinos / Winblad, Kjell et al. | 2014
49: A Generic Strategy for Multi-stage Stencils
Bianco, Mauro / Cumming, Benjamin et al. | 2014
50: Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
Clet-Ortega, Jérôme / Carribault, Patrick / Pérache, Marc et al. | 2014
51: Power-Aware Replica Placement in Tree Networks with Multiple Servers per Client
Aupy, Guillaume / Benoit, Anne / Journault, Matthieu / Robert, Yves et al. | 2014
52: On Constructing DAG-Schedules with Large AREAs
Roche, Scott T. / Rosenberg, Arnold L. / Rajaraman, Rajmohan et al. | 2014
53: Software Defined Multicasting for MPI Collective Operation Offloading with the NetFPGA
Arap, Omer / Brown, Geoffrey / Himebaugh, Bryce / Swany, Martin et al. | 2014
54: MapReduce over Lustre: Can RDMA-Based Approach Benefit?
Rahman, Md. Wasi-ur / Lu, Xiaoyi / Islam, Nusrat Sharmin / Rajachandrasekar, Raghunath / Panda, Dhabaleswar K. (DK) et al. | 2014
55: Random Fields Generation on the GPU with the Spectral Turning Bands Method
Hunger, Lars / Cosenza, Biagio / Kimeswenger, Stefan / Fahringer, Thomas et al. | 2014
56: Fast Set Intersection through Run-Time Bitmap Construction over PForDelta-Compressed Indexes
Zou, Xiaocheng / Lakshminarasimhan, Sriram / Boyuka, David A. / Ranshous, Stephen / Tang, Houjun / Klasky, Scott / Samatova, Nagiza F. et al. | 2014
57: Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS
González-Domínguez, Jorge / Schmidt, Bertil / Kässens, Jan Christian / Wienbrandt, Lars et al. | 2014
58: IFM: A Scalable High Resolution Flood Modeling Framework
Singhal, Swati / Aneja, Sandhya / Liu, Frank / Real, Lucas Villa / George, Thomas et al. | 2014
59: High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems
Abdelfattah, Ahmad / Gendron, Eric / Gratadour, Damien / Keyes, David / Ltaief, Hatem / Sevin, Arnaud / Vidal, Fabrice et al. | 2014
60: Parallel Dual Tree Traversal on Multi-core and Many-core Architectures for Astrophysical N-body Simulations
Lange, Benoit / Fortin, Pierre et al. | 2014
61: Customizing Driving Directions with GPUs
Delling, Daniel / Kobitzsch, Moritz / Werneck, Renato F. et al. | 2014
62: GPU Accelerated Range Trees with Applications
Maramreddy, Manoj Kumar / Kothapalli, Kishore et al. | 2014
63: Scalable On-Board Multi-GPU Simulation of Long-Range Molecular Dynamics
Novalbos, Marcos / González, Jaime / Otaduy, Miguel A. / Martinez-Benito, Roberto / Sanchez, Alberto et al. | 2014
64: Resolution of Linear Algebra for the Discrete Logarithm Problem Using GPU and Multi-core Architectures
Jeljeli, Hamza et al. | 2014
65: Toward OpenCL Automatic Multi-Device Support
Henry, Sylvain / Denis, Alexandre / Barthou, Denis / Counilh, Marie-Christine / Namyst, Raymond et al. | 2014
66: Concurrent Kernel Execution on Xeon Phi within Parallel Heterogeneous Workloads
Wende, Florian / Steinke, Thomas / Cordes, Frank et al. | 2014
67: Writing Self-adaptive Codes for Heterogeneous Systems
Fabeiro, Jorge F. / Andrade, Diego / Fraguela, Basilio B. / Doallo, Ramón et al. | 2014
68: A Pattern-Based Comparison of OpenACC and OpenMP for Accelerator Computing
Wienke, Sandra / Terboven, Christian / Beyer, James C. / Müller, Matthias S. et al. | 2014
