2010 2009 2008 2007 2006 2005 2004 2003 2002 2001
2000 1999 1998 1997 1996 Prior to 1995 Whitepapers
Papers listed here are either freely available on the web or were obtained legally. Please respect the copyright stipulations placed on these documents. If any author would like us to add or remove their paper, please contact us at info@multicoreinfo.com.
Multicore Papers 2010
Above the Clouds: A View of Cloud Computing
M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia
Communications of the ACM, Vol. 53, No. 4 (April 2010), pp. 50-58
Statistics Driven Workload Modeling for the Cloud
Archana Ganapathi, Yanpei Chen, Armando Fox, Randy Katz, David Patterson
Workshop on Self-Managing Database Systems (SMDB), March 2010.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
J. Stone, D. Gohara, G. Shi
Computing in Science and Engineering (CiSE), 2010
IPDPS 2010
A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring [Presentation Slides (ppt)]
Patrick P. C. Lee, Tian Bu, and Girish Chandranmenon
International Parallel & Distributed Processing Symposium (IPDPS) 2010
On the Importance of Bandwidth Control Mechanisms for Scheduling on Large Scale Heterogeneous Platforms
Olivier Beaumont, Hejer Rejeb
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs
Michela Taufer, Philip Saponaro, Omar Padron, Sandeep Patel
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Implementing the Himeno Benchmark with CUDA on GPU Clusters
Everett Phillips, Massimiliano Fatica
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Direct Self-Consistent Field Computations on GPU Clusters [Presentation Slides]
Guochun Shi, Volodymyr Kindratenko, Ivan Ufimtsev, Todd Martinez
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs
Lifan Xu, Michela Taufer, Stuart Collins, Dionisios Vlachos
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Fine-Grained QoS Scheduling for PCM-based Main Memory Systems [Author website]
Ping Zhou, Yu Du, Youtao Zhang, and Jun Yang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Impact of Resource Contention in Multicore Systems [Presentation]
Robert Hood, Haoqiang Jin, Piyush Mehrotra, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Improving the Performance of Hypervisor-Based Fault Tolerance
Jun Zhu, Wei Dong, ZheFu Jiang, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Supporting Fault Tolerance in a Data-Intensive Computing Middleware [Presentation]
Tekin Bicer, Wei Jiang, Gagan Agrawal
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs
Naoya Maruyama, Akira Nukada, Satoshi Matsuoka
International Parallel & Distributed Processing Symposium (IPDPS) 2010
High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs
Xiaochun Ye, Dongrui Fan, Wei Lin, Nan Yuan, and Paolo Ienne
International Parallel & Distributed Processing Symposium (IPDPS) 2010
GPU Sample Sort [Presentation]
Vitaly Osipov, Peter Sanders, Nikolaj Leischner
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Highly Scalable Parallel Sorting
Edgar Solomonik, Laxmikant Kale
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications [Author website]
Pavel Bar, David Carmeli, Valentin Kravtsov, Martin Swain, Assaf Schuster
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors [Presentation]
Risat Pathan, Jan Jonsson
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Scheduling algorithms for linear workflow optimization
Kunal Agrawal, Anne Benoit, Loic Magnan, and Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems
Han Zhao, Xinxin Liu, Xiaolin (Andy) Li
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures [Presentation]
A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Parallel I/O Performance: From Events to Ensembles [Presentation]
Andrew Uselton, Mark Howison, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup
D. Pan, K. Makki, and N. Pissinou
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Object-Oriented Stream Programming using Aspects [Author website]
Mingliang Wang, Manish Parashar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimal Loop Unrolling for GPGPU Programs [Thesis]
Giridhar Sreenivasa Murthy, Mahesh Ravishankar, Muthu Manikandan Baskaran, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Speculative Execution on Multi-GPU Systems
Gregory Diamos, Sudhakar Yalamanchili
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Dynamic Load Balancing on Single- and Multi-GPU Systems [Author website]
Long Chen, Oreste Villa, Sriram Krishnamoorthy, Guang R. Gao
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Servet: A Benchmark Suite for Autotuning on Multicore Clusters
Jorge González-Domínguez, Guillermo L. Taboada, Basilio B. Fraguela, María J. Martín, Juan Touriño
International Parallel & Distributed Processing Symposium (IPDPS) 2010
KRASH: Reproducible CPU Load Generation on Many-Cores Machines
Swann Perarnau, Guillaume Huard
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Power-aware MPI Task Aggregation Prediction for High-End Computing Systems
Dong Li, Dimitrios Nikolopoulos, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Varying Bandwidth Resource Allocation Problem with Bag Constraints
Venkatesan Chakaravarthy, Vinayaka Pandit, Yogish Sabharwal, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Decentralized Resource Management for Multi-core Desktop Grids
Jaehwan Lee, Pete Keleher, Alan Sussman
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Dynamic Fractional Resource Scheduling for HPC Workloads
Mark Lee Stillwell, Frédéric Vivien, Henri Casanova
International Parallel & Distributed Processing Symposium (IPDPS) 2010
ADEPT Scalability Predictor in Support of Adaptive Resource Allocation [Presentation]
Arash Deshmeh, Jacob Machina, and Angela Sodan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution
Jiayuan Meng, Srimat Chakradhar, Anand Raghunathan, and Surendra Byna
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas Potok
International Parallel & Distributed Processing Symposium (IPDPS) 2010
eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in Windows Azure Platform
Jie Li, Deb Agarwal, Marty Humphrey, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems
Seetharami R Seelam, Liana Fong, Asser Tantawi, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Oblivious Algorithms for Multicores and Network of Processors
Rezaul Chowdhury, Francesco Silvestri, Brandon Blakeley, Vijaya Ramachandran
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P
Wei Tang, Narayan Desai, Daniel Buettner, Zhiling Lan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems [Presentation]
Aparna Chandramowlishwaran, Kathleen Knobe, Richard W. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hybrid MPI/OpenMP Power-aware Computing
Dong Li, Bronis R. de Supinski, Martin Schulz, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance and Energy Optimization of Concurrent Pipelined Applications
Anne Benoit, Paul Renaud-Goud, Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Robust Control-theoretic Thermal Balancing for Server Clusters
Yong Fu, Chenyang Lu, Hongan Wang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation
Zhe Wang, Sanjay Ranka
International Parallel & Distributed Processing Symposium (IPDPS) 2010
GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems
Qingbo Yuan, Jianbo Zhao, Mingyu Chen, Ninghui Sun
International Parallel & Distributed Processing Symposium (IPDPS) 2010
MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management
Devesh Tiwari, Sanghoon Lee, James Tuck, Yan Solihin
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures [Author website]
Laura Grigori, Simplice Donfack, Alok Kumar Gupta
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Tile QR Factorization with Parallel Panel Processing for Multicore Architectures
Bilel Hadri, Hatem Ltaief, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies
Kamil Kedzierski, Miquel Moreto, Francisco J. Cazorla and Mateo Valero
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching
Dongyuan Zhan, Hong Jiang, Sharad Seth
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System
Seetharami Seelam, I-Hsin Chung, John Bauer, Hui-Fang Wen
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Intra-Application Cache Partitioning [Related Paper]
Sai Prashanth Muralidhara, Mahmut Taylan Kandemir, Padma Raghavan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler
Yi Guo, Jisheng Zhao, Vincent Cave, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Executing Task Graphs Using Work-Stealing
Kunal Agrawal, Charles Leiserson, Jim Sukha
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Structuring Execution of OpenMP Applications for Multicore Architectures
François Broquedis, Olivier Aumage, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Oversubscription on Multicore Processors
Costin Iancu, Steven Hofmeyr, Yili Zheng, Filip Blagojevic
International Parallel & Distributed Processing Symposium (IPDPS) 2010
An Auto-Tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
International Parallel & Distributed Processing Symposium (IPDPS) 2010
DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors
Albert Hartono, Muthu Manikandan Baskaran, J. Ramanujam, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors [Author website]
Manoj Gupta, Fermín Sánchez, Josep Llosa
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting Inter-thread Temporal Locality for Chip Multithreading
Jiayuan Meng, Jeremy Sheaffer, Kevin Skadron
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis, Nikolas Ioannou, Salman Khan, Marcelo Cintra
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
Zheng Wei, Joseph Jaja
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Toward Understanding Heterogeneity in Computing
Arnold Rosenberg, Ron Chi-Lung Chiang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications
Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier
International Parallel & Distributed Processing Symposium (IPDPS) 2010
PreDatA - Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng, Hasan Abbasi, Ciprian Docan, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism
Jun Shirako, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Inter-Block GPU Communication via Fast Barrier Synchronization
Shucai Xiao, Wu-chun Feng
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Analysis of the FFT Algorithms for a Many-core Architecture
Long Chen, Guang R. Gao
High Performance Computing Symposium (HPC 2010)
HPCA 2010
Operating System Support for Overlapping-ISA Heterogeneous Multi-Core Architectures
Tong Li, Paul Brett, Rob Knauerhase, David Koufaty, et al.
High-Performance Computer Architecture, (HPCA-16) 2010
ATLAS: A Scalable and High Performance Scheduling Algorithm for Multiple Memory Controllers
Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter
High-Performance Computer Architecture, (HPCA-16) 2010
Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance
Fang Liu, Xiaowei Jiang, and Yan Solihin
High-Performance Computer Architecture, (HPCA-16) 2010
CHOP: Adaptive Filter-Based DRAM Caching for CMP Server Platforms
Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravishankar Iyer, Srihari Makineni, Donald Newell, Yan Solihin and Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010
LeadOut: Composing Low-Overhead Frequency-Enhancing Techniques for Single Thread Performance in Configurable Multicores
Brian Greskamp, R. Ulya Karpuzcu, Josep Torrellas
High-Performance Computer Architecture, (HPCA-16) 2010
LiteTM: Reducing Transactional State Overhead
Syed Ali Raza Jafri, Mithuna Thottethodi, T. N. Vijaykumar
High-Performance Computer Architecture, (HPCA-16) 2010
A Bandwidth-Aware Memory Subsystem Resource Management Using Non-Invasive Resource Profilers for Large CMP Systems
Dimitris Kaseridis, Jeffrey Stuecheli, Lizy K. John
High-Performance Computer Architecture, (HPCA-16) 2010
StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache
Hyunjin Lee, Sangyeun Cho, and Bruce R. Childers
High-Performance Computer Architecture, (HPCA-16) 2010
ESP-NUCA: A Low-Cost Adaptive Non-Uniform Cache Architecture
Javier Merino, Valentin Puente, Jose-Angel Gregorio
High-Performance Computer Architecture, (HPCA-16) 2010
Towards Scalable, Energy-Efficient Bus-Based On-Chip Networks
Aniruddha N. Udipi, Naveen Muralimanohar, Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010
DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance
Dan Tang, Yungang Bao, Weiwu Hu, Mingyu Chen
High-Performance Computer Architecture, (HPCA-16) 2010
Graphite: A Distributed Parallel Simulator for Large-Scale Multicores
Jason Miller, Harshad Kasture, George Kurian, et al.
High-Performance Computer Architecture, (HPCA-16) 2010
Application Performance Modeling in a Virtualized Environment
Sajib Kundu, Raju Rangaswami, Kaushik Dutta, Ming Zhao
High-Performance Computer Architecture, (HPCA-16) 2010
COMIC++: A Software SVM System for Heterogeneous Multicore Accelerator Clusters
Jaejin Lee, Jun Lee, Sangmin Seo, Jungwon Kim, Seungkyun Kim
High-Performance Computer Architecture, (HPCA-16) 2010
BOLT: An Energy-Efficient Latency-Tolerant Processor
Andrew Hilton, Amir Roth
High-Performance Computer Architecture, (HPCA-16) 2010
An Optimized 3D-Stacked Memory Architecture by Exploiting Excessive, High-Density TSV Bandwidth
Dong Hyuk Woo, Nak Hee Seong, Dean L. Lewis, and Hsien-Hsin S. Lee
High-Performance Computer Architecture, (HPCA-16) 2010
PPoPP 2010
Structure-driven Optimizations for Amorphous Data-parallel Programs
Mario Mendez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher and Keshav Pingali
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Debugging Programs that use Atomic Blocks and Transactional Memory
Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adrián Cristal, Mateo Valero
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Gambit: Effective Unit Testing of Concurrency Libraries
Katherine Coons, Sebastian Burckhardt and Madanlal Musuvathi
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Featherweight X10: a Core Calculus for Async-Finish Parallelism
Jonathan Lee and Jens Palsberg
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Compiler Aided Selective Lock Assignment for Improving the Performance of Software Transactional Memory
Sandya Mannarswamy, Dhruva Chakrabarti, Kaushik Rajan and Sujoy Saraswati
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Is Transactional Programming Really Easier?
Christopher Rossbach, Owen Hofmann and Emmett Witchel
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scheduling Support for Transactional Memory Contention Management
Walther Maldonado, Patrick Marlier, Pascal Felber, Julia Lawall, Gilles Muller, Adi Suissa, Danny Hendler and Alexandra Fedorova
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
An Adaptive Performance Modeling Tool for GPU Architectures
Sara Baghsorkhi, Matthieu Delahaye, Sanjay Patel, William Gropp and Wen-mei Hwu
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs
Jee Choi, Amik Singh and Richard Vuduc
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Fast Tridiagonal Solvers on GPU
Yao Zhang, Jonathan Cohen and John Owens
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
CUDAlign: Using GPU to Accelerate the Comparison of Megabase Genomic Sequences
Edans Flávius de O. Sandes and Alba Cristina M. A. Melo
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Load Balancing on Speed
Steven Hofmeyr, Costin Iancu and Filip Blagojevic
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scalable Communication Protocols for Dynamic Sparse Data Exchange
Torsten Hoefler, Christian Siebert and Andrew Lumsdaine
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
The LOFAR Correlator: Implementation and Performance Analysis
John W. Romein, P. Chris Broekema, Jan David Mol and Rob V. van Nieuwpoort
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Lazy Binary-Splitting: A Run-Time Adaptive Work-Stealing Scheduler
Alexandros Tzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Thread to Strand Binding of Parallel Network Applications in Massive Multi-Threaded Systems [Author website]
Petar Radojkovic, Vladimir Cakarevic, Javier Verdu, Alex Pajuelo, Francisco J. Cazorla, et al.
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Does Cache Sharing on Modern CMP Matter to the Performance of Contemporary Multithreaded Programs?
Eddy Z. Zhang, Yunlian Jiang and Xipeng Shen
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Improving Parallelism and Locality with Asynchronous Algorithms [Presentation Slides]
Lixia Liu and Zhiyuan Li
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scaling LAPACK Panel Operations Using Parallel Cache Assignment
Anthony M. Castaldo and R. Clint Whaley
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Modeling Advanced Collective Communication Algorithms on Cell-based Systems
Qasim Ali, Samuel Midkiff and Vijay Pai
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node
Jidong Zhai, Wenguang Chen and Weimin Zheng
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Input-Driven Dynamic Execution Behavior Prediction of Streaming Applications
Farhana Aleen, Monirul Sharif and Santosh Pande
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, et al.
18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)
Efficient Parallel Programming in Poly/ML and Isabelle/ML
David C. J. Matthews and Makarius Wenzel
ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP 2010)
COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
Dong Hyuk Woo and Hsien-Hsin S. Lee
Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2010

