Multicore Research Papers - 2010

2010 2009 2008 2007 2006 2005 2004 2003 2002 2001
2000 1999 1998 1997 1996 Prior to 1995 Whitepapers

Papers listed here are either freely available on the web or obtained legally. Please respect the various copyright stipulations placed on these documents. If any author would like us to add or remove their paper, please contact us at info@multicoreinfo.com.

Multicore Papers 2010

Supercomputing 2010
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems
Fengguang Song, Hatem Ltaief, Bilel Hadri, Jack Dongarra

On-Chip Network Evaluation Framework [Author web site]
Hanjoon Kim, Seulki Heo, Junghoon Lee, Jaehyuk Huh, John Kim

Parallel Fast Gauss Transform
Rahul S. Sampath, Hari Sundar, Shravan K. Veerapaneni

Circuit-Switched Memory Access in Photonic Interconnection Networks for High-Performance Embedded Computing [Author website]
G. Hendry, E. Robinson, V. Gleyzer, J. Chan, L. P. Carloni, N. Bliss, and K. Bergman

CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Asit K. Mishra, Shekhar Srikantaiah, Mahmut Kandemir, Chita R. Das

A Multi-Scale Heart Simulation on Massively Parallel Computers
Akira Hosoi, Takumi Washio, Jun-ichi Okada, Yoshimasa Kadooka, Kengo Nakajima, Toshiaki Hisada

Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing
Adrian M. Caulfield, Joel Coburn, Todor I. Mollov, Arup De, Ameen Akel, Jiahua He, Arun Jagatheesan, Rajesh K. Gupta, Allan Snavely, Steven Swanson

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs [Author web site]
Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, Pradeep Dubey

DASH: a Recipe for a Flash-based Data Intensive Supercomputer
Jiahua He, Arun Jagatheesan, Sandeep Gupta, Jeffrey Bennett, Allan Snavely

An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code [Author web site]
Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka

Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support [Author web site]
Xiangyu Dong, Yuan Xie, Naveen Muralimanohar, Norman P. Jouppi

Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski

vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload
Ardalan Kangarlou, Sahan Gamage, Ramana Rao Kompella, Dongyan Xu

A Flexible Reservation Algorithm for Advance Network Provisioning
Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim

Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
Torsten Hoefler, Timo Schneider, Andrew Lumsdaine

Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories
Jae-Seung Yeom, Dimitrios S. Nikolopoulos

Fast PGAS Implementation of Distributed Graph Algorithms
Guojing Cong, George Almasi, Vijay Saraswat

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan

Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory
Roger Pearce, Maya Gokhale, Nancy M. Amato

An Adaptive Framework for Simulation and Online Remote Visualization of Critical Climate Applications in Resource-Constrained Environments
Preeti Malakar, Vijay Natarajan, Sathish S. Vadhiyar

Scalable Graph Exploration on Multicore Processors
Fabrizio Petrini, Virat Agarwal, Davide Pasetto, David Bader

OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Seyong Lee, Rudolf Eigenmann

Accelerating I/O Forwarding in IBM Blue Gene/P Systems
Venkatram Vishwanath, Mark Hereld, Kamil Iskra, Dries Kimpe, Vitali Morozov, Michael E. Papka, Robert Ross, Kazutomo Yoshii

The 48-Core SCC Processor: The Programmer’s View [Author web site]
Tim Mattson, Rob Van der Wijngaart, Michael Riepen, Thomas Lehnig, Paul Brett, et al.

Managing Variability in the I/O Performance of Petascale Storage Systems
Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf

A Block-Oriented Language and Runtime System for Tensor Algebra with Very Large Arrays
Beverly A. Sanders, Rod Bartlett, Erik Deumens, Victor Lotrich, Mark Ponton

IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination
Xuechen Zhang, Kei Davis, Song Jiang

A Scalable and Distributed Dynamic Formal Verifier for MPI Programs
Anh Vo, Sriram Aananthakrishnan, Ganesh Gopalakrishnan, Greg Bronevetsky, Bronis R. de Supinski, Martin Schulz

JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations [Author web site]
Xiaodan Wang, Eric Perlman, Randal Burns, Tanu Malik, Tamas Budavari, Charles Meneveau, Alexander Szalay

FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking
Zhezhe Chen, Qi Gao, Wenbin Zhang, Feng Qin

Automatic Run-time Parallelization and Transformation of I/O
Thorvald Natvig, Anne C. Elster, Jan Christian Meyer

Scalable Identification of Load Imbalance in Parallel Executions using Call Path Profiles [Author web site]
Nathan R. Tallent, Laksono Adhianto, John M. Mellor-Crummey

Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures
Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, Galen Shipman

Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method
Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc

Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
Guy Blelloch, Ioannis Koutis, Gary L. Miller, Kanat Tangwongsan

Scaling Hierarchical N-Body Simulations on GPU Clusters
Pritish Jetley, Lukasz Wesolowski, Filippo Gioachin, Laxmikant V. Kale, Thomas R. Quinn

Elastic Cloud Caches for Accelerating Service-Oriented Computations
David Chiu, Gagan Agrawal, Apeksha Shetty

Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance
Abdullah Gharaibeh, Matei Ripeanu

Data Sharing Options for Scientific Workflows on Amazon EC2
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Benjamin P. Berman, Bruce Berriman, Phil Maechling

Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations [Author web site]
Scott Hampton, Sadaf Alam, Paul Crozier, Pratul Agarwal

Power-Aware Consolidation of Scientific Workflows in Virtualized Environments
Qian Zhu, Jiedan Zhu, Gagan Agrawal

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Ronald Babich, Michael A. Clark, et al.

The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
David Tarjan, Kevin Skadron

A Parallel Implementation of Electron-Phonon Scattering in Nanoelectronic Devices up to 95K Cores
Mathieu Luisier

Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses [Author web site]
Andreas Sandberg, David Eklöv, Erik Hagersten

Direct Numerical Simulation of Particulate Flows on 294912 Processor Cores
Jan Götz, Klaus Iglberger, Markus Stürmer, Ulrich Rüde

PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications
Martin Burtscher, Byoung-Do Kim, Jeff Diamond, John McCalpin, Lars Koesterke, James Browne

End Supercomputing 2010 Papers

Parallel Sparse Polynomial Division Using Heaps
Michael Monagan, Roman Pearce
PASCO 2010 (Parallel Symbolic Computation)

Above the Clouds: A View of Cloud Computing
M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia
Communications of the ACM, Vol. 53, No. 4 (April 2010), pp. 50-58

Statistics Driven Workload Modeling for the Cloud
Archana Ganapathi, Yanpei Chen, Armando Fox, Randy Katz, David Patterson
Workshop on Self-Managing Database Systems (SMDB), March 2010.

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
J. Stone, D. Gohara, G. Shi
Computing in Science and Engineering (CiSE), 2010

IPDPS 2010
A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring [Presentation Slides (ppt)]
Patrick P. C. Lee, Tian Bu, and Girish Chandranmenon
International Parallel & Distributed Processing Symposium (IPDPS) 2010

On the Importance of Bandwidth Control Mechanisms for Scheduling on Large Scale Heterogeneous Platforms
Olivier Beaumont, Hejer Rejeb
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs
Michela Taufer, Philip Saponaro, Omar Padron, Sandeep Patel
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Implementing the Himeno Benchmark with CUDA on GPU Clusters
Everett Phillips, Massimiliano Fatica
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Direct Self-Consistent Field Computations on GPU Clusters [Presentation Slides]
Guochun Shi, Volodymyr Kindratenko, Ivan Ufimtsev, Todd Martinez
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs
Lifan Xu, Michela Taufer, Stuart Collins, Dionisios Vlachos
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Fine-Grained QoS Scheduling for PCM-based Main Memory Systems [Author website]
Ping Zhou, Yu Du, Youtao Zhang, and Jun Yang
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Performance Impact of Resource Contention in Multicore Systems [Presentation]
Robert Hood, Haoqiang Jin, Piyush Mehrotra, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Improving the Performance of Hypervisor-Based Fault Tolerance
Jun Zhu, Wei Dong, ZheFu Jiang, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Supporting Fault Tolerance in a Data-Intensive Computing Middleware [Presentation]
Tekin Bicer, Wei Jiang, Gagan Agrawal
International Parallel & Distributed Processing Symposium (IPDPS) 2010

A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs
Naoya Maruyama, Akira Nukada, Satoshi Matsuoka
International Parallel & Distributed Processing Symposium (IPDPS) 2010

High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs
Xiaochun Ye, Dongrui Fan, Wei Lin, Nan Yuan, and Paolo Ienne
International Parallel & Distributed Processing Symposium (IPDPS) 2010

GPU Sample Sort [Presentation]
Vitaly Osipov, Peter Sanders, Nikolaj Leischner
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Highly Scalable Parallel Sorting
Edgar Solomonik, Laxmikant Kale
International Parallel & Distributed Processing Symposium (IPDPS) 2010

A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications [Author website]
Pavel Bar, David Carmeli, Valentin Kravtsov, Martin Swain, Assaf Schuster
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors [Presentation]
Risat Pathan, Jan Jonsson
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Scheduling Algorithms for Linear Workflow Optimization
Kunal Agrawal, Anne Benoit, Loic Magnan, and Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems
Han Zhao, Xinxin Liu, Xiaolin (Andy) Li
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures [Presentation]
A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Parallel I/O Performance: From Events to Ensembles [Presentation]
Andrew Uselton, Mark Howison, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup
D. Pan, K. Makki, and N. Pissinou
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Object-Oriented Stream Programming using Aspects [Author website]
Mingliang Wang, Manish Parashar
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Optimal Loop Unrolling for GPGPU Programs [Thesis]
Giridhar Sreenivasa Murthy, Mahesh Ravishankar, Muthu Manikandan Baskaran, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Speculative Execution on Multi-GPU Systems
Gregory Diamos, Sudhakar Yalamanchili
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Dynamic Load Balancing on Single- and Multi-GPU Systems [Author website]
Long Chen, Oreste Villa, Sriram Krishnamoorthy, Guang R. Gao
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Servet: A Benchmark Suite for Autotuning on Multicore Clusters
Jorge González-Domínguez, Guillermo L. Taboada, Basilio B. Fraguela, María J. Martín, Juan Touriño
International Parallel & Distributed Processing Symposium (IPDPS) 2010

KRASH: Reproducible CPU Load Generation on Many-Core Machines
Swann Perarnau, Guillaume Huard
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Power-aware MPI Task Aggregation Prediction for High-End Computing Systems
Dong Li, Dimitrios Nikolopoulos, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Varying Bandwidth Resource Allocation Problem with Bag Constraints
Venkatesan Chakaravarthy, Vinayaka Pandit, Yogish Sabharwal, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Decentralized Resource Management for Multi-core Desktop Grids
Jaehwan Lee, Pete Keleher, Alan Sussman
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Dynamic Fractional Resource Scheduling for HPC Workloads
Mark Lee Stillwell, Frédéric Vivien, Henri Casanova
International Parallel & Distributed Processing Symposium (IPDPS) 2010

ADEPT Scalability Predictor in Support of Adaptive Resource Allocation [Presentation]
Arash Deshmeh, Jacob Machina, and Angela Sodan
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution
Jiayuan Meng, Srimat Chakradhar, Anand Raghunathan, and Surendra Byna
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas Potok
International Parallel & Distributed Processing Symposium (IPDPS) 2010

eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in Windows Azure Platform
Jie Li, Deb Agarwal, Marty Humphrey, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems
Seetharami R. Seelam, Liana Fong, Asser Tantawi, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Oblivious Algorithms for Multicores and Network of Processors
Rezaul Chowdhury, Francesco Silvestri, Brandon Blakeley, Vijaya Ramachandran
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P
Wei Tang, Narayan Desai, Daniel Buettner, Zhiling Lan
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems [Presentation]
Aparna Chandramowlishwaran, Kathleen Knobe, Richard W. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Hybrid MPI/OpenMP Power-aware Computing
Dong Li, Bronis R. de Supinski, Martin Schulz, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Performance and Energy Optimization of Concurrent Pipelined Applications
Anne Benoit, Paul Renaud-Goud, Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Robust Control-theoretic Thermal Balancing for Server Clusters
Yong Fu, Chenyang Lu, Hongan Wang
International Parallel & Distributed Processing Symposium (IPDPS) 2010

A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation
Zhe Wang, Sanjay Ranka
International Parallel & Distributed Processing Symposium (IPDPS) 2010

GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems
Qingbo Yuan, Jianbo Zhao, Mingyu Chen, Ninghui Sun
International Parallel & Distributed Processing Symposium (IPDPS) 2010

MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management
Devesh Tiwari, Sanghoon Lee, James Tuck, Yan Solihin
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures [Author website]
Laura Grigori, Simplice Donfack, Alok Kumar Gupta
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Tile QR Factorization with Parallel Panel Processing for Multicore Architectures
Bilel Hadri, Hatem Ltaief, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies
Kamil Kedzierski, Miquel Moreto, Francisco J. Cazorla and Mateo Valero
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching
Dongyuan Zhan, Hong Jiang, Sharad Seth
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System
Seetharami Seelam, I-Hsin Chung, John Bauer, Hui-Fang Wen
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Intra-Application Cache Partitioning [Related Paper]
Sai Prashanth Muralidhara, Mahmut Taylan Kandemir, Padma Raghavan
International Parallel & Distributed Processing Symposium (IPDPS) 2010

SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler
Yi Guo, Jisheng Zhao, Vincent Cave, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Executing Task Graphs Using Work-Stealing
Kunal Agrawal, Charles Leiserson, Jim Sukha
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Structuring Execution of OpenMP Applications for Multicore Architectures
François Broquedis, Olivier Aumage, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Oversubscription on Multicore Processors
Costin Iancu, Steven Hofmeyr, Yili Zheng, Filip Blagojevic
International Parallel & Distributed Processing Symposium (IPDPS) 2010

An Auto-Tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
International Parallel & Distributed Processing Symposium (IPDPS) 2010

DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors
Albert Hartono, Muthu Manikandan Baskaran, J. Ram Ramanujam, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010

A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors [Author website]
Manoj Gupta, Fermín Sánchez, Josep Llosa
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Exploiting Inter-thread Temporal Locality for Chip Multithreading
Jiayuan Meng, Jeremy Sheaffer, Kevin Skadron
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis, Nikolas Ioannou, Salman Khan, Marcelo Cintra
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
Zheng Wei, Joseph Jaja
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Toward Understanding Heterogeneity in Computing
Arnold Rosenberg, Ron Chi-Lung Chiang
International Parallel & Distributed Processing Symposium (IPDPS) 2010

BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications
Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier
International Parallel & Distributed Processing Symposium (IPDPS) 2010

PreDatA - Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng, Hasan Abbasi, Ciprian Docan, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism
Jun Shirako, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Inter-Block GPU Communication via Fast Barrier Synchronization
Shucai Xiao, Wu-chun Feng
International Parallel & Distributed Processing Symposium (IPDPS) 2010

Performance Analysis of the FFT Algorithms for a Many-core Architecture
Long Chen, Guang R. Gao
High Performance Computing Symposium (HPC 2010)

HPCA 2010
Operating System Support for Overlapping-ISA Heterogeneous Multi-Core Architectures
Tong Li, Paul Brett, Rob Knauerhase, David Koufaty, et al.
High-Performance Computer Architecture, (HPCA-16) 2010

ATLAS: A Scalable and High Performance Scheduling Algorithm for Multiple Memory Controllers
Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter
High-Performance Computer Architecture, (HPCA-16) 2010

Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance
Fang Liu, Xiaowei Jiang, and Yan Solihin
High-Performance Computer Architecture, (HPCA-16) 2010

CHOP: Adaptive Filter-Based DRAM Caching for CMP Server Platforms
Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravishankar Iyer, Srihari Makineni, Donald Newell, Yan Solihin and Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010

LeadOut: Composing Low-Overhead Frequency-Enhancing Techniques for Single Thread Performance in Configurable Multicores
Brian Greskamp, R. Ulya Karpuzcu, Josep Torrellas
High-Performance Computer Architecture, (HPCA-16) 2010

LiteTM: Reducing Transactional State Overhead
Syed Ali Raza Jafri, Mithuna Thottethodi, T. N. Vijaykumar
High-Performance Computer Architecture, (HPCA-16) 2010

A Bandwidth-Aware Memory Subsystem Resource Management Using Non-Invasive Resource Profilers for Large CMP Systems
Dimitris Kaseridis, Jeffrey Stuecheli, Lizy K. John
High-Performance Computer Architecture, (HPCA-16) 2010

StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache
Hyunjin Lee, Sangyeun Cho, and Bruce R. Childers
High-Performance Computer Architecture, (HPCA-16) 2010

ESP-NUCA: A Low-Cost Adaptive Non-Uniform Cache Architecture
Javier Merino, Valentin Puente, Jose-Angel Gregorio
High-Performance Computer Architecture, (HPCA-16) 2010

Towards Scalable, Energy-Efficient Bus-Based On-Chip Networks
Aniruddha N. Udipi, Naveen Muralimanohar, Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010

DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance
Dan Tang, Yungang Bao, Weiwu Hu, Mingyu Chen
High-Performance Computer Architecture, (HPCA-16) 2010

Graphite: A Distributed Parallel Simulator for Large-Scale Multicores
Jason Miller, Harshad Kasture, George Kurian, et al.
High-Performance Computer Architecture, (HPCA-16) 2010

Application Performance Modeling in a Virtualized Environment
Sajib Kundu, Raju Rangaswami, Kaushik Dutta, Ming Zhao
High-Performance Computer Architecture, (HPCA-16) 2010

COMIC++: A Software SVM System for Heterogeneous Multicore Accelerator Clusters
Jaejin Lee, Jun Lee, Sangmin Seo, Jungwon Kim, Seungkyun Kim
High-Performance Computer Architecture, (HPCA-16) 2010

BOLT: An Energy-Efficient Latency-Tolerant Processor
Andrew Hilton, Amir Roth
High-Performance Computer Architecture, (HPCA-16) 2010

An Optimized 3D-Stacked Memory Architecture by Exploiting Excessive, High-Density TSV Bandwidth
Dong Hyuk Woo, Nak Hee Seong, Dean L. Lewis, and Hsien-Hsin S. Lee
High-Performance Computer Architecture, (HPCA-16) 2010

High order Finite Volume methods on Wavelet-adapted Grids with Local Time-Stepping on Multicore Architectures for the Simulation of Shock-Bubble Interactions
B. Hejazialhosseini, D. Rossinelli, M. Bergdorf, P. Koumoutsakos
Journal of Computational Physics, Volume 229, Issue 22, Pages 8364-8383, 2010

PPoPP 2010
Structure-driven Optimizations for Amorphous Data-parallel Programs
Mario Mendez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher and Keshav Pingali
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Debugging Programs that use Atomic Blocks and Transactional Memory
Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adrián Cristal, Mateo Valero
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Gambit: Effective Unit Testing of Concurrency Libraries
Katherine Coons, Sebastian Burckhardt and Madanlal Musuvathi
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Featherweight X10: a Core Calculus for Async-Finish Parallelism
Jonathan Lee and Jens Palsberg
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Compiler Aided Selective Lock Assignment for Improving the Performance of Software Transactional Memory
Sandya Mannarswamy, Dhruva Chakrabarti, Kaushik Rajan and Sujoy Saraswati
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Is Transactional Programming Really Easier?
Christopher Rossbach, Owen Hofmann and Emmett Witchel
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Scheduling Support for Transactional Memory Contention Management
Walther Maldonado, Patrick Marlier, Pascal Felber, Julia Lawall, Gilles Muller, Adi Suissa, Danny Hendler and Alexandra Fedorova
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

An Adaptive Performance Modeling Tool for GPU Architectures
Sara Baghsorkhi, Matthieu Delahaye, Sanjay Patel, William Gropp and Wen-mei Hwu
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs
Jee Choi, Amik Singh and Richard Vuduc
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Fast Tridiagonal Solvers on GPU
Yao Zhang, Jonathan Cohen and John Owens
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

CUDAlign: Using GPU to Accelerate the Comparison of Megabase Genomic Sequences
Edans Flávius de O. Sandes and Alba Cristina M. A. Melo
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Load Balancing on Speed
Steven Hofmeyr, Costin Iancu and Filip Blagojevic
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Scalable Communication Protocols for Dynamic Sparse Data Exchange
Torsten Hoefler, Christian Siebert and Andrew Lumsdaine
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

The LOFAR Correlator: Implementation and Performance Analysis
John W. Romein, P. Chris Broekema, Jan David Mol and Rob V. van Nieuwpoort
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Lazy Binary-Splitting: A Run-Time Adaptive Work-Stealing Scheduler
Alexandros Tzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Thread to Strand Binding of Parallel Network Applications in Massive Multi-Threaded Systems [Author website]
Petar Radojkovic, Vladimir Cakarevic, Javier Verdu, Alex Pajuelo, Francisco J. Cazorla, et al.
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Does Cache Sharing on Modern CMP Matter to the Performance of Contemporary Multithreaded Programs?
Eddy Z. Zhang, Yunlian Jiang and Xipeng Shen
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Improving Parallelism and Locality with Asynchronous [Presentation Slides]
Lixia Liu and Zhiyuan Li
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Scaling LAPACK Panel Operations Using Parallel Cache Assignment
Anthony M. Castaldo and R. Clint Whaley
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Modeling Advanced Collective Communication Algorithms on Cell-based Systems
Qasim Ali, Samuel Midkiff and Vijay Pai
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node
Jidong Zhai, Wenguang Chen and Weimin Zheng
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

Input-Driven Dynamic Execution Behavior Prediction of Streaming Applications
Farhana Aleen, Monirul Sharif and Santosh Pande
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)

hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, et al.
18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)

Efficient Parallel Programming in Poly/ML and Isabelle/ML
David C. J. Matthews and Makarius Wenzel
ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP 2010)

COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
Dong Hyuk Woo and Hsien-Hsin S. Lee
Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2010
