2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 Prior to 1995 Whitepapers
Papers listed here are either freely available on the web or obtained legally. Please respect the various copyright stipulations placed on these documents. If any author would like us to add or remove their paper from here, please contact us at info@multicoreinfo.com.
Multicore Papers 2010
Supercomputing 2010
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems
Fengguang Song, Hatem Ltaief, Bilel Hadri, Jack Dongarra
On-Chip Network Evaluation Framework [Author web site]
Hanjoon Kim, Seulki Heo, Junghoon Lee, Jaehyuk Huh, John Kim
Parallel Fast Gauss Transform
Rahul S. Sampath, Hari Sundar, Shravan K. Veerapaneni
Circuit-Switched Memory Access in Photonic Interconnection Networks for High-Performance Embedded Computing [Author website]
G. Hendry, E. Robinson, V. Gleyzer, J. Chan, L. P. Carloni, N. Bliss, and K. Bergman
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Asit K. Mishra, Shekhar Srikantaiah, Mahmut Kandemir, Chita R. Das
A Multi-Scale Heart Simulation on Massively Parallel Computers
Akira Hosoi, Takumi Washio, Jun-ichi Okada, Yoshimasa Kadooka, Kengo Nakajima, Toshiaki Hisada
Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing
Adrian M. Caulfield, Joel Coburn, Todor I. Mollov, Arup De, Ameen Akel, Jiahua He, Arun Jagatheesan, Rajesh K. Gupta, Allan Snavely, Steven Swanson
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs [Author web site]
Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, Pradeep Dubey
DASH: a Recipe for a Flash-based Data Intensive Supercomputer
Jiahua He, Arun Jagatheesan, Sandeep Gupta, Jeffrey Bennett, Allan Snavely
An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code [Author web site]
Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo, Akira Nukada, Naoya Maruyama, Satoshi Matsuoka
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support [Author web site]
Xiangyu Dong, Yuan Xie, Naveen Muralimanohar, Norman P. Jouppi
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Adam Moody, Greg Bronevetsky, Kathryn Mohror, Bronis R. de Supinski
vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload
Ardalan Kangarlou, Sahan Gamage, Ramana Rao Kompella, Dongyan Xu
A Flexible Reservation Algorithm for Advance Network Provisioning
Mehmet Balman, Evangelos Chaniotakis, Arie Shoshani, Alex Sim
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
Torsten Hoefler, Timo Schneider, Andrew Lumsdaine
Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories
Jae-Seung Yeom, Dimitrios S. Nikolopoulos
Fast PGAS Implementation of Distributed Graph Algorithms
Guojing Cong, George Almasi, Vijay Saraswat
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory
Roger Pearce, Maya Gokhale, Nancy M. Amato
An Adaptive Framework for Simulation and Online Remote Visualization of Critical Climate Applications in Resource-Constrained Environments
Preeti Malakar, Vijay Natarajan, Sathish S. Vadhiyar
Scalable Graph Exploration on Multicore Processors
Fabrizio Petrini, Virat Agarwal, Davide Pasetto, David Bader
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Seyong Lee, Rudolf Eigenmann
Accelerating I/O Forwarding in IBM Blue Gene/P Systems
Venkatram Vishwanath, Mark Hereld, Kamil Iskra, Dries Kimpe, Vitali Morozov, Michael E. Papka, Robert Ross, Kazutomo Yoshii
The 48-Core SCC Processor: The Programmer’s View [Author web site]
Tim Mattson, Rob Van der Wijngaart, Michael Riepen, Thomas Lehnig, Paul Brett, et al.
Managing Variability in the I/O Performance of Petascale Storage Systems
Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf
A Block-Oriented Language and Runtime System for Tensor Algebra with Very Large Arrays
Beverly A. Sanders, Rod Bartlett, Erik Deumens, Victor Lotrich, Mark Ponton
IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination
Xuechen Zhang, Kei Davis, Song Jiang
A Scalable and Distributed Dynamic Formal Verifier for MPI Programs
Anh Vo, Sriram Aananthakrishnan, Ganesh Gopalakrishnan, Greg Bronevetsky, Bronis R. de Supinski, Martin Schulz
JAWS: Job-Aware Workload Scheduling for the Exploration of Turbulence Simulations [Author web site]
Xiaodan Wang, Eric Perlman, Randal Burns, Tanu Malik, Tamas Budavari, Charles Meneveau, Alexander Szalay
FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking
Zhezhe Chen, Qi Gao, Wenbin Zhang, Feng Qin
Automatic Run-time Parallelization and Transformation of I/O
Thorvald Natvig, Anne C. Elster, Jan Christian Meyer
Scalable Identification of Load Imbalance in Parallel Executions using Call Path Profiles [Author web site]
Nathan R. Tallent, Laksono Adhianto, John M. Mellor-Crummey
Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures
Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, Galen Shipman
Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method
Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc
Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
Guy Blelloch, Ioannis Koutis, Gary L. Miller, Kanat Tangwongsan
Scaling Hierarchical N-Body Simulations on GPU Clusters
Pritish Jetley, Lukasz Wesolowski, Filippo Gioachin, Laxmikant V. Kale, Thomas R. Quinn
Elastic Cloud Caches for Accelerating Service-Oriented Computations
David Chiu, Gagan Agrawal, Apeksha Shetty
Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance
Abdullah Gharaibeh, Matei Ripeanu
Data Sharing Options for Scientific Workflows on Amazon EC2
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Benjamin P. Berman, Bruce Berriman, Phil Maechling
Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations [Author web site]
Scott Hampton, Sadaf Alam, Paul Crozier, Pratul Agarwal
Power-Aware Consolidation of Scientific Workflows in Virtualized Environments
Qian Zhu, Jiedan Zhu, Gagan Agrawal
Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Ronald Babich, Michael A. Clark, et al.
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
David Tarjan, Kevin Skadron
A Parallel Implementation of Electron-Phonon Scattering in Nanoelectronic Devices up to 95K Cores
Mathieu Luisier
Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses [Author web site]
Andreas Sandberg, David Eklöv, Erik Hagersten
Direct Numerical Simulation of Particulate Flows on 294912 Processor Cores
Jan Götz, Klaus Iglberger, Markus Stürmer, Ulrich Rüde
PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications
Martin Burtscher, Byoung-Do Kim, Jeff Diamond, John McCalpin, Lars Koesterke, James Browne
End Supercomputing 2010 Papers
Parallel Sparse Polynomial Division Using Heaps
Michael Monagan, Roman Pearce
PASCO 2010 (Parallel Symbolic Computation)
Above the Clouds: A View of Cloud Computing
M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia
Communications of the ACM, Vol. 53, No. 4 (April 2010), pp. 50-58
Statistics Driven Workload Modeling for the Cloud
Archana Ganapathi, Yanpei Chen, Armando Fox, Randy Katz, David Patterson
Workshop on Self-Managing Database Systems (SMDB), March 2010.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
J. Stone, D. Gohara, G. Shi
Computing in Science and Engineering (CiSE), 2010
IPDPS 2010
A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring [Presentation Slides (ppt)]
Patrick P. C. Lee, Tian Bu, and Girish Chandranmenon
International Parallel & Distributed Processing Symposium (IPDPS) 2010
On the Importance of Bandwidth Control Mechanisms for Scheduling on Large Scale Heterogeneous Platforms
Olivier Beaumont, Hejer Rejeb
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs
Michela Taufer, Philip Saponaro, Omar Padron, Sandeep Patel
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Implementing the Himeno Benchmark with CUDA on GPU Clusters
Everett Phillips, Massimiliano Fatica
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Direct Self-Consistent Field Computations on GPU Clusters [Presentation Slides]
Guochun Shi, Volodymyr Kindratenko, Ivan Ufimtsev, Todd Martinez
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs
Lifan Xu, Michela Taufer, Stuart Collins, Dionisios Vlachos
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Fine-Grained QoS Scheduling for PCM-based Main Memory Systems [Author website]
Ping Zhou, Yu Du, Youtao Zhang, and Jun Yang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Impact of Resource Contention in Multicore Systems [Presentation]
Robert Hood, Haoqiang Jin, Piyush Mehrotra, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Improving the Performance of Hypervisor-Based Fault Tolerance
Jun Zhu, Wei Dong, ZheFu Jiang, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Supporting Fault Tolerance in a Data-Intensive Computing Middleware [Presentation]
Tekin Bicer, Wei Jiang, Gagan Agrawal
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs
Naoya Maruyama, Akira Nukada, Satoshi Matsuoka
International Parallel & Distributed Processing Symposium (IPDPS) 2010
High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs
Xiaochun Ye, Dongrui Fan, Wei Lin, Nan Yuan, and Paolo Ienne
International Parallel & Distributed Processing Symposium (IPDPS) 2010
GPU Sample Sort [Presentation]
Vitaly Osipov, Peter Sanders, Nikolaj Leischner
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Highly Scalable Parallel Sorting
Edgar Solomonik, Laxmikant Kale
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications [Author website]
Pavel Bar, David Carmeli, Valentin Kravtsov, Martin Swain, Assaf Schuster
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors [Presentation]
Risat Pathan, Jan Jonsson
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Scheduling Algorithms for Linear Workflow Optimization
Kunal Agrawal, Anne Benoit, Loic Magnan, and Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems
Han Zhao, Xinxin Liu, Xiaolin (Andy) Li
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures [Presentation]
A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Parallel I/O Performance: From Events to Ensembles [Presentation]
Andrew Uselton, Mark Howison, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup
D. Pan, K. Makki, and N. Pissinou
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Object-Oriented Stream Programming using Aspects [Author website]
Mingliang Wang, Manish Parashar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimal Loop Unrolling for GPGPU Programs [Thesis]
Giridhar Sreenivasa Murthy, Mahesh Ravishankar, Muthu Manikandan Baskaran, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Speculative Execution on Multi-GPU Systems
Gregory Diamos, Sudhakar Yalamanchili
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Dynamic Load Balancing on Single- and Multi-GPU Systems [Author website]
Long Chen, Oreste Villa, Sriram Krishnamoorthy, Guang R. Gao
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Servet: A Benchmark Suite for Autotuning on Multicore Clusters
Jorge González-Domínguez, Guillermo L. Taboada, Basilio B. Fraguela, María J. Martín, Juan Touriño
International Parallel & Distributed Processing Symposium (IPDPS) 2010
KRASH: Reproducible CPU Load Generation on Many-Core Machines
Swann Perarnau, Guillaume Huard
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Power-aware MPI Task Aggregation Prediction for High-End Computing Systems
Dong Li, Dimitrios Nikolopoulos, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Varying Bandwidth Resource Allocation Problem with Bag Constraints
Venkatesan Chakaravarthy, Vinayaka Pandit, Yogish Sabharwal, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Decentralized Resource Management for Multi-core Desktop Grids
Jaehwan Lee, Pete Keleher, Alan Sussman
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Dynamic Fractional Resource Scheduling for HPC Workloads
Mark Lee Stillwell, Frédéric Vivien, Henri Casanova
International Parallel & Distributed Processing Symposium (IPDPS) 2010
ADEPT Scalability Predictor in Support of Adaptive Resource Allocation [Presentation]
Arash Deshmeh, Jacob Machina, and Angela Sodan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution
Jiayuan Meng, Srimat Chakradhar, Anand Raghunathan, and Surendra Byna
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas Potok
International Parallel & Distributed Processing Symposium (IPDPS) 2010
eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Azure Platform
Jie Li, Deb Agarwal, Marty Humphrey, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems
Seetharami R Seelam, Liana Fong, Asser Tantawi, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Oblivious Algorithms for Multicores and Networks of Processors
Rezaul Chowdhury, Francesco Silvestri, Brandon Blakeley, Vijaya Ramachandran
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P
Wei Tang, Narayan Desai, Daniel Buettner, Zhiling Lan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems [Presentation]
Aparna Chandramowlishwaran, Kathleen Knobe, Richard W. Vuduc
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hybrid MPI/OpenMP Power-aware Computing
Dong Li, Bronis R. de Supinski, Martin Schulz, Kirk Cameron, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance and Energy Optimization of Concurrent Pipelined Applications
Anne Benoit, Paul Renaud-Goud, Yves Robert
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Robust Control-theoretic Thermal Balancing for Server Clusters
Yong Fu, Chenyang Lu, Hongan Wang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation
Zhe Wang, Sanjay Ranka
International Parallel & Distributed Processing Symposium (IPDPS) 2010
GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems
Qingbo Yuan, Jianbo Zhao, Mingyu Chen, Ninghui Sun
International Parallel & Distributed Processing Symposium (IPDPS) 2010
MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management
Devesh Tiwari, Sanghoon Lee, James Tuck, Yan Solihin
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures [Author website]
Laura Grigori, Simplice Donfack, Alok Kumar Gupta
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Tile QR Factorization with Parallel Panel Processing for Multicore Architectures
Bilel Hadri, Hatem Ltaief, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
Toshio Endo, Akira Nukada, Satoshi Matsuoka and Naoya Maruyama
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies
Kamil Kedzierski, Miquel Moreto, Francisco J. Cazorla and Mateo Valero
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching
Dongyuan Zhan, Hong Jiang, Sharad Seth
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System
Seetharami Seelam, I-Hsin Chung, John Bauer, Hui-Fang Wen
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Intra-Application Cache Partitioning [Related Paper]
Sai Prashanth Muralidhara, Mahmut Taylan Kandemir, Padma Raghavan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler
Yi Guo, Jisheng Zhao, Vincent Cave, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Executing Task Graphs Using Work-Stealing
Kunal Agrawal, Charles Leiserson, Jim Sukha
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Structuring Execution of OpenMP Applications for Multicore Architectures
François Broquedis, Olivier Aumage, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Oversubscription on Multicore Processors
Costin Iancu, Steven Hofmeyr, Yili Zheng, Filip Blagojevic
International Parallel & Distributed Processing Symposium (IPDPS) 2010
An Auto-Tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
International Parallel & Distributed Processing Symposium (IPDPS) 2010
DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors
Albert Hartono, Muthu Manikandan Baskaran, J. Ramanujam, Ponnuswamy Sadayappan
International Parallel & Distributed Processing Symposium (IPDPS) 2010
A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors [Author website]
Manoj Gupta, Fermín Sánchez, Josep Llosa
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Exploiting Inter-thread Temporal Locality for Chip Multithreading
Jiayuan Meng, Jeremy Sheaffer, Kevin Skadron
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis, Nikolas Ioannou, Salman Khan, Marcelo Cintra
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
Zheng Wei, Joseph Jaja
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Toward Understanding Heterogeneity in Computing
Arnold Rosenberg, Ron Chi-Lung Chiang
International Parallel & Distributed Processing Symposium (IPDPS) 2010
BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications
Bogdan Nicolae, Diana Moise, Gabriel Antoniu, Luc Bougé, Matthieu Dorier
International Parallel & Distributed Processing Symposium (IPDPS) 2010
PreDatA - Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng, Hasan Abbasi, Ciprian Docan, et al.
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism
Jun Shirako, Vivek Sarkar
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Inter-Block GPU Communication via Fast Barrier Synchronization
Shucai Xiao, Wu-chun Feng
International Parallel & Distributed Processing Symposium (IPDPS) 2010
Performance Analysis of the FFT Algorithms for a Many-core Architecture
Long Chen, Guang R. Gao
High Performance Computing Symposium (HPC 2010)
HPCA 2010
Operating System Support for Overlapping-ISA Heterogeneous Multi-Core Architectures
Tong Li, Paul Brett, Rob Knauerhase, David Koufaty, et al.
High-Performance Computer Architecture, (HPCA-16) 2010
ATLAS: A Scalable and High Performance Scheduling Algorithm for Multiple Memory Controllers
Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter
High-Performance Computer Architecture, (HPCA-16) 2010
Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance
Fang Liu, Xiaowei Jiang, and Yan Solihin
High-Performance Computer Architecture, (HPCA-16) 2010
CHOP: Adaptive Filter-Based DRAM Caching for CMP Server Platforms
Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravishankar Iyer, Srihari Makineni, Donald Newell, Yan Solihin and Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010
LeadOut: Composing Low-Overhead Frequency-Enhancing Techniques for Single Thread Performance in Configurable Multicores
Brian Greskamp, R. Ulya Karpuzcu, Josep Torrellas
High-Performance Computer Architecture, (HPCA-16) 2010
LiteTM: Reducing Transactional State Overhead
Syed Ali Raza Jafri, Mithuna Thottethodi, T. N. Vijaykumar
High-Performance Computer Architecture, (HPCA-16) 2010
A Bandwidth-Aware Memory Subsystem Resource Management Using Non-Invasive Resource Profilers for Large CMP Systems
Dimitris Kaseridis, Jeffrey Stuecheli, Lizy K. John
High-Performance Computer Architecture, (HPCA-16) 2010
StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache
Hyunjin Lee, Sangyeun Cho, and Bruce R. Childers
High-Performance Computer Architecture, (HPCA-16) 2010
ESP-NUCA: A Low-Cost Adaptive Non-Uniform Cache Architecture
Javier Merino, Valentin Puente, Jose-Angel Gregorio
High-Performance Computer Architecture, (HPCA-16) 2010
Towards Scalable, Energy-Efficient Bus-Based On-Chip Networks
Aniruddha N. Udipi, Naveen Muralimanohar, Rajeev Balasubramonian
High-Performance Computer Architecture, (HPCA-16) 2010
DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance
Dan Tang, Yungang Bao, Weiwu Hu, Mingyu Chen
High-Performance Computer Architecture, (HPCA-16) 2010
Graphite: A Distributed Parallel Simulator for Large-Scale Multicores
Jason Miller, Harshad Kasture, George Kurian, et al.
High-Performance Computer Architecture, (HPCA-16) 2010
Application Performance Modeling in a Virtualized Environment
Sajib Kundu, Raju Rangaswami, Kaushik Dutta, Ming Zhao
High-Performance Computer Architecture, (HPCA-16) 2010
COMIC++: A Software SVM System for Heterogeneous Multicore Accelerator Clusters
Jaejin Lee, Jun Lee, Sangmin Seo, Jungwon Kim, Seungkyun Kim
High-Performance Computer Architecture, (HPCA-16) 2010
BOLT: An Energy-Efficient Latency-Tolerant Processor
Andrew Hilton, Amir Roth
High-Performance Computer Architecture, (HPCA-16) 2010
An Optimized 3D-Stacked Memory Architecture by Exploiting Excessive, High-Density TSV Bandwidth
Dong Hyuk Woo, Nak Hee Seong, Dean L. Lewis, and Hsien-Hsin S. Lee
High-Performance Computer Architecture, (HPCA-16) 2010
High order Finite Volume methods on Wavelet-adapted Grids with Local Time-Stepping on Multicore Architectures for the Simulation of Shock-Bubble Interactions
B. Hejazialhosseini, D. Rossinelli, M. Bergdorf, P. Koumoutsakos
Journal of Computational Physics, Volume 229, Issue 22, Pages 8364-8383, 2010
PPoPP 2010
Structure-driven Optimizations for Amorphous Data-parallel Programs
Mario Mendez-Lojo, Donald Nguyen, Dimitrios Prountzos, Xin Sui, M. Amber Hassaan, Milind Kulkarni, Martin Burtscher and Keshav Pingali
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Debugging Programs that use Atomic Blocks and Transactional Memory
Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adrián Cristal, Mateo Valero
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Gambit: Effective Unit Testing of Concurrency Libraries
Katherine Coons, Sebastian Burckhardt and Madanlal Musuvathi
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Featherweight X10: a Core Calculus for Async-Finish Parallelism
Jonathan Lee and Jens Palsberg
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Compiler Aided Selective Lock Assignment for Improving the Performance of Software Transactional Memory
Sandya Mannarswamy, Dhruva Chakrabarti, Kaushik Rajan and Sujoy Saraswati
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Is Transactional Programming Really Easier?
Christopher Rossbach, Owen Hofmann and Emmett Witchel
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scheduling Support for Transactional Memory Contention Management
Walther Maldonado, Patrick Marlier, Pascal Felber, Julia Lawall, Gilles Muller, Adi Suissa, Danny Hendler and Alexandra Fedorova
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
An Adaptive Performance Modeling Tool for GPU Architectures
Sara Baghsorkhi, Matthieu Delahaye, Sanjay Patel, William Gropp and Wen-mei Hwu
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs
Jee Choi, Amik Singh and Richard Vuduc
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Fast Tridiagonal Solvers on GPU
Yao Zhang, Jonathan Cohen and John Owens
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
CUDAlign: Using GPU to Accelerate the Comparison of Megabase Genomic Sequences
Edans Flávius de O. Sandes and Alba Cristina M. A. Melo
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Load Balancing on Speed
Steven Hofmeyr, Costin Iancu and Filip Blagojevic
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scalable Communication Protocols for Dynamic Sparse Data Exchange
Torsten Hoefler, Christian Siebert and Andrew Lumsdaine
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
The LOFAR Correlator: Implementation and Performance Analysis
John W. Romein, P. Chris Broekema, Jan David Mol and Rob V. van Nieuwpoort
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Lazy Binary-Splitting: A Run-Time Adaptive Work-Stealing Scheduler
Alexandros Tzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Thread to Strand Binding of Parallel Network Applications in Massive Multi-Threaded Systems [Author website]
Petar Radojkovic, Vladimir Cakarevic, Javier Verdu, Alex Pajuelo, Francisco J. Cazorla, et al.
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Does Cache Sharing on Modern CMP Matter to the Performance of Contemporary Multithreaded Programs?
Eddy Z. Zhang, Yunlian Jiang and Xipeng Shen
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Improving Parallelism and Locality with Asynchronous Algorithms [Presentation Slides]
Lixia Liu and Zhiyuan Li
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Scaling LAPACK Panel Operations Using Parallel Cache Assignment
Anthony M. Castaldo and R. Clint Whaley
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Modeling Advanced Collective Communication Algorithms on Cell-based Systems
Qasim Ali, Samuel Midkiff and Vijay Pai
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node
Jidong Zhai, Wenguang Chen and Weimin Zheng
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
Input-Driven Dynamic Execution Behavior Prediction of Streaming Applications
Farhana Aleen, Monirul Sharif and Santosh Pande
15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications
François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, et al.
18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)
Efficient Parallel Programming in Poly/ML and Isabelle/ML
David C. J. Matthews and Makarius Wenzel
ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP 2010)
COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
Dong Hyuk Woo and Hsien-Hsin S. Lee
Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2010

