Papers listed here are either freely available on the web or obtained legally. Please respect the various copyright stipulations placed on these documents. If any author would like us to add or to remove their paper from here, please contact us at info@multicoreinfo.com.
Multicore Papers 2009
Above the Clouds: A Berkeley View of Cloud Computing
Armbrust, Michael, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, et al.
Technical report UCB/EECS-2009-28, Electrical Engineering and Computer Sciences, University of California at Berkeley
42nd International Symposium on Microarchitecture (MICRO)
Characterizing and Mitigating the Impact of Process Variations on Phase Change based Memory Systems
Wangyuan Zhang and Tao Li
International Symposium on Microarchitecture (MICRO), December 2009
Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling
Moinuddin K. Qureshi, John Karidis, Vijayalakshmi Srinivasan, et al.
International Symposium on Microarchitecture (MICRO), December 2009
A Tagless Coherence Directory
Jason Zebchuk, Viji Srinivasan, Moinuddin K. Qureshi and Andreas Moshovos
International Symposium on Microarchitecture (MICRO), December 2009
Characterizing Flash Memory: Anomalies, Observations, and Applications
Laura M. Grupp, Adrian M. Caulfield, Joel Coburn, et al.
International Symposium on Microarchitecture (MICRO), December 2009
Complexity Effective Memory Access Scheduling for Many-Core Accelerator Architectures
George L. Yuan, Ali Bakhoda, Tor M. Aamodt
International Symposium on Microarchitecture (MICRO), December 2009
Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping
Chi-Keung Luk, Sunpyo Hong, Hyesoon Kim
International Symposium on Microarchitecture (MICRO), December 2009
DDT: Design and Evaluation of a Dynamic Program Analysis for Optimizing Data Structure Usage
Changhee Jung, Nathan Clark
International Symposium on Microarchitecture (MICRO), December 2009
Portable Compiler Optimization Across Embedded Programs and Microarchitectures using Machine Learning
Dubach, C.; Jones, T. M.; Bonilla, E. V.; Fursin, G.; and O’Boyle, M. F.
International Symposium on Microarchitecture (MICRO), December 2009
Improving Cache Lifetime Reliability at Ultra-low Voltages
Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Wei Wu, Shih-Lien Lu
International Symposium on Microarchitecture (MICRO), December 2009
ZerehCache: Armoring Cache Architectures in High Defect Density Technologies
A. Ansari, S. Gupta, S. Feng and S. Mahlke
International Symposium on Microarchitecture (MICRO), December 2009
mSWAT: Low-Cost Hardware Fault Detection and Diagnosis for Multicore Systems
Siva Kumar Sastry Hari, Man-Lap Li, Pradeep Ramachandran, Byn Choi, Sarita Adve
International Symposium on Microarchitecture (MICRO), December 2009
BulkCompiler: High-Performance Sequential Consistency through Cooperative Compiler and Hardware Support
Wonsun Ahn, Shanxiang Qi, Jae-Woo Lee, Marios Nicolaides, Xing Fang, Josep Torrellas, David Wong, and Samuel Midkiff
International Symposium on Microarchitecture (MICRO), December 2009
EazyHTM: Eager-Lazy Hardware Transactional Memory
Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adria Armejach, et al.
International Symposium on Microarchitecture (MICRO), December 2009
Reducing Peak Power with a Table-Driven Adaptive Processor Core
V. Kontorinis, A. Shayan, R. Kumar, D. Tullsen
International Symposium on Microarchitecture (MICRO), December 2009
Extending the Effectiveness of 3D-Stacked DRAM Caches with an Adaptive Multi-Queue Policy
Gabriel H. Loh
International Symposium on Microarchitecture (MICRO), December 2009
An Hybrid eDRAM/SRAM Macrocell to Implement First-level Data Caches
Alejandro Valero, Julio Sahuquillo, Salvador Petit, Vicente Lorente, Ramon Canal, Pedro Lopez, Jose Duato
International Symposium on Microarchitecture (MICRO), December 2009
Coordinated Control of Multiple Prefetchers in Multi-Core Systems
Eiman Ebrahimi, Onur Mutlu, Chang Joo Lee, and Yale N. Patt
International Symposium on Microarchitecture (MICRO), December 2009
Improving Memory Bank-Level Parallelism in the Presence of Prefetching
Chang Joo Lee, Veynu Narasiman, Onur Mutlu, and Yale N. Patt
International Symposium on Microarchitecture (MICRO), December 2009
Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Performance, Energy and Endurance
Sangyeun Cho, Hyunjin Lee
International Symposium on Microarchitecture (MICRO), December 2009
Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution for Mobile Multimedia Applications
Hyunchul Park, Yongjun Park, Scott Mahlke
International Symposium on Microarchitecture (MICRO), December 2009
Ordering Decoupled Metadata Accesses in Multiprocessors
Hari Kannan
International Symposium on Microarchitecture (MICRO), December 2009
Pseudo-LIFO: The Foundation of a New Family of Replacement Policies for Last-level Caches
Mainak Chaudhuri
International Symposium on Microarchitecture (MICRO), December 2009
The BubbleWrap Many-core: Popping Cores for Sequential Acceleration
Ulya R. Karpuzcu, Brian Greskamp, Josep Torrellas
International Symposium on Microarchitecture (MICRO), December 2009
Characterizing the Resource-Sharing Levels in the UltraSPARC T2 Processor
Vladimir Cakarevic, Petar Radojkovic, Javier Verdu, et al.
International Symposium on Microarchitecture (MICRO), December 2009
Offline Symbolic Analysis for Multi-Processor Execution Replay
Dongyoon Lee, Mahmoud Said, Satish Narayanasamy, et al.
International Symposium on Microarchitecture (MICRO), December 2009
Finding Concurrency Bugs with Context-Aware Communication Graphs
Brandon Lucia, Luis Ceze
International Symposium on Microarchitecture (MICRO), December 2009
Light64: Lightweight Hardware Support for Race Detection during Systematic Testing of Parallel Programs
Adrian Nistor, Darko Marinov, Josep Torrellas
International Symposium on Microarchitecture (MICRO), December 2009
Parallel sparse polynomial multiplication using heaps
Michael B. Monagan, Roman Pearce
ISSAC 2009, 263-270
Symposium on Operating Systems Principles (SOSP) 2009
FAWN: A Fast Array of Wimpy Nodes
David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, et al.
Symposium on Operating Systems Principles (SOSP) 2009
RouteBricks: Exploiting Parallelism to Scale Software Routers
Mihai Dobrescu, Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, et al.
Symposium on Operating Systems Principles (SOSP) 2009
The Multikernel: A New OS Architecture for Scalable Multicore Systems
Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, et al.
Symposium on Operating Systems Principles (SOSP) 2009
Fast Byte-granularity Software Fault Isolation
Miguel Castro, Manuel Costa, J.P. Martin, et al.
Symposium on Operating Systems Principles (SOSP) 2009
Tolerating Hardware Device Failures in Software
Asim Kadav, Matthew J. Renzelmann, Michael M. Swift
Symposium on Operating Systems Principles (SOSP) 2009
Automatic Device Driver Synthesis with Termite
Leonid Ryzhyk, Peter Chubb, Ihor Kuz, Etienne Le Sueur, Gernot Heiser
Symposium on Operating Systems Principles (SOSP) 2009
Debugging in the (Very) Large: Ten Years of Implementation and Experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, et al.
Symposium on Operating Systems Principles (SOSP) 2009
Do You Have to Reproduce the Bug at the First Replay Attempt? — PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
Soyeon Park, Weiwei Xiong, Zuoning Yin, et al.
Symposium on Operating Systems Principles (SOSP) 2009
ODR: Output-Deterministic Replay for Multicore Debugging
Gautam Altekar, Ion Stoica
Symposium on Operating Systems Principles (SOSP) 2009
Helios: Heterogeneous Multiprocessing with Satellite Kernels
Edmund B. Nightingale, Orion Hodson, Ross McIlroy, et al.
Symposium on Operating Systems Principles (SOSP) 2009
Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations
Yuan Yu, Pradeep Kumar Gunda, Michael Isard
Symposium on Operating Systems Principles (SOSP) 2009
Quincy: Fair Scheduling for Distributed Computing Clusters
Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, Andrew Goldberg
Symposium on Operating Systems Principles (SOSP) 2009
Abstractions for Scalable Operating Systems on Manycore Architectures [Slides]
Kevin Klues et al.
WIP, Symposium on Operating Systems Principles (SOSP) 2009
Thread to Core Assignment in SMT On-Chip Multiprocessors
Carmelo Acosta, Francisco J. Cazorla, Alex Ramirez, and Mateo Valero
SBAC-PAD, 2009
Bank-aware Dynamic Cache Partitioning for Multicore Architectures
Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy K. John
The 38th International Conference on Parallel Processing, 2009
TSS: Applying Two Stage Sampling in Micro-architecture Simulations
Zhibin Yu, Hai Jin, Jian Chen, and Lizy K. John
International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems
Efficient Program Scheduling for Heterogeneous Multi-core Processors
Jian Chen and Lizy K. John
46th Design Automation Conference (DAC) July 2009
The Bulk Multicore for Improved Programmability [Presentation Slides]
Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval, Pablo Montesinos, Wonsun Ahn, and Milos Prvulovic
Communications of the ACM (CACM), December 2009
Lessons Learned During the Development of the CapoOne Deterministic Multiprocessor Replay System
Pablo Montesinos, Matthew Hicks, Wonsun Ahn, Samuel T. King, and Josep Torrellas
Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA)
Collective optimization
Grigori Fursin and Olivier Temam
HiPEAC 2009
Predictive runtime code scheduling for heterogeneous architectures [Presentation Slides]
Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro
HiPEAC 2009
Systematic search within an optimisation space based on Unified Transformation Framework
Shun Long and Grigori Fursin
International Journal of Computational Science and Engineering (IJCSE)
On the (Dis)similarity of Transactional Memory Workloads
Clay Hughes, James Poe, Amer Qouneh, and Tao Li
International Symposium on Workload Characterization (IISWC), October 2009
TransPlant: A Parameterized Methodology For Generating Transactional Memory Workloads
James Poe, Clay Hughes, and Tao Li
International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
Supercomputing 2009 (SC09)
Increasing Memory Miss Tolerance for SIMD Cores
David Tarjan, Jiayuan Meng, Kevin Skadron
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Terascale Data Organization for Discovering Multivariate Climatic Trends
Wesley Kendall, Markus Glatter, Jian Huang, Tom Peterka, Robert Latham, Robert Ross
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Triangular Matrix Inversion on Graphics Processing Units
Florian Ries, Tommaso De Marco, Matteo Zivieri, Roberto Guerrieri
The 2009 ACM/IEEE conference on Supercomputing (SC09)
A Configurable Algorithm for Parallel Image-Compositing Applications
Tom Peterka, David Goodell, Robert Ross, Han-Wei Shen, Rajeev Thakur
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Auto-Tuning 3-D FFT Library for CUDA GPUs
Akira Nukada, Satoshi Matsuoka
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Autotuning Multigrid with PetaBricks
Cy Chan, Jason Ansel, Yee Lok Wong, Saman Amarasinghe, Alan Edelman
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Minimizing Communication in Sparse Matrix Solvers
Mohiyuddin, M., M. Hoemmen, J. Demmel, and K. Yelick
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors
Nathan Bell and Michael Garland
The 2009 ACM/IEEE conference on Supercomputing (SC09)
A Case for Integrated Processor-Cache Partitioning in Chip Multiprocessors
Shekhar Srikantaiah, Reetuparna Das, Asit K. Mishra, Chita R. Das, and Mahmut Kandemir
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Automating the Generation of Composed Linear Algebra Kernels
Geoffrey Belter, E. R. Jessup, Ian Karlin, Jeremy G. Siek
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Sparse Matrix Factorization on Massively Parallel Computers
Anshul Gupta, Seid Koric, Thomas George
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Enabling Software Management for Multicore Caches with a Lightweight Hardware Support
Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang and P. Sadayappan
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Improving GridFTP Performance Using The Phoebus Session Layer
Kissel, E., Brown, A., Swany, M.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
A massively parallel adaptive fast multipole method on heterogeneous architectures
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, et al.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Efficient Band Approximation of Gram Matrices for Large Scale Kernel Methods on GPUs
Mohamed Hussein, Wael Abd-Almageed
The 2009 ACM/IEEE conference on Supercomputing (SC09)
On the Design of Scalable, Self-Configuring Virtual Networks
David Wolinsky, Yonggang Liu, Pierre Juste, Girish Venkatasubramanian, Renato Figueiredo
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Space-Efficient Time-Series Call-Path Profiling of Parallel Applications
Zoltán Szebenyi, Felix Wolf, Brian J.N. Wylie
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Memory-Efficient Optimization of Gyrokinetic Particle-to-Grid Interpolation for Multicore Processors [Slides]
K. Madduri, S. Williams, S. Ethier, L. Oliker, J. Shalf, E. Strohmaier, K. Yelick
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Evaluating the Impact of Inaccurate Information in Utility-Based Scheduling
Alvin AuYoung, Amin Vahdat, and Alex C. Snoeren
The 2009 ACM/IEEE conference on Supercomputing (SC09)
I/O Performance Challenges at Leadership Scale
S. Lang, P. Carns, R. Latham, R. Ross, K. Harms, and W. Allcock
The 2009 ACM/IEEE conference on Supercomputing (SC09)
PLFS: A Checkpoint Filesystem for Parallel Applications
John Bent, Garth Gibson, Gary Grider, Ben McClelland, et al.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Multi-core Acceleration of Chemical Kinetics for Simulation and Prediction
J. Linford, J. Michalakes, A. Sandu, and M. Vachharajani
The 2009 ACM/IEEE conference on Supercomputing (SC09)
HyperX: Topology, Routing, and Packaging of Efficient Large-Scale Networks
Ahn, Jung Ho; Binkert, Nathan; Davis, Al; McLaren, Moray; Schreiber, Robert S.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Future Scaling of Processor-Memory Interfaces
Jung Ho Ahn, Norman P. Jouppi, Christos Kozyrakis, Jacob Leverich, Robert S. Schreiber
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell
David M. Kunzman and Laxmikant V. Kale
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Router Designs for Elastic Buffer On-Chip Networks
George Michelogiannakis, William J. Dally
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Allocator Implementations for Network-on-Chip Routers
Daniel U. Becker and William J. Dally
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Scalable Temporal Order Analysis for Large Scale Debugging
Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Gregory L. Lee, Ben Liblit, Barton P. Miller, and Martin Schulz
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Machine Learning-Based Prefetch Optimization for Data Center Applications
Shih-wei Liao, Tzu-Han Hung, Donald Nguyen, Hucheng Zhou, Chinyen Chou, and Chiaheng Tu
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Optimal Real Number Codes for Fault Tolerant Matrix Operations
Zizhong Chen
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems
Song, F., YarKhan, A., Dongarra, J.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness for Next-Generation File Systems
Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian
The 2009 ACM/IEEE conference on Supercomputing (SC09)
PFunc: Modern Task Parallelism for Modern High Performance Computing
Kambadur, P., A. Gupta, A. Ghoting, H. Avron, and A. Lumsdaine
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware
Agullo, E., Hadri, B., Ltaief, H., Dongarra, J.
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Age Based Scheduling for Asymmetric Multiprocessors
Nagesh B. Lakshminarayana, Jaekyu Lee, Hyesoon Kim
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Dynamic Storage Cache Allocation in Multi-Server Architectures
Ramya Prabhakar, Shekhar Srikantaiah, Christina Patrick and Mahmut Kandemir
The 2009 ACM/IEEE conference on Supercomputing (SC09)
A Design Methodology for Domain-Optimized Power-Efficient Supercomputing
Mohiyuddin, M., M. Murphy, L. Oliker, J. Shalf, J. Wawrzynek, and S. W. Williams
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Scalable Work Stealing
James Dinan, Sriram Krishnamoorthy, D. Brian Larkins, Jarek Nieplocha, P. Sadayappan
The 2009 ACM/IEEE conference on Supercomputing (SC09)
Leveraging 3D PCRAM Technologies to Reduce Checkpoint Overhead for Future Exascale Systems
Xiangyu Dong, Naveen Muralimanohar, Norm Jouppi, Richard Kaufmann, and Yuan Xie
The 2009 ACM/IEEE conference on Supercomputing (SC09)
********* END OF SC 09 PAPERS ************
Binary analysis for measurement and attribution of program performance
Nathan Tallent, John Mellor-Crummey, and Michael Fagan
The ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Transformation Recipes for Code Generation and Auto-Tuning
Mary Hall, Jacqueline Chame, Chun Chen and Jaewook Shin
The 22nd International Workshop on Languages and Compilers for Parallel Computing
McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures
Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, Norman P. Jouppi
In Proceedings of The 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors
Shekhar Srikantaiah, Mahmut Kandemir and Qian Wang
In Proceedings of The 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Non-Uniform Power Access in Large Caches with Low-Swing Wires
Aniruddha N. Udipi, Naveen Muralimanohar, and Rajeev Balasubramonian
16th International Conference on High-Performance Computing (HiPC-16)
Scaling to 150K cores: recent algorithm and performance engineering developments enabling XGC1 to run at scale
M. Adams, S. Ku, P. Worley, E. D’Azevedo, J. Cummings, and C-S. Chang
Journal of Physics: Conference Series, 180 (2009)
Overcoming Scalability Challenges for Tool Daemon Launching
Dong H. Ahn
2008 International Conference on Parallel Processing (ICPP-08)
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects
Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.
Journal of Physics: Conference Series, Vol. 180, 2009
Scheduling Linear Algebra Operations on Multicore Processors
Kurzak, J., Ltaief, H., Dongarra, J., Badia, R.
Concurrency Practice and Experience 2009
Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems
F. Song, S. Moore, and J. Dongarra
IEEE Cluster 2009
Shore-MT: a scalable storage manager for the multicore era
Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A. and Falsafi, B.
12th International Conference on Extending Database Technology
Using Coherence Information and Decay Techniques to Optimize L2 Cache Leakage in CMPs
Matteo Monchiero, Ramon Canal, Antonio González
The 38th International Conference on Parallel Processing (ICPP-2009)
Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function
Frederico Pratas, Pedro Trancoso, Alexandros Stamatakis, Leonel Sousa
The 38th International Conference on Parallel Processing (ICPP-2009)
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabaleswar K. Panda
The 38th International Conference on Parallel Processing (ICPP-2009)
A Parallel Skeleton Library for Multi-core Clusters
Yuki Karasawa and Hideya Iwasaki
The 38th International Conference on Parallel Processing (ICPP-2009)
Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems
Arun Kejariwal, Alex Nicolau, Alex Veidenbaum, Utpal Banerjee, Constantine Polychronopoulos
The 38th International Conference on Parallel Processing (ICPP-2009)
Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores
Ron Brightwell, Mike Heroux, Zhaofang Wen, Junfeng Wu
The 38th International Conference on Parallel Processing (ICPP-2009)
Static Worksharing Strategies for Heterogeneous Computers with Unrecoverable Failures
Anne Benoit, Yves Robert, Arnold Rosenberg and Frédéric Vivien
HeteroPar’2009
An Efficient Weighted Bi-Objective Scheduling Algorithm for Heterogeneous Systems
Idalmis M. Sardina, Cristina Boeres and Lucia M. A. Drummond
HeteroPar’2009
Accelerating S3D: A GPGPU Case Study [Slides]
Kyle Spafford, Jeremy Meredith, Jeffrey Vetter, Jacqueline Chen, Ray Grout and Ramanan Sankaran
HeteroPar’2009
Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function
Peter Benner, Pablo Ezzatti, Enrique S. Quintana-Orti and Alfredo Remón
HeteroPar’2009
Dynamic Cache Clustering for Chip Multiprocessors
Mohammad H. Hammoud, Sangyeun Cho, and Rami Melhem
Proceedings of the ACM Int’l Conference on Supercomputing (ICS)
An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor
Taecheol Oh, Hyunjin Lee, Kiyeon Lee, and Sangyeun Cho
Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI)
SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors
Lei Jin and Sangyeun Cho
Proceedings of the Int’l Conference on Parallel Architectures and Compilation Techniques (PACT)
Parallel Proof Checking in Isabelle/Isar
Makarius Wenzel
The ACM SIGSAM 2009 International Workshop on Programming Languages for Mechanized Mathematics Systems
Beyond Simple Transactions and Atomic Blocks
Victor Luchangco
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Generalizing the Correctness of Transactional Memory
Rachid Guerraoui, Thomas A. Henzinger, Michal Kapalka, and Vasu Singh
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Specifying Relaxed Memory Models for State Exploration Tools
Sela Mador-Haim, Rajeev Alur, and Milo M.K. Martin
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Relaxed memory models must be rigorous
Francesco Zappa Nardelli, Peter Sewell, Jaroslav Sevcik, Susmit Sarkar, et al.
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Concurrency Concerns in Rich Internet Applications
James Ide, Rastislav Bodik, and Doug Kimelman
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
On some Potential Research Contributions to the Multi-Core Enterprise
Oded Maler
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Correct and Efficient Implementations of Synchronous Models on Asynchronous Execution Platforms
Stavros Tripakis, Albert Benveniste, Paul Caspi, Claudio Pinello
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
For an Efficient Execution of Data Intensive SoCs
Abdoulaye Gamatie
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Parallelize the Runtime Checks — Not the Application
Martin Süsskraut, Stefan Weigert, Martin Nowack, et al.
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Design and Specification of Concurrent System Components
Prakash Chandrasekaran
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Building Dynamic Verifiers for Real Concurrency APIs and Novel GUIs to Visualize Concurrency Nuances
Ganesh Gopalakrishnan
Exploiting Concurrency Efficiently and Correctly — (EC)2 Workshop 2009
Architecting Phase Change Memory as a Scalable DRAM Alternative
Benjamin C. Lee, Engin Ipek, Onur Mutlu, Doug Burger
The 36th International Symposium on Computer Architecture (ISCA 2009)
A Durable and Energy Efficient Main Memory Using Phase Change Memory Technology
Ping Zhou, Bo Zhao, Jun Yang, Youtao Zhang
The 36th International Symposium on Computer Architecture (ISCA 2009)
Scalable High Performance Main Memory System Using Phase-Change Memory Technology
Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, Jude A. Rivers
The 36th International Symposium on Computer Architecture (ISCA 2009)
Hardware Support for WCET Analysis of Hard Real-Time Multicore Systems [ACM Portal login required]
Marco Paolieri, Eduardo Quiñones, Francisco J. Cazorla, et al.
The 36th International Symposium on Computer Architecture (ISCA 2009)
Stream Chaining: Exploiting Multiple Levels of Correlation in Data Prefetching
Pedro Díaz, Marcelo Cintra
The 36th International Symposium on Computer Architecture (ISCA 2009)
Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance
Michael D. Powell, Arijit Biswas, Shantanu Gupta, Shubhendu S. Mukherjee
The 36th International Symposium on Computer Architecture (ISCA 2009)
Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator
John H. Kelm, Daniel R. Johnson, Matthew R. Johnson, Neal C. Crago, et al.
The 36th International Symposium on Computer Architecture (ISCA 2009)
An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness
Sunpyo Hong, Hyesoon Kim
The 36th International Symposium on Computer Architecture (ISCA 2009)
Multi-Execution: Multicore Caching for Data-Similar Executions
Susmit Biswas, Diana Franklin, Alan Savage, et al.
The 36th International Symposium on Computer Architecture (ISCA 2009)
PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches
Yuejian Xie, Gabriel H. Loh
The 36th International Symposium on Computer Architecture (ISCA 2009)
InvisiFENCE: Performance-Transparent Memory Ordering in Conventional Multiprocessors
Colin Blundell, Milo M. K. Martin, Thomas F. Wenisch
The 36th International Symposium on Computer Architecture (ISCA 2009)
Decoupled Store Completion/Silent Deterministic Replay: Enabling Scalable Data Memory for CPR/CFP Processors
Andrew Hilton, Amir Roth
The 36th International Symposium on Computer Architecture (ISCA 2009)
Decoupled DIMM: Building High-Bandwidth Memory System Using Low-Speed DRAM Devices
Hongzhong Zheng, Jiang Lin, Zhao Zhang, Zhichun Zhu
The 36th International Symposium on Computer Architecture (ISCA 2009)
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors
Abhishek Bhattacharjee, Margaret Martonosi
The 36th International Symposium on Computer Architecture (ISCA 2009)
Thread Motion: Fine-Grained Power Management for Multi-Core Systems
Krishna K. Rangan, Gu-Yeon Wei, David Brooks
The 36th International Symposium on Computer Architecture (ISCA 2009)
Temperature-Constrained Power Control for Chip Multiprocessors with Online Model Estimation
Yefu Wang, Kai Ma, Xiaorui Wang
The 36th International Symposium on Computer Architecture (ISCA 2009)
A Case for an Interleaving Constrained Shared-Memory Multi-Processor
Jie Yu, Satish Narayanasamy
The 36th International Symposium on Computer Architecture (ISCA 2009)
SigRace: Signature-Based Data Race Detection
Abdullah Muzahid, Dario Suárez, Shanxiang Qi, Josep Torrellas
The 36th International Symposium on Computer Architecture (ISCA 2009)
ECMon: Exposing Cache Events for Monitoring
Vijay Nagarajan, Rajiv Gupta
The 36th International Symposium on Computer Architecture (ISCA 2009)
Firefly: Illuminating Future Network-on-Chip with Nanophotonics
Yan Pan, Prabhat Kumar, John Kim, Gokhan Memik, Yu Zhang, Alok Choudhary
The 36th International Symposium on Computer Architecture (ISCA 2009)
Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs
Dennis Abts, Natalie D. Enright Jerger, John Kim, Dan Gibson, Mikko H. Lipasti
The 36th International Symposium on Computer Architecture (ISCA 2009)
Dynamic Performance Tuning for Speculative Threads [ACM Login required]
Yangchun Luo, Venkatesan Packirisamy, Wei-Chung Hsu, Antonia Zhai, Nikhil Mungre, Ankit Tarkas
The 36th International Symposium on Computer Architecture (ISCA 2009)
Boosting Single-thread Performance in Multi-core Systems through Fine-Grain Multi-Threading [ACM Login required]
The 36th International Symposium on Computer Architecture (ISCA 2009)
Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun’s ROCK Processor [ACM Login required]
Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, et al.
The 36th International Symposium on Computer Architecture (ISCA 2009)
High Performance Matrix Multiplication on Many-cores
Nan Yuan, Yongbin Zhou, Guangming Tan, Junchao Zhang, Dongrui Fan
Euro-Par 2009
High-performance regular expression scanning on the Cell/B.E. processor
D. Scarpazza and G. Russell
23rd International Conference on Supercomputing (ICS 09)
Efficient High Performance Collective Communication for the Cell Blade
Q. Ali, S. Midkiff, and V. Pai
23rd International Conference on Supercomputing (ICS 09)
Zero-Content Augmented Caches
J. Dusser, T. Piquet, and A. Seznec
23rd International Conference on Supercomputing (ICS 09)
QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory
V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, E. Ayguade, T. Harris, and M. Valero
23rd International Conference on Supercomputing (ICS 09)
Refereeing Conflicts in Hardware Transactional Memory
A. Shriraman and S. Dwarkadas
23rd International Conference on Supercomputing (ICS 09)
Parametric Multi-Level Tiling of Imperfectly Nested Loops
A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan
23rd International Conference on Supercomputing (ICS 09)
Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on the Tesla Architecture
J. Meng and K. Skadron
23rd International Conference on Supercomputing (ICS 09)
Understanding the Interconnection Network of a Massively Parallel Real-Time Neural Net Simulator
J. Navaridas, M. Lujan, J. Miguel-Alonso, L. Plana, and S. Furber
23rd International Conference on Supercomputing (ICS 09)
Creating Artificial Global History to Improve Branch Prediction Accuracy
L. Porter and D. Tullsen
23rd International Conference on Supercomputing (ICS 09)
Fast and Scalable List Ranking on the GPU
M. Rehman, K. Kothapalli, and P. Narayanan
23rd International Conference on Supercomputing (ICS 09)
Less Reused Filter: Improving L2 Cache Performance via Filtering Less Reused Lines
L. Xiang and T. Chen
23rd International Conference on Supercomputing (ICS 09)
Dynamic parallelization of single-threaded binary programs using speculative slicing
C. Wang, Y. Wu, E. Borin, S. Hu, W. Liu, D. Sager, T. Ngai, and J. Fang
23rd International Conference on Supercomputing (ICS 09)
Synchronization Optimizations for Efficient Execution on Multi-Cores
A. Nicolau, G. Li, A. Veidenbaum and A. Kejariwal
23rd International Conference on Supercomputing (ICS 09)
Chunking Parallel Loops in the Presence of Synchronization
J. Shirako, J. Zhao, V. Nandivada, and V. Sarkar
23rd International Conference on Supercomputing (ICS 09)
Clock Gate on Abort: Towards Energy-Efficient Hardware Transactional Memory
Sutirtha Sanyal, Sourav Roy, Mateo Valero, Osman Unsal, and Adrian Cristal
Fifth Workshop on High-Performance, Power-Aware Computing (HP-PAC 2009)
ARMLang: A Language and Compiler for Programming Reconfigurable Mesh Many-Cores
Heiner Giefers and Marco Platzner
16th Reconfigurable Architectures Workshop (RAW 2009)
Double Throughput Multiply-Accumulate Unit for FlexCore Processor Enhancements
Tung Hoang
16th Reconfigurable Architectures Workshop (RAW 2009)
Energy Benefits of Reconfigurable Hardware for use in Underwater Sensor Nets
Bridget Benson, Ali Irturk, and Ryan Kastner
16th Reconfigurable Architectures Workshop (RAW 2009)
A Distributed, Programming Model-Independent Automatic Analysis System for Parallel Applications
Hung-Hsun Su, Max Billingsley III, Alan D. George
14th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2009)
CuPP — A framework for easy CUDA integration [Project Website]
14th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2009)
Fast Development of Dense Linear Algebra Codes on Graphics Processors
M. Jesús Zafont, Alberto Martín, Francisco Igual, and Enrique S. Quintana-Ortí
14th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2009)
HW/SW methodologies for synchronization in FPGA multiprocessors
A. Tumeo, C. Pilato, G. Palermo, F. Ferrandi, D. Sciuto
International Symposium on Field Programmable Gate Arrays (FPGA 2009)
Variability-Aware Robust Design Space Exploration of Chip Multiprocessor Architectures
G. Palermo, C. Silvano, V. Zaccaria
14th Asia and South Pacific Design Automation Conference (ASP-DAC 2009)
Prototyping Pipelined Applications on a Heterogeneous FPGA Multiprocessor Virtual Platform
A. Tumeo, M. Branca, L. Camerini, M. Ceriani, M. Monchiero, G. Palermo, F. Ferrandi, D. Sciuto
14th Asia and South Pacific Design Automation Conference (ASP-DAC 2009)
Efficient Scheduling of Task Graph Collections on Heterogeneous Resources
Matthieu Gallet, Loris Marchal, Frédéric Vivien
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Sequence Alignment with GPU: Performance and Design Challenges [Slides]
Gregory Striemer, Ali Akoglu
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Evaluating the Use of GPUs for Life Science Applications [Tech Report version]
John Paul Walters, Vidyananth Balu, Suryaprakash Kompalli, Vipin Chaudhary
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Improving MPI-HMMER’s Scalability With Parallel I/O [Tech Report version]
John Paul Walters, Rohan Darole, Vipin Chaudhary
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors
Michael Boyer, David Tarjan, Scott Acton, Kevin Skadron
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Core-aware Memory Access Scheduling Schemes
Zhibin Fang, Xian-He Sun, Yong Chen, Surendra Byna
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Using Hardware Transactional Memory for Data Race Detection
Shantanu Gupta, Florin Sultan, Srihari Cadambi, Franjo Ivancic, Martin Roetteler
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Singular Value Decomposition on GPU using CUDA
Sheetal Lahabar
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Automatic detection of parallel applications computation phases
Juan Gonzalez Garcia, Judit Gimenez, Jesus Labarta
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Handling OS Jitter in Multicore Multithreaded Systems
Pradipta De, Vijay Mann, Umang Mittal
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
A framework for efficient and scalable execution of domain-specific templates on GPUs
Narayanan Sundaram, Anand Raghunathan, Srimat Chakradhar
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters
M. Mustafa Rafique, Benjamin Rose, Ali Butt, Dimitrios Nikolopoulos
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
A Cross-Input Adaptive Framework for GPU Programs Optimizations
Yixun Liu, Eddy Zhang, Xipeng Shen
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Message Passing on Data-Parallel Architectures
Jeffery Stuart, John Owens
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Minimizing Total Busy Time in Parallel Scheduling with Application to Optical Networks
Michele Flammini, Tami Tamir, Gianpiero Monaco, Luca Moscardelli, Hadas Shachnai, Mordechai Shalom, Shmuel Zaks
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Energy Minimization for Periodic Real-Time Tasks on Heterogeneous Processing Units
Jian-Jia Chen, Andreas Schranzhofer, Lothar Thiele
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Multi-Users Scheduling in Parallel Systems
Erik Saule, Denis Trystram
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis
Kamesh Madduri, David Bader
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Transitive Closure on the Cell Broadband Engine: A study on Self-Scheduling in a Multicore Processor
Sudhir Vinjamuri, Viktor Prasanna
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Parallel Short Sequence Mapping for High Throughput Genome Sequencing [Slides]
Doruk Bozdag, Catalin Barbacioru, Umit Catalyurek
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
TupleQ: Fully-Asynchronous and Zero-Copy MPI over InfiniBand
Matthew Koop, Jaidev Sridhar, Dhabaleswar Panda
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
An Approach for Matching Communication Patterns in Parallel Applications
Yong-Meng Teo
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Small File Access in Parallel File Systems
Philip Carns, Sam Lang, Robert Ross, Murali Vilayannur, Julian Kunkel, Thomas Ludwig
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Making Resonance a Common Case: A High-Performance Implementation of Collective I/O on Parallel File Systems
Xuechen Zhang, Song Jiang, Kei Davis
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Best-Effort Parallel Execution for Recognition and Mining Applications
Jiayuan Meng, Anand Raghunathan, Srimat Chakradhar
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
vCUDA: GPU Accelerated High Performance Computing in Virtual Machines
Hao Chen
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism
Yi Guo, Rajkishore Barik, Raghavan Raman, Vivek Sarkar
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Dynamic Iterations for the Solution of Ordinary Differential Equations on Multicore Processors
Ashok Srinivasan, Yanan Yu
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Efficient Large-Scale Model Checking
Kees Verstoep, Henri Bal, Jiri Barnat, Lubos Brim
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Scalable Autotuning Framework for Compiler Optimization
Ananta Tiwari, Chun Chen, Jacqueline Chame, Mary Hall, Jeffrey Hollingsworth
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Taking the heat off transactions: dynamic selection of pessimistic concurrency control
Nehir Sonmez, Adrian Cristal, Tim Harris, Osman Unsal, Mateo Valero
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2009)
Combining Local and Global History for High Performance Data Prefetching
M. Dimitrov, H. Zhou
The 1st JILP Data Prefetching Championship (DPC-1)
Storage Efficient Hardware Prefetching using Delta Correlating Prediction Tables
M. Grannaes, M. Jahre, L. Natvig
The 1st JILP Data Prefetching Championship (DPC-1)
Access Map Pattern Matching Prefetch: Optimization Friendly Method
Y. Ishii, M. Inaba, K. Hiraki
The 1st JILP Data Prefetching Championship (DPC-1)
Enhancement for Accurate Stream Prefetching
G. Liu, Z. Huang, J.-K. Peir, X. Shi, L. Peng
The 1st JILP Data Prefetching Championship (DPC-1)
Multi-level Adaptive Prefetching based on Performance Gradient Tracking
L. Ramos, J. Briz, P. Ibañez, V. Viñals
The 1st JILP Data Prefetching Championship (DPC-1)
Data Prefetching Mechanism by Exploiting Global and Local Access Patterns
A. Sharif, H. Lee
The 1st JILP Data Prefetching Championship (DPC-1)
A Hybrid Adaptive Feedback Based Prefetcher
S. Verma, D. Koppelman, L. Peng
The 1st JILP Data Prefetching Championship (DPC-1)
Spatial Memory Streaming with Rotated Patterns
M. Ferdman, S. Somogyi, B. Falsafi
The 1st JILP Data Prefetching Championship (DPC-1)
Dynamic resource-critical workflow scheduling in heterogeneous environments
Yili Gong, Marlon E. Pierce, and Geoffrey C. Fox
14th Workshop on Job Scheduling Strategies for Parallel Processing
Competitive two-level adaptive scheduling using resource augmentation
Hongyang Sun, Yangjie Cao, and Wen-Jing Hsu
14th Workshop on Job Scheduling Strategies for Parallel Processing
Job scheduling with lookahead group matchmaking for time/space sharing on multi-core parallel machines
Xijie Zeng and Angela Sodan
14th Workshop on Job Scheduling Strategies for Parallel Processing
Adaptive scheduling for QoS virtual machines under different resource availability—first experiences
Angela Sodan
14th Workshop on Job Scheduling Strategies for Parallel Processing
A Case for Machine Learning to Optimize Multicore Performance
Archana Ganapathi, Kaushik Datta, Armando Fox, and David Patterson
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Hardware Parallelism vs. Software Parallelism
John A. Chandy and Janardhan Singaraju
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Embracing Heterogeneity—Parallel Programming for Changing Hardware
Michael D. Linderman, James Balfour, Teresa H. Meng, and William J. Dally
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Parallel Programming Must Be Deterministic by Default
Robert L. Bocchino Jr., Vikram S. Adve, Sarita V. Adve, and Marc Snir
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Opportunistic Computing: A New Paradigm for Scalable Realism on Many-Cores
Romain Cledat, Tushar Kumar, Jaswanth Sreeram, and Santosh Pande
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
A Case for System Support for Concurrency Exceptions
Luis Ceze, Joseph Devietti, Brandon Lucia, Shaz Qadeer
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Parallelizing the Web Browser
Christopher Grant Jones, Rose Liu, Leo Meyerovich, Krste Asanović, and Rastislav Bodik
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Exploring the Limits of Disjoint Access Parallelism
Amitabha Roy, Steven Hand, Tim Harris
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Parallel Search on Video Cards
Tim Kaldewey, Jeff Hagen, Andrea Di Blas, and Eric Sedlar
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Tessellation: Space-Time Partitioning in a Manycore Client OS
Rose Liu, Kevin Klues, Sarah Bird, Steven Hofmeyr, Krste Asanović and John Kubiatowicz
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Lithe: Enabling Efficient Composition of Parallel Libraries
Heidi Pan, Benjamin Hindman and Krste Asanović
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Energy-efficient Parallel Software for Mobile Hand-held Devices
Antti P. Miettinen, Vesa Hirvisalo
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Transactional Memory Should Be an Implementation Technique, Not a Programming Interface
Hans-J. Boehm
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
New Abstractions for Data Parallel Programming
James C. Brodman, Basilio B. Fraguela, María J. Garzarán and David Padua
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Ease of Use with Concurrent Collections (CnC)
Kathleen Knobe
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
Optimizing Collective Communication on Multicores
Rajesh Nishtala and Katherine A. Yelick
First USENIX Workshop on Hot Topics in Parallelism (HotPar ‘09)
High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs
John Stone, Jan Saam, David Hardy, Kirby Vandivort, Wen-mei Hwu and Klaus Schulten
Second Workshop on GPGPUs 2009
Celling SHIM: Compiling Deterministic Concurrency to a Heterogeneous Multicore
Nalini Vasudevan, Stephen A. Edwards
24th ACM Symposium on Applied Computing (SAC 2009)
Impact of NVRAM Write Cache for File System Metadata on I/O Performance in Embedded Systems
In Hwan Doh, Hyo J. Lee, Young Je Moon, Eunsam Kim, Jongmoo Choi, Donghee Lee, and Sam H. Noh
24th ACM Symposium on Applied Computing (SAC 2009)
Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay
Pablo Montesinos, Matthew Hicks, Samuel King, Josep Torrellas
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Kendo: Efficient Deterministic Multithreading in Software
Marek Olszewski, Jason Ansel, Saman Amarasinghe
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations
David Tam, Reza Azimi, Livio Soares, Michael Stumm
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Per-Thread Cycle Accounting in SMT Processors
Stijn Eyerman, Lieven Eeckhout
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Maximal Benefit from a Minimal HTM
Owen Hofmann, Christopher Rossbach, Emmett Witchel
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Early Experience with a Commercial Hardware Transactional Memory Implementation
Dave Dice, Yossi Lev, Mark Moir, Dan Nussbaum (Sun Microsystems Labs, USA)
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
PowerNap: Eliminating Server Idle Power
David Meisner, Brian Gold, Thomas Wenisch
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications
Adrian Caulfield, Laura Grupp, Steven Swanson
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings
Aayush Gupta, Youngjae Kim, Bhuvan Urgaonkar
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture
Farhana Aleen, Nathan Clark
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Accelerating Critical Section Execution with Asymmetric Multi-core Architectures
Aater Suleman, Onur Mutlu, Moinuddin Qureshi, Yale Patt
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Producing Wrong Data Without Doing Anything Obviously Wrong!
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter Sweeney
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Phantom-BTB: A Virtualized Branch Target Buffer Design
Ioana Burcea, Andreas Moshovos
Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems
Eiman Ebrahimi, Onur Mutlu, Yale Patt
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Voltage Emergency Prediction: Using Signatures to Reduce Operating Margins
Vijay Janapa Reddi, Meeta Gupta, Glenn Holloway, Gu Yeon Wei, Michael D. Smith, David Brooks
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Adaptive Spill-Receive for Robust High-Performance Caching in CMPs
Moinuddin K. Qureshi
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Design and Implementation of Software-Managed Caches for Multicores with Local Memory
Sangmin Seo, Jaejin Lee, Zehra Sura
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
In-Network Snoop Ordering (INSO): Snoopy Coherence on Unordered Networks
Niket Agarwal, Li-Shiuan Peh, Niraj Jha
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Practical Off-chip Meta-data for Temporal Memory Streaming
Thomas Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Express Cube Topologies for On-Chip Interconnects
Boris Grot, Joel Hestness, Onur Mutlu, Stephen W. Keckler
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Design and Evaluation of a Hierarchical On-Chip Interconnect for Next-Generation CMPs
Reetuparna Das, Soumya Eachempati, Asit K. Mishra, Vijaykrishnan Narayanan, Chita Das
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Architectural Contesting
Hashem Hashemi Najaf-abadi, Eric Rotenberg
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Lightweight Predication Support for Out of Order Processors
Mark Stephenson, Lixin Zhang, Ram Rangan
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
BlueShift: Designing Processors for Timing Speculation from the Ground Up
Brian Greskamp, Lu Wan, Ulya R. Karpuzcu, et al.
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
PageNUCA: Selected Policies for Page-grain Locality Management in Large Shared Chip-multiprocessor Caches
Mainak Chaudhuri
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
A Novel Architecture of the 3D Stacked MRAM L2 Cache for CMPs
Guangyu Sun, Xiangyu Dong, Yuan Xie, Jian Li, Yiran Chen
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches
Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Optimizing Communication and Capacity in a 3D Stacked Reconfigurable Cache Hierarchy
Niti Madan, Li Zhao, Naveen Muralimanohar, Aniruddha Udipi, Rajeev Balasubramonian
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Bridging the Computation Gap Between Programmable Processors and Hardwired Accelerators
Kevin Fan, Manjunath Kudlur, Ganesh Dasika, Scott Mahlke
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
A First-Order Fine-Grained Multithreaded Throughput Model
Xi E. Chen, Tor M. Aamodt
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Characterization of Direct Cache Access on Multi-core Systems and 10GbE
Amit Kumar, Ram Huggahalli, Srihari Makineni
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Criticality-Based Optimizations for Efficient Load Processing
Samantika Subramaniam, Anne C. Bracy, Hong Wang, Gabriel H. Loh
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
iCFP: Tolerating All Level Cache Misses in In-Order Processors
Andrew Hilton, Santosh Nagarakatte, Amir Roth
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
Feedback Mechanisms for Improving Probabilistic Memory Prefetching
Ibrahim Hur, Calvin Lin
15th International Symposium on High-Performance Computer Architecture (HPCA 2009)
How Much Parallelism is There in Irregular Applications?
Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin Cascaval
14th Principles and Practice of Parallel Programming (PPoPP 2009)
An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs
Seunghwa Kang, David Bader
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Atomic Quake: Using Transactional Memory in an Interactive Multiplayer Game Server
Ferad Zyulkyarov, Vladimir Gajinov, Osman Unsal, Adrian Cristal, et al.
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Application-Aware Management of Parallel Simulation Collections
Siu Man Yau, Kostadin Damevski, Vijay Karamcheti, Steven G. Parker, Denis Zorin
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Idempotent Work Stealing
Maged Michael, Martin Vechev, Vijay Saraswat
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Backtracking-based Load Balancing
Tasuku Hiraishi, Masahiro Yasugi, Seiji Umatani, Taiichi Yuasa
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Efficient and Scalable Multiprocessor Fair Scheduling Using Distributed Weighted Round-Robin
Tong Li, Dan Baumberger, Scott Hahn
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Mapping Parallelism to Multi-cores: A Machine Learning Based Approach
Zheng Wang, Michael F.P. O’Boyle
14th Principles and Practice of Parallel Programming (PPoPP 2009)
OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Comparability Graph Coloring for Optimizing Utilization of Stream Register Files in Stream Processors
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying Zhang
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Solving dense linear systems on platforms with multiple hardware accelerators
Gregorio Quintana-Orti, Francisco D. Igual, Enrique S. Quintana-Orti, Robert A. van de Geijn
14th Principles and Practice of Parallel Programming (PPoPP 2009)
A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies
Scott Schneider, Jae-Seung Yeom, Benjamin Rose, John C. Linford, Adrian Sandu, Dimitrios S. Nikolopoulos
14th Principles and Practice of Parallel Programming (PPoPP 2009)
A Comprehensive Strategy for Contention Management in Software Transactional Memory
Michael Spear, Luke Dalessandro, Virendra Marathe, Michael Scott
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Safe Open-Nested Transactions Through Ownership
Kunal Agrawal, Angelina Lee, Jim Sukha
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Committing Conflicting Transactions in an STM
Hany Ramadan, Indrajit Roy, Maurice Herlihy, Emmett Witchel
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Detecting and Tolerating Asymmetric Races
Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin Zorn, Karthik Pattabiraman, Rahul Nagpal
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Transactional Memory with Strong Atomicity Using Off-the-Shelf Memory Protection Hardware
Martin Abadi, Tim Harris, Mojtaba Mehrara
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Techniques for Efficient Placement of Synchronization Primitives
Alex Nicolau, Arun Kejariwal, Guangqiang Li
14th Principles and Practice of Parallel Programming (PPoPP 2009)
A Compiler-Directed Data Prefetching Scheme for Chip Multiprocessors
Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors
Muthu Manikandan Baskaran, Nagavijayalakshmi Vydyanathan, Uday Bondhugula, J Ramanujam, Atanas Rountev, P Sadayappan
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Effective Performance Measurement and Analysis of Multithreaded Applications
Nathan Tallent, John Mellor-Crummey
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Petascale Computing with Accelerators
Michael Kistler, John Gunnels, Daniel Brokenshire, Brad Benton
14th Principles and Practice of Parallel Programming (PPoPP 2009)
MPIWiz: Subgroup Reproducible Replay of MPI Applications
Ruini Xue, Xuezheng Liu, Ming Wu, Zhenyu Guo, Wenguang Chen, et al.
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Formal Verification of Practical MPI Programs
Anh Vo, Sarvani Vakkalanka, Michael Delisi, Ganesh Gopalakrishnan, Mike Kirby, Rajeev Thakur
14th Principles and Practice of Parallel Programming (PPoPP 2009)
Efficient, Portable Implementation of Asynchronous Multi-place Programs
Ganesh Bikshandi, Jose Castanos, Sreedhar Kodali, Krishna Nandivada, et al.
14th Principles and Practice of Parallel Programming (PPoPP 2009)
DMP: Deterministic Shared Memory Multiprocessing
Joseph Devietti, Brandon Lucia, Luis Ceze, Mark Oskin
Fourteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘09)
The Semantics of x86-CC Multiprocessor Machine Code
Susmit Sarkar, Peter Sewell, Francesco Zappa Nardelli, et al.
Symposium on Principles of Programming Languages (POPL 09)
Relaxed memory models: an operational approach
Gerard Boudol, Gustavo Petri
Symposium on Principles of Programming Languages (POPL 09)
The Semantics of Progress in Lock-Based Transactional Memory
Rachid Guerraoui and Michał Kapałka
Symposium on Principles of Programming Languages (POPL 09)
A Model of Cooperative Threads
Martin Abadi, Gordon Plotkin
Symposium on Principles of Programming Languages (POPL 09)
Proving that non-blocking algorithms don’t block
Alexey Gotsman, Byron Cook, Matthew Parkinson, Viktor Vafeiadis
Symposium on Principles of Programming Languages (POPL 09)
The Manticore Project (Invited Talk)
John Reppy
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
Speculative N-Way Barriers
Lukasz Ziarek, Suresh Jagannathan, Matthew Fluet and Umut Acar
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
The Semantics of Power and ARM Multiprocessor Machine Code
Jade Alglave, Anthony C. J. Fox, Samin Ishtiaq, Magnus O. Myreen, Susmit Sarkar, Peter Sewell and Francesco Zappa Nardelli
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
Low-Pain, High-Gain Multicore Programming in Haskell: Coordinating Irregular Symbolic Computations on MultiCore Architectures
Abdallah Al Zain, Jost Berthold, Kevin Hammond, Phil Trinder, Greg Michaelson and Mustafa Aswad
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
Comparing the performance of concurrent linked-list implementations in Haskell
Martin Sulzmann, Edmund Lam and Simon Marlow
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
Declarative Aspects of Memory Management in the Concurrent Collections Parallel Programming Model
Zoran Budimlic, Aparna Chandramowlishwaran, Kath Knobe, Geoff Lowney, Vivek Sarkar and Leo Treggiari
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming
Controlling chaos: on safe side-effects in data-parallel operations [ACM Portal Login required]
Stephan Herhut, Clemens Grelck and Sven-Bodo Scholz
DAMP 2009: Workshop on Declarative Aspects of Multicore Programming

