We all love performance hardware; that’s why we’re here and that’s why you’re here. And that’s why we all went weak at the knees over the first Core i7 and why we’re blown away by the latest spin of the i7-920. But in real terms, it’s still a niche market.
That is why AMD’s Phenom II CPUs, an affordable range of performance quad-core chips, have brought the company’s processor business back into competition with Intel instead of leaving it a weak also-ran. But with affordable quad-core chips already out in the wild, surely the release of these two dual-core processors is a retrograde step?
Full Story
Tags: MulticoreInfo · Processors
We have provided many parallel programming tutorial resources. The following are the ones we have linked so far:
Basic parallel computing tutorial
MapReduce tutorial
Cell processor programming
OpenMP tutorial
Pthreads tutorials
Intel Threading Building Blocks
In Part 7, we visit the Message Passing Interface (MPI), the widely implemented de facto standard for writing parallel programs that run on distributed-memory systems such as compute clusters.
Our choice of a basic introductory MPI tutorial again comes from Blaise Barney’s collection of tutorials. An excerpt from this tutorial:
“The Message Passing Interface Standard (MPI) is a message passing library standard based on the consensus of the MPI Forum, which has over 40 participating organizations, including vendors, researchers, software library developers, and users. The goal of the Message Passing Interface is to establish a portable, efficient, and flexible standard for message passing that will be widely used for writing message passing programs. As such, MPI is the first standardized, vendor independent, message passing library. The advantages of developing message passing software using MPI closely match the design goals of portability, efficiency, and flexibility.”
MPI Tutorial
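The message-passing model described in the excerpt can be tried out without an MPI installation. The Python below is a minimal simulation, not MPI: each "rank" is a thread that keeps its data private and communicates only through explicit send/recv calls on a hypothetical `SimComm` class, with rank 0 collecting the partial sums. In a real MPI program each rank would be a separate process, possibly on a different cluster node, and the same pattern would be written with `MPI_Send`/`MPI_Recv` or collapsed into a single `MPI_Reduce` call.

```python
import queue
import threading


class SimComm:
    """Hypothetical stand-in for an MPI communicator (illustration only).
    Ranks share no variables; they exchange data solely via send/recv."""

    def __init__(self, size):
        self.size = size
        # One FIFO channel per (destination, source) pair, so recv can
        # match on the sender's rank, loosely like MPI source matching.
        self._channels = [[queue.Queue() for _ in range(size)] for _ in range(size)]

    def send(self, obj, src, dest):
        self._channels[dest][src].put(obj)

    def recv(self, dest, source):
        return self._channels[dest][source].get()  # blocks until a message arrives


def rank_main(comm, rank, results):
    # SPMD style: every rank runs this same function on its own slice of 0..99.
    local = sum(range(rank, 100, comm.size))
    if rank == 0:
        total = local
        for src in range(1, comm.size):
            total += comm.recv(dest=0, source=src)
        results["total"] = total  # only the driver reads this, after join()
    else:
        comm.send(local, src=rank, dest=0)


def run(size=4):
    comm = SimComm(size)
    results = {}
    ranks = [threading.Thread(target=rank_main, args=(comm, r, results)) for r in range(size)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return results["total"]


if __name__ == "__main__":
    print(run(4))
```

Whatever the number of ranks, the reduced total equals `sum(range(100))`, because the strided slices partition the range.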
The MCS Division of Argonne National Laboratory provides a page with many links to MPI tutorials. Some of these tutorials have example code available for download.
Some More Interesting Sources
Tutorial on MPI: The Message-Passing Interface by William Gropp
Getting started with LAM/MPI
NERSC MPI Tutorial
MPI Exercises
Basic MPI Tutorial
Adaptive Message Passing Interface (AMPI) Tutorial [ppt slides] [AMPI]
Useful Links
MPICH2 Website
Open MPI Website
Fault Tolerant MPI
LAM/MPI
Message Passing Interface Forum
MPI 3.0 Standardization Effort
MPI Specification Documents
The Evolution of MPI by Bill Gropp
Microsoft MPI
Books
* Gropp, William; Lusk, Ewing; Skjellum, Anthony. (1999) Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd Edition. MIT Press (Scientific and Engineering Computation Series), Cambridge, MA, USA. 395 pp. ISBN 978-0-262-57132-6
* Gropp, William; Thakur, Rajeev; Lusk, Ewing. (1999) Using MPI-2: Advanced Features of the Message Passing Interface. MIT Press, Cambridge, MA, USA. ISBN 0-262-57133-1
* Pacheco, Peter S. (1997) Parallel Programming with MPI. Morgan Kaufmann. 500 pp. ISBN 1-55860-339-5
Tags: MulticoreInfo · Programming · Research · Tools
Tops Systems Corp of Japan, a multicore technology venture, together with Toyota Motor Corp and Nihon Unisys Ltd, both of Japan, is developing a dedicated integrated circuit (IC) for ray tracing, an image rendering method used in 3D computer graphics (3D CG) processing. Ray tracing traces light rays in reverse: instead of following light out from its sources, it casts rays from the viewpoint through each pixel and into the scene. A total of 73 heterogeneous cores designed specifically for ray tracing operations will be integrated on a single chip, and nine of these chips will be interconnected (see figure). At high-definition (HD) resolution of 1920 x 1080 pixels, the target processing speed is 800 tera floating-point operations per second (TFLOPS).
The researchers have resolved the basic issues of the overall system architecture and of application parallelism analysis, and will shortly begin detailed design and implementation as an application-specific IC (ASIC). They plan to fabricate the chips in 45nm process technology and expect them to operate at 750MHz, integrating 130 million gates into a 17mm-square footprint.
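To make the "reverse" direction concrete, here is a minimal ray-casting sketch in Python. It assumes a pinhole camera at the origin looking down the negative z axis and a single sphere; the function names, field of view and scene are illustrative and have nothing to do with the Tops Systems design. Each pixel's primary ray is computed independently of every other pixel, which is what lets a renderer spread the 1920 x 1080 pixels of an HD frame across many cores (here, nine chips of 73 cores, or 657 cores in total).

```python
import math


def normalize(v):
    length = math.sqrt(sum(k * k for k in v))
    return tuple(k / length for k in v)


def primary_ray(px, py, width, height, fov_deg=60.0):
    """Unit direction of the ray from an eye at the origin through the centre of
    pixel (px, py). The camera looks down -z; row 0 is the top of the image."""
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) / 2)
    x = (2 * (px + 0.5) / width - 1) * aspect * scale
    y = (1 - 2 * (py + 0.5) / height) * scale
    return normalize((x, y, -1.0))


def hit_sphere(origin, direction, center, radius):
    """Distance t to the nearest hit along origin + t * direction, or None on a miss.
    Assumes `direction` is unit length, so the quadratic's leading coefficient is 1."""
    oc = tuple(o - k for o, k in zip(origin, center))
    b = 2 * sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - 4 * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2, (-b + root) / 2):  # nearer intersection first
        if t > 1e-9:  # ignore intersections behind the ray origin
            return t
    return None


def render(width, height, center=(0.0, 0.0, -5.0), radius=1.0):
    """Cast one primary ray per pixel; True marks pixels whose ray hits the sphere."""
    eye = (0.0, 0.0, 0.0)
    return [
        [hit_sphere(eye, primary_ray(x, y, width, height), center, radius) is not None
         for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    for row in render(21, 11):
        print("".join("#" if hit else "." for hit in row))
```

A full ray tracer would go on to spawn secondary rays (shadows, reflections) from each hit point, but the per-pixel independence shown here is the same.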
Full Story
Tags: Embedded · Industry News · MulticoreInfo
NetLogic Microsystems Inc.’s recent move to acquire RMI Corp. will help the multicore processor specialist to devise new products more rapidly, according to executives. The move will also intensify the battle in the emerging embedded multicore sector.
During a press event on July 1, RMI expanded the capabilities of its existing XLR and XLS multicore lines. It also disclosed plans to develop future products for the control plane and the data center. As reported, networking chip maker NetLogic (Mountain View, Calif.) last month acquired RMI, a supplier of multicore networking processors, for $175.4 million in stock and $8.0 million in cash. An additional $6.5 million will be paid if certain business objectives are reached.
Full Story
Tags: Industry News · MulticoreInfo
CAPS announced the launch of HMPP New Generation (HMPP 2.x), a follow-on to HMPP First Generation, which was rolled out last fall. It offers a host of new features and enhancements that make HMPP more robust, easier to use, more flexible and better performing than ever.
HMPP 2.x allows users to pipeline computations in multi-GPU systems and makes better use of asynchronous hardware features to build even better performing GPU-accelerated applications. HMPP 2.x fully supports AMD FireStream hardware with a CAL/IL code generator. An OpenCL code generator, another major milestone planned for the second half of this year, will give HMPP developers another powerful standard programming option.
Full Story
Tags: Press Release · Tools
Enea® (Nordic Exchange/Small Cap/ENEA), a global software and services company focused on solutions for communication-driven products, today announced that it has secured a new multicore deal that is significant in terms of both order value and technological importance. The agreement is worth approximately 10 MSEK over the project lifecycle. The development work, which will be carried out throughout the year, will result in a software layer that allows both Enea OSE and Linux to run on a new processor that will power next-generation mobile base stations.
Full Story
Tags: Industry News
At the RMI “Performance 2009” Conference, RMI Corporation, a leading provider of high-performance processors for communication and media-rich applications, today announced a design and performance breakthrough across its volume-production, multicore, multi-threaded XLR® and XLS® processor families. The breakthrough allows these processor families to reach clock speeds of up to 1.5 GHz for enterprise and infrastructure applications. The multi-threaded octal-core XLR732 and quad-core XLS® processor families gain an additional 25% of performance capability, giving customers in the baseband, security and L4-L7 networking sectors value and capabilities that were previously unavailable: according to RMI, a 50% increase over its closest multicore competitors.
Full Story [pdf]
Tags: Applications · Industry News
Attach a couple of cobalt atoms to a ring of carbon and you have the dream memory material. Electronics engineers attempting to build magnetic memory that can retain data for more than 10 years or so face a challenge. The density at which data is stored depends on the size of the magnetic grains that hold it, and engineers have known for some time that they cannot keep making these grains smaller indefinitely: below a certain size, thermal fluctuations can flip a grain’s magnetization and the stored data is lost.
Full Story
Tags: Memory · MulticoreInfo
Here is a blog post about Axum, a new .NET language from Microsoft that is available for download.
“Other languages offer us the ability to parallelize tasks, so it’s hard to see what Axum brings to the ball, so to speak. Axum removes one feature that normally causes problems for developers creating parallel applications if they aren’t careful. It removes the ability for components to share or mutate state from other threads. It provides an isolation model that promotes a disciplined access to shared state, and encourages its use from the start rather than being added as an afterthought.”
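The isolation idea in the quote can be illustrated with a small Python sketch: a hypothetical `CounterAgent` owns its count privately, and other threads can change or read that count only by posting messages to the agent’s inbox. This is a general message-passing pattern built on the standard `queue` and `threading` modules, not Axum syntax or semantics.

```python
import queue
import threading


class CounterAgent:
    """Hypothetical agent that owns its state; other threads only send it messages."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._count = 0  # read and written only by the agent's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            msg, reply = self._inbox.get()
            if msg == "inc":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)
            elif msg == "stop":
                break

    def increment(self):
        self._inbox.put(("inc", None))

    def get(self):
        reply = queue.Queue()  # private channel for the answer
        self._inbox.put(("get", reply))
        return reply.get()

    def stop(self):
        self._inbox.put(("stop", None))
        self._thread.join()


def demo(workers=4, increments=1000):
    agent = CounterAgent()

    def worker():
        for _ in range(increments):
            agent.increment()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = agent.get()  # queued after every "inc", so it sees all of them
    agent.stop()
    return total


if __name__ == "__main__":
    print(demo())
```

Because the count is mutated only by the agent’s own thread, the worker code needs no locks of its own and no increments are lost.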
Full Story
Tags: MulticoreInfo · Programming
By James Truchard, National Instruments
Throughout history, much crucial innovation has happened during recessions and economic hard times. Let’s look back at the worst economic downturn of all time, the Great Depression. Two major engineering inventions came out of those bleak years: Scotch Tape and the fluorescent light bulb. Hewlett-Packard was also founded during the Great Depression. In times like these, it’s important not to panic and pull back too much on research and development of new technologies and products. The best way to survive a downturn, if you have the resources, is to prepare for the inevitable rebound by investing in change. Here are four tips for engineering and technology companies on how to innovate with less during tough economic times.
1. Avoid the urge to pull back on research
2. Invest in growth areas
3. Realize you can’t do it all
4. Prototype cheaply
Full Story
Tags: MulticoreInfo · Research
In this three-part series, Dr. Aljosa Vrancic and Jeff Meisel present findings showing how a novel approach using Intel hardware and software technology enables real-time high-performance computing (HPC) on multicore processors, solving engineering problems that could not be solved only five years ago.
* Part 1 is a review of real-time concepts that are important for understanding this domain of engineering problems, and a comparison of traditional HPC with real-time HPC.
* Part 2 outlines software architecture approaches for utilizing multi-core processors, along with cache optimizations.
“In traditional embedded systems, CPU caches are viewed as a necessary evil. The evil side shows up as a nondeterministic execution time inversely proportional to the amount of code and/or data of a time-critical task located inside the cache when the task execution has been triggered. For demonstration purposes, we will profile cache performance to better understand some important characteristics.”
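The effect the quote describes can be shown with a toy model. The sketch below assumes hypothetical costs of 1 cycle per cache hit and 100 cycles per miss and a small fully associative LRU cache; it shows the same time-critical task taking very different numbers of cycles depending on how much of its working set is still cached when the task is triggered. The profiling in the article measures real hardware; this only models the mechanism.

```python
from collections import OrderedDict

HIT_CYCLES = 1     # assumed cost of a cache hit
MISS_CYCLES = 100  # assumed cost of a miss that goes to main memory


class LRUCache:
    """Fully associative cache of `capacity` lines with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, line):
        """Touch one cache line and return the cycles that access costs."""
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return HIT_CYCLES
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return MISS_CYCLES


def task_cycles(warm_lines, working_set=64, capacity=128):
    """Cycles for one run of a task that touches `working_set` lines, when only
    its first `warm_lines` lines are still cached at trigger time."""
    cache = LRUCache(capacity)
    for line in range(warm_lines):
        cache.access(line)  # state left behind by an earlier run of the task
    return sum(cache.access(line) for line in range(working_set))


if __name__ == "__main__":
    for warm in (0, 16, 32, 48, 64):
        print(warm, task_cycles(warm))
```

With these assumed costs, a fully cold run takes 100 times longer than a fully warm one, which is exactly the spread a real-time scheduler must budget for.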
Full Story
Tags: HPC · MulticoreInfo · Performance · Programming
ARC International (St Albans, England) is to lay off 35 employees and close several of its locations in a major restructuring that sees the licensor of multimedia IP cores reduced to 115 direct staff, excluding its Adaptive Chips joint venture in India. The company also revealed it is readying a new processing range that will be launched in November.
ARC said its 600 processor range, which on average uses 25 percent less power than its nearest competitor in embedded subsystem applications, will form the basis of a new, instruction-set-compatible generation aimed at a further halving of the power-performance ratio. The new processor range will be designated the ARC 6000 series.
Full Story
Tags: Embedded · MulticoreInfo
The Defense Department wants to take supercomputing to the next level by funding the development of a new breed of supercomputers that will be smarter and faster and yet smaller and require much less power than today’s massive machines.
DOD officials believe such computers will be necessary to make sense of the avalanche of data that will gush forth from tomorrow’s network-tethered sensor systems. Current computer systems will not be able to handle the load.
Full Story
Tags: HPC · MulticoreInfo · Research
IBM today announced the public availability of Milepost GCC, the world’s first open source machine learning compiler. The compiler intelligently optimizes applications, translating directly into shorter software development times and bigger performance gains. Initial IBM experiments conducted on IBM System p servers achieved an average 18 percent performance improvement on embedded-application benchmarks.
In many organizations, software developers are fast becoming the nucleus of innovation, crucial to all business processes. They build the services and capabilities that will underlie future revenue and generate business opportunity. In fact, developers drive so much business value that the average enterprise devotes 30 to 50 percent of its entire technology infrastructure to the development and testing of software.
Full Story
Tags: Applications · MulticoreInfo · Performance
The latest AMD Opteron™ 1000 Series processor, codenamed “Suzuka”, was launched in the shadow of its 6-core bigger brother, the Six-Core AMD Opteron processor codenamed “Istanbul.”
The AMD Opteron 1000 Series processor is designed for applications that are driven by cost or power concerns more than scalability. In the past, this meant a single core in a single socket, but in today’s multi-core world, this means four high performance cores in a single socket.
Typically, these processors are used in web servers, small business servers, workstations and even cloud computing. The flexibility of four cores and a low cost infrastructure gives customers an edge when designing for a cost-effective or power efficient platform.
Full Story
Tags: Industry News · MulticoreInfo · Processors
Rambus, a technology licensing company specialising in high-speed memory architectures, has demonstrated its XDR memory system running at data rates of up to 7.2 Gbps. The demonstration comprised memory manufacturer Elpida’s recently announced 1 Gbit XDR DRAM device and an XIO memory controller transmitting realistic data patterns.
Elpida claims the XIO memory controller is up to 3.5 times more power efficient than a GDDR5 controller, and says the total memory system can provide up to two times more bandwidth than GDDR5 at equivalent power. In addition, the XIO memory controller demonstrated bi-modal operation, supporting both XDR DRAM and next-generation XDR2 DRAM.
Full Story
Tags: Memory · MulticoreInfo
Along with our grand release of Cilk++ v1.1, we are including a new product to help you visualize application performance: Cilkview. Cilkview runs an application binary and generates performance data on sections you specify. It combines this data with performance estimates generated by the work/span calculator and produces a graph of the results.
Cilkview runs your binary on a single core with instrumentation that calculates the work and span of the sections of code you specify. In this case, Cilkview made the same prediction that we did. The steeper line indicates perfect linear speedup up until the theoretical parallelism of the algorithm. The theoretical parallelism does not appear on the graph because it is too large.
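The work/span reasoning behind such predictions can be sketched by hand. The Python below assumes a hypothetical cost model in which every call of a recursive Cilk-style `fib(n)` costs one unit and the two recursive calls may run in parallel. It computes the work T1 (total units) and the span T∞ (longest chain of dependent units), and bounds the speedup on P workers by min(P, T1/T∞), the linear-then-flat shape described above. Cilkview derives its figures from instrumenting the actual binary, so its numbers will differ from this model.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def work(n):
    """T1 for fib(n): every call costs one unit and all calls are counted."""
    if n < 2:
        return 1
    return work(n - 1) + work(n - 2) + 1


@lru_cache(maxsize=None)
def span(n):
    """T_inf for fib(n): the two recursive calls may run in parallel, so only the
    longer of the two chains counts, plus this call's own unit."""
    if n < 2:
        return 1
    return max(span(n - 1), span(n - 2)) + 1


def speedup_bound(n, workers):
    """Upper bound on speedup with `workers` processors: min(P, T1 / T_inf)."""
    parallelism = work(n) / span(n)
    return min(workers, parallelism)


if __name__ == "__main__":
    n = 30
    print(work(n), span(n), work(n) / span(n))
```

For n = 30 this model gives T1 = 2,692,537 and T∞ = 30, a parallelism of roughly 90,000, which shows how the theoretical parallelism of such an algorithm can lie far beyond the range of a speedup graph.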
Full Story
Tags: MulticoreInfo · Performance · Tools
Penguin Computing today announced that the University of Delaware Global Computing Laboratory has deployed the university’s largest supercomputer, code-named “Geronimo”, based on a custom GPGPU design utilizing NVIDIA Tesla GPU computing technology coupled with Intel 5400 series processors.
The cluster, funded by the University in conjunction with the NVIDIA University Partnership Program, will support the research goals of the Global Computing Laboratory headed by Assistant Professor Michela Taufer. The University of Delaware team, including Dr. Taufer and key collaborators Dr. Sandeep Patel of the Chemistry Department and Dr. Dionisios G. Vlachos of the Chemical Engineering Department, aims to improve the performance of large-scale simulations of molecular systems based on Monte Carlo (MC) and Molecular Dynamics (MD) methods.
Full Story
Tags: MulticoreInfo
Following an active panel on DDR3 DRAM, last week’s Denali Memcon offered a second panel topic: low-power memory design. That’s a wide enough topic to allow for a range of discussions, and the panelists (Mostafa Abdulla of Numonyx, Roger Isaac of Silicon Image, Areski Maklouf of ST-Ericsson, and Howard Sussman of Etron) ranged all over it.
In opening statements, Maklouf said that LPDDR is a major issue for the platform architect. “Architects must work with memory providers to find a good solution,” he said. Moving in a different direction, Isaac pointed out that no matter how cleverly you architected it, LPDDR was not going to make it for much longer.
Full Story
Tags: Memory · MulticoreInfo
Here is a chat session that occurred on June 18th at the EE Times Multicore Virtual Conference. There were 35 people in attendance. Richard Nass did some very light editing to make it more readable.
Full Story
Tags: MulticoreInfo