News Interview: OpenMP 4.0

Exploring the New OpenMP Specification

We talked with Michael Wong, OpenMP CEO, and Matthijs van Waveren, Marketing Coordinator, about the status of the upcoming OpenMP version 4.0 specification and some of its features and enhancements, as well as how to participate in its development. By Joe Casad

The OpenMP consortium is the collection of companies and organizations behind the OpenMP standard, which the group calls "the de-facto standard for parallel programming on shared memory systems." Many scientists and high-performance computing specialists are familiar with OpenMP as a standard for building efficient and easily scalable HPC applications, and a new generation of enterprise programmers is starting to become familiar with OpenMP through the recent popularity of parallel programming and big data techniques. Last year, we talked to OpenMP CEO Michael Wong and Marketing Coordinator Matthijs van Waveren on plans for OpenMP and the path ahead [1].

A year later, OpenMP has added many new consortium members and has rolled out a draft of the new OpenMP 4.0 specification, which, according to the project, promises "significant enhancements to parallelism." We checked in with Wong and van Waveren at the recent SC12 supercomputing conference in Salt Lake City, Utah, USA, for an update on the new OpenMP 4.0 specification.

Joe Casad, ADMIN Magazine: How has the year gone for OpenMP?

Michael Wong: We've had a fantastic year. Mind you, we've been working hard to bring forward a new release of OpenMP called 4.0, with significant new leaps forward for intensive parallelism, and just as important, we also released a new technical report on accelerators that will help us compete in this exciting market, with all the different GPUs and digital signal processors. This is something the marketplace has been looking forward to for a number of years: a high-level language way of doing accelerator programming without having to shift down into lower level code or some proprietary language. Everyone has been looking for this, and we finally have been able to come together after three long years of hard work – because putting this together has not been easy or simple, nor can it be done quickly, because there are just so many different accelerators out there, ranging from commodity to high-performance computing uses, and even embedded technologies, like drones flying over Iran. They have accelerators – not the typical kind of NVidia accelerators, in that they are small-package, low-power, one-pound devices, as opposed to something much heavier. There are different worlds out there, so it's an exciting time.

Matthijs van Waveren: The strategy of OpenMP is to extend from the pure HPC market that it traditionally has been used in – the shared-memory systems. Now we want to extend into accelerators, DSPs, and embedded systems, so that there's development of new directives and also reference implementations. One of the strengths of OpenMP has always been that it's been building on existing implementations. So we're not talking about paperwork directives, but real, useful directives.

ADMIN: And what is the status of the new 4.0 release?

MW: It's not officially ratified, but it's now out for comment. The process is transparent and democratic, in that we allow people outside of the membership group to comment. This is what we call the first release candidate. There will probably be a second release candidate in February, when we'll add additional content. Most notable is the accelerator support; right now it is just a technical report, but we're aiming to include it in the final draft of 4.0.

ADMIN: Could you summarize the other new developments in 4.0?

MW: Our group has always been trying to be the best at shared-memory parallelism, and right now the major features are things like what they call CPU affinity support. Affinity allows you to specify which of your work items goes on which particular CPU and whether it stays there. Many OpenMP users want to be able to control this when they're doing precise types of high-performance computing. Improved error handling is another major feature, which allows you to cancel threads. This is something our C++ community has asked for. And, by the way, we work on three languages, not just one or two. We work on C, C++, and Fortran. That has always been our strength.
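As an illustration of the affinity and cancellation features described above, here is a minimal C sketch. It assumes the syntax of the final OpenMP 4.0 specification (the `proc_bind` clause and the `cancel` and `cancellation point` directives), which may differ from the draft discussed in this interview. The function name `find_index` and its parameters are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns an index i with data[i] == target,
 * or -1 if the value does not occur. Assumes target occurs at most once,
 * so the result does not depend on thread timing. */
int find_index(const int *data, int n, int target)
{
    int found = -1;

    assert(data != NULL || n == 0);

    /* proc_bind(close) asks the runtime to place the threads of this team
     * close to the parent thread; the actual placement also depends on the
     * OMP_PLACES setting and the machine. */
    #pragma omp parallel proc_bind(close) shared(found)
    {
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            if (data[i] == target) {
                #pragma omp critical
                found = i;
                /* Request that the remaining iterations be abandoned. */
                #pragma omp cancel for
            }
            /* Other threads notice a pending cancellation here. */
            #pragma omp cancellation point for
        }
    }
    return found;
}
```

Cancellation only takes effect when the environment variable `OMP_CANCELLATION` is set to `true`; otherwise the loop simply runs to completion and returns the same result. Compiled without OpenMP support, the pragmas are ignored and the function behaves as an ordinary serial search.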

One of our community sectors has asked about something called user-defined reductions – the ability to do reductions not just on the built-in types the language gives you, but on user-defined types. Another very exciting addition, I think, is the ability to finally address probably one third of what's available on the CPU. Up till now, unless you used some proprietary language method or OpenCL, or maybe something built in that's supplied by your vendor, you couldn't get at the other SIMD lanes – really seven of them, if we say the CPU has eight. You could only work through one of those SIMD lanes at a time. Now imagine being able to access the other seven SIMD lanes. That is approximately 250 gigaFLOPS of power you're leaving on the table. If you went out and bought a CPU, a commodity one, like an i7, let's say, it probably gives you about 80GFLOPS. Well, this SIMD unit gives you another 250GFLOPS. But you're not able to program that SIMD unit in a high-level language without breaking down into some low-level code. So you're only getting, at best, maybe 80GFLOPS plus change, with only limited access to the SIMD unit. And then, if you bought a GPU unit on top, maybe one coming from NVidia or AMD, or a DSP, that adds another whopping 2,500GFLOPS that you're not able to access unless you drop down into some proprietary code. OpenMP 4.0 will let you address the entire machine, allowing you to stay in a single language.
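To make user-defined reductions and SIMD support more concrete, the following C sketch shows both, assuming the syntax of the final OpenMP 4.0 specification (`declare reduction` and `simd`), which may differ from the draft discussed here. The type `range_t`, the reduction name `merge_range`, and the functions `combine_ranges`, `find_range`, and `dot` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <float.h>

/* A user-defined type: the smallest and largest value seen so far. */
typedef struct {
    double min;
    double max;
} range_t;

/* Combine two partial results into one. */
range_t combine_ranges(range_t a, range_t b)
{
    range_t r;
    r.min = (a.min < b.min) ? a.min : b.min;
    r.max = (a.max > b.max) ? a.max : b.max;
    return r;
}

/* Teach OpenMP how to reduce range_t values: how to combine two partial
 * results and how to initialize each thread's private copy. */
#pragma omp declare reduction(merge_range : range_t : \
        omp_out = combine_ranges(omp_out, omp_in)) \
    initializer(omp_priv = { DBL_MAX, -DBL_MAX })

/* Parallel min/max over an array using the user-defined reduction. */
range_t find_range(const double *x, int n)
{
    range_t r = { DBL_MAX, -DBL_MAX };

    assert(n >= 0);

    #pragma omp parallel for reduction(merge_range : r)
    for (int i = 0; i < n; ++i) {
        if (x[i] < r.min) r.min = x[i];
        if (x[i] > r.max) r.max = x[i];
    }
    return r;
}

/* Dot product; the simd directive asks the compiler to use the SIMD lanes. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;

    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Calling `find_range` on the values 3.0, -1.0, 7.5, 2.0, and 0.0 yields a range from -1.0 to 7.5. Whether the `simd` loop actually reaches the throughput figures quoted above depends on the compiler and the hardware; the directive only expresses that vectorizing the loop is safe. Without OpenMP support, the pragmas are ignored and both functions run serially.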

ADMIN: How would someone in the community recommend or request a feature or propose a change to your committee? Take us through that process of how a feature gets from an idea into the specification itself.

MW: The main way is that a customer or some company requests a feature. That company might either suggest the feature directly through OpenMP channels, because they're part of the OpenMP consortium, or they might approach OpenMP from the outside to work to create that feature. A company could also create their own implementation of a feature that we'd probably call a company extension. And then after they test-drove it a bit, they might take it to OpenMP and say, "Look, this works." Ideas pass through the ecosystem from customer, to company, to OpenMP. Sometimes an idea for a new feature might come from our forum; a comment might request a specific feature, at which time our members would answer and say, "Oh, that's not a bad idea – nobody has actually brought that up." Now, one other thing is required, and this is where membership is important: A new feature needs some driving force pushing it forward, and there has to be representation for the feature within the community. Being a member gives you the representation to drive something you care about forward into the specification.