Matlab-like tools for high-performance computing
Numbers Game
A common question from people who build both large and small HPC clusters is, "What applications can I run on my HPC system?"
One of the most popular applications is Matlab [1], which many people use in their everyday work and research – either Matlab or Matlab-like tools. For example, a fairly recent blog posting from Harvard University's Faculty of Arts and Sciences, Research Computing Group [2] showed that the second most popular Environment Module [3] was Matlab.
People are using Matlab for a variety of tasks that range from the humanities, to science, to engineering, to games, and more. Some researchers use it for parameter sweeps by launching 25,000 or more individual Matlab runs at the same time. Needless to say, Matlab is used very heavily at a number of places, so it is a very good candidate for running on an HPC system.
I don't want to take anything away from MathWorks, the creator of Matlab, because their product is a wonderful application, but for a number of reasons, Matlab might not be the answer for some people (e.g., they can't afford Matlab or can't afford 25,000 licenses, they just want to try a few Matlab features, or they want or need access to the source code). This brings up the category of tools typically called "Matlab-like"; that is, they try to emulate the concept of Matlab with compatible syntax so that moving back and forth is relatively easy. When people ask what tools or applications they can try on their shiny new cluster, I tend to recommend one of these Matlab-like tools, even though they aren't strictly parallel right out of the box (so to speak).
A few tools that are somewhat Matlab-like – some still surviving and some defunct – include RLaB, RLaB+, JMathlab, and O-Matrix (commercial). A whole host of other tools exists if you want to stray from Matlab compatibility even further, such as R or SciPy; however, in this article, I will talk about the open source tools Scilab, GNU Octave, and FreeMat. These tools try to be as close as possible to Matlab syntax so that Matlab code will transfer over easily, with the possible exception of Simulink [4] and Matlab GUI code. They have varying degrees of success with Matlab compatibility, and all are inherently serial applications. Serial in this case means that the vast majority of the code is executed on a single core, although some of the programs can do a small amount of parallel execution. For parallel code execution, you usually need some add-ons, such as Message Passing Interface (MPI) [5], and a code rewrite to allow multiple instances of the tool on different nodes to communicate over a network.
I won't be comparing or contrasting the tools; rather, I'll briefly present them with some pointers on how to install and use the tool, and I'll leave the final determination of which tool is "better" for your case up to you.
Scilab
Scilab is one of the oldest Matlab-like tools. It was started in 1990 in France, and in May 2003, a Scilab Consortium was formed to better promote the tool. In June 2012, the Consortium created Scilab Enterprises, which provides a comprehensive set of services around Scilab. Currently, it also develops and maintains the software. Scilab is released under a GPL-compatible license called CeCILL (see Table 1 for Scilab resources).
Table 1: Scilab Resources

Resource | Location |
---|---|
Scilab | |
Scilab Enterprises | |
Xcos | |
GUI API | http://www.scilab.org/scilab/features/scilab/application_development |
ATOMS | |
sciGPGPU | |
OpenCL code | |
Wiki | |
Matlab to Scilab | |
Intro to Scilab (PPT) | |
Linalg performance | |
Compiling | http://wiki.scilab.org/Compiling%20Scilab%205.x%20under%20GNU-Linux%20Unix |
Parallel computing | http://wiki.scilab.org/Documentation/ParallelComputingInScilab |
 | |
Parallel programming | http://my.opera.com/muksitsyahlan/blog/2011/01/05/parallel-programming-with-scilab-2 |
MPI code | http://gitweb.scilab.org/?p=scilab.git;a=shortlog;h=refs/heads/MPI |
Prepackaged versions of Scilab exist for Linux (32-bit and 64-bit); Mac OS X; and Windows XP, Vista, and Windows 7, along with, of course, the source code. These packages include all of Scilab, including something called Xcos, which corresponds to Simulink from MathWorks. Scilab is the only open source Matlab-like tool to include something akin to Simulink. Scilab also comes with both 2D and 3D visualization, extensive optimization capability, statistics, control system design and analysis, signal processing, and the ability to create GUIs by writing code in Scilab. You can also interface Fortran, C, C++, Java, or .NET code to Scilab.
Installing Scilab on Linux is easy with either one of the precompiled binaries: 32- or 64-bit. I downloaded the 64-bit binary (a tar.gz file), and untarred it into /opt. This produces a subdirectory /opt/scilab-5.4.0 (the latest version as I wrote this). To run Scilab, I used:
/opt/scilab-5.4.0/bin/scilab
which brought up the Scilab GUI tool (Figure 1). The console in the middle of the figure accepts commands; a file browser is on the left, a variable browser at top right, and a command history on the bottom right. It also has a very nice built-in text editor called "SciNotes" (Figure 2), which can be used to write code.
Scilab's innovative Variable Browser lets you edit variables, including those in matrices, using something like a spreadsheet tool. When you first bring up the editor, it displays a list of the variables in the current workspace (Figure 3).
When you double-click on a variable, you call up the variable editor to edit the values. For example, double-clicking on variable A brought up the spreadsheet-like view shown in Figure 4. At this point, I can edit any value for any entry of A.
A "Modules" capability adds extra functionality to Scilab. Much like the "toolboxes" of Matlab, Scilab keeps modules at a website called ATOMS (AuTomatic mOdules Management for Scilab). One of the most critical modules for HPC is probably sciGPGPU, which provides GPU computing capabilities. Using sciGPGPU within Scilab is relatively straightforward, but you need to know something about GPUs and CUDA [6] or OpenCL [7] to use it effectively. Listing 1 shows a code snippet taken from the main sciGPGPU site that illustrates how to use the cuBLAS library [8]. (You can also use the cuFFT library [9], but sample code for it is not shown.)
Listing 1: Scilab GPU Code Using sciGPGPU
stacksize('max');
// Init host data (CPU)
A = rand(1000,1000);
B = rand(1000,1000);
C = rand(1000,1000);

// Set host data on the Device (GPU)
dA = gpuSetData(A);
dC = gpuSetData(C);

d1 = gpuMult(A,B);
d2 = gpuMult(dA,dC);
d3 = gpuMult(d1,d2);
result = gpuGetData(d3); // Get result on host

// Free device memory
dA = gpuFree(dA);
dC = gpuFree(dC);
d1 = gpuFree(d1);
d2 = gpuFree(d2);
d3 = gpuFree(d3);
Scilab has a vibrant community, and the excellent Scilab wiki has a very good section on migrating from Matlab to Scilab. At this site, an extensive PDF discusses differences between Matlab and Scilab and how to change your Matlab code, if it needs to be changed, to run on Scilab.
Another excellent Scilab resource is a PowerPoint presentation by Johnny Heikell, 504 slides at last count, that introduces Scilab and shows how to use it. Heikell also shows how to convert Matlab files to Scilab files. Keep in mind that the downloadable Scilab binaries are built to be as fast as possible, yet still be portable across systems.
Because performance is extremely important in HPC, you might want to build Scilab yourself. This process would allow you to link against Intel's MKL library [10] to get the fastest possible BLAS and FFT operations on Intel processors, or against ACML (AMD Core Math Library) [11], which is tuned for AMD processors. Be sure to read all of the details on building Scilab at the wiki site; the GUI portion of Scilab requires Java.
GNU Octave
The GNU Octave project was conceived by John W. Eaton at the University of Wisconsin-Madison as a companion to a chemical reactor course he taught. Serious design of Octave, as it was first called, began in 1992, with the first alpha release on January 4, 1993, and the 1.0 release on February 17, 1994. In 1997, Octave became GNU Octave (starting with version 2.0.6). From the beginning, it has been published under the GNU GPL: initially under GPLv2 and later under GPLv3.
For the rest of this article, I will refer to GNU Octave as just Octave. Like Scilab and Matlab, Octave is a high-level interactive language for numerical computations. Its language is very similar to, but slightly different from, Matlab. It comes with a large number of functions and packages and uses Gnuplot for plotting and visualization.
Octave is popular and widely used, perhaps partly because it is part of GNU, so it is commonly built for Linux distributions. However, I also think it is widely used because the basic syntax is close to Matlab, and it is open source. Some differences between Octave and Matlab are explained in the Octave wiki, a FAQ on porting, a table of key differences, and a wikibook (see Table 2 for Octave resources).
Table 2: GNU Octave Resources

Resource | Location |
---|---|
GNU Octave | |
Gnuplot | |
Wiki | |
FAQ | |
Matlab/Octave differences | http://www.ece.ucdavis.edu/%7Ebbaas/6/notes/notes.diffs.octave.matlab.html |
Programming differences between Matlab and Octave | http://en.wikibooks.org/wiki/MATLAB_Programming/Differences_between_Octave_and_MATLAB |
SourceForge | |
Toolkits | |
HDF5 | |
Introduction | http://www-mdp.eng.cam.ac.uk/web/CD/engapps/octave/octavetut.pdf |
JIT | |
Build with MKL | http://software.intel.com/en-us/articles/using-intel-mkl-in-gnu-octave |
Build with ACML | |
Parallel toolbox | |
 | http://octave.sourceforge.net/general/function/parcellfun.html |
 | |
A huge number of additional toolkits for Octave (the same concept as a Matlab toolbox) are available at Octave-Forge. Note, however, that files from Matlab Central's File Exchange [12] cannot be used in Octave, as explained in the Octave FAQ.
Octave is easy to install because your favorite distribution probably has it available. In my case, I use Scientific Linux 6.2 (Listing 2).
Listing 2: Excerpt of Octave Install on SL6.2
[root@test1 laytonjb]# yum install octave
...
Dependencies Resolved

=====================================================================================
 Package                 Arch      Version                 Repository    Size
=====================================================================================
Installing:
 octave                  x86_64    6:3.4.3-1.el6           epel          9.1 M
Installing for dependencies:
 GraphicsMagick          x86_64    1.3.17-1.el6            epel          2.2 M
 GraphicsMagick-c++      x86_64    1.3.17-1.el6            epel          103 k
 blas                    x86_64    3.2.1-4.el6             sl            320 k
 environment-modules     x86_64    3.2.7b-6.el6            sl             95 k
 fftw                    x86_64    3.2.2-14.el6            atrpms        1.6 M
 fltk                    x86_64    1.1.10-1.el6            atrpms        375 k
 glpk                    x86_64    4.40-1.1.el6            sl            358 k
 hdf5-mpich2             x86_64    1.8.5.patch1-7.el6      epel          1.4 M
 mpich2                  x86_64    1.2.1-2.3.el6           sl            3.7 M
 qhull                   x86_64    2010.1-1.el6            atrpms        346 k
 qrupdate                x86_64    1.1.2-1.el6             epel           79 k
 suitesparse             x86_64    3.4.0-2.el6             epel          782 k
 texinfo                 x86_64    4.13a-8.el6             sl            667 k

Transaction Summary
=====================================================================================
Install      14 Package(s)

Total download size: 21 M
Installed size: 81 M
Is this ok [y/N]: y
...
Installed:
  octave.x86_64 6:3.4.3-1.el6

Dependency Installed:
  GraphicsMagick.x86_64 0:1.3.17-1.el6
  GraphicsMagick-c++.x86_64 0:1.3.17-1.el6
  blas.x86_64 0:3.2.1-4.el6
  environment-modules.x86_64 0:3.2.7b-6.el6
  fftw.x86_64 0:3.2.2-14.el6
  fltk.x86_64 0:1.1.10-1.el6
  glpk.x86_64 0:4.40-1.1.el6
  hdf5-mpich2.x86_64 0:1.8.5.patch1-7.el6
  mpich2.x86_64 0:1.2.1-2.3.el6
  qhull.x86_64 0:2010.1-1.el6
  qrupdate.x86_64 0:1.1.2-1.el6
  suitesparse.x86_64 0:3.4.0-2.el6
  texinfo.x86_64 0:4.13a-8.el6

Complete!

After installing Octave, I had one small problem to solve: The HDF5 libraries couldn't be found. I added a line to my .bashrc file so the library was in LD_LIBRARY_PATH:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64/mpich2/lib/"
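Because .bashrc is read by every new interactive shell, a plain append like the one above adds the same directory again in each nested shell and leaves an empty leading entry if the variable starts out unset. A minimal sketch of a guarded alternative follows; the helper name ld_path_append is made up for illustration, and the directory is the one used above.

```shell
# Sketch: append a directory to LD_LIBRARY_PATH only if it is not listed yet.
# ld_path_append is a hypothetical helper name, not part of any package.
ld_path_append() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            # Directory already present, nothing to do
            ;;
        *)
            # Add a ":" separator only when the variable is non-empty
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}

ld_path_append /usr/lib64/mpich2/lib/
```

If Octave still complains about a missing library, running ldd on the Octave binary and looking for "not found" entries is a quick way to see which library the loader cannot locate.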
To run Octave, I simply enter octave at the command prompt.
Right now, Octave is a command-line-driven tool without a standard GUI. Several attempts have been made at a GUI, but none has been successful enough to be included with Octave. You can read more about it in the Octave FAQ. Octave can also use Gnuplot to plot results and visualize data. Figure 5 is an example of a 3D plot borrowed from "Introduction to GNU Octave" [13] that shows the commands used to create a plot. Octave creates a new window with the resulting plot, as shown in Figure 6.
A number of sites have introductions to and examples of Octave. A good place to start is the Octave wiki or the Introduction to Octave PDF (see Table 2), which is slightly dated but still a valuable resource for getting started.
Recently, an effort has been made to create a JIT (Just In Time) compiler for Octave. It is a work in progress and not quite ready for production, but you can read about the goals and possibly experiment with it. Be warned that work on the JIT has not progressed for a few months, but I'm hoping it doesn't become another dead Octave project.
As with Scilab, the downloadable binaries for Octave that come with your distribution are likely to be the least common denominator in terms of performance, but building Octave is fairly easy.
Intel provides a set of instructions on how to build Octave using MKL, and a blog post tells you how to build Octave with ACML for AMD processors (it's for Ubuntu, but the principles are the same). To make things a little more generic, you can also use OpenBLAS [14] to build Octave. Some efforts have been made to run some Octave functions on GPUs; however, adding GPU capability to Octave is not likely to happen any time soon.
To be honest, I don't completely understand the issues, but they involve licensing: The GNU GPLv3 license is not compatible with the licenses for various GPU tools and languages (CUDA in particular). I hope this will be resolved in the future, but in my opinion, it really hurts Octave's applicability in HPC.
FreeMat
A more recent development effort for a Matlab-like tool is called FreeMat. The intention is to develop an interactive numerical environment that is similar to both Matlab and IDL. FreeMat has prebuilt binaries for Windows, Mac OS X, and Linux and is released under the GPL.
FreeMat follows the same lines as Scilab and Octave, and the language is fairly close to Matlab. The FreeMat FAQ has a short section on the differences between FreeMat and Matlab that should help you take Matlab code and run it with FreeMat (see Table 3 for FreeMat resources).
Table 3: FreeMat Resources

Resource | Location |
---|---|
FreeMat | |
FAQ | |
Primer | |
Numerical methods | |
Parallelization plans | |
Threads | |
I tried installing an FC14 (Fedora Core 14) version of FreeMat 4.x on my Scientific Linux 6.2 system using rpm to install it and yum to help resolve dependencies, but I received errors that I could not resolve, and it failed, so I tested FreeMat on a Windows 7 system. Figure 7 shows the FreeMat console with a few commands. The window looks similar to Scilab and, to some degree, Matlab.
A console appears on the right, and the stacked windows on the left are the file browser, history, variable list, and debug windows. The figure shows that the simple AC=B works just the same as in Matlab, Scilab, and Octave. FreeMat can also do some reasonable graphics. Figure 8 shows the simple 3D plot example taken from the FreeMat help site.
The FreeMat site has a good introduction to the software, and you can find a FreeMat Primer on the FLOSS for Science website. Another good introduction combines FreeMat with a discussion of basic numerical methods; the PDF is incomplete by a few pages, but it does get you started with FreeMat.
Going Parallel
Matlab-like tools are extremely useful in HPC, even though they are serial applications. As I mentioned previously in this article, Matlab and Matlab-like tools can be used for tasks such as parameter sweeps by running something like 25,000 simultaneous instances of the application. However, in other situations, you might want to run the underlying functions in parallel.
For example, you might want to perform a large FFT or a large SVD (singular value decomposition) as quickly as possible by running the application using all of the cores in the node, or even by running the computations across several distributed nodes.
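The parameter-sweep case mentioned above needs no parallel support from the tool itself: each run is an independent serial process, so a shell can launch them. The following is a minimal sketch of that pattern on a single node, not a production launcher. The names RUNNER, JOBS, and sweep are assumptions made for illustration, and RUNNER defaults to echo so that the sketch runs even where none of the tools are installed.

```shell
#!/bin/sh
# Sketch of a parameter sweep: one independent serial run per value.
# RUNNER must be a single command that takes the parameter as its only
# argument. For Octave it could be a small wrapper script containing
# something like:
#   octave --quiet --eval "run_case($1)"    # run_case is hypothetical
RUNNER=${RUNNER:-echo}
JOBS=${JOBS:-4}          # how many runs to keep active at once

sweep() {
    # Reads one parameter value per line from stdin and starts RUNNER
    # once per value, with at most JOBS runs active at the same time.
    xargs -n 1 -P "$JOBS" "$RUNNER"
}

# Sweep the values 1 through 8
seq 1 8 | sweep
```

Because the runs execute concurrently, their output can arrive in any order. On a cluster, a batch scheduler would typically take the place of xargs and spread the runs across nodes.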
Several parallel processing options for Scilab are summarized in the Scilab parallel computing documentation. The first option is to use the inherent multicore capabilities in the functions used in Scilab.
For example, certain libraries perform the linear algebra computations in Scilab, and these libraries could perform the computations using all of the cores in the system. Intel's MKL library can use all of the cores for performing matrix multiplications or other functions. Typically, this is done using OpenMP but not necessarily. However, these computations are limited to intrinsic functions, so you can't parallelize Scilab code such as a for loop.
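How many cores a threaded library uses is usually controlled from the environment before the tool starts. The sketch below sets OMP_NUM_THREADS (the standard OpenMP variable) and MKL_NUM_THREADS (read by Intel MKL) to the number of online cores. It assumes the tool was built against a threaded library; with a stock serial BLAS, these settings have no effect.

```shell
#!/bin/sh
# Sketch: let threaded numerical libraries use every core on this node.
# Assumes the tool was linked against a threaded BLAS/FFT library.

# Number of cores currently online
NCORES=$(getconf _NPROCESSORS_ONLN)

export OMP_NUM_THREADS="$NCORES"   # read by OpenMP runtimes
export MKL_NUM_THREADS="$NCORES"   # read by Intel MKL

echo "Threaded libraries may use up to $NCORES threads"
# Start the tool from this shell afterward, for example:
#   /opt/scilab-5.4.0/bin/scilab
```

When many serial instances share one node, as in a parameter sweep, setting both variables to 1 instead is a common way to keep the instances from oversubscribing the cores.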
Scilab also has the capability of running more explicit parallel applications on multicore systems (i.e., cores on the same node). A function called parallel_run allows parallel calls to a function. This allows you to parallelize function calls on the system, but remember that execution is limited to a single node (although with four-socket AMD systems, you can get 64 cores on a single system).
For parallel distributed applications on Scilab, you can also use PVM (Parallel Virtual Machine). PVM is a rather old approach to parallel programming and has given way to MPI (Message Passing Interface) for the most part, but it is still used in some areas. A good blog post discusses how to use PVM within Scilab (but it is two years old by now). A Git repository holds some early code developed by Scilab Enterprises to create MPI capability for Scilab.
In a manner similar to Scilab, Octave can also use numerical libraries that have been parallelized to run on a single node, such as Intel's MKL or something similar, perhaps using OpenMP. You just have to build Octave yourself and use the appropriate libraries.
Octave also has a parallel toolbox to use for running applications on a cluster or a distributed system, and with the parcellfun command, you can execute parallel function calls on the same node. This is very similar to Scilab's parallel_run command.
The openmpi_ext toolbox uses MPI to allow Octave instances on different nodes to communicate and share data. It requires the use of Open MPI [15], but if you have experience in HPC, it isn't difficult to build and install.
Parallel coding in FreeMat is a little more difficult. Evidently, early versions of FreeMat could use MPI for parallel coding; however, it appears this work has not been continued in the current versions of FreeMat.
One interesting FreeMat feature is the use of threads within the language. FreeMat threads can communicate with each other through the use of global variables. Although I have not tested this feature, it appears to be in the current versions.
Summary
In this article, I briefly reviewed three Matlab-like tools: Scilab, Octave, and FreeMat. All three have pluses and minuses that can be debated, but the one you choose ultimately depends on your requirements. For further comparison of these tools, check out the technical report from the University of Maryland [16].
If you need a general-purpose numerical tool for HPC, any one of these tools is a good candidate. If you are willing to stray further from Matlab compatibility, other candidates could work as well, but that is the subject of another article and likely another series of debates. In the meantime, give one of these applications a whirl – I think you'll like what you see.