Tools Modern Fortran Lead image: Lead Image © homestudio, 123RF.com

Modern Fortran for today and tomorrow

Up To Date

Fortran development is still progressing, with a new version scheduled for release in 2018. We look at Fortran's evolution into a modern language for HPC. By Jeff Layton

Fortran has been one of the key languages for technical applications since almost forever. With the rise of C and C++, the popularity of Fortran waned a bit, but not before a massive library of Fortran code had been written. The rise of scripted languages such as Perl, Python, and R also dented Fortran a bit, but it is still in use today for many excellent reasons and because of a HUGE library of active Fortran code.

Fortran 90 took Fortran 77 from the dark ages by giving it new features that developers had wanted for many years and by deprecating old features – but this was only the start. Fortran 95 added new features, including High Performance Fortran (HPF), and improved its object-oriented capabilities. Fortran 2003 then extended the object-oriented features; improved C and Fortran integration, standardizing it, and making it portable; and added a new range of I/O capabilities. Features and capabilities still on the developers' wish lists led to Fortran 2008 and concurrent computing.

I still code in Fortran for many good reasons, not the least of which is performance and readability.

Origins

Fortran was originally developed as a high-level language so code developers didn't have to write in assembly. It was oriented toward "number crunching," because that was the predominant use of computers at the time (Facebook and YouTube weren't around yet). The language was fairly simple, and compilers were capable of producing highly optimized machine language. Fortran had data types that were appropriate for numerical computations, including a complex data type (not many languages have that), and easily dealt with multidimensional arrays, which are also important for number crunching. For all of these reasons, Fortran was the language of numerical computation for many years.

The advent of C caused a lot of new numerical code to be written in C or C++, and web and systems programming also moved to other languages, such as Java, Perl, and Python. However, Fortran is definitely not dead, and a lot of numerical code is still being written today using modern Fortran.

Fortran 77

Fortran began in about 1957 with the advent of the first compiler. It was used a great deal because it was much simpler than assembly language. In my experience, Fortran 77 (F77) brought forth the features that made the language easier to use to create many algorithms. As a result, a lot of older Fortran code was rewritten for F77.

In the past, Fortran was not case sensitive, so a lot of code was written in uppercase, creating a precedent followed for many years. You will find that Fortran coders of some lineage code in uppercase. However, in this article I mix upper- and lowercase in the examples.

In F77, a very useful way to share data across functions and subroutines was the use of common blocks, in which you define multiple "named" blocks that contain variables. (They could be of mixed type, including arrays.) A typical scenario was to put the common blocks in an include file and put each function or subroutine into its own file. Each file that needed access to the common block "included" the file of common blocks.

Another feature, or limitation, of Fortran is that all arrays have to be fixed in size at compile time (no dynamic memory), so if you declare an array x(100,100), you cannot change the dimensions or size after the code has been compiled. One trick was to define one very large vector and then "give" parts of it to various portions of the code. Although a little messy, you could simulate dynamic memory if the array was large enough.

Fortran also had fixed formatting. Some languages sort of have this today, such as Python, which requires indentation. In Fortran, the first column of each line could contain a C or c, indicating the beginning of a comment line, although you couldn't put comments just anywhere in the code. It could also be used for a numerical label. Column 6 was reserved for a continuation mark, so that lines that were longer than one line could be continued. The code began in column 7. Columns 1 to 5 could be used for statement labels (Listing 1; note that the first and last three lines are not code, but guides to show how code aligns in columns).

Listing 1: Sample Fortran 77 Code

         11111111112222222222
12345678901234567890123456789
-----------------------------
      SUM = 0.0
      D0 100 I=1,10
         SUM = SUM + REAL(I)
 100  CONTINUE
...
      Y = X1 + X2 + X3 +
     1    X4 + X5 + X6
-----------------------------
         11111111112222222222
12345678901234567890123456789

By default, F77 defines variables starting with (upper- or lowercase) i, j, k, l, m, and n as integers, so they are used as do loop counters. Variables that use any other letter of the alphabet are real unless the code writer defines them to be something else (they could even define them to be integers). You could turn off this implicit definition with IMPLICIT NONE just after the program, subroutine, or function name, which forces defining each and every variable (not necessarily a bad thing).

Fortran 90

The next Fortran standard, referred to as Fortran 90 (F90), was released in 1991 as an ISO standard, and in 1992 as an ANSI standard. Fortran 90 is a huge step up from F77, with a number of new features added and a number of older features deprecated. Fortran 90 was the first big step in creating modern Fortran.

The first new feature of importance was free-form source input. You were now allowed to put the code anywhere you wanted, and you could label statements, for example:

sum = 0.0 all: do i=1,10 sum = sum + real(i) enddo all

The next big feature in F90 is my personal favorite – allocatable arrays. In Listing 2, line 3 shows how to define an allocatable array. In this case, array a is a 2D array defined by (:,:). The allocation does not occur until line 7. Along with the array allocation is a "status" that returns a nonzero value if the allocation was unsuccessful or 0 if it was successful.

Listing 2: Allocatable Arrays

01   PROGRAM TEST1
02   IMPLICIT NONE
03   REAL, ALLOCATABLE :: a(:,:)
04   INTEGER :: n
05   INTEGER :: allocate_status
06   n=1000
07   ALLOCATE( a(n,n), STAT = allocate_status)
08   IF (allocate_status /= 0) STOP "Could not allocate array"
09 ! Do some computing
10   DEALLOCATE( a )
11   END PROGRAM TEST1

Allocatable arrays use heap memory and not stack memory, so you can use a lot more memory. For almost any large arrays, I always use allocatable arrays to make sure I have enough memory.

After performing some computations, you then deallocate the array, which returns the storage to the system. Line 9 is a comment line. Fortran 90 allows you to put a comment anywhere by prefacing it with an exclamation mark.

Another feature added to F90 was array programming, which allowed you to use array notation rather than do loops. For example, to multiply two, 2D arrays together, use c = matmul(a, b), where a, b, and c are compatible arrays. You could also use portions of an array with d(1:4) = a(1:4) + b(8:11).

Unlike CPU-specific code, typically with loop unrolling to achieve slightly better performance, Fortran coders used array notation because it was so simple to write and read. The code was very portable, because compiler writers created very high performance array intrinsic functions that adapted to various CPUs. Note that this included standard intrinsic mathematical functions such as square root, cosine, and sine.

Fortran 90 also introduced derived data types (custom data types). The simple example in Listing 3 shows how easy it is to create several derived data types (lines 8-15). The derived (customer data) type has variables (e.g., i, r, and r8), a fixed-size array (array_s, which is allocated on the stack), an allocatable array (array_h, which is allocated on the heap), and another derived type, meta_data. Using derived types within derived types allows you to create some complex and useful custom data types.

Listing 3: Derived Data Type

01 program struct_test
02 type other_struct
03    real :: var1
04    real :: var2
05    integer :: int1
06 end type other_struct
07
08 type my_struct ! Declaration of a Derived Type
09    integer :: i
10    real :: r
11    real*8 :: r8
12    real, dimension(100,100) :: array_s           ! Uses stack
13    real, dimension(:), allocatable :: array_h    ! Uses heap
14    type(other_struct), dimension(5) :: meta_data ! Structure
15 end type my_struct
16
17 !  Use derived type for variable "a"
18    type(my_struct) :: a
19 ! ...
20    write(*,*) "i is ",a%i
21
22 !  Structures (variables) of the the derived type my_struct
23    type(my_struct) :: data
24    type(my_struct), dimension(10) :: data_array
25
26 end program struct_test

To access a specific part or member of a derived type, you simply use a percent sign (%), as shown in line 20. You can also make an allocatable derived type (lines 23-24).

A convenient feature called modules, and generically referred to as "modular programming" [1], was added to F90. This feature allows you to group procedures and data together. Code can take advantage of a module, and you can control access to it through the use of simple commands (e.g., public, private, contains, use). For all intents and purposes, it replaces the old common blocks of F77.

Modules are extremely useful. They can contain all kinds of elements, such as parameters (named constants), variables, arrays, structures, derived types, functions, and subroutines. The simple example in Listing 4 defines the constant pi and then uses the module (line 7) to make the contents available to the program.

Listing 4: Modules

01   module circle_constant
02   real, parameter :: pi = 3.14159
03   end module circle_constant
04
05   program circle_comp
06 ! make the content of module available
07   use circle_constant
08   real :: r
09 !
10   r = 2.0
11   write(*,*) 'Area = ', pi * r**2
12   end program circle_comp

The example in Listing 5 uses contains within a module to access content outside of the module. You can go one step further and denote public components (the default) that can be used outside the module and private components that can only be used within the module, giving you a lot more control over what can be accessed by other parts of the code.

Listing 5: Accessing External Content

01   module circle_constant
02   real, parameter :: pi = 3.14159
03
04   type meta_data
05     character(len=10) :: color
06     real :: circumference
07     real :: diameter
08     real :: radius
09     real :: area
10   end type meta_data
11
12   contains
13     subroutine meta_comp(r, item)
14        type(meta_data) :: item
15        item%diameter = 2.0 * r
16        item%area = pi * r **2
17        item%circumference = 2.0 * pi * r
18     end subroutine
19
20   end module circle_constant
21
22   program circle_comp
23 ! make the content of module available
24   use circle_constant
25   real :: r
26   integer :: iloop
27   type(meta_data), dimension(10) :: circles
28 !
29   r = 2.0
30   circles(1)%radius = 4
31   circles(1)%color = "red"
32   call meta_comp(r, circles(1))  ! Call the module function
33
34   circles(2:10)%color = "blue"   ! array operation
35   r = 4.0
36   circles(2:10)%radius = r       ! array operation
37   do iloop=2,10
38      call meta_comp(r,circles(iloop))
39   end do
40
41   end program circle_comp

POINTER was a new type of variable in F90 that references data stored by another variable, called a TARGET. Pointers are typically used as an alternative to allocatable arrays or as a way to manipulate dynamic data structures, as in linked lists.

A pointer has to be defined as the same data type and rank as the target and has to be declared a pointer. The same is true for the target, which has to have the same data type as the pointer and be declared a target. In the case of array pointers, the rank, but not the shape (i.e., the bounds or extent of the array), has to be specified. The following are simple examples of a declaration

INTEGER, TARGET :: a(3), b(6), c(9)INTEGER, DIMENSION(:),POINTER :: pt2

and multidimensional arrays:

INTEGER, POINTER :: pt3(:,:)
INTEGER, TARGET :: b(:,:)

You cannot use the POINTER attribute with the following attributes: ALLOCATABLE, EXTERNAL, INTRINSIC, PARAMETER, INTENT, and TARGET. The TARGET attribute is not compatible with the attributes EXTERNAL, INTRINSIC, PARAMETER, and POINTER.

Using pointers is very simple, having only two operators (Listing 6). In some code, pointers are also used with blocks of dynamic memory (lines 13-17), for which you still have to use the allocate statement (function). In this example, PTR2 points to a single real value, and PTRA points to a block of dynamic memory for 1,000 real values. After you're done using the block of memory, you can deallocate it. In this regard, the pointers behave very much like allocatable arrays. One other common use for pointers is to divide arrays into smaller sections with the pointer pointing to that subsection (Listing 7).

Listing 6: Pointers

01   PROGRAM PTR_TEST1
02   INTEGER, POINTER :: PTR1
03   INTEGER, TARGET :: X=42, Y=114
04   INTEGER :: N
05   REAL, POINTER :: PTR2:wq:w
06   REAL, POINTER :: PTRA(:)
07 ! ...
08   PTR1 => X    ! PTR1 points to X
09   Y = PTR1     ! Y equals X
10   PTR1 => Y    ! PTR1 points to Y
11   PTR1 = 38    ! Y equals 38
12
13 ! Dynamic memory blocks
14     N = 1000
15     ALLOCATE( PTR2, PTRA(N) )
16 ! Do some computing
17     DEALLOCATE(PTR2, PTRA)
18
19   END PROGRAM PTR_TEST1

Listing 7: Pointers and Arrays

01   program array_test
02   implicit none
03   real, allocatable, target :: array(:,:)
04   real, pointer :: subarray(:,:)
05   real, pointer :: column(:)
06   integer :: n
07   integer :: allocate_status
08 !
09   n = 10
10   allocate( array(n, n), stat  = allocate_status )
11   if (allocate_status /= 0) stop "Could not allocate array"
12 !
13   subarray => array(3:7,3:7)
14   column => array(:,8)
15 !
16   nullify(subarray)
17   nullify(column)
18   deallocate(array)
19
20   end program array_test

A linked list [2], a sequence of "nodes" that contain information and point to the next node in the sequence, is a very useful data structure. Variations allow the node to point to the previous node. Although it's not difficult to write for specific cases, generic linked lists [3] are more difficult. One instructive example online creates a linked list manager [4].

Fortran 90 created a way of specifying precision to make the code more portable across systems. The generic term used is kind. Real variables can be 4-byte (real*4), 8-byte (real*8, usually referred to as double precision), or 16-byte (real*16). In this example, I assign various kinds of precision to variables.

real*8 :: x8  ! kind=8 or doule precision
real*4 :: x4  ! kind=4 or single precision
real(kind=4) :: y4   !
integer*4 :: i4 ! integer single precision (kind=4)
integer(kind=8) :: i8 ! Integer double precision

By using kind or the equivalent *<n> notation, you make your code much more portable across systems.

At a high level, in addition to the features mentioned, F90 also introduced in-line comments, operator overloading, unlabeled do loops (constructs such as do, if, case, where, etc. may have names.), and the case statement.

In F90, the language developers also decided to deprecate, or outright delete, some outdated F77 features [5], including the arithmetic IF, non-integer DO parameters for control variables, the PAUSE statement, and others. You can easily discover what features have been deprecated or deleted by taking some F77 code, switching the extension to .f90, and trying to compile it with a F90 compiler.

Fortran 90 was only the start. The next two iterations – Fortran 95 and 2003 – pulled Fortran into a new era of programming languages.

Fortran 95

Very quickly after F90 was released, the Fortran standards group started working on the next evolution: Fortran 95 (F95) [6]. The standard was officially published in 1997 and was oriented toward resolving some outstanding issues in F90. However, it did add a new extension for HPF [7]. Also, features that were noted as "obsolete" in F90 were removed in F95.

The HPF extension of F90 focused on parallel computing [8]. HPF uses a data parallel model to spread computations across multiple processors, allowing the use of multicore processors in a node that would otherwise be idle. This model works well for both single-instruction, multiple-data (SIMD) and multiple-instruction, multiple-data (MIMD) techniques.

A few HPF additions were the FORALL statement, compiler directives for array data distribution, a procedure interface for non-HPC parallel procedures, and additional library routines, including data scattering and gathering.

The FORALL construct allows writing array functions that look more like the original mathematical formula and makes it a bit easier to change the code for different ranges without worrying that the arrays are conformable. A quick example is the basic portion of the four-point stencil for the 2D Poisson equation problem [9] (Figure 1).

Figure 1: Four-point stencil formula [10]

Assume the domain ranges from 1 to n for both i and j. Therefore the iterations are over i = 2 to n-1 and j = 2 to n-1. You can write the iteration over the domain using array notation as follows:

a(2:n-1,2:n-1) = 0.25 *  (a(1:n-2,2:n) + a(3:n,2:n) + a(2:n,1:n-2) + a(2:n,3:n))

Using forall, the same can be written as:

forall (i=2:n-1, j=2:n-1) a(i,j) = 0.25*(a(i-1,j) + a(i+1,j) + a(i,j-1) + a(i,j+1))

Using forall makes the code a little easier to read. Moreover, it's very easy to change the domain without having to modify the formula itself.

Also introduced as part of HPF were compiler directives that deal with how data is distributed for processing. They are entered into the code as comments (e.g., !HPF$). Anything immediately after !HPF$ is considered an HPF directive [11]. The nice thing about HPF directives is that if the compiler is not HPF-aware, it views the directives as comments and just moves on. This is the same approach taken by OpenACC [12] and OpenMP [13].

The HPF extensions never seemed to be popular with users, and, in fact, not many compilers offered them. While the compilers were being written, a Message Passing Interface (MPI) standard for passing data between processors, even if they weren't on the same node, was becoming popular and provided a way to utilize other processors in the same system or processors on other systems.

Additionally, OpenMP was released, providing a threading model for applications. As with HPF, OpenMP allowed coders to insert directives that told the compiler how to break up the computations into threads and exchange data between threads. These directives were code comments, so compilers that were not OpenMP-capable just ignored them.

Some observed that these two developments led to the decline of HPF, although only a few F95 compilers implemented the extensions; for example, GFortran (GNU Fortran) [14] is not a true F95-compliant compiler because it does not implement HPF, although it implements almost all of the other F95 features.

Fortran 95 added some additional features outside of HPF [15], including FORALL and nested WHERE constructs to aid vectorization, user-defined PURE and ELEMENTAL procedures, and initialization of pointers to NULL(), among others.

In F95, some features or statements were deprecated (i.e., highly recommended not to be used before the next release of the standard) [15], including banning REAL and DOUBLE PRECISION index variables in DO statements and the removal of PAUSE, ASSIGN, the controversial assigned GOTO statement, and the H descriptor.

Fortran 2003

Both F90 and F95 introduced features that allowed object-oriented programming (OOP). Fortran 2003, the next standard, added even more features to enhance the object-oriented capability.

Before Fortran 2003, you could define the basics of classes in F90 [16]. Private and public functions could be used as needed, including constructors and destructors. Several resources are available online about OOP in F90 [17] and F95 [18]. One article even compares F90 and C++ [19], so you can understand the similarities between the two.

In Fortran 2003, the ability to create type-bound procedures [20] was introduced, which makes calling functions within a class much easier. Fortran 2003 also introduced type extensions and inheritance (key aspects of OOP), as well as polymorphism and dynamic type allocation. This collection of additions really completed support for abstract data types, making OOP much easier.

To round out Fortran 2003 [21], derived type enhancements (parameterized derived types, improved control of accessibility, improved structure constructors, and finalizers) and procedure pointers were added.

Classic object-oriented languages such as C++ or Python allow you to define classes that include data and methods (functions) that operate on that data. You can instantiate (create an instance) of the class that has its own data and methods or several instances of the class that each have their own unique data and methods.

In Fortran, modules are roughly equivalent to classes. Modules can contain data and methods (functions and subroutines), but it really doesn't support the concept of separate instances of a module. To get around this, you create a module to contain the methods for the specific class with a derived type that contains the data. From this combination, you can create separate "instances" of the type that pass data to the methods in the modules.

I'll look at a quick F90 example [16] for creating and using classes by creating a module that has two public functions: One computes the area of a circle given the radius, and the other prints out the area of the circle. In the main program, I can create a derived type using the module and then use methods in the module using the derived type (Listing 8).

Listing 8: Classes and Modules

01  MODULE CLASS_CIRCLE
02     IMPLICIT NONE
03     PRIVATE
04     PUBLIC :: CIRCLE, CIRCLE_AREA, CIRCLE_PRINT
05
06     REAL :: PI = 3.1415926535897931d0 ! Class-wide private constant
07
08     TYPE CIRCLE
09         REAL :: RADIUS
10     END TYPE CIRCLE
11
12 CONTAINS
13
14     FUNCTION CIRCLE_AREA(THIS) RESULT(AREA)
15         TYPE(CIRCLE), INTENT(IN) :: THIS
16         REAL :: AREA
17         AREA = PI * THIS%RADIUS**2.0
18     END FUNCTION CIRCLE_AREA
19
20     SUBROUTINE CIRCLE_PRINT(THIS)
21         TYPE(CIRCLE), INTENT(IN) :: THIS
22         REAL :: AREA
23         AREA = CIRCLE_AREA(THIS)   ! Call the circle_area function
24         PRINT *, 'Circle: r = ', this%radius, ' area = ', AREA
25     END SUBROUTINE CIRCLE_PRINT
26 END MODULE CLASS_CIRCLE
27
28 PROGRAM CIRCLE_TEST
29     USE CLASS_CIRCLE
30     IMPLICIT NONE
31
32     TYPE(CIRCLE) :: C    ! Declare a variable of type Circle.
33     C = CIRCLE(1.5)      ! Use the implicit constructor, radius = 1.5
34     CALL CIRCLE_PRINT(C) ! Call a class subroutine
35
36 END PROGRAM CIRCLE_TEST

Notice (lines 32-24) that the derived type is used to create the variable C; then, I can "store" the radius in C by calling the "constructor" of the class (all it does in this case is store the value), and the program calls the "method" to print out the value by passing the derived type instance C.

In Fortran 2003, you can take advantage of some new features to make the same code look more object oriented. In the example in Listing 9, type-bound procedures can be used within the module.

Listing 9: Type-Bound Procedures

01 MODULE CLASS_CIRCLE
02     IMPLICIT NONE
03     PRIVATE
04     PUBLIC :: CIRCLE, CIRCLE_AREA, CIRCLE_PRINT
05
06     REAL :: PI = 3.1415926535897931d0 ! Class-wide private constant
07
08     TYPE, PUBLIC :: CIRCLE
09         REAL :: RADIUS
10         CONTAINS
11             PROCEDURE :: AREA => CIRCLE_AREA
12             PROCEDURE :: PRINT => CIRCLE_PRINT
13     END TYPE CIRCLE
14
15 CONTAINS
16
17     FUNCTION CIRCLE_AREA(THIS) RESULT(AREA)
18         CLASS(CIRCLE), INTENT(IN) :: THIS
19         REAL :: AREA
20         AREA = PI * THIS%RADIUS**2.0
21     END FUNCTION CIRCLE_AREA
22
23     SUBROUTINE CIRCLE_PRINT(THIS)
24         CLASS(CIRCLE), INTENT(IN) :: THIS
25         REAL :: AREA
26         AREA = THIS%AREA()    ! Call the type_bound function
27         PRINT *, 'Circle: r = ', this%radius, ' area = ', AREA
28     END SUBROUTINE CIRCLE_PRINT
29 END MODULE CLASS_CIRCLE
30
31 PROGRAM CIRCLE_TEST
32     USE CLASS_CIRCLE
33     IMPLICIT NONE
34
35     TYPE(CIRCLE) :: C ! Declare a variable of type Circle.
36     C = CIRCLE(1.5)   ! Use the implicit constructor, radius = 1.5
37     CALL C%PRINT      ! Call a type-bound subroutine
38
39 END PROGRAM CIRCLE_TEST

In the module definition, I type-bound the procedures CIRCLE_AREA and CIRCLE_PRINT to the module (lines 11-12). In the main program, once I instantiate the derived type C = CIRCLE(1.5) (line 36), I can call methods (functions) as I would a derived type (CALL C%PRINT) (line 37). This looks a bit more like OOP.

In addition to object-oriented characteristics, Fortran 2003 introduced new features to enhance its ability to interact with C and C++ libraries or code. If you have tried to integrate C and Fortran in the past, you understand the fundamental differences between the two. For one, multidimensional arrays in C/C++ are in the opposite order in Fortran. Additionally, all arguments in Fortran are passed by reference and not by value, which means C must pass Fortran arguments as pointers. Finally, by default, C arrays start with 0 (zero) and Fortran arrays start with 1. Modern Fortran allows you to define arrays starting with 0 (zero) – or any integer, including negative numbers.

Integrating Fortran and C has been notoriously difficult, although possible, and linkers will blissfully create a binary for you. Moreover, it has never been easy to create portable C/Fortran integration. Sometimes, you have to test the compilers to determine exactly how they integrate.

Fortran 2003 added a standardized way to generate procedures, derived type declaration, and global variables that are interoperable with C. Because of standardization, Fortran 2003-compliant compilers make portability of C and Fortran integration much better.

The basic model for Fortran/C integration comprises three basic parts: (1) Define Fortran "kinds" that map to C types. Remember that in F90, data types started defining the precision of variables via the kind definition. (2) Use an intrinsic module to define names: USE, INTRINSIC :: ISO_C_BINDING. (3) Use a BIND(C) attribute to specify C linkage. Discussing the details is beyond the scope of this article, but look around the web if C and Fortran integration is important.

Another set of big changes in Fortran 2003 has to do with I/O. Before Fortran 2003, all I/O was record based, which worked fine for data that had been produced by other Fortran code or for reading ASCII files. However, it doesn't work well when reading data from a file that was created by an instrument or a spreadsheet or a database. Fortran I/O had to become much more general, including the ability to do "stream" I/O. With Fortran 2003, you could now do both formatted and unformatted stream I/O.

Using stream file access has several benefits. The first is that the file will likely be smaller because there are no record markers. However, this depends on how you were performing I/O before you started using streams. Second, streams allow you to read data from files you normally would not be able to read, such as databases. Third, new I/O functions allow you to move to different "positions" within the file to perform I/O.

Unformatted stream writes have no record markers. Subsequent writes just append data to the file [22]. For example, consider two WRITE() statements that write first and then second. The first write adds 5 bytes to the file, and the second write adds 6 bytes to the file. Therefore, the next write should start at byte 12. To get the current position in the file, use INQUIRE(UNIT=<fn>, POS=mypos).

If the code did not use streams, then the writes would add a record marker after each write. Therefore, the position would be something greater than 12 bytes. If you compile the code with a compiler that supports streams, the resulting position will be 12.

Using streams, you can specify a position within a file for a READ() or WRITE() function. For example, READ(11, POS=12) string reads a string at position 12 (i.e., 12 bytes into a file). The start of a file is position 0.

A couple of quick pieces of advice when using stream output: If you use it to create an output file, older Fortran code will not be able to read it (no record markers). Only code built with compilers that implement streams will be able to read the file. Conversely, if you are reading a file written by an older version of Fortran, you can't read it using streams – you have to read it as classic Fortran with record markers.

Other features added in Fortran 2003 [21] include (1) data manipulation enhancements (allocatable components, deferred type parameters, VOLATILE attribute, explicit type specification in array constructors and allocate statements, pointer enhancements, extended initialization expressions, and enhanced intrinsic procedures). (2) I/O enhancements: asynchronous transfer, stream access (previously discussed), user-specified transfer operations for derived types, user-specified control of rounding during format conversions, named constants for preconnected units, the FLUSH statement, regularization of keywords, and access to error messages. (3) Procedure pointers. (4) Support for IEEE floating-point arithmetic and floating-point exception handling. (5) Support for international usage: access to ISO 10646 4-byte characters and choice of decimal or comma in numeric-formatted I/O. (6) Enhanced integration with the host operating system: access to command-line arguments, environment variables, and processor error messages.

The Fortran developers were not done, however. Fortran 2008 incorporates some new features to improve performance, and Fortran 2015, which, although still under development, shows some serious promise.

Fortran 2008

Fortran 2003 is to Fortran 2008 much as F90 is to F95. The revision added some corrections and clarifications to Fortran 2003 while introducing new capabilities. Probably the biggest added feature was the idea of concurrent computing [23] via Coarray Fortran [24]. Concurrent computing executes several computations during overlapping periods of time, allowing users to run sections of code at the same time. In contrast, sequential computations occur one after the other and have to wait for the previous computations to finish.

As a point of a clarification, concurrent computing is related to, but different from, parallel programming, which is so prevalent in HPC. In parallel computing, various processes run at the same time. On the other hand, concurrent computing has processes that overlap in terms of duration, but their execution doesn't have to happen at the same instant. A good way to explain the differences between serial, sequential, concurrent, and parallel is shown in Table 1 [23], which assumes you have two tasks, T1 and T2.

Tabelle 1: Serial, Sequential, Concurrent

Order of Execution	Model
T1 executes and finishes before T2	Serial and sequential
T2 executes and finishes before T1	Serial and sequential
T1 and T2 execute alternately	Serial and concurrent
T1 and T2 execute simultaneously	Parallel and concurrent

"Simultaneous" means executed in the same physical instant, and "sequential" is an antonym of "concurrent" and "parallel." In this case, sequential typically means something is performed or used serially in a particular order.

Coarray Fortran (CAF) is a set of extensions for Fortran 95/2003 that was developed outside of the Fortran standard so that experimentation could take place quickly. The extensions were adopted in Fortran 2008, with the syntax varying a little bit relative to the original CAF definition. CAF is an example of a Partitioned Global Address Space (PGAS) [25], which assumes a global address space that is logically partitioned. Each process has a portion of the address space, which can have an affinity for a particular process.

CAF uses a parallel execution model, so code performance can be improved. A draft of the Fortran 2008 standard [26] stated that "A Fortran program containing coarrays is interpreted as if it were replicated a fixed number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is called an image. The array syntax of Fortran is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of access to data on other images."

Data references without square brackets are local data (local to the image). If the square brackets are included, then the data might need to be communicated between images. CAF uses a single-program, multiple data (SPMD) model.

When a coarray-enabled application starts, a copy of the application is executed on each processor. However, the images (each copy of the application) are executed asynchronously. The images are distinguished from one another by an index between 1 and n, the number of images. Notice it starts with 1 and not 0, which is perhaps influenced by the Fortran roots.

In the application code, you define a coarray with a trailing []. The coarray then exists in each image, with the same name and having the same size. They can be scalar, array, static, or dynamic and of intrinsic or derived type. Synchronization statements can be used to maintain program correctness. Table 2 shows examples of the ways you can define a coarray. Notice that scalars can be coarrays, fixed-size arrays, allocatable arrays, and derived types, and you can have coarrays with different coranks. However, you cannot have a coarray that is a constant or a pointer.

Tabelle 2: Defining Coarrays

Coarray Type	Definition
Scalar coarray	`integer :: x[*]`
Array coarray	`real, dimension(n) :: a[]real, dimension(n), codimension[] :: a`
Scalar coarray with corank 3	`integer :: cx[10,10,*]`
Array coarray with corank 3 and different cobounds	`real :: c(m,n) :: [0:10,10,*]`
Allocatable coarray	`real, allocatable :: mat(:,:)[:]` or `allocate(mat(m,n)[*])`
Derived type scalar coarray	`type(mytype) :: xc[*]`

What is a "corank"? Variables in Fortran code can have rank, bounds, extent, size, and shape, which are defined inside the parentheses of an array declaration, as specified from F90 onward. For coarrays, the corank, cobounds, and coextents are given by data in square brackets. The cosize of a coarray is always equal to the number of images specified when the application is executed. A coarray has a final coextent, a final upper cobound, and a coshape that depends on the number of images.

A simple example declaration is integer :: a(4)[*]. Here, each image ("process") is an integer array of 4. You can assemble a 2D (rank = 2) coarray data structure from these 1D (rank = 1) arrays. If using four images with the previous declaration, the coarray looks like the illustration in Figure 2. The arrays are stacked on top of each other because of the 1D coarray specification [*].

Two-dimensional array from 1D arrays: integer :: a(4)*. — Figure 2: Two-dimensional array from 1D arrays: integer :: a(4)`*`.

If you specify the coarray in a different manner, but still use four images, a 2D coarray can be declared as integer :: a(4)[2,*]. Each image is again a 1D integer with length (extent) 4. However, the coarray specifies that the 2D array is assembled differently (as in Figure 3). Remember that each image has it's own array that is the same as the others but can be combined using coarrays.

Two-dimensional array from 1D arrays: integer :: a(4)2,*. — Figure 3: Two-dimensional array from 1D arrays: integer :: a(4)`2,*`.

The use of coarrays can be thought of as opposite the way distributed arrays are used in MPI. With MPI applications, each rank or process has a local array; then, the process needs to be mapped from the local array to the global array so that local data can be mapped to the larger global array. The starting point is with global arrays and then to local arrays.

Coarrays are the opposite. You can take local arrays and combine them into a global array using a coarray. You can access the local array (local to the image) using the "usual" array notation. You can also access data on another image almost in the same way, but you have to use the coarray notation.

In another simple example, each image has a 2D array that is combined into a coarray to create a larger 2D array. Again, assume four images with the coarray declaration integer :: a(4,4)[2,*]. Each image has a 2D integer array a. With the coarray definition given by the square bracket notation, the "local" arrays are combined into a coarray (globally accessible) that looks like Figure 4.

Two-dimensional array from 2D arrays: integer :: a(4,4)2,*. — Figure 4: Two-dimensional array from 2D arrays: integer :: a(4,4)`2,*`.

If the array is local to the image, you access the array as you normally would. For example, for image 3 to access element (2,2) from the array, the statement would be something like b = a(2,2). You can always use coarray notation if needed, but in this case, you know the data is local, so you can access it using local notation. If image 1, 2, or 4 wanted to access that element (global access), then the statement would have to be b = a(2,2)[1,2].

You still have to pay attention to what image holds what data, but writing the statements to access the data is fairly easy. The key is the coarray subscripts. The declaration integer, dimension(10,4) :: a is a simple local variable with rank 2 (i.e., two indices) and has lower bounds 1 and 1. The upper bounds are 10 and 4. The shape is [10,4].

The declaration integer :: a(10,4)[3,*] would convert the array to a coarray, which adds a corank of 2 (two indices). The lower cobounds are 1 and 1. The upper cobounds are 3 and m, and the coshape is 3,m, where m = ceiling(num_images()/3).

Using coarrays, you no longer have to use MPI_SEND() and MPI_RECV() or their non-blocking equivalents for sending data from one process to another. You just access the remote data using the coarray syntax and let the underlying coarray library do the work, making multiprocess coding easier.

CAF for Fortran 2008 can be implemented in different ways. A common way is to implement it using MPI, as in GFortran [27]. Except for a small number of functions, GFortran provides coarray functionality starting in Fortran 2008 v5.1 using the coarray capability in OpenCoarrays [28]. The compiler translates the coarray syntax into library calls that then use MPI functions underneath.

OpenCoarrays

I decided to build and test the latest GFortran and OpenCoarrays on CentOS 7.2, which comes with an older version of GFortran, so my first step was to install the latest and greatest version, GCC 6.2.0. Believe it or not, building and installing a new GCC is not as difficult as it would seem. If you install as many dependencies as possible using the packages of the distribution, it's much easier. In general, I follow the directions in the GCC wiki [29] and GNU GCC installation page [30].

I installed GCC 6.2.0 into my home directory, /home/laytonjb/bin/gcc-6.2.0, and then modified the environment variables $PATH and $LD_LIBRARY_PATH so the new compilers were used instead of the older ones. The command used to build and install GCC is:

$PWD/../gcc-6.2.0/configure --prefix=$HOME/gcc-6.2.0 --enable-languages=c,c++,fortran,go --disable-multilib

The next step was to build and install an MPI library using the GCC 6.2.0 compilers. The OpenCoarray website recommended MPICH first, so I built and installed MPICH-3.2 in /home/laytonjb/bin/mpich-3.2. Again, the environment variables $PATH and $LD_LIBRARY_PATH were modified in the .bashrc file to point to MPICH binaries and libraries.

After installing the MPI library, the next step was to build and install OpenCoarray (OCA). The latest stable version as of the writing of this article was 1.7.2:

./install.sh -i /home/laytonjb/bin/opencoarray-1.7.2

The OCA build didn't take too long and was much shorter than building GCC. At this point, I'm ready to start compiling and executing some CAF code!

The first example is very simple (Listing 10). This proverbial "hello world" program for Fortran coarrays was taken from the Wikipedia page for coarrays [24].

Listing 10: Hello World with Coarrays

01 program Hello_World
02   implicit none
03   character(len=20) :: name[*] ! scalar coarray, one "name" for each image.
04   ! Note: "name" is the local variable while "name[]" accesses the
05   ! variable in a specific image; "name[this_image()]" is the same as "name".
06
07   ! Interact with the user on Image 1; execution for all others pass by.
08   if (this_image() == 1) then
09      write(*,'(a)',advance='no') 'Enter your name: '
10      read(*,'(a)') name
11   end if
12   ! Distribute information to all images
13   call co_broadcast(name,source_image=1)
14
15   ! I/O from all images, executing in any order, but each record written is intact.
16   write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
17 end program Hello_world

Using GCC, particularly from GFortran, you have two ways to build and execute this code. The first way is to use the compile and run scripts provided by OCA. If the name of the source file is hello.f90, then you should compile and execute with:

$ caf hello.f90 -o hello
$ cafrun -np 4 ./hello
Enter your name: Jeff
Hello Jeff from image 1
Hello Jeff from image 4
Hello Jeff from image 3
Hello Jeff from image 2

The first line compiles the code and the second line executes it. The -np 4 in the cafrun command tells the application to use four images. The second way of building and executing coarray code is to build it with the mpif90 script and execute it using the mpirun script,

mpif90 -fcoarray=lib yourcoarray.f90 -L/path/to/libcaf_mpi.a-lcaf_mpi -o yourcoarray
$ mpirun -np <n> ./yourcoarray

where <n> is the number of images. The -fcoarray=lib compilation command adds in the all-important libraries.

Other, more complicated coarray Fortran code examples [31] are floating around the web, although not as many as for F90 or Fortran 2003. Because Fortran 2008 is so new, it looks like coarray examples haven't quite caught up yet.

A great place to find out what aspects of Fortran 2008 your compiler supports and does not support is the Fortran 2008 wiki [32]. In addition to CAF, Fortran 2008 has implemented submodules (additional structuring facilities for modules; supersedes ISO/IEC TR 19767:2005), Coarray Fortran, the DO CONCURRENT construct for loop iterations with no interdependencies, the CONTIGUOUS attribute to specify storage layout restrictions, the BLOCK construct (which can contain declarations of objects with construct scope), and recursive allocatable components as an alternative to recursive pointers in derived types, among other features.

Fortran 2015

The next evolution of Fortran, even though many compilers have yet to implement all of Fortran 2008, is Fortran 2015. The standard has been under discussion for several years and is still undergoing work, with the goal for a 2018 release [33].

In general, Fortran 2015 is intended to include minor revisions, some clarifications, and corrections to inconsistencies, as well as some new features. In general, Fortran 2015 has two "thrusts." The first is around improved C and Fortran integration, and the second is around enhancing coarrays.

Many of the Fortran/C interoperability standards for Fortran 2015 were aimed to help MPI 3.0. An article from "Dr. Fortran" [34] (see the "Dr. Fortran's Retirement" box) contains a discussion of the proposed Fortran/C changes and even has a couple of examples.

A second feature is new additions to coarrays. The first addition is called "teams," which are collections of images in coarray applications that work together on a particular task. An application can have several teams working on separate tasks that then communicate their results to their parent.

From teams, you can also create subteams that can be dissolved and reformed dynamically. With subteams, if one image fails, you can capture the problem, recreate a new team, and proceed with your computations. Contrast this with MPI: If a rank fails, the entire application hangs or crashes.

The definition of a "failed image" reveals a problem of much discussion (or argument). For example, an image really might not be failed but just very slow for some reason. The complication is how to define "slow" (or stalled) and how to have Fortran detect and recover from these (slow or stalled) images. Initially, the Fortran team decided to add some synchronization constructs to detect that an image has failed. Although the final determination is not yet settled, the discussion is proceeding in the correct direction.

The second part of the coarray additions is called "events." This addition is very much as it sounds; that is, events allow an image to notify another image that it has completed a task and that it can proceed.

The third part of the coarray additions is called "atomics." Created in Fortran 2008, the concept of "atomic" objects allow for invisible operations. However, they have had limited support. In Fortran 2015, the following atomic operations were added: ADD, AND, CAS (compare and swap), OR, and XOR.

The fourth and last part of the coarray additions is called "collectives," which are intrinsic procedures for performing operations across all images of the current team. The new routines that have been defined (but are subject to change) are: CO_MAX, CO_MIN, CO_SUM, CO_BROADCAST, and CO_REDUCE.

As with previous versions of Fortran, some features are targeted for deprecation in Fortran 2015, including labeled DO loops, EQUIVALENCE, COMMON, and BLOCK DATA.

The features that are finally deleted from Fortran are the arithmetic IF and the non-block DO construct, which doesn't end in a CONTINUE or END DO.

Summary

Fortran 90 catapulted Fortran from a perceived "old" language to a modern language on equal footing with any other. It retained Fortran's history of simplicity and performance, but it added features that developers had been wanting, such as free-form input, dynamic arrays, pointers, derived types, modular programming (modules), and portable precision definitions (kind).

Furthermore, a number of older features were deprecated from the Fortran standard, a refreshing process that I believe allows the language to stay simple, with a focus on numerical performance. Although taking an older Fortran 77 program and updating it to Fortran 90 involved a bit of work, the result was definitely worth the effort, and you could even find tools, including some online tools, for autoconverting code from Fortran 77 to Fortran 90.

Fortran 95 and 2003 really pushed Fortran into the modern era by dropping old and unused features and adding newer, modern, and needed features. Fortran could now handle object-oriented programming and had much better and standardized integration with C. It also incorporated some features from HPF that improved code readability. Fortran was still oriented toward numerical computations, but now it was much easier to express programming concepts and write portable code.

The 2008 revision added concurrent computing via Coarray Fortran, and Fortran 2015, although still under development, hovers just within sight, with a projected release date of 2018.