FortranFormat

Description

FortranFormat is an open source Java library for parsing input and formatting output using the Fortran Format specification. FortranFormat is written by Kevin Theisen.

Utility

Input and output processes are some of the most common and important functions performed in a program. However, writing a parsing program each time new data needs to be analyzed can be a tedious experience. Having a wide assortment of libraries available significantly reduces the amount of work necessary.

Why did you write FortranFormat?
We needed Fortran Formatting to read and write various chemical file types, namely Standard Molecular Data (SMD). Most other programming languages have libraries for parsing Fortran Format, but in all of our searching, we could not find a credible library in Java, so we wrote it ourselves. The FortranFormat library is now used to parse and write all fixed-length file types in ChemDoodle.

Why use Fortran Format in Java?
Fortran Format may be quite old, but it is still commonly used in many places, especially in academic settings where Fortran is still widely used. If you want to upgrade your system to use Java for any reason, you can use this library to ease the transition. Also, many file formats are easily parsed and written using Fortran Format, so it is a welcome addition to any Java parser’s library.

Why not use a protocol such as XML rather than Fortran Format?
The FortranFormat library does not compete with parsers of other protocols like XML; it is just another library to be used when the need arises. Many file formats were created years before XML and other markup languages were developed and therefore require different parsing libraries to manipulate. FortranFormat aids in that aspect, especially in cases where output and input are text based and restricted to specific column widths (older formats are usually restricted to 80 columns to fit correctly in a console window).

Ease of Use

FortranFormat is compatible with Java 1.5 and later and adheres to the standards of Java 1.5. The entire library was written by a single author, Kevin Theisen, so the code is consistent in style and easy to read and follow. It is documented with standard Javadoc comments and is packaged with a JUnit test suite.

Usage

FortranFormat is typically used in two ways: through its static methods or as an object.

You may call FortranFormat using the static read and write methods that should be familiar to those using Fortran.

ArrayList<Object> FortranFormat.read(String input, String specificationString);

The static read method allows you to quickly obtain an ArrayList of Objects (of the correct class, as described below), given an input string and a format specification string.

String FortranFormat.write(ArrayList<Object> input, String specificationString);

The static write method allows you to quickly obtain a formatted string, given an input ArrayList of Objects (of the correct class, as described below) and a format specification string.
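To make concrete what a specification string describes, here is a minimal plain-Java sketch that does not call FortranFormat at all. It hand-codes one fixed-column layout, I5 followed by F8.3 and A4, which is a hypothetical example chosen for illustration. The class and method names are invented, and the library's own handling of padding and alignment may differ from what is shown.

```java
import java.util.ArrayList;
import java.util.Locale;

// Hypothetical helper, not part of FortranFormat: one hard-coded layout.
class FixedWidthSketch {
    // Reads a record laid out as I5, F8.3, A4:
    // columns 1-5 hold an integer, 6-13 a real, 14-17 characters.
    static ArrayList<Object> readRecord(String line) {
        ArrayList<Object> values = new ArrayList<Object>();
        values.add(Integer.valueOf(line.substring(0, 5).trim()));
        values.add(Double.valueOf(line.substring(5, 13).trim()));
        values.add(line.substring(13, 17)); // kept with its padding
        return values;
    }

    // Writes the same layout, right-aligning every field in its columns.
    static String writeRecord(int i, double d, String s) {
        return String.format(Locale.ROOT, "%5d%8.3f%4s", i, d, s);
    }
}
```

A library such as FortranFormat replaces this kind of hand-written column arithmetic: the layout lives in the specification string instead of in substring indices, so changing the file format does not mean rewriting the parser.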

You may also use FortranFormat as an object for more extended use of a single format specification string.

You may use a FortranFormat Object to read and parse a large number of lines from a given file.

...
FortranFormat formatter = new FortranFormat(specificationString);
while ((line = br.readLine()) != null) {
    ArrayList<Object> inputObjects = formatter.parse(line);
    // use the objects
}
...

You may also use a FortranFormat Object to format a large number of data structures to an output file. An added benefit of this procedure is that the same FortranFormat Object used for parsing may also be used for formatting.

...
FortranFormat formatter = new FortranFormat(specificationString);
for (DataStructure ds : dataStructures) {
    ArrayList<Object> outputObjects = new ArrayList<Object>();
    // place in objects to be formatted
    String formatted = formatter.format(outputObjects);
    filewriter.write(formatted);
}
...

For much more complex examples, read this story about using FortranFormat to read and write chemical filetypes.

Options

There is an Options object associated with every instance of the FortranFormat class. The following options are available.

setAddReturn(boolean b): Adds a ‘\n’ character to the end of all formatted output.
setLeftAlignCharacters(boolean b): Left aligns strings when formatting content for the A edit descriptor.
setPositioningChar(char c): Changes the character used for spacing by the X edit descriptor.
setReturnFloats(boolean b): Returns Float objects instead of Double objects after parsing input.
setReturnZerosForBlanks(boolean b): If a blank is detected where a number is expected during parsing, returns a zero of the appropriate class instead of a null.
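The effect of two of these options can be illustrated with a short plain-Java sketch. It does not use the library or its Options class; the class and method names below are hypothetical, and the code only mimics the behaviour described for setReturnZerosForBlanks and setLeftAlignCharacters. The library's actual implementation may differ, for example in how it treats strings longer than the field width.

```java
// Hypothetical illustration, not part of FortranFormat.
class OptionsBehaviourSketch {
    // Mimics the described effect of setReturnZerosForBlanks on an I field:
    // a blank field becomes null by default, or zero when the flag is set.
    static Integer parseIntField(String field, boolean returnZerosForBlanks) {
        String trimmed = field.trim();
        if (trimmed.length() == 0) {
            return returnZerosForBlanks ? Integer.valueOf(0) : null;
        }
        return Integer.valueOf(trimmed);
    }

    // Mimics the described effect of setLeftAlignCharacters on an A field:
    // the string is padded on the right (left aligned) or on the left.
    // Strings longer than the width are not truncated in this sketch.
    static String formatCharField(String s, int width, boolean leftAlign) {
        String pattern = leftAlign ? "%-" + width + "s" : "%" + width + "s";
        return String.format(pattern, s);
    }
}
```

Code that consumes parsed values should be prepared for null entries unless zeros for blanks have been requested.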

Runtime Exceptions

FortranFormat throws exceptions rather than handling them silently, so you must account for them in your code. Exceptions usually occur when parentheses are not used correctly, or when data cannot be parsed or cast according to the Java class relationships described below. Parse exceptions are used when further information about the position of the error during parsing is available.

Extent and Limitations

The format specification string is parsed forgivingly. FortranFormat supports grouping and multiplication (repeat counts) and does not require commas, although you may want to include commas to resolve any ambiguities. FortranFormat supports the Fortran Format edit descriptors listed below.
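Grouping and multiplication mean that a repeat count in front of a descriptor or a parenthesized group stands for that many copies of it. The following plain-Java sketch shows one way such an expansion could work. It is not the library's code: the class and method names are hypothetical, and unlike FortranFormat this simplified version requires commas between items.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of FortranFormat.
class RepeatExpansionSketch {
    // Expands repeat counts and parenthesized groups, so that
    // "2(I3,F5.2),A4" becomes [I3, F5.2, I3, F5.2, A4].
    static List<String> expand(String spec) {
        List<String> out = new ArrayList<String>();
        for (String item : splitTopLevel(spec)) {
            int i = 0;
            while (i < item.length() && Character.isDigit(item.charAt(i))) i++;
            int count = (i == 0) ? 1 : Integer.parseInt(item.substring(0, i));
            String body = item.substring(i);
            if (body.equalsIgnoreCase("X")) { // in nX the number is a width
                out.add(item);
                continue;
            }
            List<String> unit = new ArrayList<String>();
            if (body.startsWith("(") && body.endsWith(")")) {
                unit.addAll(expand(body.substring(1, body.length() - 1)));
            } else {
                unit.add(body);
            }
            for (int n = 0; n < count; n++) out.addAll(unit);
        }
        return out;
    }

    // Splits on commas that are not nested inside parentheses.
    static List<String> splitTopLevel(String spec) {
        List<String> items = new ArrayList<String>();
        int depth = 0, start = 0;
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) {
                items.add(spec.substring(start, i).trim());
                start = i + 1;
            }
        }
        items.add(spec.substring(start).trim());
        return items;
    }
}
```

The expanded list has one entry per field in the record, which is also the order in which values appear in the ArrayList<Object> once non-repeatable descriptors such as X are set aside.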

For clarity, the ArrayList<Object> that is returned after parsing a string will contain Objects whose classes correspond to the class listed in the Java Object Created column. You should then cast the Objects appropriately when using them. When formatting an ArrayList<Object> using FortranFormat, you should add Objects whose classes correspond to the class listed in the Java Object Formatted column. The Objects that appear in the ArrayList<Object> are (and should be) in the same order as the repeatable edit descriptors they correspond to. Non-repeatable edit descriptors are not represented (and nothing should be included for them) in the ArrayList<Object>.
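The casting step can be sketched as follows. The list here is built by hand to stand in for what parsing a record with I, F, L and A descriptors (in that order) would be expected to return; the class name, method names and sample values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;

// Hypothetical illustration, not part of FortranFormat.
class CastingSketch {
    // Stand-in for a parsed record: Integer, Double, Boolean, String,
    // in the same order as the repeatable edit descriptors.
    static ArrayList<Object> sampleRecord() {
        ArrayList<Object> parsed = new ArrayList<Object>();
        parsed.add(Integer.valueOf(6));
        parsed.add(Double.valueOf(12.011));
        parsed.add(Boolean.TRUE);
        parsed.add("C ");
        return parsed;
    }

    // Casts each entry to the class matching its edit descriptor.
    static String describe(ArrayList<Object> parsed) {
        Integer id = (Integer) parsed.get(0);         // I descriptor
        Double mass = (Double) parsed.get(1);         // F descriptor
        Boolean aromatic = (Boolean) parsed.get(2);   // L descriptor
        String symbol = (String) parsed.get(3);       // A descriptor
        return symbol.trim() + "#" + id + " mass=" + mass + " aromatic=" + aromatic;
    }
}
```

A cast to the wrong class fails with a ClassCastException at runtime, so the order of the casts has to track the order of the descriptors in the specification string.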

Fortran Format Edit Descriptors

Repeatable Edit Descriptors

Edit Descriptor   Descriptor Code  Read Supported  Write Supported  Java Object Created  Java Object Formatted
Integer           I                Yes             Yes              Integer              Integer
Real Decimal      F                Yes             Yes              Double               Double or Float
Real Exponent     E                Yes             Yes              Double               Double or Float
Real Scientific   ES               Yes             Yes              Double               Double or Float
Real Engineering  EN               Yes             Yes              Double               Double or Float
Logical           L                Yes             Yes              Boolean              Boolean
Characters        A                Yes             Yes              String               String

Non-repeatable Edit Descriptors

Edit Descriptor          Descriptor Code  Read Supported  Write Supported
Horizontal Progression   X                Yes             Yes
Horizontal Tab           T, TL, TR        No              Yes
Vertical Progression     /                Yes             Yes
Format Scanning Control  :                Yes             Yes
Sign Control             S, SP, SS        No              No
Blank Control            BN, BZ           No              No

Download

FortranFormat v1.1 Source, Binaries and Documentation
MD5 checksum: 4978e0ae373fbf6dc1499713b92f2b09
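
To check a downloaded archive against the checksum above, the MD5 digest can be computed with the standard java.security.MessageDigest class. The sketch below is one way to do it; the class and method names are hypothetical, and the file-reading helper relies on java.nio.file, which needs Java 7 or later even though the library itself targets Java 1.5.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of FortranFormat.
class Md5Sketch {
    // Returns the MD5 digest of the given bytes as lowercase hexadecimal.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Compares a file's digest with the published checksum string.
    static boolean matches(String path, String expectedHex) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(path));
        return md5Hex(data).equalsIgnoreCase(expectedHex);
    }
}
```

Calling matches with the path of the downloaded file and the checksum string listed above should return true for an intact download.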

License

FortranFormat is licensed under the liberal open source Berkeley Software Distribution (BSD) License. Basically, you may freely use this code for your utility or profit so long as you retain the copyright notices required by the BSD license and do not use the name of iChemLabs or Kevin Theisen to endorse your work, as described by the BSD license.

ChangeLog

1.1

November 17, 2009

Refactoring

  1. Java enums are now used to handle the edit descriptors.
  2. The code is better organized and more object oriented.
  3. All options are now provided in an Options class for simple access.
  4. Added more formatting options: add a return line to the end of strings, return Float objects instead of Double, change the spacing character associated with the X edit descriptor, left or right align strings associated with the A edit descriptor, return zeros instead of nulls when numbers are expected for empty strings.
  5. ArrayLists are used instead of Vectors.
  6. A JUnit testing set is now provided with the library for easy testing. The previous driver has been removed.

Fixes

  1. Fixed bug where sending an empty string to the parser would cause an error.
  2. Fixed several issues related to interpreting the Fortran Format specification string. Expanding parentheses, multiplying out descriptors and finding missing comma functions are now much more robust and well tested.

1.0

March 15, 2009

Initial release.

Reference

Fortran 77 Specification
Fortran 90 Format Specification Tutorial, by Professor Ching-Kuang Shene of Michigan Technological University
Peter Murray-Rust Discusses this library

Copyright © 2008-2014 iChemLabs, LLC. All rights reserved.