John Levesque, levesque@apri.com
Dennis Goodrow, dgoodrow@mindscape.com
Fortran-90 to Fortran-77 translator
Working solo for a year, I analyzed, designed, and implemented a translator that converts all the features of Subset High Performance Fortran (HPF) to Fortran-77. These include array syntax, WHERE statements, FORALL statements, Fortran-90 array manipulation, query and reduction intrinsics, array-valued functions, array constructors, attributed declarations, and numerous other F90 features.
The translator:
- Reused temporary array variables, so it required far less memory at run time.
- Created only LHS (left-hand-side) temporaries, so a five-point stencil assignment required one temporary array instead of four.
- Internally represented most Fortran-90 constructs as FORALL statements, using a fairly elegant and highly recursive method.
- Used array decomposition and alignment information from HPF directives to select the best outermost loop when a sequence of FORALLs was fused into the same loop nest.
As a result, APR was able to terminate a joint marketing arrangement with Kuck and Associates, whose KAP precompiler had previously been used for the Fortran-90 to Fortran-77 translation, saving $48K/year plus royalties. The translator is now marketed as zAPR.
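To illustrate the LHS-temporary point, here is a minimal Python sketch (not the translator's actual Fortran-77 output) of how a Fortran-90 five-point stencil assignment such as A(2:N-1,2:N-1) = 0.25*(A(1:N-2,2:N-1) + A(3:N,2:N-1) + A(2:N-1,1:N-2) + A(2:N-1,3:N)) can be lowered to explicit loops. The stencil expression, the function name stencil_lhs_temp, and the 0-based list-of-lists layout are assumptions for illustration. The sketch also assumes the "four temporaries" alternative copies each overlapping right-hand-side section into its own array.

```python
def stencil_lhs_temp(a):
    """Hypothetical lowering of the F90 array assignment
         A(2:N-1,2:N-1) = 0.25*(A(1:N-2,2:N-1) + A(3:N,2:N-1)
                              + A(2:N-1,1:N-2) + A(2:N-1,3:N))
    using a single temporary shaped like the LHS section.
    Assumes `a` is a square n x n list of lists (n >= 3), 0-based,
    and updates it in place."""
    n = len(a)
    m = n - 2  # extent of the interior (LHS) section in each dimension

    # The one and only temporary: shaped like the LHS, not like any RHS term.
    t = [[0.0] * m for _ in range(m)]

    # Loop nest 1: evaluate the whole RHS, reading only original values of `a`.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            t[i - 1][j - 1] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                                      + a[i][j - 1] + a[i][j + 1])

    # Loop nest 2: copy the temporary into the LHS section.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            a[i][j] = t[i - 1][j - 1]
    return a
```

Because Fortran-90 requires the entire right-hand side to be evaluated before any element of A is assigned, the first loop nest must see only original values. Writing straight into a in that loop would let later iterations read neighbors that were already updated. One LHS-shaped temporary is enough to preserve those semantics, however many shifted sections appear on the right.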
Thinking Machines Contracts
Implemented a "firewall" feature that freed TMC CMAX programmers from tracking whether an array resided on the FE (Front-End) or the CM (Connection Machine). For calls made out of context, it generated routine wrappers that copied the needed arrays to their new locations in memory ("homes" in TMC lexicon) and, if necessary, back again.
Researched and wrote a 40-page report and proposal on the features of a CM-Fortran (CMF) to High Performance Fortran (HPF) translator, and later implemented the deliverable CMF2HPF utility.
IBM Power-4 Contract Phase II
Designed a more precise, and therefore more efficient, algorithm for controlling cache flushing on the Power-4.
IBM Power-4 Contract Phase I
As part of a three-person team, I designed and implemented Fortran code restructuring for the IBM Power-4, a modestly parallel, shared-memory architecture (apart from a small "local memory") with software-controlled cache flushing. The challenge was to assign each processor the smallest possible number of loop iterations that still guaranteed no cache-line conflicts for the array and scalar variables written (assigned) within a parallelizable loop. When the iterations were not known at compile time, a "red-black" ordering of loop-iteration chunks was used.
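The red-black idea can be sketched in Python under a deliberately simplified model, which is an assumption and not the original restructurer's design: each iteration writes one element of a unit-stride array, a cache line holds elems_per_line elements, and the array's alignment to line boundaries (offset) is unknown at compile time. The function names and parameters below are hypothetical. Iterations are split into chunks of one cache line's worth, and the chunks alternate between a "red" phase and a "black" phase; within a phase, no two chunks can write the same line, whatever the offset.

```python
def red_black_chunks(n_iters, elems_per_line):
    """Hypothetical model: split iterations 0..n_iters-1 into chunks of
    `elems_per_line` iterations and return (red, black) phases.
    Chunks within one phase may run concurrently; phases run one after another."""
    chunks = [range(s, min(s + elems_per_line, n_iters))
              for s in range(0, n_iters, elems_per_line)]
    red = chunks[0::2]    # even-numbered chunks: phase 1
    black = chunks[1::2]  # odd-numbered chunks: phase 2
    return red, black


def lines_touched(chunk, offset, elems_per_line):
    # Cache lines written by a chunk when element 0 sits `offset` elements into a line.
    return {(k + offset) // elems_per_line for k in chunk}


def phase_is_conflict_free(phase, offset, elems_per_line):
    # True if no two chunks in the phase write the same cache line.
    seen = set()
    for chunk in phase:
        lines = lines_touched(chunk, offset, elems_per_line)
        if seen & lines:
            return False
        seen |= lines
    return True
```

Adjacent chunks may share a boundary line when the offset is nonzero, which is why they never run in the same phase. Because every chunk except possibly the last spans a full line's worth of iterations, two chunks in the same phase are always separated by at least one whole line, so they cannot collide.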