This post summarises the history of the early programming languages as told in Knuth's article "The early development of programming languages". Most of the work presented here is his.

It is interesting to study the history of computer science and to see how many ideas which are now internalised and taken for granted were once formed with great pains. Even today most accounts of early computer languages skip from assembly languages straight to FORTRAN, which first appeared in a primitive form in 1954, leaving almost a decade of research in obscurity.

Throughout our journey I will present an implementation of the following simple program (called TPK) in each of those languages, as it would have been implemented by their authors. Often where an author would have used subscripts for their variables, e.g. $b_i$, I will write b[i] due to limitations of my typesetting engine.

Most languages won’t be able to express this program exactly. Some handled only integers; if so, we will assume that all functions take integers as input and produce them as output. For example, sqrt(x) gives the largest integer whose square is less than or equal to x. If the language does not handle strings, then it will output 999 instead of “TOO LARGE”. If the language does not provide I/O at all, then we will assume that the input is stored in table a and the output in b[0..21], where the even indexes b[0], b[2], ... store the i values and the odd indexes store f(a[i]) or 999.
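Since none of the historical listings below can be typeset here faithfully, a sketch of TPK in modern Python may help as a reference point (following Knuth's description: $f(t) = \sqrt{|t|} + 5t^3$, with values above 400 reported as too large):

```python
import math

def f(t):
    # TPK's function: the square root of |t| plus five times t cubed
    return math.sqrt(abs(t)) + 5 * t ** 3

def tpk(a):
    """Run TPK on the 11 inputs a[0..10], reporting results
    from the last input back to the first."""
    out = []
    for i in range(10, -1, -1):          # i = 10 down to 0
        y = f(a[i])
        out.append((i, "TOO LARGE" if y > 400 else y))
    return out

# With all-zero input every value is small enough:
print(tpk([0] * 11)[:2])                 # [(10, 0.0), (9, 0.0)]
```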

## Theoretical languages

### Plankalkül - Konrad Zuse

Have you ever noticed a growth in your productiveness when your internet access is cut off? I guess that is what must have happened to Konrad Zuse, German computer pioneer and engineer, who had been building relay computers since 1936. Due to heavy Allied bombing of Germany in the ’40s he and his team lost most of their equipment. He had miraculously saved the Z4 computer and moved to a small Alpine village where he continued his work, but couldn’t really do much with his machine. He therefore spent his time on theoretical studies leading to the design of Plankalkül, a high-level language for a computer, without much regard to whether or how that language would ever be implemented. His language was very comprehensive and contained a lot of ideas that would not reappear for a long time afterwards. Using its high-level nature Zuse was able to write many complicated programs that had never been written before. A notable, not purely algorithmic, example would be a program that played chess. Its source code took 49 pages of manuscript.

Our little algorithm would look like this:

Line 1 of this code is the declaration of a compound data type. This is one of the greatest strengths of Plankalkül: none of the other languages we will present had such a perceptive notion of data. At the foundation of Zuse’s data system stands a single bit $S0$, whose value was either “-“ or “+”. Given any sequence of types we could define a compound data type $(\sigma_0, \ldots, \sigma_k)$. Arrays are also available and are defined as $m \times \sigma$, meaning an array of size $m$ with elements of type $\sigma$. Furthermore $m$ could be $\square$, meaning a list of variable length. Integer variables are represented as $A9$ and floating-point ones as $A \triangle 1$.

Lines 2 through 7 define the function $f(t)$ and lines 8 through 22 define the main program. In Zuse’s language each operation spans multiple lines, for example 11 through 14. The second line of each group identifies subscripts for the quantities on the top line. Operations are done mostly on output variables $R_k$, input variables $V_k$ and intermediate variables $Z_k$. The K line denotes components of a variable, so for example

denotes the i-th component of $R_0$.

Zuse used the $\Rightarrow$ symbol for the assignment operation. Natural today, this was an important leap from the mathematical thinking of the time, where most operations were represented by function composition. The language also has integer for-loops ($W2(11)$ on the 11th line) and conditionals $\underset{\cdot}{\rightarrow}$. Zuse made it a point to state mathematical relations between the variables, which we would now call invariants.

Many of Zuse’s ideas were far ahead of his time and would not reappear until the late ’50s or early ’60s, especially the expressive and hierarchical data structures. The reasons for this are twofold. Firstly, the hardware at the time wasn’t powerful enough to implement this language efficiently. Zuse tried doing so:

but this project necessarily foundered: the expense of implementing and designing compilers outstripped the resources of my small firm. [Konrad Zuse]

Secondly, his work lived in relative obscurity and few language designers were aware of it.

### Flow diagrams

On the other side of the Atlantic, two famous mathematicians, Goldstine and von Neumann, tackled the problem of representing algorithms in a precise and readable way. They used a pictorial representation, which would later be dubbed a flowchart. Despite some differences it should be familiar to modern computer scientists.

Flowcharts consisted of four parts. Operation boxes, marked with Roman numerals, contained operations in the form of assignments to memory locations. Alternative boxes, also marked with Roman numerals, represented a branch operation with two exits $+, -$, chosen based on the quantity in the box. Substitution boxes, marked with a # and using a $\rightarrow$ symbol, meant a change in notation; importantly, they did not represent any physical machine operation. Lastly there were also assertion boxes, connected to arrows with a dashed line; they contained invariants which were guaranteed by the algorithm.

The flowchart was meant to be a tool used in designing a correct program, which would later be translated to machine code by a programmer. Therefore it could contain some curiosities coming from machine design. For instance, the example program uses scaling: it assumes that the machine contains 40-bit words which represent a number $x$ in the range $-1 \leq x < 1$, so to represent a different range it is necessary to scale by multiplication by $2^{i}$.

Unlike Zuse’s Plankalkül, the flow diagram was well known, thanks to von Neumann’s prestigious name and their effort in typing and distributing their work among the computer scientists of the time. Notice that it didn’t contain any idea of a function (our $f(t)$ had to be inlined), for-loops or different data types.

### Curry - a logician’s approach

Haskell B. Curry, after whom the Haskell language was named, worked at the Naval Ordnance Laboratory and wrote two lengthy memoranda, never published, about his approach to program representation. His experience in writing large programs for ENIAC suggested to him a more compact notation than the flowcharts used.

His aims, which today would be similar to the aims of structured programming, were quite laudable:

The first step in planning the program is to analyse the computation into certain main parts, called here divisions, such that the program can be synthesised from them. Those main parts must be such that they, or at any rate some of them, are independent computations in their own right, or are modifications of such computations. [Haskell Curry]

Unfortunately in practice his proposal was not successful, because the way he factored the problem was not very natural and his solutions tended to be very complicated.

One of the complexities of this language is that components may have multiple entries and exits.

$\{E:x\}$ means compute the value of E and put it into location x.

A denotes the accumulator of the machine.

$\{x = L(E)\}$ means compute the value of E and substitute it into all appearances of x.

$X \rightarrow Y$ means substitute instruction group Y for the first exit of X.

$I_j, O_j$ denote the j’th entrance and exit of given instruction group.

$\{x > y\} \rightarrow O_1 \& O_2$ means if $x > y$ go to $O_1$, otherwise go to $O_2$.

$\{i - 1 : i \geq m\} \rightarrow O_1 \& O_2$ means decrease $i$ by 1 and if $i \geq m$ then go to $O_1$ or else go to $O_2$.

What’s unique about his work is that he gave a recursive description of a procedure to convert generic arithmetic expressions into machine code. In other words, he was the first person to describe the code-generation phase of a compiler. For example, the program he would have generated for $F(t)$ would look like this:

## Practical languages

### Short Code

The previously discussed languages were purely theoretical in nature. They were not implemented until recent times and existed purely as aids in program design. It was the programmer’s job to translate them into machine code. We now turn our attention to languages which saw practical use.

The title of the first high-level language to be actually implemented goes to Short Code. It was suggested by John W. Mauchly, one of the engineers behind ENIAC, in 1949. It wasn’t an instant success, despite its historical significance, which is not surprising. Short Code was implemented for UNIVAC, and there were very few UNIVAC users. Also, Short Code was an algebraic interpreter: it interpreted the code at a 50:1 performance ratio. In times when computer time was extremely valuable, while programmers were treated like Hitchcock treated his actors, this was unacceptable to many.

Let’s get to see some code:

Surprisingly it is very readable, but a few quirks need to be explained. First off, the Short Code interpreter used a coded representation form, a kind of byte code, and a UNIVAC word held twelve 6-bit bytes. Therefore line 01 had to be split into two. Also, there were no subscripted variables, but there was a shift operation which performed a cyclic shift in a specified block of memory. For instance, line 07 means:

temp = T0, T0 = T1, ..., T9 = U0, U0 = temp;
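In modern terms the shift amounts to rotating a block of memory by one cell; a minimal Python sketch (the list standing in for the cells T0..T9, U0 is my illustration, not Short Code's actual memory layout):

```python
def cyclic_shift(block):
    """Rotate the block one cell toward lower addresses:
    each cell receives its successor's value and the first
    cell wraps around to the end (the role of temp above)."""
    return block[1:] + block[:1]

# T0..T9, U0 modelled as eleven consecutive cells:
memory = list(range(11))
print(cyclic_shift(memory))   # [1, 2, ..., 10, 0]
```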


### Intermediate Programming Language - Burks

Independently from Short Code, as much of this work was done, Arthur Burks and his colleagues tried to simplify the job of coding. Their effort was to transform an “Ordinary Business English” description of a data-processing problem into the more precise “Intermediate Program Language”.

This has two principal advantages. First, smaller steps can more easily be mechanised than larger ones. Second, different kinds of work can be allocated to different stages of the process and to different specialists. [Arthur W. Burks]

Like Plankalkül it uses right-hand assignment.

The ‘ symbol that appears in line 30 meant that the computer was to save this intermediate result.

Again, even the author feels it necessary to apologise for its inefficiency, and suggests that it might still be useful for design.

It should be emphasized, however, that even if it were not efficient to use a computer to make the translation, the Intermediate PL would nevertheless be useful to the human programmer in planning and constructing programs. [Burks]

It is worth noting that Burks was a mathematician and he had developed a mathematical notation for writing programs; however, as most mathematicians would, he gave up after discovering a huge reluctance to use even mathematical symbols.

I used to be a mathematics professor. At that time I found there were a certain number of students who could not learn mathematics. I then was charged with the job of making it easy for businessmen to use our computers. I found it was not a question of whether they could learn mathematics or not, but whether they would. … They said, “Throw those symbols out – I do not know what they mean, I have not time to learn symbols.” I suggest a reply to those who would like data processing people to use mathematical symbols that they make them first attempt to teach those symbols to vice-presidents or a colonel or admiral. I assure you that I tried it.

### Rutishauser’s first compiler

As was usually the case, Europe was more concerned with research than with business. The Z4 relay computer, the very machine built by Konrad Zuse, had been rebuilt and was working at the Swiss Federal Institute of Technology. Heinz Rutishauser was working with that machine, and although his main duties were not connected to programming, as he himself said:

I had to do other work – to run practically single-handed a fortunately slow computer as mathematical analyst programmer, operator and even troubleshooter. [Rutishauser]

he published a treatise in 1952 describing a hypothetical computer and a simple algebraic language, together with two compilers for that language.

The language is restrictive. The only control structure it contains is a very crude for loop. Since there are no unconditional jumps, let alone if branches, to implement TPK’s decision Rutishauser’s code has to use mathematical functions like Max and Sgn (lines 4 and 6). Another problem is that there is no easy mechanism to switch between indices and other variables: indices were completely tied to Für-Ende loops. Therefore the example program invokes a trick on line 5. Z O[i] is intended to use the Z instruction, which transferred an indexed address to the accumulator in Rutishauser’s machine.
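To make the flavour of such tricks concrete, here is one way (my illustration, not Rutishauser's exact formula) to express TPK's decision “if y > 400 then 999 else y” with Max and Sgn alone, with no branch instruction:

```python
def sgn(x):
    # Sgn as in a mathematical function library: -1, 0 or +1
    return (x > 0) - (x < 0)

def clamp_result(y):
    """Return 999 when y > 400, else y, using only arithmetic:
    s is 1 exactly when y exceeds 400, and 0 otherwise."""
    s = max(0, sgn(y - 400))
    return y + s * (999 - y)

print(clamp_result(500))   # 999
print(clamp_result(100))   # 100
```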

As with Short Code, the source code had to be transliterated and the programmer had to allocate storage for variables and constants.

### Böhm’s dissertation - first compiler written in its own language

Corrado Böhm, an Italian graduate student, developed his own machine, language and translation mechanism in 1950 and published them in 1952. It must be noted that he knew only about Zuse’s and von Neumann’s results and had worked in complete independence from Rutishauser. Böhm’s distinctive contribution is that he wrote his compiler in his own language.

The language itself has a special elegance to it, because every operation is a form of an assignment. The compiler is also remarkable for its brevity: it took only 114 lines of code, 59 for decoding expressions with parentheses, 51 for decoding expressions without them, and 4 for deciding between those two cases. His parsing technique was also faster: it took around $O(n)$ time as opposed to Rutishauser’s $O\left(n^2\right)$.

As said earlier, every operation is an assignment. π is the program counter, so B → π means go to B. The statement π' → B means this is label B. A loading routine preprocesses the code to set the value of this variable. The symbol ? is used for I/O in an obvious manner. ↓ is used for indirect addressing, so ?→↓i means read input to the memory location pointed to by i.
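The idea that the program counter is just another assignable variable can be sketched with a tiny interpreter (entirely my construction, meant only to illustrate the π convention, not Böhm's actual machine):

```python
def run(program, env):
    """Execute statements in order; a statement that assigns to
    'pi' performs a jump (B -> pi in Böhm's notation), otherwise
    control falls through to the next statement."""
    env["pi"] = 0
    while env["pi"] < len(program):
        cur = env["pi"]
        program[cur](env)
        if env["pi"] == cur:        # no jump: advance normally
            env["pi"] += 1
    return env

# Sum 1..5 by jumping back to statement 0 while i <= 5:
env = run([
    lambda e: e.update(s=e["s"] + e["i"]),
    lambda e: e.update(i=e["i"] + 1),
    lambda e: e.update(pi=0) if e["i"] <= 5 else None,
], {"s": 0, "i": 1})
print(env["s"])   # 15
```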

Böhm’s machine had only non-negative integers of 14 decimal digits in length. Therefore he used the logician’s subtraction operator ∸, where a ∸ b = max(a − b, 0):

The set-intersection operator $\cap$ meant the minimum operation. Although Böhm’s language had a goto operation, there was still no branching, so, as in Rutishauser’s case, one had to resort to mathematical tricks to get it.
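For instance (a sketch of mine, not Böhm's actual code), ∸ together with ∩ can compress TPK's comparison into a 0/1 value, which can then be added to a label assigned to π to obtain a two-way branch:

```python
def monus(a, b):
    # Logician's subtraction: a ∸ b = max(a - b, 0)
    return max(a - b, 0)

def too_large(y):
    """1 exactly when y > 400, else 0, using only ∸ and min
    (Böhm's ∩); adding this to a base label selects a branch."""
    return min(1, monus(y, 400))

print(too_large(401), too_large(400))   # 1 0
```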

### AUTOCODE - first real compiler

Notice that so far we have seen no compiler that was actually implemented on a real machine. Such a thing would be created by Alick E. Glennie for the Manchester Mark I machine in late 1952. Perhaps the reason this particular machine saw the first real compiler lies in its complexities and intricacies: its machine language made programming by hand hard enough that an engineer thought the benefits of a compiler outweighed the cost in performance. Here is how Glennie stated his motivations at a lecture in 1953:

The difficulty of programming has become the main difficulty in the use of machines. Aiken has expressed the opinion that the solution of this difficulty may be sought by building a coding machine, and indeed he has constructed one. However it has been remarked that there is no need to build a special machine for coding, since the computer itself, being general purpose, should be used. … To make it easy, one must make coding comprehensible. This may be done only by improving the notation of programming. Present notations have many disadvantages: all are incomprehensible to the novice, they are all different (one for each machine) and they are never easy to read. It is quite difficult to decipher coded programmes even with notes, and even if you yourself made the program several months ago.

As we can see, the problems with readability and the programmer’s short memory are ubiquitous.

A look at this code, and the fact that this was done to simplify Mark I’s machine language, can give a sense of how complicated the machine language really was. This language is still very machine dependent. Lines 1-10 represent a subroutine for the calculation of f. CLOSE WRITE 1 says that the preceding lines constitute subroutine number 1. WRITE 2 START 2 says that the preceding lines constitute subroutine 2 and the program should start its execution from that subroutine.

The programmer had to assign storage for variables; as with most previous languages, this is done in lines 1 and 11. Glennie shied away from using constants, so his language has been extended here with an INTEGERS instruction. Originally there was only FRACTIONS for constants in the range $\left[-\frac{1}{2}, \frac{1}{2}\right]$, but since scaling was so complicated it was omitted from the code.

At the beginning of subroutine 1 its argument is in the lower accumulator. Line 3 assigns it to variable t. Line 4 is an if branch and means: “Go to label Z if t is positive”. Line 5 puts -t in the accumulator. Line 6 defines label Z. Line 7 applies subroutine 6 (square root) to the lower accumulator and stores the result in z. On line 8 we compute the product of t with itself, which fills both the upper and lower accumulators; the upper part, which is assumed to be zero, is stored in y. Finally, line 10 completes the calculation of $f(t)$ by leaving $z+5x$ in the accumulator. The CLOSE operator causes the compiler to forget the meaning of label Z, but the machine addresses of c, x, y, and z remain.

Line 11 introduces new storage assignments and reassigns the addresses of c and x. Lines 13 through 18 form the input loop, n being an index register of Mark I (the available index variables were k, l, n, o, q and r). Loops are always done for decreasing values down to and including 0. Lines 14 through 16 set q to $20 - n$. The compiler recognised conversions between index and normal variables by insisting that all other algebraic statements begin with a + or - sign. Line 17 says to store the result of subroutine 5 in $a_q$.

The main loop is between lines 20 and 31. Line 21 applies the function $f(t)$ to $a_n$ and puts the result into the accumulator. Lines 23, 24, 27, and 28 print the result. CONTROL X is an unconditional jump to label X.

ENTRY A CONTROL A defines an infinite loop; this was a sort of dynamic stop used to terminate a computation on Mark I.

That is all that is required to understand the essentials of Glennie’s AUTOCODE. One noteworthy thing about his compiler is that, according to him, the loss of efficiency was in the range of 10%, narrowing the performance gap between a compiler and a human programmer.

Unfortunately his papers were never published due to the secrecy of the British atomic weapons project, and few of Manchester’s users adopted his work. Also, he wasn’t a resident at Manchester, so he couldn’t advertise it well enough, and automatic programming wasn’t an issue of importance back then. In 1965 Glennie remarked:

[The compiler] was a successful but premature experiment. Two things I believe were wrong: (a) Floating-point hardware had not appeared. This meant that most of a programmer’s effort was in scaling his calculation, not in coding. (b) The climate of thought was not right. Machines were too slow and too small. It was a programmer’s delight to squeeze problems into the smallest space. …

I recall that automatic coding as a concept was not a novel concept in the early fifties. Most knowledgeable programmers knew of it, I think. It was a well known possibility, like the possibility of computers playing chess or checkers. … [Writing the compiler] was a hobby that I undertook in addition to my employers’ business: they learned about it afterwards. The compiler … took about three months of spare time activity to complete.

### The idea of compilers gets some traction

So by the early 1950s we have already witnessed individual discoveries and inventions which brought interpreters, compilers and some basic structures like subroutines, control statements etc. People were aware that automatic programming existed (nobody called it compiling yet), but programmers were mostly shying away from it. Assemblers (programs which translate from assembly mnemonics) and libraries (mostly of floating-point routines) were enough for most. The ability to express programs more succinctly was the main issue.

However, the idea that programs could be not just interpreted but also translated into machine code was getting some traction and advocates. Grace Hopper was a particularly active spokesperson for automatic programming during the 1950s and helped organise two conferences, in 1954 and 1956, under the sponsorship of the Office of Naval Research. It must be mentioned, though, that the contributions of Zuse, Curry, Burks, Mauchly, Böhm and Glennie were not mentioned at either symposium.

Today we would say that the biggest event at either symposium was the announcement of a system implemented by Laning and Zierler for the Whirlwind computer at M.I.T. in 1954, though you wouldn’t be able to tell by looking at the symposium’s proceedings.

We will skip a detailed explanation of the language and rather focus on its impact and its features. First off, the language was defined independently of the machine, and its published manuals were, for the first time, written for a novice.

Their language featured subroutines, basic control structures (no for loop, though), subscripts for variables and operator precedence. They even included a mechanism for integrating a system of differential equations.

Unfortunately, all those features turned out to be expensive and the resulting compiler did not optimise well enough, causing too many calls to the slow memory subsystem (“the drum”) compared to a program done by hand. The authors reflect that the performance gap in such scenarios was ten to one. The system was still used, however, when a fast solution was required.

### FORTRAN 0 - efficient and simple

In early 1954 John Backus assembled a group at IBM destined to work on improved systems of automatic programming. They learned about the Laning and Zierler system and even witnessed its operation. They realised that the big problem facing them was implementing a language efficiently.

At that time, most programmers wrote symbolic machine instructions exclusively (some even used absolute octal or decimal machine instructions). Almost to a man, they firmly believed that any mechanical coding method would fail to apply that versatile ingenuity which each programmer felt he possessed and constantly needed in his work. Therefore, it was agreed, compilers could only turn out code which would be intolerably less efficient than human coding (intolerable, that is, unless that inefficiency could be buried under larger, but desirable, inefficiencies such as the programmed floating-point arithmetic usually required then). …

[Our development group] had one primary fear. After working long and hard to produce a good translator program, an important application might promptly turn up which would confirm the views of the sceptics: … its object program would run at half the speed of a hand-coded version. It was felt that such an occurrence, or several of them, would almost completely block acceptance of the system.

By the end of 1954, Backus’s group had specified “The IBM Mathematical FORmula TRANslating system, FORTRAN”. Almost all languages after that would be named with acronyms. The aim of this specification was to provide a language that would combine the best of both worlds: ease of coding and efficiency.

The DO statement means do statements 3 through 8 and then go to 11. The IF statement was originally only a two-way branch, and the only acceptable relations were =, >, >=. On the 5th line we see subroutine calls; their names had to be at least 3 characters long. Conversely, variable names could take up to 2 characters, which was an improvement: in all previous languages variables took only a single letter. The GO TO statement did not mean an unconditional jump, but merely indicated the next iteration.

FORTRAN 0 was also the first language to have its syntax defined rigorously.

1954 saw only the specification’s release; it took the team 2.5 more years to complete the compiler.

### Brooker’s Autocode

Europe was also busy developing practical language implementations. R. Brooker, knowing the limitations of Glennie’s AUTOCODE, introduced a much cleaner Autocode, which was nearly machine independent and used floating-point arithmetic. Unfortunately, probably due to the focus on efficiency, there was no way for the user to define functions, only one operation per line was permitted, and there were few mnemonic names.

The language was actually realised in 1956, and the Autocode should be easily readable.

The instruction in parentheses (j1) was not to be compiled, but rather executed immediately. Here it starts the program at label 1.

This language is not really high-level, but one has to admit that it looks clean, especially compared to its predecessor. Its main focus was the efficiency of the compiled code, especially the desire to keep the information flowing between the electrostatic high-speed memory and the drum.

The language was a huge success.

Since its completion in 1955 the Mark I Autocode has been used extensively for about 12 hours a week as the basis of a computing service for which customers write their own programs and post them to us. [Brooker]

### FORTRAN I - The end of our journey

During the ’50s the ongoing effort on FORTRAN was widely publicised, and although there were still doubts whether such a thing was feasible, the following quote perhaps sums up the attitude:

As far as automatic programming goes, we have given it some thought and in the scientific spirit we intend to try out FORTRAN when it is available. [Goldstein, 1956]

In October 1956 the first manual was published. Beautifully typeset and written, it announced that the compiled programs would be as efficient as those coded by hand and that the release of FORTRAN would happen in late 1956.

Obviously, “late 1956” was the date for someone sucked into a wormhole, as most big software projects go, because the first release didn’t see the light of day until April 1957. It was bug-ridden, had important parts unfinished and often crashed during compilation. Eventually, though, it was polished and it became clear that FORTRAN I was working. While there were still critics who cautioned against using “automatic programming”, claiming it decreased utility and efficiency, in reality the share of high-level programming would soar in a short time.

FORTRAN I had some improvements over FORTRAN 0, mostly: the ability to add comments, arithmetic statement functions, formats for I/O, and a return code (an octal number displayed at the STOP instruction).

## Summary

This concludes our foray into programming language archaeology. I have skipped some developments, like the early declarative languages or what happened behind the Iron Curtain; we only looked at the imperative languages’ path. A path that saw multiple rediscoveries of similar ideas and a struggle with the mindset of the time. An engineer or scientist had to somehow reconcile ease of coding with slow machine speed, and do this while not knowing what would work and what wouldn’t, which ideas were worth pursuing and which weren’t. It was a very hard task, since nobody had ever done it before and what they had seemed good enough at the time. When complacency is common it is difficult to innovate.