Last night, I attended a “class reunion” of sorts, a reunion of the first Stanford CS108A/B class, Fundamentals of Computer Science, that Brian Reid taught in 1982-83. I was one of the teaching assistants. It was intended to be the beginning of an undergraduate Computer Science curriculum at Stanford. There still is a CS108 at Stanford, but it’s now called Object Oriented System Design. (Hmmm.)
Obviously a lot has changed in 30 years: personal computing, the internet, immense increases in computing power and storage, and so forth. But one thing that struck me about how much has changed is programming languages. I learned to program in BASIC in the 1970s. It was Wang 2200 BASIC, not one of the microcomputer-based BASICs of the time. Probably it was gratuitously different from them but fundamentally the same. From what I recall it had the following characteristics:
- The only types were numbers and strings and arrays of them.
- Variables were all globals.
- Variable names were A, A0 through A9, …, Z9 and corresponding string variables A$, A0$ through A9$, … Z9$. At least one could have both a numeric variable Q7 and a string variable Q7$. (I seem to recall some BASICs prohibiting that.) But I don’t think you could have both scalar and array variables with the same name.
- The control structures were IF, GOTO, GOSUB, and ON ERROR. You couldn’t put a statement in the IF statement, just a line number. There was no ELSE, so you had to GOTO around the then-block.
- Oh yeah, line numbers.
- GOSUB took line numbers. Wang BASIC had a variant of GOSUB that would pass parameters, but there were no return values from subroutines.
Despite all these limitations, we all had a lot of fun programming BASIC, didn’t we?
When I arrived at Stanford in the early 1980s, the programming language was Pascal. After BASIC, Pascal was a breath of fresh air. It had:
- The ability to give names for things, and for names to reside within scopes.
- The ability to put a group of statements into a begin/end block and use them in an if-then-else statement.
- Structured programming constructs like while, repeat, and case.
- It did have goto, but it was rarely used, and in fact I didn’t actually know how it worked. (The obvious cases worked as expected, of course, but what happens if you goto a label located in a nested scope? Or goto a label that’s in an outer scope but that’s not on your call stack?)
- Data structures!
- Local variables.
- User-defined types.
- A real boolean type.
- Dynamic memory allocation and pointers.
Compared to BASIC, Pascal was a huge step forward, freed from restrictive variable names and line numbers. Dynamic allocation of instances of user-defined types enabled one to create data structures like linked lists and trees, which were almost impossible in BASIC.
All the CS108 programming assignments were in Pascal. We were writing really big programs. They must have have been hundreds or even thousands of lines long. And as one of the TAs, I had to read a bunch of them.
People have dumped on Pascal a lot, unfairly so in my opinion. It had its share of problems, but it had huge advantages over BASIC, and I was also able to write some useful programs in Turbo Pascal on the PC in the mid 1980s.
Some of its problems did prove quite irritating though and possibly fatal. Among them are:
- A string is simply an array of char, and the length of an array is part of its type. This made string handling incredibly cumbersome. BASIC’s substring and appending operations were fluid by comparison. I always wrestled with Pascal strings and never understood why they were so difficult until, many years later, I read a commentary by one of the C guys (Kernighan or Ritchie) that pointed out Pascal’s mistake was including the length of an array in its type.
(I can’t find a reference to this though.)(See update below.)
- Inability to terminate a loop in the middle. The typical practice was to use boolean flags to handle loop termination (with an embedded if-test) but this was cumbersome and error-prone. Of course, you could goto out of a loop, but that was frowned upon.
- No short-circuit expression evaluation. This made loop termination even harder.
- Weird lookahead-based I/O.
- Lexical nesting.
UPDATE: In the comments, Joe Darcy pointed me to Kernighan’s article Why Pascal is Not My Favorite Language. It seems to be available in a bunch of places around the web. One particularly convenient place is here. Kernighan explains quite well not only the issue of array length, but also the problems with loop termination and lack of short-circuit evaluation. Money quote: “Early exits are a pain, almost always requiring the invention of a boolean variable and a certain amount of cunning.”
That point on lexical nesting deserves some further discussion. Lexical nesting is of course very useful: it allows certain details of internal data structures and code to be hidden from the outside. The problem with Pascal is that lexical nesting is the only means for creating large-scale program structures. (Of course some Pascal systems had separate compilation and libraries, but those were extensions.) Some of the toughest problems I helped students was with nesting problems. In one particular case I was trying to diagnose a compiler error where a variable wasn’t declared. Those were usually pretty easy: just look up towards the top of the program to find where the missing declaration needed to go. In this particular case there was a declaration that seemed to be in the right place. I was mystified.
After a lot of analysis, we discovered that the nesting was off. The begins and ends were balanced, but they were shifted in a way that caused an entire section of the program to be nested a level deeper than it seemed it was. Thus the declaration was in a nested scope and wasn’t visible where it was needed. The problem was that this was probably a 20- or 30-page program. The declaration was at one point, the compiler error (undeclared variable) was at a different point very far away, and the actual error (misplaced begin) was in still another location, also far away from the other two. Thus, to diagnose the problem, one had to read and understand the structure of the entire program.
Not long after I TA’d CS108, I started working for Brian in the Stanford Computer Systems Lab. This was a Unix/C shop, so I quickly switched from Pascal to C. Reminiscing last night, Brian said Pascal was like a nanny who always was saying “No, you can’t do that!” Programming in C was like Pascal with the training wheels taken off. (I’m sure I stole that line from somewhere.) Sure, there were weirdnesses (the * pointer dereference is a prefix operator? What’s with this weird array/pointer duality?) But most of the hassles of everyday programming in Pascal were gone:
- The break statement.
- Short-circuit expression evaluation.
- Reasonable library-based I/O.
- Reasonable library-based memory allocation.
- Files-as-modules program structure.
- Unchecked array length (length not part of array type).
This last is of course a blessing and a curse. As in Pascal, a C string is (mostly) just an array of char, but of variable length. This made string processing much easier. Actual string length was determined by a trailing zero byte. Of course, since the language didn’t keep track of the actual array capacity, programs would have to keep track of it themselves. Or not. But the programs worked anyway. Usually. This gave rise to a bunch of sloppy programming practices, leading to memory smashes, buffer overruns, security breaches, and so forth.
Nevertheless, C has proven to be incredibly useful and is still one of the most popular programming languages today. It’s important to understand, though, that the C we used in the 1980s (“K&R C”) isn’t the same C we program in today. I’ll have more to say about that in a separate article.