CS312 Lecture 1: Course Overview, Background on ML

What is CS312 About?

CS312 is the third programming course in the Computer Science curriculum, following CS100 and CS211. The primary goal of the course is to give students a firm foundation in the fundamental principles of programming and computer science. Consequently, CS312 covers a broad set of topics including

alternative programming paradigms (beyond imperative and object-oriented programming)
key data structures and algorithms
reasoning about program behavior and complexity
type systems and data abstraction
the design and implementation of programming languages

A major goal in CS312 is to teach you how to program well. Just about anyone can learn how to throw code together and get simple programs running, but it takes a deep understanding of the principles of computer science to write truly elegant and efficient programs with lasting value. We will try to give you that understanding and teach you some of the craft of programming as well. And practice makes perfect.

We use the Standard ML (SML) programming language throughout the course. SML is a modern functional programming language with an advanced type and module system. The course is not about programming in SML. Rather, SML provides a convenient framework in which we can achieve the objectives of the course. Like the object-oriented model of Java, the functional paradigm of SML is an important programming model with which all students should be familiar, as it underlies the core of almost any high-level programming language. In addition, the SML type and module systems provide frameworks for ensuring code is modular, correct, reusable, and elegant. The lessons you learn in programming with SML will be applicable to other programming languages such as Java. By studying alternative ways to write programs, you will be better equipped to use, implement, or even design future programming environments.

Another important reason we use SML is that it has a relatively clean and simple model that makes it easier to reason about the correctness of programs. Indeed, SML was one of the first major programming languages to have a formal semantic definition. In our studies, we will see that we can reason formally about the functional correctness of code, and also about the space, time, and other resources used in a computation.

Lectures and Recitations

Lectures are Tuesday and Thursday, 10:10 to 11am in Kimball B11. Recitations are Monday and Wednesday at four times: 12:20-1:10, 2:30-3:20, and 3:35-4:25 in Hollister 306, and 7:30-8:20 in Thurston 202. You are expected to attend both lectures and recitations. You may attend whichever recitation you prefer.

Course Materials

There is no official textbook for the course. The following books are useful and on reserve at the Engineering Library:

The Little MLer, Matthias Felleisen and Daniel P. Friedman, MIT Press, 1998. ISBN 0 262 56114 X
ML for the Working Programmer, L. C. Paulson, 2nd ed., Cambridge Univ. Press, 2000. ISBN 0 521 56543 X
Elements of ML Programming, ML97 Edition, Jeffrey D. Ullman, Prentice Hall, 1998. ISBN 0 13 790387 1

Two convenient online sources that we will be using from time to time are:

Programming in Standard ML, Robert Harper
Notes on Programming in SML/NJ, Riccardo Pucella

Communication

Course web site

The course web site is at http://www.cs.cornell.edu/courses/cs312. You should keep a close eye on this web page. We will post announcements about the course there. The programming assignments will all be posted there too.

Newsgroup

The best way to reach the course staff is by posting questions or comments to the course newsgroup, cornell.class.cs312. Many members of the course staff read the newsgroup and can answer your questions. Read the guidelines on the web page for some tips on newsgroup etiquette.

Email

For questions that would be inappropriate to post to the newsgroup, you can also reach the course staff by sending mail to cs312@cs.cornell.edu. The newsgroup is preferred, however.

Consulting hours

The TAs hold regular office hours during the day; consultants hold evening consulting hours. Office hours are posted on the web. Consulting hours are 7-10pm Sunday through Wednesday in Upson 304A, unless otherwise announced. The night before every project is due (not the night that it is due), we will hold extended consulting hours from 7pm to midnight. Consulting hours will not be held the day after a problem set is due.

Coursework

Problem Sets

The work in this class will consist of six problem sets. The first of these problem sets is already available on the course web site. It is due in one week: midnight next Tuesday. Some problem sets will have written exercises as well as programs to write. The written exercises will in general be due at 4pm on the due date.

Software

You can download a copy of Standard ML of New Jersey (SML/NJ) from the course web site. This includes the Emacs editing environment that you will use to interact with SML and do your programming and debugging.

We will have four sessions demoing this environment, Tuesday (today) and Wednesday at 7 and 8 PM in Upson B7 (basement). Keep your eye on the course web site for updates about the demos.

Prelims

There will be two prelims, March 7 and April 16, held in the evenings. Location to be announced.

Final

The final is May 14.

Background on ML

Our first order of business in this course is to learn how to use ML. Why learn another language?

We use a zillion different programming languages to communicate with machines and each other:

general purpose programming: Fortran, Lisp, Basic, C, Pascal, C++, Java, etc.
scripting: Visual Basic, awk, sed, perl, tcl, sh, csh, bash, REXX, Scheme, etc.
search: regular expressions, browser queries, SQL, etc.
display and rendering: PostScript, HTML, XML, VRML, etc.
hardware: CCS, VHDL, Esterel
theorem proving and mathematics: Mathematica, Maple, Matlab, NuPRL, Coq
others?

Though there are only a handful of general-purpose languages that you will learn and use, you'll be learning and using special-purpose languages for the rest of your life. Even general-purpose languages come and go. Today, it's Java and C++. Yesterday, it was Pascal and C, before that Fortran and Lisp. Who knows what it will be like tomorrow? You have to learn how to learn new languages.

In addition, many projects will require that you build "little" languages for gluing things together.

JavaScript grew out of a little language to make web pages interactive
protocols like HTTP or TCP are little languages that allow devices to talk to one another
the command prompt of DOS handles a little shell language
search engines on the web accept queries in a little language
others?

We gain a lot of leverage by having good notation and good language support for a given domain.

Perl is extremely useful for searching through documents because of its built-in support for regular expressions
SQL is a very high-level language that makes it easy to do database transactions in a scalable way

So it's important to understand programming models and programming paradigms because, in this fast-changing field, you need to be able to adapt rapidly.

It's crucial that you understand the principles behind programming that transcend the specifics of today.

There's no better way to get at these principles than to approach programming from a completely different perspective.

This is one reason why we're using ML -- it's very different from what most of you will have seen.

A great general-purpose programming language:

lets you say things concisely and understandably at the right level of abstraction
lets you extend the language with new features that are specific to a domain but blend in well with the rest of the language.
makes it easy to write correct code, with good performance
makes it easy to change the code when you find out the specification has changed
makes it easy to re-use code
is easy to learn

Fact: there are thousands of general purpose languages.

Corollary: there are no great programming languages.

But there are some pretty good ones. Java and ML are pretty good general-purpose languages (at least when compared to their predecessors).

SML is a functional programming language.

genealogically, it fits into the Lisp, Scheme, Miranda, Hope, Haskell, etc. line of programming languages.
Lisp vs. FORTRAN: functional vs. imperative
the key linguistic abstraction of this family: programmers can build new functions
forms the core of almost any general-purpose language
casting everything in terms of functions has its benefits: uniform, simple
functions are first-class: you can pass them to other functions, return "new" functions from functions, put functions in data structures, compose new functions out of old ones, etc.
you don't need to build in loops (e.g., while-loops, for-loops, do-loops, iterators, etc.) because these can be coded easily using functions.
constructing models of and reasoning about functional languages is generally easier than for other languages (since a model of any other language must cover at least its functional subset, plus everything else)
SML does support imperative programming, but doesn't encourage it.
SML is not object-oriented, although there are object-oriented versions of ML, such as Objective CAML.
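
The points above about first-class functions and loops can be made concrete with a small sketch. The names compose, makeAdder, and forLoop are hypothetical helpers invented for this illustration (SML already provides composition as the infix operator o); the sketch only assumes the standard top-level foldl.

```sml
(* compose: build a new function out of two old ones *)
fun compose (f, g) = fn x => f (g x)

(* makeAdder returns a "new" function each time it is called *)
fun makeAdder n = fn x => x + n

(* A for-loop written as an ordinary recursive function:
   applies body to lo, lo+1, ..., hi-1, threading an accumulator *)
fun forLoop (lo, hi, acc, body) =
  if lo >= hi then acc
  else forLoop (lo + 1, hi, body (lo, acc), body)

(* Composing: first add 3, then double *)
val addThenDouble = compose (fn x => 2 * x, makeAdder 3)
val fourteen = addThenDouble 4

(* Summing 1..10 without any built-in loop construct *)
val sumTo10 = forLoop (1, 11, 0, fn (i, acc) => i + acc)

(* Functions stored in a data structure and applied in order *)
val pipeline = [makeAdder 1, fn x => x * x, makeAdder ~2]
val piped = foldl (fn (f, x) => f x) 5 pipeline
```

Here fourteen is 14, sumTo10 is 55, and piped is 34 (5 becomes 6, then 36, then 34). Nothing in forLoop is special to the language; it is just a function that takes another function as an argument.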

SML is a statically typed, type-safe programming language.

a type-safe language ensures that you don't apply the wrong operations to the wrong data.
In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also prevents a lot of security problems -- over half of the reported break-ins at CERT were due to buffer overflows -- something that's impossible in a type-safe language.
Functional languages like Scheme and Lisp are type-safe, but dynamically typed. That is, type-errors are caught only at run-time.
C and C++ are statically typed but not type-safe. There's no guarantee that a type-error won't occur.
Java and SML are type-safe and statically typed. This means that most type errors are caught at compile time, before the program is ever run.
Fact: statically determining exactly which programs will have a run-time type error is impossible in general (the problem is undecidable).
Corollary: all statically-typed languages are conservative and may reject some programs that are perfectly okay.
A good statically-typed language rules out lots of bad code, while admitting lots of good code.
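
A minimal sketch of what these points look like in practice, assuming SML/NJ and the Basis Library's Array structure. The function average is a made-up example. The rejected declarations are left inside comments so that the file still loads.

```sml
(* Accepted: the compiler infers  average : real * real -> real *)
fun average (x, y) = (x + y) / 2.0

(* Rejected at compile time, before anything runs:
     val bad = 3 "hello"             an int is not a function
     val alsoBad = average (1, 2)    ints where reals are expected *)

(* Also rejected, although it could never go wrong at run time.
   This is the conservativeness of static typing:
     val harmless = if true then 1 else "one" *)

(* Type safety at run time: an out-of-range index raises the
   Subscript exception instead of reading past the array. *)
val arr = Array.fromList [10, 20, 30]
val caught = (Array.sub (arr, 7); false) handle Subscript => true
```

The last binding evaluates to true: the bad access is detected and turned into an exception that the program can handle, which is why a buffer overflow cannot silently corrupt memory here.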

SML (and SML/NJ in particular) supports a number of advanced features.

garbage collection: as in Java, automatic memory management in SML relieves you of allocating and freeing memory by hand -- a common source of bugs in languages such as C or C++.
type inference: you do not have to write type information down everywhere. The compiler automatically figures out most types. This makes the code a bit more terse which can make it easier to read and maintain. (But this is a double-edged sword. Too little type information can make code harder to read.)
parametric polymorphism: ML lets you write functions and data structures that can be used with any type. This is crucial for being able to re-use code. Java provides a form of subtype polymorphism which also lets you re-use code. We'll learn more about parametric and subtype polymorphism and their relative strengths and weaknesses in class.
algebraic datatypes: you can build sophisticated data structures in ML very easily, without fussing with pointers and memory management. Pattern matching makes them even more convenient.
exceptions, threads, and continuations: as in Java, SML/NJ supports exceptions and threads, which are crucial for building real systems. The thread model of SML/NJ is radically different from that of Java, however. In addition, SML/NJ supports continuations, which are an advanced control construct out of which you can build things like loops, exceptions, and threads.
advanced modules: SML makes it easy to structure large systems through the use of modules. Modules (called structures) are used to encapsulate implementations behind interfaces (called signatures). SML goes well beyond the functionality of most languages with modules by providing functions that manipulate modules (functors), module variables, multiple interfaces per module, and nested modules.
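
Several of the features listed above fit into one short sketch. Everything named here (tree, size, toList, ORDERED, TreeSet, IntKey, IntSet) is hypothetical and invented for illustration; only Int.compare, foldl, and the order type come from the Basis Library.

```sml
(* Algebraic datatype: a binary tree, polymorphic in its element type 'a *)
datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* No type annotations needed; inferred: size : 'a tree -> int *)
fun size Leaf = 0
  | size (Node (l, _, r)) = size l + 1 + size r

(* Inferred: toList : 'a tree -> 'a list  (in-order traversal) *)
fun toList Leaf = []
  | toList (Node (l, x, r)) = toList l @ (x :: toList r)

(* A signature (interface) for keys that can be compared *)
signature ORDERED =
sig
  type t
  val compare : t * t -> order
end

(* A functor: a function from modules to modules. Given any ORDERED
   structure, it builds search-tree insertion for that key type. *)
functor TreeSet (Key : ORDERED) =
struct
  fun insert (x, Leaf) = Node (Leaf, x, Leaf)
    | insert (x, whole as Node (l, y, r)) =
        case Key.compare (x, y) of
          LESS    => Node (insert (x, l), y, r)
        | GREATER => Node (l, y, insert (x, r))
        | EQUAL   => whole
end

structure IntKey : ORDERED =
struct
  type t = int
  val compare = Int.compare
end

structure IntSet = TreeSet (IntKey)

val ints = foldl IntSet.insert Leaf [3, 1, 2, 3]
val sorted = toList ints
val count = size ints
(* The same size function works at a different element type *)
val one = size (Node (Leaf, "ml", Leaf))
```

Here sorted is [1, 2, 3] and count is 3, since the duplicate 3 is dropped. Applying TreeSet to a structure for strings would yield a string set with no change to the functor body.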

Some history

(see Paulson's book for more info):

Robin Milner and others at the Edinburgh (Scotland) Laboratory for Computer Science were working on theorem provers in the late '70s and early '80s.

Traditionally, theorem provers were implemented in languages such as Lisp.

Milner kept running into the problem that the theorem provers would sometimes put incorrect "proofs" (i.e., non-proofs) together and claim that they were valid.

So he tried to develop a language that only allowed you to construct valid proofs.

"ML" which stands for "Meta Language" was the result of his (and others') work. The type system of ML was carefully constructed so that you could only construct valid proofs in the language. A theorem prover was then written as a program that constructed a proof.
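
One common way to realize this idea, sketched here in today's SML rather than the original ML, is to make theorems an abstract type whose only constructors are the inference rules. The tiny logic below (atoms and implication) and all of its names (form, KERNEL, Kernel, assume, discharge, mp) are hypothetical, chosen only to illustrate the principle.

```sml
(* Formulas of a miniature logic: atoms and implications *)
datatype form = Atom of string | Imp of form * form

signature KERNEL =
sig
  type thm                           (* abstract: clients cannot forge one *)
  val assume    : form -> thm        (* A |- A *)
  val discharge : form * thm -> thm  (* from G |- B infer G minus A |- A => B *)
  val mp        : thm * thm -> thm   (* modus ponens; raises Fail otherwise *)
  val hyps      : thm -> form list
  val concl     : thm -> form
end

(* Opaque ascription (:>) hides the representation of thm *)
structure Kernel :> KERNEL =
struct
  datatype thm = Thm of form list * form
  fun assume a = Thm ([a], a)
  fun discharge (a, Thm (hs, c)) =
    Thm (List.filter (fn h => h <> a) hs, Imp (a, c))
  fun mp (Thm (hs1, Imp (a, b)), Thm (hs2, a')) =
        if a = a' then Thm (hs1 @ hs2, b)
        else raise Fail "mp: antecedent mismatch"
    | mp _ = raise Fail "mp: not an implication"
  fun hyps (Thm (hs, _)) = hs
  fun concl (Thm (_, c)) = c
end

val p = Atom "p"
(* A proof of  |- p => p  built only from the rules above *)
val identity = Kernel.discharge (p, Kernel.assume p)
val proved =
  (Kernel.concl identity = Imp (p, p)) andalso null (Kernel.hyps identity)
(* A bogus proof step is refused instead of producing a "theorem" *)
val refused =
  (Kernel.mp (identity, Kernel.assume (Atom "q")); false)
  handle Fail _ => true
```

Because thm is abstract outside Kernel, the type checker guarantees that every value of type thm was produced by a chain of rule applications, however complicated the surrounding proof-search program is.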

Milner also formulated the type-inference system of ML, and proved its soundness.

(It should be noted that Milner also worked on formal calculi for concurrent computation, such as CCS and the pi-calculus, and later went on to receive the Turing Award -- the computer science equivalent of a Nobel Prize -- in large part for his work on ML.)

Eventually, this Classic ML evolved into a full-fledged programming language.

In the early '80s, there was a schism in the ML community with the French on one side and the British and Americans on the other. The French went on to develop CAML and later Objective CAML (O'Caml), while the British and Americans developed Standard ML. The two languages are actually quite similar.

What is ML used for today?

theorem provers (e.g., NuPRL, HOL, Coq, etc.)
compilers (e.g., SML/NJ, O'caml, C-kit, Twelf, Lambda-Prolog, Pict, etc.)
mathematics
hardware verification
advanced protocols (Ensemble, Fox, PLAN)
financial systems
genealogical databases
signal processing
bioinformatics
scripting
LaTeX-to-HTML translation
smartcards

In truth, not a lot when compared to something like C, C++, or Java. ML's real strength lies in language manipulation (e.g., compilers, analyzers, verifiers, and provers). This is not surprising, since ML evolved from the domain of theorem proving.
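
To give a feel for what language manipulation looks like in ML, here is a minimal sketch of an evaluator and a simplifier for a made-up "little" language of arithmetic expressions. The datatype expr and the functions lookup, eval, and simplify are all hypothetical; the simplifier assumes every variable is bound, since it may discard a subexpression that is multiplied by zero.

```sml
(* Abstract syntax of a tiny expression language *)
datatype expr =
    Num of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

exception Unbound of string

(* Environments are association lists from variable names to values *)
fun lookup (x, []) = raise Unbound x
  | lookup (x, (y, v) :: rest) = if x = y then v else lookup (x, rest)

(* An interpreter: one clause per form of expression *)
fun eval env (Num n) = n
  | eval env (Var x) = lookup (x, env)
  | eval env (Add (a, b)) = eval env a + eval env b
  | eval env (Mul (a, b)) = eval env a * eval env b

(* A tiny optimizer pass: rewrite bottom-up using algebraic identities *)
fun simplify (Add (a, b)) =
      (case (simplify a, simplify b) of
         (Num 0, e) => e
       | (e, Num 0) => e
       | (Num m, Num n) => Num (m + n)
       | (a', b') => Add (a', b'))
  | simplify (Mul (a, b)) =
      (case (simplify a, simplify b) of
         (Num 0, _) => Num 0
       | (_, Num 0) => Num 0
       | (Num 1, e) => e
       | (e, Num 1) => e
       | (Num m, Num n) => Num (m * n)
       | (a', b') => Mul (a', b'))
  | simplify e = e

(* 1 * x + 0 * y *)
val e = Add (Mul (Num 1, Var "x"), Mul (Num 0, Var "y"))
val env = [("x", 7), ("y", 9)]
val before = eval env e
val simple = simplify e
val after = eval env simple
```

Here simple is Var "x", and both before and after are 7. Datatypes describe the syntax tree directly, and pattern matching lets each rewrite rule be written almost exactly as it would appear on paper.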