What is CS 3110 About?


Background on ML

Our first order of business in this course is to learn how to use ML. Why learn another language?

We use a zillion different programming languages to communicate with machines and each other:

general purpose programming: Fortran, Lisp, Basic, C, Pascal, C++, Java, etc.
scripting: Visual Basic, awk, sed, Perl, Tcl, sh, csh, bash, REXX, Scheme, etc.
search: regular expressions, browser queries, SQL, etc.
display and rendering: PostScript/PDF, HTML, XML, VRML, etc.
hardware: CCS, VHDL, Esterel
mathematics: Mathematica, Maple, Matlab
others?

Though there are only a handful of general-purpose languages that you will learn and use, you'll be learning and using special-purpose languages for the rest of your life. Even general-purpose languages come and go. Today, it's Java and C++. Yesterday, it was Pascal and C, before that Fortran and Lisp. Who knows what it will be like tomorrow? You have to learn how to learn new languages.

In addition, some projects will require that you build "little" languages for gluing things together.

JavaScript grew out of a little language to make web pages interactive
protocols such as HTTP or TCP are little languages that allow devices to talk to one another
the command prompt of DOS handles a little shell language
search engines on the web accept queries in a little language
others?

We gain a lot of leverage by having good notation and good language support for a given domain.

Perl is extremely useful for searching through documents because of its built-in support for regular expressions
SQL is a very high-level language that makes it easy to do database transactions in a scalable way.

So it's important to understand programming models and programming paradigms: in this fast-changing field, you need to be able to adapt rapidly.

It's crucial that you understand the principles behind programming that transcend the specifics of today.

There's no better way to get at these principles than to approach programming from a completely different perspective.

This is one reason why we're using ML -- it's different from what most of you will have seen.

A great general-purpose programming language:

lets you say things concisely and understandably at the right level of abstraction
has good support for functional-style programming -- programs without the use of state or assignment
supports paradigms that are widely used in concurrent and massively parallel programming such as map-reduce
lets you extend the language with new features that are specific to a domain but blend in well with the rest of the language
makes it easy to write correct code, with good performance
makes it easy to change the code when you find out the specification has changed
makes it easy to re-use code
is easy to learn
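As a rough illustration of the map-reduce style mentioned in the list above, here is a minimal OCaml sketch: a "map" phase turns each document into a word count, and a "reduce" phase combines the counts with a fold. The function names `count_words` and `total_words` are hypothetical, and everything runs sequentially on one machine; a real map-reduce framework would distribute both phases.

```ocaml
(* Map phase: count the words in one document, splitting on single
   spaces and discarding the empty pieces that repeated spaces produce. *)
let count_words (doc : string) : int =
  String.split_on_char ' ' doc
  |> List.filter (fun w -> w <> "")
  |> List.length

(* Reduce phase: apply the map phase to every document, then combine
   the per-document counts into a single total with a fold. *)
let total_words (docs : string list) : int =
  docs
  |> List.map count_words
  |> List.fold_left ( + ) 0

let () =
  let docs = [ "the quick brown fox"; "jumps over"; "the  lazy dog" ] in
  Printf.printf "total words: %d\n" (total_words docs)
```

Because `count_words` has no side effects, each document could in principle be processed independently and in any order, which is what makes this style attractive for parallel execution.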

Fact: there are thousands of general-purpose languages.

Corollary: there are no great programming languages.

But there are some pretty good ones. Java and ML are pretty good general-purpose languages (at least when compared to their predecessors).

ML is a functional programming language.

genealogically, it fits into the Lisp, Scheme, Miranda, Hope, Haskell, etc. line of programming languages.
Lisp vs. FORTRAN: functional vs. imperative
the key linguistic abstraction of this family: programmers can build new functions
forms the core of almost any general-purpose language
all computation is done with functions and there is no state (locations holding mutable values); this makes the model uniform and simple
functions are first-class: you can pass them to other functions, return "new" functions from functions, put functions in data structures, compose new functions out of old ones, etc.
you don't need many special constructs in the language, such as those for iteration (while-loops, for-loops, do-loops, iterators, etc.), because these can be coded easily using recursive functions (uniformity).
constructing models of and reasoning about functional languages is generally easier than for other languages
ML does support imperative programming but doesn't encourage it; initially, we will use a subset that is (nearly) purely functional.
Some dialects of ML, such as OCaml, also support object-oriented programming.
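The points above about first-class functions and loop-free iteration can be sketched in a few lines of OCaml. This is an illustrative example only: the names `twice`, `make_adder`, `compose`, `ops`, `sum_to`, and `sum_to_imperative` are ours, not from any library. The last function shows the imperative style (a mutable reference and a for-loop) purely for contrast.

```ocaml
(* Functions are values: they can be passed as arguments... *)
let twice (f : int -> int) (x : int) : int = f (f x)

(* ...returned as results ("new" functions built at run time)... *)
let make_adder (n : int) : int -> int = fun x -> x + n

(* ...composed out of old ones... *)
let compose f g x = f (g x)

(* ...and stored in data structures. *)
let ops : (string * (int -> int)) list =
  [ ("add3", make_adder 3); ("double", fun x -> 2 * x) ]

(* Iteration without a loop construct: a recursive function sums 1..n. *)
let rec sum_to (n : int) : int =
  if n <= 0 then 0 else n + sum_to (n - 1)

(* The same computation written imperatively with a mutable reference
   and a for-loop, which OCaml supports but does not encourage. *)
let sum_to_imperative (n : int) : int =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + i
  done;
  !total

let () =
  Printf.printf "%d\n" (twice (make_adder 10) 1);                   (* 21 *)
  Printf.printf "%d\n" (compose (make_adder 1) (fun x -> x * x) 4); (* 17 *)
  List.iter (fun (name, f) -> Printf.printf "%s 5 = %d\n" name (f 5)) ops;
  Printf.printf "%d %d\n" (sum_to 100) (sum_to_imperative 100)
```

Both versions of the sum compute the same result; the recursive one needs no special syntax beyond function definition and `if`.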

OCaml is a statically typed, type-safe programming language.

a type-safe language ensures that you don't apply the wrong operations to the wrong data (e.g., dividing two strings).
In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also a lot of security problems: over half of the break-ins reported to CERT were due to buffer overflows, which are impossible in a type-safe language.
Functional languages like Scheme and Lisp are type-safe but dynamically typed; that is, type errors are caught only at run time.
C and C++ are statically typed but not type-safe: there's no guarantee that a type error won't occur at run time.
Java and OCaml are type-safe and statically typed. This means that most errors are caught before running the program.
Fact: in general, it is impossible to determine statically (that is, without running it) whether an arbitrary program will have a type error.
Corollary: all statically-typed languages are conservative and may reject some programs that are perfectly okay.
A good statically-typed language rules out lots of bad code, while admitting lots of good code.
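To make these distinctions concrete, here is a small OCaml sketch. The ill-typed lines are commented out because the compiler rejects them before the program ever runs; the last of them illustrates conservativeness, since it could never actually misbehave. The function names `area` and `safe_get` are illustrative only, and the bounds-checking behaviour assumes a normal compilation (OCaml's `-unsafe` flag turns those checks off).

```ocaml
(* A well-typed function: the compiler checks every use of [area]. *)
let area (w : float) (h : float) : float = w *. h

(* Each of the following is rejected at compile time, so it is left
   commented out; uncomment one to see the type error.

   let bad1 = "6" / "3"          (* dividing two strings *)
   let bad2 = 42 5               (* treating an integer as a function *)
   let bad3 = if true then 1 else "one"
     (* conservative: the else branch can never run, but both
        branches must still have the same type *)
*)

(* Type safety at run time: array accesses are bounds-checked, so an
   out-of-range index raises Invalid_argument instead of reading or
   writing arbitrary memory the way a buffer overflow in C might. *)
let safe_get (a : int array) (i : int) : int option =
  try Some a.(i) with Invalid_argument _ -> None

let () =
  Printf.printf "area = %.1f\n" (area 3.0 4.0);
  let a = [| 10; 20; 30 |] in
  match safe_get a 5 with
  | Some v -> Printf.printf "got %d\n" v
  | None -> print_endline "index 5 is out of bounds"
```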

ML (and OCaml in particular) supports a number of advanced features.

garbage collection: as in Java, the automatic memory management of OCaml lifts the burden of having to worry about memory management -- a common source of bugs in languages such as C or C++.
type inference: you do not have to write type information down everywhere. The compiler automatically figures out most types. This makes the code a bit more terse, which can make it easier to read and maintain. (But this is a double-edged sword: too little type information can make code harder to read.)
parametric polymorphism: ML lets you write functions and data structures that can be used with any type. This is crucial for being able to re-use code. Java provides a form of subtype polymorphism which also lets you re-use code. We'll learn more about parametric and subtype polymorphism and their relative strengths and weaknesses in class.
algebraic datatypes: you can build sophisticated data structures in ML very easily, without fussing with pointers and memory management. Pattern matching makes them even more convenient.
exceptions and threads: as in Java, OCaml supports exceptions and threads, which are crucial for building real systems.
advanced modules: ML makes it easy to structure large systems through the use of modules. OCaml has a module language that is used to encapsulate implementations behind interfaces. ML goes well beyond the functionality of many languages with modules by providing functions that manipulate modules (functors).
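Several of the features in this list can be previewed in one small OCaml program. It is a sketch for orientation only: the names `add`, `length`, `tree`, `insert`, `to_list`, `Empty_tree`, `min_elt`, `STACK`, `ListStack`, and `PushAll` are illustrative, not taken from the course materials or the standard library, and garbage collection and threads are not shown.

```ocaml
(* Type inference: no annotations needed; OCaml infers
   add : int -> int -> int from the use of integer [+]. *)
let add x y = x + y

(* Parametric polymorphism: length : 'a list -> int works on lists
   of any element type. *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* Algebraic datatype: a binary search tree over any element type,
   built without explicit pointers or manual memory management. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* Pattern matching takes a tree apart case by case. *)
let rec insert x = function
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, v, r) as t ->
      if x < v then Node (insert x l, v, r)
      else if x > v then Node (l, v, insert x r)
      else t

(* In-order traversal returns the elements in sorted order. *)
let rec to_list = function
  | Leaf -> []
  | Node (l, v, r) -> to_list l @ [ v ] @ to_list r

(* Exceptions: signal an error case that callers may handle. *)
exception Empty_tree

let rec min_elt = function
  | Leaf -> raise Empty_tree
  | Node (Leaf, v, _) -> v
  | Node (l, _, _) -> min_elt l

(* Modules: a signature (interface) hides the representation... *)
module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end

module ListStack : STACK = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function [] -> None | x :: s -> Some (x, s)
end

(* ...and a functor is a function from modules to modules. This one
   builds a "push several elements" helper for any STACK. *)
module PushAll (S : STACK) = struct
  let push_all xs s = List.fold_left (fun acc x -> S.push x acc) s xs
end

module P = PushAll (ListStack)

let () =
  Printf.printf "add 2 3 = %d\n" (add 2 3);
  Printf.printf "lengths: %d %d\n" (length [ 1; 2; 3 ]) (length [ "a"; "b" ]);
  let t = List.fold_left (fun acc x -> insert x acc) Leaf [ 5; 2; 8; 1 ] in
  List.iter (Printf.printf "%d ") (to_list t);
  print_newline ();
  (match min_elt Leaf with
   | _ -> print_endline "unreachable"
   | exception Empty_tree -> print_endline "min_elt of an empty tree raised Empty_tree");
  match ListStack.pop (P.push_all [ 1; 2; 3 ] ListStack.empty) with
  | Some (top, _) -> Printf.printf "top of stack: %d\n" top
  | None -> print_endline "empty stack"
```

Note that code outside `ListStack` cannot tell that a stack is really a list, because the `STACK` signature keeps `'a t` abstract; this is the encapsulation the list item refers to.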

Some history

(See Paulson's book for more information.)

Robin Milner and others at the Edinburgh (Scotland) Laboratory for Computer Science were working on theorem provers in the late '70s and early '80s.

Traditionally, theorem provers were implemented in languages such as Lisp.

Milner kept running into the problem that the theorem provers would sometimes put incorrect "proofs" (i.e., non-proofs) together and claim that they were valid.

So he tried to develop a language that only allowed you to construct valid proofs.

ML, which stands for "Meta Language", was the result of his (and others') work. The type system of ML was carefully constructed so that you could only construct valid proofs in the language. A theorem prover was then written as a program that constructed a proof.
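The idea can be illustrated with a toy version of this approach written in modern OCaml (not the original ML). This is a simplified sketch under our own assumptions: the logic has only implication, a single axiom scheme, and modus ponens, and the names `formula`, `KERNEL`, `Kernel`, `axiom_k`, and `modus_ponens` are hypothetical. The essential point is that `thm` is an abstract type, so the only way to obtain a theorem is through the inference rules the module exports.

```ocaml
(* A toy propositional logic: formulas built from atoms and implication. *)
type formula =
  | Atom of string
  | Imp of formula * formula

(* The signature exposes [thm] as an abstract type: code outside the
   module can inspect a theorem's conclusion, but can create a theorem
   only by applying the inference rules below. *)
module type KERNEL = sig
  type thm
  val concl : thm -> formula
  val axiom_k : formula -> formula -> thm  (* |- p -> (q -> p) *)
  val modus_ponens : thm -> thm -> thm     (* from |- p -> q and |- p, infer |- q *)
end

module Kernel : KERNEL = struct
  type thm = formula
  let concl t = t
  let axiom_k p q = Imp (p, Imp (q, p))
  let modus_ponens pq p =
    match pq with
    | Imp (a, b) when a = p -> b
    | _ -> failwith "modus_ponens: rule does not apply"
end

let () =
  let p = Atom "p" and q = Atom "q" and r = Atom "r" in
  let k1 = Kernel.axiom_k p q in                 (* |- p -> (q -> p) *)
  let k2 = Kernel.axiom_k (Kernel.concl k1) r in (* |- k1 -> (r -> k1) *)
  let t = Kernel.modus_ponens k2 k1 in           (* |- r -> (p -> (q -> p)) *)
  assert (Kernel.concl t = Imp (r, Imp (p, Imp (q, p))));
  (* Writing (Imp (p, q) : Kernel.thm) here would be rejected by the
     type checker, because thm is abstract outside the module. *)
  print_endline "derived a theorem using only the inference rules"
```

A bug elsewhere in such a prover can cause a rule application to fail, but it cannot forge a value of type `thm`.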

Milner also formulated the type-inference system of ML and proved its soundness.

(It should be noted that Milner also worked on models of concurrency, such as CCS and the pi-calculus, and later went on to receive the Turing Award, often called the computer science equivalent of a Nobel Prize, in part for his work on ML.)

Eventually, this Classic ML evolved into a full-fledged programming language.

In the early '80s, there was a schism in the ML community, with the French on one side and the British and Americans on the other. The French went on to develop CAML and later Objective CAML (OCaml), while the British and Americans developed Standard ML. The two languages are actually quite similar.

What is ML used for today?

theorem provers (e.g., NuPRL, HOL, Coq, etc.)
compilers (e.g., SML/NJ, OCaml, C-kit, Twelf, Lambda-Prolog, Pict, etc.)
mathematics
hardware verification
advanced protocols (Ensemble, Fox, PLAN)
financial systems
genealogical database
signal processing
bioinformatics
scripting
LaTeX to HTML translation
smartcards

There's a nice paper by Minsky et al. about using ML (OCaml) in the financial industry (it must be accessed from inside Cornell). It explains how the features of ML make it a good choice for quickly building complex software that works.

ML is used for a variety of purposes, but it's nowhere near as popular as C, C++, and Java. ML's real strength lies in language manipulation (e.g., compilers, analyzers, verifiers, provers, etc.). This is not surprising, since ML evolved from the domain of theorem proving.