Homework 1: Lexical Analysis
due: Wednesday, January 31, in class
When writing regular expressions, you may use the syntax [^chars] to denote any single character except those
listed in chars.
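Python's `re` module happens to use the same bracket notation, so a quick way to see what [^chars] means is to try it there. This is only an illustration of the notation, assuming Python 3; the pattern name is hypothetical.

```python
import re

# [^/*] matches exactly one character that is neither "/" nor "*".
NOT_SLASH_OR_STAR = re.compile(r"[^/*]")

print(bool(NOT_SLASH_OR_STAR.fullmatch("a")))   # True
print(bool(NOT_SLASH_OR_STAR.fullmatch("*")))   # False
print(bool(NOT_SLASH_OR_STAR.fullmatch("\n")))  # True: newline is not excluded
print(bool(NOT_SLASH_OR_STAR.fullmatch("ab")))  # False: one character only
```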
- Write a regular expression for http URLs. An http URL consists of four parts: the protocol (http://), the DNS name or the IP address of a host, an
optional port number, and the pathname for a file. For simplicity, let's assume:
  - A DNS name is a list of non-empty alphabetical strings separated by periods.
  - An IP address consists of four non-negative integers separated by periods.
  - A port number is a positive integer following a colon (e.g. ":8080").
  - The pathname is a Unix-style absolute pathname. The allowed symbols are
    letters, digits, period, and slash. The sequence "//" is forbidden, i.e.
    no directory name may be empty. A pathname may end with a slash.
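Because the simplified rules are easy to misread, it can help to have a direct, non-regex check of them to test a candidate regular expression against. The sketch below is one possible reading, not the required answer. The function names are hypothetical, and it assumes ASCII letters only, treats any non-empty digit string (leading zeros allowed) as an integer, does not cap IP components at 255 (the rules above do not), and accepts "/" by itself as a pathname.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
PATH_CHARS = LETTERS | DIGITS | {".", "/"}


def is_dns_name(host: str) -> bool:
    # One or more non-empty, letters-only labels separated by periods.
    return all(label and set(label) <= LETTERS for label in host.split("."))


def is_ip_address(host: str) -> bool:
    # Exactly four non-empty digit strings separated by periods.
    # Assumption: leading zeros allowed, values not capped at 255.
    parts = host.split(".")
    return len(parts) == 4 and all(p and set(p) <= DIGITS for p in parts)


def is_port(port: str) -> bool:
    # A positive integer: non-empty, digits only, value greater than zero.
    return bool(port) and set(port) <= DIGITS and int(port) > 0


def is_pathname(path: str) -> bool:
    # Absolute (starts with "/"), allowed characters only, no "//".
    # Assumption: "/" by itself counts as a pathname.
    return path.startswith("/") and "//" not in path and set(path) <= PATH_CHARS


def is_http_url(url: str) -> bool:
    prefix = "http://"
    if not url.startswith(prefix):
        return False
    rest = url[len(prefix):]
    slash = rest.find("/")
    if slash == -1:
        return False  # the pathname part is required
    authority, path = rest[:slash], rest[slash:]
    host, colon, port = authority.partition(":")
    if colon and not is_port(port):
        return False  # a colon must be followed by a valid port
    return (is_dns_name(host) or is_ip_address(host)) and is_pathname(path)


print(is_http_url("http://www.example.com:8080/docs/index.html"))  # True
print(is_http_url("http://10.0.0.1/a//b"))                         # False
```

Checking each part separately keeps every rule visible on its own line, which makes it easier to spot where a candidate regex and the rules disagree.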
- A comment in the C language begins with the two-character sequence
"/*", followed by the body of the comment, and then the sequence
"*/". The body of the comment may not contain the sequence "*/",
although it may contain the "*" and "/" characters. Write a regular
expression for each of the following, or explain why no regular
expression can define it.
  - exactly C comments.
  - C comments, permitting "*/" inside the body as long as it appears
    inside quotes.
  - C comments, permitting nested comments.
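A bounded brute-force comparison is a cheap way to gain confidence in a candidate answer for the first part, though it can only find disagreements on short strings and cannot prove two languages equal. The sketch below, assuming Python's `re` module, compares a candidate pattern against a direct, non-regex reading of the definition. The names `is_c_comment` and `find_counterexample`, the alphabet "/*a", and the length bound 8 are illustrative choices, not part of the assignment.

```python
import itertools
import re


def is_c_comment(s: str) -> bool:
    # Direct reading of the definition: "/*", a body not containing "*/", then "*/".
    # len(s) >= 4 rules out "/*/", where the opening and closing sequences overlap.
    return (
        len(s) >= 4
        and s.startswith("/*")
        and s.endswith("*/")
        and "*/" not in s[2:-2]
    )


def find_counterexample(pattern: str, alphabet: str = "/*a", max_len: int = 8):
    # Enumerate every string over `alphabet`, shortest first, and return the
    # first one on which the regex and is_c_comment disagree (None if none).
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if (regex.fullmatch(s) is not None) != is_c_comment(s):
                return s
    return None


# The tempting pattern below accepts a comment whose body contains "*/".
print(find_counterexample(r"/\*.*\*/"))  # /**/*/
```

The alphabet only needs "/", "*", and one ordinary character, since every other ordinary character behaves the same way with respect to the definition.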
- Appel, problem 2.3
- Appel, problem 2.5 (a,b)
- Appel, problem 2.6
- Appel, problem 2.9