due: Monday, September 9
Update 09/02: Clarified chess field format. It should be a letter and a digit
Update 09/06: Clarified intermediate steps for DFA simplification. Those should be the intermediate state partitions as shown in the lecture
You may find the Dot and Graphviz packages helpful for generating drawings of DFAs and other graphs. You can get these packages for multiple OSes from the Graphviz downloads page. An example of a DFA drawn using Graphviz may be useful.
Living cells on earth encode their genetic information in a chemical code made from repetitions of four basic compounds abbreviated as A, C, G, and T.
These constituents are arranged in long linear sequences of triplets (such as GCT, or TGA) referred to as codons. The meaning of a short sub-strip of such a sequence is often not immediately clear, since there are different frames in which to interpret the information.
For instance, the partial sequence CTTCTCCGCAATTTAC might specify the codons
where the question marks indicate missing data. Certain codons have a special meaning, which fortunately resolves the reading frame ambiguity: for example, the initiation codon ATG denotes the beginning of a gene-encoding region, and one of the three termination codons TAA, TAG, and TGA mark the end of such a region.
A commonly asked question is whether a strip of a sequence acquired from a biological sample contains one or more complete gene-coding regions. Design a nondeterministic finite automaton that can be used to identify such a subsequence based on an arbitrary reading frame. Ensure that the initiation and termination codons are aligned with respect to each other.
Create a deterministic version of the following finite automaton. Make sure you show the initial and terminal states. Label each DFA state with the set of NFA states to which it corresponds.
A popular way of encoding chess games is the Portable Game Notation (PGN), which is essentially a numbered sequence of moves in algebraic format with optional comments between moves. Write a regular expression that describes this format as closely as possible using the regular expression syntax discussed in class. You may ignore spaces and newlines. Since the resulting regular expression may become large, you should feel free to define pieces of it and later reference them. (be careful to ensure that your language is regular in this case.)
A (simplified) format description of PGN is given by the following set of rules:
For clarification purposes, here is a PGN encoding of a game between Robert Fischer and Boris Spassky.
1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8 10. d4 Nbd7 11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5 Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6 23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5 hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5 35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6 Nf2 42. g4 Bd3 43. Re6 1/2-1/2