CS100M Spring 2001
Project 2: Getting Down to Numbers
Due Thursday, 2/08/2001

0. Objective

Completing all tasks in this assignment should teach you:

1. Structure is the essence of order

Topics: Structures, swapping.

Setup: You have been given the mission of storing research data about two super-secret substances, temporarily named A and B, that space explorers recently brought back to Earth. Because the lab experiments have not yet concluded, very little data is available about them. All we know is summarized below:

                    A           B
atomic weight       256.74      107.98
state               liquid      solid
color               blue        silver
transparency        100%        0%

Tasks:

Turn in the printout of your code and its output. Don't display any values other than A and B before and after the swap.
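As an illustration of the two topics, here is a minimal sketch that stores the table above in two structures and swaps them through a temporary variable. The field names (weight, state, color, transparency) and the choice to store transparency as a number are assumptions made for this example, not requirements of the assignment.

```matlab
% Store the known data in two structures (field names are hypothetical).
A = struct('weight', 256.74, 'state', 'liquid', 'color', 'blue',   'transparency', 100);
B = struct('weight', 107.98, 'state', 'solid',  'color', 'silver', 'transparency', 0);

disp(A); disp(B);      % A and B before the swap

% Swap using a temporary copy; the semicolons suppress any other output.
temp = A;
A = B;
B = temp;

disp(A); disp(B);      % A and B after the swap
```

A temporary variable is needed because assigning A = B directly would overwrite the original contents of A before they could be copied into B.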

2. Eeeee!

Topics: Numerical accuracy, accumulation, vectors

Setup: The number e = 2.71828182845905... is one of the most important in mathematics. The value of e^x can be expressed as the sum of an infinite series:

e^x = 1 + x/1! + x^2/2! + x^3/3! + ... + x^n/n! + ...

The notation n! represents the factorial of the number n: n! = 1*2*3*...*n, with 0! = 1. Use the Matlab function factorial to compute it.

We can use Matlab to compute an approximate value of e^x by adding up the first N terms (N being a given number) of this series.

Tasks:

Turn in the Matlab code that computes and displays the values of exact, approx1, approx2, exact-approx1, exact-approx2, approx1-approx2; also turn in the output and your comment on the results.

Important: To show more significant decimals make sure that you put format long at the beginning of your program (type help format in Matlab for more details). You can revert to showing fewer decimals at any time after you are done with this exercise by typing format short.

HINT: You may write loops, but it is possible to do this part without writing loops. The function sort should prove very helpful for this part.
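The sketch below shows one loop-free way to build and add up the first N terms. The values x = 1 and N = 20 are assumptions chosen for illustration, and so is the interpretation that approx1 adds the terms in their natural order while approx2 adds them from smallest to largest; use the values and definitions given in the task list.

```matlab
format long

x = 1;                             % assumed evaluation point
N = 20;                            % assumed number of terms
n = 0:N-1;                         % exponents of the first N terms
terms = x.^n ./ factorial(n);      % x^n / n! for every n at once

exact = exp(x);                    % Matlab's built-in value

running = cumsum(terms);           % running total in natural order
approx1 = running(end);            % (largest terms first when x = 1)

running = cumsum(sort(terms));     % running total, smallest terms first
approx2 = running(end);

disp([exact; approx1; approx2])
disp([exact - approx1; exact - approx2; approx1 - approx2])
```

Adding the smallest terms first lets them accumulate before they meet the large terms, which is why the two orders can give results that differ in the last few digits.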

Bonus:

Turn in the code for plotting the two graphs and the superimposed plots themselves.

3. The Double Helix

Topics: Loops, conditionals, strings, counting and matching

Setup: DNA (deoxyribonucleic acid) is the basis of inheritance in the living world. Structurally, it consists of two long strands of complementary bases coiled into a double helix. DNA contains four types of bases: adenine (A), guanine (G), cytosine (C) and thymine (T). In the double helix, two bases "face" each other at each position. These bases must be complementary. A is only complementary to T, and vice versa; G is only complementary to C, and vice versa. Within one strand of the double helix the bases can follow in any order.

If one strand of DNA looks like this:

T-A-C-A-G-C-G-A-T-A-C-G-T-G (top sequence)

then its complementary strand must be the following:

A-T-G-T-C-G-C-T-A-T-G-C-A-C (bottom sequence).

In this problem you are a scientist who is given two strands of DNA of equal length and who must determine the number of positions in which those strands are complementary. For example, strands T-A-C-G and A-T-A-C are complementary in the 3 positions [1 2 4], while strands T-A-C-G and T-A-G-G are complementary only in position 3. Use strings consisting of the letters C, G, T, and A to encode the structure of DNA strands. Lowercase and uppercase letters will be considered equivalent. For example, T-A-C-A could be encoded by any of the following strings: 'TACA', 'Taca', 'TaCa' (there are 2^4 - 3 = 13 other combinations of uppercase and lowercase that are equivalent).

Tasks:

Write a Matlab program that

Turn in the listing of the program, together with its output for input top sequence 'taCaGCgATAcgtg' and bottom sequence 'agTActTgATgtgC'.

HINT: Functions lower and/or upper might make this part of the assignment easier.
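As a sketch of the matching idea, the code below counts the complementary positions for the first example pair from the setup (T-A-C-G and A-T-A-C). It is a vectorized variant that uses logical indexing instead of a loop with conditionals, and the variable names are chosen only for this illustration.

```matlab
top    = upper('TaCg');            % case is ignored, so convert both strands
bottom = upper('atac');

% Build the strand that would be fully complementary to top.
% Every mask is computed from top itself, so the replacements cannot interfere.
expected = top;
expected(top == 'A') = 'T';
expected(top == 'T') = 'A';
expected(top == 'C') = 'G';
expected(top == 'G') = 'C';

matches   = (expected == bottom);  % true where the strands are complementary
positions = find(matches);         % indices of the complementary positions
count     = sum(matches);          % number of complementary positions

disp(count)
disp(positions)
```

For this pair the result agrees with the setup text: 3 complementary positions, at [1 2 4].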

Bonus (but strongly recommended!): Print the top sequence and the bottom sequence one under the other, with complementary bases in the same position shown in upper case, and all the other bases shown in lower case. For example, given top sequence TACG and bottom sequence atac, print:
TAcG
ATaC
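One possible way to produce this display is sketched below, using the same example strands. It starts from all-lowercase strings and raises only the matching positions to upper case; the variable names are illustrative.

```matlab
t = lower('TACG');                 % top sequence, all lower case to start
b = lower('atac');                 % bottom sequence

% A position matches when the two bases form one of the four allowed pairs.
match = (t == 'a' & b == 't') | (t == 't' & b == 'a') | ...
        (t == 'c' & b == 'g') | (t == 'g' & b == 'c');

t(match) = upper(t(match));        % upper case only where complementary
b(match) = upper(b(match));

disp(t)
disp(b)
```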

4. Small Life

Topics: Iteration, conditionals, arrays

Setup: Assume that you have an isolated bacterium floating around in a nutrient solution, where there are no impediments to free replication. This species of bacterium normally divides at the end of each hour of its life. After a bacterium divides, we end up with an "old" bacterium, which will divide at the end of the next hour, and a "newborn" one, which will divide only at the end of its second hour of life. For the length of the experiment no bacteria will die. Our isolated initial bacterium is "newborn."

The diagram below illustrates the number and age distribution of bacteria at the END of each hour, for the first five hours of the experiment. Note that only the "big" (i.e. mature) bacteria are old enough to reproduce.

Tasks:

Write a program to

For the situation illustrated above, which corresponds to the end of hour N = 5, the number of bacteria is 5, while the array of their ages is [5 1 2 3 1]. Any permutation of the age array (e.g. [5 3 2 1 1] or [1 2 3 5 1]) is acceptable.

Hint: For each bacterium, you must age it and also, if it is mature, replicate it (producing a newborn bacterium).

2/02 Clarification

The replication process is instantaneous, so at the instant of birth
there are bacteria that are 0 hours old. Do not compute/include/show
them in your code!

However, in case seeing them helps you understand what is going on,
here is the picture from above redrawn to show all bacteria:

time    all bacteria            bacteria you compute and show
        (shown by age)          (shown by age)
0       0
        |
1       1                       1
        |
2       2---->0                 2
        |      \
3       3->0    1               3 1       (in any order)
        |   \    \
4       4->0 1    2--->0        4 1 2     (in any order)
        |   \ \    \    \
5       5->0 1 2->0 3->0 1      5 1 2 3 1 (in any order)
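The right-hand column of the picture can be reproduced with a short loop. The sketch below is one possible reading of the rules, not the required solution: it never stores 0-hour-old bacteria, and instead lets a bacterium born at the end of one hour enter the array with age 1 at the end of the next. N = 5 is used so the result can be compared with the picture.

```matlab
N = 5;                             % number of hours to simulate (N >= 1)
ages = 1;                          % end of hour 1: the initial bacterium is 1 hour old

for hour = 2:N
    % Bacteria that were at least 2 hours old divided at the end of the
    % previous hour; their offspring are now 1 hour old.
    born = sum(ages >= 2);
    ages = [ages + 1, ones(1, born)];   % age everyone, then append the offspring
end

disp(length(ages))                 % number of bacteria
disp(ages)                         % their ages, in no particular order
```

For N = 5 this yields 5 bacteria with ages [5 3 2 1 1], one of the acceptable permutations listed in the task.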

5. Coincidental Questions: The Birthday Problem revisited

Topics: Arrays, loops, computer simulation, random numbers

Setup: Asking questions in lecture is very important. It is an opportunity for students to clear up any potential misunderstandings. It is an opportunity for the professor to get feedback on how well the lecture is going. One reason students hesitate to ask questions is that they worry others don't have the same question. We can try to estimate how valid this concern is. During past editions of CS100, some information has been collected about the number of students who had questions during lecture:

                                          Class A    Class B
Size of the class                         200        270
Number who asked questions…
  … once every five lectures              14         12
  … once every other lecture              5          4
  … once per lecture                      7          3
  … multiple times per lecture            2          4
Number who wanted to but didn't ask…
  … once every five lectures              32         33
  … once every other lecture              28         23
  … once per lecture                      19         13
  … multiple times per lecture            26         18

Assumptions: Let's make the following simplifying assumptions:

Tasks:

Write a Matlab computer simulation to estimate how frequently students have the same question by modifying the code from the 2/1 lecture for the birthday problem. For each class A and B, follow the steps outlined below:

Turn in your code, together with a printout of its output.
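The lecture code is not reproduced in this handout. For orientation only, a generic birthday-problem simulation of the kind being modified might look like the sketch below. The group size, the number of trials, and all variable names are assumptions made for illustration, and the sketch does not model the class data in the table.

```matlab
numPeople = 23;                    % assumed group size
numDays   = 365;                   % possible birthdays
numTrials = 10000;                 % assumed number of simulated groups

hits = 0;                          % trials in which at least two birthdays coincide
for trial = 1:numTrials
    days = randi(numDays, 1, numPeople);    % one random birthday per person
    if length(unique(days)) < numPeople     % fewer distinct days means a coincidence
        hits = hits + 1;
    end
end

estimate = hits / numTrials;       % estimated probability of a shared birthday
disp(estimate)
```

Because the birthdays are random, the estimate changes slightly from run to run; more trials give a steadier value.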

Example:

6. Submitting Your Work

Follow the submission guidelines stated on the Projects page for CS100M.