Given random variables X and Y on a sample space S, we can apply any of the usual operations on real numbers to X and Y by performing them pointwise on the outputs of X and Y. For example, we can define X+Y:S→R by (X+Y)(k)::=X(k)+Y(k). Similarly, we can define X²:S→R by (X²)(k)::=(X(k))².
We can also consider a real number c as a random variable by defining C:S→R by C(k)::=c. We will use the same variable for both the constant random variable and for the number itself; it should be clear from context which we are referring to.
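To make these pointwise constructions concrete, here is a minimal Python sketch (the sample space, probabilities, and variable names are illustrative choices of mine, not something from the notes):

```python
from fractions import Fraction

# A small sample space: the outcomes of rolling one fair four-sided die.
S = [1, 2, 3, 4]
Pr = {k: Fraction(1, 4) for k in S}   # Pr({k}) for each outcome k

# Random variables are just functions from S to the reals.
X = lambda k: k           # X(k) ::= the number rolled
Y = lambda k: k % 2       # Y(k) ::= 1 if the roll is odd, 0 otherwise

# Pointwise operations build new random variables from old ones.
X_plus_Y  = lambda k: X(k) + Y(k)     # (X + Y)(k) ::= X(k) + Y(k)
X_squared = lambda k: X(k) ** 2       # (X^2)(k)   ::= (X(k))^2

# A real number c viewed as the constant random variable C(k) ::= c.
c = 5
C = lambda k: c
```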
Claim: If X and Y are random variables, then E(X+Y)=E(X)+E(Y)
Proof:
\begin{aligned} E(X+Y) &= \sum_{k \in S} (X+Y)(k) Pr(\{k\}) && \text{by definition of $E$} \\ &= \sum_{k \in S} (X(k) + Y(k)) Pr(\{k\}) && \text{by definition of $X+Y$} \\ &= \left(\sum_{k \in S} X(k) Pr(\{k\})\right) + \left(\sum_{k \in S} Y(k) Pr(\{k\})\right) && \text{algebra} \\ &= E(X) + E(Y) && \text{definition of $E$} \end{aligned}
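As a concrete check with the die-roll variables from the sketch above (X(k) = k and Y(k) = 1 if k is odd, each outcome having probability 1/4; the example is mine, not from the notes):

\begin{aligned} E(X+Y) &= \tfrac{1}{4}(1+1) + \tfrac{1}{4}(2+0) + \tfrac{1}{4}(3+1) + \tfrac{1}{4}(4+0) = 3 \\ E(X) + E(Y) &= \tfrac{1}{4}(1+2+3+4) + \tfrac{1}{4}(1+0+1+0) = \tfrac{5}{2} + \tfrac{1}{2} = 3 \end{aligned}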
Claim: If c∈R and X:S→R then E(cX)=cE(X).
Note: here we are treating the number c as a constant random variable, as discussed above.
Proof:
\begin{aligned} E(cX) &= \sum_{k \in S} (cX)(k) Pr(\{k\}) \\ &= \sum_{k \in S} c(X(k)) Pr(\{k\}) \\ &= c \sum_{k \in S} X(k) Pr(\{k\}) \\ &= c E(X) \end{aligned}
These two properties (E(X+Y)=E(X)+E(Y) and E(cX)=cE(X)) are summarized by saying "expectation is linear".
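Both properties can also be checked numerically; here is a quick sanity check continuing the Python sketch above (the helper name expectation is my own):

```python
def expectation(Z, S, Pr):
    """E(Z) = sum over all outcomes k in S of Z(k) * Pr({k})."""
    return sum(Z(k) * Pr[k] for k in S)

# E(X + Y) = E(X) + E(Y)
assert expectation(X_plus_Y, S, Pr) == expectation(X, S, Pr) + expectation(Y, S, Pr)

# E(cX) = c * E(X)
cX = lambda k: c * X(k)
assert expectation(cX, S, Pr) == c * expectation(X, S, Pr)
```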
We can summarize the probability distribution of two random variables X and Y using a "joint PMF". The joint PMF of X and Y is a function from R×R to R that gives, for any x and y, the probability that X=x and Y=y. It is often useful to draw a table:
| Pr | y = 1 | y = 10 |
|---|---|---|
| x = 1 | 1/3 | 1/6 |
| x = 10 | 1/6 | 1/3 |
Note that the sum of the entries in the table must be one (Exercise: prove this). You can also check that summing across each row gives the PMF of X, while summing down each column gives the PMF of Y.
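To see the marginal computation mechanically, here is a short Python sketch of the table above (the dictionary encoding is just one possible choice of mine):

```python
from fractions import Fraction

# Joint PMF from the table above: joint[(x, y)] = Pr(X = x and Y = y).
joint = {
    (1, 1): Fraction(1, 3), (1, 10): Fraction(1, 6),
    (10, 1): Fraction(1, 6), (10, 10): Fraction(1, 3),
}

# The entries sum to one.
assert sum(joint.values()) == 1

# Summing over y for each fixed x gives the PMF of X, and vice versa.
pmf_X = {x: sum(p for (x2, _), p in joint.items() if x2 == x) for x in (1, 10)}
pmf_Y = {y: sum(p for (_, y2), p in joint.items() if y2 == y) for y in (1, 10)}

assert pmf_X == {1: Fraction(1, 2), 10: Fraction(1, 2)}
assert pmf_Y == {1: Fraction(1, 2), 10: Fraction(1, 2)}
```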
Recall that events A and B are independent if Pr(A∩B)=Pr(A)Pr(B). We say that random variables X and Y are independent if for all x,y∈R, the events (X=x) and (Y=y) are independent.
Example: Variables X and Y with the joint PMF given in the above table are not independent. For example, Pr(X=1 ∩ Y=1)=1/3, while Pr(X=1)Pr(Y=1)=(1/2)⋅(1/2)=1/4.
Informally, you can think of independence as indicating that knowing the value of one of the variables does not give any information about the value of the other. For example, height and weight are not independent, because knowing that someone is taller increases the likelihood that they are heavier.
Example: Variables X and Y with the following joint PMF are independent:

| Pr | y = 1 | y = 10 |
|---|---|---|
| x = 1 | 1/4 | 1/4 |
| x = 10 | 1/4 | 1/4 |
Given any x and y, Pr(X=x∩Y=y)=1/4. Moreover, Pr(X=x)=1/4+1/4=1/2 and Pr(Y=y)=1/4+1/4=1/2. Therefore Pr(X=x)Pr(Y=y)=1/4=Pr(X=x∩Y=y).
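The same check can be automated; this sketch continues the earlier Python code (the helper is_independent is my own name for the definition above):

```python
def is_independent(joint):
    """Check Pr(X = x and Y = y) == Pr(X = x) * Pr(Y = y) for all x, y."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    pmf_X = {x: sum(p for (x2, _), p in joint.items() if x2 == x) for x in xs}
    pmf_Y = {y: sum(p for (_, y2), p in joint.items() if y2 == y) for y in ys}
    return all(joint.get((x, y), 0) == pmf_X[x] * pmf_Y[y] for x in xs for y in ys)

uniform = {(x, y): Fraction(1, 4) for x in (1, 10) for y in (1, 10)}
assert not is_independent(joint)   # the 1/3, 1/6 table: not independent
assert is_independent(uniform)     # the all-1/4 table: independent
```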
Unlike sums, the expectation of the product XY is not in general equal to E(X)E(Y). However, if X and Y are independent, the two are equal.
Claim: If X and Y are independent random variables, then E(XY)=E(X)E(Y).
Proof: We will use the alternative definition of expectation given in the last lecture because it makes the proof a bit easier. In lecture, I started by simplifying the left-hand side to show it is equal to the RHS; here I simplify the right-hand side.
\begin{aligned} E(X)E(Y) &= \left(\sum_{x \in ℝ} x Pr(X = x)\right)\left(\sum_{y \in ℝ} y Pr(Y = y)\right) && \text{by alternative definition of $E$} \\ &= \sum_{x \in ℝ} \sum_{y \in ℝ} xy Pr(X = x) Pr(Y = y) && \text{algebra} \\ &= \sum_{x \in ℝ} \sum_{y \in ℝ} xy Pr(X = x \cap Y = y) && \text{independence of $X$ and $Y$} \\ \end{aligned}
We can group together the terms of this sum that have the same product xy. For example, suppose the joint PMF of X and Y were the following:
| Pr(X=x ∩ Y=y) | y = 1 | y = 2 | y = 3 | y = 4 |
|---|---|---|---|---|
| x = 1 | ⋯ | ⋯ | ⋯ | 2/5 |
| x = 2 | ⋯ | 1/8 | ⋯ | ⋯ |
| x = 3 | ⋯ | ⋯ | ⋯ | ⋯ |
| x = 4 | 1/4 | ⋯ | ⋯ | ⋯ |
There would be three terms in the sum with xy = 4, namely the x = 1, y = 4 term, the x = 2, y = 2 term, and the x = 4, y = 1 term:
\begin{aligned} \sum_{x \in ℝ} \sum_{y \in ℝ} xy Pr(X = x \cap Y = y) = &\cdots + 1 \cdot 4 \cdot Pr(X = 1 \cap Y = 4) + {} \\ &\cdots + 2 \cdot 2 \cdot Pr(X = 2 \cap Y = 2) + {} \\ &\cdots + 4 \cdot 1 \cdot Pr(X = 4 \cap Y = 1) + \cdots \\ = &\cdots + 1 \cdot 4 \cdot (2/5) + 2 \cdot 2 \cdot (1/8) + 4 \cdot 1 \cdot (1/4) + \cdots \\ = &\cdots + 4 \cdot \left(\frac{2}{5} + \frac{1}{8} + \frac{1}{4}\right) + \cdots \\ = &\cdots + 4 \cdot Pr(XY = 4) + \cdots \end{aligned}
We can combine these into a single term: 4 \cdot Pr(XY = 4). By grouping all of the terms with the same product, we convert the above expression to
\begin{aligned} E(X)E(Y) &= \sum_{x \in ℝ} \sum_{y \in ℝ} xy Pr(X = x \cap Y = y) && \text{from above} \\ &= \sum_{a \in ℝ} a Pr(XY = a) && \text{combining terms as just described} \\ &= E(XY) && \text{alternative definition of $E$} \end{aligned}
which is what we were trying to prove.
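The claim can also be checked numerically for the two tables above, continuing the earlier Python sketch (E_product and E_marginal are hypothetical helper names of mine):

```python
def E_product(joint):
    """E(XY), computed directly from the joint PMF."""
    return sum(x * y * p for (x, y), p in joint.items())

def E_marginal(joint, index):
    """E(X) if index == 0, E(Y) if index == 1."""
    return sum(xy[index] * p for xy, p in joint.items())

# Independent table: E(XY) = E(X) E(Y).
assert E_product(uniform) == E_marginal(uniform, 0) * E_marginal(uniform, 1)

# Dependent table: the two sides differ (37 versus 121/4).
assert E_product(joint) != E_marginal(joint, 0) * E_marginal(joint, 1)
```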
Another useful property of a random variable is its variance. The variance of X is a measure of how "spread out" the distribution is. If you select an outcome k at random and compute the distance from X(k) to E(X), you are likely to get a large number if the distribution is very spread out, and a small number if it is not.
This suggests the following definition (which is wrong):
Var(X) \stackrel{?}{=} E(X - E(X))
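To see why this is wrong (an observation not spelled out here, but one that follows from the linearity properties proved above together with the fact that the expectation of a constant random variable is that constant): since E(X) is just a number,

\begin{aligned} E(X - E(X)) &= E(X) - E(E(X)) = E(X) - E(X) = 0 \end{aligned}

for every random variable X, no matter how spread out its distribution is.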
Next lecture, we will repair this definition.