Let (X,Y) be a pair of random variables for which the regression function \nu (x) = \mathbb E[Y | X = x] takes the form

\nu (x) = a + bx

for some pair of real numbers (a,b).

What is the random variable \hat{Y}, a function of X, that minimizes

\mathbb E\left[ (Y - \hat{Y})^2 | X = x \right]

over all possible choices of \hat{Y} and for all x? Enter your answer in terms of a, b and the random variable X (capital letter "X").

(Remark: for a clean, quick solution, it may be helpful to review the law of iterated expectations: \mathbb E_{X,Y}[\cdot ] = \mathbb E_{X}[\mathbb E_{Y}[\cdot \; |\; X]], where \mathbb E_{Y}[\cdot |X] denotes the conditional expectation, which is a random variable. Use the insight from the previous exercise.)

Fix x and note that, conditional on X = x, any predictor of the form \hat{Y} = \hat{a} + \hat{b}X takes the constant value \hat{a} + \hat{b}x. Adding and subtracting \nu(x), the conditional mean squared error decomposes as:

\begin{align*}
\mathbb{E}[(Y - \hat{Y})^2|X=x] &= \mathbb{E}[(Y - (\hat{a} + \hat{b}x))^2|X=x] \\
&= \mathbb{E}[(Y - \nu(x) + \nu(x) - (\hat{a} + \hat{b}x))^2|X=x] \\
&= \mathbb{E}[(Y - \nu(x))^2|X=x] + 2(\nu(x) - (\hat{a} + \hat{b}x))\mathbb{E}[Y - \nu(x)|X=x] + (\nu(x) - (\hat{a} + \hat{b}x))^2
\end{align*}

Here we used the fact that \nu(x) - (\hat{a} + \hat{b}x) is a constant once we condition on X = x, so it factors out of the expectations. The first term is the conditional variance of Y given X = x; it does not depend on our choice of \hat{Y}. The third term is the squared difference between the true regression function and our predictor at x. The cross term vanishes identically, for every choice of \hat{a} and \hat{b}:

\begin{align*}
2(\nu(x) - (\hat{a} + \hat{b}x))\mathbb{E}[Y - \nu(x)|X=x] &= 2(\nu(x) - (\hat{a} + \hat{b}x))\left(\mathbb{E}[Y|X=x] - \nu(x)\right) \\
&= 2(\nu(x) - (\hat{a} + \hat{b}x)) \cdot 0 \\
&= 0
\end{align*}

since \nu(x) = \mathbb{E}[Y|X=x] by definition.

The decomposition therefore simplifies to:

\begin{align*}
\mathbb{E}[(Y - \hat{Y})^2|X=x] &= \mathbb{E}[(Y - \nu(x))^2|X=x] + (\nu(x) - (\hat{a} + \hat{b}x))^2
\end{align*}
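As a concrete illustration with arbitrary numbers: if a = 1, b = 2 and x = 3, then \nu(3) = 7, and a predictor with \hat{a} + \hat{b} \cdot 3 = 7.5 pays an extra (7 - 7.5)^2 = 0.25 in conditional mean squared error, on top of the irreducible first term.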

Now, using the fact that \nu(x) = a + bx: the first term is fixed and the second term is nonnegative, so the conditional mean squared error is minimized exactly when the second term equals zero, that is, when

\begin{align*}
\hat{a} + \hat{b}x &= \nu(x) = a + bx
\end{align*}

Since this must hold simultaneously for all x, we need \hat{a} = a and \hat{b} = b. The minimizing random variable is therefore

\begin{align*}
\hat{Y} &= \nu(X) = a + bX
\end{align*}

The same argument applies to an arbitrary predictor \hat{Y} = f(X): conditional on X = x, the value f(x) is a constant, the cross term vanishes, and the conditional error is minimized by f(x) = \nu(x). The conditional expectation \nu(X) is the best mean-squared-error predictor of Y given X; here it happens to be linear.
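This is easy to sanity-check by simulation. The sketch below assumes an illustrative model (standard normal X, additive noise independent of X, arbitrary values of a and b; none of these specifics come from the problem) and estimates the mean squared error of \hat{Y} = a + bX against two perturbed linear predictors; the unperturbed predictor should come out smallest.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 1.5, -2.0, 1_000_000   # illustrative coefficient values

# Assumed model with E[Y | X = x] = a + b*x: X standard normal,
# additive noise independent of X with variance 0.25.
X = rng.normal(size=n)
Y = a + b * X + rng.normal(scale=0.5, size=n)

def mse(a_hat, b_hat):
    """Monte Carlo estimate of E[(Y - (a_hat + b_hat*X))^2]."""
    return np.mean((Y - (a_hat + b_hat * X)) ** 2)

print(mse(a, b))           # ~0.25, the irreducible error E[Var(Y|X)]
print(mse(a + 0.3, b))     # ~0.34, inflated by 0.3**2
print(mse(a, b + 0.3))     # ~0.34, inflated by 0.3**2 * E[X^2]
```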

We can also compute the error that \hat{Y} = a + bX achieves, and then use the law of iterated expectations from the remark to confirm that it minimizes the unconditional mean squared error as well.

Substituting \hat{Y} = a + bX and expanding the square:

\begin{align*}
(Y - \hat{Y})^2 &= (Y - (a + bX))^2 \\
&= Y^2 - 2aY - 2bXY + a^2 + 2abX + b^2X^2
\end{align*}

Now let's take the conditional expectation of this expression given X:

\begin{align*}
\mathbb{E}[(Y - \hat{Y})^2|X] &= \mathbb{E}[Y^2|X] - 2a\mathbb{E}[Y|X] - 2b\mathbb{E}[XY|X] + a^2 + 2abX + b^2X^2
\end{align*}

using linearity of conditional expectation and the fact that X (hence X^2) is known given X. Since \mathbb{E}[Y|X] = \nu(X) = a + bX (given) and \mathbb{E}[XY|X] = X\mathbb{E}[Y|X] = aX + bX^2, we can substitute:

\begin{align*}
\mathbb{E}[(Y - \hat{Y})^2|X] &= \mathbb{E}[Y^2|X] - 2a(a + bX) - 2b(aX + bX^2) + a^2 + 2abX + b^2X^2 \\
&= \mathbb{E}[Y^2|X] - a^2 - 2abX - b^2X^2 \\
&= \mathbb{E}[Y^2|X] - (a + bX)^2
\end{align*}

This is exactly \mathbb{E}[Y^2|X] - (\mathbb{E}[Y|X])^2 = \mathrm{Var}(Y|X), the conditional variance of Y given X: the irreducible first term in the decomposition above, which no choice of \hat{Y} can improve on. Finally, by the law of iterated expectations,

\begin{align*}
\mathbb{E}[(Y - \hat{Y})^2] &= \mathbb{E}\left[\mathbb{E}[(Y - \hat{Y})^2|X]\right]
\end{align*}

so a predictor that minimizes the conditional mean squared error for every value of X also minimizes the unconditional mean squared error.

In terms of a, b, and the random variable X, the answer is \hat{Y} = a + bX.
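The identity \mathbb{E}[(Y - (a + bX))^2|X] = \mathrm{Var}(Y|X) can be checked numerically as well. A minimal sketch, again under an assumed model (here deliberately heteroskedastic, \mathrm{Var}(Y|X=x) = 0.2 + x^2, so that the irreducible error genuinely varies with x):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 1.5, -2.0, 2_000_000   # illustrative values

# Assumed model: E[Y | X = x] = a + b*x, Var(Y | X = x) = 0.2 + x**2.
X = rng.normal(size=n)
Y = a + b * X + np.sqrt(0.2 + X**2) * rng.normal(size=n)

sq_err = (Y - (a + b * X)) ** 2   # squared error of Y_hat = a + b*X

# Unconditional check: E[(Y - Y_hat)^2] = E[Var(Y|X)] = 0.2 + E[X^2] = 1.2
print(np.mean(sq_err))

# Conditional check in a thin slice around x = 2:
# E[(Y - nu(x))^2 | X = x] = Var(Y | X = 2) = 0.2 + 4 = 4.2
mask = np.abs(X - 2.0) < 0.05
print(np.mean(sq_err[mask]))
```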