Even though logistic regression is formulated with continuous input data in mind, one can also try to apply it to categorical inputs. For example, consider the following set-up: We observe \(n\) samples \(Y_i \in \{0, 1\}\), \(i = 1, \dots, n\), and covariates \(X_i \in \{0, 1\}\), \(i = 1, \dots, n\). Moreover, assume that given \(X_i\), the \(Y_i\) are independent.

First, let us apply regular maximum likelihood estimation. To this end, write

\[
\begin{aligned}
f_{00} &= \frac{1}{n} \, \# \{ i : X_i = 0 \text{ and } Y_i = 0 \}, &
f_{01} &= \frac{1}{n} \, \# \{ i : X_i = 0 \text{ and } Y_i = 1 \}, \\
f_{10} &= \frac{1}{n} \, \# \{ i : X_i = 1 \text{ and } Y_i = 0 \}, &
f_{11} &= \frac{1}{n} \, \# \{ i : X_i = 1 \text{ and } Y_i = 1 \},
\end{aligned}
\]
and assume that \(f_{00}, f_{01}, f_{10}, f_{11} > 0\). We can parametrize this model in terms of

\[
p_{01} = P(Y_i = 1 \mid X_i = 0), \qquad p_{11} = P(Y_i = 1 \mid X_i = 1).
\]
Compute the maximum likelihood estimators \(\widehat{p}_{01}\) and \(\widehat{p}_{11}\) for \(p_{01}\) and \(p_{11}\), respectively. Express your answer in terms of \(f_{00}\) (enter “A”), \(f_{01}\) (enter “B”), \(f_{10}\) (enter “C”), \(f_{11}\) (enter “D”) and \(n\).

\(\widehat{p}_{01}\)
B/(A+B)
correct

\(\widehat{p}_{11}\)
D/(C+D)
correct
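
These estimators are simply the conditional relative frequencies: the likelihood factorizes over the two groups \(X_i = 0\) and \(X_i = 1\), and within each group the Bernoulli MLE is the sample mean. Below is a minimal numerical check in Python, assuming numpy is available; the data and the true conditional probabilities (0.3 and 0.7) are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated binary data for illustration; the true conditional probabilities
# p01 = 0.3 and p11 = 0.7 are hypothetical.
n = 10_000
X = rng.integers(0, 2, size=n)
Y = (rng.random(n) < np.where(X == 0, 0.3, 0.7)).astype(int)

# Cell frequencies f_kl = #{i : X_i = k and Y_i = l} / n.
f00 = np.mean((X == 0) & (Y == 0))
f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0))
f11 = np.mean((X == 1) & (Y == 1))

# MLEs are the conditional relative frequencies.
p01_hat = f01 / (f00 + f01)  # B / (A + B)
p11_hat = f11 / (f10 + f11)  # D / (C + D)
print(p01_hat, p11_hat)      # should be close to 0.3 and 0.7
```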

(b)
0/2 points (graded)
Although the \(X_i\) are discrete, we can also use a logistic regression model to analyze the data. That is, we now assume

\[
Y_i \mid X_i \sim \textsf{Ber}\!\left( \frac{1}{1 + e^{-(X_i \beta_1 + \beta_0)}} \right),
\]
for \(\beta_0, \beta_1 \in \mathbb{R}\), and that given \(X_i\), the \(Y_i\) are independent.
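
Since \(X_i\) takes only the values 0 and 1, the model specifies just two conditional success probabilities, \(\sigma(\beta_0)\) and \(\sigma(\beta_0 + \beta_1)\), where \(\sigma(t) = 1/(1 + e^{-t})\) denotes the sigmoid. A minimal sketch of this observation (the parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameter values, for illustration only.
beta0, beta1 = -0.5, 1.2

# With binary X, the model evaluates the sigmoid at just two points:
print(sigmoid(beta0))          # P(Y = 1 | X = 0)
print(sigmoid(beta0 + beta1))  # P(Y = 1 | X = 1)
```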

Calculate the maximum likelihood estimators \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) for \(\beta_0\) and \(\beta_1\), where we again assume that all \(f_{kl} > 0\). Express your answer in terms of \(f_{00}\) (enter “A”), \(f_{01}\) (enter “B”), \(f_{10}\) (enter “C”), \(f_{11}\) (enter “D”) and \(n\).

\(\widehat{\beta}_{0}\)
(A+D)/(A+B+C+D)
incorrect (the correct answer, derived below, is ln(B/A))

\(\widehat{\beta}_{1}\)
(B+D)/(A+B+C+D)
incorrect (the correct answer, derived below, is ln((A*D)/(B*C)))

To calculate the maximum likelihood estimators for \(\beta_0\) and \(\beta_1\), we need to maximize the log-likelihood

\[
\ell(\beta_0, \beta_1) = \sum_{i=1}^{n} \big( Y_i \log p_i + (1 - Y_i) \log(1 - p_i) \big),
\qquad
p_i = \frac{1}{1 + e^{-(X_i \beta_1 + \beta_0)}}.
\]

Taking the derivatives of \(\ell\) with respect to \(\beta_0\) and \(\beta_1\) and setting them to zero gives the score equations

\[
\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} (Y_i - p_i) = 0,
\qquad
\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n} X_i (Y_i - p_i) = 0.
\]

Since \(X_i \in \{0, 1\}\), the fitted probability \(p_i\) takes only two values: with \(\sigma(t) = 1/(1 + e^{-t})\), we have \(p_i = \sigma(\beta_0)\) when \(X_i = 0\) and \(p_i = \sigma(\beta_0 + \beta_1)\) when \(X_i = 1\).

Let's solve the score equations step by step:

1. The second score equation involves only the terms with \(X_i = 1\):

\[
\sum_{i : X_i = 1} \big( Y_i - \sigma(\beta_0 + \beta_1) \big) = 0
\quad \Longrightarrow \quad
\sigma(\widehat{\beta}_0 + \widehat{\beta}_1) = \frac{\#\{ i : X_i = 1, Y_i = 1 \}}{\#\{ i : X_i = 1 \}} = \frac{f_{11}}{f_{10} + f_{11}}.
\]

2. Subtracting the second score equation from the first removes the terms with \(X_i = 1\) and leaves only those with \(X_i = 0\):

\[
\sum_{i : X_i = 0} \big( Y_i - \sigma(\beta_0) \big) = 0
\quad \Longrightarrow \quad
\sigma(\widehat{\beta}_0) = \frac{\#\{ i : X_i = 0, Y_i = 1 \}}{\#\{ i : X_i = 0 \}} = \frac{f_{01}}{f_{00} + f_{01}}.
\]

Note that these fitted probabilities are exactly \(\widehat{p}_{11}\) and \(\widehat{p}_{01}\) from part (a): with a single binary covariate, the logistic model is saturated, so the MLE matches the empirical conditional frequencies.

3. Invert the sigmoid. Since \(\sigma^{-1}(p) = \ln \frac{p}{1 - p}\),

\[
\widehat{\beta}_0 = \ln \frac{f_{01}/(f_{00} + f_{01})}{f_{00}/(f_{00} + f_{01})} = \ln \frac{f_{01}}{f_{00}},
\qquad
\widehat{\beta}_0 + \widehat{\beta}_1 = \ln \frac{f_{11}}{f_{10}},
\]

and therefore

\[
\widehat{\beta}_1 = \ln \frac{f_{11}}{f_{10}} - \ln \frac{f_{01}}{f_{00}} = \ln \frac{f_{00} f_{11}}{f_{01} f_{10}}.
\]

In the letter notation of the answer boxes,

\[
\widehat{\beta}_0 = \ln(B/A),
\qquad
\widehat{\beta}_1 = \ln\!\left( \frac{A \cdot D}{B \cdot C} \right),
\]

i.e. \(\widehat{\beta}_1\) is the empirical log odds ratio.
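
As a sanity check, the following sketch compares the closed-form estimators against a direct numerical maximization of the log-likelihood, assuming numpy and scipy are available; the data and the true parameter values are simulated purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data for illustration; the true values beta0 = -0.5, beta1 = 1.2
# are hypothetical.
n = 50_000
X = rng.integers(0, 2, size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * X)))
Y = (rng.random(n) < p).astype(int)

# Closed-form MLE from the derivation: log odds and log odds ratio.
f00 = np.mean((X == 0) & (Y == 0))
f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0))
f11 = np.mean((X == 1) & (Y == 1))
b0_closed = np.log(f01 / f00)                # ln(B/A)
b1_closed = np.log(f00 * f11 / (f01 * f10))  # ln((A*D)/(B*C))

# Numerical MLE: minimize the negative Bernoulli log-likelihood directly.
def neg_log_lik(beta):
    eta = beta[0] + beta[1] * X
    # -sum_i [ Y_i * eta_i - log(1 + e^{eta_i}) ], computed stably via logaddexp
    return np.sum(np.logaddexp(0.0, eta)) - np.sum(Y * eta)

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(b0_closed, b1_closed)  # closed form
print(res.x)                 # numerical optimum; should agree closely
```

The two printed pairs should agree to several decimal places, as expected: the model is saturated, so the closed form is the exact maximizer of the likelihood.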