Even though logistic regression is formulated with continuous input data in mind, one can also try to apply it to categorical inputs. For example, consider the following set-up: We observe \(n\) samples \(Y_i \in \{0, 1\}\), \(i = 1, \dots, n\), and covariates \(X_i \in \{0, 1\}\), \(i = 1, \dots, n\). Moreover, assume that given \(X_i\), the \(Y_i\) are independent.

First, let us apply regular maximum likelihood estimation. To this end, write

\[
\begin{aligned}
f_{00} &= \frac{1}{n} \, \# \{ i : X_i = 0 \text{ and } Y_i = 0 \}, &
f_{01} &= \frac{1}{n} \, \# \{ i : X_i = 0 \text{ and } Y_i = 1 \}, \\
f_{10} &= \frac{1}{n} \, \# \{ i : X_i = 1 \text{ and } Y_i = 0 \}, &
f_{11} &= \frac{1}{n} \, \# \{ i : X_i = 1 \text{ and } Y_i = 1 \},
\end{aligned}
\]
and assume that \(f_{00}, f_{01}, f_{10}, f_{11} > 0\). We can parametrize this model in terms of

\[
p_{01} = P(Y_i = 1 \mid X_i = 0), \qquad p_{11} = P(Y_i = 1 \mid X_i = 1).
\]
Compute the maximum likelihood estimators \(\widehat{p}_{01}\) and \(\widehat{p}_{11}\) for \(p_{01}\) and \(p_{11}\), respectively. Express your answer in terms of \(f_{00}\) (enter “A”), \(f_{01}\) (enter “B”), \(f_{10}\) (enter “C”), \(f_{11}\) (enter “D”) and \(n\).

\(\widehat{p}_{01}\)
B/(A+B)
correct

\(\widehat{p}_{11}\)
D/(C+D)
correct
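
These estimators are simply the conditional relative frequencies: the likelihood factorizes over the two groups \(X_i = 0\) and \(X_i = 1\), and within each group the Bernoulli MLE is the sample mean. Below is a minimal numerical check in Python, assuming numpy is available; the data and the true conditional probabilities (0.3 and 0.7) are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated binary data for illustration; the true conditional probabilities
# p01 = 0.3 and p11 = 0.7 are hypothetical.
n = 10_000
X = rng.integers(0, 2, size=n)
Y = (rng.random(n) < np.where(X == 0, 0.3, 0.7)).astype(int)

# Cell frequencies f_kl = #{i : X_i = k and Y_i = l} / n.
f00 = np.mean((X == 0) & (Y == 0))
f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0))
f11 = np.mean((X == 1) & (Y == 1))

# MLEs are the conditional relative frequencies.
p01_hat = f01 / (f00 + f01)  # B / (A + B)
p11_hat = f11 / (f10 + f11)  # D / (C + D)
print(p01_hat, p11_hat)      # should be close to 0.3 and 0.7
```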

(b)
0/2 points (graded)
Although the \(X_i\) are discrete, we can also use a logistic regression model to analyze the data. That is, we now assume

\[
Y_i \mid X_i \sim \textsf{Ber}\!\left( \frac{1}{1 + e^{-(X_i \beta_1 + \beta_0)}} \right),
\]
for \(\beta_0, \beta_1 \in \mathbb{R}\), and that given \(X_i\), the \(Y_i\) are independent.
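
Since \(X_i\) takes only the values 0 and 1, the model specifies just two conditional success probabilities, \(\sigma(\beta_0)\) and \(\sigma(\beta_0 + \beta_1)\), where \(\sigma(t) = 1/(1 + e^{-t})\) denotes the sigmoid. A minimal sketch of this observation (the parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameter values, for illustration only.
beta0, beta1 = -0.5, 1.2

# With binary X, the model evaluates the sigmoid at just two points:
print(sigmoid(beta0))          # P(Y = 1 | X = 0)
print(sigmoid(beta0 + beta1))  # P(Y = 1 | X = 1)
```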

Calculate the maximum likelihood estimators \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) for \(\beta_0\) and \(\beta_1\), where we again assume that all \(f_{kl} > 0\). Express your answer in terms of \(f_{00}\) (enter “A”), \(f_{01}\) (enter “B”), \(f_{10}\) (enter “C”), \(f_{11}\) (enter “D”) and \(n\).

\(\widehat{\beta}_{0}\)
(A+D)/(A+B+C+D)
incorrect (the correct answer, derived below, is ln(B/A))

\(\widehat{\beta}_{1}\)
(B+D)/(A+B+C+D)
incorrect (the correct answer, derived below, is ln((A*D)/(B*C)))

To calculate the maximum likelihood estimators for \(\beta_0\) and \(\beta_1\), we need to maximize the log-likelihood

\[
\ell(\beta_0, \beta_1) = \sum_{i=1}^{n} \big( Y_i \log p_i + (1 - Y_i) \log(1 - p_i) \big),
\qquad
p_i = \frac{1}{1 + e^{-(X_i \beta_1 + \beta_0)}}.
\]

Taking the derivatives of \(\ell\) with respect to \(\beta_0\) and \(\beta_1\) and setting them to zero gives the score equations

\[
\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} (Y_i - p_i) = 0,
\qquad
\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n} X_i (Y_i - p_i) = 0.
\]

Since \(X_i \in \{0, 1\}\), the fitted probability \(p_i\) takes only two values: with \(\sigma(t) = 1/(1 + e^{-t})\), we have \(p_i = \sigma(\beta_0)\) when \(X_i = 0\) and \(p_i = \sigma(\beta_0 + \beta_1)\) when \(X_i = 1\).

Let's solve the score equations step by step:

1. The second score equation involves only the terms with \(X_i = 1\):

\[
\sum_{i : X_i = 1} \big( Y_i - \sigma(\beta_0 + \beta_1) \big) = 0
\quad \Longrightarrow \quad
\sigma(\widehat{\beta}_0 + \widehat{\beta}_1) = \frac{\#\{ i : X_i = 1, Y_i = 1 \}}{\#\{ i : X_i = 1 \}} = \frac{f_{11}}{f_{10} + f_{11}}.
\]

2. Subtracting the second score equation from the first removes the terms with \(X_i = 1\) and leaves only those with \(X_i = 0\):

\[
\sum_{i : X_i = 0} \big( Y_i - \sigma(\beta_0) \big) = 0
\quad \Longrightarrow \quad
\sigma(\widehat{\beta}_0) = \frac{\#\{ i : X_i = 0, Y_i = 1 \}}{\#\{ i : X_i = 0 \}} = \frac{f_{01}}{f_{00} + f_{01}}.
\]

Note that these fitted probabilities are exactly \(\widehat{p}_{11}\) and \(\widehat{p}_{01}\) from part (a): with a single binary covariate, the logistic model is saturated, so the MLE matches the empirical conditional frequencies.

3. Invert the sigmoid. Since \(\sigma^{-1}(p) = \ln \frac{p}{1 - p}\),

\[
\widehat{\beta}_0 = \ln \frac{f_{01}/(f_{00} + f_{01})}{f_{00}/(f_{00} + f_{01})} = \ln \frac{f_{01}}{f_{00}},
\qquad
\widehat{\beta}_0 + \widehat{\beta}_1 = \ln \frac{f_{11}}{f_{10}},
\]

and therefore

\[
\widehat{\beta}_1 = \ln \frac{f_{11}}{f_{10}} - \ln \frac{f_{01}}{f_{00}} = \ln \frac{f_{00} f_{11}}{f_{01} f_{10}}.
\]

In the letter notation of the answer boxes,

\[
\widehat{\beta}_0 = \ln(B/A),
\qquad
\widehat{\beta}_1 = \ln\!\left( \frac{A \cdot D}{B \cdot C} \right),
\]

i.e. \(\widehat{\beta}_1\) is the empirical log odds ratio.
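
As a sanity check, the following sketch compares the closed-form estimators against a direct numerical maximization of the log-likelihood, assuming numpy and scipy are available; the data and the true parameter values are simulated purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data for illustration; the true values beta0 = -0.5, beta1 = 1.2
# are hypothetical.
n = 50_000
X = rng.integers(0, 2, size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * X)))
Y = (rng.random(n) < p).astype(int)

# Closed-form MLE from the derivation: log odds and log odds ratio.
f00 = np.mean((X == 0) & (Y == 0))
f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0))
f11 = np.mean((X == 1) & (Y == 1))
b0_closed = np.log(f01 / f00)                # ln(B/A)
b1_closed = np.log(f00 * f11 / (f01 * f10))  # ln((A*D)/(B*C))

# Numerical MLE: minimize the negative Bernoulli log-likelihood directly.
def neg_log_lik(beta):
    eta = beta[0] + beta[1] * X
    # -sum_i [ Y_i * eta_i - log(1 + e^{eta_i}) ], computed stably via logaddexp
    return np.sum(np.logaddexp(0.0, eta)) - np.sum(Y * eta)

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(b0_closed, b1_closed)  # closed form
print(res.x)                 # numerical optimum; should agree closely
```

The two printed pairs should agree to several decimal places, as expected: the model is saturated, so the closed form is the exact maximizer of the likelihood.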