# Where can I find a proof for:

SST = SSM + SSE

## SST = ∑X^2 - (∑X)^2/N

...where ∑X^2 = sum of squared scores and (∑X)^2 = square of the summed scores. N = total sample size.

SSM = ∑A^2/n - (∑X)^2/N

...where A = sum of the scores in one group (so A^2 is the square of that group sum) and n = sample size per group, taken to be the same for every group.

SSE = ∑X^2 - ∑A^2/n

SSM + SSE = ∑A^2/n - (∑X)^2/N + ∑X^2 - ∑A^2/n = ∑X^2 - (∑X)^2/N = SST
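
A quick numeric check of these shortcut formulas, as a minimal sketch: the data and the function name `computational_sums` are made up for illustration, and the code assumes every group has the same size n.

```python
# Hypothetical data: three groups with n = 3 scores each.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 7.0]]

def computational_sums(groups):
    """Return (SST, SSM, SSE) from the shortcut formulas (equal group sizes only)."""
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "formulas above assume equal n"
    scores = [x for g in groups for x in g]
    N = len(scores)
    sum_x2 = sum(x * x for x in scores)                    # ∑X^2
    correction = sum(scores) ** 2 / N                      # (∑X)^2/N
    sum_a2_over_n = sum(sum(g) ** 2 for g in groups) / n   # ∑A^2/n
    sst = sum_x2 - correction
    ssm = sum_a2_over_n - correction
    sse = sum_x2 - sum_a2_over_n
    return sst, ssm, sse

sst, ssm, sse = computational_sums(groups)
print(sst, ssm, sse)  # 36.0 24.0 12.0
```

The ∑A^2/n term appears once with a plus sign (in SSM) and once with a minus sign (in SSE), which is why it drops out of the sum.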

I hope this will help.

## Wow. That's right, but how do you get from

SST = ∑((x - MEAN)^2)
to this
SST = ∑(x^2) - (∑(x))^2/N
?

The former is the definition I'm used to. The latter is what you used in your simple proof. I tested it out and they are equal, but can you prove that?

## Got it...

SST = ∑((x - MEAN)^2)
= ∑(x^2 - 2 * x * MEAN + MEAN^2)
= ∑(x^2) - 2 * MEAN * ∑(x) + N * MEAN^2
Substituting MEAN = ∑(x)/N:
= ∑(x^2) - 2 * (∑(x))^2/N + (∑(x))^2/N
= ∑(x^2) - (∑(x))^2/N
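
The same equality can be spot-checked numerically. This is only a sketch on made-up scores; `sst_definition` and `sst_shortcut` are illustrative names, not anything from the thread.

```python
def sst_definition(xs):
    """SST as the sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sst_shortcut(xs):
    """SST as ∑(x^2) - (∑(x))^2/N."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

# Hypothetical scores with mean 4.
xs = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 5.0, 6.0, 7.0]
print(sst_definition(xs), sst_shortcut(xs))  # 36.0 36.0
```

With non-integer data the two results can differ in the last few floating-point digits, so a tolerance-based comparison such as `math.isclose` is safer than `==`.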

Awesome! Thanks!

## OK, SST makes sense, but I can't see how to derive your SSM or SSE formulas:

I get this:

SSM = ∑((MODEL - MEAN)^2)
= ∑(MODEL^2 - 2 * MODEL * MEAN + MEAN^2)
= ∑(MODEL^2) - 2 * MEAN * ∑(MODEL) + N * MEAN^2

SSE = ∑((X - MODEL)^2)
= ∑(X^2 - 2 * X * MODEL + MODEL^2)
= ∑(X^2) - 2 * ∑(X * MODEL) + ∑(MODEL^2)

When I add those two, terms don't cancel out and I don't get SST. What am I missing?

## To find a proof for the equation SST = SSM + SSE, we need some context about what these variables represent. Typically, this equation is used in the analysis of variance (ANOVA) method to decompose the total sum of squares (SST) into two components: the sum of squares due to the model (SSM) and the sum of squares due to the error (SSE).

The equation can be derived by considering the following definitions:

1. Total sum of squares (SST): This represents the total variability in the response variable. It is calculated by summing up the squared differences between each data point and the overall mean of the response variable:

SST = ∑(Yi - Ȳ)²

where Yi is the observed value of the response variable for the i-th data point, and Ȳ is the overall mean of the response variable.

2. Sum of squares due to the model (SSM): This represents the variability explained by the model or the sum of squares accounted for by the independent variables. It is calculated by summing up the squared differences between the predicted values from the model (Ŷi) and the overall mean of the response variable:

SSM = ∑(Ŷi - Ȳ)²

In ANOVA, SSM accounts for the variation between groups or different levels of the independent variables.

3. Sum of squares due to the error (SSE): This represents the variability that is not explained by the model or the sum of squares attributable to random error. It is calculated by summing up the squared differences between the observed values (Yi) and the predicted values from the model (Ŷi):

SSE = ∑(Yi - Ŷi)²

In ANOVA, SSE accounts for the variation within groups or levels of the independent variables.
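
The three definitions can be computed directly as a sanity check. The sketch below assumes the one-way ANOVA case, where the predicted value Ŷi is the mean of the group that observation i belongs to; the data and the name `anova_sums` are hypothetical, and the group sizes are deliberately unequal.

```python
import math

def anova_sums(groups):
    """Return (SST, SSM, SSE) from the definitions, with group means as predictions."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    sst = ssm = sse = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)   # predicted value for every score in g
        for y in g:
            sst += (y - grand_mean) ** 2
            ssm += (group_mean - grand_mean) ** 2
            sse += (y - group_mean) ** 2
    return sst, ssm, sse

# Hypothetical data with unequal group sizes.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 7.0]]
sst, ssm, sse = anova_sums(groups)
print(math.isclose(sst, ssm + sse))  # True
```

The identity holds here because the predictions are group means; an arbitrary set of predicted values would not satisfy it in general.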

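The proof of SST = SSM + SSE starts by splitting each deviation as Yi - Ȳ = (Yi - Ŷi) + (Ŷi - Ȳ), then squaring and summing. This gives SST = SSE + SSM + 2∑(Yi - Ŷi)(Ŷi - Ȳ), so the identity holds exactly when the cross term is zero. In one-way ANOVA, Ŷi is the mean of the group that observation i belongs to, so (Ŷi - Ȳ) is constant within a group and the residuals (Yi - Ŷi) sum to zero within each group; the cross term therefore vanishes. Expanding SSM and SSE separately hides this cancellation: the leftover terms only cancel once ∑(Yi * Ŷi) = ∑(Ŷi²) and ∑Ŷi = N * Ȳ are substituted, both of which follow from Ŷi being a group mean.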
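
For the one-way ANOVA case, the expansion can be written out with explicit group indices. The notation here is an assumption added for illustration: j indexes groups, i indexes observations within group j, n_j is the size of group j, and Ȳ_j is the mean of group j, which plays the role of Ŷi.

```latex
\begin{align*}
\text{SST} &= \sum_{j}\sum_{i}\bigl(Y_{ij}-\bar{Y}\bigr)^2
  = \sum_{j}\sum_{i}\bigl[(Y_{ij}-\bar{Y}_j)+(\bar{Y}_j-\bar{Y})\bigr]^2 \\
 &= \underbrace{\sum_{j}\sum_{i}(Y_{ij}-\bar{Y}_j)^2}_{\text{SSE}}
  + \underbrace{\sum_{j} n_j(\bar{Y}_j-\bar{Y})^2}_{\text{SSM}}
  + 2\sum_{j}(\bar{Y}_j-\bar{Y})\sum_{i}(Y_{ij}-\bar{Y}_j) \\
 &= \text{SSE}+\text{SSM}, \quad\text{since}\quad
  \sum_{i}(Y_{ij}-\bar{Y}_j) = n_j\bar{Y}_j - n_j\bar{Y}_j = 0
  \text{ for every group } j.
\end{align*}
```

The factor (Ȳ_j - Ȳ) can be pulled out of the inner sum because it does not depend on i, which is what makes the last term collapse to zero.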

For a complete step-by-step derivation, a statistics textbook or an online resource on ANOVA is a good reference.