I'm trying to follow a research paper

The paper shows an equation to minimize. That makes perfect sense. Then, the paper says:

"The optimal solution to the least squares problem [above] is found by differentiation as a solution of a linear system of equations."

I am very familiar with traditional linear algebra least squares:

y = xb + e
solve for coefficients b by:
b = (x^T * x)^-1 * x^T * y
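In code, I would compute that like this (numpy, toy numbers just to show what I mean by regular least squares):

```python
import numpy as np

# Toy data: fit y = b0 + b1 * t by ordinary least squares.
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # design matrix with an intercept column
y = np.array([1.0, 2.0, 2.9])    # observations

b = np.linalg.inv(x.T @ x) @ x.T @ y  # b = (x^T * x)^-1 * x^T * y
print(b)
```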

I understand the equation to be minimized, but I don't understand the formulas that follow, and I think the paper is using some alternative least squares approach that I am not familiar with. It doesn't look like the paper is doing regular least squares (as I know it). Any ideas what the paper is referring to? How does differentiation lead to a linear system of equations?

Differentiation also comes into play when the model is nonlinear in its parameters. To linearize the equation, partial derivatives of the model with respect to each parameter are used to construct the design matrix (in your notation, that is the x matrix); b would be the unknown parameter vector, e the residual vector, and y your observation vector.
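To illustrate that nonlinear case, here is a minimal Gauss-Newton-style sketch (numpy; the exponential model, the data, and the function names are all hypothetical, just to show how the partial derivatives form the design matrix):

```python
import numpy as np

# Hypothetical nonlinear model y = a * exp(k * t) with parameters (a, k).
def model(theta, t):
    a, k = theta
    return a * np.exp(k * t)

# Partial derivatives of the model w.r.t. each parameter form the
# design matrix (Jacobian) of the linearized problem.
def design_matrix(theta, t):
    a, k = theta
    J = np.empty((t.size, 2))
    J[:, 0] = np.exp(k * t)          # d(model)/d(a)
    J[:, 1] = a * t * np.exp(k * t)  # d(model)/d(k)
    return J

# One Gauss-Newton step: solve the linearized normal equations
# (J^T J) * delta = J^T * e for the parameter update delta.
def gauss_newton_step(theta, t, y):
    e = y - model(theta, t)          # residual vector
    J = design_matrix(theta, t)
    delta = np.linalg.solve(J.T @ J, J.T @ e)
    return theta + delta

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * t)            # noiseless synthetic observations
theta = np.array([1.0, 1.0])         # initial guess
for _ in range(10):
    theta = gauss_newton_step(theta, t, y)
print(theta)                         # converges toward (2.0, 1.5)
```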

Based on the information you provided, the paper is most likely describing the standard calculus derivation of the least squares solution. Rather than quoting the closed-form formula directly, it obtains the optimal solution by differentiation, and the two turn out to be the same thing.

To understand this approach, let's break it down step by step:

1. Start with the least squares problem you mentioned:
y = xb + e

2. The goal is to find the coefficients b that minimize the sum of squares of the residuals (e). In traditional least squares, this is often done using the formula you mentioned:
b = (x^T * x)^-1 * x^T * y

3. The paper gets there by differentiation instead: the idea is to minimize the sum of squares of the residuals by finding the values of b at which the derivative of the sum-of-squares error function is zero.

4. The error function in this case is the sum of squares of the residuals, which can be represented as:
E = (y - xb)^T * (y - xb)
= y^T * y - 2 * b^T * x^T * y + b^T * x^T * x * b
(the two cross terms combine because y^T * x * b is a scalar, so it equals its own transpose b^T * x^T * y).

5. To minimize this error function using differentiation, we take the derivative of E with respect to b and set it equal to zero. This yields a linear system of equations, specifically:
0 = -2 * x^T * y + 2 * x^T * x * b

6. Rearranging the equation, we get:
x^T * x * b = x^T * y

7. This is a linear system of equations (the normal equations), which can be solved with ordinary matrix algebra to find the optimal coefficients b, exactly as in traditional least squares; a quick numerical check follows this list. In this case, you can solve for b using:
b = (x^T * x)^-1 * x^T * y
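To make the equivalence concrete, here is a minimal numerical check (numpy, with made-up data; the variable names just mirror the notation above):

```python
import numpy as np

# Small synthetic example, just to check the algebra.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))                 # design matrix
b_true = np.array([2.0, -1.0, 0.5])
y = x @ b_true + 0.01 * rng.normal(size=50)  # observations with a little noise

# Solve the normal equations x^T * x * b = x^T * y directly...
b_normal = np.linalg.solve(x.T @ x, x.T @ y)

# ...and compare with a library least squares solution.
b_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

print(b_normal)  # both agree to numerical precision
print(b_lstsq)
```

In practice you would solve the normal equations with np.linalg.solve (or use np.linalg.lstsq) rather than forming (x^T * x)^-1 explicitly, since the explicit inverse is slower and less numerically stable.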

So, the paper is not using an alternative method after all: differentiation is exactly how the formula you already know is derived. Setting the derivative of the squared-error function to zero produces the normal equations x^T * x * b = x^T * y, and solving that linear system of equations yields the familiar closed-form solution.