Supplement 2.1: Point Clouds and Linear Regression Lines (2/3)

Best-fit straight lines passing through the origin

Our next step is to differentiate S(a,b) with respect to a.

For simplicity, we consider a point cloud whose centroid $\bar{P}(\bar{x}', \bar{y}')$ lies at the origin of an $(x', y')$ coordinate system. Since the centroid is a point on the regression line, one has $b = 0$ and

$$S(a) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i' - a x_i' \right)^2$$

Differentiated: $$S'(a) = \frac{2}{n} \left( a \sum_{i=1}^{n} {x_i'}^{2} - \sum_{i=1}^{n} x_i' y_i' \right)$$

and setting S'(a)=0 yields:

$$a = \frac{\sum_{i=1}^{n} x_i' y_i'}{\sum_{i=1}^{n} {x_i'}^{2}},$$

where $\sum_{i=1}^{n} {x_i'}^{2} \neq 0$ is a necessary condition.
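The differentiation above skips the intermediate chain-rule step. Written out in full, as a worked restatement of the same equations (the second-derivative check is an added remark, not part of the original text), it reads:

```latex
\begin{align*}
S(a)   &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i' - a x_i'\right)^2 \\
S'(a)  &= \frac{1}{n}\sum_{i=1}^{n} 2\left(y_i' - a x_i'\right)\left(-x_i'\right)
        = \frac{2}{n}\left(a\sum_{i=1}^{n}{x_i'}^{2} - \sum_{i=1}^{n}x_i' y_i'\right) \\
S'(a)  &= 0
        \;\Longrightarrow\;
        a\sum_{i=1}^{n}{x_i'}^{2} = \sum_{i=1}^{n}x_i' y_i'
        \;\Longrightarrow\;
        a = \frac{\sum_{i=1}^{n}x_i' y_i'}{\sum_{i=1}^{n}{x_i'}^{2}} \\
S''(a) &= \frac{2}{n}\sum_{i=1}^{n}{x_i'}^{2} > 0
\end{align*}
```

Since the second derivative is positive whenever the sum of squares is nonzero, the stationary point is a minimum of S(a).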

We consider n points $P(x_1', y_1'), \ldots, P(x_n', y_n')$, their centroid $\bar{P}(\bar{x}', \bar{y}')$ lying at the origin of the coordinate system. Then the best-fit line $y' = a x'$ passes through the origin and has the slope:

$$a = \frac{\sum_{i=1}^{n} x_i' y_i'}{\sum_{i=1}^{n} {x_i'}^{2}}$$
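As a minimal sketch of this result, the following Python snippet computes the slope for points that are already centered on the origin and compares the mean squared residual S(a) at the computed slope with nearby slopes. The function names `slope_through_origin` and `mean_squared_residual` and the sample points are illustrative assumptions, not part of the original text.

```python
def slope_through_origin(xs, ys):
    """Slope a of the best-fit line y' = a*x' for points centered on the origin."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        # Necessary condition from the text: the sum of x_i'^2 must be nonzero.
        raise ValueError("sum of squared x values is zero; slope is undefined")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / sum_xx


def mean_squared_residual(a, xs, ys):
    """S(a) = (1/n) * sum of (y_i' - a*x_i')^2."""
    return sum((y - a * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample data whose centroid is (0, 0).
xs = [-1.0, 0.0, 1.0]
ys = [-1.0, 1.0, 0.0]

a = slope_through_origin(xs, ys)
print("slope:", a)
print("S(a):", mean_squared_residual(a, xs, ys))
print("S(a + 0.1):", mean_squared_residual(a + 0.1, xs, ys))
print("S(a - 0.1):", mean_squared_residual(a - 0.1, xs, ys))
```

Perturbing the slope in either direction should give a larger value of S, which is what one expects at the minimum.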

Best-fit straight lines with intercept b

We set $x_i' = x_i - \bar{x}$ and $y_i' = y_i - \bar{y}$, which corresponds to a shift of coordinates: the origin of the $(x', y')$ coordinates used above lies at the point $(\bar{x}, \bar{y})$ of the new $(x, y)$ coordinates.

We consider n points $P(x_1, y_1), \ldots, P(x_n, y_n)$ with a centroid at $P(\bar{x}, \bar{y})$.

It follows:

The slope of a best-fit line for a point cloud $P(x_1, y_1), \ldots, P(x_n, y_n)$ having its centroid at point $P(\bar{x}, \bar{y})$ is:

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Since the centroid lies on the best-fit line, one can use the point-slope form of a straight line to calculate:

$$y - \bar{y} = a (x - \bar{x})$$

The intercept follows from:

$$b = \bar{y} - a \bar{x}$$
Figure: Data points, their centroid $P(\bar{x}, \bar{y})$ and linear best-fit line $f(x) = ax + b$.
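The slope and intercept formulas can be combined into a short Python sketch. The function name `best_fit_line` and the sample points are hypothetical, chosen only for illustration; the code assumes the points are given as (x, y) pairs and that not all x values are equal.

```python
def best_fit_line(points):
    """Return (a, b) of the least-squares line f(x) = a*x + b for (x, y) pairs."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")

    # Centroid of the point cloud.
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n

    # Sums over the coordinates shifted to the centroid.
    sum_xx = sum((x - x_bar) ** 2 for x, _ in points)
    if sum_xx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)

    a = sum_xy / sum_xx      # slope
    b = y_bar - a * x_bar    # intercept, from the point-slope form through the centroid
    return a, b


# Hypothetical sample data with centroid (2, 2).
points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
a, b = best_fit_line(points)
print(f"f(x) = {a} * x + {b}")
```

For these sample points the shifted sums are 1 (numerator) and 2 (denominator), so the slope is 0.5 and the intercept is 2 - 0.5 * 2 = 1. The resulting line passes through the centroid, as the derivation requires.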