Understanding Lines of Regression and Coefficient of Correlation

Regression and correlation are two fundamental statistical concepts used in data analysis.

  • Regression helps us find a relationship between dependent and independent variables. It provides an equation to predict values.
  • Correlation measures the strength and direction of the relationship between two variables.

1. Line of Regression

A regression line is the best-fit (least-squares) straight line describing the relationship between two variables. It is usually written as:

$$y = a + bx$$

Where:

  • $y$ is the dependent variable (predicted value).
  • $x$ is the independent variable.
  • $a$ is the intercept (value of $y$ when $x = 0$).
  • $b$ is the slope (rate of change of $y$ with respect to $x$).

The slope ($b$) is given by:

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$

where $n$ is the number of $(x, y)$ data points.

The intercept ($a$) is given by:

$$a = \frac{\sum y - b \sum x}{n}$$
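The two formulas above translate almost directly into code. The following is a minimal sketch in Python, not a definitive implementation; the function name `regression_line` and its error handling are assumptions made for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x.

    Hypothetical helper: implements the summation formulas for the
    slope b and the intercept a given in the text.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator of the slope formula; zero when every x is the same.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b


# Points lying exactly on y = 1 + 2x should give back a = 1, b = 2.
a, b = regression_line([0, 1, 2], [1, 3, 5])
print(f"a = {a:.2f}, b = {b:.2f}")
```

The explicit check on the denominator is a design choice: without it, a dataset in which every $x$ is the same would fail with a less informative division-by-zero error.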

2. Coefficient of Correlation (r)

The Pearson correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables.

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$

  • $r$ varies between $-1$ and $1$:
    • $r = 1$: Perfect positive correlation.
    • $r = -1$: Perfect negative correlation.
    • $r = 0$: No linear correlation.
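The formula for $r$ can be sketched in Python in the same way. The function name `pearson_r` is an assumption made for illustration, and the code is a direct transcription of the summation formula rather than a numerically hardened routine.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two sequences.

    Hypothetical helper: implements the summation formula for r
    given in the text.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # The two bracketed terms under the square root in the formula.
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when x or y is constant")

    return (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)


# Perfectly linear data gives the two extreme values of r.
print(pearson_r([0, 1, 2], [1, 3, 5]))  # increasing line
print(pearson_r([0, 1, 2], [5, 3, 1]))  # decreasing line
```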

3. Sample Data & Computation

Dataset 1

$x$   $y$
1     2
2     3
3     5
4     4
5     6

Step 1: Compute Needed Values

$x$   $y$   $x^2$   $y^2$   $xy$
1     2     1       4       2
2     3     4       9       6
3     5     9       25      15
4     4     16      16      16
5     6     25      36      30

$$\sum x = 15, \quad \sum y = 20, \quad \sum x^2 = 55, \quad \sum y^2 = 90, \quad \sum xy = 69$$
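The table and its column sums can be reproduced with a few lines of Python. This is a minimal sketch; the variable names `rows` and `totals` are illustrative only.

```python
# Dataset 1 from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

# One row per data point: x, y, x^2, y^2, xy.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

# Sum each column to get the five totals used in the formulas.
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(*row)
print("sums:", *totals)
```

The last line prints the five sums in the order $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$.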

Step 2: Compute Regression Line

Using formulas:

$$b = \frac{5(69) - (15)(20)}{5(55) - (15)^2} = \frac{345 - 300}{275 - 225} = \frac{45}{50} = 0.9$$

$$a = \frac{20 - 0.9(15)}{5} = \frac{20 - 13.5}{5} = \frac{6.5}{5} = 1.3$$

So, the regression equation is:

$$y = 1.3 + 0.9x$$

Step 3: Compute Correlation Coefficient

$$\begin{aligned}
r &= \frac{5(69) - (15)(20)}{\sqrt{[5(55) - (15)^2][5(90) - (20)^2]}} \\
&= \frac{345 - 300}{\sqrt{(275 - 225)(450 - 400)}} \\
&= \frac{45}{\sqrt{50 \times 50}} = \frac{45}{50} = 0.9
\end{aligned}$$

Since $r = 0.9$, there is a strong positive correlation.
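The slope and the correlation coefficient happen to be equal for this dataset. That is not a general rule. A short derivation from the two formulas shows why it occurs here; the shorthand $S_{xx}$, $S_{yy}$ and $S_{xy}$ is introduced only for this note and is not part of the notation used elsewhere in this article.

```latex
S_{xy} = n \sum xy - \sum x \sum y, \qquad
S_{xx} = n \sum x^2 - \Bigl(\sum x\Bigr)^2, \qquad
S_{yy} = n \sum y^2 - \Bigl(\sum y\Bigr)^2

b = \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
\quad\Longrightarrow\quad
b = r \sqrt{\frac{S_{yy}}{S_{xx}}}

% Dataset 1: S_{xx} = 50 and S_{yy} = 50, so the square root is 1 and b = r = 0.9.
```

Whenever $S_{xx}$ and $S_{yy}$ differ, the slope and $r$ differ as well, although they always share the same sign.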


4. Second Sample Dataset

$x$   $y$
10    40
20    30
30    20
40    10
50    5

Step 1: Compute Needed Values

$x$   $y$   $x^2$   $y^2$   $xy$
10    40    100     1600    400
20    30    400     900     600
30    20    900     400     600
40    10    1600    100     400
50    5     2500    25      250

$$\sum x = 150, \quad \sum y = 105, \quad \sum x^2 = 5500, \quad \sum y^2 = 3025, \quad \sum xy = 2250$$

Step 2: Compute Regression Line

$$b = \frac{5(2250) - (150)(105)}{5(5500) - (150)^2} = \frac{11250 - 15750}{27500 - 22500} = \frac{-4500}{5000} = -0.9$$

$$a = \frac{105 - (-0.9)(150)}{5} = \frac{105 + 135}{5} = \frac{240}{5} = 48$$

So, the regression equation is:

$$y = 48 - 0.9x$$

Step 3: Compute Correlation Coefficient

$$\begin{aligned}
r &= \frac{5(2250) - (150)(105)}{\sqrt{[5(5500) - (150)^2][5(3025) - (105)^2]}} \\
&= \frac{11250 - 15750}{\sqrt{(27500 - 22500)(15125 - 11025)}} \\
&= \frac{-4500}{\sqrt{5000 \times 4100}} = \frac{-4500}{\sqrt{20500000}} \approx \frac{-4500}{4527.69} \approx -0.994
\end{aligned}$$

Since $r \approx -0.994$, this indicates a strong negative correlation.


5. Summary of Results

Dataset   Regression Equation   Correlation Coefficient ($r$)
1         $y = 1.3 + 0.9x$      $0.9$ (strong positive)
2         $y = 48 - 0.9x$       $-0.994$ (strong negative)

6. Conclusion

  • A positive correlation ($r > 0$) means $y$ tends to increase as $x$ increases.
  • A negative correlation ($r < 0$) means $y$ tends to decrease as $x$ increases.
  • The regression equation helps predict values based on given input data.
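As a small illustration of prediction, the sketch below evaluates the fitted line for Dataset 1, $y = 1.3 + 0.9x$, at two inputs. The helper name `predict` is an assumption made for this example.

```python
def predict(x, a, b):
    """Evaluate the fitted line y = a + b*x at a given x (hypothetical helper)."""
    return a + b * x


# Coefficients of the Dataset 1 regression line, y = 1.3 + 0.9x.
a1, b1 = 1.3, 0.9

# x = 3 lies inside the observed range of x (1 to 5).
print(f"predicted y at x = 3: {predict(3, a1, b1):.1f}")

# x = 6 lies outside the observed range, so this is an extrapolation.
print(f"predicted y at x = 6: {predict(6, a1, b1):.1f}")
```

Predictions inside the observed range of $x$ are generally more trustworthy than extrapolations, because the data say nothing about whether the linear trend continues beyond that range.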

This explanation provides a clear foundation for writing a program to compute regression and correlation. You can implement this in Python, Java, or any language by following the formulas step-by-step. 🚀


