I’m teaching myself linear regression (with some help from family), and I want to expand this code to calculate the MSE between the line of regression and the actual points. I understand the written math, but writing the program to do it is proving difficult.
Here’s “my” code (I edited a GeeksforGeeks tutorial to shorten some things and remove the main() function it provided):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

def estimate_coef(x, y):
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    print(f"Estimated Coefficients:\nb_0 = {b_0}\nb_1 = {b_1}")
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    y_pred = b[0] + b[1]*x
    # plotting regression line
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    print(f"\nCalculated y predictions...\n{y_pred}")

x = np.array([-5, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([-1, 1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b = estimate_coef(x, y)
plot_regression_line(x, y, b)
plt.show()
Ideally I would like to add a function that calculates the mean squared error without my having to type in the values for y_pred manually. I figured I might be able to make plot_regression_line(x, y, b) return the y_pred values it calculates and pass those to mean_squared_error. I’m not sure if that would even work, if that’s the best way to do it, or how I could make that happen.
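A minimal sketch of that idea, assuming NumPy alone is enough: the prediction step is pulled out into its own function so that both the plot and the error calculation can reuse y_pred. The names predict and mse are hypothetical helpers, not part of the tutorial code, and the print statements from the original are left out to keep it short.

```python
import numpy as np

def estimate_coef(x, y):
    # Same least-squares formulas as in the code above.
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def predict(x, b):
    # Hypothetical helper: evaluates the fitted line b_0 + b_1*x at every x.
    return b[0] + b[1] * x

def mse(y, y_pred):
    # Hypothetical helper: mean of the squared residuals.
    return np.mean((y - y_pred) ** 2)

x = np.array([-5, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([-1, 1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b = estimate_coef(x, y)
y_pred = predict(x, b)   # computed once, reusable for plotting and for the error
error = mse(y, y_pred)
print(f"MSE = {error}")
```

With scikit-learn installed, mean_squared_error(y, y_pred) should give the same number as mse(y, y_pred). Returning y_pred from plot_regression_line would also work, but keeping the prediction in its own function means the error can be computed without drawing a plot.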
Thank you for your time and assistance!
submitted by /u/Formula15