Let us say we have a dataset of some individuals with their age, body mass index (BMI), and the amount they spend on medical expenses in a month. With this insight into the individuals' characteristics, we want to find out how age and BMI affect medical expenses, and then use regression to estimate or predict the average medical expenses for specific individuals.
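As a rough illustration of that idea, here is a minimal sketch that fits expenses on age and BMI with ordinary least squares. The numbers are made up for the example (expenses are constructed as exactly 50 + 2*age + 3*bmi), and the variable names are assumptions, not part of any dataset mentioned here.

```python
import numpy as np

# Made-up illustrative values, NOT real data. Expenses are constructed as
# exactly 50 + 2*age + 3*bmi so the fitted coefficients are easy to check.
age = np.array([25.0, 32.0, 41.0, 50.0, 58.0, 63.0])
bmi = np.array([22.0, 27.5, 24.0, 30.0, 26.5, 31.0])
expenses = 50.0 + 2.0 * age + 3.0 * bmi

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(age), age, bmi])

# Ordinary least squares: find coef minimising ||X @ coef - expenses||^2.
coef, *_ = np.linalg.lstsq(X, expenses, rcond=None)
intercept, b_age, b_bmi = coef

# Predicted monthly expenses for a hypothetical 45-year-old with BMI 28.
predicted = intercept + b_age * 45 + b_bmi * 28
print("intercept, age, bmi coefficients:", intercept, b_age, b_bmi)
print("predicted expenses:", predicted)
```

With real data the points will not lie exactly on a plane, so the coefficients describe the average effect of each variable rather than an exact rule.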
Our goal is to predict the median value of homes (MEDV) using the independent variables. Take a moment to consider how each variable logically relates to MEDV:

INDUS – the proportion of non-retail business acres per town.
CHAS – Charles River dummy variable (1 if the tract bounds the river, otherwise 0).
NOX – nitric oxides concentration (parts per 10 million).
RM – the average number of rooms per dwelling.
AGE – the proportion of owner-occupied units built prior to 1940.
DIS – weighted distances to five Boston employment centres.
RAD – index of accessibility to radial highways.
TAX – full-value property-tax rate per $10,000.

Press “OK” and the regression analysis is done. Now we will visit each section of the regression output to deepen our understanding.

“Regression Statistics” tells how well the model captures the relationship between the independent variables and the target variable.

Multiple R – also known as the correlation coefficient. It tells the strength of the linear relationship. A correlation coefficient in general ranges from -1 to +1; Multiple R is the square root of R-square, so here it lies between 0 and 1.

R-Square – tells how close the data are to the fitted regression line. It is also known as the “coefficient of determination”.
R-square = Explained variation / Total variation
Explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y. Total variation is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y. From the ANOVA table, Explained variation = 31243.14662 and Total variation = 42716.29542, which gives R-square ≈ 0.7314.

Adjusted R-square – R-square penalized for every new variable added to the model. It only increases if the new predictor improves the model.
Adjusted R-square = 1 – ( (n – 1) / (n – (k + 1)) ) * (1 – R-square)
n is the number of observations (n = 506), and k is the number of independent variables used in the model (k = 13).

Standard Error – the standard deviation of the observed y-values about the predicted ŷ-value for a given x-value.
Standard Error = SQRT(Unexplained variation / (n – (k + 1)))
From ANOVA, Unexplained variation = 11473.14919 and n – (k + 1) = 492, so the Standard Error is approximately 4.83.

The same kind of model can be fitted in Python with scikit-learn:

    import numpy as np
    import pandas as pd
    from sklearn import metrics
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    housing = pd.read_csv(".data/housing.csv")

    # Splitting target variable and independent variables
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 4)

    coefficients = pd.DataFrame([…]).T
    coefficients = coefficients.rename(columns=…)

    print('MAE:', metrics.mean_absolute_error(y_train, y_pred))
    print('MSE:', metrics.mean_squared_error(y_train, y_pred))
    print('RMSE:', np.sqrt(metrics.mean_squared_error(y_train, y_pred)))
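The regression statistics quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the ANOVA figures given in the text. It assumes the residual degrees of freedom are n - k - 1 = 492 (the value the text quotes) and uses the standard adjusted R-square formula. The variable names are chosen for illustration.

```python
import math

# Figures quoted in the text from the ANOVA table.
explained = 31243.14662    # explained (regression) sum of squares
total = 42716.29542        # total sum of squares
unexplained = 11473.14919  # unexplained (residual) sum of squares
n = 506                    # number of observations
k = 13                     # number of independent variables

# R-square: share of the total variation explained by the model.
r_square = explained / total

# Multiple R: the non-negative square root of R-square.
multiple_r = math.sqrt(r_square)

# Residual degrees of freedom (assumed n - k - 1).
dof = n - (k + 1)

# Adjusted R-square: penalises R-square for each added predictor.
adjusted_r_square = 1 - (n - 1) / dof * (1 - r_square)

# Standard error of the regression.
standard_error = math.sqrt(unexplained / dof)

print(f"Multiple R        : {multiple_r:.4f}")
print(f"R-square          : {r_square:.4f}")
print(f"Adjusted R-square : {adjusted_r_square:.4f}")
print(f"Standard Error    : {standard_error:.4f}")
```

Because the adjusted value divides by the residual degrees of freedom, it is always slightly below R-square whenever k is at least 1 and the fit is not perfect.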