Date written: 2023-12-25
Ver 0.1.1
Key Python features introduced in the lecture
- statsmodels.formula.api.ols: https://www.statsmodels.org/dev/generated/statsmodels.formula.api.ols.html
- statsmodels.regression.linear_model.OLS: https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html
- statsmodels.regression.linear_model.OLS.fit: https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.fit.html
Linear regression between shot count and goal count
In [1]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
(1) Visualizing shot and goal counts for example matches
In [2]:
plt.figure(figsize=(10, 6))
plt.rcParams.update({'font.size': 15})
shot_counts = np.array([8, 38, 32, 22, 29, 13, 25, 37, 19, 23])
goal_counts = np.array([1, 5, 4, 3, 1, 0, 2, 7, 2, 2])
plt.scatter(shot_counts, goal_counts)
plt.xlim(0, 40)
plt.ylim(0, 8)
plt.xlabel('Number of shots')
plt.ylabel('Number of goals')
plt.show()
(2) Fitting a linear relationship between shot count and goal count - with intercept
In [3]:
data = pd.DataFrame({'x': shot_counts, 'y': goal_counts})
smf.ols(formula='y ~ x', data=data)
Out[3]:
<statsmodels.regression.linear_model.OLS at 0x16859bc10>
In [4]:
data = pd.DataFrame({'x': shot_counts, 'y': goal_counts})
model_fit = smf.ols(formula='y ~ x', data=data).fit()
print(model_fit.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.652
Model:                            OLS   Adj. R-squared:                  0.608
Method:                 Least Squares   F-statistic:                     14.98
Date:                Mon, 18 Dec 2023   Prob (F-statistic):            0.00474
Time:                        19:10:01   Log-Likelihood:                -15.857
No. Observations:                  10   AIC:                             35.71
Df Residuals:                       8   BIC:                             36.32
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.5930      1.185     -1.344      0.216      -4.326       1.140
x              0.1745      0.045      3.871      0.005       0.071       0.278
==============================================================================
Omnibus:                        1.183   Durbin-Watson:                   2.011
Prob(Omnibus):                  0.554   Jarque-Bera (JB):                0.103
Skew:                          -0.240   Prob(JB):                        0.950
Kurtosis:                       3.125   Cond. No.                         74.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
/Users/limjongjun/opt/anaconda3/envs/class101/lib/python3.8/site-packages/scipy/stats/_stats_py.py:1736: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
In [5]:
model_fit.params
Out[5]:
Intercept   -1.592964
x            0.174511
dtype: float64
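The coefficients in Out[5] can be cross-checked against the textbook closed-form solution for simple linear regression, slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The following is a minimal sketch using only NumPy; the variable names `s_xy`, `s_xx`, `slope`, and `intercept` are chosen here for illustration and do not come from the lecture.

```python
import numpy as np

shot_counts = np.array([8, 38, 32, 22, 29, 13, 25, 37, 19, 23])
goal_counts = np.array([1, 5, 4, 3, 1, 0, 2, 7, 2, 2])

x_mean = shot_counts.mean()
y_mean = goal_counts.mean()

# Sums of cross-deviations and squared deviations around the means
s_xy = np.sum((shot_counts - x_mean) * (goal_counts - y_mean))
s_xx = np.sum((shot_counts - x_mean) ** 2)

slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print(f"Intercept {intercept:.6f}")
print(f"x         {slope:.6f}")
```

The printed values should match the `Intercept` and `x` entries of `model_fit.params` to the displayed precision.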
(3) Comparing the regression line with the observations - with intercept
In [6]:
plt.figure(figsize=(10, 6))
plt.scatter(shot_counts, goal_counts, c='black', s=50)
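a = model_fit.params['Intercept']  # intercept, looked up by label
b = model_fit.params['x']  # slope, looked up by label
x = np.arange(40, step=0.1)
y = a + b * x  # regression line
plt.plot(x, y, c='black')
# Draw each residual as a red vertical segment from the observation to the fitted line
for i, n in enumerate(shot_counts):
    plt.plot([n, n], [goal_counts[i], a + b * n], c='red')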
plt.xlim(0, 40)
plt.hlines(0, 0, 40, linestyles='--', color='black')
plt.xlabel('Number of shots')
plt.ylabel('Number of goals')
plt.show()
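- A negative y-intercept is not meaningful here because the number of goals cannot be less than 0, so the model should be refit without the intercept term.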
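To make the problem with the negative intercept concrete, the sketch below plugs a few small shot counts into the fitted line and finds where it crosses zero. The coefficients are copied from Out[5]; treating those rounded values as exact is an assumption of this example, and `zero_crossing` is a name introduced here.

```python
# Coefficients copied from Out[5] (rounded to six decimals).
intercept = -1.592964
slope = 0.174511

# Predicted goals for a few small shot counts
for shots in (0, 5, 9, 10):
    predicted = intercept + slope * shots
    print(f"shots={shots:2d}  predicted goals={predicted:.3f}")

# Shot count at which the fitted line crosses zero goals
zero_crossing = -intercept / slope
print(f"zero crossing at about {zero_crossing:.2f} shots")
```

Under this fit, any team with roughly nine or fewer shots is predicted to score a negative number of goals, which is the behaviour the note above objects to.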
(4) Fitting a linear relationship between shot count and goal count - without intercept
In [7]:
data = pd.DataFrame({'x': shot_counts, 'y': goal_counts})
model_fit = smf.ols(formula='y ~ x - 1', data=data).fit()  # 'y ~ x - 1' drops the intercept term
print(model_fit.summary())
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.849
Model:                            OLS   Adj. R-squared (uncentered):              0.832
Method:                 Least Squares   F-statistic:                              50.44
Date:                Mon, 18 Dec 2023   Prob (F-statistic):                    5.66e-05
Time:                        19:10:01   Log-Likelihood:                         -16.875
No. Observations:                  10   AIC:                                      35.75
Df Residuals:                       9   BIC:                                      36.05
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.1178      0.017      7.102      0.000       0.080       0.155
==============================================================================
Omnibus:                        2.087   Durbin-Watson:                   1.801
Prob(Omnibus):                  0.352   Jarque-Bera (JB):                0.385
Skew:                           0.451   Prob(JB):                        0.825
Kurtosis:                       3.334   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
/Users/limjongjun/opt/anaconda3/envs/class101/lib/python3.8/site-packages/scipy/stats/_stats_py.py:1736: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
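Note [1] in the summary matters when reading the two fits side by side: the uncentered R² of 0.849 measures variation around zero, while the 0.652 of the intercept model measures variation around the mean of y, so the two numbers are not directly comparable. The sketch below is an illustration rather than lecture code. It refits both models with `np.linalg.lstsq` and reports the residual sum of squares, which is on the same scale for both. All variable names are introduced here.

```python
import numpy as np

x = np.array([8, 38, 32, 22, 29, 13, 25, 37, 19, 23], dtype=float)
y = np.array([1, 5, 4, 3, 1, 0, 2, 7, 2, 2], dtype=float)

# Least-squares fit with an intercept column and without one
X_with = np.column_stack([np.ones_like(x), x])
coef_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
X_without = x[:, None]
coef_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)

# Residual sums of squares (directly comparable between the two models)
ssr_with = np.sum((y - X_with @ coef_with) ** 2)
ssr_without = np.sum((y - X_without @ coef_without) ** 2)

# Centered R² uses variation around mean(y); uncentered R² uses variation around 0
r2_centered = 1 - ssr_with / np.sum((y - y.mean()) ** 2)
r2_uncentered = 1 - ssr_without / np.sum(y ** 2)

print(f"with intercept:    SSR={ssr_with:.3f}  R2 (centered)={r2_centered:.3f}")
print(f"without intercept: SSR={ssr_without:.3f}  R2 (uncentered)={r2_uncentered:.3f}")
```

The model without an intercept reports the larger R² yet has the larger residual sum of squares, so its higher R² should not be read as a closer fit to the data.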
In [8]:
model_fit.params
Out[8]:
x    0.1178
dtype: float64
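For a line forced through the origin, the least-squares slope has a particularly simple closed form, b = Σ(x·y) / Σ(x²). The short check below applies this standard formula to the same data; `slope_origin` is a name chosen for this sketch.

```python
import numpy as np

shot_counts = np.array([8, 38, 32, 22, 29, 13, 25, 37, 19, 23])
goal_counts = np.array([1, 5, 4, 3, 1, 0, 2, 7, 2, 2])

# Least-squares slope for y = b * x (no intercept term)
slope_origin = np.sum(shot_counts * goal_counts) / np.sum(shot_counts ** 2)

print(f"x    {slope_origin:.4f}")
```

The result should agree with the `x` coefficient in Out[8], which can be read as roughly 0.12 goals per shot under this model.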
(5) Comparing the regression line with the observations - without intercept
In [9]:
plt.figure(figsize=(10, 6))
plt.scatter(shot_counts, goal_counts, c='black', s=50)
b = model_fit.params['x']  # slope of the no-intercept model, looked up by label
x = np.arange(40, step=0.1)
y = b * x
plt.plot(x, y, c='black')
# Draw each residual as a red vertical segment from the observation to the fitted line
for i, n in enumerate(shot_counts):
    plt.plot([n, n], [goal_counts[i], b * n], c='red')
plt.xlim(0, 40)
plt.ylim(0, 8)
plt.xlabel('Number of shots')
plt.ylabel('Number of goals')
plt.show()
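The red segments in the plot are the residuals of the no-intercept fit. As a numeric companion to the figure, the sketch below tabulates observed goals, fitted goals, and residuals for each match. The slope is copied from Out[8]; treating that rounded value as exact is an assumption, and the names `rows` and `largest` are introduced here.

```python
# Slope copied from Out[8] (rounded); an assumption of this sketch.
slope = 0.1178

shot_counts = [8, 38, 32, 22, 29, 13, 25, 37, 19, 23]
goal_counts = [1, 5, 4, 3, 1, 0, 2, 7, 2, 2]

# residual = observed goals - fitted goals (the signed length of each red segment)
rows = []
for shots, goals in zip(shot_counts, goal_counts):
    fitted = slope * shots
    rows.append((shots, goals, fitted, goals - fitted))

print(f"{'shots':>5} {'goals':>5} {'fitted':>7} {'residual':>9}")
for shots, goals, fitted, residual in sorted(rows):
    print(f"{shots:>5} {goals:>5} {fitted:>7.2f} {residual:>9.2f}")

# Match whose observation lies farthest from the fitted line
largest = max(rows, key=lambda row: abs(row[3]))
print(f"largest residual: {largest[3]:.2f} at {largest[0]} shots")
```

The largest gaps belong to the matches with 37 shots (7 goals, well above the line) and 29 shots (1 goal, well below it), which correspond to the longest red segments in the figure.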