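This article introduces autoregressive modeling with the statsmodels AutoReg model. It also covers ar_select_order, which helps select a model that minimizes an information criterion such as AIC. An autoregressive model has the form $$ y_{t}=\delta+\phi_{1} y_{t-1}+\ldots+\phi_{p} y_{t-p}+\epsilon_{t} $$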
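To make the AR(p) equation concrete, here is a minimal sketch that simulates an AR(2) process and recovers its coefficients by ordinary least squares using only NumPy. The helper names simulate_ar and fit_ar_ols and the coefficient values are illustrative assumptions and are not part of statsmodels.

```python
import numpy as np


def simulate_ar(delta, phis, n, seed=0, burn=200):
    """Simulate y_t = delta + sum_i phis[i] * y_{t-1-i} + eps_t with N(0, 1) shocks."""
    rng = np.random.default_rng(seed)
    p = len(phis)
    total = n + burn + p
    y = np.zeros(total)
    eps = rng.standard_normal(total)
    for t in range(p, total):
        lags = y[t - p:t][::-1]  # most recent lag first: y[t-1], y[t-2], ...
        y[t] = delta + np.dot(phis, lags) + eps[t]
    return y[-n:]  # drop the burn-in so the start-up values do not matter


def fit_ar_ols(y, p):
    """Regress y_t on a constant and its first p lags; returns [delta, phi_1..phi_p]."""
    n = len(y)
    lag_cols = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_cols)
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta


y = simulate_ar(delta=1.0, phis=[0.5, -0.3], n=5000)
params = fit_ar_ols(y, p=2)
print(params)  # estimates of delta, phi_1, phi_2
```

With a long simulated sample the estimates should land near the values used in the simulation, which is the same idea AutoReg applies to real data.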
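Beyond the lags themselves, AutoReg supports a deterministic trend, seasonal dummies and exogenous regressors.
Its complete specification is: $$ y_{t}=\delta_{0}+\delta_{1} t+\phi_{1} y_{t-1}+\ldots+\phi_{p} y_{t-p}+\sum_{i=1}^{s-1} \gamma_{i} d_{i}+\sum_{j=1}^{m} \kappa_{j} x_{t, j}+\epsilon_{t} $$
where $t$ is a time trend, $d_{i}$ is a seasonal dummy for season $i$ of a period of length $s$, $x_{t,j}$ are exogenous regressors, and $\epsilon_{t}$ is the error term.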
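The terms of the complete specification can be read as columns of a regression design matrix. The sketch below builds those columns by hand with NumPy and estimates them by least squares on simulated data. The function build_design, the choice of a trend that starts at 1, and all simulated coefficients are assumptions made for illustration; this is not how AutoReg is implemented internally.

```python
import numpy as np


def build_design(y, x, p, period):
    """Columns: constant, time trend, p lags, period-1 seasonal dummies, exogenous x."""
    n = len(y)
    t = np.arange(n)[p:]                 # rows that have all p lags available
    const = np.ones(n - p)
    trend = t + 1.0                      # assumed convention: trend counts 1, 2, ...
    lags = [y[p - k:n - k] for k in range(1, p + 1)]
    # d_i = 1 when t mod period == i; season 0 is dropped because a constant is present
    dummies = [(t % period == i).astype(float) for i in range(1, period)]
    X = np.column_stack([const, trend, *lags, *dummies, x[p:]])
    return X, y[p:]


rng = np.random.default_rng(1)
n, p, period = 2000, 2, 4
x = rng.standard_normal((n, 1))              # one exogenous regressor
season = np.array([0.0, 2.0, -1.0, 0.5])     # seasonal effects, season 0 as baseline
y = np.zeros(n)
for t in range(p, n):
    y[t] = (0.5 + 0.01 * (t + 1) + 0.4 * y[t - 1] - 0.2 * y[t - 2]
            + season[t % period] + 1.5 * x[t, 0] + rng.standard_normal())

X, target = build_design(y, x, p, period)
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(X.shape)
print(beta.round(2))  # order: const, trend, L1, L2, d1, d2, d3, x
```

Dropping one seasonal dummy when a constant is included avoids perfect collinearity, which is why the sum over dummies in the equation runs to $s-1$.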
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns
from statsmodels.tsa.api import acf, graphics, pacf
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
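The first set of examples uses the month-over-month growth rate of US housing starts that has not been seasonally adjusted. The seasonality is evident from the regular pattern of peaks and troughs. We set the frequency of the time series to "MS" (month-start) to avoid warnings when using AutoReg.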
data = pdr.get_data_fred("HOUSTNSA", "1959-01-01", "2019-06-01")
housing = data.HOUSTNSA.pct_change().dropna()
# Scale by 100 to get percentages
housing = 100 * housing.asfreq("MS")
fig, ax = plt.subplots()
ax = housing.plot(ax=ax)
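The two preprocessing steps above, pct_change and asfreq("MS"), can be tried offline on a small made-up series. The values and the name starts below are hypothetical; the point is only to show that the index carries no frequency until asfreq sets one.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly levels; the index is built from strings, so it has no frequency
idx = pd.to_datetime(
    ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01", "2000-06-01"]
)
level = pd.Series([100.0, 110.0, 99.0, 99.0, 108.9, 119.79], index=idx, name="starts")

# Month-over-month growth in percent; the first observation has no predecessor
growth_raw = 100 * level.pct_change().dropna()
print(growth_raw.index.freq)   # no frequency information yet

# asfreq attaches an explicit month-start frequency to the index
growth = growth_raw.asfreq("MS")
print(growth.index.freqstr)
print(growth.round(2))
```

An explicit frequency lets the model extend the date index past the end of the sample when forecasting.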
We start by fitting an AR(3).
mod = AutoReg(housing, 3, old_names=False)
res = mod.fit()
print(res.summary())
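AutoReg supports the same covariance estimators as OLS. Below, we use cov_type="HC0", which is White's covariance estimator. While the parameter estimates are the same, all quantities that depend on the standard errors change.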
res = mod.fit(cov_type="HC0")
print(res.summary())
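To see why only the standard errors move, the following sketch computes the classical and the White (HC0) covariance for a least-squares AR(1) fit on simulated data whose shock variance depends on the previous value. The function ols_with_covs and the simulated process are assumptions for illustration; the formulas are the textbook ones, and the numbers will not match the statsmodels output above.

```python
import numpy as np


def ols_with_covs(X, y):
    """Return OLS coefficients with classical and HC0 (White) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Classical covariance: sigma^2 (X'X)^{-1}, assumes constant error variance
    sigma2 = resid @ resid / (n - k)
    se_classic = np.sqrt(np.diag(sigma2 * xtx_inv))
    # HC0 covariance: (X'X)^{-1} X' diag(e_t^2) X (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    se_hc0 = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classic, se_hc0


# Simulated AR(1) whose shock variance grows with the size of the previous value
rng = np.random.default_rng(0)
n = 5000
y = np.zeros(n)
for t in range(1, n):
    scale = np.sqrt(0.5 + 0.3 * y[t - 1] ** 2)
    y[t] = 0.5 * y[t - 1] + scale * rng.standard_normal()

X = np.column_stack([np.ones(n - 1), y[:-1]])
beta, se_classic, se_hc0 = ols_with_covs(X, y[1:])
print(beta)        # one set of coefficients, shared by both covariance choices
print(se_classic)  # classical standard errors
print(se_hc0)      # heteroskedasticity-robust standard errors
```

The coefficient vector is computed once, so it cannot depend on the covariance choice; only the uncertainty attached to it changes.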
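plot_predict visualizes forecasts. Here we produce a large number of forecasts, which show the seasonal pattern captured by the model. In the code listing later in this article, this plot is drawn after refitting with the model chosen by ar_select_order(housing, 13, old_names=False), which keeps all 13 lags.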
fig = res.plot_predict(720, 840)
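Multi-step forecasts from an autoregression are produced recursively: each forecast is fed back in as a lag for the next step. The sketch below illustrates that recursion with the AR(3) coefficients reported in the summary output of this article. The function ar_forecast and the three starting values are hypothetical, and this is not the code plot_predict runs.

```python
import numpy as np


def ar_forecast(const, phis, history, steps):
    """Iterate y_hat[t] = const + sum_i phis[i] * y[t-1-i], reusing earlier forecasts."""
    p = len(phis)
    path = list(history[-p:])
    for _ in range(steps):
        lags = path[-1:-p - 1:-1]  # last p values, most recent first
        path.append(const + float(np.dot(phis, lags)))
    return np.array(path[p:])


# AR(3) estimates taken from the summary table in this article
const = 1.1228
phis = np.array([0.1910, 0.0058, -0.1939])
history = [5.0, -3.0, 8.0]  # hypothetical last three observations, oldest first

fc = ar_forecast(const, phis, history, steps=120)
long_run_mean = const / (1 - phis.sum())
print(fc[:3])
print(fc[-1], long_run_mean)
```

For a stationary model the recursion settles at the unconditional mean, so a low-order AR fitted to this series flattens out quickly, whereas a model with lags 12 and 13 can keep repeating a seasonal pattern for much longer.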
plot_diagnostics indicates that the model captures the key features of the data.
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)
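AutoReg supports seasonal dummies, which are an alternative way to model seasonality. Including the dummies shortens the dynamics to only an AR(2).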
sel = ar_select_order(housing, 13, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
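The idea behind choosing a lag length by an information criterion can be sketched in a few lines of NumPy: fit every order up to the maximum on the same sample and keep the order with the smallest criterion. The function select_ar_order, the Gaussian form of the criteria (constants dropped), and the simulated AR(2) are assumptions for illustration and do not reproduce the internals of ar_select_order.

```python
import numpy as np


def select_ar_order(y, maxlag, ic="aic"):
    """Fit AR(0)..AR(maxlag) by OLS on a common sample and return the best order."""
    n = len(y)
    target = y[maxlag:]                  # same observations for every candidate
    n_eff = len(target)
    scores = {}
    for p in range(maxlag + 1):
        cols = [np.ones(n_eff)] + [y[maxlag - k:n - k] for k in range(1, p + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        sigma2 = resid @ resid / n_eff
        penalty = 2.0 if ic == "aic" else np.log(n_eff)
        scores[p] = n_eff * np.log(sigma2) + penalty * X.shape[1]
    best = min(scores, key=scores.get)
    return best, scores


# Simulated AR(2) so the true order is known
rng = np.random.default_rng(3)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

best_p, scores = select_ar_order(y, maxlag=6, ic="aic")
print(best_p)
```

Holding the estimation sample fixed across candidates matters, because criteria computed on different numbers of observations are not comparable.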
The seasonal dummies are evident in the forecasts, which have a non-trivial seasonal component in all periods up to 10 years into the future.
fig = res.plot_predict(720, 840)
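AutoReg does not directly support seasonal AR components, since it uses OLS to estimate parameters. It is still possible to capture seasonal dynamics with an over-parametrized seasonal AR that does not impose the restrictions of a seasonal AR.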
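We start by selecting a model using the simple method that only chooses the maximum lag, so that all lower lags are included. The maximum lag to check is set to 13, since this lets the model nest a seasonal AR that has both a short-run AR(1) component and a seasonal AR(1) component, so that $$ \left(1-\phi_{s} L^{12}\right)\left(1-\phi_{1} L\right) y_{t}=\epsilon_{t} $$ becomes $$ y_{t}=\phi_{1} y_{t-1}+\phi_{s} y_{t-12}-\phi_{1} \phi_{s} y_{t-13}+\epsilon_{t} $$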
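The expansion of the product of lag polynomials can be checked numerically by multiplying the two polynomials in $L$. The values chosen for phi_1 and phi_s below are arbitrary and used only for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

phi_1, phi_s = 0.5, 0.8  # arbitrary illustrative values

# Coefficients in ascending powers of the lag operator L
short_run = np.array([1.0, -phi_1])   # 1 - phi_1 L
seasonal = np.zeros(13)
seasonal[0] = 1.0
seasonal[12] = -phi_s                 # 1 - phi_s L^12

lag_poly = P.polymul(short_run, seasonal)   # (1 - phi_s L^12)(1 - phi_1 L)

# Moving every lagged term to the right-hand side flips the sign
ar_coefs = -lag_poly[1:]                    # coefficient on y_{t-k}, k = 1..13
nonzero_lags = np.flatnonzero(ar_coefs) + 1
print(nonzero_lags)
print(ar_coefs[[0, 11, 12]])                # lags 1, 12 and 13
```

Only lags 1, 12 and 13 carry non-zero coefficients, and the lag-13 coefficient is tied to the other two. An unrestricted AR(13) estimated by OLS leaves all 13 coefficients free, which is the sense in which it is over-parametrized.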
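The complete code for the examples, together with the printed output of each summary, is listed below.
%matplotlib inline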
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns
from statsmodels.tsa.api import acf, graphics, pacf
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
sns.set_style("darkgrid")
pd.plotting.register_matplotlib_converters()
# Default figure size
sns.mpl.rc("figure", figsize=(16, 6))
sns.mpl.rc("font", size=14)
data = pdr.get_data_fred("HOUSTNSA", "1959-01-01", "2019-06-01")
housing = data.HOUSTNSA.pct_change().dropna()
# Scale by 100 to get percentages
housing = 100 * housing.asfreq("MS")
fig, ax = plt.subplots()
ax = housing.plot(ax=ax)
mod = AutoReg(housing, 3, old_names=False)
res = mod.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Fri, 12 Aug 2022   AIC                           5996.884
Time:                        14:29:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019
===============================================================================
                     coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------
const              1.1228      0.573      1.961      0.050      0.000      2.245
HOUSTNSA.L1        0.1910      0.036      5.235      0.000      0.120      0.263
HOUSTNSA.L2        0.0058      0.037      0.155      0.877     -0.067      0.079
HOUSTNSA.L3       -0.1939      0.036     -5.319      0.000     -0.265     -0.122
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------
res = mod.fit(cov_type="HC0")
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Fri, 12 Aug 2022   AIC                           5996.884
Time:                        14:29:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019
===============================================================================
                     coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------
const              1.1228      0.601      1.869      0.062     -0.055      2.300
HOUSTNSA.L1        0.1910      0.035      5.499      0.000      0.123      0.259
HOUSTNSA.L2        0.0058      0.039      0.150      0.881     -0.070      0.081
HOUSTNSA.L3       -0.1939      0.036     -5.448      0.000     -0.264     -0.124
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------
sel = ar_select_order(housing, 13, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                    AutoReg(13)   Log Likelihood               -2676.157
Method:               Conditional MLE   S.D. of innovations             10.378
Date:                Fri, 12 Aug 2022   AIC                           5382.314
Time:                        14:29:21   BIC                           5450.835
Sample:                    03-01-1960   HQIC                          5408.781
                         - 06-01-2019
================================================================================
                     coef    std err          z      P>|z|     [0.025     0.975]
--------------------------------------------------------------------------------
const              1.3615      0.458      2.970      0.003      0.463      2.260
HOUSTNSA.L1       -0.2900      0.036     -8.161      0.000     -0.360     -0.220
HOUSTNSA.L2       -0.0828      0.031     -2.652      0.008     -0.144     -0.022
HOUSTNSA.L3       -0.0654      0.031     -2.106      0.035     -0.126     -0.005
HOUSTNSA.L4       -0.1596      0.031     -5.166      0.000     -0.220     -0.099
HOUSTNSA.L5       -0.0434      0.031     -1.387      0.165     -0.105      0.018
HOUSTNSA.L6       -0.0884      0.031     -2.867      0.004     -0.149     -0.028
HOUSTNSA.L7       -0.0556      0.031     -1.797      0.072     -0.116      0.005
HOUSTNSA.L8       -0.1482      0.031     -4.803      0.000     -0.209     -0.088
HOUSTNSA.L9       -0.0926      0.031     -2.960      0.003     -0.154     -0.031
HOUSTNSA.L10      -0.1133      0.031     -3.665      0.000     -0.174     -0.053
HOUSTNSA.L11       0.1151      0.031      3.699      0.000      0.054      0.176
HOUSTNSA.L12       0.5352      0.031     17.133      0.000      0.474      0.596
HOUSTNSA.L13       0.3178      0.036      8.937      0.000      0.248      0.388
                                    Roots
==============================================================================
                  Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            1.0913           -0.0000j            1.0913           -0.0000
AR.2            0.8743           -0.5018j            1.0080           -0.0829
AR.3            0.8743           +0.5018j            1.0080            0.0829
AR.4            0.5041           -0.8765j            1.0111           -0.1669
AR.5            0.5041           +0.8765j            1.0111            0.1669
AR.6            0.0056           -1.0530j            1.0530           -0.2491
AR.7            0.0056           +1.0530j            1.0530            0.2491
AR.8           -0.5263           -0.9335j            1.0716           -0.3317
AR.9           -0.5263           +0.9335j            1.0716            0.3317
AR.10          -0.9525           -0.5880j            1.1194           -0.4120
AR.11          -0.9525           +0.5880j            1.1194            0.4120
AR.12          -1.2928           -0.2608j            1.3189           -0.4683
AR.13          -1.2928           +0.2608j            1.3189            0.4683
------------------------------------------------------------------------------
fig = res.plot_predict(720, 840)
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)
sel = ar_select_order(housing, 13, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:               Seas. AutoReg(2)   Log Likelihood               -2652.556
Method:               Conditional MLE   S.D. of innovations              9.487
Date:                Fri, 12 Aug 2022   AIC                           5335.112
Time:                        14:29:22   BIC                           5403.863
Sample:                    04-01-1959   HQIC                          5361.648
                         - 06-01-2019
===============================================================================
                     coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------
const              1.2726      1.373      0.927      0.354     -1.418      3.963
s(2,12)           32.6477      1.824     17.901      0.000     29.073     36.222
s(3,12)           23.0685      2.435      9.472      0.000     18.295     27.842
s(4,12)           10.7267      2.693      3.983      0.000      5.449     16.005
s(5,12)            1.6792      2.100      0.799      0.424     -2.437      5.796
s(6,12)           -4.4229      1.896     -2.333      0.020     -8.138     -0.707
s(7,12)           -4.2113      1.824     -2.309      0.021     -7.786     -0.636
s(8,12)           -6.4124      1.791     -3.581      0.000     -9.922     -2.902
s(9,12)            0.1095      1.800      0.061      0.952     -3.419      3.638
s(10,12)         -16.7511      1.814     -9.234      0.000    -20.307    -13.196
s(11,12)         -20.7023      1.862    -11.117      0.000    -24.352    -17.053
s(12,12)         -11.9554      1.778     -6.724      0.000    -15.440     -8.470
HOUSTNSA.L1       -0.2953      0.037     -7.994      0.000     -0.368     -0.223
HOUSTNSA.L2       -0.1148      0.037     -3.107      0.002     -0.187     -0.042
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -1.2862           -2.6564j            2.9514           -0.3218
AR.2           -1.2862           +2.6564j            2.9514            0.3218
-----------------------------------------------------------------------------
fig = res.plot_predict(720, 840)
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(lags=30, fig=fig)
yoy_housing = data.HOUSTNSA.pct_change(12).resample("MS").last().dropna()
_, ax = plt.subplots()
ax = yoy_housing.plot(ax=ax)