Take financial prices as an example. Traditional time-series models such as ARIMA or ARIMA-GARCH analyze only the price's own past movements; the model takes the form:

$$ y_{t}=\beta_{1} \cdot y_{t-1}+\beta_{2} \cdot y_{t-2}+\ldots $$

where $y_{t-1}$ is called a lag of the series itself. A VAR model, besides the effects of the series' own lags, also analyzes how the lags of other related factors affect future values; the model takes the form:

$$ y_{t}=\beta_{1} \cdot y_{t-1}+\alpha_{1} \cdot x_{t-1}+\beta_{2} \cdot y_{t-2}+\alpha_{2} \cdot x_{t-2}+\ldots $$

where $x_{t-1}$ is a lag of another factor. In short, you can think of VAR as combining the strength of multiple linear regression (several factors can be included) with the strength of time-series models (lag effects can be analyzed).
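To make the second form concrete, here is a minimal sketch that simulates a toy bivariate system of that kind, where each series depends on its own lag and on the other series' lag (all coefficients and noise scales here are made up for illustration):

```python
import numpy as np

# Simulate a toy bivariate VAR(1): each variable depends on its own lag
# AND on the other variable's lag (hypothetical, illustrative coefficients).
rng = np.random.default_rng(0)
T = 200
y = np.zeros(T)
x = np.zeros(T)
beta1, alpha1 = 0.5, 0.3   # effect of y's own lag / of x's lag on y
for t in range(1, T):
    y[t] = beta1 * y[t - 1] + alpha1 * x[t - 1] + rng.normal(scale=0.1)
    x[t] = 0.4 * x[t - 1] + 0.2 * y[t - 1] + rng.normal(scale=0.1)
```

The chosen coefficients keep the system stable (the companion matrix's eigenvalues are inside the unit circle), so the simulated paths do not explode.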
VAR is really a family of models. The form above is the most basic one; others include SVAR, CVAR, and VECM, collectively known as VAR-type models.
Models of this kind all follow a fixed modeling procedure:
In practice, pick a range of lag orders, say 1-5, fit a model at each order, and before looking at anything else compare the AIC, BIC, and LR statistics. Usually more than one order qualifies, so keep the best 2-3 candidates and carry them forward.
Besides making the raw data stationary, VAR also requires a stability check on the estimated parameters. This verifies that the sample did not undergo a structural break over the fitting period. There are two test methods; their basic ideas are:
The second is the CUSUM test: the cumulative sum of the model residuals should fluctuate within a band and not break out of it. Note that the null hypothesis $(\mathrm{H0})$ of the CUSUM test is that the coefficients are stable; the alternative is that they are not. So the CUSUM test passes only when we fail to reject the null. Only a model that passes the parameter-stability test has predictive power, and only then do impulse-response and variance-decomposition analyses make sense.
7) Impulse-response analysis with Cholesky-orthogonalized residuals. Example: the variable to analyze and forecast is $Y$, and two factors $X1, X2$ influence it. Impulse responses are pairwise, so with this setup we run two impulse-response analyses: Y vs. X1 and Y vs. X2. They show how a rise or fall in each factor shocks Y (does Y go up or down?), how strongly, and for how long.
With the setup above, VAR estimation actually produces three equations (each with a different dependent variable):
Equation 1: $y_{t}=\beta_{1} \cdot y_{t-1}+\alpha_{1} \cdot X1_{t-1}+\Theta_{1} \cdot X2_{t-1}+\varepsilon_{t}$
Equation 2: $X1_{t}=\beta_{1} \cdot X1_{t-1}+\alpha_{1} \cdot y_{t-1}+\Theta_{1} \cdot X2_{t-1}+\eta_{t}$
Equation 3: $X2_{t}=\beta_{1} \cdot X2_{t-1}+\alpha_{1} \cdot y_{t-1}+\Theta_{1} \cdot X1_{t-1}+\omega_{t}$
Residual series of Equation 1: $\varepsilon_{t}$; of Equation 2: $\eta_{t}$; of Equation 3: $\omega_{t}$. The Cholesky orthogonalization of the three equations then proceeds as follows:
Finally, applying orthogonalization steps 4/5 yields the Cholesky-orthogonalized residual series. Before orthogonalizing, the equations' variables must be ranked by importance, with the more important one placed in the numerator.
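Those orthogonalization steps boil down to one linear-algebra operation. A minimal sketch with simulated residuals (the real thing would use the three residual series above, ordered most-important first):

```python
import numpy as np

# Fake (T x 3) residual matrix with correlated columns, standing in for
# the residual series of the three equations, most important first.
rng = np.random.default_rng(2)
resid = rng.normal(size=(500, 3))
resid[:, 1] += 0.8 * resid[:, 0]

# Cholesky-orthogonalize: factor the residual covariance as L @ L.T,
# then transform so the new shocks are mutually uncorrelated.
cov = np.cov(resid, rowvar=False)
L = np.linalg.cholesky(cov)               # lower-triangular factor
orth = resid @ np.linalg.inv(L).T         # orthogonalized residuals
print(np.round(np.cov(orth, rowvar=False), 6))  # ~ identity matrix
```

Because the transform uses the lower-triangular factor, the first (most important) variable's shock is untouched, the second is purged of its correlation with the first, and so on down the ordering.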
The steps above look bulky and tedious, don't they? But the computer handles all of them quickly.
1) Import modules
# modeling packages
import statsmodels.api as sm
import statsmodels.stats.diagnostic
# plotting
import matplotlib.pyplot as plt
# other utilities
import pandas as pd
import numpy as np
data = pd.read_excel('data/gong0809.xlsx',index_col=0)
read = data['read']
share = data['share']
favorite = data['favorite']
2) Plot the series
fig = plt.figure(figsize=(12,8))
plt.plot(read,'r',label='read')
plt.plot(share,'g',label='share')
plt.grid(True)
plt.axis('tight')
plt.legend(loc=0)
plt.ylabel('Count')
plt.show()
3) ADF unit-root test. In Python the ADF test result is the adfResult below; I tidied it into output here for easier browsing. You can also just print the result and organize it yourself.
maxlags = 10
adfResult = sm.tsa.stattools.adfuller(read,maxlags)
output = pd.DataFrame(index=['Test Statistic Value', "p-value", "Lags Used",
                             "Number of Observations Used",
                             "Critical Value(1%)", "Critical Value(5%)", "Critical Value(10%)"],
                      columns=['value'])
# use .loc to avoid chained assignment, which newer pandas rejects
output.loc['Test Statistic Value', 'value'] = adfResult[0]
output.loc['p-value', 'value'] = adfResult[1]
output.loc['Lags Used', 'value'] = adfResult[2]
output.loc['Number of Observations Used', 'value'] = adfResult[3]
output.loc['Critical Value(1%)', 'value'] = adfResult[4]['1%']
output.loc['Critical Value(5%)', 'value'] = adfResult[4]['5%']
output.loc['Critical Value(10%)', 'value'] = adfResult[4]['10%']
4) Cointegration test. In Python the cointegration test is done with the coint() function, which returns a p-value: the smaller it is, the stronger the evidence of a cointegrating relationship.
result = sm.tsa.stattools.coint(read,favorite)
5) Model estimation + order selection. Python is genuinely annoying here: it has two VAR estimators, VARMAX and VAR. After reading the official docs, I think VARMAX is best for estimation and order selection, because it returns much more; in particular, summary() gives very detailed statistics and directly includes AIC, BIC, and HQIC.
varLagNum = 5
data1 = data.copy()
data1 = data1.astype('float64')
# build the model object; the dataframe is data from earlier, varLagNum is the lag order you chose
orgMod = sm.tsa.VARMAX(data1,order=(varLagNum,0),trend='ct',exog=None)
# fit: this is the model
fitMod = orgMod.fit(maxiter=1000,disp=False)
# print the statistics
print(fitMod.summary())
# get the model residuals
resid = fitMod.resid
result = {'fitMod':fitMod,'resid':resid}
6) Coefficient-stability check: the CUSUM test. Note that unlike EViews, Python cannot compute AR roots, so there is no AR-root plot, but it can run the CUSUM test. It returns 3 values; the 2nd is the p-value. Remember what I said earlier: the CUSUM test should fail to reject the null, i.e. the p-value should be greater than 0.05.
# H0: no drift (stable); H1: drift (unstable)
result = statsmodels.stats.diagnostic.breaks_cusumolsresid(resid.values)
7) Impulse-response plots
# orthogonalized=True means Cholesky orthogonalization is applied
terms = 7
ax = fitMod.impulse_responses(terms, orthogonalized=True).plot(figsize=(12, 8))
plt.show()
8) Variance-decomposition plots
md = sm.tsa.VAR(data)
re = md.fit(2)
fevd = re.fevd(10)
# print the variance-decomposition results
print(fevd.summary())
# plot
fevd.plot(figsize=(12, 16))
plt.show()
# modeling packages
import statsmodels.api as sm
import statsmodels.stats.diagnostic
# plotting
import matplotlib.pyplot as plt
# other utilities
import pandas as pd
import numpy as np
data = pd.read_excel('data/gong0809.xlsx',index_col=0)
data
| date | read | share | favorite |
|---|---|---|---|
| 2022-04-27 | 20 | 1 | 0 |
| 2022-04-28 | 55 | 3 | 3 |
| 2022-04-29 | 29 | 2 | 0 |
| 2022-04-30 | 16 | 1 | 0 |
| 2022-05-01 | 22 | 3 | 0 |
| ... | ... | ... | ... |
| 2022-07-20 | 220 | 9 | 9 |
| 2022-07-21 | 179 | 9 | 4 |
| 2022-07-22 | 138 | 8 | 10 |
| 2022-07-23 | 187 | 8 | 4 |
| 2022-07-24 | 183 | 27 | 15 |

89 rows × 3 columns
read = data['read']
share = data['share']
favorite = data['favorite']
fig = plt.figure(figsize=(12,8))
plt.plot(read,'r',label='read')
plt.plot(share,'g',label='share')
plt.grid(True)
plt.axis('tight')
plt.legend(loc=0)
plt.ylabel('Count')
plt.show()
maxlags = 10
adfResult = sm.tsa.stattools.adfuller(read,maxlags)
output = pd.DataFrame(index=['Test Statistic Value', "p-value", "Lags Used",
                             "Number of Observations Used",
                             "Critical Value(1%)", "Critical Value(5%)", "Critical Value(10%)"],
                      columns=['value'])
# use .loc to avoid chained assignment, which newer pandas rejects
output.loc['Test Statistic Value', 'value'] = adfResult[0]
output.loc['p-value', 'value'] = adfResult[1]
output.loc['Lags Used', 'value'] = adfResult[2]
output.loc['Number of Observations Used', 'value'] = adfResult[3]
output.loc['Critical Value(1%)', 'value'] = adfResult[4]['1%']
output.loc['Critical Value(5%)', 'value'] = adfResult[4]['5%']
output.loc['Critical Value(10%)', 'value'] = adfResult[4]['10%']
output
| | value |
|---|---|
| Test Statistic Value | -1.469033 |
| p-value | 0.548771 |
| Lags Used | 3 |
| Number of Observations Used | 85 |
| Critical Value(1%) | -3.509736 |
| Critical Value(5%) | -2.896195 |
| Critical Value(10%) | -2.585258 |
result = sm.tsa.stattools.coint(read,favorite)
result
(-3.2385833417160983, 0.06370889539002407, array([-4.02522283, -3.40644402, -3.09299669]))
varLagNum = 5
data1 = data.copy()
data1 = data1.astype('float64')
# build the model object; the dataframe is data from earlier, varLagNum is the lag order you chose
orgMod = sm.tsa.VARMAX(data1,order=(varLagNum,0),trend='ct',exog=None)
# fit: this is the model
fitMod = orgMod.fit(maxiter=1000,disp=False)
# print the statistics
print(fitMod.summary())
# get the model residuals
resid = fitMod.resid
result = {'fitMod':fitMod,'resid':resid}
ValueWarning: No frequency information was provided, so inferred frequency D will be used.
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
                        Statespace Model Results
=========================================================================
Dep. Variable:  ['read', 'share', 'favorite']  No. Observations:  89
Model:          VAR(5) + intercept             Log Likelihood     -837.433
Date:           Fri, 12 Aug 2022               AIC                1788.866
Time:           14:28:34                       BIC                1930.718
Sample:         04-27-2022 - 07-24-2022        HQIC               1846.043
Covariance Type: opg
=========================================================================

(The full summary continues with Ljung-Box / Jarque-Bera / heteroskedasticity diagnostics, coefficient tables for the read, share, and favorite equations, and the error covariance matrix, omitted here for space. In this run nearly every lag coefficient is statistically insignificant; the main exception is L1.read in the favorite equation, coef 0.0312, p = 0.029.)
fitMod.predict(0,180).plot.line()
data.plot.line()
# H0: no drift (stable); H1: drift (unstable)
result = statsmodels.stats.diagnostic.breaks_cusumolsresid(resid.values)
result
(1.0457107767255178, 0.22418353062921537, [(1, 1.63), (5, 1.36), (10, 1.22)])
# orthogonalized=True means Cholesky orthogonalization is applied
terms = 7
ax = fitMod.impulse_responses(terms, orthogonalized=True).plot(figsize=(12, 8))
plt.show()
md = sm.tsa.VAR(data)
re = md.fit(2)
fevd = re.fevd(10)
# print the variance-decomposition results
print(fevd.summary())
# plot
fevd.plot(figsize=(12, 16))
plt.show()
ValueWarning: No frequency information was provided, so inferred frequency D will be used.
FEVD for read
       read     share  favorite
0  1.000000  0.000000  0.000000
1  0.981561  0.017771  0.000668
2  0.976132  0.022673  0.001195
3  0.977319  0.020787  0.001894
4  0.977150  0.021029  0.001821
5  0.977706  0.020449  0.001846
6  0.977913  0.020286  0.001801
7  0.978109  0.020106  0.001785
8  0.978225  0.020007  0.001768
9  0.978306  0.019935  0.001758

FEVD for share
       read     share  favorite
0  0.442184  0.557816  0.000000
1  0.476581  0.506962  0.016457
2  0.518667  0.463617  0.017717
3  0.543950  0.438130  0.017920
4  0.559162  0.423460  0.017378
5  0.569874  0.413091  0.017036
6  0.576469  0.406765  0.016766
7  0.581053  0.402358  0.016589
8  0.584047  0.399483  0.016470
9  0.586096  0.397515  0.016389

FEVD for favorite
       read     share  favorite
0  0.355678  0.010353  0.633968
1  0.459580  0.022902  0.517517
2  0.488364  0.033642  0.477994
3  0.520384  0.031744  0.447871
4  0.535799  0.031208  0.432993
5  0.547580  0.030690  0.421730
6  0.554646  0.030348  0.415005
7  0.559544  0.030146  0.410309
8  0.562769  0.029994  0.407238
9  0.564957  0.029899  0.405144