Take financial prices as an example. Traditional time-series models such as ARIMA or ARIMA-GARCH analyze only the price's own history; the model has the form: $$ y_{t}=\beta_{1} \cdot y_{t-1}+\beta_{2} \cdot y_{t-2}+\ldots $$ where $y_{t-1}$ is called a lag of the series itself. A VAR model, besides the effect of the series' own lags, also analyzes how the lags of other related factors influence future values; the model has the form: $$ y_{t}=\beta_{1} \cdot y_{t-1}+\alpha_{1} \cdot x_{t-1}+\beta_{2} \cdot y_{t-2}+\alpha_{2} \cdot x_{t-2}+\ldots $$ where $x_{t-1}$ is a lag of another factor. In short, you can think of a VAR model as combining the strength of multiple linear regression (several factors can enter) with the strength of time-series models (lag effects can be analyzed).
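The contrast between the two forms can be made concrete with a tiny simulation. A minimal sketch (all coefficients and variable names here are illustrative, not from the text): `y` depends on its own lag and on the lag of a second factor `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
y = np.zeros(T)
x = np.zeros(T)
for t in range(1, T):
    # VAR-style equation: own lag plus the lag of the other factor
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.normal()
    # x is driven only by its own lag here, for simplicity
    x[t] = 0.4 * x[t - 1] + rng.normal()
```

Dropping the `0.3 * x[t - 1]` term recovers the single-series AR form that ARIMA works with.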
VAR is really a family of models; the above is its most basic form. Others include SVAR, CVAR and VECM, collectively called VAR-type models.
Models of this kind all follow a fixed set of modeling steps:
For order selection the usual procedure is: pick a range of candidate lag orders, say 1 to 5, fit the model for each, and before anything else look at the AIC, BIC and LR statistics. Usually more than one order qualifies, so keep the best 2 or 3 and carry them forward.
Besides making the raw data stationary, a VAR also requires a stability check on the estimated parameters. This verifies that the data sample did not undergo a structural change while the model was being fitted. There are two test methods; their basic ideas are:
The first is the AR root test: the model is stable only if all AR characteristic roots lie inside the unit circle. The second is the CUSUM test: the cumulative sum of the model residuals should fluctuate within a band without crossing it. Note that the null hypothesis (H0) of the CUSUM test is that the coefficients are stable; the alternative is that they are not. So the CUSUM result passes only when we fail to reject the null. Only a model that passes the parameter-stability check has predictive power, and only then do impulse-response and variance-decomposition analyses make sense.
7) Impulse-response analysis with Cholesky-orthogonalized residuals. Example: the series to analyze and forecast is $Y$, and two factors $X1$, $X2$ influence it. Impulse responses are pairwise (one-on-one), so under this setup you run two analyses: Y vs X1 and Y vs X2. Each shows how a rise or fall in a factor shocks Y (does Y rise or fall, and by how much) and how long the effect persists.
With the setup above, VAR estimation actually produces three equations (each with a different dependent variable):
Equation 1: $y_{t}=\beta_{1} \cdot y_{t-1}+\alpha_{1} \cdot X1_{t-1}+\Theta_{1} \cdot X2_{t-1}+\varepsilon_{t}$
Equation 2: $X1_{t}=\beta_{1} \cdot X1_{t-1}+\alpha_{1} \cdot y_{t-1}+\Theta_{1} \cdot X2_{t-1}+\eta_{t}$
Equation 3: $X2_{t}=\beta_{1} \cdot X2_{t-1}+\alpha_{1} \cdot y_{t-1}+\Theta_{1} \cdot X1_{t-1}+\omega_{t}$
Residual series of Equation 1: $\varepsilon_{t}$; of Equation 2: $\eta_{t}$; of Equation 3: $\omega_{t}$. The Cholesky orthogonalization of the three equations then proceeds as follows:
Finally, applying orthogonalization steps 4/5 yields the Cholesky-orthogonalized residual series. Before orthogonalizing, the equations' variables must be sorted by importance, with the more important ones placed in the numerator.
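Mechanically, the orthogonalization amounts to: take the covariance matrix of the three residual series, Cholesky-factor it, and rotate the residuals by the inverse of the lower-triangular factor. A sketch with made-up residuals (the column ordering plays the "importance" role mentioned above):

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up correlated residuals for 3 equations, ordered most to least important
resid = rng.normal(size=(500, 3))
resid[:, 1] += 0.5 * resid[:, 0]
resid[:, 2] += 0.3 * resid[:, 0] + 0.4 * resid[:, 1]

sigma = np.cov(resid, rowvar=False)    # residual covariance matrix
L = np.linalg.cholesky(sigma)          # lower-triangular Cholesky factor
orth = resid @ np.linalg.inv(L).T      # Cholesky-orthogonalized residuals
print(np.cov(orth, rowvar=False).round(2))  # ≈ identity: columns are uncorrelated
```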
The steps above look bulky and tedious, don't they? Fortunately the computer handles all of it in no time.
1) Import modules
# Model-related packages
import statsmodels.api as sm
import statsmodels.stats.diagnostic
# Plotting
import matplotlib.pyplot as plt
# Misc
import pandas as pd
import numpy as np
data = pd.read_excel('data/gong0809.xlsx',index_col=0)
read = data['read']
share = data['share']
favorite = data['favorite']
2) Plot the series
fig = plt.figure(figsize=(12,8))
plt.plot(read,'r',label='read')
plt.plot(share,'g',label='share')
plt.grid(True)
plt.axis('tight')
plt.legend(loc=0)
plt.ylabel('Price')
plt.show()
3) ADF unit-root test. In Python the ADF test result is the adfResult below; I tidy it into output here for easier reading. You can also just print the raw result and arrange it yourself.
maxlags = 10
adfResult = sm.tsa.stattools.adfuller(read,maxlags)
output = pd.DataFrame(index=['Test Statistic Value', "p-value", "Lags Used",
                             "Number of Observations Used",
                             "Critical Value(1%)", "Critical Value(5%)", "Critical Value(10%)"],
                      columns=['value'])
output.loc['Test Statistic Value', 'value'] = adfResult[0]
output.loc['p-value', 'value'] = adfResult[1]
output.loc['Lags Used', 'value'] = adfResult[2]
output.loc['Number of Observations Used', 'value'] = adfResult[3]
output.loc['Critical Value(1%)', 'value'] = adfResult[4]['1%']
output.loc['Critical Value(5%)', 'value'] = adfResult[4]['5%']
output.loc['Critical Value(10%)', 'value'] = adfResult[4]['10%']
4) Cointegration test. In Python the cointegration test is done with coint(), which returns a p-value: the smaller it is, the stronger the evidence of cointegration.
result = sm.tsa.stattools.coint(read,favorite)
5) Model estimation + order selection. Python is honestly a bit annoying here: statsmodels has two VAR estimators, VARMAX and VAR. After reading the official docs I think VARMAX is best for estimating parameters and selecting the order, because it returns a lot, and summary() in particular gives very detailed statistics, directly including AIC, BIC and HQIC.
varLagNum = 5
data1 = data.copy()
data1 = data1.astype('float64')
# Build the model object: data1 is the DataFrame from above, varLagNum the lag order you chose
orgMod = sm.tsa.VARMAX(data1,order=(varLagNum,0),trend='ct',exog=None)
# Fit: this is the estimated model
fitMod = orgMod.fit(maxiter=1000,disp=False)
# Print the summary statistics
print(fitMod.summary())
# Get the model residuals
resid = fitMod.resid
result = {'fitMod':fitMod,'resid':resid}
6) Coefficient-stability check: CUSUM test. Also note that Python is not like EViews here: it does not draw the AR root chart out of the box (though the sm.tsa.VAR results do expose the characteristic roots via .roots and .is_stable()), but it can run the CUSUM test. It returns 3 values; the 2nd is the p-value. Remember what I said earlier: the CUSUM test should fail to reject the null, i.e. the p-value must be greater than 0.05.
# Null hypothesis: no drift (stable); alternative: drift (unstable)
result = statsmodels.stats.diagnostic.breaks_cusumolsresid(resid.values)
7) Impulse-response plots
# orthogonalized=True means Cholesky orthogonalization is used
terms = 7
ax = fitMod.impulse_responses(terms, orthogonalized=True).plot(figsize=(12, 8))
plt.show()
8) Variance-decomposition plots
md = sm.tsa.VAR(data)
re = md.fit(2)
fevd = re.fevd(10)
# Print the variance-decomposition results
print(fevd.summary())
# Plot
fevd.plot(figsize=(12, 16))
plt.show()
# Model-related packages
import statsmodels.api as sm
import statsmodels.stats.diagnostic
# Plotting
import matplotlib.pyplot as plt
# Misc
import pandas as pd
import numpy as np
data = pd.read_excel('data/gong0809.xlsx',index_col=0)
data
| date | read | share | favorite |
|---|---|---|---|
| 2022-04-27 | 20 | 1 | 0 |
| 2022-04-28 | 55 | 3 | 3 |
| 2022-04-29 | 29 | 2 | 0 |
| 2022-04-30 | 16 | 1 | 0 |
| 2022-05-01 | 22 | 3 | 0 |
| ... | ... | ... | ... |
| 2022-07-20 | 220 | 9 | 9 |
| 2022-07-21 | 179 | 9 | 4 |
| 2022-07-22 | 138 | 8 | 10 |
| 2022-07-23 | 187 | 8 | 4 |
| 2022-07-24 | 183 | 27 | 15 |

89 rows × 3 columns
read = data['read']
share = data['share']
favorite = data['favorite']
fig = plt.figure(figsize=(12,8))
plt.plot(read,'r',label='read')
plt.plot(share,'g',label='share')
plt.grid(True)
plt.axis('tight')
plt.legend(loc=0)
plt.ylabel('Price')
plt.show()
maxlags = 10
adfResult = sm.tsa.stattools.adfuller(read,maxlags)
output = pd.DataFrame(index=['Test Statistic Value', "p-value", "Lags Used",
                             "Number of Observations Used",
                             "Critical Value(1%)", "Critical Value(5%)", "Critical Value(10%)"],
                      columns=['value'])
output.loc['Test Statistic Value', 'value'] = adfResult[0]
output.loc['p-value', 'value'] = adfResult[1]
output.loc['Lags Used', 'value'] = adfResult[2]
output.loc['Number of Observations Used', 'value'] = adfResult[3]
output.loc['Critical Value(1%)', 'value'] = adfResult[4]['1%']
output.loc['Critical Value(5%)', 'value'] = adfResult[4]['5%']
output.loc['Critical Value(10%)', 'value'] = adfResult[4]['10%']
output
| | value |
|---|---|
| Test Statistic Value | -1.469033 |
| p-value | 0.548771 |
| Lags Used | 3 |
| Number of Observations Used | 85 |
| Critical Value(1%) | -3.509736 |
| Critical Value(5%) | -2.896195 |
| Critical Value(10%) | -2.585258 |
result = sm.tsa.stattools.coint(read,favorite)
result
(-3.2385833417160983, 0.06370889539002407, array([-4.02522283, -3.40644402, -3.09299669]))
varLagNum = 5
data1 = data.copy()
data1 = data1.astype('float64')
# Build the model object: data1 is the DataFrame from above, varLagNum the lag order you chose
orgMod = sm.tsa.VARMAX(data1,order=(varLagNum,0),trend='ct',exog=None)
# Fit: this is the estimated model
fitMod = orgMod.fit(maxiter=1000,disp=False)
# Print the summary statistics
print(fitMod.summary())
# Get the model residuals
resid = fitMod.resid
result = {'fitMod':fitMod,'resid':resid}
D:\softwares\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:471: ValueWarning: No frequency information was provided, so inferred frequency D will be used.
  self._init_dates(dates, freq)
D:\softwares\anaconda3\lib\site-packages\statsmodels\base\model.py:604: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
                                 Statespace Model Results                                
=========================================================================================
Dep. Variable:     ['read', 'share', 'favorite']   No. Observations:                   89
Model:                                    VAR(5)   Log Likelihood                -837.433
                                     + intercept   AIC                           1788.866
Date:                           Fri, 12 Aug 2022   BIC                           1930.718
Time:                                   14:28:34   HQIC                          1846.043
Sample:                               04-27-2022                                         
                                    - 07-24-2022                                         
Covariance Type:                             opg                                         
====================================================================================
Ljung-Box (L1) (Q):       0.03, 0.12, 0.01   Jarque-Bera (JB):   382.75, 59.69, 1.04
Prob(Q):                  0.86, 0.72, 0.94   Prob(JB):              0.00, 0.00, 0.59
Heteroskedasticity (H):   1.35, 3.40, 2.17   Skew:                  2.55, 1.03, 0.23
Prob(H) (two-sided):      0.42, 0.00, 0.04   Kurtosis:             11.78, 6.44, 2.75
                           Results for equation read                           
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
intercept       2.9580     35.048      0.084      0.933     -65.734      71.650
drift           0.0233      0.081      0.287      0.774      -0.136       0.183
L1.read         0.3908      0.770      0.508      0.612      -1.118       1.899
L1.share       -0.9649      8.811     -0.110      0.913     -18.235      16.305
L1.favorite    -1.8398     10.158     -0.181      0.856     -21.750      18.070
L2.read        -0.1233      0.813     -0.152      0.879      -1.717       1.470
L2.share        4.2776      7.833      0.546      0.585     -11.075      19.630
L2.favorite    -1.4612     11.400     -0.128      0.898     -23.806      20.883
L3.read         0.2678      0.774      0.346      0.729      -1.248       1.784
L3.share       -2.8585      7.657     -0.373      0.709     -17.866      12.149
L3.favorite    -0.3737      7.357     -0.051      0.959     -14.793      14.046
L4.read        -0.0810      0.957     -0.085      0.932      -1.956       1.794
L4.share        1.8967      8.934      0.212      0.832     -15.614      19.408
L4.favorite     1.2625     10.954      0.115      0.908     -20.207      22.732
L5.read        -0.1281      0.799     -0.160      0.873      -1.693       1.437
L5.share       -0.4269      7.001     -0.061      0.951     -14.149      13.296
L5.favorite     4.5810     10.678      0.429      0.668     -16.347      25.509
                           Results for equation share                          
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
drift           0.8390      0.970      0.865      0.387      -1.063       2.740
intercept      -0.4090      1.618     -0.253      0.800      -3.581       2.763
L1.read         0.0221      0.062      0.354      0.723      -0.100       0.145
L1.share        0.0661      0.548      0.121      0.904      -1.007       1.139
L1.favorite    -0.4312      0.759     -0.568      0.570      -1.918       1.056
L2.read         0.0016      0.043      0.037      0.970      -0.083       0.086
L2.share        0.2762      0.505      0.547      0.585      -0.714       1.266
L2.favorite     0.0003      0.742      0.000      1.000      -1.454       1.455
L3.read         0.0190      0.067      0.285      0.775      -0.112       0.150
L3.share       -0.1697      0.607     -0.280      0.780      -1.359       1.020
L3.favorite    -0.2594      0.587     -0.442      0.658      -1.409       0.890
L4.read         0.0417      0.063      0.659      0.510      -0.082       0.166
L4.share       -0.4493      0.629     -0.714      0.475      -1.682       0.783
L4.favorite    -0.1593      0.886     -0.180      0.857      -1.897       1.578
L5.read        -0.0276      0.053     -0.525      0.599      -0.131       0.075
L5.share        0.2678      0.635      0.422      0.673      -0.977       1.512
L5.favorite     0.4319      0.784      0.551      0.582      -1.104       1.968
                         Results for equation favorite                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
intercept       1.9110      3.322      0.575      0.565      -4.601       8.423
drift           0.0199      0.039      0.507      0.612      -0.057       0.097
L1.read         0.0312      0.014      2.182      0.029       0.003       0.059
L1.share       -0.0077      0.332     -0.023      0.982      -0.658       0.643
L1.favorite    -0.2382      0.347     -0.687      0.492      -0.917       0.441
L2.read        -0.0222      0.035     -0.635      0.525      -0.091       0.046
L2.share        0.2652      0.289      0.916      0.359      -0.302       0.833
L2.favorite     0.0222      0.486      0.046      0.964      -0.931       0.975
L3.read         0.0071      0.024      0.291      0.771      -0.041       0.055
L3.share       -0.2550      0.304     -0.840      0.401      -0.850       0.340
L3.favorite     0.1163      0.249      0.468      0.640      -0.371       0.603
L4.read         0.0001      0.019      0.007      0.994      -0.037       0.038
L4.share        0.1197      0.351      0.341      0.733      -0.568       0.808
L4.favorite     0.4411      0.406      1.088      0.277      -0.354       1.236
L5.read        -0.0055      0.032     -0.170      0.865      -0.068       0.058
L5.share       -0.1079      0.362     -0.298      0.766      -0.817       0.602
L5.favorite     0.1974      0.415      0.475      0.635      -0.617       1.012
                                  Error covariance matrix                                  
===========================================================================================
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
sqrt.var.read              42.7984      9.255      4.624      0.000      24.658      60.938
sqrt.cov.read.share         2.4967      0.765      3.264      0.001       0.997       3.996
sqrt.var.share              2.5223      0.499      5.051      0.000       1.543       3.501
sqrt.cov.read.favorite      1.1851      0.606      1.954      0.051      -0.003       2.374
sqrt.cov.share.favorite     0.5452      0.640      0.852      0.394      -0.709       1.800
sqrt.var.favorite           1.6856      0.463      3.643      0.000       0.779       2.592
===========================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
fitMod.predict(0,180).plot.line()
<AxesSubplot:>
data.plot.line()
<AxesSubplot:xlabel='date'>
# Null hypothesis: no drift (stable); alternative: drift (unstable)
result = statsmodels.stats.diagnostic.breaks_cusumolsresid(resid.values)
result
(1.0457107767255178, 0.22418353062921537, [(1, 1.63), (5, 1.36), (10, 1.22)])
# orthogonalized=True means Cholesky orthogonalization is used
terms = 7
ax = fitMod.impulse_responses(terms, orthogonalized=True).plot(figsize=(12, 8))
plt.show()
md = sm.tsa.VAR(data)
re = md.fit(2)
fevd = re.fevd(10)
# Print the variance-decomposition results
print(fevd.summary())
# Plot
fevd.plot(figsize=(12, 16))
plt.show()
D:\softwares\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:471: ValueWarning: No frequency information was provided, so inferred frequency D will be used. self._init_dates(dates, freq)
FEVD for read
         read     share  favorite
0    1.000000  0.000000  0.000000
1    0.981561  0.017771  0.000668
2    0.976132  0.022673  0.001195
3    0.977319  0.020787  0.001894
4    0.977150  0.021029  0.001821
5    0.977706  0.020449  0.001846
6    0.977913  0.020286  0.001801
7    0.978109  0.020106  0.001785
8    0.978225  0.020007  0.001768
9    0.978306  0.019935  0.001758
FEVD for share
         read     share  favorite
0    0.442184  0.557816  0.000000
1    0.476581  0.506962  0.016457
2    0.518667  0.463617  0.017717
3    0.543950  0.438130  0.017920
4    0.559162  0.423460  0.017378
5    0.569874  0.413091  0.017036
6    0.576469  0.406765  0.016766
7    0.581053  0.402358  0.016589
8    0.584047  0.399483  0.016470
9    0.586096  0.397515  0.016389
FEVD for favorite
         read     share  favorite
0    0.355678  0.010353  0.633968
1    0.459580  0.022902  0.517517
2    0.488364  0.033642  0.477994
3    0.520384  0.031744  0.447871
4    0.535799  0.031208  0.432993
5    0.547580  0.030690  0.421730
6    0.554646  0.030348  0.415005
7    0.559544  0.030146  0.410309
8    0.562769  0.029994  0.407238
9    0.564957  0.029899  0.405144
None