While fitting historical data with sklearn's TweedieRegressor, I get the warnings "overflow encountered in exp" and "invalid value encountered in true_divide", and every fitted coefficient comes out as 0.
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
train_ctp, test_ctp = train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)
# Dummy columns to drop, one level per categorical feature
base_levels = ['age_g_(45, 50]', 'airbag_g_1-2', 'branch3_湖州中支公司', 'brandfamilycode_-----',
               'bsnssclass_续保', 'channel_CH02', 'countrynature_合资车', 'curbweight_g_(1000, 1500]', 'exhaust_g_(1.9, 2.0]',
               'gender_1', 'ncdclass0_c00', 'ncdclass0_com_o-3', 'oiltype_汽油', 'power_g_缺失', 'pricejy_g_(100000, 150000]',
               'scoreplat_com_g_缺失', 'scoreplat_g_缺失', 'seat_5', 'vehicleclass_轿车类及其他', 'vhlage_0']
def make_dummies(df):
    # Drop the weight (ee), the target (rp) and the other columns not used as features, then one-hot encode
    x = df.drop(['ee', 'ninc', 'ult', 'rp', 'frqc', 'svrt'], axis=1)
    return pd.get_dummies(data=x, drop_first=False).drop(base_levels, axis=1)
train_ctp_x_dummy = make_dummies(train_ctp)
test_ctp_x_dummy = make_dummies(test_ctp)
model_ctp=linear_model.TweedieRegressor(link='log',power=1.5,max_iter=1000)
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'],sample_weight=train_ctp['ee'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp'],sample_weight=train_ctp['ee']))
print(model_ctp.coef_)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:90: RuntimeWarning: overflow encountered in exp
return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:93: RuntimeWarning: overflow encountered in exp
return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:132: RuntimeWarning: invalid value encountered in true_divide
return -2 * (y - y_pred) / self.unit_variance(y_pred)
-2.220446049250313e-16
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
After I remove the sample_weight=train_ctp['ee'] argument from the fit call, I get a normal result:
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp']))
print(model_ctp.coef_)
0.025823470325313402
[-0.00024694 0.11349995 0.01234578 0.0950132 0.04150193 0.05111229
0.08149291 0.08414812 0.03324968 0.22793334 0.07515022 0.00856634
-0.05111691 -0.04949008 -0.0343207 0.05717735 -0.03742427 0.
-0.00889182 -0.01089217 0.05836472 0.0654532 0. 0.11701838
-0.18117352 0.05231738 -0.18180506 0.17178391 -0.05501642 -0.08015561
0.02080678 -0.03350041 -0.01551873 -0.05747879 -0.03502268 -0.07745178
-0.01459489 0.04801213 0.09053479 0.05117658 -0.13038004 0.04854011
0.14954648 0.0963431 -0.07634689 0.01314994 -0.00278057 0.03202863
0.1119826 0.08074991 -0.11225946 -0.15717543 0.05060687 0.0575444
0.01657435 0.05230751 0.00065387 -0.00871944 -0.00061494 0.17111251
0.06346847 -0.1249479 -0.07564325 0.03737487 0.02567693 -0.10369658
0.0880398 0.08698734 -0.05848983 0.12325772 -0.04991964 0.12909947
-0.00357975 -0.0177114 0.07283991 -0.02153609 0.03255424 0.0339337
-0.09179229 0.01860906 -0.18465931 0.01794021 -0.06896402 0.0216197
0.01824635 0.05005794 0.10840158 -0.02131447 0.19873738 0.06512552
-0.00333001 -0.0037739 -0.01094982 0.03787879 0.11777489 0.00371876
0.05529405 0.12074268 0.13251841 -0.03552237 0.04622581 0.06455707
-0.01694808 -0.03056059 -0.04209729 0.09313118 -0.17195307 0.
-0.05929124 0.13049018 -0.02568634 -0.0765889 0.06844005 -0.09255841
-0.17662861 0.05441542 0.03770555 0.23487849 -0.08591625 -0.06586151
0.0109495 0.14869793 -0.06392443 0.00470135 0.1557998 0.07705521
0.01774447 0.13763442 -0.07773967 0.02061869 -0.06287382 0.06093335
-0.10945242 0.07563117 -0.01536319 0.06458289 -0.04258964 0.14643477
-0.07130573 -0.11505023 0.00772579]
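My weight variable ee is always ≥ 0, with a maximum of 1.002740, so I suspected that the rows with ee=0 were causing the problem. After removing them (which dropped only 1 of the 287867 rows) and rerunning the code with the sample_weight argument, I got the same warnings. Rerunning the code without sample_weight after the removal also produced a new warning (overflow encountered in power).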
policy_glm_ctp=policy_glm_ctp.loc[policy_glm_ctp['ee']>0,:]
train_ctp, test_ctp= train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)
# unchanged code in between omitted
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'],sample_weight=train_ctp['ee'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp'],sample_weight=train_ctp['ee']))
print(model_ctp.coef_)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:90: RuntimeWarning: overflow encountered in exp
return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:93: RuntimeWarning: overflow encountered in exp
return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:132: RuntimeWarning: invalid value encountered in true_divide
return -2 * (y - y_pred) / self.unit_variance(y_pred)
0.0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
policy_glm_ctp=policy_glm_ctp.loc[policy_glm_ctp['ee']>0,:]
train_ctp, test_ctp= train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)
# unchanged code in between omitted
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp']))
print(model_ctp.coef_)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:246: RuntimeWarning: overflow encountered in power
return np.power(y_pred, self.power)
0.0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
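The zero-weight guess, and a few related input problems, can be checked directly before fitting. The following is only a rough sketch: `summarize_glm_inputs` is a hypothetical helper name, and the demo arrays are made-up values, not rows from the real data. It counts entries that a Tweedie model with 1 < power < 2 cannot use (negative or non-finite targets) and flags weights that are zero, negative, or non-finite.

```python
import numpy as np

def summarize_glm_inputs(y, w):
    """Count suspicious values in the target y and the sample weights w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return {
        "n_rows": int(y.size),
        "nonfinite_y": int((~np.isfinite(y)).sum()),  # NaN or +/-inf targets
        "negative_y": int((y < 0).sum()),             # not allowed for 1 < power < 2
        "nonfinite_w": int((~np.isfinite(w)).sum()),
        "nonpositive_w": int((w <= 0).sum()),         # zero or negative weights
        "max_y": float(np.nanmax(y)),                 # a very large target hints at scale issues
    }

# Made-up values for illustration only
demo_y = np.array([0.0, 120.5, np.nan, 3000.0, -1.0])
demo_w = np.array([1.0, 0.5, 0.25, 0.0, 1.0])
report = summarize_glm_inputs(demo_y, demo_w)
print(report)
```

On the real data the call would be something like summarize_glm_inputs(train_ctp['rp'], train_ctp['ee']).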
Ultimately, I want to get a normal fit while keeping the sample_weight argument.
Thanks, everyone!
The dataset may be too small, which could be causing the abnormal results.
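Another thing that might be worth trying, assuming the overflow comes from the scale of the target (this is a guess, not something the output above confirms): divide the target by its weighted mean before fitting, so that the linear predictor under the log link starts near 0, then shift the intercept back afterwards. The sketch below uses synthetic data only. X, w and y are stand-ins for the dummy columns, ee and rp, and alpha=0.0 is chosen here so that the rescaling affects only the intercept. With a non-zero penalty the effective penalty strength changes with the target scale, so coefficients from the two fits would not be directly comparable.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins: X for the dummy columns, w for 'ee', y for 'rp'
X = rng.integers(0, 2, size=(n, 3)).astype(float)
w = rng.uniform(0.1, 1.0, size=n)
mu = 500.0 * np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])  # made-up mean structure
y = rng.poisson(0.3, size=n) * rng.gamma(shape=2.0, scale=mu / 2.0)

# Rescale the target so that its weighted mean is 1
scale = np.average(y, weights=w)
model = TweedieRegressor(power=1.5, link="log", alpha=0.0, max_iter=1000)
model.fit(X, y / scale, sample_weight=w)

# Undo the rescaling: with a log link it is just a shift of the intercept
intercept_original = model.intercept_ + np.log(scale)
pred_original = model.predict(X) * scale
print(model.coef_, intercept_original)
```

If the warnings persist after rescaling, the cause is probably elsewhere, for example in the feature columns themselves.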