sklearn's TweedieRegressor reports "overflow encountered in exp" during fitting

Symptoms and background

While fitting historical data with sklearn's TweedieRegressor, I get the warnings "overflow encountered in exp" and "invalid value encountered in true_divide", and every fitted coefficient comes out as 0.

Relevant code (please do not paste screenshots)
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

train_ctp, test_ctp = train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)

# Reference levels removed from the one-hot encoded design matrix
base_levels = ['age_g_(45, 50]', 'airbag_g_1-2', 'branch3_湖州中支公司', 'brandfamilycode_-----',
               'bsnssclass_续保', 'channel_CH02', 'countrynature_合资车', 'curbweight_g_(1000, 1500]', 'exhaust_g_(1.9, 2.0]',
               'gender_1', 'ncdclass0_c00', 'ncdclass0_com_o-3', 'oiltype_汽油', 'power_g_缺失', 'pricejy_g_(100000, 150000]',
               'scoreplat_com_g_缺失', 'scoreplat_g_缺失', 'seat_5', 'vehicleclass_轿车类及其他', 'vhlage_0']

def build_design_matrix(df):
    # Drop the non-feature columns (including the weight ee and the target rp),
    # one-hot encode the rest, then drop the reference levels
    x = df.drop(['ee', 'ninc', 'ult', 'rp', 'frqc', 'svrt'], axis=1)
    return pd.get_dummies(data=x, drop_first=False).drop(base_levels, axis=1)

train_ctp_x_dummy = build_design_matrix(train_ctp)
test_ctp_x_dummy = build_design_matrix(test_ctp)

model_ctp = linear_model.TweedieRegressor(link='log', power=1.5, max_iter=1000)
model_ctp.fit(train_ctp_x_dummy, train_ctp['rp'], sample_weight=train_ctp['ee'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp'], sample_weight=train_ctp['ee']))
print(model_ctp.coef_)
Output and error messages
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:90: RuntimeWarning: overflow encountered in exp
  return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:93: RuntimeWarning: overflow encountered in exp
  return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:132: RuntimeWarning: invalid value encountered in true_divide
  return -2 * (y - y_pred) / self.unit_variance(y_pred)
-2.220446049250313e-16
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
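
As a side note on what the two warnings mean numerically, here is a minimal NumPy sketch. It is not sklearn's code path; it only reuses the two expressions visible in the traceback (`np.exp(lin_pred)` and `-2 * (y - y_pred) / unit_variance(y_pred)`) and assumes the Tweedie unit variance is `y_pred ** power`. The function name and the input values are made up for illustration.

```python
import numpy as np

def unit_deviance_derivative(y, y_pred, power=1.5):
    # Same expression as in the traceback, assuming unit_variance(mu) = mu ** power.
    return -2 * (y - y_pred) / np.power(y_pred, power)

# Hypothetical linear predictors: two moderate values and one far too large.
lin_pred = np.array([0.0, 100.0, 800.0])
y = np.ones_like(lin_pred)

# Silence the floating-point warnings here so the values can be inspected directly.
with np.errstate(over="ignore", invalid="ignore"):
    y_pred = np.exp(lin_pred)                      # log link: mean = exp(linear predictor)
    grad = unit_deviance_derivative(y, y_pred)

print(y_pred)  # the last entry overflows to inf
print(grad)    # the last entry is nan, because inf / inf is undefined
```

In float64, `np.exp` overflows to `inf` once its argument exceeds roughly 709.8, and `inf / inf` then evaluates to `nan`. That is consistent with an "overflow encountered in exp" warning being followed by "invalid value encountered in true_divide".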
My approach and what I have tried

When I remove the sample_weight=train_ctp['ee'] argument from the fit call, I get a normal result:

model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp']))
print(model_ctp.coef_)
 
0.025823470325313402
[-0.00024694  0.11349995  0.01234578  0.0950132   0.04150193  0.05111229
  0.08149291  0.08414812  0.03324968  0.22793334  0.07515022  0.00856634
 -0.05111691 -0.04949008 -0.0343207   0.05717735 -0.03742427  0.
 -0.00889182 -0.01089217  0.05836472  0.0654532   0.          0.11701838
 -0.18117352  0.05231738 -0.18180506  0.17178391 -0.05501642 -0.08015561
  0.02080678 -0.03350041 -0.01551873 -0.05747879 -0.03502268 -0.07745178
 -0.01459489  0.04801213  0.09053479  0.05117658 -0.13038004  0.04854011
  0.14954648  0.0963431  -0.07634689  0.01314994 -0.00278057  0.03202863
  0.1119826   0.08074991 -0.11225946 -0.15717543  0.05060687  0.0575444
  0.01657435  0.05230751  0.00065387 -0.00871944 -0.00061494  0.17111251
  0.06346847 -0.1249479  -0.07564325  0.03737487  0.02567693 -0.10369658
  0.0880398   0.08698734 -0.05848983  0.12325772 -0.04991964  0.12909947
 -0.00357975 -0.0177114   0.07283991 -0.02153609  0.03255424  0.0339337
 -0.09179229  0.01860906 -0.18465931  0.01794021 -0.06896402  0.0216197
  0.01824635  0.05005794  0.10840158 -0.02131447  0.19873738  0.06512552
 -0.00333001 -0.0037739  -0.01094982  0.03787879  0.11777489  0.00371876
  0.05529405  0.12074268  0.13251841 -0.03552237  0.04622581  0.06455707
 -0.01694808 -0.03056059 -0.04209729  0.09313118 -0.17195307  0.
 -0.05929124  0.13049018 -0.02568634 -0.0765889   0.06844005 -0.09255841
 -0.17662861  0.05441542  0.03770555  0.23487849 -0.08591625 -0.06586151
  0.0109495   0.14869793 -0.06392443  0.00470135  0.1557998   0.07705521
  0.01774447  0.13763442 -0.07773967  0.02061869 -0.06287382  0.06093335
 -0.10945242  0.07563117 -0.01536319  0.06458289 -0.04258964  0.14643477
 -0.07130573 -0.11505023  0.00772579]

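My weight variable ee is always ≥ 0, with a maximum of 1.002740, so I suspected that the rows with ee = 0 were causing the problem. After removing them (this dropped only 1 of the 287,867 rows), rerunning the code with sample_weight produced the same warnings, and rerunning it without sample_weight also produced a new warning (overflow encountered in power).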

policy_glm_ctp=policy_glm_ctp.loc[policy_glm_ctp['ee']>0,:]
train_ctp, test_ctp= train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)
# unchanged code in between omitted
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'],sample_weight=train_ctp['ee'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp'],sample_weight=train_ctp['ee']))
print(model_ctp.coef_)
 
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:90: RuntimeWarning: overflow encountered in exp
  return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\linear_model\_glm\link.py:93: RuntimeWarning: overflow encountered in exp
  return np.exp(lin_pred)
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:132: RuntimeWarning: invalid value encountered in true_divide
  return -2 * (y - y_pred) / self.unit_variance(y_pred)
0.0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
policy_glm_ctp=policy_glm_ctp.loc[policy_glm_ctp['ee']>0,:]
train_ctp, test_ctp= train_test_split(policy_glm_ctp, test_size=0.3, random_state=629)
# unchanged code in between omitted
model_ctp.fit(train_ctp_x_dummy,train_ctp['rp'])
print(model_ctp.score(train_ctp_x_dummy, train_ctp['rp']))
print(model_ctp.coef_)
 
C:\Users\xujianbin\Anaconda3\lib\site-packages\sklearn\_loss\glm_distribution.py:246: RuntimeWarning: overflow encountered in power
  return np.power(y_pred, self.power)
0.0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
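
The check on ee described above can be extended into a small sanity report on both the weight and the target before fitting. The sketch below is only a suggestion: the helper name `summarize_weight_and_target` is hypothetical, the column names `ee` and `rp` are taken from the code above, and the toy values are made up, not taken from the real data.

```python
import numpy as np
import pandas as pd

def summarize_weight_and_target(df, weight_col="ee", target_col="rp"):
    """Count suspicious values in the weight and target columns (hypothetical helper)."""
    w = df[weight_col].to_numpy(dtype=float)
    y = df[target_col].to_numpy(dtype=float)
    positive = w[w > 0]
    return {
        "n_rows": len(df),
        "n_zero_weight": int((w == 0).sum()),
        "n_negative_weight": int((w < 0).sum()),
        "min_positive_weight": float(positive.min()) if positive.size else float("nan"),
        "n_nonfinite_target": int((~np.isfinite(y)).sum()),   # NaN or +/-inf
        "n_negative_target": int((y < 0).sum()),
        "max_finite_target": float(y[np.isfinite(y)].max()),
    }

# Made-up toy frame: one zero weight, one very small weight, one missing target.
toy = pd.DataFrame({
    "ee": [0.0, 0.00274, 0.5, 1.00274],
    "rp": [np.nan, 36496.0, 0.0, 49.9],
})
report = summarize_weight_and_target(toy)
print(report)
```

On the real data this would be called as `summarize_weight_and_target(train_ctp)`. Non-finite or extremely large target values, or very small positive weights, would be worth looking at before refitting.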
The result I want

Ultimately, I would like to get a normal fit while keeping the sample_weight argument.

Thanks, everyone!
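
For comparison, here is a minimal sketch of that setup (log link, power=1.5, exposure passed as sample_weight) on synthetic data. It does not reproduce or diagnose the failure on the real data. The data-generating process, the variable names, and the choice alpha=1e-3 are all assumptions made for illustration; only parameters that TweedieRegressor documents are used.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
n = 20000

# Hypothetical design: three independent 0/1 dummy features.
X = rng.integers(0, 2, size=(n, 3)).astype(float)
# Hypothetical exposure in [0.1, 1.0), playing the role of `ee`.
exposure = rng.uniform(0.1, 1.0, size=n)

# Claim counts: Poisson with rate proportional to exposure; only the first dummy has an effect.
rate = exposure * np.exp(-2.0 + 0.8 * X[:, 0])
counts = rng.poisson(rate)
# Total claim amount: a sum of k Gamma(2, 1) severities is Gamma(2k, 1); zero claims give 0.
total = np.where(counts > 0, rng.gamma(shape=2.0 * np.maximum(counts, 1), scale=1.0), 0.0)
# Pure premium per unit of exposure, playing the role of `rp`.
pure_premium = total / exposure

model = TweedieRegressor(link="log", power=1.5, alpha=1e-3, max_iter=1000)
model.fit(X, pure_premium, sample_weight=exposure)

print(model.intercept_, model.coef_)
print(model.score(X, pure_premium, sample_weight=exposure))
```

If a setup like this fits cleanly while the real data does not, the difference is more likely to lie in the values of the target and the weights than in the use of sample_weight as such.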

The data set may be too small, which could be why the results come out abnormal.