Does the code below perform over-sampling and under-sampling at the same time? Also, how exactly should sampling_strategy be used?
Class counts of train1['Pred'] (label, count):
0 13634
1 6305
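The comments in the code derive both target counts from the minority class size. Here is a minimal sketch of that arithmetic in plain Python. It produces the dict form that `sampling_strategy` accepts, which maps a class label to the desired count after resampling. The variable names are illustrative only.

```python
from collections import Counter

# Class counts from the question: label 0 is the majority, label 1 the minority.
counts = Counter({0: 13634, 1: 6305})
minority = counts[1]

# Over-sampling target: minority count * 1.5 (6305 * 1.5 = 9457.5, rounded to 9458).
smote_target = round(minority * 1.5)
# Under-sampling target: minority count * 0.5 * 3.5 (= 11033.75, rounded to 11034).
under_target = round(minority * 0.5 * 3.5)

# Dict form of sampling_strategy: {class label: desired number of samples after resampling}.
over_strategy = {1: smote_target}
under_strategy = {0: under_target}
print(over_strategy, under_strategy)
```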
# SMOTE over-sampling: original minority count * 1.5 = 9458 samples
from imblearn.combine import SMOTEENN
smo = SMOTEENN(sampling_strategy={1: 9458}, random_state=24)
tra1_x1, tra1_y1 = smo.fit_resample(train1.drop(['Pred', 'Date'], axis=1), train1['Pred'])
# Under-sampling: original minority count * 0.5 * 3.5 = 11034 samples
from imblearn.combine import SMOTETomek
rus = SMOTETomek(sampling_strategy={0: 11034}, random_state=24)
tra1_x1, tra1_y1 = rus.fit_resample(train1.drop(['Pred', 'Date'], axis=1), train1['Pred'])
print(tra1_x1.shape)
print((tra1_y1==1).sum()/len(tra1_y1))
ValueError: With over-sampling methods, the number of samples in a class should be greater or equal to the original number of samples. Originally, there is 13634 samples and 11034 samples are asked.
Thanks, everyone!
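On the second question: in the dict form of `sampling_strategy`, each key is a class label and each value is the number of samples wanted for that class after resampling. Classes not listed are left untouched. The helper `check_dict_strategy` below is hypothetical and not part of imbalanced-learn. It only mirrors the documented rule behind the error. Over-samplers need every target to be at least the current count, and under-samplers need every target to be at most the current count. The library also accepts a float for binary problems, meaning the desired minority/majority ratio after resampling. It also accepts strings such as 'auto', 'minority' and 'not majority'. The dict form is the one that pins exact counts.

```python
def check_dict_strategy(original, strategy, kind):
    """Hypothetical helper mirroring imbalanced-learn's rule for a dict sampling_strategy.

    original: {label: count before resampling}
    strategy: {label: desired count after resampling}
    kind: "over" or "under"
    Returns a list of problems (empty if the dict would be accepted).
    """
    problems = []
    for label, target in strategy.items():
        before = original[label]
        if kind == "over" and target < before:
            problems.append(f"class {label}: over-sampling cannot reduce {before} to {target}")
        elif kind == "under" and target > before:
            problems.append(f"class {label}: under-sampling cannot increase {before} to {target}")
    return problems


original = {0: 13634, 1: 6305}
# What SMOTETomek(sampling_strategy={0: 11034}) asks of an over-sampler: rejected.
print(check_dict_strategy(original, {0: 11034}, "over"))
# The same target given to an under-sampler such as RandomUnderSampler: accepted.
print(check_dict_strategy(original, {0: 11034}, "under"))
# Growing class 1 to 9458 with an over-sampler: accepted.
print(check_dict_strategy(original, {1: 9458}, "over"))
```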
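Update: it seems the code above actually uses two different combined methods (SMOTEENN and SMOTETomek), not a separate over-sampler and under-sampler. I switched to doing it in two steps instead, over-sampling first and then under-sampling, and that solved the problem. The code is below: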
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Step 1: SMOTE over-sampling of class 1 from 6305 up to 9458 samples
smote = SMOTE(sampling_strategy={1: 9458}, random_state=2021)
tra1_x1, tra1_y1 = smote.fit_resample(train1.drop(['Pred', 'Date'], axis=1), train1['Pred'])
print(tra1_x1.shape)
print((tra1_y1 == 1).sum() / len(tra1_y1))  # share of class 1 after over-sampling

# Step 2: random under-sampling of class 0 from 13634 down to 11034 samples
rus = RandomUnderSampler(sampling_strategy={0: 11034}, random_state=2021)
tra1_x2, tra1_y2 = rus.fit_resample(tra1_x1, tra1_y1)
print(tra1_x2.shape)
print((tra1_y2 == 1).sum() / len(tra1_y2))  # share of class 1 after under-sampling