Why does the Borderline-SMOTE algorithm fail to balance my classes?
The code is as follows:
Using SMOTE:
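from collections import Counter
from sklearn.neighbors import NearestNeighbors
from imblearn.over_sampling import SMOTE
# Neighbour estimator that SMOTE uses to pick interpolation partners
nn_k = NearestNeighbors(n_neighbors=3)
smote = SMOTE(random_state=42, k_neighbors=nn_k)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
print(sorted(Counter(y_train_smote).items()))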
Result: [(0, 136), (1, 136), (2, 136)]  # classes are balanced
Using Borderline-SMOTE:
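from collections import Counter
from sklearn.neighbors import NearestNeighbors
from imblearn.over_sampling import BorderlineSMOTE
# Same neighbour estimator as in the SMOTE run; m_neighbors is left at its default
nn_k = NearestNeighbors(n_neighbors=3)
bsmote = BorderlineSMOTE(random_state=42, k_neighbors=nn_k)
X_train_smote, y_train_smote = bsmote.fit_resample(X_train, y_train)
print(sorted(Counter(y_train_smote).items()))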
Result: [(0, 136), (1, 136), (2, 3)]  # classes are not balanced
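Class 2 has too few samples (only 3), and they are probably too sparsely distributed. Borderline-SMOTE only generates new points from minority samples it considers "in danger", meaning that at least half, but not all, of their nearest neighbours belong to other classes. If every class 2 sample is surrounded entirely by other classes, each one is treated as noise and nothing is generated for that class, which would explain why the count stays at 3. You could try adjusting k_neighbors, and probably also m_neighbors (the parameter that decides which samples count as in danger), to see whether the result becomes balanced.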
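To check whether this is what happens with your data, you can count how many of each minority sample's nearest neighbours belong to other classes. Below is a minimal standalone sketch of the safe/danger/noise rule as I understand it from the Borderline-SMOTE description. It is not imblearn's own code, and the helper name `borderline_categories` and the toy data are made up for illustration.

```python
import math

def borderline_categories(X, y, target_class, m_neighbors=10):
    """Label each sample of target_class as 'safe', 'danger' or 'noise'.

    Assumed rule: let n_other be the number of the m nearest neighbours
    that belong to another class. Then the sample is
    'noise' if n_other == m, 'danger' if m/2 <= n_other < m, else 'safe'.
    """
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != target_class:
            continue
        # Brute-force distances to every other sample (fine for small data).
        others = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        nearest = [j for _, j in others[:m_neighbors]]
        m = len(nearest)
        n_other = sum(1 for j in nearest if y[j] != target_class)
        if n_other == m:
            labels[i] = "noise"
        elif n_other >= m / 2:
            labels[i] = "danger"
        else:
            labels[i] = "safe"
    return labels

# Hypothetical toy data: two tight clusters plus three scattered class-2 points.
X = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),              # class 0
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10, 10.5),  # class 1
    (0.2, 0.2), (10.2, 10.2), (5, 20),                                 # class 2
]
y = [0] * 6 + [1] * 6 + [2] * 3

print(borderline_categories(X, y, target_class=2, m_neighbors=3))
print(borderline_categories(X, y, target_class=2, m_neighbors=14))
```

On this toy data, all three class 2 points come out as noise with m_neighbors=3, because none of them has another class 2 point among its 3 nearest neighbours. With m_neighbors=14 the neighbourhood is wide enough to include the other class 2 points, so they become "danger" samples. If your real class 2 samples all come out as noise, that would be consistent with the count staying at 3.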