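1. I want to replace the protocol field (the second position in each record, one of three protocols) with its corresponding one-hot encoding.
An original sample looks like this: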
0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
2. The code is as follows:
import numpy as np
from sklearn.preprocessing import LabelEncoder

def handleProtocol(input):
    protoclo_list = ['tcp', 'udp', 'icmp']
    if input[1] in protoclo_list:
        # index of the protocol within protoclo_list
        a = protoclo_list.index(input[1])
        values = np.array(protoclo_list)
        print(values)
        # integer encode (LabelEncoder assigns codes in sorted order: icmp=0, tcp=1, udp=2)
        label_encoder = LabelEncoder()
        integer_encoded = label_encoder.fit_transform(values)
        print(integer_encoded)
        # binary encode
        n_sample = len(integer_encoded)
        n_class = max(integer_encoded) + 1
        onehot_labels = np.zeros((n_sample, n_class))  # matrix with n_sample rows and n_class columns
        onehot_labels[np.arange(n_sample), integer_encoded] = 1  # set the column matching each label to 1
        return onehot_labels[a]
Could someone point out what is wrong here?
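If you want one-hot encoding, I suggest calling the get_dummies function in pandas directly. For example, put the data into a DataFrame and then call
pd.get_dummies(df, columns=['proto']), where df is the DataFrame and 'proto' is the name of the protocol column.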
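
A minimal sketch of that suggestion, under some assumptions: the sample data has no header, so the column names below (including `proto`) are made up for illustration, only the first six fields are kept, and the second and third rows are invented so that all three protocols appear. Declaring the column as a categorical with a fixed category list is an optional extra step, not something the answer requires; it keeps the three output columns present and in a fixed order even when a batch of data is missing one protocol.

```python
import pandas as pd

# Hypothetical column names; only the first six fields of each record are used.
columns = ["duration", "proto", "service", "flag", "src_bytes", "dst_bytes"]

# The first row follows the sample record; the other two are invented for illustration.
rows = [
    [0, "tcp", "http", "SF", 181, 5450],
    [0, "udp", "domain_u", "SF", 105, 146],
    [0, "icmp", "ecr_i", "SF", 1032, 0],
]
df = pd.DataFrame(rows, columns=columns)

# Fix the set and order of categories so the encoded columns are stable.
df["proto"] = pd.Categorical(df["proto"], categories=["tcp", "udp", "icmp"])

# Replace the 'proto' column with one indicator column per protocol.
encoded = pd.get_dummies(df, columns=["proto"], dtype=int)

print(encoded[["proto_tcp", "proto_udp", "proto_icmp"]])
```

The original `proto` column is dropped and the new indicator columns are appended at the end of the frame, so the field order changes compared with the raw record.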