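This is my first time using the sklearn library. The datasets bundled with the library work fine, but the code fails with data I scraped myself. I would appreciate any pointers. Thank you very much.
Code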
encoder = OneHotEncoder()
data_X = encoder.fit_transform(data_X)
data_Y = encoder.fit_transform(np.array(data_Y).reshape(-1,1))
X_train, X_test, Y_train, Y_test = train_test_split(data_X, data_Y, test_size=0.3, random_state=41)
print(X_train)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, Y_train)
#knn.predict(Y_test)
knn.score(X_train,Y_train)
Error
Traceback (most recent call last):
File "C:\Users\20673\AppData\Local\Programs\Python\Python39\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/untitled/3.py", line 31, in <module>
knn.score(X_train,Y_train)
File "D:\untitled\venv\lib\site-packages\sklearn\base.py", line 500, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "D:\untitled\venv\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "D:\untitled\venv\lib\site-packages\sklearn\metrics\_classification.py", line 202, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "D:\untitled\venv\lib\site-packages\sklearn\metrics\_classification.py", line 85, in _check_targets
type_pred = type_of_target(y_pred)
File "D:\untitled\venv\lib\site-packages\sklearn\utils\multiclass.py", line 261, in type_of_target
if is_multilabel(y):
File "D:\untitled\venv\lib\site-packages\sklearn\utils\multiclass.py", line 163, in is_multilabel
labels = np.unique(y)
File "<__array_function__ internals>", line 5, in unique
File "D:\untitled\venv\lib\site-packages\numpy\lib\arraysetops.py", line 272, in unique
ret = _unique1d(ar, return_index, return_inverse, return_counts)
File "D:\untitled\venv\lib\site-packages\numpy\lib\arraysetops.py", line 333, in _unique1d
ar.sort()
File "D:\untitled\venv\lib\site-packages\scipy\sparse\base.py", line 283, in __bool__
raise ValueError("The truth value of an array with more than one "
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
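My data contains Chinese characters, and I read online that one-hot encoding can handle that. However, I still do not understand what the output with parentheses means after encoding: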
(0, 188) 1.0
(0, 1175) 1.0
(1, 565) 1.0
(1, 1802) 1.0
(2, 328) 1.0
(2, 1827) 1.0
I hope someone can help me with this question.
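The output with parentheses is a sparse matrix. You can think of it as an ordinary two-dimensional matrix arr, where each pair in parentheses is a (row, column) coordinate.
For example, (0, 188) 1.0 means that arr[0][188] is 1, and the lines below it read the same way. A one-hot encoded matrix is very wide and almost entirely zeros, so scanning every cell to find the 1s would be wasteful. That is why a sparse matrix, which records only the non-zero entries, is often used.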
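To make the coordinate reading concrete, here is a minimal sketch. The toy data (toy_X) is hypothetical and only stands in for the scraped data; it shows that the sparse output and the dense array describe the same matrix.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns containing Chinese text.
toy_X = np.array([["北京", "晴"], ["上海", "雨"], ["北京", "雨"]])

encoder = OneHotEncoder()
sparse_X = encoder.fit_transform(toy_X)  # a SciPy sparse matrix by default

# Printing a sparse matrix lists only the non-zero entries,
# each with its (row, column) coordinate and value.
print(sparse_X)

# The same matrix written out in full, zeros included.
dense_X = sparse_X.toarray()
print(dense_X)

# Every coordinate reported by the sparse matrix holds a 1 in the dense one.
rows, cols = sparse_X.nonzero()
for r, c in zip(rows, cols):
    assert dense_X[r][c] == 1.0
```

Each row has exactly one 1 per original column, so with two input columns every row of dense_X contains two 1s, which matches the two entries per row in the output shown in the question.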
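The problem is in the training data. Try converting the sparse matrix to a regular (dense) matrix, for example with its toarray() method.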
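A rough sketch of that suggestion follows. The data here is randomly generated and purely hypothetical, and it assumes the failure comes from the labels data_Y being a sparse matrix, which the traceback (ending in scipy\sparse\base.py, reached from np.unique) seems to indicate. It also uses a separate encoder for the labels, which is a choice made here, not something from the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the scraped data: 60 rows, two text columns, one text label.
rng = np.random.default_rng(0)
data_X = np.column_stack([
    rng.choice(np.array(["北京", "上海", "广州"]), 60),
    rng.choice(np.array(["晴", "雨"]), 60),
])
data_Y = rng.choice(np.array(["高", "低"]), 60)

# Separate encoders, so fitting the labels does not overwrite the feature encoding.
x_encoder = OneHotEncoder()
y_encoder = OneHotEncoder()

X_encoded = x_encoder.fit_transform(data_X)  # left sparse
# Convert the one-hot labels to a dense NumPy array before training.
Y_encoded = y_encoder.fit_transform(data_Y.reshape(-1, 1)).toarray()

X_train, X_test, Y_train, Y_test = train_test_split(
    X_encoded, Y_encoded, test_size=0.3, random_state=41
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, Y_train)

train_score = knn.score(X_train, Y_train)
test_score = knn.score(X_test, Y_test)  # score on held-out data, not only the training set
print(train_score, test_score)
```

If this works on your data, another option worth trying is to skip encoding the labels altogether and pass data_Y as a plain one-dimensional array, since sklearn classifiers accept string class labels directly.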