Decision tree error: "X has 432 features, but DecisionTreeClassifier is expecting 803 features as input." Why?

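I'm using a decision tree to analyze mortality on the Titanic. Classmates using other algorithms got their code to run, but I can only get the accuracy on the training set; as soon as I use the test set, I get the error above.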

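import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv(r'C:\Users\56484\Desktop\删了双引号的0,1替换成字母的训练集.csv')
test = pd.read_csv(r'C:\Users\56484\Desktop\测试集.csv')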

Select the target and the features (I added .copy() because I was getting an error without it; I don't know why)

x_train = train[["Parch", "SibSp", "Age", "Sex", "Pclass", "Ticket", "Fare", "Cabin", "Embarked"]].copy()
y_train = train[["Survived"]].copy()

x_test = test[["Parch", "SibSp", "Age", "Sex", "Pclass", "Ticket", "Fare", "Cabin", "Embarked"]].copy()
y_test = test[["Survived"]].copy()
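
On the `.copy()` question: the error was probably pandas' `SettingWithCopyWarning`, which appears when a later assignment targets a DataFrame that pandas suspects is a view of another one. A minimal sketch with made-up data (the values are hypothetical), assuming that is the warning in question:

```python
import pandas as pd

# Hypothetical miniature of the training table.
train = pd.DataFrame({"Age": [22.0, None, 35.0], "Survived": [0, 1, 1]})

# .copy() makes x_train an independent DataFrame, so later assignments
# modify x_train only and pandas has no reason to warn.
x_train = train[["Age"]].copy()
x_train["Age"] = x_train["Age"].fillna(x_train["Age"].mean())

print(x_train["Age"].tolist())
print(train["Age"].isnull().sum())
```

The original `train` table keeps its missing value; only the copy is filled.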

Handle missing values (first run print(x_test.isnull().sum()) to see which columns of test or train have missing values)

x_train["Age"] = x_train["Age"].fillna(x_train["Age"].mean())
x_test["Age"] = x_test["Age"].fillna(x_train["Age"].mean())

x_train["Cabin"] = x_train["Cabin"].fillna("")
x_test["Cabin"] = x_test["Cabin"].fillna("")

x_train["Embarked"] = x_train["Embarked"].fillna("")
x_test["Embarked"] = x_test["Embarked"].fillna("")

x_test["Fare"] = x_test["Fare"].fillna("*")
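
Filling `Fare` with `"*"` puts a string into a numeric column, and `DictVectorizer` then treats that value as a separate category (`Fare=*`) rather than as a number. One alternative, sketched here on made-up rows, is to fill numeric gaps in both sets with a statistic computed on the training set. Using the mean is an assumption, not the only reasonable choice:

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV files.
x_train = pd.DataFrame({"Age": [20.0, None, 40.0], "Fare": [10.0, 20.0, 30.0]})
x_test = pd.DataFrame({"Age": [None, 50.0], "Fare": [None, 8.0]})

# Compute the fill values on the training set only, then reuse them for both sets.
fill_values = x_train[["Age", "Fare"]].mean()
x_train = x_train.fillna(fill_values)
x_test = x_test.fillna(fill_values)

print(x_test)
```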

Feature engineering (convert the data to a list of dicts and vectorize it)

transfer = DictVectorizer(sparse=False)
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
x_test = transfer.fit_transform(x_test.to_dict(orient="records"))
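
The feature-count mismatch in the error most likely comes from calling `fit_transform` on the test set: that refits the vectorizer and builds a second, different set of columns from whatever category values happen to appear in the test data. A minimal sketch with made-up records, assuming the usual approach of fitting on the training data only and calling `transform` on the test data:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records; string values are one-hot encoded per distinct value.
train_records = [
    {"Sex": "male", "Embarked": "S", "Age": 22.0},
    {"Sex": "female", "Embarked": "C", "Age": 38.0},
    {"Sex": "female", "Embarked": "Q", "Age": 26.0},
]
test_records = [{"Sex": "male", "Embarked": "S", "Age": 30.0}]

transfer = DictVectorizer(sparse=False)
x_train = transfer.fit_transform(train_records)  # learns the column layout
x_test = transfer.transform(test_records)        # reuses that layout

# Refitting on the test set instead yields a different number of columns.
x_test_refit = DictVectorizer(sparse=False).fit_transform(test_records)

print(x_train.shape, x_test.shape, x_test_refit.shape)
```

Category values that appear only in the test set are silently ignored by `transform`, so the column count always matches what the classifier was trained on. Columns such as `Ticket` and `Cabin`, where almost every value is distinct, are probably what push the counts into the hundreds.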

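Decision tree (the min_* parameters are the minimum number of samples required to split an internal node and the minimum number of samples at a leaf node; they are mainly there to prevent overfitting)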

estimator = DecisionTreeClassifier(min_samples_split=20, min_samples_leaf=20)
estimator.fit(x_train, y_train)
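
To see what the two parameters do, the following sketch fits the same classifier on synthetic data (generated with `make_classification`, not the Titanic set) and inspects the smallest leaf. The data and `random_state` values are assumptions made only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only to illustrate the parameters.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

estimator = DecisionTreeClassifier(
    min_samples_split=20, min_samples_leaf=20, random_state=0
)
estimator.fit(X, y)

tree = estimator.tree_
is_leaf = tree.children_left == -1  # leaf nodes have no children
smallest_leaf = tree.n_node_samples[is_leaf].min()

print(smallest_leaf)
print(estimator.get_n_leaves())
```

With `min_samples_leaf=20`, a node needs at least 40 samples before it can be split at all, so `min_samples_split=20` adds no further restriction in this configuration.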

Model evaluation (the first line prints the predictions, the second prints the accuracy)

print(estimator.predict(x_test))  # the error is raised as soon as the test set is used
print(estimator.score(x_test, y_test))