Logit regression with the statsmodels package.

I've been stuck on this for 5 hours, and as a complete beginner I'm totally lost. Can anyone help?

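import pandas as pd
import statsmodels.api as sm

# Load the data (path from the original post)
df = pd.read_csv(r'C:\Users\Administrator\Desktop\Application.csv')
print(df.head())

# Rename columns: admitted (0/1), GRE score, GPA, school rank (categorical)
df.columns = ['admit','gre','gpa','sch_rank']
print(df.columns)

# describe() only returns a DataFrame; in a script it must be printed to be seen
print(df.describe())

# One-hot encode sch_rank; dtype=float avoids bool columns (the pandas >= 2.0 default),
# which statsmodels may refuse when mixed with float columns
dummy_ranks = pd.get_dummies(df['sch_rank'], prefix='sch_rank', dtype=float)
print(dummy_ranks.head())

# Keep the dummies up to sch_rank_3 and leave the rest out as the reference category
# (sch_rank_4 if ranks run 1-4); keeping every dummy plus an intercept is perfectly collinear
cols_to_keep = ['admit','gre','gpa']
data = df[cols_to_keep].join(dummy_ranks.loc[:, :'sch_rank_3'])
print(data.head())

# Add the intercept column manually (sm.add_constant would be an alternative)
data['intercept'] = 1.0
print(data.head())

# Every column except 'admit' is a predictor
train_cols = data.columns[1:]
print(train_cols)

logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit()
print(result.summary())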

I followed a tutorial step by step, but I can't get mine to run. The tutorial link is below.

https://blog.csdn.net/weixin_39641876/article/details/110974752?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522162204686716780271567505%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=162204686716780271567505&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduend~default-1-110974752.first_rank_v2_pc_rank_v29&utm_term=Logit%E6%A8%A1%E5%9E%8B%E6%8B%9F%E5%90%88%E5%AE%9E%E6%88%98%E6%A1%88%E4%BE%8B%EF%BC%88Python%EF%BC%89%E2%80%94%E2%80%94%E7%A6%BB%E6%95%A3%E9%80%89%E6%8B%A9%E6%A8%A1%E5%9E%8B%E4%B9%8B%E5%85%AD&spm=1018.2226.3001.4187
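Before calling `logit.fit()`, it can help to inspect the design matrix directly. Below is a minimal sketch using a hypothetical helper, `check_design`, on a made-up 8-row toy dataset (not the poster's file). It does two things. First, it flags bool or non-numeric columns: in pandas 2.x `get_dummies` returns bool columns by default, and one commonly reported symptom is statsmodels complaining that the pandas data was cast to numpy dtype of object. Second, it compares the matrix rank with the number of columns: all four rank dummies plus an intercept are rank-deficient (the dummy-variable trap), while dropping one dummy gives full rank.

```python
import numpy as np
import pandas as pd


def check_design(X: pd.DataFrame) -> dict:
    """Hypothetical helper: quick sanity checks before calling sm.Logit.

    Assumes every column can be cast to float.
    """
    # bool columns (pandas >= 2.0 get_dummies default) or non-numeric columns
    suspicious = [c for c in X.columns
                  if pd.api.types.is_bool_dtype(X[c])
                  or not pd.api.types.is_numeric_dtype(X[c])]
    values = X.astype(float).to_numpy()
    # Rank below the column count means some column is a linear combination of others
    rank = int(np.linalg.matrix_rank(values))
    return {
        "n_rows": X.shape[0],
        "n_cols": X.shape[1],
        "rank": rank,
        "full_rank": rank == X.shape[1],
        "suspicious_dtypes": suspicious,
    }


# Made-up toy data (NOT the poster's data)
toy = pd.DataFrame({
    "admit": [1, 0, 0, 0, 1, 1, 0, 0],
    "gre": [700, 600, 500, 400, 720, 580, 520, 380],
    "gpa": [3.9, 3.5, 3.1, 2.7, 3.7, 3.6, 3.0, 2.9],
    "sch_rank": [1, 2, 3, 4, 1, 2, 3, 4],
})
dummies = pd.get_dummies(toy["sch_rank"], prefix="sch_rank", dtype=float)

# All four dummies plus an intercept: the dummies sum to the intercept column
X_trap = toy[["gre", "gpa"]].join(dummies)
X_trap["intercept"] = 1.0

# Leave sch_rank_4 out as the reference category
X_ok = toy[["gre", "gpa"]].join(dummies.loc[:, :"sch_rank_3"])
X_ok["intercept"] = 1.0

report_trap = check_design(X_trap)
report_ok = check_design(X_ok)
print(report_trap)
print(report_ok)
```

If `suspicious_dtypes` is non-empty, casting those columns with `astype(float)` (or passing `dtype=float` to `get_dummies`) is one way out; if `full_rank` is False, drop one dummy per categorical variable.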

Could your dataset be incomplete? The ones I've seen other people use have 400 rows, but yours only has 14.
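The sample-size point can be checked directly. With only 14 rows spread over four school ranks, it is easy for one rank to contain only admitted or only rejected applicants. That rank's dummy then predicts the outcome perfectly (perfect separation), the logit maximum-likelihood estimate has no finite solution, and statsmodels may raise an error, warn about separation, or fail to converge. Here is a minimal sketch with a hypothetical helper, `separating_categories`, that uses `pd.crosstab` on a made-up 14-row example (not the poster's file) to list such ranks.

```python
import pandas as pd


def separating_categories(df: pd.DataFrame, outcome: str, category: str) -> list:
    """Hypothetical helper: categories whose rows all share one outcome value.

    A dummy for such a category perfectly predicts the outcome, so the
    logit maximum-likelihood estimate has no finite solution.
    """
    table = pd.crosstab(df[category], df[outcome])
    if table.shape[1] < 2:
        # The outcome never varies at all, so every category is affected
        return sorted(table.index.tolist())
    # A zero count in any outcome column means that category has a single outcome
    mask = (table == 0).any(axis=1)
    return sorted(mask[mask].index.tolist())


# Made-up 14-row example (NOT the poster's data)
small = pd.DataFrame({
    "admit":    [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "sch_rank": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
})

print(pd.crosstab(small["sch_rank"], small["admit"]))
flagged = separating_categories(small, "admit", "sch_rank")
print("Ranks with only one outcome:", flagged)
```

Here rank 1 is all admits and rank 4 is all rejections, so both dummies separate the outcome. Separately, 14 rows for 6 coefficients (gre, gpa, three rank dummies, intercept) leaves barely two observations per coefficient, which is another reason the 400-row dataset from the tutorial behaves much better.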