Python raises MemoryError when running my code, please help

Problem description and background

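I am using Jupyter to work through an analysis and modeling exercise on New York taxi trip data. At the clustering step, running the code below raises a MemoryError.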

Relevant code
kmeans = KMeans(n_clusters=15, random_state=2, n_init = 10).fit(loc_df)
loc_df['label'] = kmeans.labels_

loc_df = loc_df.sample(200000)
plt.figure(figsize = (10,10))
for label in loc_df.label.unique():
    plt.plot(loc_df.longitude[loc_df.label == label],loc_df.latitude[loc_df.label == label],'.',alpha = 0.3, markersize = 0.3)
    
plt.title('NewYork Clusters')
plt.show()

Output and error message
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-16-06a5f4870f57> in <module>()
----> 1 kmeans = KMeans(n_clusters=15, random_state=2, n_init = 10).fit(loc_df)
      2 loc_df['label'] = kmeans.labels_
      3 
      4 loc_df = loc_df.sample(200000)
      5 plt.figure(figsize = (10,10))

D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in fit(self, X, y)
    894                 tol=self.tol, random_state=random_state, copy_x=self.copy_x,
    895                 n_jobs=self.n_jobs, algorithm=self.algorithm,
--> 896                 return_n_iter=True)
    897         return self
    898 

D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter)
    344                 X, n_clusters, max_iter=max_iter, init=init, verbose=verbose,
    345                 precompute_distances=precompute_distances, tol=tol,
--> 346                 x_squared_norms=x_squared_norms, random_state=random_state)
    347             # determine if these results are the best so far
    348             if best_inertia is None or inertia < best_inertia:

D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in _kmeans_single_elkan(X, n_clusters, max_iter, init, verbose, x_squared_norms, random_state, tol, precompute_distances)
    398         print('Initialization complete')
    399     centers, labels, n_iter = k_means_elkan(X, n_clusters, centers, tol=tol,
--> 400                                             max_iter=max_iter, verbose=verbose)
    401     inertia = np.sum((X - centers[labels]) ** 2, dtype=np.float64)
    402     return labels, inertia, centers, n_iter

sklearn\cluster\_k_means_elkan.pyx in sklearn.cluster._k_means_elkan.k_means_elkan()

MemoryError: 

How can I fix this?

The file is read with:

df = pd.read_csv('yellow_tripdata_2012-01.csv')

A MemoryError generally means that the program needed more memory at runtime than was available to it.
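
One way to see how much memory the data itself takes is to measure the DataFrame, and to load only the coordinate columns as 32-bit floats. This is a minimal sketch, not a confirmed fix: the column names pickup_longitude and pickup_latitude are assumptions (the post does not show the columns), and a tiny in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for 'yellow_tripdata_2012-01.csv'; the column names are assumptions.
csv_text = (
    "vendor_id,pickup_longitude,pickup_latitude,fare_amount\n"
    "CMT,-73.98,40.75,6.5\n"
    "VTS,-73.95,40.78,8.1\n"
    "CMT,-74.00,40.72,5.3\n"
)

coord_cols = ["pickup_longitude", "pickup_latitude"]  # assumed names

# Load every column with pandas' default dtypes (float64, object).
full_df = pd.read_csv(io.StringIO(csv_text))

# Load only the two coordinate columns, stored as 32-bit floats.
slim_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=coord_cols,
    dtype={c: "float32" for c in coord_cols},
)

# deep=True also counts the bytes held by string (object) columns.
full_bytes = int(full_df.memory_usage(deep=True).sum())
slim_bytes = int(slim_df.memory_usage(deep=True).sum())
print(full_bytes, slim_bytes)
```

On the real file, the same usecols and dtype arguments would be passed to pd.read_csv with the file path in place of the StringIO object.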

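About loc_df = loc_df.sample(200000): clustering 200,000 rows still needs a fair amount of memory, so try a smaller sample. Note also that in the posted code this line runs after .fit(loc_df), and the traceback points at the fit call, so KMeans is being fit on the full DataFrame. The sampling has to come before the fit for it to reduce memory use.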
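
The suggestion above could look roughly like the following: sample first, then cluster only the sample. This is a sketch under assumptions, not the original notebook: the random coordinates stand in for loc_df, and the n_sample value is arbitrary. It also swaps in scikit-learn's MiniBatchKMeans, which updates the centers from small batches of rows, as a lighter alternative to KMeans.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for loc_df: random points roughly around New York.
rng = np.random.default_rng(2)
loc_df = pd.DataFrame({
    "longitude": rng.normal(-73.97, 0.05, size=50_000),
    "latitude": rng.normal(40.75, 0.05, size=50_000),
})

# Sample BEFORE clustering so that fit() only sees the reduced data.
n_sample = 20_000  # assumed value; lower it if memory is still tight
sample_df = loc_df.sample(n_sample, random_state=2)

# MiniBatchKMeans updates the centers from small batches of rows.
kmeans = MiniBatchKMeans(n_clusters=15, random_state=2, n_init=3, batch_size=1024)
kmeans.fit(sample_df[["longitude", "latitude"]])

# labels_ has one entry per sampled row, in the same order as sample_df.
sample_df = sample_df.assign(label=kmeans.labels_)
print(sample_df["label"].nunique())
```

The plotting loop from the question can then run on sample_df unchanged, since it already carries the longitude, latitude and label columns.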