text_corpus = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system"
]
stop_list = set('for a of the and to in'.split(' '))
texts = [[word for word in document.lower().split() if word not in stop_list] for document in text_corpus]
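My question is mainly about the last line. I think `for document in text_corpus` runs first, meaning it takes one string (document) out of text_corpus at a time and hands that string to the expression in front of it. What I don't understand is how that long expression in front gets evaluated.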
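Once it has a document, it first evaluates document.lower().split(), which lowercases the string and splits it into words.
It then loops over the elements of document.lower().split(): a word that is not in stop_list is added to the new list, and a word that is in stop_list is skipped.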
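To make that step concrete, here is a minimal sketch that runs only the inner part on a single document taken from the question. The names `words` and `kept` are made up for illustration and do not appear in the original code.

```python
stop_list = set('for a of the and to in'.split(' '))
document = "The EPS user interface management system"

# Step 1: lowercase the string and split it on whitespace.
words = document.lower().split()

# Step 2: keep only the words that are not in stop_list.
kept = [word for word in words if word not in stop_list]

print(words)  # all words, including 'the'
print(kept)   # the same words with stop words removed
```

The outer `for document in text_corpus` simply repeats these two steps once per string and collects each resulting list.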
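[x for x in a] is a list comprehension.
What you have here is a nested list comprehension, which behaves like a double for loop: the outer comprehension runs once per document, and the inner one builds a new list for that document.
It can be written as the following equivalent double for loop: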
a = []
for document in text_corpus:
    b = []
    a.append(b)
    for word in document.lower().split():
        if word not in stop_list:
            b.append(word)
texts = []
for document in text_corpus:
    word_list = document.lower().split()
    tmp_texts = []
    for word in word_list:
        if word not in stop_list:
            tmp_texts.append(word)
    texts.append(tmp_texts)
text_corpus = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system"
]
stop_list = set('for a of the and to in'.split(' '))
texts = [[word for word in document.lower().split() if word not in stop_list] for document in text_corpus]
# The code on line 7 is equivalent to the for loop below
texts1 = []
for document in text_corpus:
    line = []
    for word in document.lower().split():
        if word not in stop_list:
            line.append(word)
    texts1.append(line)
print(texts == texts1)
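
Another way to read the nested comprehension, offered only as a sketch: move the inner comprehension into a helper function, so the outer comprehension becomes "call the helper once for each document". The function name `remove_stop_words` and the variable `texts2` are hypothetical and not part of the original code.

```python
text_corpus = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system"
]
stop_list = set('for a of the and to in'.split(' '))


def remove_stop_words(document):
    # Hypothetical helper: this is exactly the inner comprehension.
    return [word for word in document.lower().split() if word not in stop_list]


# The outer comprehension: one call, and one resulting list, per document.
texts2 = [remove_stop_words(document) for document in text_corpus]

# The original nested form, for comparison.
texts = [[word for word in document.lower().split() if word not in stop_list] for document in text_corpus]
print(texts == texts2)
```

Both forms build a list of lists with one inner list per string in text_corpus, so the comparison should print True.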