I don't know what the structure of df looks like, but remove the str() from sentences.append(str(sent_tokenize(s))) — wrapping the call in str() turns the list of sentences into its string representation, so the later list comprehensions iterate over characters instead of sentences. Also, it's best not to reuse variable names. The following example code runs correctly:
from nltk.tokenize import sent_tokenize, word_tokenize

text = ["He is so lucky. he won a lottery.", "It's a good news! Have a nice day!"]

# Collect the sentence list for each input string; note there is no
# str() wrapper, so each element stays a list of sentences.
sents = []
for s in text:
    sents.append(sent_tokenize(s))

# Flatten the nested lists and word-tokenize each sentence.
words = [word_tokenize(x) for y in sents for x in y]
print(words)

# Flatten into a single list of sentences.
sentences = [x for y in sents for x in y]
print(sentences)
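
Since the structure of df is unknown, here is a minimal sketch of how the same pattern could apply if df is a pandas DataFrame with a string column — the column name "text" and the sample rows are assumptions, not from the original question:

import pandas as pd
from nltk.tokenize import sent_tokenize  # requires nltk.download('punkt') once

# Hypothetical df; the real column name and contents are unknown.
df = pd.DataFrame({"text": ["He is so lucky. he won a lottery.",
                            "It's a good news! Have a nice day!"]})

sents = []
for s in df["text"]:
    sents.append(sent_tokenize(s))  # list of sentences per row, no str() wrapper

# Flatten into one list of sentences across all rows.
sentences = [x for y in sents for x in y]
print(sentences)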