Tokenization mismatch when rebuilding a meta-learning dataset with spaCy

Each sample in the Amazon dataset from the paper "Few-Shot Text Classification with Distributional Signatures" consists of the fields text, raw, and label, where text is the tokenization of raw, stored as a list. Here is the text field of one sample:

list1 = ['i', 'was', 'pleasantly', 'surprised', 'with', 'this', '"', 'out', 'of', 'the', 'box', '"', 'series', '.', ' ', 'good', 'writing', ',', 'good', 'acting', ',', 'laugh', 'out', 'loud', 'situations', '.', ' ', 'devito', 'showing', 'up', 'in', 'the', 'second', 'season', 'gave', 'it', 'a', 'little', 'boost', 'as', 'he', "'s", 'always', 'dependable', 'for', 'turning', 'the', 'mundane', 'into', 'the', 'hilarious', '.', 'it', "'s", 'basically', 'about', '3', 'jackass', 'friends', 'in', 'philly', 'who', 'own', 'a', 'bar', 'and', 'get', 'themselves', 'into', 'offbeat', 'situations', '.', ' ', 'what', 'i', 'liked', 'best', 'is', 'that', 'it', 'is', 'not', 'the', 'clice', 'venue', 'for', 'the', 'young', 'and', 'the', 'beautiful', '.', ' ', 'it', 'often', 'hi', '-', 'lightes', 'the', 'old', 'and', 'the', 'ugly', 'and', 'in', 'doing', 'so', 'cultivates', 'a', 'good', 'portion', 'of', 'the', 'laughs', '.', 'worth', 'you', 'time', 'and', 'money', '....', 'bg']
The paper does not say which tokenizer was used. Here is what I get when I tokenize raw with spaCy's en_core_web_sm:
list2 = ['i', 'was', 'pleasantly', 'surprised', 'with', 'this', '"', 'out', 'of', 'the', 'box', '"', 'series', '.', ' ', 'good', 'writing', ',', 'good', 'acting', ',', 'laugh', 'out', 'loud', 'situations', '.', ' ', 'devito', 'showing', 'up', 'in', 'the', 'second', 'season', 'gave', 'it', 'a', 'little', 'boost', 'as', 'he', "'s", 'always', 'dependable', 'for', 'turning', 'the', 'mundane', 'into', 'the', 'hilarious.it', "'s", 'basically', 'about', '3', 'jackass', 'friends', 'in', 'philly', 'who', 'own', 'a', 'bar', 'and', 'get', 'themselves', 'into', 'offbeat', 'situations', '.', ' ', 'what', 'i', 'liked', 'best', 'is', 'that', 'it', 'is', 'not', 'the', 'clice', 'venue', 'for', 'the', 'young', 'and', 'the', 'beautiful', '.', ' ', 'it', 'often', 'hi', '-', 'lightes', 'the', 'old', 'and', 'the', 'ugly', 'and', 'in', 'doing', 'so', 'cultivates', 'a', 'good', 'portion', 'of', 'the', 'laughs.worth', 'you', 'time', 'and', 'money', '....', 'bg']

Every mismatched token is a word with an embedded '.', such as hilarious.it or c.g.i.
I want to split these tokens manually in the spaCy output, but a naive split also affects tokens that consist only of '.' (like '....'), and I haven't found a clean way to separate the two cases.
Alternatively, is there a more suitable tokenization method that reproduces the text field directly?
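One possible workaround (a sketch of my own, not what the paper did; the helper name insert_missing_spaces is hypothetical) is to repair the raw string before tokenizing: insert a space after any '.' that runs directly into a letter, so spaCy sees a normal sentence boundary. Note this would also break up genuine dotted abbreviations such as c.g.i, so it may not match the reference tokenization in every case.

```python
import re

def insert_missing_spaces(text):
    # Add a space after a '.' that is immediately followed by a letter,
    # e.g. "hilarious.It" -> "hilarious. It". In a run of dots like
    # "money....bg" only the final dot precedes a letter, so the
    # ellipsis itself stays intact.
    return re.sub(r'\.(?=[A-Za-z])', '. ', text)
```

After this fix, spaCy tokenizes "hilarious. it" into separate tokens instead of producing the glued token "hilarious.it".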

It turned out the root cause is missing whitespace: where one sentence runs straight into the next with no space after the period, the two words get glued into a single token. The elif branch in the code below handles exactly this case and solved the problem.


import re
import spacy

nlp = spacy.load("en_core_web_sm")
str1 = "I was pleasantly surprised with this \"out of the box\" series.  Good writing, good acting, laugh out loud situations.  Devito showing up in the second season gave it a little boost as he's always dependable for turning the mundane into the hilarious.It's basically about 3 jackass friends in Philly who own a bar and get themselves into offbeat situations.  What I liked best is that it is not the clice venue for the young and the beautiful.  It often hi-lightes the old and the ugly and in doing so cultivates a good portion of the laughs.Worth you time and money....bg"

list1 = []
doc = nlp(str1.lower())
for token in doc:
    if token.text == '"':
        # keep double quotes unchanged
        list1.append('"')
    elif '.' in token.text and token.text.count('.') != len(token.text):
        # A token like "hilarious.it" means two sentences were glued
        # together without a space; split it into words and periods.
        # Tokens made entirely of '.' (e.g. "....") fail the count check
        # and fall through to the else branch untouched.
        for x in re.findall(r'\w+|\.', token.text):
            list1.append(x)
    else:
        list1.append(token.text)
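The splitting rule in the elif can also be pulled out into a standalone helper (the name split_glued is my own) to check its behavior on the problem tokens without loading a spaCy model:

```python
import re

def split_glued(token_text):
    # Same rule as the elif branch above: split tokens that mix '.'
    # with other characters; leave pure-dot tokens (e.g. "....") and
    # dot-free tokens alone.
    if '.' in token_text and token_text.count('.') != len(token_text):
        return re.findall(r'\w+|\.', token_text)
    return [token_text]
```

For example, split_glued('hilarious.it') gives ['hilarious', '.', 'it'], while split_glued('....') returns ['....'] unchanged.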