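First, a question about the dataset files. I have many datasets; here are just a few rows as an example. Why do two CSV files, both exported from Excel and both on drive E:, come out differently? One is E:\data.csv: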
citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,,,,,,,,,,,,,,,,,,,,,,
tropical fruit,yogurt,coffee,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
whole milk,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
The other is E:\OneDrive\桌面\data.csv:
citrus fruit,semi-finished bread,margarine,ready soups
tropical fruit,yogurt,coffee
whole milk
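The two files differ only in the trailing commas. One plausible cause (not verified against these particular files) is that Excel writes a field for every column in the sheet's used range. If that range is wider than a row, for instance because a cell further to the right once held data or formatting, the shorter rows get padded with empty fields. The padding matters: `line.split(",")` turns each trailing comma into an empty string `''`, which then becomes an "item" that appears in every transaction. Below is a minimal sketch that drops empty fields so both layouts parse to the same rows. `parse_rows` is a hypothetical helper name, and the sample strings are shortened versions of the rows shown above:

```python
import csv
import io

def parse_rows(lines):
    """Parse CSV lines into lists of items, skipping empty fields
    (such as the trailing commas in the first file) and blank lines."""
    rows = []
    for fields in csv.reader(lines):
        items = [f for f in fields if f != ""]
        if items:
            rows.append(items)
    return rows

padded = ("citrus fruit,semi-finished bread,margarine,ready soups,,,,\n"
          "tropical fruit,yogurt,coffee,,,,,\n"
          "whole milk,,,,,,,\n")
plain = ("citrus fruit,semi-finished bread,margarine,ready soups\n"
         "tropical fruit,yogurt,coffee\n"
         "whole milk\n")

print("whole milk,,,".split(","))  # ['whole milk', '', '', '']
print(parse_rows(io.StringIO(padded)) == parse_rows(io.StringIO(plain)))  # True
```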
Why does this happen? Next, I ran an experiment using the second path.
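def load_data(path):
    """Read the CSV file into a list of transactions (one list of item strings per line)."""
    result = []
    with open(path) as f:
        for line in f:
            line = line.strip('\n')
            result.append(line.split(","))
    return result

dataset = load_data(r"E:\OneDrive\文档\data.csv")  # raw string: backslashes are not treated as escapes
print(len(dataset))  # number of transactions
for i in range(10):
    print(i + 1, dataset[i], sep="->")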
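Two details in `load_data` are worth checking. The literal `"E:\OneDrive\文档\data.csv"` only works because `\O` and `\文` are not escape sequences: Python keeps them as-is, but newer versions warn about them, and a folder named like `\new` or `\temp` would silently turn into a newline or a tab. Also, `open()` without `encoding` uses the system's locale encoding (typically GBK on Chinese Windows). The sketch below assumes the file was saved with Excel's "CSV UTF-8" option, whose optional BOM `utf-8-sig` strips; the `encoding` parameter on `load_data` is a hypothetical addition for illustration:

```python
import csv
from pathlib import PureWindowsPath

# In a normal string literal "\n" is one newline character, not "\" + "n".
bad = "E:\new.csv"
good = r"E:\new.csv"  # raw string: the backslash is kept
print(len(bad), len(good))         # 9 10
print(PureWindowsPath(good).name)  # new.csv

def load_data(path, encoding="utf-8-sig"):
    """Read a transaction CSV into a list of rows (lists of strings).

    encoding="utf-8-sig" is an assumption: it reads UTF-8 with or without
    a BOM; a file saved as plain "CSV" on Chinese Windows may need "gbk".
    """
    with open(path, encoding=encoding, newline="") as f:
        return [row for row in csv.reader(f)]

# dataset = load_data(r"E:\OneDrive\文档\data.csv")
```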
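import itertools

# Collect every distinct item across all transactions
items = set(itertools.chain(*dataset))

# Two lookup tables: item string -> integer id, and integer id -> item string
str_to_index = {}
index_to_str = {}
for index, item in enumerate(items):
    str_to_index[item] = index
    index_to_str[index] = item
# Show the first 10 entries ("字符串到编号" = string -> id, "编号到字符串" = id -> string)
print("字符串到编号:", list(str_to_index.items())[:10])
print("编号到字符串:", list(index_to_str.items())[:10])

# Replace every item string in the dataset with its integer id (in place)
for i in range(len(dataset)):
    for j in range(len(dataset[i])):
        dataset[i][j] = str_to_index[dataset[i][j]]
for i in range(10):
    print(i + 1, dataset[i], sep="->")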
Up to this point everything runs without problems.
Output:
32
1->[' shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes ']
2->[' chutney', 'chicken', 'energy drink ']
3->[' turkey', ' avocado', 'green tee ']
4->[' mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea ']
5->[' low fat yogurt', 'yams ']
6->[' whole wheat pasta', 'french fries ']
7->[' soup', 'light cream', 'shallot ']
8->[' frozen vegetables', 'spaghetti', 'green tea ']
9->[' french fries', 'cottage cheese ']
10->[' eggs', 'pet food', 'salmon', 'toothpaste ']
字符串到编号: [('cooking oil', 0), ('low fat yogurt ', 1), ('pancakes', 2), ('eggs ', 3), ('almonds', 4), ('honey', 5), ('eggs', 6), ('protein bar ', 7), ('shampoo ', 8), ('yams ', 9)]
编号到字符串: [(0, 'cooking oil'), (1, 'low fat yogurt '), (2, 'pancakes'), (3, 'eggs '), (4, 'almonds'), (5, 'honey'), (6, 'eggs'), (7, 'protein bar '), (8, 'shampoo '), (9, 'yams ')]
1->[48, 4, 25, 56, 31]
2->[53, 18, 24]
3->[44, 33, 70]
4->[15, 78, 68, 75, 71]
5->[61, 9]
6->[32, 64]
7->[11, 69, 20]
8->[59, 17, 71]
9->[67, 27]
10->[55, 46, 37, 38]
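The output above shows another data problem: items carry leading or trailing spaces, so `'eggs '` (id 3) and `'eggs'` (id 6) are treated as two different items and their support is split between them. In addition, iterating a `set` of strings can give a different order on each run, because string hashing is randomized per process by default, so the ids themselves change between runs. Below is a sketch that strips each item and sorts before numbering; `encode` is a hypothetical helper name and the two rows are made up:

```python
import itertools

def encode(dataset):
    """Strip whitespace from every item, drop empty fields, and map
    items to integer ids that stay the same from run to run."""
    cleaned = [[item.strip() for item in row if item.strip()] for row in dataset]
    # sorted() fixes the order; plain set order of strings can vary per run
    items = sorted(set(itertools.chain.from_iterable(cleaned)))
    str_to_index = {item: i for i, item in enumerate(items)}
    index_to_str = {i: item for item, i in str_to_index.items()}
    encoded = [[str_to_index[item] for item in row] for row in cleaned]
    return encoded, str_to_index, index_to_str

rows = [[" shrimp", "eggs "], [" eggs", "shrimp"]]
encoded, s2i, i2s = encode(rows)
print(s2i)      # {'eggs': 0, 'shrimp': 1}
print(encoded)  # [[1, 0], [0, 1]]
```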
Next, I flattened the data and wrapped each item in a frozenset:
def buildC1(dataset):
    # Candidate 1-itemsets: one frozenset per distinct item
    item1 = set(itertools.chain(*dataset))
    return [frozenset([i]) for i in item1]

c1 = buildC1(dataset)
c1
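But this line produced no output of its own: what appeared was the same as the output above, not the frozenset(...) list that the Bilibili uploader got. There was no error either, so I don't know what went wrong.
After that, I typed in some more code following along: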
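The traceback further down shows the code runs as a script (`D:\test\py1.py`). In a `.py` script, a bare expression such as `c1` is evaluated and then discarded; only the interactive interpreter and Jupyter notebooks echo the value of the last expression, which is presumably how the uploader's output appeared. The repeated output is just the earlier `print` calls running again. Here is a sketch with an explicit `print`, using a small made-up dataset in place of the encoded one:

```python
import itertools

def buildC1(dataset):
    """Candidate 1-itemsets: one frozenset per distinct item."""
    item1 = set(itertools.chain(*dataset))
    return [frozenset([i]) for i in item1]

dataset = [[0, 1, 2], [1, 3], [0, 3]]  # made-up stand-in for the encoded data
c1 = buildC1(dataset)
c1                            # shows nothing when run as a .py script
print(sorted(c1, key=min))    # [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3})]
```

Sorting by each frozenset's single element is only there to make the printed order predictable.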
def ck_to_lk(dataset, ck, min_support):
    # Count how many transactions contain each candidate itemset
    support = {}
    for row in dataset:
        for item in ck:
            if item.issudset(row):
                support[item] = support.get(item, 0) + 1
    total = len(dataset)
    # Keep the candidates whose relative support reaches min_support
    return {k: v / total for k, v in support.items() if v / total >= min_support}

l1 = ck_to_lk(dataset, c1, 0.06)
l1
def lk_to_(lk_list):
    ck : set()
    lk_size = len(lk_list)
    if lk_size > 1:
        k = len(lk_list[0])
        # Merge every pair of frequent k-itemsets; keep unions of size k + 1
        for i, j in itertools.combinations(range(lk_size), 2):
            t = lk_list[i] | lk_list[j]
            if len(t) == k + 1:
                ck.add(t)
    return ck

c2 = lk_to_ck(list(l1.keys()))
c2
This raised an error:
Traceback (most recent call last):
File "D:\test\py1.py", line 51, in <module>
l1 = ck_to_lk(dataset,c1,0.06)
File "D:\test\py1.py", line 46, in ck_to_lk
if item.issudset(row):
AttributeError: 'frozenset' object has no attribute 'issudset'
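The `AttributeError` comes from a typo: the method is `issubset`, not `issudset`. `frozenset.issubset` accepts any iterable, so passing the list `row` is fine. Below is the corrected function, run on a small made-up dataset:

```python
def ck_to_lk(dataset, ck, min_support):
    """Return {itemset: support} for the candidates in ck whose support,
    the fraction of rows containing the itemset, is at least min_support."""
    support = {}
    for row in dataset:
        for item in ck:
            # frozenset.issubset accepts any iterable, so row can stay a list
            if item.issubset(row):
                support[item] = support.get(item, 0) + 1
    total = len(dataset)
    return {k: v / total for k, v in support.items() if v / total >= min_support}

dataset = [[0, 1, 2], [1, 3], [0, 3], [1, 2]]  # made-up encoded rows
c1 = [frozenset([i]) for i in range(4)]
print(ck_to_lk(dataset, c1, 0.6))  # {frozenset({1}): 0.75}
```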
The line below is also flagged as an error; the IDE reports an unresolved reference:
c2 = lk_to_ck(list(l1.keys()))
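There are two problems in the second function. First, it is defined as `lk_to_` but called as `lk_to_ck`, hence the unresolved reference. Second, `ck : set()` is a variable annotation, not an assignment, so `ck` is never bound and the first `ck.add(t)` or `return ck` would raise an `UnboundLocalError`. Below is a corrected sketch that keeps the original logic, with made-up input:

```python
import itertools

def lk_to_ck(lk_list):
    """Join frequent k-itemsets pairwise and keep the unions of size k + 1
    as candidate (k + 1)-itemsets."""
    ck = set()  # an assignment; "ck : set()" would only annotate the name
    lk_size = len(lk_list)
    if lk_size > 1:
        k = len(lk_list[0])
        for i, j in itertools.combinations(range(lk_size), 2):
            t = lk_list[i] | lk_list[j]
            if len(t) == k + 1:
                ck.add(t)
    return ck

l1 = {frozenset({0}): 0.5, frozenset({1}): 0.75, frozenset({2}): 0.5}
c2 = lk_to_ck(list(l1.keys()))
print(sorted(sorted(s) for s in c2))  # [[0, 1], [0, 2], [1, 2]]
```

The standard Apriori algorithm also prunes any (k + 1)-candidate that has an infrequent k-subset; this version, like the original, skips that step.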
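I wrote all of this while following the uploader's explanation, but I can't figure out where the problems are. Any answer would be much appreciated, thank you!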