How are confidence and support computed in association analysis? (Tag: data analysis)

Association analysis in data analysis
How do I compute the support and confidence from the data?

df_Frequent_Itemsets = apriori(df_ShoppingCarts_sets, min_support=0.07, use_colnames=True)


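This answer quotes ChatGPT.
Confidence is the probability that itemset B also appears in a transaction, given that itemset A appears in it. The formula is: Confidence(A → B) = Support(A & B) / Support(A)

Support is the probability that itemset A appears in a transaction. The formula is: Support(A) = number of transactions containing A / total number of transactions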
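To make the two formulas concrete, here is a minimal sketch in plain Python. The five-basket data set and the helper names support and confidence are made up for illustration; they are not part of the question's data or of mlxtend.

```python
# Toy transactions (hypothetical data, for illustration only)
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A & B) / Support(A): share of A-transactions that also contain B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"milk"}, transactions))                # milk is in 3 of 5 baskets
print(support({"milk", "bread"}, transactions))       # both are in 2 of 5 baskets
print(confidence({"milk"}, {"bread"}, transactions))  # 2 of the 3 milk baskets
```

Note that confidence is not symmetric: bread appears in four baskets, so Confidence(bread → milk) is 2/4 while Confidence(milk → bread) is 2/3.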

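Your code runs the Apriori algorithm on df_ShoppingCarts_sets with the minimum support threshold set to 0.07. The resulting df_Frequent_Itemsets already contains the support of each frequent itemset. Confidence belongs to a rule A → B rather than to a single itemset, so it has to be derived from those itemsets, for example with mlxtend's association_rules function.
The code is as follows: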

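from mlxtend.frequent_patterns import apriori, association_rules

# Run the Apriori algorithm; the result has 'support' and 'itemsets' columns
df_Frequent_Itemsets = apriori(df_ShoppingCarts_sets, min_support=0.07, use_colnames=True)

# Derive rules A -> B from the frequent itemsets. Each row carries
# 'antecedents', 'consequents', 'support' and 'confidence'.
# min_threshold=0.0 keeps every rule; raise it to filter by confidence.
# Some mlxtend releases also require a num_itemsets argument here;
# check the documentation for your installed version.
df_Rules = association_rules(df_Frequent_Itemsets, metric="confidence", min_threshold=0.0)

# Print the results
print(df_Rules[['antecedents', 'consequents', 'support', 'confidence']])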

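Confidence is computed from frequent itemsets, so make sure you have already run Apriori (or another frequent-itemset algorithm) and obtained them.

The following sample code shows one way to compute support and confidence: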


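from itertools import combinations
from mlxtend.frequent_patterns import apriori

# Frequent itemsets; df is a one-hot encoded (boolean) transaction DataFrame
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Support: the 'support' column is already a fraction of all transactions,
# so it must not be divided by the transaction count again
support_of = dict(zip(frequent_itemsets['itemsets'], frequent_itemsets['support']))

# Confidence: split each frequent itemset into antecedent -> consequent
rules = []
for itemset, support in support_of.items():
    for size in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, size)):
            # Every subset of a frequent itemset is itself frequent,
            # so its support is always present in the lookup
            confidence = support / support_of[antecedent]
            rules.append((antecedent, itemset - antecedent, support, confidence))

# Print the results
for antecedent, consequent, support, confidence in rules:
    print(set(antecedent), '->', set(consequent), support, confidence)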

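This uses the pandas library to handle the data and assumes your one-hot encoded transaction data is stored in a DataFrame named df. You can change the value of min_support to adjust the support threshold.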
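As a rough illustration of how the threshold changes the result, the sketch below brute-forces every itemset of a small made-up basket list and keeps those at or above min_support. It does not use mlxtend; the data and the function name frequent_itemsets_bruteforce are hypothetical, and a real Apriori implementation prunes candidates instead of enumerating them all.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only)
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread"},
]


def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


# A higher threshold keeps fewer itemsets
for threshold in (0.3, 0.6):
    found = frequent_itemsets_bruteforce(transactions, threshold)
    print(threshold, len(found), sorted(sorted(s) for s in found))
```

With this data, a threshold of 0.3 keeps the three common single items and the three pairs built from them, while 0.6 keeps only the single items.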


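Association analysis is a data mining technique used mainly to discover hidden relationships and patterns in data. Support and confidence are its two key measures.

Support: how frequently an itemset occurs, that is, the number of transactions containing the itemset divided by the total number of transactions.

Confidence: how often itemset B appears in the transactions that contain itemset A, that is, the frequency with which A and B occur together divided by the frequency with which A occurs.

The code above uses the apriori algorithm to find frequent itemsets. The parameter min_support=0.07 keeps only itemsets whose support is at least 0.07. Setting use_colnames=True reports each itemset by its column names instead of by column indices.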
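To picture what use_colnames changes, the sketch below one-hot encodes a few made-up baskets by hand and writes the same itemset once as column indices and once as column names. It only mimics the table layout that apriori takes as input and is not mlxtend's implementation; all names are hypothetical.

```python
# Toy transactions (hypothetical data, for illustration only)
transactions = [
    ["milk", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "diapers"],
]

# One column per distinct item, one row of True/False flags per transaction
columns = sorted({item for t in transactions for item in t})
one_hot = [[item in t for item in columns] for t in transactions]

print(columns)
for row in one_hot:
    print(row)

# The same itemset identified two ways
by_index = frozenset({1, 3})                        # column positions
by_name = frozenset(columns[i] for i in by_index)   # column labels
print(sorted(by_index), "->", sorted(by_name))
```

Names are usually easier to read in the final output, which is why the question's call passes use_colnames=True.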

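In data analysis, association analysis is how the support and confidence in the data get computed. The usual tool is the Apriori algorithm, which finds frequent itemsets level by level: it starts from single items and only extends itemsets that already meet the minimum support, so infrequent combinations are discarded early. The steps are: load the transaction data, choose the minimum support and minimum confidence, run the apriori function to get the frequent itemsets, and finally derive the association rules together with their support and confidence.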
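The steps above can be sketched end to end in plain Python. This is a simplified teaching version under stated assumptions, not mlxtend's implementation: the baskets, the thresholds and the function names apriori_sketch and rules_sketch are all hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise search: return {frozenset: support} of frequent itemsets."""
    n = len(transactions)
    frequent = {}
    # Level 1 candidates: every single item
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates:
        # Count each candidate and keep those meeting the threshold
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset
        size += 1
        candidates = {
            a | b
            for a in level for b in level
            if len(a | b) == size
            and all(frozenset(sub) in level for sub in combinations(a | b, size - 1))
        }
    return frequent


def rules_sketch(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, conf))
    return rules


# Step 1: load the data (hypothetical baskets)
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread"},
]
# Step 2: choose the thresholds
min_support, min_confidence = 0.4, 0.6
# Step 3: find the frequent itemsets
frequent = apriori_sketch(transactions, min_support)
# Step 4: derive the rules with their support and confidence
rules = rules_sketch(frequent, min_confidence)
for antecedent, consequent, s, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), s, round(conf, 3))
```

For real data, mlxtend's apriori followed by association_rules does the same job and will be much faster than this sketch.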