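What I ultimately want: add significance markers to the plot produced below. The statistical method is a repeated-measures ANOVA; after the pairwise comparisons, a p-value greater than 0.05 should be marked ns, less than 0.05 as *, less than 0.01 as **, and less than 0.001 as ***.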
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
data = pd.read_csv(r'D:\pythonprogram\data\emojidata\data01.csv',encoding='gbk')
df = pd.DataFrame({
'Positive emoji':(data['zza1'].mean(),data['zzha1'].mean(),data['zfa1'].mean()),
'Negative emoji':(data['fza1'].mean(),data['fzha1'].mean(),data['ffa1'].mean())
})
df_std = pd.DataFrame({
'Positive emoji':(data['zza1'].std(),data['zzha1'].std(),data['zfa1'].std()),
'Negative emoji':(data['fza1'].std(),data['fzha1'].std(),data['ffa1'].std())
})
ax = df.plot(kind = 'bar',color=['pink','lime'],width=0.5,hatch='',yerr=df_std*0.5,capsize=10,ecolor='grey')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.xticks([0,1,2],
['Positive','Neutral','Negative'],
fontproperties='stkaiti',
rotation = 0
)
plt.yticks(np.arange(0,1.1,0.1),fontproperties='stkaiti')
plt.ylabel('Correct rate',fontproperties='stkaiti',fontsize=14)
plt.xlabel('Sentence',fontproperties='stkaiti',fontsize=14)
font = fm.FontProperties(fname=r'C:\Windows\Fonts\AdobeSongStd-Light.otf')
plt.legend(prop=font,bbox_to_anchor=(1.12, 1.12))
# Save before plt.show(): once the figure window is closed, savefig may write an empty image.
plt.savefig('no1emoji.png', dpi=600, bbox_inches='tight')
plt.show()
This is the example from the statannotations author on GitHub. The code is as follows:
df = sns.load_dataset('diamonds')
df = df[df['color'].map(lambda x: x in 'EIJ')]
# Modifying data to yield unequal boxes in the hue value
df.loc[df['cut'] == 'Ideal', 'price'] = df.loc[df['cut'] == 'Ideal', 'price'].map(lambda x: min(x, 5000))
df.loc[df['cut'] == 'Premium', 'price'] = df.loc[df['cut'] == 'Premium', 'price'].map(lambda x: min(x, 7500))
df.loc[df['cut'] == 'Good', 'price'] = df.loc[df['cut'] == 'Good', 'price'].map(lambda x: min(x, 15000))
df.loc[df['cut'] == 'Very Good', 'price'] = df.loc[df['cut'] == 'Very Good', 'price'].map(lambda x: min(x, 3000))
df.head()
x = "color"
y = "price"
hue = "cut"
hue_order=['Ideal', 'Premium', 'Good', 'Very Good', 'Fair']
order = ["E", "I", "J"]
pairs=[
(("E", "Ideal"), ("E", "Very Good")),
(("E", "Ideal"), ("E", "Premium")),
(("E", "Ideal"), ("E", "Good")),
(("I", "Ideal"), ("I", "Premium")),
(("I", "Ideal"), ("I", "Good")),
(("J", "Ideal"), ("J", "Premium")),
(("J", "Ideal"), ("J", "Good")),
(("E", "Good"), ("I", "Ideal")),
(("I", "Premium"), ("J", "Ideal")),
]
ax = sns.boxplot(data=df, x=x, y=y, order=order, hue=hue, hue_order=hue_order)
annot.new_plot(ax, pairs, data=df, x=x, y=y, order=order, hue=hue, hue_order=hue_order)
annot.configure(test='Mann-Whitney', verbose=2)
annot.apply_test()
annot.annotate()
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
plt.savefig('example_hue_layout.png', dpi=300, bbox_inches='tight')
Traceback (most recent call last):
File "D:\python\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "D:\pycharm\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "D:\pycharm\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:\Users\20893\PycharmProjects\pythonProject\paperwrite\journal\230430.py", line 32, in <module>
annot = Annotator(ax, pairs, data=df, x=x, y=y, order=order)
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\Annotator.py", line 106, in __init__
self._plotter = self._get_plotter(engine, ax, pairs, plot, data,
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\Annotator.py", line 776, in _get_plotter
return engine_plotter(*args, **kwargs)
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\_Plotter.py", line 82, in __init__
_Plotter.__init__(self, ax, pairs, data, x, y, hue, order, hue_order,
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\_Plotter.py", line 28, in __init__
check_pairs_in_data(pairs, data, group_coord, hue, hue_order)
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\utils.py", line 130, in check_pairs_in_data
_check_pairs_in_data_no_hue(pairs, data, coord)
File "C:\Users\20893\AppData\Roaming\Python\Python310\site-packages\statannotations\utils.py", line 77, in _check_pairs_in_data_no_hue
raise ValueError(f"Missing x value(s) "
ValueError: Missing x value(s) `"('J', 'Ideal')", "('J', 'Good')", "('I', 'Good')", "('I', 'Ideal')", "('J', 'Premium')", "('E', 'Ideal')", "('E', 'Very Good')", "('E', 'Good')", "('I', 'Premium')", "('E', 'Premium')"` in color (specified in `pairs`) in data
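Regarding the "annot is not defined" problem that appears when adding significance markers with the statannotations library in Python, I can offer the following help: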
pip install statannotations
from statannotations.Annotator import Annotator
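# The pairs are (x, hue) tuples, so hue and hue_order must be passed as well;
# leaving them out triggers the "Missing x value(s)" ValueError shown in the traceback.
annotator = Annotator(ax, pairs, data=df, x=x, y=y, order=order, hue=hue, hue_order=hue_order)
annotator.configure(test='Mann-Whitney', text_format='star', line_height=0.03, line_width=1)
Here ax is the Axes object of the plot, pairs is the list of group pairs to compare, and data, x, y, order, hue, and hue_order describe the data source and the plot layout (they should match the arguments given to the plotting call). test selects the statistical test, text_format selects the style of the significance label, and line_height and line_width set the height and the stroke width of the bracket line.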
annotator.apply_and_annotate()
With this call, the Annotator class runs the configured test for each pair and draws the significance markers automatically.
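If the p-values have already been computed elsewhere (for example, by the post-hoc comparisons of the repeated-measures ANOVA), the brackets can also be drawn by hand with plain matplotlib. The following is only a minimal sketch under stated assumptions: the means, error bars, and labels are made up, and `add_bracket` is a hypothetical helper, not part of any library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def add_bracket(ax, x1, x2, y, label, height=0.02, color="black"):
    """Draw a bracket from x1 to x2 at height y and centre `label` above it."""
    ax.plot([x1, x1, x2, x2], [y, y + height, y + height, y],
            color=color, linewidth=1)
    return ax.text((x1 + x2) / 2, y + height, label, ha="center", va="bottom")


# Made-up means and error bars, for illustration only.
sentences = ["Positive", "Neutral", "Negative"]
pos_means = np.array([0.90, 0.80, 0.60])
neg_means = np.array([0.65, 0.75, 0.88])
errors = np.full(3, 0.05)

width = 0.25
x = np.arange(len(sentences))
fig, ax = plt.subplots()
ax.bar(x - width / 2, pos_means, width, yerr=errors, capsize=5,
       color="pink", label="Positive emoji")
ax.bar(x + width / 2, neg_means, width, yerr=errors, capsize=5,
       color="lime", label="Negative emoji")
ax.set_xticks(x)
ax.set_xticklabels(sentences)

# Hypothetical labels, as if they came from the pairwise comparisons.
labels = ["***", "ns", "**"]
tops = np.maximum(pos_means, neg_means) + errors  # top of the taller error bar
texts = []
for xi, top, label in zip(x, tops, labels):
    # One bracket per sentence type, spanning the two emoji bars.
    texts.append(add_bracket(ax, xi - width / 2, xi + width / 2, top + 0.03, label))

ax.set_ylim(0, 1.2)
ax.legend()
fig.canvas.draw()  # render; call fig.savefig(...) here to export the figure
```

The bracket is just a four-point line plus a centred text label, so its height and spacing can be tuned freely to the y-axis range of the real plot.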
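If the four steps above have all been checked and the "annot is not defined" error still occurs, the cause may be the library version or something else unknown. In that case, try updating statannotations or contact the library's maintainers.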
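One more thing that may matter here: the data, x, y, and hue arguments work on long-format data (one row per observation), whereas the CSV in the question appears to be wide (one column per condition). Below is a rough sketch of the reshaping with pandas. The column-to-condition mapping is read off the question's plotting code, the numbers are made up, and the names `participant`, `correct_rate`, `emoji`, and `sentence` are assumptions chosen for illustration.

```python
import pandas as pd

# Column -> (emoji, sentence) mapping, inferred from the question's plotting code.
CONDITIONS = {
    "zza1": ("Positive emoji", "Positive"),
    "zzha1": ("Positive emoji", "Neutral"),
    "zfa1": ("Positive emoji", "Negative"),
    "fza1": ("Negative emoji", "Positive"),
    "fzha1": ("Negative emoji", "Neutral"),
    "ffa1": ("Negative emoji", "Negative"),
}

# Made-up wide data standing in for data01.csv: one row per participant.
wide = pd.DataFrame({
    "zza1": [0.9, 0.8, 1.0],
    "zzha1": [0.8, 0.7, 0.9],
    "zfa1": [0.6, 0.5, 0.7],
    "fza1": [0.6, 0.7, 0.5],
    "fzha1": [0.7, 0.8, 0.6],
    "ffa1": [0.9, 0.9, 0.8],
})

# Turn the row index into a participant id, then stack the condition columns.
long_df = (
    wide.rename_axis("participant")
    .reset_index()
    .melt(id_vars="participant", var_name="condition", value_name="correct_rate")
)
long_df["emoji"] = long_df["condition"].map(lambda c: CONDITIONS[c][0])
long_df["sentence"] = long_df["condition"].map(lambda c: CONDITIONS[c][1])

print(long_df.head())
```

With such a frame, x="sentence", y="correct_rate", and hue="emoji" would be the natural arguments, and each entry in pairs would take the form (("Positive", "Positive emoji"), ("Positive", "Negative emoji")).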
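In addition, regarding how the pairwise p-values from the repeated-measures ANOVA map to ns, *, **, or ***: the label style is set with the text_format parameter of the Annotator class. As far as I know, text_format accepts 'star', 'simple', or 'full': 'star' shows ns or asterisks, 'simple' shows a short p-value text, and 'full' shows the p-value together with the test name. For example:
annotator.configure(test='Mann-Whitney', text_format='star', line_height=0.03, line_width=1)
The code above configures the Mann-Whitney test, star-style markers, a bracket height of 0.03, and a line width of 1. Note that Mann-Whitney assumes independent samples, so for repeated-measures data a paired test such as 't-test_paired' or 'Wilcoxon' is likely the better fit. Adjust the parameters to your situation.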
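For reference, the thresholds described in the question can be written down as a small function. This is only a sketch: `p_to_label` is a hypothetical helper, the p-values are made up, and treating p = 0.05 exactly as ns is an assumption, since the question does not say which side that boundary falls on.

```python
def p_to_label(p: float) -> str:
    """Map a p-value to ns / * / ** / *** using the thresholds from the question."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p}")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    # Assumption: p == 0.05 exactly is treated as not significant.
    return "ns"


# Made-up p-values, for illustration only.
example_pvalues = [0.20, 0.03, 0.004, 0.0002]
labels = [p_to_label(p) for p in example_pvalues]
print(labels)  # ['ns', '*', '**', '***']
```

Labels produced this way can be passed to any plotting code that draws the brackets. If I remember correctly, statannotations also lets the cut-offs be customised through configure, but check the library documentation before relying on that.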