A reviewer raised the following issue:
The most serious issue: The results evidence in the manuscript is made by Accuracy metric. But if you measure TP, TN, FP, FN, then you suppose to classify the only label at the test. You have 6 labels totally (Walking forward, Standing, Sitting, Walking in place, Running in place, Jumping) and have tested them separately. Hence you usually had about 1/6 positive outputs among total. If so, the Accuracy metric does not show the efficiency of your classifier due to narrow hypothesis effect. For instance, you are testing Sitting and you have 1000 samples with 150 labeled as Sitting. Then if your classifier gives
TP = 10, TN = 850, FP = 0 and FN = 140,
you'll receive Acc = 86%, whereas you have really detected just 1/15 of target cases. You should describe you experiment in details and make your conclusions based on the relevant metrics.
What does this sentence mean?
"If so, the Accuracy metric does not show the efficiency of your classifier due to narrow hypothesis effect"
This answer quotes GPT (OpenAI)
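It means that accuracy alone can badly overstate how well the classifier works when each label is tested one-vs-rest. Accuracy is (TP + TN) / (TP + TN + FP + FN), so true negatives count as successes. With 6 labels, only about 1/6 of the test samples are positive for any given label, so a classifier that almost never predicts the target label still scores high just by getting the negatives right. In the reviewer's example, Acc = (10 + 850) / 1000 = 86% looks good, but recall is only TP / (TP + FN) = 10 / 150, i.e. just 1/15 of the actual Sitting samples are detected. That is presumably what the reviewer means by "narrow hypothesis effect". The reviewer is therefore asking you to describe the experiment in more detail and to report metrics that expose this, such as per-class precision, recall, and F1 score, rather than accuracy alone.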
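To make the arithmetic concrete, here is a minimal sketch that recomputes the metrics from the reviewer's four counts. The function name `binary_metrics` is just an illustrative helper, not something from the manuscript, and balanced accuracy is added here only as one more imbalance-aware metric for comparison.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute common one-vs-rest metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted / never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Counts taken from the reviewer's Sitting example.
metrics = binary_metrics(tp=10, tn=850, fp=0, fn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.86, while recall is about 0.067 and F1 is 0.125, which is the gap the reviewer is pointing at.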
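You need to take class imbalance into account, along with the false-positive and false-negative rates. For example, suppose you are screening the general population for a rare disease: even if you do no testing at all and simply declare everyone healthy, your accuracy will often be 99% or higher, but that says nothing about whether the test actually works.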
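The rare-disease case can be sketched the same way. The population size and prevalence below (10,000 people, 50 of them sick) are made-up numbers chosen only for illustration, and the "classifier" is simply a rule that labels everyone healthy.

```python
# Hypothetical population: 10,000 people, 50 of whom have the disease (label 1).
y_true = [1] * 50 + [0] * 9950

# A "classifier" that does no testing and declares everyone healthy (label 0).
y_pred = [0] * len(y_true)

# Count the confusion-matrix cells for the positive (sick) class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of sick people actually detected

print(f"accuracy: {accuracy:.3f}")  # high, because negatives dominate
print(f"recall:   {recall:.3f}")    # zero, because no sick person is found
```

With these assumed numbers the do-nothing rule reaches 99.5% accuracy while detecting none of the 50 patients, so accuracy by itself cannot separate it from a useful test.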