Below is my preprocessing of the sample data. There are nine fault classes and one normal class, and the data are split into training, test, and validation sets in a 7:2:1 ratio, giving 1,000 samples per class. If I now want to do imbalanced classification, with the ratio of normal samples to fault samples set to 200:1, how should the code be written?
import numpy as np
import cv2
from scipy import signal

# output containers for the three splits
stfts_train_pics, stfts_train_labels = [], []
stfts_test_pics, stfts_test_labels = [], []
stfts_validation_pics, stfts_validation_labels = [], []

fs = 12000        # sampling rate (Hz)
N = 1024          # points per sliding window
step = 100        # stride between consecutive windows

for data_type in range(10):            # 9 fault classes + 1 normal class
    data = all_data[data_type]
    start = 0
    # training set: 700 windows per class
    for i in range(700):
        segment = [v for arr in data[start:start + N] for v in arr]   # flatten nested arrays
        start += step
        f, t, nd = signal.stft(segment, fs=fs, window='hann', nperseg=64,
                               noverlap=None, nfft=None, detrend=False,
                               return_onesided=True, boundary='zeros',
                               padded=True, axis=-1)
        blurred = cv2.GaussianBlur(np.abs(nd), (3, 3), 1)   # smooth the spectrogram magnitude
        stfts_train_pics.append(blurred)
        stfts_train_labels.append(data_type)
    # test set: 200 windows per class
    for i in range(200):
        segment = [v for arr in data[start:start + N] for v in arr]
        start += step
        f, t, nd = signal.stft(segment, fs=fs, window='hann', nperseg=64,
                               noverlap=None, nfft=None, detrend=False,
                               return_onesided=True, boundary='zeros',
                               padded=True, axis=-1)
        blurred = cv2.GaussianBlur(np.abs(nd), (3, 3), 1)
        stfts_test_pics.append(blurred)
        stfts_test_labels.append(data_type)
    # validation set: 100 windows per class
    for i in range(100):
        segment = [v for arr in data[start:start + N] for v in arr]
        start += step
        f, t, nd = signal.stft(segment, fs=fs, window='hann', nperseg=64,
                               noverlap=None, nfft=None, detrend=False,
                               return_onesided=True, boundary='zeros',
                               padded=True, axis=-1)
        blurred = cv2.GaussianBlur(np.abs(nd), (3, 3), 1)
        stfts_validation_pics.append(blurred)
        stfts_validation_labels.append(data_type)
[The following answer was generated by GPT]
The key question is how to implement imbalanced sampling so that the ratio of normal samples to fault samples is 200:1. One possible solution is sketched below: it keeps 200 windows for the normal class and a single window for each fault class, and it assumes the normal class sits at index 0 of all_data (adjust normal_class if your layout differs):
# Imbalanced sampling: normal : fault = 200 : 1
# (reuses all_data, signal, cv2 and np from the preprocessing code above)
num_normal_samples = 200
num_fault_samples = max(1, num_normal_samples // 200)   # i.e. 1 window per fault class

normal_class = 0    # assumed index of the normal class in all_data; change if yours differs

fs = 12000
N = 1024
step = 100

balanced_data = []
for data_type in range(10):
    data = all_data[data_type]
    start = 0
    # keep 200 windows for the normal class, 1 window for each fault class
    n_samples = num_normal_samples if data_type == normal_class else num_fault_samples
    for i in range(n_samples):
        segment = [v for arr in data[start:start + N] for v in arr]
        start += step
        f, t, nd = signal.stft(segment, fs=fs, window='hann', nperseg=64,
                               noverlap=None, nfft=None, detrend=False,
                               return_onesided=True, boundary='zeros',
                               padded=True, axis=-1)
        blurred = cv2.GaussianBlur(np.abs(nd), (3, 3), 1)
        balanced_data.append((blurred, data_type))
The balanced_data list now contains samples in the required 200:1 ratio and can be carried forward into the training/test/validation split.
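As a minimal sketch of that last step, assuming a plain random shuffle of the (image, label) tuples is acceptable, the list can be sliced 7:2:1 like this:

import random

random.seed(0)                 # reproducible shuffle
random.shuffle(balanced_data)

n = len(balanced_data)
n_train = int(n * 0.7)
n_test = int(n * 0.2)

train_set = balanced_data[:n_train]                   # 70%
test_set = balanced_data[n_train:n_train + n_test]    # 20%
val_set = balanced_data[n_train + n_test:]            # remaining ~10%

# unzip the (image, label) tuples back into separate lists
train_pics, train_labels = map(list, zip(*train_set))
test_pics, test_labels = map(list, zip(*test_set))
val_pics, val_labels = map(list, zip(*val_set))

With only a handful of fault windows left after the 200:1 sampling, a plain shuffle may leave some splits without any fault samples at all; a stratified split (for example sklearn.model_selection.train_test_split with the stratify argument) is usually the safer option.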
Note that the example code above is only one possible solution; the exact implementation may still need to be adapted to your data and setup, but it should serve as a starting point for further development.