Loss fails to converge when training a CNN on the flower_data dataset

Symptoms and background of the problem

I'm a PyTorch beginner. I wrote a small CNN and trained it on the flower_data dataset, but the loss does not converge. I can't tell which step went wrong; any pointers would be appreciated.


Model definition:

import torch
import torch.nn as nn

class Hjy(nn.Module):
    def __init__(self):
        super(Hjy, self).__init__()
        # VGG-style feature extractor: 11 conv layers in 4 max-pool stages
        self.Module = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2,padding=0,dilation=1,ceil_mode=False),
            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
            nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.allPool = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=(10,10))
        )

        self.allLinear = nn.Sequential(
            nn.Linear(in_features=512*10*10, out_features=5120, bias=True),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=5120, out_features=5120, bias=True),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=5120, out_features=102, bias=True),
        )

    def forward(self,x):
        x = self.Module(x)
        x = self.allPool(x)
        x = torch.flatten(x,1)
        x = self.allLinear(x)
        return x
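As a side note, the spatial arithmetic of this trunk can be sketched in a few lines (this is illustrative code, not part of the original model): every 3x3/stride-1/pad-1 conv preserves H and W, and each 2x2/stride-2 max-pool halves them, so a 64x64 input leaves the conv stack at only 4x4.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # same size the Resize((64, 64)) transform produces
conv = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1)  # "same" padding: size preserved
pool = nn.MaxPool2d(kernel_size=2, stride=2)                # halves H and W
for _ in range(4):  # the model has 4 pooling stages
    x = pool(conv(x))
print(tuple(x.shape))  # (1, 3, 4, 4)
```

The `AdaptiveAvgPool2d((10, 10))` that follows then has to inflate that 4x4 map up to 10x10, which largely duplicates values rather than adding information.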

Training code:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

if __name__ == "__main__":
    data_transform = transforms.Compose(
        [
            transforms.Resize((64, 64)),
            transforms.ToTensor(),
        ]
    )
    epochs = 5
    accuracy = Acuracy_score()  # user-defined metric (definition not shown here)
    net = Hjy()
    _loss = nn.CrossEntropyLoss()
    _optim = torch.optim.SGD(net.parameters(), lr = 0.01)

    dataset = datasets.ImageFolder("D:/AIPretrain/flower_data/train", transform=data_transform)
    data = DataLoader(dataset=dataset, batch_size=5, shuffle=True, num_workers=2)

    for i in range(epochs):
        net.train(True)
        for img, label in data:
            _y = net(img)
            loss = _loss(_y, label)
            _optim.zero_grad()
            loss.backward()
            _optim.step()
            print(f"epoch:{i}  loss:{loss}  Acc:{accuracy(_y,label)}")

Run results and error output

(screenshot)

My reasoning and what I've tried

I printed the parameters during training; the results are shown below.

(screenshot)

I then swapped the net for a VGG16 and found the loss still failed to converge, so I suspected the dataset loading. I converted the images read by ImageFolder back to PIL format to inspect them, and they were completely black.

(screenshot)

Strangely, when I load the data without the transforms (read directly as PIL), the images display normally.

(screenshot)

What I want to achieve

Is my data loading the problem? How should I change my code so the loss converges? Thanks.

Your transforms resize the images to 64x64. Work out for yourself how large the output is after your 11 conv layers plus 4 pooling layers.
VGG16's default input is meant to be 224x224; if you deviate a lot from that, you should shrink the network, e.g. use VGG11 or another smaller VGG variant.