torch.gather() results are not reproducible

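I have already fixed the random seed in my deep learning model, but using torch.gather() still makes the results non-reproducible. (Calling torch.use_deterministic_algorithms(True) raises an exception.) There is little material online about the determinism of torch.gather(). Is this a general problem, and how can it be solved?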

Exception: RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

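torch.gather() does have a determinism issue on the GPU, although it lies in the backward pass rather than in the forward lookup. As the exception shows, the gradient of gather is computed with scatter_add, whose CUDA kernel accumulates values with atomic additions. The order in which GPU threads perform those additions is decided by the hardware scheduler and can differ from run to run, even on the same machine. Because floating-point addition is not associative, a different order can give a slightly different sum. A fixed random seed only controls random number generation, so it cannot remove this source of variation.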
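To see why the order of operations matters at all, here is a small illustration in plain Python (no GPU involved). It only shows that floating-point addition depends on grouping; it is not a reproduction of the CUDA kernel itself.

```python
# Floating-point addition is not associative: grouping the same three
# numbers differently changes the last bits of the result.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first)                 # 0.6000000000000001
print(right_first)                # 0.6
print(left_first == right_first)  # False
```

When many GPU threads add into the same memory location in an unspecified order, the same effect can show up as tiny run-to-run differences in gradients.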

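One workaround is to run the model (or at least the gather step) on the CPU, where PyTorch does not list this operation as nondeterministic. Note that torch.manual_seed() by itself is not a fix. It is still needed so that initialization and sampling use the same random numbers on every run, but it has no effect on the order of additions inside a CUDA kernel.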
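A minimal sketch of the CPU workaround applied to a single operation, assuming the extra device copies are affordable in your training loop. `gather_on_cpu` is a hypothetical helper name made up for this example, not a PyTorch API.

```python
import torch


def gather_on_cpu(src: torch.Tensor, dim: int, index: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: run gather (and therefore its backward) on the CPU."""
    out = torch.gather(src.cpu(), dim, index.cpu())
    # Move the result back. Autograd tracks the device copies, so the
    # gradient still reaches `src` on its original device.
    return out.to(src.device)


# Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
src = torch.randn(4, 5, device=device, requires_grad=True)
index = torch.randint(0, 5, (4, 3), device=device)

out = gather_on_cpu(src, 1, index)
out.sum().backward()
print(out.shape, src.grad.shape)
```

This keeps the rest of the model on the GPU and only pays the transfer cost for the one op that triggers the error.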

This answer quotes GPT (OpenAI):

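In PyTorch, the backward pass of torch.gather() relies on the underlying scatter_add() function, and in the PyTorch version that produced the error above, scatter_add() has no deterministic CUDA implementation. That is why torch.use_deterministic_algorithms(True) raises the exception once gather's gradient is computed, and why results can still vary between runs even with a fixed random seed.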
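The link between gather and scatter_add can be checked with a small sketch on the CPU. The tensors and index values below are made up for illustration; the point is only that the gradient autograd produces for gather matches a manual scatter-add of the upstream gradient.

```python
import torch

x = torch.arange(6.0).reshape(2, 3).requires_grad_()
index = torch.tensor([[0, 0, 2], [1, 1, 1]])

out = torch.gather(x, 1, index)
upstream = torch.ones_like(out)
out.backward(upstream)

# Manually add each upstream gradient entry back to the source position
# it was gathered from; repeated indices accumulate.
expected = torch.zeros_like(x).scatter_add_(1, index, upstream)

print(x.grad)
print(torch.equal(x.grad, expected))
```

Positions that are gathered more than once receive several contributions, and it is exactly this accumulation that the CUDA kernel performs in an unspecified order.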

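If the model needs torch.gather() and you want reproducible results, it is still worth setting torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False, although these flags only control cuDNN and do not change the scatter_add kernel. Add the following at the start of your code: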


import torch

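# Fix the random seed
torch.manual_seed(0)
# Ask cuDNN to choose deterministic algorithms
torch.backends.cudnn.deterministic = True
# Disable cuDNN autotuning, which may pick different algorithms per run
torch.backends.cudnn.benchmark = False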

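Here, torch.backends.cudnn.deterministic = True makes cuDNN choose deterministic algorithms (for example for convolutions), and torch.backends.cudnn.benchmark = False disables cuDNN's autotuner, which could otherwise select different algorithms across runs. Together they keep cuDNN's behavior consistent between runs. They do not cover non-cuDNN kernels such as scatter_add_cuda_kernel, so on their own they will not make the backward pass of gather reproducible.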
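The settings above can be collected into one function, together with the warn_only=True option that the error message itself suggests. This is a sketch under assumptions: `seed_everything` is a hypothetical helper name, the `warn_only` argument is assumed to be available (it exists in recent PyTorch releases, 1.11 and later as far as I know), and the CUBLAS_WORKSPACE_CONFIG value is the one the PyTorch reproducibility notes mention for deterministic cuBLAS.

```python
import os
import random

import torch


def seed_everything(seed: int = 0) -> None:
    """Hypothetical helper collecting common reproducibility settings."""
    # Some cuBLAS routines need this in deterministic mode; it only takes
    # effect if set before those CUDA routines are first used.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # ignored when no GPU is available
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Assumption: warn_only is supported by the installed PyTorch version.
    # Ops without a deterministic implementation then warn instead of raising.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)
print(torch.equal(a, b))
```

With warn_only=True the training run continues, and the warnings point at the operations (such as the backward pass of gather on CUDA) that may still differ between runs.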

If this answer helped, please accept it. Thanks!
