使用 CUDA 运行时 API 检查错误的规范方法是什么？

通过查看 CUDA 问题的答案和评论，以及在 CUDA 标记维基中，我看到经常有人建议检查每个 API 调用的返回状态是否有错误。 API 文档包含了诸如 cuda / etlasterror、 cuda / ekatlasterror 和 cudaigerstring 这样的功能，但是如何才能在不需要大量额外代码的情况下将这些功能组合在一起以可靠地捕获和报告错误呢？

我用下面的方法很管用。该解决方案使用内置的 cuda 功能，实现起来非常简单。
有关代码如下:

#include <stdio.h>#include <stdlib.h>
__global__ void foo(int *ptr){
  *ptr = 7;}
int main(void){
  foo<<<1,1>>>(0);
  // make the host block until the device is finished with foo
  cudaDeviceSynchronize();
  // check for error
  cudaError_t error = cudaGetLastError();
  if(error != cudaSuccess)
  {
    // print the CUDA error message and exit
    printf("CUDA error: %s\n", cudaGetErrorString(error));
    exit(-1);
  }
  return 0;}

有时，我们可能希望在 c + + 上下文中报告错误条件并从中恢复，以此作为较大应用程序的一部分。
这里有一个相当简洁的方法可以做到这一点: 使用止推: : 系统错误出现一个由std::runtime_error产生的 thrust::system_error:

#include <thrust/system_error.h>#include <thrust/system/cuda/error.h>#include <sstream>
void throw_on_cuda_error(cudaError_t code, const char *file, int line){
  if(code != cudaSuccess)
  {
    std::stringstream ss;
    ss << file << "(" << line << ")";
    std::string file_and_line;
    ss >> file_and_line;
    throw thrust::system_error(code, thrust::cuda_category(), file_and_line);
  }}

然后就把文件名、行号和cudaError_t的描述合并到出现的异常中的.what()的menber:

#include <iostream>
int main(){
  try
  {
    // do something crazy
    throw_on_cuda_error(cudaSetDevice(-1), __FILE__, __LINE__);
  }
  catch(thrust::system_error &e)
  {
    std::cerr << "CUDA error after cudaSetDevice: " << e.what() << std::endl;
    // oops, recover
    cudaSetDevice(0);
  }
  return 0;}

输出结果：

$ nvcc exception.cu -run
CUDA error after cudaSetDevice: exception.cu(23): invalid device ordinal

如果需要，some_function可以区分 CUDA 错误和其他类型的错误:

try{
  // call some_function which may throw something
  some_function();}catch(thrust::system_error &e){
  std::cerr << "CUDA error during some_function: " << e.what() << std::endl;}catch(std::bad_alloc &e){
  std::cerr << "Bad memory allocation during some_function: " << e.what() << std::endl;}catch(std::runtime_error &e){
  std::cerr << "Runtime error during some_function: " << e.what() << std::endl;}catch(...){
  std::cerr << "Some other kind of error during some_function" << std::endl;

  // no idea what to do, so just rethrow the exception
  throw;}

因为 thrust::system_error 是 std::runtime_error，所以如果我们不要求前者的精确度，我们可以用同样的方式处理:

try{
  // call some_function which may throw something
  some_function();}catch(std::runtime_error &e){
  std::cerr << "Runtime error during some_function: " << e.what() << std::endl;}

检查运行时 API 代码错误的最好方法可能是定义一个assert 样式处理程序函数和宏，如下所示:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true){
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }}

然后你可以用 gpuErrchk 宏处理 API 调用的返回状态，例如:
gpuErrchk( cudaMalloc(&a_d, size*sizeof(int)) );
如果调用中出现错误，将向 stderr 发出描述错误的文本消息以及代码中发生错误的文件和行，应用程序将退出。您可以修改 gpuAssert 以引发异常，而不是在更复杂的应用程序中(如果需要的话)调用 exit ()。
第二个相关的问题是如何检查内核启动中的错误，这些错误不能直接在像标准运行时 API 调用这样的宏调用中。

kernel<<<1,1>>>(a);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );

将首先检查无效的启动参数，然后强制等待，直到内核停止并检查执行错误。如果你有一个像下面这样的阻塞 API 调用，那么就消除同步:

kernel<<<1,1>>>(a_d);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaMemcpy(a_h, a_d, size * sizeof(int), cudaMemcpyDeviceToHost) );

在这种情况下，cuda / emcpy 调用可以返回在内核执行期间发生的错误或者内存拷贝本身发生的错误。新手不太会这个方法，我建议在调试期间内核启动后使用明确的同步，以便更容易理解问题可能出现在哪里。
请注意，当使用 CUDA Dynamic Parallelism 时，一种非常类似的方法可以也应该应用于设备内核中 CUDA 运行时 API 的任何使用，以及在任何设备内核启动之后:

#include <assert.h>#define cdpErrchk(ans) { cdpAssert((ans), __FILE__, __LINE__); }
__device__ void cdpAssert(cudaError_t code, const char *file, int line, bool abort=true){
   if (code != cudaSuccess)
   {
      printf("GPU kernel assert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) assert(0);
   }}