Run and Debug

Build Process

When you build a Hybridizer project, three things happen:

  1. Standard C# compilation → Your .dll assembly
  2. Hybridizer code generation → CUDA .cu source files
  3. nvcc compilation → Native GPU library (_CUDA.dll)

YourProject.csproj
  → dotnet build
      → YourProject.dll           (MSBuild step 1)
      → YourProject_CUDA.cu       (Hybridizer generates)
      → nvcc YourProject_CUDA.cu  (MSBuild step 2)
      → YourProject_CUDA.dll      (native GPU code)

Run Configurations

CUDA (GPU)

dynamic wrapper = HybRunner.Cuda()
    .SetDistrib(128, 256);

OMP (CPU — for debugging)

dynamic wrapper = HybRunner.OMP();

Runs the same generated code on CPU with OpenMP threads. Useful for:

  • Debugging without GPU
  • Setting breakpoints in generated code
  • Validating numerical correctness

Enable Line Information

Build with debug info to map generated CUDA code back to your C# source:

<!-- In your .csproj -->
<PropertyGroup>
  <HybridizerEmitLineInfo>true</HybridizerEmitLineInfo>
</PropertyGroup>

With line info enabled:

  • NVIDIA Nsight shows your C# source lines in the profiler
  • Errors reference your original C# code, not the generated .cu

Debugging Workflow

Step 1: Verify with OMP

#if DEBUG
dynamic wrapper = HybRunner.OMP();
#else
dynamic wrapper = HybRunner.Cuda();
#endif

Step 2: Check for Errors

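wrapper.MyKernel(args);
// Kernel launches are asynchronous: synchronize so that any execution error
// surfaces here rather than at a later, unrelated CUDA call.
cuda.ERROR_CHECK(cuda.DeviceSynchronize());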

Step 3: Compare GPU vs CPU

// Direct C# call = CPU reference
MyKernel(args_cpu);

// GPU call
wrapper.MyKernel(args_gpu);
cuda.DeviceSynchronize();

// Compare element by element with an absolute tolerance
for (int i = 0; i < N; i++)
    Assert.AreEqual(cpu[i], gpu[i], 1e-5f);
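
The fixed absolute tolerance above can be too strict for large values and too loose for tiny ones. As a minimal sketch, a combined absolute and relative check could look like the following. ResultComparer and FirstMismatch are hypothetical names, not part of Hybridizer, and the default tolerances are assumptions to tune for your kernel.

```csharp
using System;

// Hypothetical helper (not part of Hybridizer): compares a CPU reference
// array against GPU results using absolute + relative tolerance.
public static class ResultComparer
{
    // Returns the index of the first element that differs by more than
    // absTol + relTol * |cpu[i]|, or -1 if all elements agree.
    public static int FirstMismatch(float[] cpu, float[] gpu,
                                    float absTol = 1e-5f, float relTol = 1e-5f)
    {
        if (cpu.Length != gpu.Length)
            throw new ArgumentException("Arrays must have the same length.");

        for (int i = 0; i < cpu.Length; i++)
        {
            float c = cpu[i], g = gpu[i];

            // Exact match, including two infinities of the same sign.
            if (c == g) continue;

            // Two NaNs are treated as agreeing (an assumption of this sketch).
            if (float.IsNaN(c) && float.IsNaN(g)) continue;

            // Any other NaN or infinity is a mismatch.
            if (float.IsNaN(c) || float.IsNaN(g) ||
                float.IsInfinity(c) || float.IsInfinity(g))
                return i;

            if (Math.Abs(c - g) > absTol + relTol * Math.Abs(c))
                return i;
        }
        return -1;
    }
}
```

Reporting the first mismatching index, rather than only pass or fail, makes it easier to relate a wrong value to a specific thread or block.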

Step 4: Inspect Generated Code

Open the generated YourProject_CUDA.cu file in your build output. Look for:

  • Correct loop bounds
  • Memory access patterns
  • Shared memory declarations

Profiling

Quick Timing

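// Requires: using System.Diagnostics;
// The launch is asynchronous, so synchronize before stopping the timer.
var sw = Stopwatch.StartNew();
wrapper.MyKernel(args);
cuda.DeviceSynchronize();
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms");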
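
A single timed call typically includes one-time costs such as CUDA initialization and library loading, so the first measurement tends to overstate kernel time. A minimal sketch of a warm-up plus averaging helper follows. KernelTimer and AverageMilliseconds are hypothetical names, and the default counts are arbitrary assumptions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Hybridizer): times a launch-and-sync
// action after a few untimed warm-up runs.
public static class KernelTimer
{
    // Runs launchAndSync 'warmup' times untimed, then returns the mean
    // wall-clock time in milliseconds over 'iterations' timed runs.
    public static double AverageMilliseconds(Action launchAndSync,
                                             int warmup = 3, int iterations = 10)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        for (int i = 0; i < warmup; i++)
            launchAndSync();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            launchAndSync();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

The action passed in should include the synchronization, for example () => { wrapper.MyKernel(args); cuda.DeviceSynchronize(); }, otherwise only the launch overhead is measured.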

NVIDIA Nsight Systems (timeline)

nsys profile --stats=true YourProject.exe

Shows:

  • Kernel launch timeline
  • Memory copy durations
  • CPU/GPU overlap

NVIDIA Nsight Compute (kernel analysis)

ncu --set full YourProject.exe

Shows:

  • Achieved occupancy
  • Memory throughput
  • Compute throughput
  • Warp stalls

Environment Variables

| Variable | Purpose | Example |
|---|---|---|
| CUDA_VISIBLE_DEVICES | Select GPU | 0 (first GPU) |
| HYBRIDIZER_VERBOSE | Verbose build output | 1 |
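
CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so the safest way to make it take effect is to set it before the application starts. As a sketch, a small launcher can start the application with the variable set for that child process only. GpuLauncher and BuildStartInfo are hypothetical names; only standard System.Diagnostics calls are used.

```csharp
using System.Diagnostics;

// Hypothetical launcher (not part of Hybridizer): prepares a child process
// whose environment restricts which GPUs CUDA can see.
public static class GpuLauncher
{
    // 'devices' is a comma-separated list of device indices, e.g. "0" or "1,2".
    public static ProcessStartInfo BuildStartInfo(string exePath, string devices)
    {
        var psi = new ProcessStartInfo(exePath)
        {
            // Must be false for per-process environment variables to be applied.
            UseShellExecute = false
        };
        psi.Environment["CUDA_VISIBLE_DEVICES"] = devices;
        return psi;
    }
}
```

Passing the result to Process.Start, for example Process.Start(GpuLauncher.BuildStartInfo("YourProject.exe", "0")), runs the application with only the first GPU visible and leaves the parent environment unchanged.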

Common Debugging Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Results all zeros | Missing DeviceSynchronize | Add sync before reading |
| Random wrong values | Race condition | Check __syncthreads placement |
| Build succeeds, crash at runtime | DLL not found | Check bin/ for _CUDA.dll |
| Correct with OMP, wrong with CUDA | Parallelization issue | Check shared memory / atomics |
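
For the "DLL not found" case, a start-up check can turn an obscure runtime crash into a clear message. The sketch below assumes the native library is named <assemblyName>_CUDA.dll, as in the build diagram, and sits in the directory you pass in. NativeLibraryCheck and its methods are hypothetical names, not part of Hybridizer.

```csharp
using System;
using System.IO;

// Hypothetical pre-flight check (not part of Hybridizer).
public static class NativeLibraryCheck
{
    // Builds the expected path of the generated native library, assuming the
    // <assemblyName>_CUDA.dll naming convention.
    public static string ExpectedCudaDllPath(string directory, string assemblyName)
    {
        return Path.Combine(directory, assemblyName + "_CUDA.dll");
    }

    // Throws FileNotFoundException with the expected path if the library is missing.
    public static void EnsureCudaDllExists(string directory, string assemblyName)
    {
        string path = ExpectedCudaDllPath(directory, assemblyName);
        if (!File.Exists(path))
            throw new FileNotFoundException(
                "Generated CUDA library not found. Check that the nvcc build step ran.",
                path);
    }
}
```

A typical call, placed before the wrapper is created, would be NativeLibraryCheck.EnsureCudaDllExists(AppContext.BaseDirectory, "YourProject").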

See also: FAQ & Troubleshooting for more solutions.