
Port Existing Code to Hybridizer

This guide walks you through migrating existing C# code to run on a GPU or another accelerator using Hybridizer.

Overview

Step 1: Identify Hot Paths

Use profiling to find code that:

  • Consumes significant CPU time
  • Has data-parallel structure (same operation on many elements)
  • Works with large arrays or matrices

// Before: CPU-bound loop
for (int i = 0; i < N; i++)
{
    result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
}
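
Before porting, it helps to confirm that a candidate loop really dominates the run time. The sketch below times the loop above with `System.Diagnostics.Stopwatch`. `HotPathTimer` and its method names are hypothetical helpers invented for this illustration, not part of Hybridizer, and a real profiler will give a more complete picture.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Hybridizer): times a candidate loop on the CPU.
public static class HotPathTimer
{
    // Runs the action `repeats` times and returns the best wall-clock time in milliseconds.
    public static double BestMilliseconds(Action action, int repeats = 5)
    {
        if (repeats < 1) throw new ArgumentOutOfRangeException(nameof(repeats));

        action(); // warm-up run so that JIT compilation is not measured

        double best = double.MaxValue;
        for (int r = 0; r < repeats; r++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    // The candidate loop from the snippet above, wrapped in a method so it can be timed.
    public static void SinCos(double[] data, double[] result)
    {
        for (int i = 0; i < data.Length; i++)
        {
            result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
        }
    }
}
```

Calling `HotPathTimer.BestMilliseconds(() => HotPathTimer.SinCos(data, result))` gives a CPU baseline that the ported version can later be compared against.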

Step 2: Check Compatibility

Review the known limitations:

| Supported                 | Not Supported             |
|---------------------------|---------------------------|
| Arrays of primitives      | Heap allocation in kernel |
| Blittable structs         | Strings                   |
| Math operations           | Exceptions (partial)      |
| Generics (with templates) | foreach loops             |
| Virtual functions         | Generic methods           |

Step 3: Add Attributes

Mark your method as an entry point:

using System;
using Hybridizer.Runtime.CUDAImports;

public class MyProcessor
{
    [EntryPoint]
    public static void Process(double[] data, double[] result, int N)
    {
        // Grid-stride loop: each thread starts at its global index
        // and advances by the total number of threads in the grid.
        for (int i = threadIdx.x + blockIdx.x * blockDim.x;
             i < N;
             i += blockDim.x * gridDim.x)
        {
            result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
        }
    }
}

Key changes:

  1. Added [EntryPoint] attribute
  2. Replaced sequential loop with grid-stride loop
  3. Used threadIdx and blockIdx for indexing
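
To see why the grid-stride loop covers every element exactly once, whatever the launch size, the index arithmetic can be replayed on the CPU. The sketch below is a plain C# emulation with no Hybridizer types. `GridStrideDemo` and `CountVisits` are hypothetical names, and the threads run one after another here rather than in parallel.

```csharp
using System;

// CPU-only emulation of the grid-stride indexing; no Hybridizer types are used.
public static class GridStrideDemo
{
    // Returns how many times each index in [0, n) is visited when
    // gridDim blocks of blockDim threads each run the grid-stride loop.
    public static int[] CountVisits(int n, int gridDim, int blockDim)
    {
        var visits = new int[n];
        int stride = blockDim * gridDim;                  // blockDim.x * gridDim.x
        for (int block = 0; block < gridDim; block++)
        {
            for (int thread = 0; thread < blockDim; thread++)
            {
                int start = thread + block * blockDim;    // threadIdx.x + blockIdx.x * blockDim.x
                for (int i = start; i < n; i += stride)
                {
                    visits[i]++;
                }
            }
        }
        return visits;
    }
}
```

Every entry of the returned array is 1 both when the grid is smaller than `n` (each thread handles several elements) and when it is larger (surplus threads do nothing), which is why the loop does not depend on a particular launch configuration.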

Step 4: Refactor Unsupported Patterns

Replace foreach with for

// Before (not supported)
foreach (var item in collection) { ... }

// After
for (int i = 0; i < collection.Length; i++) { ... }

Extract Helper Methods

// Mark helpers as [Kernel] so they can be called from an entry point
[Kernel]
public static double ComputeValue(double x)
{
    return Math.Sin(x) * Math.Cos(x);
}

[EntryPoint]
public static void Process(double[] data, double[] result, int N)
{
    // One element per thread: assumes the launch provides at least N threads
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < N)
        result[i] = ComputeValue(data[i]);
}

Handle Object Allocations

// Before (not supported - heap allocation)
var temp = new MyClass();

// After - use structs passed as parameters
public struct MyStruct { public float Value; }

[EntryPoint]
public static void Process(MyStruct[] data, int N) { ... }
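
When the existing code holds its data in class instances, one way to reach the struct-array form is to copy the fields into a pre-allocated array on the host before the call. The sketch below illustrates that idea. `Particle`, `ParticleData`, and `ParticlePacking` are hypothetical types invented for this example, and the explicit sequential layout is an assumption about what suits a flat buffer rather than a documented Hybridizer requirement.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types, for illustration only.
public class Particle                     // reference type: each instance lives on the managed heap
{
    public float X, Y, Mass;
}

[StructLayout(LayoutKind.Sequential)]
public struct ParticleData                // blittable struct: primitive fields only
{
    public float X, Y, Mass;
}

public static class ParticlePacking
{
    // Copies the objects into one pre-allocated struct array on the host,
    // so the kernel receives a flat buffer and allocates nothing itself.
    public static ParticleData[] Pack(Particle[] particles)
    {
        var packed = new ParticleData[particles.Length];
        for (int i = 0; i < particles.Length; i++)
        {
            packed[i].X = particles[i].X;
            packed[i].Y = particles[i].Y;
            packed[i].Mass = particles[i].Mass;
        }
        return packed;
    }
}
```

The packed array can then be passed to an entry point in the same way as `MyStruct[] data` above.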

Step 5: Validate Results

Always compare GPU results with a CPU reference:

// CPU reference
double[] cpuResult = new double[N];
for (int i = 0; i < N; i++)
    cpuResult[i] = Math.Sin(data[i]) * Math.Cos(data[i]);

// GPU result (wrapper is assumed to be the Hybridizer-wrapped MyProcessor, created elsewhere)
double[] gpuResult = new double[N];
wrapper.Process(data, gpuResult, N);

// Compare against a tolerance rather than for exact equality
for (int i = 0; i < N; i++)
{
    double diff = Math.Abs(cpuResult[i] - gpuResult[i]);
    if (diff > 1e-10)
        Console.WriteLine($"Mismatch at {i}: {diff}");
}
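
Warning: Floating-point results may differ slightly between CPU and GPU, because the two can order operations differently and use different implementations of functions such as sine and cosine. Compare against a tolerance, not for exact equality.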
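
A fixed absolute tolerance such as `1e-10` works for values near 1 but is too strict for large magnitudes. The sketch below combines an absolute and a relative tolerance. `ResultCheck` is a hypothetical helper, and the default tolerances are illustrative assumptions that should be tuned to the computation.

```csharp
using System;

// Hypothetical helper (not part of Hybridizer) for comparing CPU and GPU results.
public static class ResultCheck
{
    // True when |a - b| <= absTol + relTol * max(|a|, |b|).
    public static bool NearlyEqual(double a, double b,
                                   double absTol = 1e-12, double relTol = 1e-9)
    {
        if (double.IsNaN(a) || double.IsNaN(b)) return false;
        // Infinities only match an identical infinity.
        if (double.IsInfinity(a) || double.IsInfinity(b)) return a == b;

        double scale = Math.Max(Math.Abs(a), Math.Abs(b));
        return Math.Abs(a - b) <= absTol + relTol * scale;
    }

    // Returns the index of the first mismatch, or -1 when all elements agree.
    public static int FirstMismatch(double[] expected, double[] actual,
                                    double absTol = 1e-12, double relTol = 1e-9)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Arrays must have the same length.");

        for (int i = 0; i < expected.Length; i++)
        {
            if (!NearlyEqual(expected[i], actual[i], absTol, relTol))
                return i;
        }
        return -1;
    }
}
```

With the arrays from the comparison above, `ResultCheck.FirstMismatch(cpuResult, gpuResult)` returns `-1` when the port agrees with the reference within tolerance.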

Step 6: Optimize

After validating correctness, optimize performance:

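  1. Launch configuration: launch enough blocks and threads to keep all multiprocessors busy
  2. Memory access: make accesses coalesced, so that adjacent threads read adjacent elements
  3. Reduce transfers: keep data on the GPU between kernel calls instead of copying it back and forth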

See Optimize Kernels for details.
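
As a starting point for the launch configuration, the number of blocks can be derived from the problem size by ceiling division. The sketch below shows only that arithmetic. `LaunchConfig` is a hypothetical helper, and capping the block count at `maxBlocks` is an assumption that is safe only because a grid-stride loop lets each thread process several elements.

```csharp
using System;

// Hypothetical helper: computes a block count for a 1D launch.
public static class LaunchConfig
{
    // Returns ceil(n / threadsPerBlock), clamped to the range [1, maxBlocks].
    public static int BlocksFor(int n, int threadsPerBlock, int maxBlocks)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        if (threadsPerBlock < 1) throw new ArgumentOutOfRangeException(nameof(threadsPerBlock));
        if (maxBlocks < 1) throw new ArgumentOutOfRangeException(nameof(maxBlocks));

        // Use long arithmetic so that n + threadsPerBlock - 1 cannot overflow.
        long needed = ((long)n + threadsPerBlock - 1) / threadsPerBlock;
        return (int)Math.Max(1, Math.Min(needed, maxBlocks));
    }
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, since 3 blocks would cover only 768 elements.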

Common Porting Patterns

| Original Pattern        | Hybridizer Equivalent         |
|-------------------------|-------------------------------|
| for (i = 0; i < N; i++) | Grid-stride loop              |
| Parallel.For(...)       | [EntryPoint] with threading   |
| LINQ operations         | Explicit loops                |
| Object creation         | Pre-allocated struct arrays   |
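
As an illustration of the LINQ row, the sketch below shows the same element-wise operation written first with LINQ and then as an explicit loop over pre-allocated arrays. `LinqPorting` and its methods are hypothetical names, and both versions run on the CPU. Only the loop form has the shape that the earlier steps turn into a grid-stride kernel.

```csharp
using System;
using System.Linq;

// Hypothetical example: one operation written in two styles.
public static class LinqPorting
{
    // LINQ version: concise, but built on delegates and iterators,
    // and it allocates a new array on every call.
    public static double[] ScaleLinq(double[] data, double factor)
    {
        return data.Select(x => x * factor).ToArray();
    }

    // Explicit-loop version: same values, written into a caller-provided array.
    public static void ScaleLoop(double[] data, double[] result, double factor)
    {
        for (int i = 0; i < data.Length; i++)
        {
            result[i] = data[i] * factor;
        }
    }
}
```

Both versions perform the same multiplication per element, so their outputs can be compared for exact equality before moving on to the GPU port.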

Next Steps