Port Existing Code to Hybridizer

This guide walks you through migrating existing C# code to run on a GPU or another accelerator using Hybridizer.

Step 1: Identify Hot Paths

Use profiling to find code that:

Consumes significant CPU time
Has data-parallel structure (same operation on many elements)
Works with large arrays or matrices

// Before: CPU-bound loop
for (int i = 0; i < N; i++)
{
    result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
}
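
A profiler gives the most reliable picture, but a quick `Stopwatch` measurement is often enough to confirm that a loop like the one above is worth porting. The following is a minimal sketch under that assumption; `HotPathTimer` and `TimeSinCos` are hypothetical names used for illustration and are not part of Hybridizer.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Hybridizer): times the candidate loop on the CPU.
public static class HotPathTimer
{
    // Runs the sin*cos loop over `data`, writes into `result`,
    // and returns the elapsed wall-clock time in milliseconds.
    public static double TimeSinCos(double[] data, double[] result)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data.Length; i++)
        {
            result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

Calling `HotPathTimer.TimeSinCos(data, result)` a few times and discarding the first run (which includes JIT compilation) gives a rough CPU baseline to compare against later.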

Step 2: Check Compatibility

Review the known limitations:

Supported	Not Supported
Arrays of primitives	Heap allocation in kernel
Blittable structs	Strings
Math operations	Exceptions (partial)
Generics (with templates)	`foreach` loops
Virtual functions	Generic methods

Step 3: Add Attributes

Mark your method as an entry point:

using Hybridizer.Runtime.CUDAImports;

public class MyProcessor
{
    [EntryPoint]
    public static void Process(double[] data, double[] result, int N)
    {
        for (int i = threadIdx.x + blockIdx.x * blockDim.x;
             i < N;
             i += blockDim.x * gridDim.x)
        {
            result[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
        }
    }
}

Key changes:

Added [EntryPoint] attribute
Replaced sequential loop with grid-stride loop
Used threadIdx and blockIdx for indexing
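
To see why a grid-stride loop processes every element exactly once regardless of the launch configuration, the loop can be simulated on the CPU. This is only an illustrative sketch: `GridStrideDemo` is a hypothetical class, and the `gridDim`, `blockDim`, `blockIdx`, and `threadIdx` variables are plain integers standing in for the intrinsics of the same names.

```csharp
using System;

// CPU-only simulation of a grid-stride loop; no Hybridizer types are used.
public static class GridStrideDemo
{
    // Returns how many times each index in [0, n) is visited
    // when every simulated thread runs the grid-stride loop.
    public static int[] CountVisits(int n, int gridDim, int blockDim)
    {
        var visits = new int[n];
        for (int blockIdx = 0; blockIdx < gridDim; blockIdx++)
        {
            for (int threadIdx = 0; threadIdx < blockDim; threadIdx++)
            {
                // Same loop shape as the kernel above.
                for (int i = threadIdx + blockIdx * blockDim;
                     i < n;
                     i += blockDim * gridDim)
                {
                    visits[i]++;
                }
            }
        }
        return visits;
    }
}
```

Every entry of `GridStrideDemo.CountVisits(1000, 4, 32)` is 1: the 128 simulated threads each step through the array with a stride of 128, so no element is skipped or processed twice, even though there are far fewer threads than elements.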

Step 4: Refactor Unsupported Patterns

Replace `foreach` with `for`

// Before (not supported)
foreach (var item in collection) { ... }

// After
for (int i = 0; i < collection.Length; i++) { ... }

Extract Helper Methods

// Mark helpers as [Kernel]
[Kernel]
public static double ComputeValue(double x)
{
    return Math.Sin(x) * Math.Cos(x);
}

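[EntryPoint]
public static void Process(double[] data, double[] result, int N)
{
    // Grid-stride loop: stays correct for any launch configuration,
    // even when the grid has fewer threads than N.
    for (int i = threadIdx.x + blockIdx.x * blockDim.x;
         i < N;
         i += blockDim.x * gridDim.x)
    {
        result[i] = ComputeValue(data[i]);
    }
}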

Handle Object Allocations

// Before (not supported - heap allocation)
var temp = new MyClass();

// After - use structs passed as parameters
public struct MyStruct { public float Value; }

[EntryPoint]
public static void Process(MyStruct[] data, int N) { ... }
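
When the existing code holds its data in class instances, one way to prepare it is to copy the relevant fields into a pre-allocated struct array on the host before the kernel is called. The sketch below assumes that approach; `Particle`, `ParticleData`, and `ParticlePacking` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference type, as it might appear in the original code.
public class Particle
{
    public float Value;
}

// Hypothetical blittable struct: contains only primitive fields.
public struct ParticleData
{
    public float Value;
}

public static class ParticlePacking
{
    // Copies the objects into a struct array on the host,
    // so the kernel itself never needs to allocate.
    public static ParticleData[] Pack(IReadOnlyList<Particle> particles)
    {
        var packed = new ParticleData[particles.Count];
        for (int i = 0; i < particles.Count; i++)
        {
            packed[i].Value = particles[i].Value;
        }
        return packed;
    }
}
```

The array returned by `ParticlePacking.Pack(...)` can then be passed to an entry point that takes a struct array, in the same way as `MyStruct[] data` above.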

Step 5: Validate Results

Always compare GPU results with a CPU reference:

// CPU reference
double[] cpuResult = new double[N];
for (int i = 0; i < N; i++)
    cpuResult[i] = Math.Sin(data[i]) * Math.Cos(data[i]);

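// GPU result (wrapper is the Hybridizer-generated wrapper around MyProcessor)
double[] gpuResult = new double[N];
wrapper.Process(data, gpuResult, N);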

// Compare
for (int i = 0; i < N; i++)
{
    double diff = Math.Abs(cpuResult[i] - gpuResult[i]);
    if (diff > 1e-10)
        Console.WriteLine($"Mismatch at {i}: {diff}");
}

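Warning

Floating-point results may differ slightly between CPU and GPU because of different instruction ordering, fused multiply-add instructions, and math-library implementations. Compare against a tolerance rather than testing for exact equality.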
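
A fixed absolute threshold works for values of moderate size but becomes too strict for large magnitudes. One common alternative is to combine an absolute and a relative tolerance. The helper below is a sketch of that idea; `ResultComparer`, `FirstMismatch`, and the default tolerances are assumptions made for illustration and should be tuned to the computation at hand.

```csharp
using System;

// Hypothetical helper (not part of Hybridizer) for validating kernel output.
public static class ResultComparer
{
    // Returns the index of the first element whose difference exceeds
    // absTol + relTol * |expected|, or -1 if every element is within tolerance.
    public static int FirstMismatch(double[] expected, double[] actual,
                                    double absTol = 1e-12, double relTol = 1e-10)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Arrays must have the same length.");

        for (int i = 0; i < expected.Length; i++)
        {
            // Exactly equal values (including matching infinities) always pass.
            if (expected[i] == actual[i]) continue;

            double diff = Math.Abs(expected[i] - actual[i]);
            double limit = absTol + relTol * Math.Abs(expected[i]);

            // Written as !(diff <= limit) so that NaN is reported as a mismatch.
            if (!(diff <= limit)) return i;
        }
        return -1;
    }
}
```

A return value of -1 from `ResultComparer.FirstMismatch(cpuResult, gpuResult)` indicates that the two arrays agree within the chosen tolerances.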

Step 6: Optimize

After validating correctness, optimize performance:

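Launch configuration: Use enough blocks and threads to keep the whole GPU busy
Memory access: Have neighboring threads access neighboring elements so that accesses are coalesced
Reduce transfers: Keep data on the GPU between kernel calls instead of copying it back and forth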

See Optimize Kernels for details.
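
As a starting point for the launch configuration, the number of blocks is commonly derived from the problem size with a ceiling division and then capped, leaving the grid-stride loop to cover any remainder. The sketch below illustrates only that arithmetic; `LaunchConfig` and `BlocksFor` are hypothetical names, and how the resulting values are handed to the runtime depends on your Hybridizer setup.

```csharp
using System;

// Hypothetical helper (not part of Hybridizer): computes a block count.
public static class LaunchConfig
{
    // Smallest number of blocks such that blocks * threadsPerBlock >= n,
    // capped at maxBlocks. With a grid-stride kernel, a capped grid
    // still processes every element.
    public static int BlocksFor(int n, int threadsPerBlock, int maxBlocks)
    {
        if (n <= 0) return 1;

        // Ceiling division, done in 64-bit to avoid overflow for large n.
        long needed = ((long)n + threadsPerBlock - 1) / threadsPerBlock;
        return (int)Math.Min(needed, (long)maxBlocks);
    }
}
```

For example, `LaunchConfig.BlocksFor(1000, 128, 1024)` yields 8, since seven blocks of 128 threads would cover only 896 elements.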

Common Porting Patterns

Original Pattern	Hybridizer Equivalent
`for (i = 0; i < N; i++)`	Grid-stride loop
`Parallel.For(...)`	`[EntryPoint]` with threading
LINQ operations	Explicit loops
Object creation	Pre-allocated struct arrays
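
To illustrate the LINQ row of the table, the sketch below shows a simple projection written first with LINQ and then as an explicit indexed loop that writes into a pre-allocated output array. `LinqToLoop` and its methods are hypothetical names; the loop version is plain C# and would still need the attributes and grid-stride indexing from Step 3 before it could serve as a kernel.

```csharp
using System;
using System.Linq;

public static class LinqToLoop
{
    // Original host-side formulation: allocates a new array through LINQ.
    public static double[] SquareWithLinq(double[] data)
    {
        return data.Select(x => x * x).ToArray();
    }

    // Explicit-loop formulation: index-based, no allocation,
    // output buffer supplied by the caller.
    public static void SquareWithLoop(double[] data, double[] result, int n)
    {
        for (int i = 0; i < n; i++)
        {
            result[i] = data[i] * data[i];
        }
    }
}
```

Both methods produce the same values, so the LINQ version can be kept as the CPU reference when validating the ported loop.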

Next Steps

Optimize Kernels — Performance tuning
Core Concepts — Work distribution
CUDA Threading — Threading model
