Hello World: Vector Addition

Sample source: 1.Simple/HelloWorld

This is the simplest possible Hybridizer program: add two double arrays together. The same C# code runs both on the CPU (.NET) and on the GPU (CUDA).

The Code

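using System;
using System.Runtime.InteropServices;   // [In] / [Out]
using System.Threading.Tasks;           // Parallel.For
using Hybridizer.Runtime.CUDAImports;   // [EntryPoint]

class Program
{
    // Adds b to a element-wise; runs unchanged on the CPU or, once wrapped, on the GPU.
    [EntryPoint]
    public static void Run(int N, double[] a, [In] double[] b)
    {
        Parallel.For(0, N, i => { a[i] += b[i]; });
    }
}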

Key observations

| Element | Purpose |
| --- | --- |
| [EntryPoint] | Tells Hybridizer to compile this method for the GPU |
| Parallel.For | Hybridizer maps this to a CUDA grid-stride loop |
| [In] on b | Declares b as read-only on the device, which saves a device-to-host copy |
| No [In]/[Out] on a | a is both read and written, so it is copied both ways |

Launching on GPU

static void Main(string[] args)
{
    int N = 1024 * 1024 * 16;
    double[] acuda = new double[N];
    double[] adotnet = new double[N];
    double[] b = new double[N];

    // Initialize data…
    Random rand = new();
    for (int i = 0; i < N; ++i)
    {
        acuda[i] = rand.NextDouble();
        adotnet[i] = acuda[i];
        b[i] = rand.NextDouble();
    }

    // Setup GPU wrapper
    cuda.GetDeviceProperties(out cudaDeviceProp prop, 0);
    HybRunner runner = SatelliteLoader.Load()
        .SetDistrib(prop.multiProcessorCount * 16, 128);
    dynamic wrapped = runner.Wrap(new Program());

    // Run on GPU
    wrapped.Run(N, acuda, b);
    cuda.ERROR_CHECK(cuda.DeviceSynchronize());

    // Run on CPU for comparison
    Run(N, adotnet, b);

    // Verify
    for (int k = 0; k < N; ++k)
    {
        if (acuda[k] != adotnet[k])
        {
            Console.WriteLine("ERROR!");
            return;
        }
    }
    Console.WriteLine("DONE");
}
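
To get a feel for the launch configuration used above, the following host-side C++ sketch works out how many threads the sample launches and how many elements the busiest thread handles. It rests on two assumptions that are not stated in this page: that the arguments to SetDistrib are (number of blocks, threads per block), and that the device has 80 multiprocessors, a purely illustrative figure. The names `LaunchShape`, `sample_shape` and `max_elements_per_thread` are hypothetical helpers, not part of Hybridizer.

```cpp
#include <cassert>
#include <cstddef>

// Launch shape as passed to SetDistrib in the sample, under the ASSUMPTION
// that the arguments are (number of blocks, threads per block).
struct LaunchShape {
    std::size_t blocks;
    std::size_t threads_per_block;
    std::size_t total_threads() const { return blocks * threads_per_block; }
};

// Shape the sample would request for a device with `sm_count` multiprocessors:
// 16 blocks per multiprocessor, 128 threads per block.
LaunchShape sample_shape(std::size_t sm_count)
{
    return LaunchShape{sm_count * 16, 128};
}

// Largest number of elements any single thread processes when n elements
// are spread evenly over all launched threads (ceiling division).
std::size_t max_elements_per_thread(std::size_t n, const LaunchShape& shape)
{
    assert(shape.total_threads() > 0);
    return (n + shape.total_threads() - 1) / shape.total_threads();
}
```

Under these assumptions, `sample_shape(80)` gives 1280 blocks and 163,840 threads, so with N = 1024 * 1024 * 16 each thread loops over at most 103 elements rather than the array needing one thread per element.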

How It Works

Grid-stride loop expansion

When Hybridizer sees Parallel.For(0, N, i => ...), it generates a CUDA kernel equivalent to:

__global__ void Run(int N, double* a, const double* b)
{
    for (int i = threadIdx.x + blockIdx.x * blockDim.x;
         i < N;
         i += blockDim.x * gridDim.x)
    {
        a[i] += b[i];
    }
}
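
The loop above can be replayed on the CPU to check that every index is visited exactly once, whatever the launch configuration. What follows is a minimal host-side C++ sketch, not Hybridizer output: `simulate_grid_stride` and `covers_exactly_once` are hypothetical names, and the nested loops simply enumerate, one after another, the (blockIdx.x, threadIdx.x) pairs that a real GPU would run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side replay of the grid-stride loop (illustrative, not Hybridizer output).
// Returns, for each index in [0, n), how many times the loop body touched it.
std::vector<int> simulate_grid_stride(std::size_t n,
                                      std::size_t grid_dim,
                                      std::size_t block_dim)
{
    assert(grid_dim > 0 && block_dim > 0);
    std::vector<int> visits(n, 0);
    const std::size_t stride = block_dim * grid_dim;  // blockDim.x * gridDim.x
    for (std::size_t block = 0; block < grid_dim; ++block) {          // blockIdx.x
        for (std::size_t thread = 0; thread < block_dim; ++thread) {  // threadIdx.x
            // Same start, bound and step as the kernel's for loop.
            for (std::size_t i = thread + block * block_dim; i < n; i += stride) {
                ++visits[i];
            }
        }
    }
    return visits;
}

// True when every index was visited exactly once.
bool covers_exactly_once(const std::vector<int>& visits)
{
    for (int v : visits) {
        if (v != 1) return false;
    }
    return true;
}
```

For instance, `covers_exactly_once(simulate_grid_stride(1000, 4, 32))` holds even though 1000 is not a multiple of the 128 simulated threads. It also holds when there are more threads than elements, as in `simulate_grid_stride(10, 4, 32)`, because a thread whose starting index is already past N never enters the loop.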

The [In] / [Out] Attributes

These System.Runtime.InteropServices attributes control data transfer direction:

| Attribute | Transfer | Use When |
| --- | --- | --- |
| (none) | Host ↔ Device (both ways) | Array is read and written |
| [In] | Host → Device only | Array is read-only on device |
| [Out] | Device → Host only | Array is write-only on device |
Tip: Using [In] and [Out] correctly can halve the memory transfer time of each annotated array, because one of its two copies is skipped. See the InOut sample for a benchmark.
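
To put numbers on this for the vector addition sample, the following C++ sketch counts the bytes copied between host and device for `a` and `b`, using the transfer rules from the table above. It is a back-of-the-envelope model only: the `Direction` enum and the `transfer_bytes` and `run_traffic` helpers are hypothetical names, and actual transfer time depends on hardware details that the model ignores.

```cpp
#include <cassert>
#include <cstddef>

// Transfer direction implied by the attribute on an array parameter
// (hypothetical enum, named after the rows of the table above).
enum class Direction { Both, In, Out };

// Bytes copied for one array of `count` doubles: one copy per direction used.
std::size_t transfer_bytes(std::size_t count, Direction d)
{
    const std::size_t one_way = count * sizeof(double);
    return d == Direction::Both ? 2 * one_way : one_way;
}

// Total traffic for Run(N, a, b) given the direction chosen for each array.
std::size_t run_traffic(std::size_t n, Direction a, Direction b)
{
    return transfer_bytes(n, a) + transfer_bytes(n, b);
}
```

With N = 1024 * 1024 * 16, each array occupies 128 MiB. Leaving both arrays unannotated would move 512 MiB in total, while marking `b` as [In], as the sample does, brings that down to 384 MiB. The traffic for `b` alone is halved; `a` still has to travel both ways because it is read and written.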

Next Steps