A native AOT-first universal compute framework for .NET 9+

mivertowski/DotCompute

DotCompute

High-performance GPU and CPU compute acceleration for .NET

DotCompute enables .NET developers to write compute kernels in pure C# and execute them across GPUs and CPUs with automatic optimization. Define kernels using attributes, and the framework handles compilation, memory management, and backend selection.

[Kernel]
public static void VectorAdd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
{
    int idx = Kernel.ThreadId.X;
    if (idx < result.Length)
        result[idx] = a[idx] + b[idx];
}
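
To make the execution model concrete, here is a minimal CPU sketch of what the kernel above computes. It assumes the runtime launches one logical thread per element and that `Kernel.ThreadId.X` takes each index from 0 upward. The `CpuReference` class and its method names are hypothetical and not part of DotCompute.

```csharp
using System;

public static class CpuReference
{
    // Kernel body for one logical thread; idx stands in for Kernel.ThreadId.X.
    private static void VectorAddThread(
        int idx, ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        if (idx < result.Length)
            result[idx] = a[idx] + b[idx];
    }

    // Sequentially emulates a launch of threadCount logical threads.
    public static void VectorAdd(
        ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result, int threadCount)
    {
        for (int idx = 0; idx < threadCount; idx++)
            VectorAddThread(idx, a, b, result);
    }
}
```

Because of the bounds check, launching more threads than there are elements is harmless: the surplus threads do nothing.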

Features

Kernel Development

  • Attribute-Based API - Define kernels with [Kernel] and [RingKernel] attributes using familiar C# syntax
  • Source Generators - Compile-time code generation for optimal performance with Native AOT support
  • IDE Integration - 12 Roslyn diagnostic rules with real-time feedback and 5 automated code fixes

GPU Acceleration

  • Multi-Backend Support - CUDA (NVIDIA), OpenCL (cross-platform), Metal (Apple), and CPU SIMD
  • Automatic Backend Selection - Intelligent workload-based routing between CPU and GPU
  • Unified Memory - Seamless data transfer with 90% allocation reduction through pooling

Performance

  • SIMD Vectorization - AVX2/AVX512/NEON support with measured 3.7x CPU speedup
  • GPU Acceleration - 21-92x speedup on CUDA (benchmarked on RTX 2000 Ada)
  • Kernel Fusion - Automatic operation merging for 50-80% bandwidth reduction
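
As an illustration of what SIMD vectorization means for a kernel like vector addition, the sketch below uses the standard `System.Numerics.Vector<T>` type. It is not DotCompute's CPU backend, whose internals are not shown here, and the `SimdSketch` class is a hypothetical name.

```csharp
using System;
using System.Numerics;

public static class SimdSketch
{
    public static void Add(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        if (a.Length != result.Length || b.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        int width = Vector<float>.Count; // number of float lanes per hardware vector
        int i = 0;

        // Main loop: add `width` elements per iteration.
        for (; i <= result.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(result.Slice(i, width));
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < result.Length; i++)
            result[i] = a[i] + b[i];
    }
}
```

The lane count depends on the hardware (for example, 8 floats with 256-bit vectors), so the achievable speedup varies by processor.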

Advanced Capabilities

  • Ring Kernels - Persistent GPU computation with lock-free message passing for actor systems
  • Atomic Operations - Lock-free concurrent access (AtomicAdd, AtomicCAS, AtomicMin/Max)
  • LINQ Integration - GPU-accelerated LINQ queries with automatic kernel compilation
  • Cross-Backend Debugging - Validate results across CPU and GPU for correctness

Quick Start

1. Install the packages

dotnet add package DotCompute.Core
dotnet add package DotCompute.Backends.CPU
dotnet add package DotCompute.Backends.CUDA  # For NVIDIA GPUs

2. Define a kernel

using DotCompute.Core;

public static class Kernels
{
    [Kernel]
    public static void Scale(ReadOnlySpan<float> input, Span<float> output, float factor)
    {
        int idx = Kernel.ThreadId.X;
        if (idx < output.Length)
            output[idx] = input[idx] * factor;
    }
}

3. Execute with dependency injection

using Microsoft.Extensions.DependencyInjection;
using DotCompute.Runtime;

var services = new ServiceCollection();
services.AddDotComputeRuntime();
services.AddProductionOptimization();

var provider = services.BuildServiceProvider();
var orchestrator = provider.GetRequiredService<IComputeOrchestrator>();

// input and output are caller-provided float buffers of equal length.
// Automatic backend selection (GPU if available, CPU otherwise)
await orchestrator.ExecuteAsync("Scale", input, output, 2.0f);

Installation

# Core framework
dotnet add package DotCompute.Core

# Backends (install what you need)
dotnet add package DotCompute.Backends.CPU     # SIMD-optimized CPU execution
dotnet add package DotCompute.Backends.CUDA    # NVIDIA GPU (requires CUDA Toolkit)
dotnet add package DotCompute.Backends.OpenCL  # Cross-platform GPU (experimental)
dotnet add package DotCompute.Backends.Metal   # Apple Silicon / macOS

# Extensions
dotnet add package DotCompute.Linq             # GPU-accelerated LINQ
dotnet add package DotCompute.Algorithms       # Common parallel algorithms

Backend Support

| Backend | Status | Performance | Hardware |
|---------|--------|-------------|----------|
| CPU | Production | 3.7x (SIMD) | AVX2/AVX512/NEON processors |
| CUDA | Production | 21-92x | NVIDIA GPUs (Compute Capability 5.0+) |
| Metal | Production | Native acceleration | Apple Silicon, Intel Macs (2016+) |
| OpenCL | Experimental | Cross-platform | NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno |

Code Examples

GPU-Accelerated LINQ

using DotCompute.Linq;

var result = data
    .AsComputeQueryable()
    .Where(x => x > threshold)
    .Select(x => x * factor)
    .Sum();  // Executes on GPU automatically
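
For intuition about what a fused filter, map, and reduce looks like, the following CPU-only sketch compares ordinary LINQ to Objects with a single-pass loop. It says nothing about how DotCompute compiles the query. `FusionSketch` and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class FusionSketch
{
    // Unfused: standard LINQ to Objects, one iterator per operator.
    public static float SumWithLinq(float[] data, float threshold, float factor) =>
        data.Where(x => x > threshold).Select(x => x * factor).Sum();

    // Fused: filter, map, and reduce in one pass with no intermediate sequence.
    public static float SumFused(ReadOnlySpan<float> data, float threshold, float factor)
    {
        float sum = 0f;
        foreach (float x in data)
        {
            if (x > threshold)
                sum += x * factor;
        }
        return sum;
    }
}
```

Both methods compute the same quantity, but floating-point results can differ in the last digits because the accumulation order and precision are not identical. A reference like this is useful for checking accelerated results on small inputs.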

Atomic Operations

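[Kernel]
public static void Histogram(ReadOnlySpan<int> values, Span<int> bins)
{
    int idx = Kernel.ThreadId.X;
    if (idx < values.Length)
    {
        // Each bin covers a range of 10 values; skip values that fall outside the bins.
        int bin = values[idx] / 10;
        if (bin >= 0 && bin < bins.Length)
            AtomicOps.AtomicAdd(ref bins[bin], 1);
    }
}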
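
A CPU analog of the histogram kernel shows why the increment has to be atomic: many threads may hit the same bin at once. This sketch uses the standard `Parallel.For` and `Interlocked.Increment` rather than any DotCompute API. `HistogramSketch` and the `binWidth` parameter are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HistogramSketch
{
    public static int[] Compute(int[] values, int binCount, int binWidth = 10)
    {
        if (binWidth <= 0)
            throw new ArgumentOutOfRangeException(nameof(binWidth));

        var bins = new int[binCount];
        Parallel.For(0, values.Length, idx =>
        {
            int v = values[idx];
            if (v < 0) return;              // negative values have no bin

            int bin = v / binWidth;
            if (bin < bins.Length)          // ignore values beyond the last bin
                Interlocked.Increment(ref bins[bin]);
        });
        return bins;
    }
}
```

Replacing `Interlocked.Increment` with a plain `bins[bin]++` would introduce a data race and could silently drop counts under contention.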

Ring Kernels (Persistent GPU Computation)

[RingKernel(Mode = RingKernelMode.Persistent, Capacity = 10000)]
public static void ProcessMessages(
    IMessageQueue<Message> incoming,
    IMessageQueue<Message> outgoing,
    Span<float> state)
{
    int id = Kernel.ThreadId.X;

    while (incoming.TryDequeue(out var msg))
    {
        state[id] += msg.Value;
        outgoing.Enqueue(new Message { Id = id, Value = state[id] });
    }
}
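
The message loop above can be mimicked on the CPU with `ConcurrentQueue<T>`, which makes the drain-and-reply pattern easy to test in isolation. This is only an analogy: the `Message` record and the `RingLoopSketch` class are hypothetical stand-ins, and the sketch does not model GPU persistence or the `IMessageQueue<T>` implementation.

```csharp
using System.Collections.Concurrent;

// Hypothetical message type with the two fields used by the kernel above.
public readonly record struct Message(int Id, float Value);

public static class RingLoopSketch
{
    // Drains the incoming queue, folds each message into state[id],
    // and publishes the running total to the outgoing queue.
    public static void Drain(
        int id,
        ConcurrentQueue<Message> incoming,
        ConcurrentQueue<Message> outgoing,
        float[] state)
    {
        while (incoming.TryDequeue(out var msg))
        {
            state[id] += msg.Value;
            outgoing.Enqueue(new Message(id, state[id]));
        }
    }
}
```

`Drain` returns once the queue is empty, so a long-lived worker would need to call it repeatedly or wait for new messages between calls.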

Cross-Backend Debugging

// Register debugging services before calling BuildServiceProvider()
services.AddProductionDebugging();

var debugService = provider.GetRequiredService<IKernelDebugService>();
var result = await debugService.ValidateKernelAsync("MyKernel", testData);

if (!result.IsValid)
{
    foreach (var issue in result.Issues)
        Console.WriteLine($"{issue.Severity}: {issue.Message}");
}
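
Conceptually, validating results across backends comes down to comparing two output buffers within a tolerance, since CPU and GPU floating-point results rarely match bit for bit. The sketch below shows one way to do that by hand. `ResultComparison`, its tolerances, and its return convention are assumptions for illustration and do not describe how `IKernelDebugService` works internally.

```csharp
using System;

public static class ResultComparison
{
    // Returns the index of the first element that differs by more than the
    // combined absolute and relative tolerance, or -1 if all elements match.
    public static int FirstMismatch(
        ReadOnlySpan<float> expected,
        ReadOnlySpan<float> actual,
        float relTol = 1e-5f,
        float absTol = 1e-6f)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Result lengths differ.");

        for (int i = 0; i < expected.Length; i++)
        {
            float diff = MathF.Abs(expected[i] - actual[i]);
            float limit = absTol + relTol * MathF.Abs(expected[i]);
            if (!(diff <= limit)) // a NaN difference also counts as a mismatch
                return i;
        }
        return -1;
    }
}
```

A typical use is to run the same kernel on the CPU backend and a GPU backend with identical inputs, then call `FirstMismatch` on the two outputs.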

Performance

Benchmarks performed with BenchmarkDotNet on .NET 9.0:

| Operation | Dataset | Baseline | DotCompute | Speedup |
|-----------|---------|----------|------------|---------|
| Vector Add | 100K floats | 2.14ms | 0.58ms | 3.7x (CPU SIMD) |
| Matrix Multiply | 1024x1024 | 850ms | 9.2ms | 92x (CUDA) |
| Sum Reduction | 1M elements | 10ms | 0.3ms | 33x (CUDA) |
| Filter + Map | 1M elements | 35ms | 1.5ms | 23x (fused kernel) |

GPU benchmarks on NVIDIA RTX 2000 Ada. Results vary by hardware and workload.
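
The speedup column is the baseline time divided by the DotCompute time. A small helper makes it easy to apply the same calculation to your own measurements. `Speedup` is a hypothetical name.

```csharp
using System;

public static class Speedup
{
    // Speedup = baseline time / accelerated time, both in the same unit.
    public static double Of(double baselineMs, double acceleratedMs)
    {
        if (acceleratedMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(acceleratedMs));
        return baselineMs / acceleratedMs;
    }
}
```

For example, `Speedup.Of(850, 9.2)` is about 92.4, which matches the Matrix Multiply row after rounding.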


Requirements

Minimum

  • .NET 9.0 SDK
  • 64-bit OS (Windows, Linux, macOS)

For CUDA

  • NVIDIA GPU with Compute Capability 5.0+
  • CUDA Toolkit

For Metal

  • macOS 10.13+ (High Sierra)
  • Metal-capable GPU

For OpenCL

  • OpenCL 1.2+ runtime from your GPU vendor

WSL2 Limitations

WSL2 has limited GPU memory coherence that affects advanced features:

| Feature | Native Linux | WSL2 |
|---------|--------------|------|
| Basic kernels | Full support | Full support |
| Persistent ring kernels | Sub-ms latency | ~5s latency |
| System-scope atomics | Works | Unreliable |

For production workloads requiring low latency, use native Linux.
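
If an application needs to warn users or disable latency-sensitive features under WSL, one common heuristic is to look for "microsoft" in the kernel version string. The sketch below is an assumption about how you might do that and is not a DotCompute API. It does not distinguish WSL1 from WSL2, and `WslDetection` is a hypothetical name.

```csharp
using System;
using System.IO;

public static class WslDetection
{
    // Pure helper so the heuristic can be tested without touching the file system.
    public static bool LooksLikeWsl(string procVersion) =>
        procVersion.Contains("microsoft", StringComparison.OrdinalIgnoreCase);

    // Reads /proc/version on Linux and applies the heuristic above.
    public static bool IsRunningUnderWsl()
    {
        const string path = "/proc/version";
        return OperatingSystem.IsLinux()
            && File.Exists(path)
            && LooksLikeWsl(File.ReadAllText(path));
    }
}
```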


Building from Source

git clone https://github.com/mivertowski/DotCompute.git
cd DotCompute

# Build
dotnet build DotCompute.sln --configuration Release

# Run tests (CPU only)
dotnet test --filter "Category!=Hardware"

# Run all tests (requires NVIDIA GPU)
./scripts/run-tests.sh DotCompute.sln

Contributing

Contributions welcome in these areas:

  • Performance optimizations
  • Additional backend implementations
  • Documentation improvements
  • Bug fixes and test coverage

See the contribution guidelines for details.


License

MIT License - see LICENSE for details.

Copyright (c) 2025 Michael Ivertowski
