
Releases: mivertowski/RustCompute

v0.4.0: GPU Infrastructure Generalization & Python Bindings

25 Jan 21:23


Highlights

This release extracts more than 7,000 lines of proven GPU infrastructure from RustGraph into RingKernel, making these capabilities available to all RingKernel users.

New: Python Bindings (ringkernel-python)

PyO3-based Python wrapper with full async/await support:

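import ringkernel
import asyncio

async def main():
    # Create a runtime that uses the CPU backend
    runtime = await ringkernel.RingKernel.create(backend="cpu")
    # Launch a kernel named "processor" with default launch options
    kernel = await runtime.launch("processor", ringkernel.LaunchOptions())
    # Stop the kernel first, then shut down the runtime
    await kernel.terminate()
    await runtime.shutdown()

asyncio.run(main())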

Features:

  • Async/await with sync fallbacks
  • HLC timestamps and K2K messaging
  • CUDA device enumeration and GPU memory pool management
  • Benchmark framework with regression detection
  • Hybrid CPU/GPU dispatcher with adaptive thresholds
  • Resource guard for memory limit enforcement
  • Type stubs for IDE support

New: PTX Compilation Cache

Disk-based PTX caching for faster kernel loading, using SHA-256 content hashing and compute capability awareness.

New: GPU Stratified Memory Pool

Size-stratified GPU VRAM pool with 6 size classes (256 B to 256 KB) and O(1) allocation from free lists.
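A minimal host-side sketch of the size-class idea, not the crate's implementation. It assumes the six classes grow by a factor of 4 (256 B, 1 KB, 4 KB, 16 KB, 64 KB, 256 KB), which fits the stated range but is not confirmed by these notes. `StratifiedPool`, `size_class`, and the integer buffer IDs are hypothetical stand-ins for real device allocations.

```rust
/// Hypothetical size classes: 256 B up to 256 KB, each 4x the previous one.
const SIZE_CLASSES: [usize; 6] = [256, 1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10];

/// Index of the smallest class that fits `size`, or None if the request is too large.
fn size_class(size: usize) -> Option<usize> {
    SIZE_CLASSES.iter().position(|&cap| size <= cap)
}

/// Host-side stand-in for a pool: one free list of buffer IDs per size class.
struct StratifiedPool {
    free_lists: [Vec<u64>; 6],
    next_id: u64,
}

impl StratifiedPool {
    fn new() -> Self {
        Self { free_lists: Default::default(), next_id: 0 }
    }

    /// Pop a recycled buffer from the class's free list, or mint a new ID
    /// (standing in for a fresh device allocation) when the list is empty.
    fn alloc(&mut self, size: usize) -> Option<(usize, u64)> {
        let class = size_class(size)?;
        let id = self.free_lists[class].pop().unwrap_or_else(|| {
            self.next_id += 1;
            self.next_id
        });
        Some((class, id))
    }

    /// Return a buffer to its class's free list.
    fn free(&mut self, class: usize, id: u64) {
        self.free_lists[class].push(id);
    }
}

fn main() {
    let mut pool = StratifiedPool::new();
    let (class, id) = pool.alloc(3000).unwrap();
    pool.free(class, id);
    let reused = pool.alloc(4096).unwrap() == (class, id);
    println!("class {} id {} reused {}", class, id, reused);
}
```

Rounding each request up to a fixed class trades some internal fragmentation for constant-time pop and push on the free lists.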

New: Multi-Stream Execution Manager

Multi-stream CUDA execution for compute/transfer overlap with event-based synchronization.

New: Benchmark Framework

Comprehensive benchmarking with regression detection, baseline comparison, and multiple report formats (Markdown, JSON, LaTeX).
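As an illustration of baseline comparison, the sketch below flags a regression when the mean runtime grows by more than a relative tolerance. The names `Verdict` and `compare` and the mean-based rule are assumptions made for this example; the framework's actual statistics are not described in these notes.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    Unchanged,
}

/// Arithmetic mean of the samples (assumes a non-empty slice).
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Compare current timings with a baseline using a relative tolerance,
/// e.g. 0.05 means changes within +/-5% count as unchanged.
fn compare(baseline: &[f64], current: &[f64], tolerance: f64) -> Verdict {
    let base = mean(baseline);
    let change = (mean(current) - base) / base;
    if change > tolerance {
        Verdict::Regression
    } else if change < -tolerance {
        Verdict::Improvement
    } else {
        Verdict::Unchanged
    }
}

fn main() {
    let baseline = [10.0, 10.2, 9.8];
    let current = [12.1, 11.9, 12.0];
    println!("{:?}", compare(&baseline, &current, 0.05));
}
```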

New: Hybrid CPU-GPU Dispatcher

Intelligent workload routing with adaptive threshold learning between CPU and GPU execution.
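One way to picture adaptive threshold learning is a single element-count cutoff that moves toward sizes where the routing decision turned out wrong. This is a hypothetical sketch: `HybridDispatcher`, `route`, `observe`, and the halfway update rule are illustrative assumptions, not the dispatcher's real API or learning rule.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Target {
    Cpu,
    Gpu,
}

/// Hypothetical dispatcher: workloads at or above `threshold` elements go to the GPU.
struct HybridDispatcher {
    threshold: usize,
}

impl HybridDispatcher {
    fn route(&self, elements: usize) -> Target {
        if elements >= self.threshold { Target::Gpu } else { Target::Cpu }
    }

    /// Feed back timings measured for the same workload on both devices.
    /// If the GPU won below the threshold, lower the threshold;
    /// if the CPU won at or above it, raise the threshold.
    fn observe(&mut self, elements: usize, cpu_ns: u64, gpu_ns: u64) {
        let gpu_faster = gpu_ns < cpu_ns;
        if gpu_faster && elements < self.threshold {
            // Move halfway toward the observed size.
            self.threshold = (self.threshold + elements) / 2;
        } else if !gpu_faster && elements >= self.threshold {
            // Move halfway toward the observed size, and at least one step up.
            self.threshold = (self.threshold + elements) / 2 + 1;
        }
    }
}

fn main() {
    let mut d = HybridDispatcher { threshold: 10_000 };
    println!("{:?}", d.route(5_000));
    d.observe(5_000, 900, 300); // GPU was faster for a "small" workload
    println!("threshold is now {}", d.threshold);
}
```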

New: Resource Guard

Memory limit enforcement with safety margins and RAII reservation patterns.
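The RAII reservation pattern can be sketched in plain Rust: a reservation holds its bytes until it is dropped, and the guard refuses requests that would cross the limit minus a safety margin. The types `ResourceGuard` and `Reservation`, the percentage-based margin, and the single-threaded `Rc<Cell<_>>` bookkeeping are assumptions for illustration only.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical guard: tracks reserved bytes against a limit minus a safety margin.
struct ResourceGuard {
    usable: usize, // limit with the safety margin already subtracted
    reserved: Rc<Cell<usize>>,
}

/// RAII handle: the bytes are released when this value is dropped.
struct Reservation {
    bytes: usize,
    reserved: Rc<Cell<usize>>,
}

impl ResourceGuard {
    fn new(limit: usize, margin_percent: usize) -> Self {
        let usable = limit - limit * margin_percent / 100;
        Self { usable, reserved: Rc::new(Cell::new(0)) }
    }

    /// Reserve `bytes` if they fit under the usable limit; otherwise refuse.
    fn try_reserve(&self, bytes: usize) -> Option<Reservation> {
        let current = self.reserved.get();
        if current + bytes > self.usable {
            return None;
        }
        self.reserved.set(current + bytes);
        Some(Reservation { bytes, reserved: Rc::clone(&self.reserved) })
    }

    fn reserved(&self) -> usize {
        self.reserved.get()
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.reserved.set(self.reserved.get() - self.bytes);
    }
}

fn main() {
    let guard = ResourceGuard::new(1000, 10); // 900 bytes usable
    {
        let _r = guard.try_reserve(600).expect("fits under the limit");
        println!("second request granted: {}", guard.try_reserve(400).is_some());
    } // _r dropped here, releasing its 600 bytes
    println!("reserved after scope: {}", guard.reserved());
}
```

Tying the release to `Drop` means early returns and error paths give memory back without explicit cleanup calls.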

New: Kernel Mode Selector

Intelligent kernel launch configuration based on workload profile and GPU architecture.


See CHANGELOG.md for full details.

v0.3.2: GPU Profiling Infrastructure

21 Jan 09:54


What's New

GPU Profiling Infrastructure

  • CUDA event-based timing and NVTX markers
  • Memory allocation tracking
  • Chrome trace export for visualization
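For readers unfamiliar with Chrome trace files, the sketch below writes timed spans as complete events (`"ph":"X"`) in the Trace Event JSON array form, with `ts` and `dur` in microseconds. The `Span` struct and `to_chrome_trace` function are hypothetical, and names are not JSON-escaped here; the crate's own exporter may look quite different.

```rust
/// Hypothetical timed span, e.g. one kernel launch measured with CUDA events.
struct Span {
    name: &'static str,
    start_us: u64,
    dur_us: u64,
    tid: u32,
}

/// Serialize spans as Chrome Trace Event "complete" events in a JSON array.
/// Names are assumed to contain no characters that need JSON escaping.
fn to_chrome_trace(spans: &[Span]) -> String {
    let events: Vec<String> = spans
        .iter()
        .map(|s| {
            format!(
                r#"{{"name":"{}","ph":"X","ts":{},"dur":{},"pid":1,"tid":{}}}"#,
                s.name, s.start_us, s.dur_us, s.tid
            )
        })
        .collect();
    format!("[{}]", events.join(","))
}

fn main() {
    let spans = [
        Span { name: "upload", start_us: 0, dur_us: 40, tid: 1 },
        Span { name: "kernel", start_us: 40, dur_us: 120, tid: 2 },
    ];
    println!("{}", to_chrome_trace(&spans));
}
```

The resulting file can be opened in a trace viewer such as chrome://tracing or Perfetto.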

Publishing Fixes

  • Fixed publish script to add User-Agent header for crates.io API
  • Updated dependency versions across all crates for v0.3.2 publishing
  • ringkernel-ir, ringkernel-graph, ringkernel-montecarlo now use workspace versions

Crates Published

  • ringkernel-core, ringkernel-cuda-codegen, ringkernel-wgpu-codegen
  • ringkernel-derive, ringkernel-cpu, ringkernel-cuda, ringkernel-wgpu, ringkernel-metal
  • ringkernel-codegen, ringkernel-ecosystem, ringkernel-audio-fft
  • ringkernel (main crate)

See crates.io/crates/ringkernel for the published crates.

v0.3.1: Enterprise Readiness

19 Jan 20:16


RingKernel v0.3.1: Enterprise Readiness

This release adds comprehensive enterprise-grade features for production deployments.

🔐 Enterprise Security

  • Real Cryptography: AES-256-GCM, ChaCha20-Poly1305, Argon2 key derivation
  • Secrets Management: SecretStore trait with key rotation, caching, and chained stores
  • K2K Message Encryption: Kernel-to-kernel encryption with forward secrecy
  • TLS/mTLS Support: Full TLS with rustls, certificate rotation, SNI resolution

🔑 Authentication & Authorization

  • Authentication Providers: ApiKeyAuth, JwtAuth (RS256/HS256), ChainedAuthProvider
  • RBAC: Role-based access control with deny-by-default PolicyEvaluator
  • Multi-tenancy: TenantContext, ResourceQuota, usage tracking

📊 Observability

  • OpenTelemetry: OTLP export to Jaeger, Honeycomb, Datadog, Grafana Cloud
  • Structured Logging: Multi-sink logger with trace correlation (JSON/Text)
  • Alert Routing: Severity-based routing with deduplication (Slack, Teams, PagerDuty)
  • Remote Audit Sinks: Syslog, CloudWatch Logs, Elasticsearch

⚡ Rate Limiting

  • Algorithms: TokenBucket, SlidingWindow, LeakyBucket
  • Builder API: Fluent configuration with RateLimiterBuilder
  • Distributed: SharedRateLimiter for multi-instance deployments
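To illustrate the token bucket algorithm named above, here is a small standalone sketch with an explicit clock so the behavior is deterministic. It is not the crate's `TokenBucket`; the struct layout, `try_acquire`, and the seconds-as-`f64` clock are assumptions for the example.

```rust
/// Standalone token bucket: refills at `rate_per_sec`, holds at most `capacity` tokens.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last: f64, // time of the last refill, in seconds
}

impl TokenBucket {
    /// Start full so an initial burst of `burst` requests is allowed.
    fn new(rate_per_sec: f64, burst: f64) -> Self {
        Self { capacity: burst, tokens: burst, rate_per_sec, last: 0.0 }
    }

    /// Refill for the time elapsed since the last call, then try to take one token.
    fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(2.0, 2.0); // 2 tokens/sec, burst of 2
    let results: Vec<bool> = [0.0, 0.0, 0.0, 0.5]
        .iter()
        .map(|&t| bucket.try_acquire(t))
        .collect();
    println!("{:?}", results);
}
```

The burst size bounds how many requests can pass back to back, while the refill rate bounds sustained throughput.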

🔧 Operational Excellence

  • Automatic Recovery: Configurable policies per failure type (Restart, Migrate, Checkpoint, Notify, Escalate, Circuit)
  • Operation Timeouts: Deadline propagation with Timeout and Deadline types
  • Recovery Manager: Retry tracking, cooldown periods, automatic escalation

📦 Feature Flags

[dependencies]
ringkernel-core = { version = "0.3.1", features = ["enterprise"] }

# Or select specific features:
ringkernel-core = { version = "0.3.1", features = ["crypto", "auth", "tls", "rate-limiting", "alerting"] }

📈 Metrics

  • Test Coverage: 900+ tests (up from 825+)
  • Crates Published: 21 crates to crates.io

🚀 Quick Start

use ringkernel_core::prelude::*;

// Enterprise runtime with production preset
let runtime = RuntimeBuilder::new()
    .production()
    .build()?;

// API key authentication
let auth = ApiKeyAuth::new()
    .add_key("sk-prod-abc123", Identity::new("service-a"));

// Rate limiting
let limiter = RateLimiterBuilder::new()
    .algorithm(RateLimitAlgorithm::TokenBucket)
    .rate(1000)
    .burst(100)
    .build();

Full Changelog

See CHANGELOG.md for complete details.

v0.3.0: Multi-Kernel Dispatch, Memory Pools, Global Reductions

19 Jan 09:34


RingKernel v0.3.0

GPU-native persistent actor model framework for Rust. This release adds multi-kernel dispatch, memory pools, global reduction primitives, and two new crates.

Highlights

  • 21 crates published to crates.io - Full workspace now available
  • 825+ tests across the workspace
  • cudarc 0.18.2 and wgpu 27.0 support

New Features

Multi-Kernel Dispatch and Persistent Message Routing

  • #[derive(PersistentMessage)] macro for GPU kernel dispatch
  • KernelDispatcher component with builder pattern and metrics
  • CUDA handler dispatch code generator (CudaDispatchTable)
  • Queue tiering system (QueueTier, QueueFactory, QueueMonitor)

Memory Pool Management

  • StratifiedMemoryPool with 5 size buckets (256B to 64KB)
  • AnalyticsContext for grouped buffer lifecycle
  • PressureHandler for memory pressure monitoring
  • CUDA ReductionBufferCache and WebGPU StagingBufferPool

Global Reduction Primitives

  • ReductionOp enum: Sum, Min, Max, And, Or, Xor, Product
  • ReductionBuffer<T> using mapped memory (zero-copy host read)
  • Multi-phase kernel execution with SyncMode (Cooperative, SoftwareBarrier, MultiLaunch)
  • PageRank example with dangling node handling
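A CPU reference sketch can make the reduction operators concrete. The enum below mirrors the variant names listed above but is a local illustration over `u32`, not the crate's `ReductionOp`; `identity`, `combine`, and `reduce` are hypothetical helpers.

```rust
/// Local mirror of the variant names from the release notes (illustrative only).
#[derive(Clone, Copy, Debug)]
enum ReductionOp {
    Sum,
    Min,
    Max,
    And,
    Or,
    Xor,
    Product,
}

impl ReductionOp {
    /// Neutral element: combining it with any x yields x.
    fn identity(self) -> u32 {
        match self {
            ReductionOp::Sum | ReductionOp::Or | ReductionOp::Xor | ReductionOp::Max => 0,
            ReductionOp::Product => 1,
            ReductionOp::Min | ReductionOp::And => u32::MAX,
        }
    }

    /// Associative, commutative combine step, so partial results can merge in any order.
    fn combine(self, a: u32, b: u32) -> u32 {
        match self {
            ReductionOp::Sum => a.wrapping_add(b),
            ReductionOp::Product => a.wrapping_mul(b),
            ReductionOp::Min => a.min(b),
            ReductionOp::Max => a.max(b),
            ReductionOp::And => a & b,
            ReductionOp::Or => a | b,
            ReductionOp::Xor => a ^ b,
        }
    }
}

/// Sequential reference reduction; a GPU version would merge per-block partials.
fn reduce(op: ReductionOp, data: &[u32]) -> u32 {
    data.iter().fold(op.identity(), |acc, &x| op.combine(acc, x))
}

fn main() {
    let data = [3, 5, 6];
    println!("sum = {}", reduce(ReductionOp::Sum, &data));
    println!("max = {}", reduce(ReductionOp::Max, &data));
}
```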

CUDA NVRTC Compilation

  • compile_ptx() function for runtime CUDA compilation
  • Downstream crates can compile CUDA without direct cudarc dependency

Domain System

  • 20 business domains with reserved type ID ranges
  • #[message(domain = "FraudDetection")] attribute
  • Domains: GraphAnalytics, FraudDetection, ProcessIntelligence, Banking, etc.

New Crates

  • ringkernel-montecarlo - Philox RNG, antithetic variates, control variates, importance sampling
  • ringkernel-graph - CSR matrix, BFS, SCC (Tarjan/Kosaraju), Union-Find, SpMV
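As background for the CSR and SpMV items, the following sketch shows the compressed sparse row layout and a sequential sparse matrix-vector product. The `Csr` struct and `spmv` function are illustrative and do not reflect the ringkernel-graph API.

```rust
/// Compressed sparse row matrix: row i owns entries row_ptr[i]..row_ptr[i + 1].
struct Csr {
    row_ptr: Vec<usize>,
    col_idx: Vec<usize>,
    values: Vec<f64>,
}

/// y = A * x, computed one row at a time.
fn spmv(a: &Csr, x: &[f64]) -> Vec<f64> {
    (0..a.row_ptr.len() - 1)
        .map(|row| {
            (a.row_ptr[row]..a.row_ptr[row + 1])
                .map(|k| a.values[k] * x[a.col_idx[k]])
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    // Dense form of the example matrix:
    // [2 0 1]
    // [0 3 0]
    // [4 0 5]
    let a = Csr {
        row_ptr: vec![0, 2, 3, 5],
        col_idx: vec![0, 2, 1, 0, 2],
        values: vec![2.0, 1.0, 3.0, 4.0, 5.0],
    };
    println!("{:?}", spmv(&a, &[1.0, 2.0, 3.0]));
}
```

Each row's dot product is independent, which is what makes SpMV a natural fit for one-thread-per-row or one-warp-per-row GPU kernels.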

Breaking Changes

  • cudarc API updated to 0.18.2 (module loading, kernel launch builder pattern)
  • wgpu API updated to 27.0 (Arc-based resources)

Installation

[dependencies]
ringkernel = "0.3.0"

# Optional backends
ringkernel-cuda = "0.3.0"
ringkernel-wgpu = "0.3.0"


Full Changelog: v0.2.0...v0.3.0

RingKernel v0.2.0

14 Jan 16:48


What's Changed

  • Claude/persistent kernel implementation d nc3 o by @mivertowski in #9

Full Changelog: v0.1.3...v0.2.0

v0.1.3 - Dependency Updates & CI Fixes

17 Dec 14:18


Highlights

  • wgpu 27.0 - Major update with Arc-based resource tracking (~40% performance improvement in some workloads)
  • Dependency updates - tokio 1.48, axum 0.8, tonic 0.14, egui 0.31, winit 0.30
  • CI/CD fixes - Workspace builds without CUDA/nvcc installed

What's Changed

Dependencies Updated

Package                      From    To
wgpu                         0.19    27.0
tokio                        1.35    1.48
thiserror                    1.0     2.0
axum                         0.7     0.8
tower                        0.4     0.5
tonic                        0.11    0.14
prost                        0.12    0.14
egui/egui-wgpu/egui-winit    0.27    0.31
winit                        0.29    0.30
glam                         0.27    0.29
metal                        0.27    0.31
arrow                        52      54
polars                       0.39    0.46
rayon                        1.10    1.11
actix-rt                     2.9     2.10

Deferred Updates

  • iced: Kept at 0.13 (0.14 requires major application API rewrite)
  • rkyv: Kept at 0.7 (0.8 has incompatible data format)

CI/CD Improvements

  • CUDA features are now opt-in (not default)
  • Workspace builds succeed without nvcc installed
  • Feature-gated CUDA tests with #[cfg(feature = "cuda")]

See CHANGELOG.md for full details.

v0.1.2

11 Dec 09:55


Release v0.1.2

  • WaveSim3D: 3D acoustic wave simulation with realistic physics
      • Full 3D FDTD wave propagation solver
      • Binaural audio rendering with HRTF support
      • Volumetric ray marching visualization
      • GPU-native actor system for distributed simulation

  • Expanded GPU intrinsics from ~45 to 120+ operations across 13 categories
  • Atomic operations: and, or, xor, inc, dec
  • 3D stencil intrinsics: up, down, at(dx, dy, dz)
  • Warp match operations (Volta+, SM 7.0+) and warp reduce operations (SM 8.0+)
  • Bit manipulation, memory, special, and timing ops
  • 171 tests (up from 143)

  • Added required-features to CUDA-only wavesim binaries
  • Updated GitHub Actions release workflow

See CHANGELOG.md for full details.

v0.1.1 - AccNet & ProcInt Showcase Applications

04 Dec 15:40


What's New

New Showcase Applications

AccNet - GPU-Accelerated Accounting Network Analytics

  • Network visualization with force-directed graph layout
  • Fraud detection: circular flows, threshold clustering, Benford's Law violations
  • GAAP compliance checking for accounting rule violations
  • Temporal analysis for seasonality, trends, and behavioral anomalies
  • GPU kernels: Suspense detection, GAAP violation, Benford analysis, PageRank
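The Benford's Law check can be illustrated on the CPU: the expected share of leading digit d is log10(1 + 1/d), and a chi-square statistic measures how far observed leading digits deviate from it. The function names below are hypothetical, and AccNet's GPU kernel may use a different test statistic.

```rust
/// Expected share of leading digit `d` (1..=9) under Benford's Law.
fn benford_expected(d: u32) -> f64 {
    (1.0 + 1.0 / d as f64).log10()
}

/// Leading decimal digit of a positive amount; zero has none.
fn leading_digit(mut n: u64) -> Option<u32> {
    if n == 0 {
        return None;
    }
    while n >= 10 {
        n /= 10;
    }
    Some(n as u32)
}

/// Chi-square distance between observed leading digits and Benford's Law.
/// Larger values mean a stronger deviation.
fn benford_chi_square(amounts: &[u64]) -> f64 {
    let mut counts = [0u64; 9];
    for d in amounts.iter().filter_map(|&a| leading_digit(a)) {
        counts[(d - 1) as usize] += 1;
    }
    let n: u64 = counts.iter().sum();
    if n == 0 {
        return 0.0;
    }
    (1..=9u32)
        .map(|d| {
            let expected = n as f64 * benford_expected(d);
            let diff = counts[(d - 1) as usize] as f64 - expected;
            diff * diff / expected
        })
        .sum()
}

fn main() {
    // Powers of two are a classic sequence that follows Benford's Law closely.
    let natural: Vec<u64> = (0..50).map(|k| 1u64 << k).collect();
    // Amounts clustered at one leading digit look suspicious.
    let skewed = vec![900u64; 50];
    println!("natural: {:.2}", benford_chi_square(&natural));
    println!("skewed:  {:.2}", benford_chi_square(&skewed));
}
```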

ProcInt - GPU-Accelerated Process Intelligence

  • DFG (Directly-Follows Graph) mining from event streams
  • Pattern detection: bottlenecks, loops, rework, long-running activities
  • Conformance checking with fitness and precision metrics
  • Timeline view with partial order traces and concurrent activity visualization
  • Multi-sector templates: Healthcare, Manufacturing, Finance, IT
  • GPU kernels: DFG construction, pattern detection, partial order derivation, conformance checking
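Directly-follows graph mining reduces to counting, per case, how often activity B immediately follows activity A. The sketch below does this sequentially over a time-ordered event stream; `mine_dfg` and the `(case_id, activity)` tuple layout are assumptions for illustration, not ProcInt's data model.

```rust
use std::collections::BTreeMap;

/// Count directly-follows edges from a stream of (case_id, activity) events
/// that is already sorted by timestamp.
fn mine_dfg<'a>(events: &[(&'a str, &'a str)]) -> BTreeMap<(&'a str, &'a str), u32> {
    // Last activity seen so far for each case.
    let mut last: BTreeMap<&'a str, &'a str> = BTreeMap::new();
    let mut edges: BTreeMap<(&'a str, &'a str), u32> = BTreeMap::new();
    for &(case, activity) in events {
        // `insert` returns the previous activity of this case, if any.
        if let Some(prev) = last.insert(case, activity) {
            *edges.entry((prev, activity)).or_insert(0) += 1;
        }
    }
    edges
}

fn main() {
    let events = [
        ("c1", "Register"),
        ("c2", "Register"),
        ("c1", "Review"),
        ("c2", "Review"),
        ("c1", "Approve"),
        ("c2", "Review"), // rework: Review follows Review
    ];
    for ((from, to), count) in mine_dfg(&events) {
        println!("{} -> {}: {}", from, to, count);
    }
}
```

A self-edge such as Review -> Review is the kind of signal that loop and rework detection can build on.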

Changes

  • Updated showcase documentation with AccNet and ProcInt sections
  • Updated CI workflow to exclude CUDA tests on runners without GPU hardware

Fixes

  • Fixed 14 clippy warnings in ringkernel-accnet
  • Fixed benchmark API compatibility in ringkernel-accnet
  • Fixed code formatting issues across showcase applications

Run the Applications

# AccNet - Accounting Network Analytics
cargo run -p ringkernel-accnet --release

# ProcInt - Process Intelligence
cargo run -p ringkernel-procint --release

Full Changelog: v0.1.0...v0.1.1

RingKernel v0.1.0

03 Dec 16:12


RingKernel v0.1.0 - Initial Release

A GPU-native persistent actor model framework for Rust.

Highlights

  • Persistent GPU Kernels: GPU compute units as long-running actors that maintain state between invocations
  • Lock-free Message Queues: High-performance host↔GPU and kernel-to-kernel communication
  • Hybrid Logical Clocks (HLC): Causal ordering across distributed GPU operations
  • Multiple Backends: CPU, CUDA, WebGPU support
  • Zero-copy Serialization: rkyv-based message passing
  • Rust-to-GPU Transpilers: Write GPU kernels in Rust DSL, transpile to CUDA C or WGSL
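For readers new to Hybrid Logical Clocks, here is a compact sketch of the usual HLC update rules: a timestamp pairs a physical component with a logical counter, and both local events and received messages advance it monotonically. `Timestamp`, `Hlc`, `tick`, and `update` are illustrative names, not the ringkernel-core API, and wall-clock readings are passed in explicitly to keep the example deterministic.

```rust
/// HLC timestamp; the derived ordering compares `physical` first, then `logical`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Timestamp {
    physical: u64,
    logical: u32,
}

struct Hlc {
    last: Timestamp,
}

impl Hlc {
    fn new() -> Self {
        Self { last: Timestamp { physical: 0, logical: 0 } }
    }

    /// Local or send event: adopt the wall clock if it moved forward,
    /// otherwise bump the logical counter.
    fn tick(&mut self, wall: u64) -> Timestamp {
        if wall > self.last.physical {
            self.last = Timestamp { physical: wall, logical: 0 };
        } else {
            self.last.logical += 1;
        }
        self.last
    }

    /// Receive event: merge the remote timestamp so the result is later than
    /// both the local clock and the message.
    fn update(&mut self, remote: Timestamp, wall: u64) -> Timestamp {
        let physical = wall.max(self.last.physical).max(remote.physical);
        let logical = if physical == self.last.physical && physical == remote.physical {
            self.last.logical.max(remote.logical) + 1
        } else if physical == self.last.physical {
            self.last.logical + 1
        } else if physical == remote.physical {
            remote.logical + 1
        } else {
            0
        };
        self.last = Timestamp { physical, logical };
        self.last
    }
}

fn main() {
    let mut clock = Hlc::new();
    let a = clock.tick(100);
    let b = clock.tick(90); // wall clock stepped backwards; HLC still advances
    let c = clock.update(Timestamp { physical: 200, logical: 5 }, 150);
    println!("{:?} < {:?} < {:?}: {}", a, b, c, a < b && b < c);
}
```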

Crates

Crate                      Description
ringkernel                 Main facade crate
ringkernel-core            Core traits, types, HLC, K2K, PubSub
ringkernel-derive          Proc macros (#[derive(RingMessage)], #[ring_kernel])
ringkernel-cpu             CPU backend
ringkernel-cuda            NVIDIA CUDA backend
ringkernel-wgpu            WebGPU backend
ringkernel-cuda-codegen    Rust-to-CUDA transpiler
ringkernel-wgpu-codegen    Rust-to-WGSL transpiler
ringkernel-wavesim         Wave simulation demo
ringkernel-txmon           Transaction monitoring demo

Quick Start

[dependencies]
ringkernel = "0.1"
tokio = { version = "1", features = ["full"] }

For GPU backends:

ringkernel = { version = "0.1", features = ["cuda"] }
# or
ringkernel = { version = "0.1", features = ["wgpu"] }


Performance

Benchmarked on NVIDIA RTX Ada:

  • CUDA Codegen: ~93B elem/sec (12,378x vs CPU)
  • Message queue throughput: ~75M ops/sec
  • HLC timestamp generation: <10ns per tick

What's Included

  • 14 workspace crates
  • 390+ tests
  • 20+ examples
  • Comprehensive documentation
  • Educational simulation modes (WaveSim)
  • Real-time fraud detection demo (TxMon)