Releases: mivertowski/RustCompute
v0.4.0: GPU Infrastructure Generalization & Python Bindings
Highlights
This release extracts more than 7,000 lines of proven GPU infrastructure from RustGraph into RingKernel, making these capabilities available to all RingKernel users.
New: Python Bindings (ringkernel-python)
PyO3-based Python wrapper with full async/await support:
```python
import ringkernel
import asyncio

async def main():
    runtime = await ringkernel.RingKernel.create(backend="cpu")
    kernel = await runtime.launch("processor", ringkernel.LaunchOptions())
    await kernel.terminate()
    await runtime.shutdown()

asyncio.run(main())
```
Features:
- Async/await with sync fallbacks
- HLC timestamps and K2K messaging
- CUDA device enumeration and GPU memory pool management
- Benchmark framework with regression detection
- Hybrid CPU/GPU dispatcher with adaptive thresholds
- Resource guard for memory limit enforcement
- Type stubs for IDE support
New: PTX Compilation Cache
Disk-based PTX caching for faster kernel loading with SHA-256 content hashing and compute capability awareness.
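As a rough illustration of how such a cache key can be formed, here is a minimal sketch that combines a SHA-256 hash of the CUDA source with the target compute capability. The `sha2` crate and the `cache_file_name` helper are assumptions for illustration, not RingKernel's actual implementation.

```rust
use sha2::{Digest, Sha256};

/// Hypothetical helper: derive a disk-cache file name from the CUDA source
/// and the target compute capability, so a cached PTX is reused only when
/// both the source text and the target architecture match.
fn cache_file_name(cuda_source: &str, compute_capability: (u32, u32)) -> String {
    let mut hasher = Sha256::new();
    hasher.update(cuda_source.as_bytes());
    hasher.update(format!("sm_{}{}", compute_capability.0, compute_capability.1).as_bytes());
    let digest = hasher.finalize();
    let hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{hex}.ptx")
}

fn main() {
    // The same source compiled for sm_89 and sm_80 yields two distinct cache entries.
    println!("{}", cache_file_name("extern \"C\" __global__ void k() {}", (8, 9)));
    println!("{}", cache_file_name("extern \"C\" __global__ void k() {}", (8, 0)));
}
```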
New: GPU Stratified Memory Pool
Size-stratified GPU VRAM pool with 6 size classes (256B-256KB), O(1) allocation from free lists.
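A minimal sketch of the size-class lookup such a pool implies, assuming power-of-four strata from 256 B to 256 KiB; the names and exact class boundaries are illustrative, not the crate's API. Free lists per class are what make allocation O(1) once the class index is known.

```rust
/// Illustrative size classes: 256 B, 1 KiB, 4 KiB, 16 KiB, 64 KiB, 256 KiB.
const SIZE_CLASSES: [usize; 6] = [256, 1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10];

/// Map a requested allocation size to the smallest class that fits.
/// Requests larger than the biggest class would fall back to a direct allocation.
fn size_class(requested: usize) -> Option<usize> {
    SIZE_CLASSES.iter().position(|&class| requested <= class)
}

fn main() {
    assert_eq!(size_class(100), Some(0));   // 256 B class
    assert_eq!(size_class(5_000), Some(3)); // 16 KiB class
    assert_eq!(size_class(1 << 20), None);  // too large for any class
}
```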
New: Multi-Stream Execution Manager
Multi-stream CUDA execution for compute/transfer overlap with event-based synchronization.
New: Benchmark Framework
Comprehensive benchmarking with regression detection, baseline comparison, and multiple report formats (Markdown, JSON, LaTeX).
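How a regression check against a stored baseline might look, as a sketch under assumed names; the shipped framework's thresholds and report formats may differ.

```rust
/// Hypothetical baseline comparison: flag a regression when the current
/// mean time exceeds the baseline by more than the allowed tolerance.
struct Baseline {
    mean_ns: f64,
    tolerance: f64, // e.g. 0.10 for 10%
}

impl Baseline {
    fn is_regression(&self, current_mean_ns: f64) -> bool {
        current_mean_ns > self.mean_ns * (1.0 + self.tolerance)
    }
}

fn main() {
    let baseline = Baseline { mean_ns: 1_000.0, tolerance: 0.10 };
    assert!(!baseline.is_regression(1_050.0)); // within 10% of baseline
    assert!(baseline.is_regression(1_200.0));  // 20% slower: regression
}
```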
New: Hybrid CPU-GPU Dispatcher
Intelligent workload routing with adaptive threshold learning between CPU and GPU execution.
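The adaptive-threshold idea, roughly: send small workloads to the CPU and large ones to the GPU, and nudge the crossover point based on observed timings. This is an illustrative sketch, not the shipped dispatcher.

```rust
/// Illustrative dispatcher: workloads at or above `threshold` elements go to
/// the GPU; the threshold drifts toward whichever backend was faster last time.
struct HybridDispatcher {
    threshold: usize,
}

enum Backend {
    Cpu,
    Gpu,
}

impl HybridDispatcher {
    fn choose(&self, elements: usize) -> Backend {
        if elements >= self.threshold { Backend::Gpu } else { Backend::Cpu }
    }

    /// After timing both paths on a calibration run, move the threshold:
    /// if the GPU was faster, lower it (use the GPU earlier); otherwise raise it.
    fn observe(&mut self, cpu_ns: u64, gpu_ns: u64) {
        if gpu_ns < cpu_ns {
            self.threshold = (self.threshold as f64 * 0.9) as usize;
        } else {
            self.threshold = (self.threshold as f64 * 1.1) as usize;
        }
    }
}

fn main() {
    let mut d = HybridDispatcher { threshold: 100_000 };
    assert!(matches!(d.choose(10_000), Backend::Cpu));
    d.observe(2_000_000, 500_000); // GPU won: threshold drops to 90_000
    assert!(matches!(d.choose(95_000), Backend::Gpu));
}
```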
New: Resource Guard
Memory limit enforcement with safety margins and RAII reservation patterns.
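A minimal RAII reservation sketch; the names are illustrative and the crate's guard may track device memory differently, but the pattern is the same: a successful reservation returns a handle that gives the bytes back when dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Illustrative guard: a budget of GPU bytes with a safety margin; a successful
/// reservation returns an RAII handle that releases the bytes on drop.
struct MemoryBudget {
    available: Arc<AtomicUsize>,
}

struct Reservation {
    available: Arc<AtomicUsize>,
    bytes: usize,
}

impl MemoryBudget {
    fn new(total: usize, safety_margin: usize) -> Self {
        Self { available: Arc::new(AtomicUsize::new(total - safety_margin)) }
    }

    fn try_reserve(&self, bytes: usize) -> Option<Reservation> {
        self.available
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |avail| avail.checked_sub(bytes))
            .ok()
            .map(|_| Reservation { available: self.available.clone(), bytes })
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.available.fetch_add(self.bytes, Ordering::SeqCst);
    }
}

fn main() {
    let budget = MemoryBudget::new(8 << 30, 512 << 20); // 8 GiB minus a 512 MiB margin
    let guard = budget.try_reserve(1 << 30).expect("fits within budget");
    drop(guard); // bytes returned automatically
}
```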
New: Kernel Mode Selector
Intelligent kernel launch configuration based on workload profile and GPU architecture.
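A rough heuristic of the kind such a selector encodes; the numbers and names are assumptions for illustration, and the real selector also weighs GPU architecture.

```rust
/// Illustrative launch modes: short-lived workloads get a grid sized to the
/// element count, long-running streaming workloads get a persistent grid
/// sized to keep every SM occupied.
enum LaunchMode {
    Transient { grid: u32, block: u32 },
    Persistent { grid: u32, block: u32 },
}

fn select_mode(elements: u64, streaming: bool, sm_count: u32) -> LaunchMode {
    const BLOCK: u32 = 256; // a common default block size
    if streaming {
        LaunchMode::Persistent { grid: sm_count * 2, block: BLOCK }
    } else {
        let grid = ((elements + BLOCK as u64 - 1) / BLOCK as u64) as u32;
        LaunchMode::Transient { grid: grid.max(1), block: BLOCK }
    }
}

fn main() {
    match select_mode(1_000_000, false, 128) {
        LaunchMode::Transient { grid, .. } => assert_eq!(grid, 3907),
        _ => unreachable!(),
    }
}
```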
See CHANGELOG.md for full details.
v0.3.2: GPU Profiling Infrastructure
What's New
GPU Profiling Infrastructure
- CUDA event-based timing and NVTX markers
- Memory allocation tracking
- Chrome trace export for visualization
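For context, Chrome's trace viewer (chrome://tracing, Perfetto) consumes a plain JSON event list; a minimal "complete" event record looks like the sketch below. This illustrates the file format only, not the crate's exporter.

```rust
/// Emit one Chrome Trace Format "complete" event (ph = "X").
/// Timestamps and durations are in microseconds.
fn trace_event(name: &str, ts_us: u64, dur_us: u64, pid: u32, tid: u32) -> String {
    format!(
        r#"{{"name":"{name}","cat":"gpu","ph":"X","ts":{ts_us},"dur":{dur_us},"pid":{pid},"tid":{tid}}}"#
    )
}

fn main() {
    // A trace file is simply {"traceEvents":[ ... ]} containing such records.
    let events = vec![
        trace_event("h2d_copy", 0, 120, 1, 1),
        trace_event("kernel_launch", 130, 540, 1, 2),
    ];
    println!("{{\"traceEvents\":[{}]}}", events.join(","));
}
```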
Publishing Fixes
- Fixed publish script to add User-Agent header for crates.io API
- Updated dependency versions across all crates for v0.3.2 publishing
- ringkernel-ir, ringkernel-graph, ringkernel-montecarlo now use workspace versions
Crates Published
- ringkernel-core, ringkernel-cuda-codegen, ringkernel-wgpu-codegen
- ringkernel-derive, ringkernel-cpu, ringkernel-cuda, ringkernel-wgpu, ringkernel-metal
- ringkernel-codegen, ringkernel-ecosystem, ringkernel-audio-fft
- ringkernel (main crate)
See crates.io/crates/ringkernel for the published crates.
v0.3.1: Enterprise Readiness
RingKernel v0.3.1: Enterprise Readiness
This release adds comprehensive enterprise-grade features for production deployments.
🔐 Enterprise Security
- Real Cryptography: AES-256-GCM, ChaCha20-Poly1305, Argon2 key derivation
- Secrets Management: `SecretStore` trait with key rotation, caching, and chained stores
- K2K Message Encryption: Kernel-to-kernel encryption with forward secrecy
- TLS/mTLS Support: Full TLS with rustls, certificate rotation, SNI resolution
🔑 Authentication & Authorization
- Authentication Providers: `ApiKeyAuth`, `JwtAuth` (RS256/HS256), `ChainedAuthProvider`
- RBAC: Role-based access control with deny-by-default `PolicyEvaluator`
- Multi-tenancy: `TenantContext`, `ResourceQuota`, usage tracking
📊 Observability
- OpenTelemetry: OTLP export to Jaeger, Honeycomb, Datadog, Grafana Cloud
- Structured Logging: Multi-sink logger with trace correlation (JSON/Text)
- Alert Routing: Severity-based routing with deduplication (Slack, Teams, PagerDuty)
- Remote Audit Sinks: Syslog, CloudWatch Logs, Elasticsearch
⚡ Rate Limiting
- Algorithms: `TokenBucket`, `SlidingWindow`, `LeakyBucket` (token bucket sketched below)
- Builder API: Fluent configuration with `RateLimiterBuilder`
- Distributed: `SharedRateLimiter` for multi-instance deployments
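A token-bucket sketch for reference, using the standard algorithm rather than the crate's internal code: tokens refill at a steady rate up to a burst cap, and a request is admitted only if a token is available.

```rust
use std::time::Instant;

/// Standard token bucket: `rate` tokens per second refill, capped at `burst`.
struct TokenBucket {
    rate: f64,
    burst: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last_refill: Instant::now() }
    }

    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.last_refill = now;
        self.tokens = (self.tokens + elapsed * self.rate).min(self.burst);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 1000 requests/second sustained, bursts of up to 100.
    let mut limiter = TokenBucket::new(1000.0, 100.0);
    let allowed = (0..150).filter(|_| limiter.try_acquire()).count();
    assert!(allowed >= 100); // the initial burst is admitted immediately
}
```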
🔧 Operational Excellence
- Automatic Recovery: Configurable policies per failure type (Restart, Migrate, Checkpoint, Notify, Escalate, Circuit)
- Operation Timeouts: Deadline propagation with `Timeout` and `Deadline` types
- Recovery Manager: Retry tracking, cooldown periods, automatic escalation
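The retry/cooldown/escalation flow, roughly; the two actions shown are a subset of the policies listed above, and the logic is an illustrative sketch rather than the crate's recovery manager.

```rust
use std::time::Duration;

/// Two of the recovery actions named above; `Escalate` is the terminal fallback.
enum RecoveryAction {
    Restart,
    Escalate,
}

/// Illustrative policy: restart with a cooldown between attempts, escalate
/// once the retry budget for a kernel is exhausted.
struct RecoveryPolicy {
    max_retries: u32,
    cooldown: Duration,
}

impl RecoveryPolicy {
    fn next_action(&self, retries_so_far: u32) -> (RecoveryAction, Duration) {
        if retries_so_far < self.max_retries {
            (RecoveryAction::Restart, self.cooldown)
        } else {
            (RecoveryAction::Escalate, Duration::ZERO)
        }
    }
}

fn main() {
    let policy = RecoveryPolicy { max_retries: 3, cooldown: Duration::from_secs(5) };
    assert!(matches!(policy.next_action(1).0, RecoveryAction::Restart));
    assert!(matches!(policy.next_action(3).0, RecoveryAction::Escalate));
}
```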
📦 Feature Flags
```toml
[dependencies]
ringkernel-core = { version = "0.3.1", features = ["enterprise"] }
# Or select specific features:
ringkernel-core = { version = "0.3.1", features = ["crypto", "auth", "tls", "rate-limiting", "alerting"] }
```
📈 Metrics
- Test Coverage: 900+ tests (up from 825+)
- Crates Published: 21 crates to crates.io
🚀 Quick Start
```rust
use ringkernel_core::prelude::*;

// Enterprise runtime with production preset
let runtime = RuntimeBuilder::new()
    .production()
    .build()?;

// API key authentication
let auth = ApiKeyAuth::new()
    .add_key("sk-prod-abc123", Identity::new("service-a"));

// Rate limiting
let limiter = RateLimiterBuilder::new()
    .algorithm(RateLimitAlgorithm::TokenBucket)
    .rate(1000)
    .burst(100)
    .build();
```
Full Changelog
See CHANGELOG.md for complete details.
v0.3.0: Multi-Kernel Dispatch, Memory Pools, Global Reductions
RingKernel v0.3.0
GPU-native persistent actor model framework for Rust. This release adds multi-kernel dispatch, memory pools, global reduction primitives, and two new crates.
Highlights
- 21 crates published to crates.io - Full workspace now available
- 825+ tests across the workspace
- cudarc 0.18.2 and wgpu 27.0 support
New Features
Multi-Kernel Dispatch and Persistent Message Routing
- `#[derive(PersistentMessage)]` macro for GPU kernel dispatch
- `KernelDispatcher` component with builder pattern and metrics
- CUDA handler dispatch code generator (`CudaDispatchTable`)
- Queue tiering system (`QueueTier`, `QueueFactory`, `QueueMonitor`)
Memory Pool Management
- `StratifiedMemoryPool` with 5 size buckets (256B to 64KB)
- `AnalyticsContext` for grouped buffer lifecycle
- `PressureHandler` for memory pressure monitoring
- CUDA `ReductionBufferCache` and WebGPU `StagingBufferPool`
Global Reduction Primitives
- `ReductionOp` enum: Sum, Min, Max, And, Or, Xor, Product (host-side semantics sketched below)
- `ReductionBuffer<T>` using mapped memory (zero-copy host read)
- Multi-phase kernel execution with `SyncMode` (Cooperative, SoftwareBarrier, MultiLaunch)
- PageRank example with dangling node handling
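For reference, the semantics of each `ReductionOp` can be written as a simple host-side fold. This is only a sketch to pin down what each operator computes; the GPU backends perform the same reductions across multi-phase kernel launches into the mapped `ReductionBuffer`.

```rust
/// The reduction operators listed above, applied as a host-side fold over u64 values.
#[derive(Clone, Copy)]
enum ReductionOp {
    Sum,
    Min,
    Max,
    And,
    Or,
    Xor,
    Product,
}

fn reduce(op: ReductionOp, values: &[u64]) -> u64 {
    match op {
        ReductionOp::Sum => values.iter().fold(0, |a, &v| a.wrapping_add(v)),
        ReductionOp::Min => values.iter().copied().fold(u64::MAX, |a, v| a.min(v)),
        ReductionOp::Max => values.iter().copied().fold(u64::MIN, |a, v| a.max(v)),
        ReductionOp::And => values.iter().fold(!0u64, |a, &v| a & v),
        ReductionOp::Or => values.iter().fold(0, |a, &v| a | v),
        ReductionOp::Xor => values.iter().fold(0, |a, &v| a ^ v),
        ReductionOp::Product => values.iter().fold(1, |a, &v| a.wrapping_mul(v)),
    }
}

fn main() {
    assert_eq!(reduce(ReductionOp::Sum, &[1, 2, 3, 4]), 10);
    assert_eq!(reduce(ReductionOp::Max, &[7, 3, 9]), 9);
    assert_eq!(reduce(ReductionOp::Xor, &[0b1010, 0b0110]), 0b1100);
}
```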
CUDA NVRTC Compilation
- `compile_ptx()` function for runtime CUDA compilation
- Downstream crates can compile CUDA without direct cudarc dependency
Domain System
- 20 business domains with reserved type ID ranges
- `#[message(domain = "FraudDetection")]` attribute
- Domains: GraphAnalytics, FraudDetection, ProcessIntelligence, Banking, etc.
New Crates
- `ringkernel-montecarlo` - Philox RNG, antithetic variates, control variates, importance sampling
- `ringkernel-graph` - CSR matrix, BFS, SCC (Tarjan/Kosaraju), Union-Find, SpMV
Breaking Changes
- cudarc API updated to 0.18.2 (module loading, kernel launch builder pattern)
- wgpu API updated to 27.0 (Arc-based resources)
Installation
```toml
[dependencies]
ringkernel = "0.3.0"
# Optional backends
ringkernel-cuda = "0.3.0"
ringkernel-wgpu = "0.3.0"
```
Documentation
Full Changelog: v0.2.0...v0.3.0
RingKernel v0.2.0
What's Changed
- Claude/persistent kernel implementation by @mivertowski in #9
Full Changelog: v0.1.3...v0.2.0
v0.1.3 - Dependency Updates & CI Fixes
Highlights
- wgpu 27.0 - Major update with Arc-based resource tracking (~40% performance improvement in some workloads)
- Dependency updates - tokio 1.48, axum 0.8, tonic 0.14, egui 0.31, winit 0.30
- CI/CD fixes - Workspace builds without CUDA/nvcc installed
What's Changed
Dependencies Updated
| Package | From | To |
|---|---|---|
| wgpu | 0.19 | 27.0 |
| tokio | 1.35 | 1.48 |
| thiserror | 1.0 | 2.0 |
| axum | 0.7 | 0.8 |
| tower | 0.4 | 0.5 |
| tonic | 0.11 | 0.14 |
| prost | 0.12 | 0.14 |
| egui/egui-wgpu/egui-winit | 0.27 | 0.31 |
| winit | 0.29 | 0.30 |
| glam | 0.27 | 0.29 |
| metal | 0.27 | 0.31 |
| arrow | 52 | 54 |
| polars | 0.39 | 0.46 |
| rayon | 1.10 | 1.11 |
| actix-rt | 2.9 | 2.10 |
Deferred Updates
- iced: Kept at 0.13 (0.14 requires major application API rewrite)
- rkyv: Kept at 0.7 (0.8 has incompatible data format)
CI/CD Improvements
- CUDA features are now opt-in (not default)
- Workspace builds succeed without nvcc installed
- Feature-gated CUDA tests with `#[cfg(feature = "cuda")]`
See CHANGELOG.md for full details.
v0.1.2
Release v0.1.2

- **WaveSim3D** - 3D acoustic wave simulation with realistic physics
  - Full 3D FDTD wave propagation solver
  - Binaural audio rendering with HRTF support
  - Volumetric ray marching visualization
  - GPU-native actor system for distributed simulation
- Expanded GPU intrinsics from ~45 to 120+ operations across 13 categories
  - Atomic operations: and, or, xor, inc, dec
  - 3D stencil intrinsics: up, down, at(dx, dy, dz)
  - Warp match/reduce operations (Volta+/SM 8.0+)
  - Bit manipulation, memory, special, and timing ops
- 171 tests (up from 143)
- Added required-features to CUDA-only wavesim binaries
- Updated GitHub Actions release workflow

See CHANGELOG.md for full details.
v0.1.1 - AccNet & ProcInt Showcase Applications
What's New
New Showcase Applications
AccNet - GPU-Accelerated Accounting Network Analytics
- Network visualization with force-directed graph layout
- Fraud detection: circular flows, threshold clustering, Benford's Law violations (digit-distribution check sketched below)
- GAAP compliance checking for accounting rule violations
- Temporal analysis for seasonality, trends, and behavioral anomalies
- GPU kernels: Suspense detection, GAAP violation, Benford analysis, PageRank
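As background on the Benford check: first digits of naturally occurring amounts should follow P(d) = log10(1 + 1/d), and a ledger whose digit histogram deviates strongly is flagged. Below is a host-side sketch of that test; it is illustrative only, as AccNet runs its analysis in GPU kernels.

```rust
/// Chi-square statistic of observed first-digit counts against Benford's law.
/// A large value suggests the amounts were not naturally generated.
fn benford_chi_square(amounts: &[f64]) -> f64 {
    let mut counts = [0usize; 9];
    for &a in amounts {
        let mut x = a.abs();
        if x == 0.0 {
            continue;
        }
        // Shift into [1, 10) and take the leading digit.
        while x >= 10.0 {
            x /= 10.0;
        }
        while x < 1.0 {
            x *= 10.0;
        }
        counts[x as usize - 1] += 1;
    }
    let total: usize = counts.iter().sum();
    (1..=9)
        .map(|d| {
            let expected = total as f64 * (1.0 + 1.0 / d as f64).log10();
            let observed = counts[d - 1] as f64;
            (observed - expected).powi(2) / expected
        })
        .sum()
}

fn main() {
    let suspicious = vec![900.0; 50]; // every amount starts with 9
    println!("chi-square = {:.1}", benford_chi_square(&suspicious));
}
```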
ProcInt - GPU-Accelerated Process Intelligence
- DFG (Directly-Follows Graph) mining from event streams
- Pattern detection: bottlenecks, loops, rework, long-running activities
- Conformance checking with fitness and precision metrics
- Timeline view with partial order traces and concurrent activity visualization
- Multi-sector templates: Healthcare, Manufacturing, Finance, IT
- GPU kernels: DFG construction, pattern detection, partial order derivation, conformance checking
Changes
- Updated showcase documentation with AccNet and ProcInt sections
- Updated CI workflow to exclude CUDA tests on runners without GPU hardware
Fixes
- Fixed 14 clippy warnings in ringkernel-accnet
- Fixed benchmark API compatibility in ringkernel-accnet
- Fixed code formatting issues across showcase applications
Run the Applications
```bash
# AccNet - Accounting Network Analytics
cargo run -p ringkernel-accnet --release

# ProcInt - Process Intelligence
cargo run -p ringkernel-procint --release
```
Full Changelog: v0.1.0...v0.1.1
RingKernel v0.1.0
RingKernel v0.1.0 - Initial Release
A GPU-native persistent actor model framework for Rust.
Highlights
- Persistent GPU Kernels: GPU compute units as long-running actors that maintain state between invocations
- Lock-free Message Queues: High-performance host↔GPU and kernel-to-kernel communication
- Hybrid Logical Clocks (HLC): Causal ordering across distributed GPU operations (update rules sketched below)
- Multiple Backends: CPU, CUDA, WebGPU support
- Zero-copy Serialization: rkyv-based message passing
- Rust-to-GPU Transpilers: Write GPU kernels in Rust DSL, transpile to CUDA C or WGSL
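For readers unfamiliar with HLC, the update rules are small. The sketch below follows the standard Kulkarni et al. formulation, not RingKernel's exact types: a logical wall-clock component plus a counter that breaks ties when physical time stands still.

```rust
/// Hybrid logical clock state: a logical wall-clock component `l` and a
/// logical counter `c` used when physical time does not advance.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    l: u64,
    c: u32,
}

impl Hlc {
    /// Local or send event: advance to max(l, physical now); bump the counter
    /// only when the logical component did not move.
    fn tick(&mut self, physical_now: u64) {
        let prev_l = self.l;
        self.l = self.l.max(physical_now);
        self.c = if self.l == prev_l { self.c + 1 } else { 0 };
    }

    /// Receive event: merge with the sender's timestamp so causality is kept
    /// even when physical clocks disagree.
    fn receive(&mut self, msg: Hlc, physical_now: u64) {
        let prev_l = self.l;
        self.l = self.l.max(msg.l).max(physical_now);
        self.c = if self.l == prev_l && self.l == msg.l {
            self.c.max(msg.c) + 1
        } else if self.l == prev_l {
            self.c + 1
        } else if self.l == msg.l {
            msg.c + 1
        } else {
            0
        };
    }
}

fn main() {
    let mut a = Hlc { l: 100, c: 0 };
    a.tick(100);      // same physical time: counter increments
    let mut b = Hlc { l: 90, c: 0 };
    b.receive(a, 95); // b adopts a's larger logical time
    assert!(b > a);   // the receive is ordered after the send
}
```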
Crates
| Crate | Description |
|---|---|
| ringkernel | Main facade crate |
| ringkernel-core | Core traits, types, HLC, K2K, PubSub |
| ringkernel-derive | Proc macros (#[derive(RingMessage)], #[ring_kernel]) |
| ringkernel-cpu | CPU backend |
| ringkernel-cuda | NVIDIA CUDA backend |
| ringkernel-wgpu | WebGPU backend |
| ringkernel-cuda-codegen | Rust-to-CUDA transpiler |
| ringkernel-wgpu-codegen | Rust-to-WGSL transpiler |
| ringkernel-wavesim | Wave simulation demo |
| ringkernel-txmon | Transaction monitoring demo |
Quick Start
```toml
[dependencies]
ringkernel = "0.1"
tokio = { version = "1", features = ["full"] }
```
For GPU backends:
```toml
ringkernel = { version = "0.1", features = ["cuda"] }
# or
ringkernel = { version = "0.1", features = ["wgpu"] }
```
Documentation
- API Docs: https://docs.rs/ringkernel
- Guides: https://mivertowski.github.io/RustCompute/
Performance
Benchmarked on NVIDIA RTX Ada:
- CUDA Codegen: ~93B elem/sec (12,378x vs CPU)
- Message queue throughput: ~75M ops/sec
- HLC timestamp generation: <10ns per tick
What's Included
- 14 workspace crates
- 390+ tests
- 20+ examples
- Comprehensive documentation
- Educational simulation modes (WaveSim)
- Real-time fraud detection demo (TxMon)