perf: unified performance optimizations (simd186, reward distribution, O(1) lookups) by 7layermagik · Pull Request #195 · Overclock-Validator/mithril

7layermagik · 2026-01-24T19:48:54Z

Summary

Unified performance PR combining non-cache optimizations from multiple branches:

- simd186 account memoization: avoid the double clone in loadAndValidateTxAccts
- Reward distribution optimizations: memory pooling, thread safety, worker pool reuse
- O(1) lookups: replace slice iterations with map lookups in hot paths
- Clone stats profiling: track account clone vs. modify ratios

Changes

1. Account Memoization (simd186)

Cache accounts loaded in Pass 1 to avoid re-cloning in Pass 2 of loadAndValidateTxAcctsSimd186.

2. Reward Distribution Optimizations

- Optimize memory and pool usage during reward distribution
- Add thread safety improvements
- Reuse worker pool across reward partitions

3. O(1) Lookups and Capacity Hints

| Change | Before | After |
|---|---|---|
| `newReservedAccts` check | O(n) slice search | O(1) map lookup |
| `programIDs` check in isWritable | O(n) per account | O(1) map, built once per tx |
| `writablePubkeys` in recordStakeAndVoteAccounts | O(n*m) nested loop | O(1) map lookup |
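The slice-to-map conversions in the table follow a common Go pattern: build a set once, then test membership in constant time. A minimal sketch of that pattern follows. `Pubkey`, `containsLinear`, and `buildPubkeySet` are hypothetical names chosen for illustration; they are not mithril's types, and the real `IsWritable` rules are not reproduced here.

```go
package main

import "fmt"

// Pubkey is a hypothetical 32-byte key type used only for this sketch.
type Pubkey [32]byte

// containsLinear is the "before" shape: an O(n) scan on every check.
func containsLinear(keys []Pubkey, k Pubkey) bool {
	for _, candidate := range keys {
		if candidate == k {
			return true
		}
	}
	return false
}

// buildPubkeySet is the "after" shape: pay O(n) once, then every
// membership check is a single map lookup.
func buildPubkeySet(keys []Pubkey) map[Pubkey]struct{} {
	set := make(map[Pubkey]struct{}, len(keys))
	for _, k := range keys {
		set[k] = struct{}{}
	}
	return set
}

func main() {
	programIDs := []Pubkey{{1}, {2}, {3}}
	set := buildPubkeySet(programIDs) // built once, e.g. once per transaction

	_, found := set[Pubkey{2}]
	fmt.Println(found, containsLinear(programIDs, Pubkey{2}))
}
```

The set only pays off when it is reused across many checks; for a single lookup the linear scan is just as good.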

Capacity hints added:

- instructionAcctPubkeys: len(tx.Message.AccountKeys)
- validatedLoaders: 4
- ModifiedVoteStates: 8
- pkToAcct: len(b.Transactions)*4
- alreadyAdded: len(slotCtx.WritableAccts)
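A capacity hint is the optional size argument to `make`, which lets the runtime size a map or slice up front instead of growing it while it fills. The sketch below shows the idea on a small deduplication helper; `collectUnique` is a hypothetical function, not code from this PR.

```go
package main

import "fmt"

// collectUnique deduplicates keys while preserving order. Both containers
// are sized up front, so neither has to grow while the loop runs.
func collectUnique(keys []string) []string {
	seen := make(map[string]struct{}, len(keys)) // capacity hint for the map
	out := make([]string, 0, len(keys))          // capacity hint for the slice
	for _, k := range keys {
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, k)
	}
	return out
}

func main() {
	out := collectUnique([]string{"a", "b", "a", "c"})
	fmt.Println(out, len(out), cap(out))
}
```

A hint is an upper-bound guess, not a limit: an oversized hint wastes some memory, and an undersized one falls back to normal growth.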

4. Clone Stats Profiling

Track per-transaction account clone vs modification rates:

clone stats: 15.2% modified (1523/10000 accts) | 45.3MB cloned, 6.8MB modified | avg/tx: 8.2 cloned, 1.2 modified
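To make the fields in that line concrete, here is one way such a summary could be computed from running counters. This is a sketch under assumptions: the counter names mirror those in the commit message, but the `cloneStats` struct, the `Txs` field, the `Summary` method, and the choice of 1 MB = 10^6 bytes are all hypothetical.

```go
package main

import "fmt"

// cloneStats is a hypothetical container for the per-window counters.
type cloneStats struct {
	Txs                 uint64 // transactions observed (assumed field)
	TxAcctsCloned       uint64 // accounts loaded (cloned)
	TxAcctsClonedBytes  uint64
	TxAcctsTouched      uint64 // accounts actually modified
	TxAcctsTouchedBytes uint64
}

// Summary formats the counters in the same shape as the example log line.
func (s cloneStats) Summary() string {
	if s.TxAcctsCloned == 0 || s.Txs == 0 {
		return "clone stats: no data"
	}
	const mb = 1e6 // assumption: decimal megabytes
	pct := 100 * float64(s.TxAcctsTouched) / float64(s.TxAcctsCloned)
	return fmt.Sprintf(
		"clone stats: %.1f%% modified (%d/%d accts) | %.1fMB cloned, %.1fMB modified | avg/tx: %.1f cloned, %.1f modified",
		pct, s.TxAcctsTouched, s.TxAcctsCloned,
		float64(s.TxAcctsClonedBytes)/mb, float64(s.TxAcctsTouchedBytes)/mb,
		float64(s.TxAcctsCloned)/float64(s.Txs), float64(s.TxAcctsTouched)/float64(s.Txs),
	)
}

func main() {
	// Illustrative inputs only; the transaction count is invented.
	s := cloneStats{
		Txs:                 1220,
		TxAcctsCloned:       10000,
		TxAcctsClonedBytes:  45_300_000,
		TxAcctsTouched:      1523,
		TxAcctsTouchedBytes: 6_800_000,
	}
	fmt.Println(s.Summary())
}
```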

Files Changed

| File | Changes |
|---|---|
| `pkg/replay/accounts.go` | simd186 memoization, capacity hints, clone stats |
| `pkg/replay/transaction.go` | O(1) lookups, capacity hints, clone stats |
| `pkg/replay/block.go` | Clone stats summary, capacity hint |
| `pkg/replay/rewards.go` | Reward distribution optimizations |
| `pkg/sealevel/sealevel.go` | Exported NewReservedAcctsSet, IsWritable signature |
| `pkg/rent/rent.go` | programIDSet for IsWritable |
| `pkg/replay/topsort_planner.go` | Capacity hint |
| `pkg/snapshot/build_db*.go` | Log message consistency |

Test Plan

- `go build ./cmd/mithril/... ./pkg/replay/...` passes
- Run on mainnet: bank hashes match
- Check that the 100-slot summary shows clone stats

🤖 Generated with Claude Code

…ateTxAccts

When SIMD-186 is active, loadAndValidateTxAcctsSimd186 was loading each account twice: once for size accumulation (Pass 1) and once for building TransactionAccounts (Pass 2). Each GetAccount call clones the account, causing 2x allocations and data copies per account per transaction.

Changes:
- Add acctCache slice to store accounts from Pass 1
- Reuse cached accounts in Pass 2 instead of re-cloning
- Replace programIdIdxs slice with isProgramIdx boolean mask for O(1) lookup (eliminates slices.Contains linear scan in hot loop)
- Reuse cache in program validation loop via tx.Message.Instructions index

Impact: ~50% reduction in account allocations/copies per transaction, reduced GC pressure during high-throughput replay.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
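The two-pass memoization described in this commit can be sketched roughly as follows. This is a simplified illustration, not the code from `pkg/replay/accounts.go`: `Account`, `store`, and `loadTwoPass` are hypothetical, and only the names `acctCache` and `isProgramIdx` come from the commit message.

```go
package main

import "fmt"

// Account is a hypothetical stand-in for the replay account type.
type Account struct {
	Data []byte
}

// store is a hypothetical account source whose GetAccount returns a clone,
// mirroring the cloning cost the commit message describes.
type store struct {
	accts  map[string]*Account
	clones int // counts how many clones were made
}

func (s *store) GetAccount(key string) *Account {
	src, ok := s.accts[key]
	if !ok {
		return nil
	}
	s.clones++
	return &Account{Data: append([]byte(nil), src.Data...)}
}

// loadTwoPass clones each account once in Pass 1 and reuses it in Pass 2.
func loadTwoPass(s *store, keys []string, programIdxs []int) (int, []*Account, []bool) {
	acctCache := make([]*Account, len(keys))

	// Boolean mask replaces a linear scan over the program index slice.
	isProgramIdx := make([]bool, len(keys))
	for _, i := range programIdxs {
		if i >= 0 && i < len(keys) { // skip out-of-range indices
			isProgramIdx[i] = true
		}
	}

	// Pass 1: accumulate sizes and remember every loaded account.
	total := 0
	for i, k := range keys {
		acct := s.GetAccount(k)
		acctCache[i] = acct
		if acct != nil {
			total += len(acct.Data)
		}
	}

	// Pass 2: build the result from the cache instead of cloning again.
	loaded := make([]*Account, 0, len(keys))
	for i := range keys {
		loaded = append(loaded, acctCache[i])
	}
	return total, loaded, isProgramIdx
}

func main() {
	s := &store{accts: map[string]*Account{
		"payer":   {Data: make([]byte, 128)},
		"program": {Data: make([]byte, 64)},
	}}
	total, loaded, isProgramIdx := loadTwoPass(s, []string{"payer", "program"}, []int{1})
	fmt.Println(total, len(loaded), isProgramIdx, s.clones)
}
```

In this sketch the clone counter ends up equal to the number of accounts found, where an uncached second pass would double it.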

- Add MarshalStakeStakeInto to write stake state directly into existing buffer, eliminating ~600MB of allocations during reward distribution
- Remove unnecessary ants.Release() calls that were tearing down global ants state after each partition (4 occurrences)
- Add InRewardsWindow flag to AccountsDb to skip caching stake accounts during reward distribution (prevents cache pollution from 1.25M one-shot reads)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
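The "marshal into an existing buffer" idea can be illustrated with a small sketch. The `stake` struct, its two fields, the 16-byte layout, and `marshalStakeInto` are all invented for this example; they do not reflect the real stake state layout or the signature of `MarshalStakeStakeInto`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// stake is a hypothetical two-field record, not the real stake state.
type stake struct {
	DelegatedLamports uint64
	CreditsObserved   uint64
}

const stakeSize = 16 // two little-endian uint64 values in this toy layout

var errShortBuffer = errors.New("destination buffer too small")

// marshalStakeInto writes s into dst in place and allocates nothing.
func marshalStakeInto(dst []byte, s stake) error {
	if len(dst) < stakeSize {
		return errShortBuffer
	}
	binary.LittleEndian.PutUint64(dst[0:8], s.DelegatedLamports)
	binary.LittleEndian.PutUint64(dst[8:16], s.CreditsObserved)
	return nil
}

func main() {
	buf := make([]byte, stakeSize) // allocated once, reused for every record
	for i := uint64(1); i <= 3; i++ {
		s := stake{DelegatedLamports: i * 100, CreditsObserved: i}
		if err := marshalStakeInto(buf, s); err != nil {
			fmt.Println("marshal failed:", err)
			return
		}
	}
	fmt.Println(binary.LittleEndian.Uint64(buf[0:8]))
}
```

The saving comes from the caller owning the buffer: a marshal function that returns a fresh `[]byte` allocates on every call, while the in-place form lets one buffer serve many records.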

- Add MarshalStakeStakeInto for zero-allocation stake serialization
- Add InRewardsWindow atomic.Bool to skip stake account caching during rewards
- Cache bypass on both read and write paths (prevents cache thrashing)
- Remove unnecessary ants.Release() calls (4x)
- Add docs/TODO.md tracking known issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
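A flag-gated cache bypass of this kind might look like the sketch below. Only the `InRewardsWindow` name and its `atomic.Bool` type come from the commit message; the `accountsDb` struct, its maps, and the `Get`/`Put` methods are hypothetical. Unlike the PR, which targets stake accounts, this simplified version bypasses the cache for every key, and deleting a stale entry on write is a design choice of the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// accountsDb is a hypothetical read-through cache in front of a store.
type accountsDb struct {
	InRewardsWindow atomic.Bool // set while rewards are being distributed

	mu    sync.Mutex
	cache map[string][]byte
	store map[string][]byte // stand-in for the backing database
}

func newAccountsDb() *accountsDb {
	return &accountsDb{cache: map[string][]byte{}, store: map[string][]byte{}}
}

// Get serves cache hits as usual but does not populate the cache on a miss
// while the rewards window is open.
func (db *accountsDb) Get(key string) ([]byte, bool) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if v, ok := db.cache[key]; ok {
		return v, true
	}
	v, ok := db.store[key]
	if ok && !db.InRewardsWindow.Load() {
		db.cache[key] = v
	}
	return v, ok
}

// Put always writes to the store; during the rewards window it drops any
// cached copy instead of caching the new value, so no stale entry remains.
func (db *accountsDb) Put(key string, val []byte) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.store[key] = val
	if db.InRewardsWindow.Load() {
		delete(db.cache, key)
		return
	}
	db.cache[key] = val
}

func (db *accountsDb) cachedLen() int {
	db.mu.Lock()
	defer db.mu.Unlock()
	return len(db.cache)
}

func main() {
	db := newAccountsDb()
	db.store["stake1"] = []byte{1}

	db.InRewardsWindow.Store(true)
	db.Get("stake1")
	fmt.Println("cached during rewards window:", db.cachedLen())

	db.InRewardsWindow.Store(false)
	db.Get("stake1")
	fmt.Println("cached after rewards window:", db.cachedLen())
}
```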

- Add WorkerPool field to PartitionedRewardDistributionInfo
- Add rewardDistributionTask struct to carry per-task context
- Create pool once on first partition, reuse for all 243 partitions
- Release pool when NumRewardPartitionsRemaining == 0
- Eliminates 243× pool create/destroy cycles during rewards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
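The create-once, release-at-the-end lifecycle can be sketched with a small channel-based pool from the standard library. This stands in for the `ants` pool the PR uses and does not reproduce its API; `workerPool`, `distributionInfo`, `distributePartition`, and the `poolsCreated` counter are hypothetical, while `WorkerPool` and `NumRewardPartitionsRemaining` are names taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workerPool is a minimal fixed-size pool; a stand-in for a real pool library.
type workerPool struct {
	tasks   chan func()
	workers sync.WaitGroup
}

func newWorkerPool(n int) *workerPool {
	p := &workerPool{tasks: make(chan func())}
	p.workers.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer p.workers.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

func (p *workerPool) Submit(task func()) { p.tasks <- task }

// Release stops the workers after they drain any in-flight tasks.
func (p *workerPool) Release() {
	close(p.tasks)
	p.workers.Wait()
}

// distributionInfo carries the pool across partitions.
type distributionInfo struct {
	NumRewardPartitionsRemaining int
	WorkerPool                   *workerPool
	poolsCreated                 int // instrumentation for this sketch only
}

// distributePartition processes one partition, creating the pool lazily on
// the first call and releasing it after the last partition.
func (d *distributionInfo) distributePartition(rewards []uint64) uint64 {
	if d.WorkerPool == nil {
		d.WorkerPool = newWorkerPool(4)
		d.poolsCreated++
	}
	var total atomic.Uint64
	var wg sync.WaitGroup
	for _, r := range rewards {
		r := r
		wg.Add(1)
		d.WorkerPool.Submit(func() {
			defer wg.Done()
			total.Add(r)
		})
	}
	wg.Wait() // wait for this partition only; the pool stays alive

	d.NumRewardPartitionsRemaining--
	if d.NumRewardPartitionsRemaining == 0 {
		d.WorkerPool.Release()
		d.WorkerPool = nil
	}
	return total.Load()
}

func main() {
	d := &distributionInfo{NumRewardPartitionsRemaining: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(d.distributePartition([]uint64{1, 2, 3}))
	}
	fmt.Println("pools created:", d.poolsCreated)
}
```

The per-partition `sync.WaitGroup` is what lets the pool outlive a partition: completion is tracked per batch of tasks, not by tearing the pool down.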

- "Using snapshot file:" → "Using full snapshot:"
- "Parsing manifest from {path}" → "Parsing full/incremental snapshot manifest..."
- Remove redundant path repetition after initial "Using" lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds defensive bounds check to prevent panic if ProgramIDIndex is out of range. Falls back to GetAccount lookup for out-of-bounds indices (shouldn't happen for valid mainnet transactions). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Convert newReservedAccts slice to NewReservedAcctsSet map (exported from sealevel)
- Change isWritable/IsWritable to take programIDSet map for O(1) lookup
- Build programIDSet once per tx instead of calling GetProgramIDs per account
- Convert writablePubkeys slice to map for recordStakeAndVoteAccounts
- Add capacity hints to frequently-allocated maps:
  - instructionAcctPubkeys: len(tx.Message.AccountKeys)
  - validatedLoaders: 4 (usually ≤4 loaders)
  - ModifiedVoteStates: 8
  - pkToAcct: len(b.Transactions)*4
  - alreadyAdded: len(slotCtx.WritableAccts)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Track per-transaction account clone vs modification rates to quantify copy-on-write optimization potential:
- TxAcctsCloned / TxAcctsClonedBytes: accounts loaded per tx
- TxAcctsTouched / TxAcctsTouchedBytes: accounts actually modified
- Shows modify ratio in 100-slot summary (e.g., "15% modified")

This helps identify how much cloning overhead could be saved with lazy copy-on-write semantics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Thread programIDSet from instrsAndAcctMetasFromTx through to NewRentStateInfo, eliminating 3 redundant builds per transaction.

Before: programIDSet built in 4 places per tx
- instrsAndAcctMetasFromTx
- ProcessTransaction isWritable loop
- NewRentStateInfo (pre-tx)
- NewRentStateInfo (post-tx)

After: programIDSet built once in instrsAndAcctMetasFromTx, passed to all consumers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Return txAcctMetas from loadAndValidateTxAccts and loadAndValidateTxAcctsSimd186 to avoid calling tx.AccountMetaList() twice per transaction. The function is already called during account loading, so we return and reuse that result in ProcessTransaction's writable account iteration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add capacity hint for per-instruction acctMetas slice to avoid reallocation
- Build writablePubkeySet while appending to writablePubkeys, eliminating the second loop over the slice

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove "slow path, prev block wrote my ALT" debug log
- Remove docs/TODO.md (tracked issues moved elsewhere)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

7layermagik and others added 12 commits January 25, 2026 12:53

chore: remove debug log and stale TODO file

bac4bfa


7layermagik force-pushed the 7layer/perf/unified branch from 0167b38 to bac4bfa on January 25, 2026 18:55

smcio approved these changes Jan 27, 2026


smcio merged commit 29b7025 into dev Jan 27, 2026
1 check passed

7layermagik deleted the 7layer/perf/unified branch January 28, 2026 12:06
