perf: unified performance optimizations (simd186, reward distribution, O(1) lookups) #195
Merged
…ateTxAccts

When SIMD-186 is active, loadAndValidateTxAcctsSimd186 was loading each account twice: once for size accumulation (Pass 1) and once for building TransactionAccounts (Pass 2). Each GetAccount call clones the account, causing 2x allocations and data copies per account per transaction.

Changes:
- Add acctCache slice to store accounts from Pass 1
- Reuse cached accounts in Pass 2 instead of re-cloning
- Replace programIdIdxs slice with isProgramIdx boolean mask for O(1) lookup (eliminates slices.Contains linear scan in hot loop)
- Reuse cache in program validation loop via tx.Message.Instructions index

Impact: ~50% reduction in account allocations/copies per transaction, reduced GC pressure during high-throughput replay.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
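A minimal sketch of the two-pass pattern described in this commit, assuming a store whose GetAccount clones on every call. The types `Account` and `store` and the function `loadTwoPass` are hypothetical stand-ins, not the repository's actual code; only the ideas of a Pass 1 cache and a boolean program-index mask come from the commit message.

```go
package main

import "fmt"

// Account is a hypothetical stand-in for the real account type.
type Account struct {
	Key  string
	Data []byte
}

// store is a hypothetical account source whose GetAccount clones on every call.
type store struct {
	accts  map[string]*Account
	clones int // number of clones handed out, kept only for illustration
}

func (s *store) GetAccount(key string) *Account {
	src := s.accts[key]
	s.clones++
	return &Account{Key: src.Key, Data: append([]byte(nil), src.Data...)}
}

// loadTwoPass clones each account once in pass 1 and reuses the clone in pass 2.
func loadTwoPass(s *store, keys []string, programIdxs []int) (total int, loaded []*Account, isProgram []bool) {
	// Boolean mask: O(1) "is this index a program?" instead of a linear scan.
	isProgram = make([]bool, len(keys))
	for _, idx := range programIdxs {
		if idx >= 0 && idx < len(keys) {
			isProgram[idx] = true
		}
	}

	// Pass 1: accumulate sizes and remember every clone.
	cache := make([]*Account, len(keys))
	for i, k := range keys {
		a := s.GetAccount(k)
		cache[i] = a
		total += len(a.Data)
	}

	// Pass 2: build the result from the cache; no second GetAccount call.
	loaded = make([]*Account, 0, len(keys))
	for i := range keys {
		loaded = append(loaded, cache[i])
	}
	return total, loaded, isProgram
}

func main() {
	s := &store{accts: map[string]*Account{
		"a": {Key: "a", Data: make([]byte, 10)},
		"b": {Key: "b", Data: make([]byte, 20)},
	}}
	total, loaded, mask := loadTwoPass(s, []string{"a", "b"}, []int{1})
	fmt.Println(total, len(loaded), mask, s.clones)
}
```

With two keys the sketch performs two clones rather than four, which is the halving the commit describes.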
- Add MarshalStakeStakeInto to write stake state directly into existing buffer, eliminating ~600MB of allocations during reward distribution
- Remove unnecessary ants.Release() calls that were tearing down global ants state after each partition (4 occurrences)
- Add InRewardsWindow flag to AccountsDb to skip caching stake accounts during reward distribution (prevents cache pollution from 1.25M one-shot reads)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
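The "marshal into an existing buffer" idea can be sketched as follows. The `stakeState` struct, its 16-byte layout, and `marshalStakeInto` are assumptions made for illustration; they do not reflect the real stake account layout or the signature of MarshalStakeStakeInto.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// stakeState is a hypothetical two-field state; the real layout differs.
type stakeState struct {
	Delegated uint64
	Credits   uint64
}

const stakeStateSize = 16 // bytes written by marshalStakeInto

// marshalStakeInto writes s into dst in place and allocates nothing.
func marshalStakeInto(dst []byte, s stakeState) error {
	if len(dst) < stakeStateSize {
		return errors.New("destination buffer too small")
	}
	binary.LittleEndian.PutUint64(dst[0:8], s.Delegated)
	binary.LittleEndian.PutUint64(dst[8:16], s.Credits)
	return nil
}

func main() {
	// The account's existing data buffer is reused across updates.
	buf := make([]byte, 200)
	err := marshalStakeInto(buf, stakeState{Delegated: 5, Credits: 7})
	fmt.Println(err, buf[:stakeStateSize])
}
```

The caller owns the buffer, so repeated updates to the same account avoid a fresh allocation per serialization.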
- Add MarshalStakeStakeInto for zero-allocation stake serialization
- Add InRewardsWindow atomic.Bool to skip stake account caching during rewards
- Cache bypass on both read and write paths (prevents cache thrashing)
- Remove unnecessary ants.Release() calls (4x)
- Add docs/TODO.md tracking known issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
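A rough sketch of a cache bypass guarded by an atomic.Bool, assuming a simple map-backed cache. The `accountsDB` type and its `get`/`put` methods are hypothetical. The sketch bypasses the cache for every key while the flag is set, whereas the commit restricts the bypass to stake accounts, and the plain maps here are not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// accountsDB is a hypothetical store with a read-through cache.
type accountsDB struct {
	inRewardsWindow atomic.Bool
	cache           map[string][]byte
	backing         map[string][]byte
}

func newAccountsDB(backing map[string][]byte) *accountsDB {
	return &accountsDB{cache: make(map[string][]byte), backing: backing}
}

// get reads through the cache but does not populate it during the rewards window.
func (db *accountsDB) get(key string) []byte {
	if v, ok := db.cache[key]; ok {
		return v
	}
	v := db.backing[key]
	if !db.inRewardsWindow.Load() {
		db.cache[key] = v
	}
	return v
}

// put writes to the backing store and skips the cache during the rewards window.
func (db *accountsDB) put(key string, val []byte) {
	db.backing[key] = val
	if db.inRewardsWindow.Load() {
		delete(db.cache, key) // drop any stale copy instead of caching the new value
		return
	}
	db.cache[key] = val
}

func main() {
	db := newAccountsDB(map[string][]byte{"stake1": {1}})
	db.inRewardsWindow.Store(true)
	db.get("stake1")
	db.put("stake1", []byte{2})
	fmt.Println(len(db.cache)) // one-shot accesses left the cache untouched
	db.inRewardsWindow.Store(false)
	db.get("stake1")
	fmt.Println(len(db.cache))
}
```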
- Add WorkerPool field to PartitionedRewardDistributionInfo
- Add rewardDistributionTask struct to carry per-task context
- Create pool once on first partition, reuse for all 243 partitions
- Release pool when NumRewardPartitionsRemaining == 0
- Eliminates 243× pool create/destroy cycles during rewards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
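The create-once, release-at-the-end lifecycle might look like the sketch below. It uses a small channel-based pool from the standard library as a stand-in for the ants pool used in the PR; `pool`, `distributor`, and `distributePartition` are hypothetical names, and the per-item work is a placeholder sum.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a tiny stdlib worker pool standing in for the real one.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for t := range p.tasks {
				t()
			}
		}()
	}
	return p
}

func (p *pool) submit(t func()) { p.tasks <- t }
func (p *pool) release()        { close(p.tasks); p.wg.Wait() }

// distributor carries the pool across partitions, as the commit describes.
type distributor struct {
	pool      *pool
	remaining int // partitions still to process
	created   int // number of pools created, for illustration
}

func (d *distributor) distributePartition(items []int) int64 {
	if d.pool == nil { // first partition: create the pool once
		d.pool = newPool(4)
		d.created++
	}
	var total atomic.Int64
	var wg sync.WaitGroup
	for _, it := range items {
		it := it
		wg.Add(1)
		d.pool.submit(func() { defer wg.Done(); total.Add(int64(it)) })
	}
	wg.Wait()
	d.remaining--
	if d.remaining == 0 { // last partition: release the pool
		d.pool.release()
		d.pool = nil
	}
	return total.Load()
}

func main() {
	d := &distributor{remaining: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(d.distributePartition([]int{1, 2, 3}))
	}
	fmt.Println("pools created:", d.created)
}
```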
- "Using snapshot file:" → "Using full snapshot:"
- "Parsing manifest from {path}" → "Parsing full/incremental snapshot manifest..."
- Remove redundant path repetition after initial "Using" lines
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds defensive bounds check to prevent panic if ProgramIDIndex is out of range. Falls back to GetAccount lookup for out-of-bounds indices (shouldn't happen for valid mainnet transactions). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
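A minimal sketch of the bounds-checked cache access with a fallback, assuming the cache is indexed by account position. `account`, `programAccount`, and the `lookup` callback are hypothetical; the real fallback is a GetAccount call whose signature is not shown in this PR text.

```go
package main

import (
	"errors"
	"fmt"
)

type account struct{ Key string }

// programAccount returns the cached account when idx is in range and
// otherwise falls back to lookup instead of panicking on a bad index.
func programAccount(cache []*account, idx int, lookup func(int) (*account, error)) (*account, error) {
	if idx >= 0 && idx < len(cache) && cache[idx] != nil {
		return cache[idx], nil
	}
	return lookup(idx)
}

func main() {
	cache := []*account{{Key: "program"}}
	slow := func(idx int) (*account, error) {
		return nil, errors.New("index out of range")
	}
	a, err := programAccount(cache, 0, slow)
	fmt.Println(a.Key, err)
	_, err = programAccount(cache, 7, slow)
	fmt.Println(err)
}
```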
- Convert newReservedAccts slice to NewReservedAcctsSet map (exported from sealevel)
- Change isWritable/IsWritable to take programIDSet map for O(1) lookup
- Build programIDSet once per tx instead of calling GetProgramIDs per account
- Convert writablePubkeys slice to map for recordStakeAndVoteAccounts
- Add capacity hints to frequently-allocated maps:
  - instructionAcctPubkeys: len(tx.Message.AccountKeys)
  - validatedLoaders: 4 (usually ≤4 loaders)
  - ModifiedVoteStates: 8
  - pkToAcct: len(b.Transactions)*4
  - alreadyAdded: len(slotCtx.WritableAccts)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
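The slice-to-set conversion and the capacity hints can be illustrated with the sketch below. `pubkey`, `newSet`, and this `isWritable` are hypothetical, and the writability rule shown (flagged writable, not reserved, not a program ID) is a deliberate simplification rather than the rule implemented in the repository.

```go
package main

import "fmt"

type pubkey [32]byte

// newSet builds a membership set once, with a capacity hint so the map
// does not have to grow while it is filled.
func newSet(keys []pubkey) map[pubkey]struct{} {
	set := make(map[pubkey]struct{}, len(keys))
	for _, k := range keys {
		set[k] = struct{}{}
	}
	return set
}

// isWritable uses two O(1) map lookups in place of linear slice scans.
// Simplified rule, for illustration only.
func isWritable(key pubkey, flaggedWritable bool, reserved, programIDs map[pubkey]struct{}) bool {
	if !flaggedWritable {
		return false
	}
	if _, ok := reserved[key]; ok {
		return false
	}
	if _, ok := programIDs[key]; ok {
		return false
	}
	return true
}

func main() {
	a, b, c := pubkey{1}, pubkey{2}, pubkey{3}
	reserved := newSet([]pubkey{a})   // built once, not per account
	programIDs := newSet([]pubkey{b}) // built once per transaction
	fmt.Println(
		isWritable(a, true, reserved, programIDs),
		isWritable(b, true, reserved, programIDs),
		isWritable(c, true, reserved, programIDs),
	)
}
```

Building each set once per transaction moves the cost from one scan per account check to one pass over the keys.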
Track per-transaction account clone vs modification rates to quantify copy-on-write optimization potential:
- TxAcctsCloned / TxAcctsClonedBytes: accounts loaded per tx
- TxAcctsTouched / TxAcctsTouchedBytes: accounts actually modified
- Shows modify ratio in 100-slot summary (e.g., "15% modified")

This helps identify how much cloning overhead could be saved with lazy copy-on-write semantics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
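One way the counters and the modify ratio could be computed is sketched here. The field names follow the commit message, but the `cloneStats` struct, its methods, and the summary format are assumptions.

```go
package main

import "fmt"

// cloneStats groups the four counters named in the commit message.
type cloneStats struct {
	TxAcctsCloned       uint64
	TxAcctsClonedBytes  uint64
	TxAcctsTouched      uint64
	TxAcctsTouchedBytes uint64
}

// record counts one cloned account and, if it was modified, one touched account.
func (s *cloneStats) record(size int, modified bool) {
	s.TxAcctsCloned++
	s.TxAcctsClonedBytes += uint64(size)
	if modified {
		s.TxAcctsTouched++
		s.TxAcctsTouchedBytes += uint64(size)
	}
}

// modifyPct is the share of cloned accounts that were actually modified.
func (s *cloneStats) modifyPct() float64 {
	if s.TxAcctsCloned == 0 {
		return 0
	}
	return float64(s.TxAcctsTouched) * 100 / float64(s.TxAcctsCloned)
}

func main() {
	var s cloneStats
	for i := 0; i < 20; i++ {
		s.record(100, i < 3) // 3 of 20 cloned accounts are modified
	}
	fmt.Printf("%.0f%% modified\n", s.modifyPct())
}
```

A low ratio suggests most clones are never written, which is the case where lazy copy-on-write would save the most.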
Thread programIDSet from instrsAndAcctMetasFromTx through to NewRentStateInfo, eliminating 3 redundant builds per transaction:

Before: programIDSet built in 4 places per tx
- instrsAndAcctMetasFromTx
- ProcessTransaction isWritable loop
- NewRentStateInfo (pre-tx)
- NewRentStateInfo (post-tx)

After: programIDSet built once in instrsAndAcctMetasFromTx, passed to all consumers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return txAcctMetas from loadAndValidateTxAccts and loadAndValidateTxAcctsSimd186 to avoid calling tx.AccountMetaList() twice per transaction. The function is already called during account loading, so we return and reuse that result in ProcessTransaction's writable account iteration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add capacity hint for per-instruction acctMetas slice to avoid reallocation
- Build writablePubkeySet while appending to writablePubkeys, eliminating the second loop over the slice

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
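Building the set in the same loop that fills the slice can be sketched as below. `meta` and `collectWritable` are hypothetical names; only the single-loop idea and the capacity hints come from the commit message.

```go
package main

import "fmt"

// meta is a hypothetical account meta with just the fields the sketch needs.
type meta struct {
	Key      string
	Writable bool
}

// collectWritable fills the ordered slice and the lookup set in one pass.
func collectWritable(metas []meta) ([]string, map[string]struct{}) {
	// Capacity hints: at most len(metas) entries, so neither container regrows.
	keys := make([]string, 0, len(metas))
	set := make(map[string]struct{}, len(metas))
	for _, m := range metas {
		if !m.Writable {
			continue
		}
		keys = append(keys, m.Key)
		set[m.Key] = struct{}{} // built here, so no second loop over keys
	}
	return keys, set
}

func main() {
	keys, set := collectWritable([]meta{
		{Key: "a", Writable: true},
		{Key: "b", Writable: false},
		{Key: "c", Writable: true},
	})
	_, hasB := set["b"]
	fmt.Println(keys, len(set), hasB)
}
```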
- Remove "slow path, prev block wrote my ALT" debug log
- Remove docs/TODO.md (tracked issues moved elsewhere)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
0167b38 to bac4bfa
smcio approved these changes Jan 27, 2026
Summary
Unified performance PR combining non-cache optimizations from multiple branches:
Changes
1. Account Memoization (simd186)
Cache accounts loaded in Pass 1 to avoid re-cloning in Pass 2 of loadAndValidateTxAcctsSimd186.
2. Reward Distribution Optimizations
3. O(1) Lookups and Capacity Hints
- newReservedAccts check
- programIDs check in isWritable
- writablePubkeys in recordStakeAndVoteAccounts

Capacity hints added:
- instructionAcctPubkeys: len(tx.Message.AccountKeys)
- validatedLoaders: 4
- ModifiedVoteStates: 8
- pkToAcct: len(b.Transactions)*4
- alreadyAdded: len(slotCtx.WritableAccts)

4. Clone Stats Profiling
Track per-transaction account clone vs modification rates:
Files Changed
- pkg/replay/accounts.go
- pkg/replay/transaction.go
- pkg/replay/block.go
- pkg/replay/rewards.go
- pkg/sealevel/sealevel.go
- pkg/rent/rent.go
- pkg/replay/topsort_planner.go
- pkg/snapshot/build_db*.go

Test Plan
- go build ./cmd/mithril/... ./pkg/replay/... passes

🤖 Generated with Claude Code