perf: unified performance optimizations (simd186, reward distribution, O(1) lookups) #195

Merged: smcio merged 12 commits into dev from 7layer/perf/unified on Jan 27, 2026

7layermagik commented on Jan 24, 2026

Summary

Unified performance PR combining non-cache optimizations from multiple branches:

  • simd186 account memoization - avoid double-clone in loadAndValidateTxAccts
  • Reward distribution optimizations - memory pooling, thread safety, worker pool reuse
  • O(1) lookups - replace slice iterations with map lookups in hot paths
  • Clone stats profiling - track account clone vs modify ratios

Changes

1. Account Memoization (simd186)

Cache accounts loaded in Pass 1 to avoid re-cloning in Pass 2 of loadAndValidateTxAcctsSimd186.
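
A minimal sketch of the Pass 1 / Pass 2 memoization idea described above. The `account` and `store` types and the `loadAccounts` function are hypothetical stand-ins; the real code lives in loadAndValidateTxAcctsSimd186, whose signature is not shown in this PR description.

```go
package main

import "fmt"

// account is a hypothetical stand-in for the replay account type.
type account struct {
	Data []byte
}

// store is a hypothetical accounts DB whose GetAccount clones on every call.
type store struct {
	accts  map[string]*account
	clones int // number of clones performed, kept only for illustration
}

func (s *store) GetAccount(key string) (*account, bool) {
	a, ok := s.accts[key]
	if !ok {
		return nil, false
	}
	s.clones++
	return &account{Data: append([]byte(nil), a.Data...)}, true
}

// loadAccounts accumulates the total data size in pass 1 and builds the
// result in pass 2, reusing the clones cached during pass 1.
func loadAccounts(s *store, keys []string) ([]*account, int) {
	cache := make([]*account, len(keys)) // index-aligned with keys
	total := 0
	for i, k := range keys { // pass 1: size accumulation
		if a, ok := s.GetAccount(k); ok {
			cache[i] = a
			total += len(a.Data)
		}
	}
	out := make([]*account, 0, len(keys))
	for i := range keys { // pass 2: reuse the cached clone, no second GetAccount
		if cache[i] != nil {
			out = append(out, cache[i])
		}
	}
	return out, total
}

func main() {
	s := &store{accts: map[string]*account{
		"a": {Data: make([]byte, 10)},
		"b": {Data: make([]byte, 20)},
	}}
	accts, total := loadAccounts(s, []string{"a", "b"})
	fmt.Println(len(accts), total, s.clones)
}
```

Each account is cloned once rather than twice, which is where the roughly halved allocation count would come from under these assumptions.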

2. Reward Distribution Optimizations

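  • Serialize stake state directly into an existing buffer (MarshalStakeStakeInto), eliminating ~600MB of allocations during reward distribution
  • Add an InRewardsWindow atomic.Bool so stake accounts bypass the cache on both read and write paths during reward distribution
  • Create the worker pool once and reuse it across all reward partitions; remove 4 unnecessary ants.Release() calls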

3. O(1) Lookups and Capacity Hints

| Change | Before | After |
|---|---|---|
| newReservedAccts check | O(n) slice search | O(1) map lookup |
| programIDs check in isWritable | O(n) per account | O(1) map, built once per tx |
| writablePubkeys in recordStakeAndVoteAccounts | O(n*m) nested loop | O(1) map lookup |
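
The slice-to-set conversions in the table could look roughly like the sketch below. The `pubkey` type and the `buildSet` and `countMembers` helpers are illustrative assumptions, not names from the codebase.

```go
package main

import "fmt"

// pubkey is a hypothetical stand-in for a 32-byte public key.
type pubkey [32]byte

// buildSet converts a slice into a set once, so later membership checks
// are a single map lookup instead of a linear scan.
func buildSet(keys []pubkey) map[pubkey]struct{} {
	set := make(map[pubkey]struct{}, len(keys))
	for _, k := range keys {
		set[k] = struct{}{}
	}
	return set
}

// countMembers counts how many candidates are in the set. With a slice this
// would be a nested loop (O(n*m)); with a set it is O(n).
func countMembers(candidates []pubkey, set map[pubkey]struct{}) int {
	n := 0
	for _, c := range candidates {
		if _, ok := set[c]; ok {
			n++
		}
	}
	return n
}

func main() {
	writable := buildSet([]pubkey{{1}, {2}, {3}})
	fmt.Println(countMembers([]pubkey{{2}, {3}, {9}}, writable))
}
```

The set pays off only when it is built once and queried many times, which is why the PR builds it per transaction rather than per account.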

Capacity hints added:

  • instructionAcctPubkeys: len(tx.Message.AccountKeys)
  • validatedLoaders: 4
  • ModifiedVoteStates: 8
  • pkToAcct: len(b.Transactions)*4
  • alreadyAdded: len(slotCtx.WritableAccts)
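
As an illustration of what a capacity hint does, the sketch below pre-sizes a map the way the pkToAcct hint is described. The `indexAccounts` function and the use of string keys are assumptions made for the example.

```go
package main

import "fmt"

// indexAccounts maps each account key to the index of the first transaction
// that references it. txs is a hypothetical list of per-transaction key lists.
func indexAccounts(txs [][]string) map[string]int {
	// Capacity hint: assume about 4 accounts per transaction so the map
	// rarely has to grow and rehash while it is being filled.
	idx := make(map[string]int, len(txs)*4)
	for i, tx := range txs {
		for _, k := range tx {
			if _, seen := idx[k]; !seen {
				idx[k] = i
			}
		}
	}
	return idx
}

func main() {
	idx := indexAccounts([][]string{{"a", "b"}, {"b", "c"}})
	fmt.Println(len(idx), idx["a"], idx["b"], idx["c"])
}
```

The hint is only an estimate; an undersized hint still works, it just triggers the growth it was meant to avoid.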

4. Clone Stats Profiling

Track per-transaction account clone vs modification rates:

clone stats: 15.2% modified (1523/10000 accts) | 45.3MB cloned, 6.8MB modified | avg/tx: 8.2 cloned, 1.2 modified
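
A sketch of how a summary line in this format could be derived from raw counters. The counter names follow the commit message (TxAcctsCloned, TxAcctsTouched and their byte variants); the `cloneStats` struct, the `Txs` field, the `Summary` method, and the use of decimal megabytes are assumptions.

```go
package main

import "fmt"

// cloneStats is a hypothetical container for the counters named in the PR.
type cloneStats struct {
	Txs                 uint64 // transactions observed (assumed field)
	TxAcctsCloned       uint64 // accounts loaded (cloned)
	TxAcctsClonedBytes  uint64
	TxAcctsTouched      uint64 // accounts actually modified
	TxAcctsTouchedBytes uint64
}

// Summary formats the counters as a single log line.
func (s cloneStats) Summary() string {
	if s.Txs == 0 || s.TxAcctsCloned == 0 {
		return "clone stats: no data"
	}
	const mb = 1e6 // assumption: decimal megabytes
	return fmt.Sprintf(
		"clone stats: %.1f%% modified (%d/%d accts) | %.1fMB cloned, %.1fMB modified | avg/tx: %.1f cloned, %.1f modified",
		100*float64(s.TxAcctsTouched)/float64(s.TxAcctsCloned),
		s.TxAcctsTouched, s.TxAcctsCloned,
		float64(s.TxAcctsClonedBytes)/mb,
		float64(s.TxAcctsTouchedBytes)/mb,
		float64(s.TxAcctsCloned)/float64(s.Txs),
		float64(s.TxAcctsTouched)/float64(s.Txs),
	)
}

func main() {
	s := cloneStats{
		Txs:                 1220,
		TxAcctsCloned:       10000,
		TxAcctsClonedBytes:  45_300_000,
		TxAcctsTouched:      1523,
		TxAcctsTouchedBytes: 6_800_000,
	}
	fmt.Println(s.Summary())
}
```

If the counters are updated from multiple goroutines, they would need atomic increments or a mutex; the sketch leaves that out.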

Files Changed

| File | Changes |
|---|---|
| pkg/replay/accounts.go | simd186 memoization, capacity hints, clone stats |
| pkg/replay/transaction.go | O(1) lookups, capacity hints, clone stats |
| pkg/replay/block.go | Clone stats summary, capacity hint |
| pkg/replay/rewards.go | Reward distribution optimizations |
| pkg/sealevel/sealevel.go | Exported NewReservedAcctsSet, IsWritable signature |
| pkg/rent/rent.go | programIDSet for IsWritable |
| pkg/replay/topsort_planner.go | Capacity hint |
| pkg/snapshot/build_db*.go | Log message consistency |

Test Plan

  • go build ./cmd/mithril/... ./pkg/replay/... passes
  • Run on mainnet - bank hashes match
  • Check 100-slot summary shows clone stats

🤖 Generated with Claude Code

7layermagik and others added 12 commits January 25, 2026 12:53
…ateTxAccts

When SIMD-186 is active, loadAndValidateTxAcctsSimd186 was loading each
account twice: once for size accumulation (Pass 1) and once for building
TransactionAccounts (Pass 2). Each GetAccount call clones the account,
causing 2x allocations and data copies per account per transaction.

Changes:
- Add acctCache slice to store accounts from Pass 1
- Reuse cached accounts in Pass 2 instead of re-cloning
- Replace programIdIdxs slice with isProgramIdx boolean mask for O(1) lookup
  (eliminates slices.Contains linear scan in hot loop)
- Reuse cache in program validation loop via tx.Message.Instructions index

Impact: ~50% reduction in account allocations/copies per transaction,
reduced GC pressure during high-throughput replay.
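
The boolean-mask replacement for slices.Contains can be sketched as follows. `programMask` and its parameters are hypothetical names; only the isProgramIdx idea comes from the commit message.

```go
package main

import "fmt"

// programMask returns a mask where mask[i] is true if account index i is
// used as a program ID. Building it costs one pass over programIdxs; each
// later check is a slice index instead of a linear scan.
func programMask(numAccounts int, programIdxs []int) []bool {
	mask := make([]bool, numAccounts)
	for _, idx := range programIdxs {
		if idx >= 0 && idx < numAccounts { // ignore out-of-range indices
			mask[idx] = true
		}
	}
	return mask
}

func main() {
	isProgramIdx := programMask(4, []int{1, 3})
	for i, isProgram := range isProgramIdx {
		fmt.Println(i, isProgram)
	}
}
```

A mask fits here because account indices are small and dense; a map would do the same job with more overhead per lookup.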

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MarshalStakeStakeInto to write stake state directly into existing
  buffer, eliminating ~600MB of allocations during reward distribution
- Remove unnecessary ants.Release() calls that were tearing down global
  ants state after each partition (4 occurrences)
- Add InRewardsWindow flag to AccountsDb to skip caching stake accounts
  during reward distribution (prevents cache pollution from 1.25M one-shot reads)
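
A sketch of the marshal-into-existing-buffer pattern. The `stakeState` layout, `MarshalInto`, and `errShortBuffer` are simplified assumptions; the real MarshalStakeStakeInto signature and stake layout are not shown in this PR description.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// stakeState is a hypothetical, simplified stake record.
type stakeState struct {
	DelegatedLamports uint64
	CreditsObserved   uint64
}

const stakeStateSize = 16 // two little-endian uint64 fields

var errShortBuffer = errors.New("destination buffer too small")

// MarshalInto writes s into dst in place and allocates nothing, unlike a
// Marshal that returns a freshly allocated []byte on every call.
func (s *stakeState) MarshalInto(dst []byte) error {
	if len(dst) < stakeStateSize {
		return errShortBuffer
	}
	binary.LittleEndian.PutUint64(dst[0:8], s.DelegatedLamports)
	binary.LittleEndian.PutUint64(dst[8:16], s.CreditsObserved)
	return nil
}

func main() {
	buf := make([]byte, stakeStateSize) // e.g. an account's existing data slice
	for i := uint64(1); i <= 3; i++ {
		s := stakeState{DelegatedLamports: 100 * i, CreditsObserved: i}
		if err := s.MarshalInto(buf); err != nil {
			panic(err)
		}
	}
	fmt.Println(binary.LittleEndian.Uint64(buf[0:8]), binary.LittleEndian.Uint64(buf[8:16]))
}
```

The caller owns the buffer, so one allocation can serve every stake account processed in a partition.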

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MarshalStakeStakeInto for zero-allocation stake serialization
- Add InRewardsWindow atomic.Bool to skip stake account caching during rewards
- Cache bypass on both read and write paths (prevents cache thrashing)
- Remove unnecessary ants.Release() calls (4x)
- Add docs/TODO.md tracking known issues
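
The cache bypass can be sketched with an atomic.Bool guarding cache fills. Only the InRewardsWindow name comes from the PR; the `accountsDB` layout, `Get`, `Put`, and the `disk` map standing in for persistent storage are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// accountsDB is a hypothetical accounts store with a read-through cache.
type accountsDB struct {
	InRewardsWindow atomic.Bool // set while rewards are being distributed

	mu    sync.Mutex
	cache map[string][]byte
	disk  map[string][]byte // stand-in for the persistent store
}

func newAccountsDB(disk map[string][]byte) *accountsDB {
	return &accountsDB{cache: map[string][]byte{}, disk: disk}
}

// Get reads through the cache but skips the cache fill inside the rewards
// window, so one-shot reads do not evict or crowd out hot entries.
func (db *accountsDB) Get(key string) ([]byte, bool) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if v, ok := db.cache[key]; ok {
		return v, true
	}
	v, ok := db.disk[key]
	if ok && !db.InRewardsWindow.Load() {
		db.cache[key] = v
	}
	return v, ok
}

// Put always persists, and updates the cache only outside the window.
func (db *accountsDB) Put(key string, val []byte) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.disk[key] = val
	if db.InRewardsWindow.Load() {
		delete(db.cache, key) // avoid serving a stale cached value later
		return
	}
	db.cache[key] = val
}

func (db *accountsDB) cachedLen() int {
	db.mu.Lock()
	defer db.mu.Unlock()
	return len(db.cache)
}

func main() {
	db := newAccountsDB(map[string][]byte{"stake": {1}, "vote": {2}})
	db.InRewardsWindow.Store(true)
	db.Get("stake")
	fmt.Println("cached during window:", db.cachedLen())
	db.InRewardsWindow.Store(false)
	db.Get("vote")
	fmt.Println("cached after window:", db.cachedLen())
}
```

Deleting the cached entry on a bypassed write is a design choice of this sketch; whether the real write path invalidates or simply skips is not stated in the PR.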

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add WorkerPool field to PartitionedRewardDistributionInfo
- Add rewardDistributionTask struct to carry per-task context
- Create pool once on first partition, reuse for all 243 partitions
- Release pool when NumRewardPartitionsRemaining == 0
- Eliminates 243× pool create/destroy cycles during rewards
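
A sketch of the create-once, release-at-the-end pool lifecycle. The PR uses the ants library; to stay self-contained this example swaps in a small channel-based pool from the standard library, and `workerPool`, `distributor`, and `distributePartition` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool is a minimal stdlib stand-in for a reusable goroutine pool.
type workerPool struct {
	tasks chan func()
	wg    sync.WaitGroup // tracks worker goroutines
}

func newWorkerPool(n int) *workerPool {
	p := &workerPool{tasks: make(chan func())}
	p.wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

func (p *workerPool) Submit(task func()) { p.tasks <- task }

// Release stops the workers after they drain the task channel.
func (p *workerPool) Release() {
	close(p.tasks)
	p.wg.Wait()
}

// distributor keeps the pool alive across partitions.
type distributor struct {
	pool         *workerPool
	remaining    int // partitions left, like NumRewardPartitionsRemaining
	poolsCreated int // for illustration only
}

// distributePartition sums the items of one partition on the shared pool.
func (d *distributor) distributePartition(items []int) int {
	if d.pool == nil { // first partition: create the pool once
		d.pool = newWorkerPool(4)
		d.poolsCreated++
	}
	var (
		mu   sync.Mutex
		sum  int
		done sync.WaitGroup
	)
	done.Add(len(items))
	for _, it := range items {
		it := it // capture per-iteration value
		d.pool.Submit(func() {
			defer done.Done()
			mu.Lock()
			sum += it
			mu.Unlock()
		})
	}
	done.Wait()
	d.remaining--
	if d.remaining == 0 { // last partition: release the pool
		d.pool.Release()
		d.pool = nil
	}
	return sum
}

func main() {
	d := &distributor{remaining: 3}
	for p := 0; p < 3; p++ {
		fmt.Println(d.distributePartition([]int{1, 2, 3}))
	}
	fmt.Println("pools created:", d.poolsCreated)
}
```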

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- "Using snapshot file:" → "Using full snapshot:"
- "Parsing manifest from {path}" → "Parsing full/incremental snapshot manifest..."
- Remove redundant path repetition after initial "Using" lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds defensive bounds check to prevent panic if ProgramIDIndex is
out of range. Falls back to GetAccount lookup for out-of-bounds
indices (shouldn't happen for valid mainnet transactions).
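
The defensive check might look like the sketch below. `programAccount`, the `account` type, and the `load` callback standing in for GetAccount are assumptions.

```go
package main

import "fmt"

// account is a hypothetical stand-in for the replay account type.
type account struct {
	Owner string
}

// programAccount returns the cached account at idx when idx is in range and
// populated; otherwise it falls back to load instead of panicking.
func programAccount(cache []*account, idx int, key string,
	load func(string) (*account, bool)) (*account, bool) {
	if idx >= 0 && idx < len(cache) && cache[idx] != nil {
		return cache[idx], true
	}
	return load(key) // slow path: not expected for valid transactions
}

func main() {
	cache := []*account{{Owner: "cached"}}
	load := func(string) (*account, bool) { return &account{Owner: "loaded"}, true }

	a, _ := programAccount(cache, 0, "k0", load)
	b, _ := programAccount(cache, 7, "k7", load)
	fmt.Println(a.Owner, b.Owner)
}
```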

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Convert newReservedAccts slice to NewReservedAcctsSet map (exported from sealevel)
- Change isWritable/IsWritable to take programIDSet map for O(1) lookup
- Build programIDSet once per tx instead of calling GetProgramIDs per account
- Convert writablePubkeys slice to map for recordStakeAndVoteAccounts
- Add capacity hints to frequently-allocated maps:
  - instructionAcctPubkeys: len(tx.Message.AccountKeys)
  - validatedLoaders: 4 (usually ≤4 loaders)
  - ModifiedVoteStates: 8
  - pkToAcct: len(b.Transactions)*4
  - alreadyAdded: len(slotCtx.WritableAccts)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track per-transaction account clone vs modification rates to quantify
copy-on-write optimization potential:

- TxAcctsCloned / TxAcctsClonedBytes: accounts loaded per tx
- TxAcctsTouched / TxAcctsTouchedBytes: accounts actually modified
- Shows modify ratio in 100-slot summary (e.g., "15% modified")

This helps identify how much cloning overhead could be saved with
lazy copy-on-write semantics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thread programIDSet from instrsAndAcctMetasFromTx through to
NewRentStateInfo, eliminating 3 redundant builds per transaction:

Before: programIDSet built in 4 places per tx
- instrsAndAcctMetasFromTx
- ProcessTransaction isWritable loop
- NewRentStateInfo (pre-tx)
- NewRentStateInfo (post-tx)

After: programIDSet built once in instrsAndAcctMetasFromTx,
passed to all consumers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return txAcctMetas from loadAndValidateTxAccts and loadAndValidateTxAcctsSimd186
to avoid calling tx.AccountMetaList() twice per transaction. The function is
already called during account loading, so we return and reuse that result in
ProcessTransaction's writable account iteration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add capacity hint for per-instruction acctMetas slice to avoid reallocation
- Build writablePubkeySet while appending to writablePubkeys, eliminating
  the second loop over the slice
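
Building the set in the same loop that appends to the slice can be sketched as follows. The `acctMeta` type and `writableKeys` function are hypothetical; only the writablePubkeys / writablePubkeySet pairing comes from the commit message.

```go
package main

import "fmt"

// acctMeta is a hypothetical, simplified account meta.
type acctMeta struct {
	Key      string
	Writable bool
}

// writableKeys returns the writable keys in order plus a set of the same
// keys, filling both in a single pass over metas.
func writableKeys(metas []acctMeta) ([]string, map[string]struct{}) {
	keys := make([]string, 0, len(metas))
	set := make(map[string]struct{}, len(metas))
	for _, m := range metas {
		if !m.Writable {
			continue
		}
		keys = append(keys, m.Key)
		set[m.Key] = struct{}{} // same iteration, no second loop over keys
	}
	return keys, set
}

func main() {
	keys, set := writableKeys([]acctMeta{
		{Key: "a", Writable: true},
		{Key: "b", Writable: false},
		{Key: "c", Writable: true},
	})
	_, hasB := set["b"]
	fmt.Println(keys, len(set), hasB)
}
```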

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove "slow path, prev block wrote my ALT" debug log
- Remove docs/TODO.md (tracked issues moved elsewhere)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
smcio merged commit 29b7025 into dev on Jan 27, 2026
1 check passed
7layermagik deleted the 7layer/perf/unified branch on January 28, 2026 12:06