-
Notifications
You must be signed in to change notification settings - Fork 69
fix(consensus): fix private chain initialization #1987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev-upgrade
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Fixes consensus v2 edge cases by switching vote signature verification back to using snapshot-derived masternode lists (to avoid requiring the proposed block header to be present) and tightening epoch/chain initialization logic to prevent underflow during TC epoch resolution.
Changes:
- Verify vote signatures against
snapshot.NextEpochCandidatesinstead ofepochInfo.Masternodes. - Adjust TC epoch lookup loop to avoid going below epoch 0 (intended).
- Improve error messages and enrich mining logs with round context; minor gaslimit handling tweak for genesis parent.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| consensus/XDPoS/utils/pool.go | Adds Pool.Inspect() helper to log pool contents. |
| consensus/XDPoS/engines/engine_v2/vote.go | Changes vote verification to use snapshot candidates; adds additional logging. |
| consensus/XDPoS/engines/engine_v2/verifyHeader.go | Minor whitespace-only change. |
| consensus/XDPoS/engines/engine_v2/utils.go | Improves GetBlockByEpochNumber error messages with context. |
| consensus/XDPoS/engines/engine_v2/timeout.go | Modifies TC epoch selection loop condition to address underflow. |
| consensus/XDPoS/engines/engine_v2/mining.go | Adds round to “your turn” info logs. |
| consensus/XDPoS/engines/engine_v2/engine.go | Aligns gaslimit verification behavior around genesis parent handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| log.Warn("[VerifyVoteMessage] check", "verified", verified, "error", err, "votedBlockHash", vote.ProposedBlockInfo.Hash.Hex(), "number", vote.ProposedBlockInfo.Number, "round", vote.ProposedBlockInfo.Round) | ||
|
|
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log.Warn("[VerifyVoteMessage] check", ...) runs for every vote verification (including successful ones), which can create extremely noisy logs on busy networks and hide actionable warnings. Consider removing it or lowering to Debug/Trace, and/or only logging when err != nil or !verified.
| } | ||
| log.Info("[getTCEpochInfo] Init epochInfo", "number", epochBlockInfo.Number, "round", epochRound, "tcRound", timeoutRound, "tcEpoch", tempTCEpoch) | ||
| for epochBlockInfo.Round > timeoutRound { | ||
| for epochBlockInfo.Round > timeoutRound || tempTCEpoch > 0 { |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop condition epochBlockInfo.Round > timeoutRound || tempTCEpoch > 0 does not prevent tempTCEpoch-- from underflowing when tempTCEpoch == 0 but epochBlockInfo.Round > timeoutRound is still true (e.g., if timeoutRound is very small). This can wrap tempTCEpoch to math.MaxUint64 and cause incorrect epoch lookups. Guard the decrement (break when tempTCEpoch == 0) or change the loop logic so the decrement only happens when tempTCEpoch > 0 (e.g., use && plus a final iteration that can reach epoch 0 safely).
| for epochBlockInfo.Round > timeoutRound || tempTCEpoch > 0 { | |
| for epochBlockInfo.Round > timeoutRound || tempTCEpoch > 0 { | |
| if tempTCEpoch == 0 { | |
| break | |
| } |
| if err != nil && parent.Number.Uint64() != 0 { // skip genesis block | ||
| return err | ||
| } | ||
|
|
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is trailing whitespace on this blank line; running gofmt should remove it. Some CI/lint setups will fail on whitespace-only diffs.
| p.lock.RLock() | ||
| defer p.lock.RUnlock() | ||
|
|
||
| for poolKey, objMap := range p.objList { | ||
| log.Info("[Inspect] Pool Key:", "poolKey", poolKey, "numObjects", len(objMap)) | ||
| for objHash, obj := range objMap { | ||
| log.Info("[Inspect] Object Hash:", "objHash", objHash.Hex(), "signer", obj.GetSigner().Hex()) | ||
| } | ||
| } |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inspect() holds the pool's read lock while emitting per-object log.Info calls. Logging can be slow and will block writers for the duration of the iteration, which can impact consensus responsiveness under load. Consider copying the minimal data you need (keys/counts/signer) under the lock, then releasing the lock before logging (or lower the log level / make it conditional).
| p.lock.RLock() | |
| defer p.lock.RUnlock() | |
| for poolKey, objMap := range p.objList { | |
| log.Info("[Inspect] Pool Key:", "poolKey", poolKey, "numObjects", len(objMap)) | |
| for objHash, obj := range objMap { | |
| log.Info("[Inspect] Object Hash:", "objHash", objHash.Hex(), "signer", obj.GetSigner().Hex()) | |
| } | |
| } | |
| type poolInfo struct { | |
| poolKey string | |
| numObjects int | |
| } | |
| type objInfo struct { | |
| poolKey string | |
| objHash string | |
| signer string | |
| } | |
| p.lock.RLock() | |
| poolInfos := make([]poolInfo, 0, len(p.objList)) | |
| var objInfos []objInfo | |
| for poolKey, objMap := range p.objList { | |
| poolInfos = append(poolInfos, poolInfo{ | |
| poolKey: poolKey, | |
| numObjects: len(objMap), | |
| }) | |
| for objHash, obj := range objMap { | |
| objInfos = append(objInfos, objInfo{ | |
| poolKey: poolKey, | |
| objHash: objHash.Hex(), | |
| signer: obj.GetSigner().Hex(), | |
| }) | |
| } | |
| } | |
| p.lock.RUnlock() | |
| for _, info := range poolInfos { | |
| log.Info("[Inspect] Pool Key:", "poolKey", info.poolKey, "numObjects", info.numObjects) | |
| } | |
| for _, info := range objInfos { | |
| log.Info("[Inspect] Object Hash:", "objHash", info.objHash, "signer", info.signer) | |
| } |
| snapshot, err := x.getSnapshot(chain, vote.GapNumber, true) | ||
| if err != nil { | ||
| log.Error("[VerifyVoteMessage] Fail to get epochInfo when verifying vote message!", "blockNum", vote.ProposedBlockInfo.Number, "blockHash", vote.ProposedBlockInfo.Hash, "voteHash", vote.Hash(), "voteGapNumber", vote.GapNumber, "err", err) | ||
| log.Error("[VerifyVoteMessage] fail to get snapshot for a vote message", "blockNum", vote.ProposedBlockInfo.Number, "blockHash", vote.ProposedBlockInfo.Hash, "voteHash", vote.Hash(), "error", err.Error()) | ||
| return false, err | ||
| } |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VerifyVoteMessage now always calls getSnapshot and returns an error if the gap header/snapshot isn't available. This breaks the previously supported timing condition where votes can arrive before required chain data is present (and will also make the existing TestVerifyVoteMessage_HeaderNotPresent fail). Consider deferring verification when the gap header/snapshot isn't available yet (return (false, nil) and log at Debug/Trace) instead of logging an error and returning a hard failure.
ecfb357 to
233f6ec
Compare
| snapshot, err := x.getSnapshot(chain, vote.GapNumber, true) | ||
| if err != nil { | ||
| log.Error("[VerifyVoteMessage] Fail to get epochInfo when verifying vote message!", "blockNum", vote.ProposedBlockInfo.Number, "blockHash", vote.ProposedBlockInfo.Hash, "voteHash", vote.Hash(), "voteGapNumber", vote.GapNumber, "err", err) | ||
| log.Error("[VerifyVoteMessage] fail to get snapshot for a vote message", "blockNum", vote.ProposedBlockInfo.Number, "blockHash", vote.ProposedBlockInfo.Hash, "voteHash", vote.Hash(), "error", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does "error", err work in this line ?
| // Ensure gas settings are bounded | ||
| if err := misc.VerifyGaslimit(parent.GasLimit, header.GasLimit); err != nil { | ||
| err = misc.VerifyGaslimit(parent.GasLimit, header.GasLimit) | ||
| if err != nil && parent.Number.Uint64() != 0 { // skip genesis block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parent.Number.Sign() != 0 has better performance than parent.Number.Uint64() != 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in another PR
20d3ebc to
01d7c1d
Compare
12f3998 to
92a712d
Compare
Proposed changes
Revert to using masternodes list from snapshot during vote verification.
This is due to
epochInfo.Masternodesrequires target block to be downloaded in database but during voting sometimes vote messages arrive before block download. Causing delay in minting blocks.Chain initialization underflow,
tempTCEpochshould have lowest value of0.ref: #1628 #1850
Types of changes
What types of changes does your code introduce to XDC network?
Put an
✅in the boxes that applyImpacted Components
Which part of the codebase this PR will touch base on,
Put an
✅in the boxes that applyChecklist
Put an
✅in the boxes once you have confirmed below actions (or provide reasons on not doing so) that