-
Notifications
You must be signed in to change notification settings - Fork 245
feat(tracing): Add Store, P2P and Config tracing #2972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add OpenTelemetry tracing for all RPC server methods:
- ExecutorService: InitChain, GetTxs, ExecuteTxs, SetFinal
- StoreService: GetBlock, GetState, GetMetadata, GetGenesisDaHeight, GetP2PStoreInfo
- P2PService: GetPeerInfo, GetNetInfo
- ConfigService: GetNamespace, GetSignerInfo
All implementations follow decorator pattern with comprehensive test coverage.
Tracer names: ev-node/{service}-service
Summary of ChangesHello @chatton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability of the system by integrating OpenTelemetry for distributed tracing. It introduces tracing capabilities for the Store, P2P, and Config RPC services, allowing for detailed insights into their execution flow and performance. The tracing can be toggled via configuration, providing flexibility for different environments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Claude finished @chatton's task in 1m 56s —— View job Code Review CompleteI've performed a comprehensive code review of the tracing implementation. Overall, this is a solid addition that will significantly improve observability. However, there are several issues that need to be addressed. Critical Issues1. Missing per-store tracing attributes in GetP2PStoreInfo (pkg/rpc/server/tracing.go:140-158)The Recommendation: Add attributes for each store's height and label: span.SetAttributes(
attribute.Int("store_count", len(res.Msg.Stores)),
)
for i, store := range res.Msg.Stores {
span.SetAttributes(
attribute.String(fmt.Sprintf("store_%d_label", i), store.Label),
attribute.Int64(fmt.Sprintf("store_%d_height", i), int64(store.Height)),
)
}High Priority Issues2. Simplified GetBlock span attribute logic (pkg/rpc/server/tracing.go:32-70)The current implementation is already clean after applying Gemini's suggestion to consolidate the attribute building logic. Good job on this pattern - building attributes before starting the span is the right approach. However, note that the hex encoding is done inline which is good for consistency. 3. Significant code duplication across decorators (pkg/rpc/server/tracing.go:18-270)There's substantial boilerplate repeated across all traced methods:
Recommendation: Extract a generic helper function to reduce duplication. While this adds some complexity, it will make the code much more maintainable. Example pattern: func traceCall[Req, Res any](
ctx context.Context,
tracer trace.Tracer,
spanName string,
attrs []attribute.KeyValue,
call func(context.Context) (*connect.Response[Res], error),
enrichSpan func(*connect.Response[Res], trace.Span),
) (*connect.Response[Res], error) {
ctx, span := tracer.Start(ctx, spanName, trace.WithAttributes(attrs...))
defer span.End()
res, err := call(ctx)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
return nil, err
}
if enrichSpan != nil {
enrichSpan(res, span)
}
return res, nil
}This pattern would reduce each traced method from ~20 lines to ~5-10 lines. 4. Missing test coverage for per-store attributes (pkg/rpc/server/tracing_test.go:290-316)The test Recommendation: Add assertions like: requireAttribute(t, attrs, "store_0_label", "Header Store")
requireAttribute(t, attrs, "store_0_height", int64(100))
requireAttribute(t, attrs, "store_1_label", "Data Store")
requireAttribute(t, attrs, "store_1_height", int64(99))Code Quality & Architecture5. Excellent use of the decorator pattern (pkg/rpc/server/tracing.go:18-30, 160-172, 217-229)The decorator pattern is the right choice here. The 6. Good tracer naming convention (pkg/rpc/server/tracing.go:28, 170, 227)Using distinct tracer names for each service ( 7. Appropriate span attribute choices (throughout tracing.go)The attributes captured are meaningful and relevant:
Good use of hex encoding for binary data to make traces human-readable. 8. Missing context propagation commentWhile the code correctly propagates context through the traced calls, there's no documentation explaining that this enables distributed tracing across service boundaries. Consider adding a package-level comment explaining this. Testing9. Comprehensive test coverage structure (pkg/rpc/server/tracing_test.go)Good test organization:
10. Good use of OpenTelemetry SDK test utilities (pkg/rpc/server/tracing_test.go:113-138)Using 11. Error test coverage (lines 182-202, 348-365, 425-442, 469-486)Good coverage of error scenarios. Error tests verify that:
One minor improvement: Consider verifying the error message is captured correctly. Performance & Resource Efficiency12. Efficient attribute allocation (throughout tracing.go)Attributes are built incrementally and passed to 13. Potential allocation concern in GetBlock (pkg/rpc/server/tracing.go:60-67)The block size calculation iterates over all transactions to sum their sizes. This is fine for observability, but worth noting that it adds O(n) overhead for tracing. Consider documenting this or making it conditional on a verbosity setting if block sizes become very large. Security14. Sensitive data in traces (pkg/rpc/server/tracing.go:267)The signer address is included in traces as a hex-encoded string. While this is likely public information in a blockchain context, verify that this is intentional and documented. Consider whether this could leak any sensitive information in your threat model. 15. No PII concerns detectedThe other traced data (hashes, heights, peer IDs, namespace IDs) are all technical metadata without personal information. Good. Integration16. Clean integration in server.go (pkg/rpc/server/server.go:385-389)The conditional wrapping based on 17. Dependency updates look reasonable (execution/grpc/go.mod, execution/grpc/go.sum)The dependency changes appear to be version bumps. No security concerns noted, but ensure you've reviewed the changelogs for breaking changes. Documentation18. Missing package-level documentationThe
Recommendation: Add a comprehensive package or file comment. 19. Function documentation is adequateIndividual functions have clear names and the decorator functions have brief comments. Could be expanded but acceptable. SummaryStrengths:
Must Fix:
Nice to Have: Recommendations Priority
Overall, this is quality work that meaningfully improves the observability of the RPC layer. The main concerns are the missing per-store attributes and code duplication, both of which should be addressed before merge. |
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces OpenTelemetry tracing for the Store, P2P, and Config services. The implementation uses a decorator pattern to wrap the existing service handlers, which is a good approach. The changes are well-tested, with a new test file covering the tracing logic. I've identified a few areas for improvement: a bug in attribute key generation, an opportunity to simplify the code in one of the traced methods, a suggestion to reduce code duplication across the tracing decorators, and a gap in the test coverage. Overall, this is a great addition for observability.
pkg/rpc/server/tracing.go
Outdated
| ) | ||
| for i, store := range res.Msg.Stores { | ||
| span.SetAttributes( | ||
| attribute.Int64("store_"+string(rune(i))+"_height", int64(store.Height)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bug in how the attribute key is constructed. string(rune(i)) will convert the integer index i into its corresponding Unicode character, not its string representation. For example, for i=0, it will produce a null character (\x00), which is likely not what you intend and may cause issues with tracing backends.
You should use strconv.Itoa(i) to convert the integer to a string. This will require adding an import for the strconv package.
attribute.Int64("store_"+strconv.Itoa(i)+"_height", int64(store.Height)),
pkg/rpc/server/tracing.go
Outdated
| var height uint64 | ||
| switch identifier := req.Msg.Identifier.(type) { | ||
| case *pb.GetBlockRequest_Height: | ||
| height = identifier.Height | ||
| case *pb.GetBlockRequest_Hash: | ||
| // for hash-based queries, we'll add the hash as an attribute | ||
| } | ||
|
|
||
| ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock", | ||
| trace.WithAttributes( | ||
| attribute.Int64("height", int64(height)), | ||
| ), | ||
| ) | ||
| defer span.End() | ||
|
|
||
| // add hash attribute if hash-based query | ||
| if hashIdentifier, ok := req.Msg.Identifier.(*pb.GetBlockRequest_Hash); ok { | ||
| span.SetAttributes(attribute.String("hash", hex.EncodeToString(hashIdentifier.Hash))) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for setting span attributes based on the request identifier can be simplified. Currently, you're using a switch to get the height, and then a separate if to get the hash, which involves checking the type of req.Msg.Identifier twice. This can be done more cleanly in a single switch statement that builds the attributes before starting the span. This will make the code more readable and efficient.
var attrs []attribute.KeyValue
switch identifier := req.Msg.Identifier.(type) {
case *pb.GetBlockRequest_Height:
attrs = append(attrs, attribute.Int64("height", int64(identifier.Height)))
case *pb.GetBlockRequest_Hash:
if identifier.Hash != nil {
attrs = append(attrs, attribute.String("hash", hex.EncodeToString(identifier.Hash)))
}
}
ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock", trace.WithAttributes(attrs...))
defer span.End()| res, err := t.inner.GetState(ctx, req) | ||
| if err != nil { | ||
| span.RecordError(err) | ||
| span.SetStatus(codes.Error, err.Error()) | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code for handling errors from the inner service call and updating the span is repeated in almost every traced method in this file. To improve maintainability and reduce boilerplate, consider extracting this logic into a generic helper function.
For example, you could create a function that takes the context, request, the inner function call, and returns the response and error, handling the span creation and error recording internally. Here's a conceptual example:
func traceUnary[Req, Res any](
tracer trace.Tracer,
ctx context.Context,
spanName string,
req *connect.Request[Req],
call func(context.Context, *connect.Request[Req]) (*connect.Response[Res], error),
// ... other params for attributes
) (*connect.Response[Res], error) {
ctx, span := tracer.Start(ctx, spanName)
defer span.End()
res, err := call(ctx, req)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
return nil, err
}
// ... set success attributes
return res, nil
}Applying this pattern would make the tracing decorators much more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having a bit of duplication isn't too bad to avoid a generics function that takes a function as an arg, makes things harder to reason about, but if anyone else wants this to be implemented I can do it.
| attrs := span.Attributes() | ||
| requireAttribute(t, attrs, "store_count", 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is missing assertions for the per-store attributes (store_0_height, store_1_height). The current test only checks for store_count. Adding assertions for individual store heights would have caught the attribute key generation bug in GetP2PStoreInfo and would make the test more robust.
attrs := span.Attributes()
requireAttribute(t, attrs, "store_count", 2)
requireAttribute(t, attrs, "store_0_height", int64(100))
requireAttribute(t, attrs, "store_1_height", int64(99))
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2972 +/- ##
==========================================
+ Coverage 57.77% 58.11% +0.34%
==========================================
Files 98 99 +1
Lines 9394 9543 +149
==========================================
+ Hits 5427 5546 +119
- Misses 3364 3388 +24
- Partials 603 609 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
da.test
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good catch, I branched off a previous PR which I removed this in but it stayed around in here apparently!
Overview
Adding additional tracing within
NewServiceHandler