Optimize tasks for memory efficiency #41
Open
divyanshu-tiwari wants to merge 4 commits into context-at-pipeline-level
- use byte buffer pools for http response bodies
- free up memory once xpath processing is done
- immediately close readers in file and http tasks to release resources
- avoid unnecessary byte-string-byte conversions
Pull request overview
This PR targets memory/resource efficiency in pipeline tasks by reducing unnecessary allocations and ensuring readers/bodies are closed promptly during repeated operations.
Changes:
- HTTP task: introduces a pooled `bytes.Buffer` for JSON encoding in pagination logic, reuses a single `http.Client` across retries, and switches internal response handling from `string` to `[]byte`.
- File task: extracts file read+close logic into a helper to avoid `defer` in a loop.
- XPath task: clears `Record.Data` after extraction to encourage earlier garbage collection.
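The pooled-buffer idea in the HTTP item can be sketched as follows. This is an illustration under assumptions, not the code in `http.go`: the names `bufPool` and `encodeJSON` are hypothetical, and the PR may hand the buffer straight to the request instead of copying the bytes out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufPool (hypothetical name) hands out reusable buffers so that repeated
// JSON encoding does not allocate a fresh bytes.Buffer each time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeJSON (hypothetical) encodes v using a pooled buffer and returns a
// private copy of the encoded bytes.
func encodeJSON(v any) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old content
	defer bufPool.Put(buf) // return the buffer for reuse

	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}

	// Copy out: the buffer's backing array is reused after Put, so the
	// caller must not keep a slice that aliases it.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	body, err := encodeJSON(map[string]int{"page": 2})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

The copy at the end trades one allocation for safety. If the encoded bytes are consumed before the buffer goes back to the pool (for example, sent as a request body), the copy can be skipped, but then the `Put` must happen only after the consumer is done.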
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/pkg/pipeline/task/http/http.go | Adds buffer pooling and client reuse; changes internal response body representation to []byte. |
| internal/pkg/pipeline/task/file/file.go | Refactors file reading into a helper to ensure per-iteration closes. |
| internal/pkg/pipeline/task/xpath/xpath.go | Attempts to reduce memory retention by clearing input record data after parsing/extraction. |
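The XPath change can be pictured with a small sketch. Only the `Record.Data` field name comes from the review text; the `Record` type shown here, the `extractAndRelease` function, and the callback standing in for XPath evaluation are all hypothetical.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for the pipeline's record type.
type Record struct {
	Data []byte
}

// extractAndRelease (hypothetical) runs an extraction over the record's
// payload and then drops the record's reference to that payload.
func extractAndRelease(rec *Record, extract func([]byte) string) string {
	result := extract(rec.Data)

	// Clearing the field removes this reference to the payload. The memory
	// becomes collectable only if nothing else still points at the slice.
	rec.Data = nil
	return result
}

func main() {
	rec := &Record{Data: []byte("<html><title>hi</title></html>")}

	// The callback is a placeholder for real XPath evaluation.
	summary := extractAndRelease(rec, func(b []byte) string {
		return fmt.Sprintf("%d bytes parsed", len(b))
	})
	fmt.Println(summary, rec.Data == nil)
}
```

Setting the field to `nil` helps only when the record held the last reference, and when the extracted result does not alias the original slice; a `string` produced from the bytes is a copy, so it is safe in that respect.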
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Description
This pull request introduces optimizations and refactoring to improve resource management and performance in the file, HTTP, and XPath pipeline tasks. The main changes include improved buffer handling, better HTTP client reuse, and safer file reading patterns.
Resource management and performance optimizations:
- Added a `sync.Pool` for `bytes.Buffer` in `internal/pkg/pipeline/task/http/http.go` to reduce memory allocations during JSON encoding, and updated buffer handling in the pagination logic to use this pool.
- Changed `internal/pkg/pipeline/task/http/http.go` to instantiate the client once per call, enabling connection pooling and proxy reuse across retries.
- Changed `internal/pkg/pipeline/task/xpath/xpath.go` to release the original HTML payload after processing, allowing buffers to be garbage collected sooner.

Code refactoring for safer file reading:
- Refactored `internal/pkg/pipeline/task/file/file.go` by introducing a helper function `readFileContent`, ensuring that file readers are closed promptly within loops.

Types of changes
Checklist