Optimize tasks for memory efficiency #41

Open
divyanshu-tiwari wants to merge 4 commits into context-at-pipeline-level from optimise-tasks

@divyanshu-tiwari (Contributor) commented Feb 11, 2026

Description

This pull request introduces optimizations and refactoring to improve resource management and performance in the file, HTTP, and XPath pipeline tasks. The main changes include improved buffer handling, better HTTP client reuse, and safer file reading patterns.

Resource management and performance optimizations:

  • Introduced a sync.Pool of bytes.Buffer values in internal/pkg/pipeline/task/http/http.go to reduce memory allocations during JSON encoding, and updated the pagination logic to take its buffers from this pool.
  • Modified HTTP client creation in internal/pkg/pipeline/task/http/http.go to instantiate the client once per call, enabling connection pooling and proxy reuse across retries.
  • Changed response body handling in HTTP calls to read and close the body immediately, so bodies are not left open between retries and their memory can be released sooner.
  • Added logic in internal/pkg/pipeline/task/xpath/xpath.go to release the original HTML payload after processing, allowing its buffers to be garbage collected sooner.
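
For reviewers unfamiliar with the pooling pattern in the first bullet, here is a minimal sketch of how a sync.Pool of bytes.Buffer values can back JSON encoding. The names bufPool and encodeJSON are assumptions made for illustration and are not taken from http.go; the actual change may structure this differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so each encode does not allocate a new one.
// (Hypothetical name; not taken from the PR.)
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeJSON is a hypothetical helper. It encodes v into a pooled buffer and
// returns a copy of the bytes, so the buffer can safely go back to the pool.
func encodeJSON(v any) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	// Copy out: buf's backing array will be reused after Put.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	b, err := encodeJSON(map[string]int{"page": 2})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", b)
}
```

Note that json.Encoder appends a trailing newline, and that the copy is only needed when the encoded bytes outlive the pooled buffer. If the caller consumes the buffer before returning it to the pool, the copy can be skipped.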

Code refactoring for safer file reading:

  • Refactored file reading in internal/pkg/pipeline/task/file/file.go by introducing a helper function readFileContent, ensuring that file readers are closed promptly within loops.
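
The reason a helper helps here: a defer placed directly in a loop body only runs when the enclosing function returns, so every file stays open until the loop finishes. Moving the open, read, and close into a function scopes the defer to a single file. The sketch below assumes a signature for readFileContent; the real helper in file.go may take a different reader type or different arguments.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFileContent opens path, reads it fully, and closes the file before
// returning. The signature is assumed for illustration.
func readFileContent(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // runs when this helper returns, i.e. once per file

	return io.ReadAll(f)
}

func main() {
	// Each iteration holds at most one open file descriptor.
	for _, p := range os.Args[1:] {
		data, err := readFileContent(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %d bytes\n", p, len(data))
	}
}
```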

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

- use byte buffer pools for http response bodies
- free up memory once xpath processing is done
- immediately close readers in file and http tasks to release resources
- avoid unnecessary byte-string-byte conversions

Copilot AI left a comment

Pull request overview

This PR targets memory/resource efficiency in pipeline tasks by reducing unnecessary allocations and ensuring readers/bodies are closed promptly during repeated operations.

Changes:

  • HTTP task: introduces a pooled bytes.Buffer for JSON encoding in pagination logic, reuses a single http.Client across retries, and switches internal response handling from string to []byte.
  • File task: extracts file read+close logic into a helper to avoid defer in a loop.
  • XPath task: clears Record.Data after extraction to encourage earlier garbage collection.
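
As a rough illustration of the HTTP points above, the sketch below shares one http.Client across retry attempts and fully reads and closes each response body inside a per-attempt helper, returning []byte rather than string. The function names fetchWithRetry, doOnce, and demo, the retry policy, and the test server are all assumptions; none of them come from http.go.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// fetchWithRetry (hypothetical) uses the same client for every attempt so the
// transport can keep connections alive between retries.
func fetchWithRetry(client *http.Client, url string, attempts int) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		body, err := doOnce(client, url)
		if err == nil {
			return body, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// doOnce reads the whole body and closes it before returning, so no response
// stays open across loop iterations.
func doOnce(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode >= 500 {
		return nil, fmt.Errorf("server error: %s", resp.Status)
	}
	return body, nil
}

// demo runs the retry loop against a local test server that fails twice.
func demo() (string, int, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&hits, 1) < 3 {
			http.Error(w, "try again", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second} // created once, outside the retry loop
	body, err := fetchWithRetry(client, srv.URL, 5)
	return string(body), int(atomic.LoadInt32(&hits)), err
}

func main() {
	fmt.Println(demo())
}
```

Draining the body before closing it matters for reuse: the default transport can only return a connection to its idle pool once the response body has been read to the end and closed.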

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| internal/pkg/pipeline/task/http/http.go | Adds buffer pooling and client reuse; changes internal response body representation to []byte. |
| internal/pkg/pipeline/task/file/file.go | Refactors file reading into a helper to ensure per-iteration closes. |
| internal/pkg/pipeline/task/xpath/xpath.go | Attempts to reduce memory retention by clearing input record data after parsing/extraction. |
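
The word "attempts" in the xpath.go row is worth unpacking: clearing a field only helps if nothing else still references the payload. The sketch below uses a stand-in Record type with a Data field and a placeholder extractTitle function in place of real XPath evaluation. Both are assumptions for illustration and do not reflect the project's actual types.

```go
package main

import (
	"bytes"
	"fmt"
)

// Record is a stand-in for the pipeline's record type. Only the Data field
// name comes from the review summary; everything else is assumed.
type Record struct {
	Data []byte
}

// extractTitle is a placeholder for the real XPath evaluation.
func extractTitle(rec *Record) string {
	open, end := []byte("<title>"), []byte("</title>")
	var title string
	if i := bytes.Index(rec.Data, open); i >= 0 {
		rest := rec.Data[i+len(open):]
		if j := bytes.Index(rest, end); j >= 0 {
			// Converting to string copies the bytes, so the result does
			// not keep rec.Data's backing array alive.
			title = string(rest[:j])
		}
	}
	// Drop the reference. The payload becomes collectable once no other
	// slice or variable still points into it.
	rec.Data = nil
	return title
}

func main() {
	rec := &Record{Data: []byte("<html><title>Hello</title></html>")}
	fmt.Println(extractTitle(rec), rec.Data == nil)
}
```

If the extracted values were returned as sub-slices of Data instead of copies, setting Data to nil would not release the underlying array, which is one way this kind of optimization can silently have no effect.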


divyanshu-tiwari and others added 2 commits February 12, 2026 10:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.



Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>