Skip to content

Conversation

@fferrandis
Copy link

@fferrandis fferrandis commented Dec 15, 2025

This pull request introduces significant improvements and refactoring to the buffer matrix management for erasure coding, focusing on more efficient handling of data fragments, especially for cases where the last subgroup (stripe) is not a full block. The changes include a new buffer format, more flexible offset calculations, improved test coverage, and a new C header for backend interoperability.

Key changes:

Buffer Matrix Refactoring and New Format Support

  • Introduced a new BufferInfo struct to encapsulate buffer metadata, separated from the main BufferMatrix data structure, and refactored related logic for clarity and maintainability. Added methods to support both old and new buffer layouts, with explicit UseNewFormat and UseOldFormat setters and corresponding offset calculation logic. (buffer.go [1] [2] [3] [4] [5] [6]

  • Implemented logic to handle the last subgroup differently when the data size is not a multiple of the block size, ensuring correct data alignment and minimizing wasted space in the last fragments. (buffer.go [1] [2] [3]

Expanded and Improved Testing

  • Removed outdated or redundant tests and benchmarks, and added comprehensive new tests to cover the new buffer format, offset calculations, subgroup size computations, and block mapping. This ensures correctness of the new logic and robust coverage for edge cases. (buffer_test.go [1] [2] [3]

Backend Interoperability

  • Added a new C header file backend.h defining the encode_chunk_context structure and a suite of utility functions to facilitate interaction between Go and C code for erasure coding backends. (backend.h backend.hR1-R70)

Minor Improvements

  • Added a // nolint:gosec comment to suppress a linter warning for pointer arithmetic in memalign.go. (memalign.go memalign.goR9)

These changes collectively modernize the buffer management for erasure coding, improve performance for non-aligned data sizes, and lay groundwork for further backend integration.

References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

@bert-e
Copy link

bert-e commented Dec 15, 2025

Hello fferrandis,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link

bert-e commented Dec 15, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch 3 times, most recently from 159884f to 6c8187c Compare December 15, 2025 09:14
- separate C implementations in dedicated files
- improve encoding and decoding functionality
- add new tests
- improvements to the encoding and decoding processes
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 6c8187c to 181e135 Compare December 15, 2025 09:59
- enable new buffer style
- rework all tests to only use *Matrix
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 041a7c1 to 12234f8 Compare December 26, 2025 12:52
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch 2 times, most recently from 4bbbda1 to 44ad647 Compare January 1, 2026 12:06
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from 44ad647 to fd5d424 Compare January 7, 2026 12:56
Store in EncodeData structure the realsize
With old format, it contains the FragLen itself
With new format, this is sticked to the real data.

In DecodeMatrix, we also manage the last stripe size to ensure we read the
header.ChunkSize value instead of reading piecesize bytes
@fferrandis fferrandis force-pushed the feature/HD-4220-last-block-size branch from fd5d424 to 5decc2c Compare January 7, 2026 12:59
@@ -0,0 +1,408 @@
#include <liberasurecode/erasurecode.h>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only a move from code that was directly inline in GO file to a proper dedicated C file.

max: params.MaxBlockSize,
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we dont use jerasure, this is removed. (this new feature can't work with jerasure anyway)

wg.Add(int(ctx.number_of_subgroup))

for i := 0; i < int(ctx.number_of_subgroup); i++ {
go func(nth int) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main diff of the encode part. We make a distinction when it comes to the last lines/stripes. All the stripes are supposed to be "chunksize" long, but not the last one because of the new implementation

C.my_liberasurecode_encode_buffermatrix_cleanup(
backend.libecDesc, C.size_t(ctx.frags_len), ctx.datas, ctx.codings)
}},
}, totLen},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we store the real len of the slice. We could also reslice it to this len, but it was easier from the debug point of view to keep the slice with the same slice regardless of the usage of the new format, and store the real len alongside.

}}, nil
}

// EncodeMatrix encodes data in small subpart of chunkSize bytes
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EncodeMatrix isn't used in hdcontroller. So all the tests using it has been updated to replace the usage of EncodeMatrix by EncodeMatrixWithBufferMatrix


var totLen int64
for i := 0; i < chunkInfo.NrChunk; i++ {
vect := make([][]byte, len(frags))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when decoding stripes, we can't assume that all stripes are "chunksized long". When using the new format, the last stripe of a fragment is often smaller (that is the prupose of this PR and this new format). The nextbound is computed accordingly.

* .. [-[*]- -]
* p4 [-[*]- -]
*
*/
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mainly variable renaming to help to understand what we are manipulating in the matrix. That why we have lines, celles and columns.

totalLines := (maxDataSize + lineSize - 1) / lineSize
isLastStripe := (lineEnd == totalLines-1)

/* When wrapping around, we read the full groups. */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change of GetRangeMatrix : we are going to read a full group (ie a full line of stripes) in case of two condition :

  • first one already existed
  • the second one is new and is true when data is in the last stripe line.
    As we cannot guess how the data is stored (using the old format or the new one) we chose to read all the stripes and then, based on the stripes header, get properly the data

Copy link
Author

@fferrandis fferrandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments to help the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants