[2025 Fall][T1-1-15] JoeLin2333 #913
Conversation
JoeLin2333 commented on Jan 11, 2026
[HONOR_CODE.md](https://github.com/user-attachments/files/24553025/HONOR_CODE.md)
[REFERENCE.md](https://github.com/user-attachments/files/24553026/REFERENCE.md)
Pull request overview
This pull request implements five new operators (reciprocal, cdist, binary_cross_entropy_with_logits, atanh, and addcmul) across multiple hardware backends (NVIDIA, Moore, MetaX, CPU). The implementation includes comprehensive test coverage and follows the existing project patterns.
Changes:
- Adds 5 new operators with full backend support (CPU, NVIDIA, Moore, MetaX)
- Implements Python test files for each operator with multiple test cases
- Adds C++ test infrastructure integration
- Registers operators in the op_register.py file
- Implements InfiniCore Python bindings for the new operators
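To illustrate the test pattern the overview describes, here is a minimal sketch of an elementwise-operator test. This is a hypothetical stand-in using NumPy; the actual test files use the project's torch/ctypes harness, and the function name `test_reciprocal` is illustrative:

```python
import numpy as np

def test_reciprocal(shape=(2, 3), atol=1e-6):
    # Build an input safely away from zero, compute a reference result,
    # then compare the value under test against it within tolerance.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 2.0, size=shape).astype(np.float32)
    expected = np.reciprocal(x)
    actual = 1.0 / x  # stand-in for the backend kernel under test
    assert np.allclose(actual, expected, atol=atol)
    return True
```

The real tests additionally exercise multiple shapes, dtypes, and inplace variants, as listed in the per-file summary below.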
Reviewed changes
Copilot reviewed 112 out of 112 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| test/infiniop/reciprocal.py | Test file for reciprocal operator with various shapes and inplace variants |
| test/infiniop/cdist.py | Test file for cdist operator with different norms (p=1.0, 2.0, inf) |
| test/infiniop/binary_cross_entropy_with_logits.py | Test file for BCE with logits supporting weight, pos_weight, and reduction |
| test/infiniop/atanh.py | Test file for atanh operator with value clamping for stability |
| test/infiniop/addcmul.py | Test file for addcmul operator with different scalar values |
| test/infiniop/libinfiniop/op_register.py | Operator registration with C API function signatures |
| src/infiniop/ops/*/operator.cc | Operator dispatchers for all backends |
| src/infiniop/ops/*/nvidia/*.cu | NVIDIA CUDA kernel implementations |
| src/infiniop/ops/*/moore/*.mu | Moore MUSA kernel implementations |
| src/infiniop/ops/*/metax/*.maca | MetaX kernel implementations |
| src/infiniop/ops/*/cpu/*.cc | CPU implementations with OpenMP parallelization |
| src/infiniop-test/src/ops/*.cpp | C++ test integration files |
| src/infinicore/pybind11/ops/*.hpp | Python binding definitions |
| src/infinicore/ops/*/reciprocal.cc | InfiniCore operator dispatch logic |
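Of the five operators, cdist is the only non-elementwise one; its semantics under the norms the test file covers (p=1.0, 2.0, inf) can be sketched with NumPy. This is a hypothetical reference, not the project's implementation:

```python
import numpy as np

def cdist(x1, x2, p=2.0):
    # Pairwise p-norm distances between rows of x1 (n, d) and x2 (m, d).
    diff = np.abs(x1[:, None, :] - x2[None, :, :])  # (n, m, d)
    if np.isinf(p):
        return diff.max(axis=-1)          # Chebyshev distance
    return (diff ** p).sum(axis=-1) ** (1.0 / p)
```

For x1 = [[0, 0]] and x2 = [[3, 4]] this yields 5.0 (p=2), 7.0 (p=1), and 4.0 (p=inf).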
```cpp
 * This file contains the Reciprocal operation implementation for the MUSA backend.
 *
 * It follows the consistent code structure to ensure alignment across different
 * hardware platforms within the Moore Threads (MUSA) ecosystem.
```
Copilot AI commented on Jan 11, 2026
The comment incorrectly refers to "MUSA backend" when this file is for MetaX. The comment should be updated to reflect the correct backend platform (MetaX) to avoid confusion.
```cpp
 * This file contains the Atanh operation implementation for the MUSA backend.
 *
 * It follows the consistent code structure to ensure alignment across different
 * hardware platforms within the Moore Threads (MUSA) ecosystem.
```
Copilot AI commented on Jan 11, 2026
The comment incorrectly refers to "MUSA backend" when this file is for MetaX. The comment should be updated to reflect the correct backend platform (MetaX) to avoid confusion.
```cpp
 * This file contains the Addcmul operation implementation for the MUSA backend.
 * Formula: out = input + value * tensor1 * tensor2
 */
```
Copilot AI commented on Jan 11, 2026
The comment incorrectly refers to "MUSA backend" when this file is for MetaX. The comment should be updated to reflect the correct backend platform (MetaX) to avoid confusion.
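Setting the naming issue aside, the formula quoted in the header comment can be sanity-checked with a small NumPy sketch (the array values here are illustrative, and broadcasting follows NumPy rules, which match PyTorch's for this operator):

```python
import numpy as np

def addcmul(input_, tensor1, tensor2, value=1.0):
    # out = input + value * tensor1 * tensor2, elementwise with broadcasting.
    return input_ + value * tensor1 * tensor2

out = addcmul(np.array([1.0, 1.0]),
              np.array([2.0, 2.0]),
              np.array([3.0, 3.0]),
              value=0.5)
# Each element: 1 + 0.5 * 2 * 3 = 4.0
```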
```python
    target: Tensor of the same shape as input with values between 0 and 1.
    weight: Optional rescaling weight for each loss component.
    pos_weight: Optional weight for positive examples (must be broadcastable).
    reduction: Specfies the reduction to apply: 'none' | 'mean' | 'sum'.
```
Copilot AI commented on Jan 11, 2026
Typo: "Specfies" should be "Specifies".
```python
    target: Target tensor.
    weight: Optional sample weight.
    pos_weight: Optional positive class weight.
    reduction: Specfies the reduction to apply.
```
Copilot AI commented on Jan 11, 2026
Typo: "Specfies" should be "Specifies".
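For reference, the semantics behind the docstring's weight, pos_weight, and reduction parameters can be expressed as a minimal NumPy sketch, assuming PyTorch's binary_cross_entropy_with_logits semantics (this is a hypothetical reference implementation, not the project's kernel):

```python
import numpy as np

def bce_with_logits(x, y, weight=None, pos_weight=None, reduction="mean"):
    # Numerically stable form, using log(sigmoid(x)) = -softplus(-x) and
    # log(1 - sigmoid(x)) = -softplus(x), with softplus(t) = logaddexp(0, t).
    pw = 1.0 if pos_weight is None else pos_weight
    loss = pw * y * np.logaddexp(0.0, -x) + (1.0 - y) * np.logaddexp(0.0, x)
    if weight is not None:
        loss = weight * loss          # per-element rescaling weight
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss                        # reduction == 'none'
```

At x = 0 with y = 1 this gives softplus(0) = ln 2 ≈ 0.6931, a handy spot check.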
```diff
@@ -0,0 +1,156 @@
+import torch
+import ctypes
+from ctypes import c_uint64, c_float, c_double
```
Copilot AI commented on Jan 11, 2026
Import of 'c_float' is not used.
```diff
-from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float
+from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float, c_double, c_uint64
```
Copilot AI commented on Jan 11, 2026
Import of 'c_uint64' is not used.
```cpp
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
```
Copilot AI commented on Jan 11, 2026
As with the other backends, BCETensorInfoDevice here stores shape/stride in arrays of length BCE_MAX_DIMS, yet make_device_info records dev.ndim = info.ndim even when info.ndim might exceed this limit. In that case, the subsequent calls to indexToOffset will iterate over dev.ndim and index out of bounds on shape/strides, leading to undefined behavior and possible out-of-bounds accesses to METAX device memory. Add a hard check that rejects tensors with ndim > BCE_MAX_DIMS (or otherwise constrain ndim used in the kernel) to ensure indexing stays within the allocated metadata arrays.
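The guard the review asks for can be sketched as follows. This is a Python model of the suggested fix, not the project's C++ code; the dict layout and error message are illustrative:

```python
BCE_MAX_DIMS = 8

def make_device_info(ndim, dims, strides):
    # Reject ranks the fixed-size metadata arrays cannot hold, instead of
    # silently truncating the copy while keeping the oversized ndim.
    if ndim > BCE_MAX_DIMS:
        raise ValueError(f"ndim {ndim} exceeds BCE_MAX_DIMS {BCE_MAX_DIMS}")
    shape = [0] * BCE_MAX_DIMS
    stride = [0] * BCE_MAX_DIMS
    for i in range(ndim):
        shape[i] = dims[i]
        stride[i] = strides[i]
    # The kernel's offset computation now never iterates past initialized entries.
    return {"ndim": ndim, "shape": shape, "strides": stride}
```

The key point is that the check happens at descriptor creation, so the kernel never sees an ndim larger than the arrays it indexes.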
```cpp
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
```
Copilot AI commented on Jan 11, 2026
BCETensorInfoDevice stores shapes and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info copies info.ndim into dev.ndim without enforcing info.ndim <= BCE_MAX_DIMS. When info.ndim exceeds 8, later calls to indexToOffset using dev.ndim will index past the end of shape/strides, leading to undefined memory reads and potentially out-of-bounds device memory accesses when computing tensor offsets. Add an explicit check (and fail descriptor creation) when info.ndim is greater than BCE_MAX_DIMS, or clamp ndim consistently so the kernel never iterates beyond the initialized entries.
```cpp
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
```
Copilot AI commented on Jan 11, 2026
BCETensorInfoDevice in this backend also uses fixed-size shape/strides arrays of length BCE_MAX_DIMS, but make_device_info sets dev.ndim = info.ndim without enforcing that info.ndim is within this limit. If a tensor with more than 8 dimensions is passed, indexToOffset will iterate up to dev.ndim and read beyond the arrays, which can corrupt indexing calculations and cause out-of-bounds accesses to the BCE input/output buffers on the MUSA device. Guard against this by rejecting descriptors with ndim > BCE_MAX_DIMS (or otherwise bounding ndim) before launching the kernel.