Conversation

@JoeLin2333

Attachments: Addcmul_metax_SUMMARY, Addcmul_moore_SUMMARY, Addcmul_nvidia_SUMMARY, Addcmul_tianshu_SUMMARY_1, Addcmul_tianshu_SUMMARY_2, Atanh_metax_SUMMARY, Atanh_moore_SUMMARY, Atanh_nvidia_SUMMARY, Atanh_Tianshu_SUMMARY, binary_cross_entropy_with_logits_metax_SUMMARY, binary_cross_entropy_with_logits_moore_SUMMARY, binary_cross_entropy_with_logits_nvidia_SUMMARY, binary_cross_entropy_with_logits_Tianshu_SUMMARY, cdist_metax_SUMMARY, cdist_moore_SUMMARY, cdist_nvidia_SUMMARY, cdist_tianshu_SUMMARY, [HONOR_CODE.md](https://github.com/user-attachments/files/24553025/HONOR_CODE.md), metax_SUMMARY, moore_SUMMARY, nvidia_SUMMARY, reciprocal_metax_SUMMARY, reciprocal_moore_SUMMARY, reciprocal_nvidia_SUMMARY, reciprocal_tianshu_SUMMARY, [REFERENCE.md](https://github.com/user-attachments/files/24553026/REFERENCE.md)

Copilot AI left a comment

Pull request overview

This pull request implements five new operators (reciprocal, cdist, binary_cross_entropy_with_logits, atanh, and addcmul) across multiple hardware backends (NVIDIA, Moore, MetaX, CPU). The implementation includes comprehensive test coverage and follows the existing project patterns.

Changes:

  • Adds 5 new operators with full backend support (CPU, NVIDIA, Moore, MetaX)
  • Implements Python test files for each operator with multiple test cases
  • Adds C++ test infrastructure integration
  • Registers operators in the op_register.py file
  • Implements InfiniCore Python bindings for the new operators

Reviewed changes

Copilot reviewed 112 out of 112 changed files in this pull request and generated 16 comments.

Show a summary per file

| File | Description |
| --- | --- |
| test/infiniop/reciprocal.py | Test file for reciprocal operator with various shapes and inplace variants |
| test/infiniop/cdist.py | Test file for cdist operator with different norms (p=1.0, 2.0, inf); see the note after this table |
| test/infiniop/binary_cross_entropy_with_logits.py | Test file for BCE with logits supporting weight, pos_weight, and reduction |
| test/infiniop/atanh.py | Test file for atanh operator with value clamping for stability |
| test/infiniop/addcmul.py | Test file for addcmul operator with different scalar values |
| test/infiniop/libinfiniop/op_register.py | Operator registration with C API function signatures |
| src/infiniop/ops/*/operator.cc | Operator dispatchers for all backends |
| src/infiniop/ops/*/nvidia/*.cu | NVIDIA CUDA kernel implementations |
| src/infiniop/ops/*/moore/*.mu | Moore MUSA kernel implementations |
| src/infiniop/ops/*/metax/*.maca | MetaX kernel implementations |
| src/infiniop/ops/*/cpu/*.cc | CPU implementations with OpenMP parallelization |
| src/infiniop-test/src/ops/*.cpp | C++ test integration files |
| src/infinicore/pybind11/ops/*.hpp | Python binding definitions |
| src/infinicore/ops/*/reciprocal.cc | InfiniCore operator dispatch logic |
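For reference, the Minkowski distance exercised by the cdist tests (a standard definition, not taken from this PR's code):

$$d_p(x, y) = \Bigl(\sum_k \lvert x_k - y_k \rvert^{p}\Bigr)^{1/p}, \qquad d_\infty(x, y) = \max_k \lvert x_k - y_k \rvert$$

so p = 1.0 and p = 2.0 are the Manhattan and Euclidean cases, and p = inf is the Chebyshev distance.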

Comment on lines +5 to +8
* This file contains the Reciprocal operation implementation for the MUSA backend.
*
* It follows the consistent code structure to ensure alignment across different
* hardware platforms within the Moore Threads (MUSA) ecosystem.

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.

Comment on lines +5 to +8
* This file contains the Atanh operation implementation for the MUSA backend.
*
* It follows the consistent code structure to ensure alignment across different
* hardware platforms within the Moore Threads (MUSA) ecosystem.

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.

Comment on lines +5 to +7
* This file contains the Addcmul operation implementation for the MUSA backend.
* Formula: out = input + value * tensor1 * tensor2
*/

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.
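Aside: the formula quoted in that header, as a minimal host-side C++ reference for contiguous same-shape buffers (the helper name and flat layout are illustrative, not the PR's actual kernel):

```cpp
#include <cstddef>

// addcmul semantics: out[i] = input[i] + value * tensor1[i] * tensor2[i]
template <typename T>
void addcmul_ref(T *out, const T *input, const T *tensor1,
                 const T *tensor2, T value, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = input[i] + value * tensor1[i] * tensor2[i];
    }
}
```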

target: Tensor of the same shape as input with values between 0 and 1.
weight: Optional rescaling weight for each loss component.
pos_weight: Optional weight for positive examples (must be broadcastable).
reduction: Specfies the reduction to apply: 'none' | 'mean' | 'sum'.

Copilot AI Jan 11, 2026

Typo: "Specfies" should be "Specifies".

target: Target tensor.
weight: Optional sample weight.
pos_weight: Optional positive class weight.
reduction: Specfies the reduction to apply.

Copilot AI Jan 11, 2026

Typo: "Specfies" should be "Specifies".

@@ -0,0 +1,156 @@
import torch
import ctypes
from ctypes import c_uint64, c_float, c_double

Copilot AI Jan 11, 2026

Import of 'c_float' is not used.

- from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float
+ from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float, c_double, c_uint64

Copilot AI Jan 11, 2026

Import of 'c_uint64' is not used.

Comment on lines +17 to +32
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

As with the other backends, BCETensorInfoDevice here stores shape/stride in arrays of length BCE_MAX_DIMS, yet make_device_info records dev.ndim = info.ndim even when info.ndim might exceed this limit. In that case, the subsequent calls to indexToOffset will iterate over dev.ndim and index out of bounds on shape/strides, leading to undefined behavior and possible out-of-bounds accesses to MetaX device memory. Add a hard check that rejects tensors with ndim > BCE_MAX_DIMS (or otherwise constrain ndim used in the kernel) to ensure indexing stays within the allocated metadata arrays.
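A minimal sketch of the rejection option; BCETensorInfo's fields are assumed from the quoted snippet, and std::optional stands in for the project's actual error-reporting convention:

```cpp
#include <cstddef>
#include <optional>

constexpr int BCE_MAX_DIMS = 8;

// Host-side tensor view, with fields assumed from the quoted snippet.
struct BCETensorInfo {
    size_t ndim;
    const size_t *dims;
    const ptrdiff_t *stride;
};

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

// Fails (returns nullopt) instead of silently truncating when ndim exceeds
// the fixed-size metadata arrays, so descriptor creation can be rejected.
static std::optional<BCETensorInfoDevice>
make_device_info_checked(const BCETensorInfo &info) {
    if (info.ndim > static_cast<size_t>(BCE_MAX_DIMS)) {
        return std::nullopt;
    }
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim; ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
}
```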

Comment on lines +17 to +32
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

BCETensorInfoDevice stores shapes and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info copies info.ndim into dev.ndim without enforcing info.ndim <= BCE_MAX_DIMS. When info.ndim exceeds 8, later calls to indexToOffset using dev.ndim will index past the end of shape/strides, leading to undefined memory reads and potentially out-of-bounds device memory accesses when computing tensor offsets. Add an explicit check (and fail descriptor creation) when info.ndim is greater than BCE_MAX_DIMS, or clamp ndim consistently so the kernel never iterates beyond the initialized entries.
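The clamping alternative mentioned here, sketched against the same assumed BCETensorInfo/BCETensorInfoDevice types as the guard above (clamping silently drops higher dimensions, so rejection is usually the safer choice):

```cpp
// Clamp ndim once at copy time so every later loop over dev.ndim stays
// within the fixed-size shape/strides arrays.
static BCETensorInfoDevice make_device_info_clamped(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim < static_cast<size_t>(BCE_MAX_DIMS)
                   ? info.ndim
                   : static_cast<size_t>(BCE_MAX_DIMS);
    for (size_t i = 0; i < dev.ndim; ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
}
```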

Comment on lines +15 to +30
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

BCETensorInfoDevice in this backend also uses fixed-size shape/strides arrays of length BCE_MAX_DIMS, but make_device_info sets dev.ndim = info.ndim without enforcing that info.ndim is within this limit. If a tensor with more than 8 dimensions is passed, indexToOffset will iterate up to dev.ndim and read beyond the arrays, which can corrupt indexing calculations and cause out-of-bounds accesses to the BCE input/output buffers on the MUSA device. Guard against this by rejecting descriptors with ndim > BCE_MAX_DIMS (or otherwise bounding ndim) before launching the kernel.
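A belt-and-braces variant bounds the loop inside the offset computation itself. Sketched as plain host C++ against the same assumed types, mirroring a hypothetical device-side indexToOffset (the PR's real signature is not shown in this excerpt; row-major linearization and nonzero shape entries are assumed):

```cpp
// Maps a flat element index to a strided element offset, never reading past
// the BCE_MAX_DIMS entries that make_device_info actually initialized.
static ptrdiff_t indexToOffsetBounded(size_t linear, const BCETensorInfoDevice &dev) {
    size_t ndim = dev.ndim < static_cast<size_t>(BCE_MAX_DIMS)
                      ? dev.ndim
                      : static_cast<size_t>(BCE_MAX_DIMS);
    ptrdiff_t offset = 0;
    for (size_t d = ndim; d-- > 0;) { // last dimension varies fastest
        size_t idx = linear % dev.shape[d];
        linear /= dev.shape[d];
        offset += static_cast<ptrdiff_t>(idx) * dev.strides[d];
    }
    return offset;
}
```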
