Conversation

@JoeLin2333

Attachments: Addcmul_metax_SUMMARY, Addcmul_moore_SUMMARY, Addcmul_nvidia_SUMMARY, Addcmul_tianshu_SUMMARY_1, Addcmul_tianshu_SUMMARY_2, Atanh_metax_SUMMARY, Atanh_moore_SUMMARY, Atanh_nvidia_SUMMARY, Atanh_Tianshu_SUMMARY, binary_cross_entropy_with_logits_metax_SUMMARY, binary_cross_entropy_with_logits_moore_SUMMARY, binary_cross_entropy_with_logits_nvidia_SUMMARY, binary_cross_entropy_with_logits_Tianshu_SUMMARY, cdist_metax_SUMMARY, cdist_moore_SUMMARY, cdist_nvidia_SUMMARY, cdist_tianshu_SUMMARY, [HONOR_CODE.md](https://github.com/user-attachments/files/24553025/HONOR_CODE.md), metax_SUMMARY, moore_SUMMARY, nvidia_SUMMARY, reciprocal_metax_SUMMARY, reciprocal_moore_SUMMARY, reciprocal_nvidia_SUMMARY, reciprocal_tianshu_SUMMARY, [REFERENCE.md](https://github.com/user-attachments/files/24553026/REFERENCE.md)

Copilot AI left a comment

Pull request overview

This pull request implements five new operators (reciprocal, cdist, binary_cross_entropy_with_logits, atanh, and addcmul) across multiple hardware backends (NVIDIA, Moore, MetaX, CPU). The implementation includes comprehensive test coverage and follows the existing project patterns.

Changes:

  • Adds 5 new operators with full backend support (CPU, NVIDIA, Moore, MetaX)
  • Implements Python test files for each operator with multiple test cases
  • Adds C++ test infrastructure integration
  • Registers operators in the op_register.py file
  • Implements InfiniCore Python bindings for the new operators

Reviewed changes

Copilot reviewed 112 out of 112 changed files in this pull request and generated 16 comments.

Show a summary per file

| File | Description |
| --- | --- |
| test/infiniop/reciprocal.py | Test file for reciprocal operator with various shapes and inplace variants |
| test/infiniop/cdist.py | Test file for cdist operator with different norms (p=1.0, 2.0, inf); see the note after this table |
| test/infiniop/binary_cross_entropy_with_logits.py | Test file for BCE with logits supporting weight, pos_weight, and reduction |
| test/infiniop/atanh.py | Test file for atanh operator with value clamping for stability |
| test/infiniop/addcmul.py | Test file for addcmul operator with different scalar values |
| test/infiniop/libinfiniop/op_register.py | Operator registration with C API function signatures |
| src/infiniop/ops/*/operator.cc | Operator dispatchers for all backends |
| src/infiniop/ops/*/nvidia/*.cu | NVIDIA CUDA kernel implementations |
| src/infiniop/ops/*/moore/*.mu | Moore MUSA kernel implementations |
| src/infiniop/ops/*/metax/*.maca | MetaX kernel implementations |
| src/infiniop/ops/*/cpu/*.cc | CPU implementations with OpenMP parallelization |
| src/infiniop-test/src/ops/*.cpp | C++ test integration files |
| src/infinicore/pybind11/ops/*.hpp | Python binding definitions |
| src/infinicore/ops/*/reciprocal.cc | InfiniCore operator dispatch logic |
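For reference, the Minkowski distance exercised by the cdist tests (a standard definition, not taken from this PR's code):

$$d_p(x, y) = \Bigl(\sum_k \lvert x_k - y_k \rvert^{p}\Bigr)^{1/p}, \qquad d_\infty(x, y) = \max_k \lvert x_k - y_k \rvert$$

so p = 1.0 and p = 2.0 are the Manhattan and Euclidean cases, and p = inf is the Chebyshev distance.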

Comment on lines +5 to +8
* This file contains the Reciprocal operation implementation for the MUSA backend.
*
* It follows the consistent code structure to ensure alignment across different
* hardware platforms within the Moore Threads (MUSA) ecosystem.

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.

Comment on lines +5 to +8
* This file contains the Atanh operation implementation for the MUSA backend.
*
* It follows the consistent code structure to ensure alignment across different
* hardware platforms within the Moore Threads (MUSA) ecosystem.

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.

Comment on lines +5 to +7
* This file contains the Addcmul operation implementation for the MUSA backend.
* Formula: out = input + value * tensor1 * tensor2
*/

Copilot AI Jan 11, 2026

The comment incorrectly refers to the "MUSA backend", but this file is for MetaX; it should be updated to name the correct backend platform to avoid confusion.
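Aside: the formula quoted in that header, as a minimal host-side C++ reference for contiguous same-shape buffers (the helper name and flat layout are illustrative, not the PR's actual kernel):

```cpp
#include <cstddef>

// addcmul semantics: out[i] = input[i] + value * tensor1[i] * tensor2[i]
template <typename T>
void addcmul_ref(T *out, const T *input, const T *tensor1,
                 const T *tensor2, T value, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = input[i] + value * tensor1[i] * tensor2[i];
    }
}
```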

target: Tensor of the same shape as input with values between 0 and 1.
weight: Optional rescaling weight for each loss component.
pos_weight: Optional weight for positive examples (must be broadcastable).
reduction: Specfies the reduction to apply: 'none' | 'mean' | 'sum'.

Copilot AI Jan 11, 2026

Typo: "Specfies" should be "Specifies".

target: Target tensor.
weight: Optional sample weight.
pos_weight: Optional positive class weight.
reduction: Specfies the reduction to apply.

Copilot AI Jan 11, 2026

Typo: "Specfies" should be "Specifies".

@@ -0,0 +1,156 @@
import torch
import ctypes
from ctypes import c_uint64, c_float, c_double

Copilot AI Jan 11, 2026

Import of 'c_float' is not used.

- from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float
+ from ctypes import c_int32, c_void_p, c_size_t, POINTER, c_float, c_double, c_uint64

Copilot AI Jan 11, 2026

Import of 'c_uint64' is not used.

Comment on lines +17 to +32
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

As with the other backends, BCETensorInfoDevice here stores shape/stride in arrays of length BCE_MAX_DIMS, yet make_device_info records dev.ndim = info.ndim even when info.ndim might exceed this limit. In that case, the subsequent calls to indexToOffset will iterate over dev.ndim and index out of bounds on shape/strides, leading to undefined behavior and possible out-of-bounds accesses to MetaX device memory. Add a hard check that rejects tensors with ndim > BCE_MAX_DIMS (or otherwise constrain ndim used in the kernel) to ensure indexing stays within the allocated metadata arrays.
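A minimal sketch of the rejection option; BCETensorInfo's fields are assumed from the quoted snippet, and std::optional stands in for the project's actual error-reporting convention:

```cpp
#include <cstddef>
#include <optional>

constexpr int BCE_MAX_DIMS = 8;

// Host-side tensor view, with fields assumed from the quoted snippet.
struct BCETensorInfo {
    size_t ndim;
    const size_t *dims;
    const ptrdiff_t *stride;
};

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

// Fails (returns nullopt) instead of silently truncating when ndim exceeds
// the fixed-size metadata arrays, so descriptor creation can be rejected.
static std::optional<BCETensorInfoDevice>
make_device_info_checked(const BCETensorInfo &info) {
    if (info.ndim > static_cast<size_t>(BCE_MAX_DIMS)) {
        return std::nullopt;
    }
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim; ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
}
```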

Comment on lines +17 to +32
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

BCETensorInfoDevice stores shapes and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info copies info.ndim into dev.ndim without enforcing info.ndim <= BCE_MAX_DIMS. When info.ndim exceeds 8, later calls to indexToOffset using dev.ndim will index past the end of shape/strides, leading to undefined memory reads and potentially out-of-bounds device memory accesses when computing tensor offsets. Add an explicit check (and fail descriptor creation) when info.ndim is greater than BCE_MAX_DIMS, or clamp ndim consistently so the kernel never iterates beyond the initialized entries.
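The clamping alternative mentioned here, sketched against the same assumed BCETensorInfo/BCETensorInfoDevice types as the guard above (clamping silently drops higher dimensions, so rejection is usually the safer choice):

```cpp
// Clamp ndim once at copy time so every later loop over dev.ndim stays
// within the fixed-size shape/strides arrays.
static BCETensorInfoDevice make_device_info_clamped(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim < static_cast<size_t>(BCE_MAX_DIMS)
                   ? info.ndim
                   : static_cast<size_t>(BCE_MAX_DIMS);
    for (size_t i = 0; i < dev.ndim; ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;
}
```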

Comment on lines +15 to +30
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return dev;

Copilot AI Jan 11, 2026

BCETensorInfoDevice in this backend also uses fixed-size shape/strides arrays of length BCE_MAX_DIMS, but make_device_info sets dev.ndim = info.ndim without enforcing that info.ndim is within this limit. If a tensor with more than 8 dimensions is passed, indexToOffset will iterate up to dev.ndim and read beyond the arrays, which can corrupt indexing calculations and cause out-of-bounds accesses to the BCE input/output buffers on the MUSA device. Guard against this by rejecting descriptors with ndim > BCE_MAX_DIMS (or otherwise bounding ndim) before launching the kernel.
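A belt-and-braces variant bounds the loop inside the offset computation itself. Sketched as plain host C++ against the same assumed types, mirroring a hypothetical device-side indexToOffset (the PR's real signature is not shown in this excerpt; row-major linearization and nonzero shape entries are assumed):

```cpp
// Maps a flat element index to a strided element offset, never reading past
// the BCE_MAX_DIMS entries that make_device_info actually initialized.
static ptrdiff_t indexToOffsetBounded(size_t linear, const BCETensorInfoDevice &dev) {
    size_t ndim = dev.ndim < static_cast<size_t>(BCE_MAX_DIMS)
                      ? dev.ndim
                      : static_cast<size_t>(BCE_MAX_DIMS);
    ptrdiff_t offset = 0;
    for (size_t d = ndim; d-- > 0;) { // last dimension varies fastest
        size_t idx = linear % dev.shape[d];
        linear /= dev.shape[d];
        offset += static_cast<ptrdiff_t>(idx) * dev.strides[d];
    }
    return offset;
}
```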
