Enable building and running on macOS #3711

taalexander · 2026-01-04T01:46:55Z

This PR adds macOS support for CUDA-Q, addressing platform-specific differences in linking, symbol visibility, shell compatibility, and library handling. The changes should enable building and running CUDA-Q on macOS with Apple Silicon (arm64) and Intel (x86_64) architectures with the test suite passing (outside of several minor limitations noted below).

This is a large PR and full Python support also requires Python wheels and CI enablement. I have structured this PR such that it should not impact existing Linux builds. I recommend we treat this as phase one and merge after review/passing CI. We will then follow up with Python wheel and CI PRs to complete support.

Update: This PR is now based on #3693 to prepare for its imminent merge.

PRs:

Enable building and running on macOS #3711
Python wheel support for macOS
CI support for macOS

I've tried my best to summarize the contents of the PR below:

1. Build System (CMake)

Platform Detection & Configuration

Added macOS sysroot detection via xcrun --show-sdk-path for C++ stdlib headers
Configured platform-appropriate linker flags (--no-as-needed for Linux, alternatives for macOS)
Added CUDAQ_LIBCXX_PATH and CUDAQ_SYSROOT_PATH configuration for cross-platform header discovery
Updated rpath handling: @executable_path on macOS vs $ORIGIN on Linux
Changed CMAKE_INSTALL_RPATH to use semicolon separators on macOS

Library Linking Changes

Moved MLIR/LLVM dependencies from PUBLIC to PRIVATE in cudaq-mlir-runtime to reduce symbol visibility issues due to two-level namespace
Updated plugin extension handling: .dylib on macOS vs .so on Linux
Added platform-conditional linking for circular dependencies (--start-group not available on Apple ld)
Force two-level namespace linking for cudaq-common to prevent symbol collisions with OpenSSL

New `cudaq-utils` Library

Created a new low-level utilities library which resolves a circular dependency where cudaq-operator needs complex_matrix functions but is built before libcudaq

2. Two-Level Namespace Workarounds

macOS uses two-level namespace linking by default, where symbols are bound to specific libraries. This causes issues with LLVM/MLIR's static initializer pattern (PassRegistry, TargetRegistry, cl::Options).

Workarounds Implemented

| Workaround | Location | Purpose |
| --- | --- | --- |
| `flat_namespace` linker flag | `CMakeLists.txt` | Global symbol visibility |
| `force_load` for LLVM CodeGen | `cmake/BuildHelpers.cmake` | Ensures static initializers run in the correct library context |
| `add_lib_loading_macos_workaround` | `cmake/BuildHelpers.cmake` | Helper for Python extension targets to ensure proper library loading order |
| Symbol unexport list | `lib/Support/Config/CMakeLists.txt` | Hides LLVM/MLIR symbols from `CUDAQTargetConfigUtil` to prevent symbol collisions |
| Explicit `InitializeNativeTarget` | `CUDAQuantumExtension.cpp` | macOS-only target registration for the Python extension, to work around targets being registered to the wrong registry copy |
| Execution manager override API | `execution_manager.h` | Allows the manager to be set explicitly across library boundaries, instead of relying on default symbol resolution for the execution manager |

Future Removal Pathway

Later versions of clang fix the dylib linking issues so that all MLIR library links are rerouted to the dylibs that are built. Once we pick up such a version, we should consider moving to the single MLIR/LLVM dylibs to avoid multiple-linkage issues.

3. Platform Portability Fixes

Type Size Differences

unsigned long is 8 bytes on macOS/arm64 but 4 bytes on some Linux systems, so we switched usages to std::uint64_t
Updated CCTypes.cpp to use explicit size types where needed
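
As a small illustration of why fixed-width types help here (a sketch, not code from this PR; `kUint64IsUnsignedLong` and `toIndex` are hypothetical names): `std::uint64_t` has the same width wherever it is defined, while whether it is the same type as `unsigned long` depends on the platform headers, which matters for overloads and template specializations.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// std::uint64_t is exactly 64 bits on every platform that provides it.
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64,
              "std::uint64_t must be 64 bits wide");

// Platform-dependent: true where the headers define std::uint64_t as
// `unsigned long`, false where they use another type such as
// `unsigned long long`. Code keyed on `unsigned long` is not portable.
constexpr bool kUint64IsUnsignedLong =
    std::is_same<std::uint64_t, unsigned long>::value;

// Hypothetical helper: convert a fixed-width value back to a platform-sized
// index, checking that it fits.
inline std::size_t toIndex(std::uint64_t value) {
  assert(value <= std::numeric_limits<std::size_t>::max());
  return static_cast<std::size_t>(value);
}
```

Keeping the fixed-width type at interfaces and converting only at the point of use keeps the layout identical across platforms.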

Library Path Handling

TargetConfig.cpp: Handle .dylib vs .so extensions
fixup-linkage.cpp: Added handling for define weak linkage (macOS clang emits some functions with weak linkage)
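
The extension handling can be pictured with a sketch like the following. This is not the code in TargetConfig.cpp; `pluginExtension` and `pluginFileName` are hypothetical helpers, and the `lib` prefix is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>

// Shared-library suffix used for plugins on the current platform.
inline const char *pluginExtension() {
#if defined(__APPLE__)
  return ".dylib";
#else
  return ".so";
#endif
}

// Hypothetical helper: builds "lib<name><ext>" for a plugin base name.
inline std::string pluginFileName(const std::string &baseName) {
  assert(!baseName.empty() && "plugin base name must not be empty");
  return "lib" + baseName + pluginExtension();
}
```

Centralizing the suffix in one function avoids scattering `#ifdef __APPLE__` checks across every place that builds a library path.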

Shell Compatibility (POSIX)

Replaced bash commands with POSIX-compatible alternatives:

|& → 2>&1 | for stderr piping
Fixed mktemp template usage (macOS requires XXXXXX suffix)
Updated shebang and array handling in shell scripts

Standard Library Differences

std::vector<bool> has different internal layout between libc++ and libstdc++
Added explicit extern char **environ declaration in MQPUUtils.cpp (POSIX requires explicit declaration on macOS)
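
For readers unfamiliar with `environ`, the pattern looks roughly like this. It is a minimal sketch assuming a POSIX system, not the code in MQPUUtils.cpp; `lookupInEnviron` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// POSIX defines `environ` but does not require any header to declare it,
// so portable code declares it explicitly.
extern char **environ;

// Hypothetical helper: returns the value of `name` by scanning `environ`,
// or an empty string if the variable is not set.
std::string lookupInEnviron(const std::string &name) {
  assert(!name.empty() && "variable name must not be empty");
  for (char **entry = environ; entry != nullptr && *entry != nullptr;
       ++entry) {
    const char *equals = std::strchr(*entry, '=');
    if (equals == nullptr)
      continue;
    // Compare the whole of `name` against the key before the '='.
    const std::size_t keyLength = static_cast<std::size_t>(equals - *entry);
    if (name.compare(0, std::string::npos, *entry, keyLength) == 0)
      return std::string(equals + 1);
  }
  return std::string();
}
```

The explicit declaration is harmless on platforms whose headers already declare `environ`, since it is a compatible redeclaration.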

Other

Updated Stim CMakeLists.txt to use platform-appropriate symbol hiding syntax

4. Third-Party Patches

Added patches in tpls/customizations/ for compatibility:

| Patch | File | Purpose |
| --- | --- | --- |
| LLVM idempotent option category | `llvm/idempotent_option_category.diff` | Makes `cl::OptionCategory` registration idempotent to handle multiple LLVM copies registering the same category (avoids assertion failures) |
| Pybind11 LTO flag fix | `pybind11/pybind11Common.cmake.diff` | Fixes pybind/pybind11#5098 - incorrect `-flto=` flag generation for Clang |

Xtensor xio.hpp Workaround

Removed #include <xtensor/xio.hpp> in molecule.cpp and replaced with manual printing
Workaround for clang 17-18 template ambiguity with svector's rebind_container (LLVM #91504)

5. (Pre-existing) Bug Fixes

Note that most of these would have been caught by static code analysis. Most of the bugs likely surfaced because of the more aggressive allocator on macOS.

| File | Fix |
| --- | --- |
| `RegToMem.cpp` | Added proper `WalkResult` return after `op->erase()` to prevent iterator invalidation |
| `LoopUnrollPatterns.inc` | Fixed iterator handling in loop unrolling pattern |
| `ResetBeforeReuse.cpp` | Fixed stale pointer bug caused by canonicalization in Quantinuum pass |
| `CombineMeasurements.cpp` | Clarified success return for erased unused measurements |
| `QuakeToLLVM.cpp` | Fixed variadic argument duplication in controlled rotation codegen - rotation parameters were passed twice to `invokeRotationWithControlQubits`, causing crashes on ARM64 Darwin (masked on x86_64 due to ABI differences in variadic float handling) |
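
Several of these fixes share one shape: after erasing the element being visited, the code must not touch it or advance from it. A standard-library analogy (deliberately not the MLIR code from this PR; `eraseNegatives` is a hypothetical example) shows the safe pattern:

```cpp
#include <cstddef>
#include <list>

// Removes negative values while iterating. Returns how many were erased.
std::size_t eraseNegatives(std::list<int> &values) {
  std::size_t erased = 0;
  for (auto it = values.begin(); it != values.end();) {
    if (*it < 0) {
      // erase() invalidates `it`; continue from the iterator it returns
      // instead of incrementing the stale one.
      it = values.erase(it);
      ++erased;
    } else {
      ++it;
    }
  }
  return erased;
}
```

Bugs of this kind often go unnoticed when the freed memory happens to stay intact, which fits the observation that a different allocator exposed them.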

6. Python Bindings

Added #ifdef __APPLE__ conditional for InitializeNativeTarget calls
Updated CMake to use add_lib_loading_macos_workaround for Python extension targets
Fixed Python virtual environment stdlib availability on macOS

7. Documentation & Developer Setup

Updated Dev_Setup.md for developer environment setup
Added requirements-dev.txt for Python development dependencies
Updated Building.md with platform-specific notes
Updated scripts/install_toolchain.sh and scripts/install_prerequisites.sh for macOS build instructions
Updated scripts/build_cudaq.sh with improved macOS build support
Changed OpenSSL build to use CMake on macOS to avoid pkg-config resolution issues with flat namespace

8. Test Updates

Updated test RUN lines to use platform-appropriate flags (calling_convention.cpp, infinite_loop.cpp, kernel exec transform tests)
Replaced .so with %cudaq_plugin_ext substitution for cross-platform tests
Added DISCOVERY_TIMEOUT 120 to backend unit tests (likely just required for my slow machine)
Split qvector_init_from_vector.cpp to separate large array test (qvector_init_large_array.cpp) which is skipped on macOS due to stack size
Fixed cudaq-qpud linking to enforce two-level namespace for braket backend tests

Known Limitations

Stack Size

macOS has a smaller default stack size (8MB) compared to Linux. Some tests with large stack allocations (e.g., large array initializations) may fail. The qvector_init_large_array.cpp test is currently skipped on macOS for this reason. Future work may address this via ulimit -s or code refactoring.
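
As a sketch of the programmatic counterpart to `ulimit -s`, the POSIX `getrlimit` call reports the soft stack limit of the current process. The helper names and the "half of the limit" headroom policy below are assumptions for illustration, not part of this PR.

```cpp
#include <cstdint>
#include <limits>
#include <sys/resource.h>

// Returns the soft stack limit in bytes, the maximum uint64 value if the
// limit is unlimited, or 0 if the query fails.
std::uint64_t softStackLimitBytes() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_STACK, &limit) != 0)
    return 0;
  if (limit.rlim_cur == RLIM_INFINITY)
    return std::numeric_limits<std::uint64_t>::max();
  return static_cast<std::uint64_t>(limit.rlim_cur);
}

// Hypothetical policy: treat a buffer as stack-safe only if it needs less
// than half of the soft limit, leaving headroom for other frames.
bool fitsWithinStackLimit(std::uint64_t bytesNeeded) {
  return bytesNeeded < softStackLimitBytes() / 2;
}
```

A test could use such a check to skip itself, or to allocate the large array on the heap (for example with `std::vector`) when the stack is too small.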

flat_namespace

The flat_namespace linker flag can cause symbol collisions with system libraries. We should work toward removing this in a follow-up PR.

LLVM cl::OptionCategory Duplicate Registration (Requires LLVM Patch)

Both libcudaq and cudaq-mlir-runtime link LLVM and therefore each contain their own copy of LLVM's cl::OptionCategory static globals. When both libraries are loaded, LLVM's default behavior asserts on duplicate category names.

The idempotent_option_category.diff patch makes registration idempotent, allowing the same category to be registered multiple times without assertion failures. This is a workaround—the proper fix would be to restructure the libraries so only one contains LLVM command-line infrastructure, but that requires more significant refactoring.
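
The idea of idempotent registration can be modeled in a few lines. This is a simplified sketch, not LLVM's implementation or the contents of the patch; `CategoryRegistry` is a hypothetical class.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical registry that tolerates repeated registration of a name.
class CategoryRegistry {
public:
  // Returns true if the name was newly registered, false if it was already
  // present. A duplicate is a no-op rather than an assertion failure.
  bool registerCategory(const std::string &name) {
    assert(!name.empty() && "category name must not be empty");
    return names.insert(name).second;
  }

  std::size_t size() const { return names.size(); }

private:
  std::set<std::string> names;
};
```

With this behavior, two libraries that each carry the same category can both register it and the registry still ends up with a single entry.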

C++ Exception Handling in JIT-compiled Code (macOS ARM64)

On macOS ARM64 (Apple Silicon), C++ exceptions thrown from JIT-compiled code
cannot be caught by user code. The exception will terminate the program instead
of unwinding to the catch block. This could be due to improper exception handling;
for now we have marked targettests/execution/estimate_resources_sample_in_choice.cpp,
which explicitly tests this capability, as XFAIL.

Testing

All tests pass on x86 with ctest --output-on-failure, except for one test related to exception handling (detailed above), which has been marked XFAIL.


The kernel builder implementation is still assuming it can just call some function whenever there is an apply_call, which is incorrect. As the apply_call could be calling a decorator, all the preconditions of a decorator call *must* be met, which entails resolving any lambda lifted arguments in the immediate context. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>


For some reason, the test directories were split into two separate directory structures. This makes it confusing for maintenance and is just plain silly. This PR merges the two redundant subtrees. In the future, any PR that introduces new redundant subdirectories should be met with a "changes requested". Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>


C++ exceptions thrown from JIT-compiled code cannot be caught on macOS ARM64 (Apple Silicon) due to a known upstream LLVM bug in libunwind. This affects features like estimate_resources when used with callbacks that invoke JIT-compiled kernels. Tests are marked XFAIL/UNSUPPORTED until the upstream issue is resolved.

Upstream issue: llvm/llvm-project#49036

Added documentation of this limitation in Building.md.

The rotation parameter was being passed twice to invokeRotationWithControlQubits and invokeU3RotationWithControlQubits: once as a fixed argument, then again as part of the variadic arguments via funcArgs.append(instOperands.begin(), ...). The symptom that ultimately gave this away was an encoded PI value being observed as a pointer location. This caused crashes on ARM64 Darwin, where all variadic args go on the stack: the extra parameter shifted every subsequent argument, causing va_arg to read the parameter's raw bits as pointers. On x86_64, this bug was masked because the ABI stores floating-point and integer variadic args in separate areas, so the extra double didn't affect pointer argument retrieval. The fix is to skip the already-added parameter(s) when appending variadic operands. A regression test has been added to ensure the parameter arguments are not added twice to the function call.
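
The shape of the fix can be sketched with a simplified model of argument assembly. This is not the code in QuakeToLLVM.cpp; `CallArgs` and `buildCallArgs` are hypothetical, and operands are modeled as strings for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a call with fixed and variadic arguments.
struct CallArgs {
  std::vector<std::string> fixed;    // passed as named parameters
  std::vector<std::string> variadic; // passed through `...`
};

// `operands` lists the rotation parameter(s) first, then the qubit operands.
CallArgs buildCallArgs(const std::vector<std::string> &operands,
                       std::size_t numParams) {
  assert(numParams <= operands.size());
  const auto firstQubit =
      operands.begin() + static_cast<std::ptrdiff_t>(numParams);
  CallArgs args;
  args.fixed.assign(operands.begin(), firstQubit);
  // Skip the parameters already passed as fixed arguments. Starting from
  // operands.begin() here would pass them a second time and shift every
  // later variadic argument.
  args.variadic.assign(firstQubit, operands.end());
  return args;
}
```

For operands `{"theta", "control0", "target"}` and one parameter, the variadic tail holds only the two qubit operands, so the callee reads pointers where it expects pointers.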

The Python redesign branch refactored pipeline APIs:

- Removed createStatePreparation()
- Renamed createLambdaLiftingPass() to createLambdaLifting()
- Renamed createPreDeviceCodeLoaderPipeline() to createPythonAOTPipeline()

Using the upstream version to match the new APIs.


taalexander · 2026-01-15T13:26:59Z

I have pulled the bugs that were identified and fixed here out into separate PRs: #3748, #3752, #3755, and #3761.

schweitzpgi · 2026-01-15T17:00:49Z

You're right. It's not defined in a standard header file. My bad.

> On Thursday, January 15, 2026 8:53 AM, Thomas Alexander (@taalexander) commented on this pull request, in runtime/cudaq/platform/mqpu/helpers/MQPUUtils.cpp:
>
>     @@ -25,6 +25,10 @@
>      #include "cuda_runtime_api.h"
>      #endif
>     +// On macOS, environ is not automatically declared; POSIX requires explicit
>     +// declaration
>     +extern char **environ;
>
> I'm certainly no expert on this but the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html) seems to just make this available as extern with no header? There is a header library which exists but it is mac specific and would require macros to inject (which we could use, although I think the current approach is better).

schweitzpgi and others added 30 commits December 8, 2025 09:08

Should not be calling mergeAllCallableClosures() unless the front

f1926a4

end is Python. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Fixes to check lines.

2cc817f

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Fix more check lines.

77c7eb0

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

More check lines.

f488f06

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

More check fixes.

25772af

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Port some of the old logic to handle simulator precision

5d2b2ba

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Formatting.

b9c2ef7

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Endaround the spelling checker.

cd7ecc8

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Merge pull request NVIDIA#3667 from schweitzpgi/es-kernel.builder.cal…

805362f

…ls.decorator [python redesign] Teach kernel builder how to call kernel decorators.

cudaq.vqe is migrated to cuda-qx, use optimizer for the test

b0788cf

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

kernel features tests pass - one tests disabled due to crash

9c52235

Signed-off-by: Bettina Heim <heimb@outlook.com>

Merge pull request NVIDIA#3669 from 1tnguyen/tnguyen/test-chemistry-d…

63d5e6d

…omain-update [features/python.redesign] Refactor `cudaq.vqe` in `test_chemistry`

Merge branch 'features/python.redesign.0' into tnguyen/state-precisio…

5a0da1b

…n-match

Merge pull request NVIDIA#3668 from 1tnguyen/tnguyen/state-precision-…

2e0fff6

…match [features/python.redesign] Port some of the old logic to handle simulator precision

Fix for issue NVIDIA#3638. (NVIDIA#3649)

47e8a4d

This patch fixed a codegen issue for when a closure contained a single value when lowering to QIR as the transport layer. Add regression test. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Remove replace state with kernel pass.

50f1d29

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

[NFC] cleanup the test. formatting, etc.

fa73a0e

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Flip the default of constant propagation when lambda lifting to prevent

5c58cc0

a downstream cascade of side-effects that results in a loop analysis failure and apply specialization being pessimistic. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Fix for C++ callable arguments, argument conversion, and phase_estima…

f7700ea

…tion test. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Guard against nullptr from quake::ApplyOp::getCallee

611aead

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

One more subtle bug: unsafe lambda capture by reference

9022899

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Remove remote-mqpu state test

c0e6a50

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

grabbing more from PR 3537

28755bd

Signed-off-by: Bettina Heim <heimb@outlook.com>

Merge branch 'features/python.redesign.0' into python_feature

bd22e2a

Merge pull request NVIDIA#3675 from 1tnguyen/tnguyen/apply-op-specili…

68ae8a8

…zation-fix [features/python.redesign] Fixes for subtle segfaults

addressing fixme in visit_name

1ab0f43

Signed-off-by: Bettina Heim <heimb@outlook.com>

Remove the state preparation pass. No longer supported.

63a2f5e

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

Remove useless check of an unmaintained global dict.

f314c25

Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>

taalexander added 17 commits January 12, 2026 15:21

Update tests for macOS arm.

eab58b4

Fix macos compiler version propagation to nvq++ for vendored LLVM ver…

a4dfa5b

…sion mismatch.

XFail JIT exception handling on MacOS arm.

45e245e

Remove unnecessary Cmake force loads.

bb076ad

Fix include paths after matrix.h move and registry refactor

8e886a7

- Update SimulationState.h: cudaq/utils/matrix.h -> cudaq/operators/matrix.h - Use upstream kernel_utils.h with common/DeviceCodeRegistry.h include

Use upstream py_alt_launch_kernel with Python redesign APIs

bb69f6e

Fix rebase bugs from improper rebasing.

d6a378b

Don't pass cuda cmake variables on macos.

0e9b927

Remove tests for passes deleted in Python redesign

02c36cf

DeleteStates, ReplaceStateWithKernel, and StatePreparation passes were removed in the Python redesign branch. Remove their test files as well.

Fix qalloc_initialization rebase regex issues.

2c27ba5

Fix additional tests that got incorrectly taken in the rebase.

3c81a26

Clang-format.

ef56022

Markdown linting.

c169058

Update allowlist.

7b99d9a

taalexander force-pushed the osx-cuda-quantum-support branch from 755e37f to 7b99d9a · January 13, 2026 14:16

Remove obsolete thread limits documentation

5a0afaf

The Python redesign fixed the threading issue that caused pthread exhaustion on macOS. Dynamics tests now pass without skip markers. Signed-off-by: Thomas Alexander <talexander@nvidia.com>

taalexander added 2 commits January 15, 2026 11:18

Remove macros in vector sizing.

43d69a5

Splice strings instead of make them macro dependent.

6619dee

taalexander added 2 commits January 15, 2026 13:12

Use simpler xtensor workaround.

a20b240

Add OpenMP support on mac.

b1c5cfb
