Enable building and running on macOS #3711
Open: taalexander wants to merge 288 commits into NVIDIA:main from taalexander:osx-cuda-quantum-support
end is Python. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
The kernel builder implementation is still assuming it can just call some function whenever there is an apply_call, which is incorrect. As the apply_call could be calling a decorator, all the preconditions of a decorator call *must* be met, which entails resolving any lambda lifted arguments in the immediate context. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
…ls.decorator [python redesign] Teach kernel builder how to call kernel decorators.
Signed-off-by: Bettina Heim <heimb@outlook.com>
…omain-update [features/python.redesign] Refactor `cudaq.vqe` in `test_chemistry`
…match [features/python.redesign] Port some of the old logic to handle simulator precision
For some reason, the test directories were split into two separate directory structures. This makes it confusing for maintenance and is just plain silly. This PR merges the two redundant subtrees. In the future, any PR that introduces new redundant subdirectories should be met with a "changes requested". Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
This patch fixed a codegen issue for when a closure contained a single value when lowering to QIR as the transport layer. Add regression test. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
a downstream cascade of side-effects that results in a loop analysis failure and apply specialization being pessimistic. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
…tion test. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
…zation-fix [features/python.redesign] Fixes for subtle segfaults
C++ exceptions thrown from JIT-compiled code cannot be caught on macOS ARM64 (Apple Silicon) due to a known upstream LLVM bug in libunwind. This affects features like estimate_resources when used with callbacks that invoke JIT-compiled kernels. Tests are marked XFAIL/UNSUPPORTED until the upstream issue is resolved. Upstream issue: llvm/llvm-project#49036. Added documentation of this limitation in Building.md.
The rotation parameter was being passed twice to invokeRotationWithControlQubits and invokeU3RotationWithControlQubits: once as a fixed argument, then again as part of the variadic arguments via funcArgs.append(instOperands.begin(), ...). The symptom that ultimately gave this away was the encoded value of PI being observed as a pointer. This caused crashes on ARM64 Darwin, where all variadic args go on the stack: the extra parameter shifted every subsequent argument, causing va_arg to read the parameter's raw bits as pointers. On x86_64, this bug was masked because the ABI stores floating-point and integer variadic args in separate areas, so the extra double didn't affect pointer argument retrieval. The fix is to skip the already-added parameter(s) when appending variadic operands. A regression test has been added to ensure the parameter arguments are not added twice to the function call.
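A minimal sketch of the fix described above, with plain strings standing in for MLIR operands and std::vector::insert standing in for SmallVector::append. The helper name buildRotationCallArgs, the assumption that instOperands lists the rotation parameter(s) first, and the assumed call shape (parameters, then a control count, then the variadic operands) are all illustrative, not the actual QuakeToLLVM code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an MLIR operand; a string keeps the sketch
// self-contained.
using Operand = std::string;

// Builds arguments for a call assumed to look like
//   invokeRotationWithControlQubits(param..., numControls, <variadic operands>)
// Assumes instOperands holds the rotation parameter(s) first.
std::vector<Operand> buildRotationCallArgs(const std::vector<Operand> &instOperands,
                                           std::size_t numParams,
                                           std::size_t numControls) {
  std::vector<Operand> funcArgs;
  // Fixed leading arguments: the rotation parameter(s).
  for (std::size_t i = 0; i < numParams; ++i)
    funcArgs.push_back(instOperands[i]);
  funcArgs.push_back(std::to_string(numControls));
  // Variadic tail: skip the parameter(s) already passed above. Appending from
  // instOperands.begin() would pass them a second time and shift every later
  // variadic argument by one slot (the ARM64 Darwin crash).
  funcArgs.insert(funcArgs.end(),
                  instOperands.begin() + static_cast<std::ptrdiff_t>(numParams),
                  instOperands.end());
  return funcArgs;
}
```

For a single-parameter rotation, `buildRotationCallArgs({"theta", "ctrl0", "target"}, 1, 1)` yields `{"theta", "1", "ctrl0", "target"}`, with `theta` appearing exactly once.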
The Python redesign branch refactored pipeline APIs:
- Removed createStatePreparation()
- Renamed createLambdaLiftingPass() to createLambdaLifting()
- Renamed createPreDeviceCodeLoaderPipeline() to createPythonAOTPipeline()
Using the upstream version to match the new APIs.
- Update SimulationState.h: cudaq/utils/matrix.h -> cudaq/operators/matrix.h
- Use upstream kernel_utils.h with common/DeviceCodeRegistry.h include
DeleteStates, ReplaceStateWithKernel, and StatePreparation passes were removed in the Python redesign branch. Remove their test files as well.
Branch updated from 755e37f to 7b99d9a.
The Python redesign fixed the threading issue that caused pthread exhaustion on macOS. Dynamics tests now pass without skip markers. Signed-off-by: Thomas Alexander <talexander@nvidia.com>
This was referenced Jan 14, 2026
You're right. It's not defined in a standard header file. My bad.
From: Thomas Alexander
Sent: Thursday, January 15, 2026 8:53 AM
To: NVIDIA/cuda-quantum
Cc: Eric Schweitz; Review requested
Subject: Re: [NVIDIA/cuda-quantum] Enable building and running on macOS (PR #3711)
@taalexander commented on this pull request.
In runtime/cudaq/platform/mqpu/helpers/MQPUUtils.cpp:
@@ -25,6 +25,10 @@
#include "cuda_runtime_api.h"
#endif
+// On macOS, environ is not automatically declared; POSIX requires explicit
+// declaration
+extern char **environ;
I'm certainly no expert on this, but the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html) seems to just make this available as extern with no header?
There is a header library that exists, but it is Mac-specific and would require macros to inject (which we could use, although I think the current approach is better).
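A minimal sketch of the pattern discussed in this thread: declare `environ` yourself and walk it. As far as I know, glibc declares `environ` in `<unistd.h>` only when `_GNU_SOURCE` is defined and macOS system headers do not declare it, which is why the explicit declaration is needed. The helper `environmentEntriesWithPrefix` is hypothetical and not part of MQPUUtils.cpp; the Mac-specific header alternative is deliberately not used.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Explicit declaration at global scope, same as the one added in
// MQPUUtils.cpp. Assumes a POSIX system where the C runtime defines environ.
extern char **environ;

// Hypothetical helper: returns every "NAME=value" entry whose text starts
// with `prefix`. environ is a null-terminated array of C strings.
std::vector<std::string> environmentEntriesWithPrefix(const std::string &prefix) {
  std::vector<std::string> matches;
  for (char **entry = environ; entry != nullptr && *entry != nullptr; ++entry)
    if (std::strncmp(*entry, prefix.c_str(), prefix.size()) == 0)
      matches.emplace_back(*entry);
  return matches;
}
```

Passing a prefix such as `"CUDAQ_"` would collect all CUDA-Q-related variables, for example before forwarding them to a child process.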
This PR adds macOS support for CUDA-Q, addressing platform-specific differences in linking, symbol visibility, shell compatibility, and library handling. The changes should enable building and running CUDA-Q on macOS on both Apple Silicon (arm64) and Intel (x86_64), with the test suite passing apart from a few minor limitations noted below.
This is a large PR and full Python support also requires Python wheels and CI enablement. I have structured this PR such that it should not impact existing Linux builds. I recommend we treat this as phase one and merge after review/passing CI. We will then follow up with Python wheel and CI PRs to complete support.
Update: This PR is now based on #3693 in preparation for its imminent merge.
PRs:
I've tried my best to summarize the contents of the PR below:
1. Build System (CMake)
Platform Detection & Configuration
- `xcrun --show-sdk-path` for C++ stdlib headers
- `--no-as-needed` for Linux, alternatives for macOS
- `CUDAQ_LIBCXX_PATH` and `CUDAQ_SYSROOT_PATH` configuration for cross-platform header discovery
- `@executable_path` on macOS vs `$ORIGIN` on Linux
- `CMAKE_INSTALL_RPATH` to use semicolon separators on macOS
Library Linking Changes
- `cudaq-mlir-runtime` to reduce symbol visibility issues due to two-level namespace
- `.dylib` on macOS vs `.so` on Linux
- `--start-group` not available on Apple ld
- `cudaq-common` to prevent symbol collisions with OpenSSL
New `cudaq-utils` Library
Created a new low-level utilities library which resolves a circular dependency where `cudaq-operator` needs `complex_matrix` functions but is built before `libcudaq`.
2. Two-Level Namespace Workarounds
macOS uses two-level namespace linking by default, where symbols are bound to specific libraries. This causes issues with LLVM/MLIR's static initializer pattern (PassRegistry, TargetRegistry, cl::Options).
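To make the static-initializer problem concrete, here is a minimal sketch of the pattern in plain C++. `PassRegistrySketch` and `RegisterPass` are hypothetical names loosely modeled on LLVM's registries, not LLVM's API. Global objects register into a process-wide table before `main()` runs. As I understand it, under the two-level namespace each library that statically links its own copy of this code binds to its own table, so registrations made through one copy are invisible to the other; with `flat_namespace` both copies resolve to one table and the same names are registered twice. `add()` tolerates that by keeping the first registration, roughly the behavior the `idempotent_option_category.diff` patch (under Third-Party Patches) is described as giving `cl::OptionCategory`.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry; one instance per copy of this code in the process.
class PassRegistrySketch {
public:
  static PassRegistrySketch &instance() {
    static PassRegistrySketch registry; // function-local static table
    return registry;
  }

  // Idempotent registration: a duplicate name is ignored (returns false)
  // instead of asserting.
  bool add(const std::string &name, std::function<int()> factory) {
    return entries.emplace(name, std::move(factory)).second;
  }

  bool contains(const std::string &name) const { return entries.count(name) != 0; }
  std::size_t size() const { return entries.size(); }

  // Runs the factory registered under `name` (throws std::out_of_range if absent).
  int create(const std::string &name) const { return entries.at(name)(); }

private:
  std::map<std::string, std::function<int()>> entries;
};

// Registration object whose constructor runs during static initialization.
struct RegisterPass {
  RegisterPass(const std::string &name, std::function<int()> factory) {
    PassRegistrySketch::instance().add(name, std::move(factory));
  }
};

// Two registrations of the same name, simulating two copies of the same
// library code loaded into one flat namespace. Static initializers in one
// translation unit run in declaration order, so the first one wins.
static RegisterPass registerFromFirstCopy("canonicalize", [] { return 1; });
static RegisterPass registerFromSecondCopy("canonicalize", [] { return 2; });
```

Keeping the first entry rather than overwriting is a design choice: it makes duplicate registration harmless as long as both copies register equivalent entries, which is the assumption behind treating this as a workaround rather than a fix.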
Workarounds Implemented
- `flat_namespace` linker flag: `CMakeLists.txt`
- `force_load` for LLVM CodeGen: `cmake/BuildHelpers.cmake`
- `add_lib_loading_macos_workaround`: `cmake/BuildHelpers.cmake`
- `lib/Support/Config/CMakeLists.txt`
- `CUDAQTargetConfigUtil` to prevent symbol collisions
- `InitializeNativeTarget`: `CUDAQuantumExtension.cpp`
- `execution_manager.h`
Future Removal Pathway
In later versions of clang, the dylib linking issues have been fixed so that MLIR library links are all rerouted against the dylibs that are built. At that point we should consider moving to these single MLIR/LLVM dylibs to avoid multiple-linkage issues.
3. Platform Portability Fixes
Type Size Differences
- `unsigned long` is 8 bytes on macOS/arm64 but 4 bytes on some Linux systems, so we switch usages to `std::uint64_t`
- `CCTypes.cpp` to use explicit size types where needed
Library Path Handling
- `TargetConfig.cpp`: Handle `.dylib` vs `.so` extensions
- `fixup-linkage.cpp`: Added handling for `define weak` linkage (macOS clang emits some functions with weak linkage)
Shell Compatibility (POSIX)
Replaced bash commands with POSIX-compatible alternatives:
- `|&` → `2>&1 |` for stderr piping
- `mktemp` template usage (macOS requires `XXXXXX` suffix)
Standard Library Differences
- `std::vector<bool>` has different internal layout between libc++ and libstdc++
- `extern char **environ` declaration in `MQPUUtils.cpp` (POSIX requires explicit declaration on macOS)
Other
4. Third-Party Patches
Added patches in `tpls/customizations/` for compatibility:
- `llvm/idempotent_option_category.diff`: makes `cl::OptionCategory` registration idempotent to handle multiple LLVM copies registering the same category (avoids assertion failures)
- `pybind11/pybind11Common.cmake.diff`: `-flto=` flag generation for Clang
Xtensor xio.hpp Workaround
- `#include <xtensor/xio.hpp>` in `molecule.cpp` and replaced with manual printing
- `rebind_container` (LLVM #91504)
5. (Pre-existing) Bug Fixes
Note that most of these would have been caught by static code analysis. Most of the bugs were likely exposed by the more aggressive allocator on macOS.
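The `RegToMem.cpp` entry in the list below is an instance of a general hazard: touching an element after erasing it. A minimal analogy using `std::list` instead of MLIR's walker (`eraseEvens` is a hypothetical name, not code from this PR) shows the safe shape, where the loop always continues from a position that is still valid. Using a dangling iterator is undefined behavior, so whether it crashes depends on whether the freed memory has already been reused, which is consistent with the note above about the allocator.

```cpp
#include <list>

// Removes every even value in a single pass.
// erase() invalidates the iterator it is given, so the loop continues from
// the iterator erase() returns instead of incrementing the erased one.
void eraseEvens(std::list<int> &values) {
  for (auto it = values.begin(); it != values.end();) {
    if (*it % 2 == 0)
      it = values.erase(it); // next valid position after the erased node
    else
      ++it;
  }
}
```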
- `RegToMem.cpp`: `WalkResult` return after `op->erase()` to prevent iterator invalidation
- `LoopUnrollPatterns.inc`
- `ResetBeforeReuse.cpp`
- `CombineMeasurements.cpp`
- `QuakeToLLVM.cpp`: rotation parameter passed twice to `invokeRotationWithControlQubits`, causing crashes on ARM64 Darwin (masked on x86_64 due to ABI differences in variadic float handling)
6. Python Bindings
- `#ifdef __APPLE__` conditional for `InitializeNativeTarget` calls
- `add_lib_loading_macos_workaround` for Python extension targets
7. Documentation & Developer Setup
- `Dev_Setup.md` for developer environment setup
- `requirements-dev.txt` for Python development dependencies
- `Building.md` with platform-specific notes
- `scripts/install_toolchain.sh` and `scripts/install_prerequisites.sh` for macOS build instructions
- `scripts/build_cudaq.sh` with improved macOS build support
8. Test Updates
- `RUN` lines to use platform-appropriate flags (`calling_convention.cpp`, `infinite_loop.cpp`, kernel exec transform tests)
- `.so` with `%cudaq_plugin_ext` substitution for cross-platform tests
- `DISCOVERY_TIMEOUT 120` to backend unit tests (likely just required for my slow machine)
- `qvector_init_from_vector.cpp` to separate large array test (`qvector_init_large_array.cpp`) which is skipped on macOS due to stack size
- `cudaq-qpud` linking to enforce two-level namespace for braket backend tests
Known Limitations
Stack Size
macOS has a smaller default stack size (8MB) compared to Linux. Some tests with large stack allocations (e.g., large array initializations) may fail. The `qvector_init_large_array.cpp` test is currently skipped on macOS for this reason. Future work may address this via `ulimit -s` or code refactoring.
flat_namespace
The `flat_namespace` linker flag can cause symbol collisions with system libraries. We should work toward removing this in a follow-up PR.
LLVM cl::OptionCategory Duplicate Registration (Requires LLVM Patch)
Both `libcudaq` and `cudaq-mlir-runtime` link LLVM and therefore each contain their own copy of LLVM's `cl::OptionCategory` static globals. When both libraries are loaded, LLVM's default behavior asserts on duplicate category names.
The `idempotent_option_category.diff` patch makes registration idempotent, allowing the same category to be registered multiple times without assertion failures. This is a workaround; the proper fix would be to restructure the libraries so only one contains LLVM command-line infrastructure, but that requires more significant refactoring.
C++ Exception Handling in JIT-compiled Code (macOS ARM64)
On macOS ARM64 (Apple Silicon), C++ exceptions thrown from JIT-compiled code cannot be caught by user code; the exception terminates the program instead of unwinding to the catch block. This could be due to improper exception handling (the related commit points to a known upstream libunwind bug, llvm/llvm-project#49036). For now we have marked `targettests/execution/estimate_resources_sample_in_choice.cpp`, which explicitly tests this capability, as XFAIL.
Testing
All tests pass on x86 with `ctest --output-on-failure`, except one related to exception handling (detailed above), which has been marked XFAIL.