Add PyCapsule Type Support and Type Hint Enhancements for AggregateUDF in DataFusion Python Bindings #1277
Merged
kosiew merged 14 commits into apache:main on Nov 8, 2025
Implement fallback for PyCapsule-backed providers, ensuring type checkers are satisfied without protocol-aware stubs. Update typing imports and cast PyCapsule inputs in AggregateUDF.from_pycapsule for precise constructor typing.
…zation signatures
Introduce a _PyCapsule typing protocol to enable type checkers to recognize PyCapsule-based registrations. Restrict the AggregateUDF udaf overload to the PyCapsule protocol and update from_pycapsule to wrap raw capsule inputs using the internal binding directly.
Introduce a utility to validate PyCapsules and convert them into reusable DataFusion aggregate UDFs. Update PyAggregateUDF.from_pycapsule to handle raw PyCapsule inputs, leverage the new helper, and maintain existing provider fallback and error handling.
kosiew commented Oct 15, 2025
Comment on lines -35 to -39
    r"\b(?:pub\s+)?(?:struct|enum)\s+"
    r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)",
Contributor, Author:
Not related to this PR, but this came up as a Ruff error.
timsaucer reviewed Oct 28, 2025
python/datafusion/user_defined.py (Outdated)
Comment on lines 48 to 49
    class _PyCapsule:
        """Lightweight typing proxy for CPython ``PyCapsule`` objects."""
Contributor, Author:
_typeshed.CapsuleType only exists for static type checkers (TYPE_CHECKING), so inside the else: branch (runtime, not TYPE_CHECKING) we provide a lightweight runtime proxy to keep the _PyCapsule symbol defined.
Without the class _PyCapsule in the else branch, the following will error:
    from typing import TYPE_CHECKING, TypeGuard

    if TYPE_CHECKING:
        from _typeshed import CapsuleType as _PyCapsule

    def is_capsule(obj: object) -> TypeGuard[_PyCapsule]:
        return hasattr(obj, "__capsule__")

Traceback (most recent call last):
  File "/Users/kosiew/GitHub/datafusion-python/examples/example_fail.py", line 6, in <module>
    def is_capsule(obj: object) -> TypeGuard[_PyCapsule]:
                                             ^^^^^^^^^^
NameError: name '_PyCapsule' is not defined
Member:
Yes, but I think this is okay, because if the same file sets future annotations appropriately it has no issue:
    from __future__ import annotations
    from typing import TYPE_CHECKING, TypeGuard

    if TYPE_CHECKING:
        from _typeshed import CapsuleType as _PyCapsule

    def is_capsule(obj: object) -> TypeGuard[_PyCapsule]:
        return hasattr(obj, "__capsule__")

The first line prevents this error.
Contributor, Author:
Removed class _PyCapsule.
timsaucer approved these changes Nov 6, 2025
Which issue does this PR close?
udaf function #1237
Rationale for this change
The current AggregateUDF.udaf and AggregateUDF.from_pycapsule methods in the DataFusion Python API lack proper type hinting and handling for CPython PyCapsule objects. This omission causes static type checking tools (e.g., mypy) to fail when users register UDAFs originating from external providers such as geodatafusion, even though the runtime behavior functions correctly.
This PR addresses the gap by explicitly supporting PyCapsule types both in type hints and runtime checks. By doing so, it improves type safety, developer experience, and code clarity while maintaining full backward compatibility.
example from #1237
Before
After
What changes are included in this PR?
- TypeGuard function _is_pycapsule() for lightweight PyCapsule type validation.
- _PyCapsule proxy class for static typing compatibility in non-type-checking contexts.
- Updated AggregateUDF.__init__ and AggregateUDF.udaf() to include AggregateUDFExportable | _PyCapsule argument types.
- Updated AggregateUDF.from_pycapsule() to support direct PyCapsule initialization.
- Updated PyAggregateUDF::from_pycapsule() logic to delegate PyCapsule validation to a new helper function aggregate_udf_from_capsule() for cleaner handling.
Are these changes tested?
Yes:
Are there any user-facing changes?
Yes, minor improvements:
These changes are fully backward-compatible and non-breaking for existing user code.