Replace stdlib enum with something more performant #1557

@mdboom

The Python stdlib's enum module has notoriously poor performance. It would be nice to switch to something with fewer performance pitfalls. Measurement with Python 3.15's sampling profiler shows that approximately 23% of the time spent importing cuda.bindings.driver goes to creating enums. There are also (smaller) overheads when accessing them from Cython, since they are implemented in pure Python. (See #1543; there have reportedly been other performance issues over time.)
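To get a rough feel for the creation cost described above, here is a minimal micro-benchmark sketch. It is not the profiler measurement quoted in this issue: the function names, member counts, and the plain-class baseline are all assumptions chosen for illustration, and absolute numbers will vary by machine and Python version.

```python
import enum
import time


def time_stdlib_enum_creation(num_members: int = 500, repeats: int = 20) -> float:
    """Average seconds to build one stdlib IntEnum with `num_members` members."""
    members = {f"VALUE_{i}": i for i in range(num_members)}
    start = time.perf_counter()
    for n in range(repeats):
        # Functional API: the second argument may be a name -> value mapping.
        enum.IntEnum(f"Synthetic{n}", members)
    return (time.perf_counter() - start) / repeats


def time_plain_class_creation(num_members: int = 500, repeats: int = 20) -> float:
    """Baseline (an assumption for comparison): a plain class holding raw ints."""
    members = {f"VALUE_{i}": i for i in range(num_members)}
    start = time.perf_counter()
    for n in range(repeats):
        type(f"Plain{n}", (), dict(members))
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(f"stdlib IntEnum: {time_stdlib_enum_creation():.6f} s per class")
    print(f"plain class:    {time_plain_class_creation():.6f} s per class")
```

The plain class is only a lower bound on what a replacement could cost; a real replacement would still need to wrap each value in an object.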

The obvious alternative is to use Cython's cpdef enum syntax. It generates a C-style enum, which is theoretically more performant, but for the Python exposure it uses the same stdlib enum. This means we would still pay the massive performance penalty at import and on the Python side, plus an additional layer of translation between the two. (Not to mention that we already have a simple C exposure generated in our cy* layer.) Measurement of a prototype shows this actually performs even worse, with enum creation time rising to 50% of import time.

It should be easy to create something that meets the basic needs of an enum type without these performance overheads. The tricky bit is that the API surface of stdlib enums is surprisingly large and includes some unusual things. We will need to decide which parts we care about and which we are willing to forgo -- maybe by doing an analysis of our users' code if possible.

Here is roughly the API surface we would need to cover (feel free to add if I missed something):

  • For the enum "container":
    • The __members__ attribute is a mapping from name to enum values
    • __contains__ works for base values, i.e. 5 in container
    • __getitem__ works for names, i.e. Container["ENUM_VALUE"]
    • __iter__ works to iterate over all enum values
    • __len__ returns the number of enumeration values
  • For the enum "values":
    • A helpful __repr__, e.g. <Container.VALUE_ONE: 0>
    • Inherits from int so numeric operations work, most importantly the bitwise OR of flags, e.g. BIT_FIELD_1 | BIT_FIELD_2
    • .value gives the underlying int, .name gives the name
    • All of the sibling enumeration values are available as members on each value. This is an unusual thing to do, but it's possible some of our users rely on it. e.g. y = Container.VALUE_ONE; assert y.VALUE_TWO == Container.VALUE_TWO
    • isinstance(value, Container) == True. This is kind of a weird implementation detail, but again, something our users may rely on.
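The list above can be sketched in pure Python roughly as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: `FastEnumMeta`, `FastEnum`, and the `_name`, `_members`, and `_by_value` attributes are hypothetical names, and lookup by value (`Container(0)`) is an extra that is not in the list. A real version would presumably live in Cython or C to avoid the overheads discussed in this issue.

```python
from types import MappingProxyType


class FastEnumMeta(type):
    """Hypothetical metaclass providing the enum "container" behavior."""

    def __new__(mcls, name, bases, namespace):
        # Treat every public int attribute in the class body as a member.
        raw = {k: v for k, v in namespace.items()
               if not k.startswith("_") and isinstance(v, int)}
        for key in raw:
            del namespace[key]
        cls = super().__new__(mcls, name, bases, namespace)
        cls._members = {}
        cls._by_value = {}
        for key, value in raw.items():
            member = int.__new__(cls, value)  # instance of the container class
            member._name = key
            cls._members[key] = member
            cls._by_value.setdefault(value, member)  # first name wins for aliases
            setattr(cls, key, member)  # also makes siblings reachable from values
        return cls

    @property
    def __members__(cls):
        return MappingProxyType(cls._members)  # read-only name -> value mapping

    def __contains__(cls, value):
        return value in cls._by_value  # works for base values: 5 in Container

    def __getitem__(cls, name):
        return cls._members[name]  # Container["ENUM_VALUE"]

    def __iter__(cls):
        return iter(cls._members.values())

    def __len__(cls):
        return len(cls._members)

    def __call__(cls, value):
        # Not in the list above: assumed useful to mirror stdlib lookup by value.
        try:
            return cls._by_value[value]
        except KeyError:
            raise ValueError(f"{value!r} is not a valid {cls.__name__}") from None


class FastEnum(int, metaclass=FastEnumMeta):
    """Hypothetical base class providing the enum "value" behavior."""

    @property
    def name(self):
        return self._name

    @property
    def value(self):
        return int(self)

    def __repr__(self):
        return f"<{type(self).__name__}.{self._name}: {int(self)}>"


class Container(FastEnum):
    VALUE_ONE = 0
    VALUE_TWO = 1
```

Because the members are stored as class attributes, `Container.VALUE_ONE.VALUE_TWO` resolves through normal attribute lookup, and `isinstance(value, Container)` holds because each value is created with `int.__new__(Container, ...)`. One known gap in this sketch: members named `name` or `value` would shadow the properties of the same name.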

Though the generated code carefully adds comments to each of the values, these comments don't make it into the extension at all. We have the opportunity to improve that here and make help(value) return a proper docstring for probably no performance penalty (other than binary size).

Metadata

Labels

  • cuda.bindings: Everything related to the cuda.bindings module
  • enhancement: Any code-related improvements
  • triage: Needs the team's attention
