Conversation
|
This and #1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one? |
#1253 looks way more comprehensive so should be merged. Would potentially be nice to cherry-pick my 2nd, 3rd, and 4th commit from here to mark every existing class as frozen to make it less likely we add non-frozen classes in the future |
|
@ntjohnson1
I'll work on this. |
|
Closed in favor of #1253 |
|
It's unfortunate that both PRs were opened at the same time. Sorry about that duplicated work @ntjohnson1 |
…rrow errors (#1253) * Refactor schema, config, dataframe, and expression classes to use RwLock and Mutex for interior mutability * Add error handling to CaseBuilder methods to preserve builder state * Refactor to use parking_lot for interior mutability in schema, config, dataframe, and conditional expression modules * Add concurrency tests for SqlSchema, Config, and DataFrame * Add tests for CaseBuilder to ensure builder state is preserved on success * Add test for independent handles in CaseBuilder to verify behavior * Fix CaseBuilder to preserve state correctly in when() method * Refactor to use named constant for boolean literals in test_expr.py * fix ruff errors * Refactor to introduce type aliases for cached batches in dataframe.rs * Cherry pick from #1252 * Add most expr - cherry pick from #1252 * Add source root - cherry pick #1252 * Fix license comment formatting in config.rs * Refactor caching logic to use a local variable for IPython environment check * Add test for ensuring exposed pyclasses default to frozen * Add PyO3 class mutability guidelines reference to contributor guide * Mark boolean expression classes as frozen for immutability * Refactor PyCaseBuilder methods to eliminate redundant take/store logic * Refactor PyConfig methods to improve readability by encapsulating configuration reads * Resolve patch apply conflicts for CaseBuilder concurrency improvements - Added CaseBuilderHandle guard that keeps the underlying CaseBuilder alive while holding the mutex and restores it on drop - Updated when, otherwise, and end methods to operate through the guard and consume the builder explicitly - This prevents transient None states during concurrent access and improves thread safety * Resolve Config optimization conflicts for improved read/write concurrency - Released Config read guard before converting values to Python objects in get and get_all - Ensures locks are held only while collecting scalar entries, not during expensive Python object conversion - Added regression test that runs Config.get_all and Config.set concurrently to guard against read/write contention regressions - Improves overall performance by reducing lock contention in multi-threaded scenarios * Refactor PyConfig get methods for improved readability and performance * Refactor test_expr.py to replace positional boolean literals with named constants for improved linting compliance * fix ruff errors * Add license header to test_pyclass_frozen.py for compliance * Alternate approach to case expression * Replace case builter with keeping the expressions and then applying as required * Update unit tests * Refactor case and when functions to utilize PyCaseBuilder for improved clarity and functionality * Update src/expr/conditional_expr.rs --------- Co-authored-by: ntjohnson1 <24689722+ntjohnson1@users.noreply.github.com> Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Which issue does this PR close?
Covers most of #1250
Rationale for this change
Previous PR fixed inability to use python threads around SessionContext #1248
Suggested that we mark all possible classes frozen to be more explicit/better thread support.
What changes are included in this PR?
Arc<Mutexas a strategy to manageSyncandfrozenRwLockis better since we might be reading configs more than writingpyo3[(get,set)]would require new wrapper code for each member if we think frozen makes senseArc<Mutexworks decently.I don't expect anything here to be a major performance delta. It wasn't clear if there was a good way to check that. The benchmarks look more interested in datafusion vs other tools rather than comparing branches.
Are there any user-facing changes?
There are no user-facing python changes. I'm don't think the wrapper code is considered public apis.