Support types other than String and Int for partition columns#1154
Support types other than String and Int for partition columns#1154timsaucer merged 6 commits intoapache:mainfrom
Conversation
timsaucer
left a comment
There was a problem hiding this comment.
Thank you! This is a very nice addition. I have one concern about the breaking change without giving users time to adjust. Otherwise I would like to get this into DF48 release.
python/datafusion/context.py
Outdated
| name: str, | ||
| path: str | pathlib.Path, | ||
| table_partition_cols: list[tuple[str, str]] | None = None, | ||
| table_partition_cols: list[tuple[str, pa.DataType]] | None = None, |
There was a problem hiding this comment.
To avoid a breaking change, we could make the type hint
table_partition_cols: list[tuple[str, pa.DataType]] | list[tuple[str, str]] | None = None
And then in the python code just below we can do type checking of the table_partition_cols and coerce to pyarrow data type. Alternatively we could overload the method signature and mark the old one as deprecated for a couple of releases. Either way we could avoid a breaking change without giving users the opportunity to upgrade at their schedule.
There was a problem hiding this comment.
That's fair, I opted for overloading and raising the deprecation warning
timsaucer
left a comment
There was a problem hiding this comment.
Thank you for an excellent contribution!
Which issue does this PR close?
Closes #1155
Rationale for this change
Following datafusion PR4221, this PR exposes the functionality in the python bindings
What changes are included in this PR?
Change of the
table_partition_colsarguments inPySessionContext.register_listing_table,PySessionContext.register_parquet,PySessionContext.register_avro,PySessionContext.register_json,PySessionContext.read_parquet,PySessionContext.read_avro,PySessionContext.read_jsonAre there any user-facing changes?
Yes, the signature of the functions above is changed