feat: allow DataFrame.filter to accept SQL strings #1276
timsaucer merged 1 commit into apache:main
If @timsaucer agrees, can we expand the scope beyond filter and include other similar methods that are not too hard to implement? I think join_on takes expressions as well.
That being said, I am not at all opposed to evaluating other places in
I missed that important case
@K-dash would you be interested in investigating?
FWIW I did a quick test with this:
--- a/python/datafusion/dataframe.py
+++ b/python/datafusion/dataframe.py
@@ -424,7 +424,9 @@ class DataFrame:
             df = df.select("a", col("b"), col("a").alias("alternate_a"))
         """
-        exprs_internal = expr_list_to_raw_expr_list(exprs)
+        expr_list = [self.parse_sql_expr(e) if isinstance(e, str) else e for e in exprs]
+
+        exprs_internal = expr_list_to_raw_expr_list(expr_list)
         return DataFrame(self.df.select(*exprs_internal))
With that you can do
Thanks for sharing the snippet, being able to call
Should we roll back df.select_expr and do this instead, @timsaucer? It makes sense to me.
Yes, but no. The problem with that snippet is that I think it will fail for people (like me) who have column names that are not SQL-parseable. Those names should still work by being turned into column expressions.
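One way to address the concern above is to check a string against the known column names before trying to parse it as SQL. The following is only a sketch of that idea, not what this PR or datafusion-python does: `resolve_select_arg`, `make_column`, and `parse_sql` are hypothetical names, and the two callables are plain stand-ins for `col` and `DataFrame.parse_sql_expr`.

```python
from typing import Any, Callable, Iterable


def resolve_select_arg(
    value: Any,
    column_names: Iterable[str],
    parse_sql: Callable[[str], Any],
    make_column: Callable[[str], Any],
) -> Any:
    """Turn one select() argument into an expression (hypothetical helper)."""
    if not isinstance(value, str):
        # Already an expression object: pass it through unchanged.
        return value
    if value in set(column_names):
        # Exact column-name match wins, even if the name is not valid SQL.
        return make_column(value)
    # Anything else is treated as a SQL expression string.
    return parse_sql(value)


# Stand-ins for col() and DataFrame.parse_sql_expr (assumptions for illustration).
def make_column(name: str) -> tuple:
    return ("column", name)


def parse_sql(sql: str) -> tuple:
    return ("sql", sql)


columns = ["my col-1", "b"]
as_column = resolve_select_arg("my col-1", columns, parse_sql, make_column)
as_sql = resolve_select_arg("b + 1", columns, parse_sql, make_column)
```

Checking the schema first matters because some column names, such as `a-b`, are valid SQL but would parse as a different expression rather than fail.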
milenkovicm
left a comment
I think this makes sense, but let's wait for @timsaucer.
timsaucer
left a comment
Thank you @K-dash and @milenkovicm !
Which issue does this PR close?
Closes #1273
Rationale for this change
Users have requested Spark-like support for DataFrame.filter("a > 1") so they can reuse existing SQL predicate strings without converting them to expression objects.
What changes are included in this PR?
DataFrame.filter now normalizes SQL string predicates via parse_sql_expr before dispatching to the internal API.
Are there any user-facing changes?
DataFrame.filter now accepts SQL string predicates in addition to Expr objects, and the documentation reflects this capability. No breaking API changes.
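The normalization step described above can be illustrated with a small, self-contained sketch. It uses only the standard library: `ToyFrame` is a hypothetical stand-in for `DataFrame`, and its `parse_sql_expr` is a toy parser that understands only `<column> <op> <integer>`, not the real datafusion implementation. What it shows is the shape of the change: string predicates are converted to expression objects first, then everything goes down the same path.

```python
import operator
import re
from typing import Callable, Dict, Iterable, Union

Row = Dict[str, int]
Predicate = Callable[[Row], bool]

_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


class ToyFrame:
    """Hypothetical stand-in for DataFrame; not the datafusion API."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)

    def parse_sql_expr(self, sql: str) -> Predicate:
        # Toy parser: handles only "<column> <op> <integer>".
        match = re.fullmatch(r"\s*(\w+)\s*(>|<|=)\s*(-?\d+)\s*", sql)
        if match is None:
            raise ValueError(f"unsupported predicate: {sql!r}")
        name = match.group(1)
        compare = _OPS[match.group(2)]
        literal = int(match.group(3))
        return lambda row: compare(row[name], literal)

    def filter(self, *predicates: Union[str, Predicate]) -> "ToyFrame":
        # Normalize: strings become expression objects, others pass through.
        normalized = [
            self.parse_sql_expr(p) if isinstance(p, str) else p
            for p in predicates
        ]
        # Keep only the rows that satisfy every predicate.
        return ToyFrame(r for r in self.rows if all(p(r) for p in normalized))


df = ToyFrame([{"a": 1}, {"a": 2}, {"a": 3}])
from_sql = df.filter("a > 1")
from_expr = df.filter(lambda row: row["a"] > 1)
```

Both calls keep the same rows, which is the behavior the PR describes: the string form and the expression form are interchangeable from the caller's point of view.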