-
Notifications
You must be signed in to change notification settings - Fork 181
Use Calcite's validation system for type checking & coercion #4892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds a Calcite-based validation pipeline and PPL-specific type-coercion/validation: introduces validator, coercion rules, convertlets, Rel<->Sql converters/shuttles, operator-table provider, and many UDF operand-metadata/type-utility changes. Removes legacy coercion/type-checker framework and updates large test fixtures from JSON to YAML. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as Client
participant QS as QueryService
participant Rel as PPL RelNode
participant RelToSql as OpenSearchRelToSqlConverter
participant Val as PplValidator
participant SqlToRel as OpenSearchSqlToRelConverter
participant Planner as CalcitePlanner
Client->>QS: submit PPL query
QS->>Rel: build initial RelNode
QS->>RelToSql: convert RelNode → SqlNode
RelToSql-->>QS: SqlNode
QS->>Val: validate SqlNode (PplValidator + PplTypeCoercion + PplTypeCoercionRule)
alt validation succeeds
Val-->>QS: validated SqlNode
QS->>SqlToRel: convert validated SqlNode → RelNode
SqlToRel-->>QS: validated RelNode
else validation tolerant failure
Val-->>QS: tolerated exception
QS-->>QS: fallback to original RelNode
end
QS->>Planner: convert validated RelNode → executable plan
Planner-->>Client: execute plan / produce results
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
e085f81 to
fc6dd27
Compare
…checking Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/executor/QueryService.java # Conflicts: # core/src/main/java/org/opensearch/sql/executor/QueryService.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- 2 more ITs passed in PPLBuiltinFunctionIT Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fix testRand, where desrialization of sarg does not restore its type - todo: update the toRex in ExtendedRelJson to the align with the latest version Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…estamp; (time, timestamp) -> timestamp (1240/1599) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…2/1872) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- allow type cast - rewrite call to sql compare to custom ip comapre Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…1356/1599 | 1476/1915) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…d in mvindex's implementation (1580/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…iting (1579/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…pe hint Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…e inference (1701/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
|
This PR validate the whole SQL including type compatible, SQL grammer, field access and others. However what we need is only type compatibility since the relbuilder will not build an incorrect plan. I'm thinking if we can only convert recall to sqlnode and invoke validateCall to do that part of validation(should include type coercion), and then convert it back to recall. |
It's kind of tricky, the validation of these are deeply entangled. E.g. if a function call is at a very bottom level of the AST, the validator has to go down the AST and validate all the nodes it comes across to build a proper validation scope for the function call. If we want to try to build them for these function calls, we may need to replicate those steps, which does not differ much from the original validation. |
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/user/ppl/interfaces/endpoint.md (1)
56-64: Fix duplicate "Description" section and add documentation for SAFE_CAST behavior in explain output.The Explain section has a duplicate "### Description" heading at lines 58 and 61 that should be consolidated into a single introduction.
Additionally, while the examples show
SAFE_CASTin the logical plan output (e.g.,SAFE_CAST(30:BIGINT)), there is no explanation of what SAFE_CAST is, how it differs from CAST, or why it appears in the output. For users upgrading from previous versions and any automated tooling that parses explain output, clarification would be valuable:
- Explain that SAFE_CAST is used in logical plans for type-safe casting with malformed string handling
- Note that the physical plan shows the optimized form without SAFE_CAST (as seen in examples where logical shows
SAFE_CAST(30:BIGINT)but physical shows>($5, 30))- Consider a brief note that this represents a change in output format for users parsing explain responses
🤖 Fix all issues with AI agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java:
- Around line 129-168: The getValidator method in CalcitePlanContext currently
calls connection.createStatement() without ensuring the Statement is closed;
change the try/catch that obtains CalciteServerStatement to a try-with-resources
that declares a Statement (e.g., try (Statement stmt =
connection.createStatement()) { CalciteServerStatement statement =
stmt.unwrap(CalciteServerStatement.class); ... } ), move the validator creation
into that try block so the Statement is closed automatically, and remove the old
catch/throw pattern while preserving the existing IllegalStateException and
validator initialization logic.
🧹 Nitpick comments (5)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)
458-465: AddverifySchema()for consistency with other BETWEEN tests.All other BETWEEN tests in this file (e.g.,
testBetween,testBetweenWithDifferentTypes) verify the schema before checking data rows. This test should follow the same pattern for consistency.♻️ Proposed fix
@Test public void testBetweenWithMixedTypes() throws IOException { JSONObject actual = executeQuery( String.format( "source=%s | where age between '35' and 38 | fields firstname, age", TEST_INDEX_BANK)); + verifySchema(actual, schema("firstname", "string"), schema("age", "int")); verifyDataRows(actual, rows("Hattie", 36), rows("Elinor", 36)); }docs/user/ppl/interfaces/endpoint.md (1)
122-124: Clarify the different constant representations across plan formats.The same literal constant
30appears in three different forms across the extended format output:
- Logical plan:
SAFE_CAST(30:BIGINT)(line 122)- Physical plan:
30:BIGINT(line 123)- Extended Java code:
30L(line 124)While these representations serve different purposes (logical abstraction, physical optimization, runtime code), users may find the variations confusing without explanation.
Consider adding a brief explanation in the documentation about how the same operation is represented differently across plan stages, or reference Calcite's documentation on plan representations.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1810-1818: Note the TODO comment regarding pushdown behavior.The TODO on line 1810 indicates that the chart with
useother=truescenario currently doesn't achieve pushdown as expected. This is flagged for future investigation.Consider creating a tracking issue for this if one doesn't already exist, to ensure it's addressed in a follow-up.
Would you like me to help create a GitHub issue to track this pushdown optimization opportunity?
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
283-341: Well-structured validation pipeline with comprehensive JavaDoc.The
validatemethod correctly implements the RelNode → SqlNode → validate → RelNode round-trip as per PR objectives. A few observations:
Line 311:
Objects.requireNonNull(rewritten)will throw NPE ifSqlRewriteShuttlereturns null. Consider whether a more descriptive exception would aid debugging.Line 314: Silent fallback for tolerant validation exceptions may make debugging difficult. Consider adding debug-level logging when falling back.
The method is at the ~50-line threshold per coding guidelines. The current organization is logical, but if it grows further, consider extracting the
SqlToRelConverterconfiguration into a helper.💡 Optional: Add debug logging for validation fallback
} catch (CalciteContextException e) { if (ValidationUtils.tolerantValidationException(e)) { + log.debug("Tolerant validation fallback triggered: {}", e.getMessage()); return relNode; } throw new ExpressionEvaluationException(e.getMessage(), e); }core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
84-93: Static operatorTableProvider requires careful initialization ordering.The static
operatorTableProvidermust be set before anyCalcitePlanContextcallsgetValidator(). TheIllegalStateExceptionat line 141-142 guards against this, but consider documenting the initialization requirement in the class JavaDoc.📝 Optional: Add class-level documentation for initialization order
/** * ...existing class doc... * * <p><b>Initialization:</b> {@link #setOperatorTableProvider(SqlOperatorTableProvider)} * must be called during module initialization before any context creates a validator. */ public class CalcitePlanContext {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (76)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javadocs/user/ppl/interfaces/endpoint.mdinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javainteg-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
✅ Files skipped from review due to trivial changes (1)
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yaml
- core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
- integ-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources
- Verify test data files exist in integ-test/src/test/resources/
- Check test assertions are meaningful and specific
- Validate tests clean up resources after execution
- Ensure tests are independent and can run in any order
- Flag tests that reference non-existent indices (e.g., EMP)
- Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read
- Suggest extracting complex logic into helper methods
- Check for code organization and logical grouping
- Validate all RelNode types are handled
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlcore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yamlinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javainteg-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlcore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javainteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yamlcore/src/main/java/org/opensearch/sql/executor/QueryService.javainteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yamlcore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamldocs/user/ppl/interfaces/endpoint.mdinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlcore/src/main/java/org/opensearch/sql/executor/QueryService.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamldocs/user/ppl/interfaces/endpoint.mdinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamldocs/user/ppl/interfaces/endpoint.mdinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlcore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javainteg-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
MatcherUtils(43-502)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (9)
core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
OpenSearchSparkSqlDialect(28-113)core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java (1)
PplRelToSqlNodeConverter(36-172)core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java (1)
PplSqlToRelConverter(24-92)core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java (1)
PplRelToSqlRelShuttle(28-83)core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java (1)
SkipRelValidationShuttle(85-175)core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java (1)
SqlRewriteShuttle(21-69)common/src/main/java/org/opensearch/sql/common/utils/QueryContext.java (1)
QueryContext(18-87)core/src/main/java/org/opensearch/sql/exception/CalciteUnsupportedException.java (1)
CalciteUnsupportedException(8-17)core/src/main/java/org/opensearch/sql/exception/ExpressionEvaluationException.java (1)
ExpressionEvaluationException(9-18)
| public SqlValidator getValidator() { | ||
| if (validator == null) { | ||
| synchronized (this) { | ||
| // Double-Checked Locking for thread-safety | ||
| if (validator == null) { | ||
| final CalciteServerStatement statement; | ||
| try { | ||
| statement = connection.createStatement().unwrap(CalciteServerStatement.class); | ||
| } catch (SQLException e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| if (operatorTableProvider == null) { | ||
| throw new IllegalStateException( | ||
| "SqlOperatorTableProvider must be set before creating CalcitePlanContext"); | ||
| } | ||
| SqlValidator.Config validatorConfig = | ||
| SqlValidator.Config.DEFAULT | ||
| .withTypeCoercionRules(PplTypeCoercionRule.instance()) | ||
| .withTypeCoercionFactory(PplTypeCoercion::create) | ||
| // Use lenient conformance for PPL compatibility | ||
| .withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance()) | ||
| // Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST) | ||
| .withDefaultNullCollation(NullCollation.LOW) | ||
| // This ensures that coerced arguments are replaced with cast version in sql | ||
| // select list because coercion is performed during select list expansion during | ||
| // sql validation. Affects 4356.yml | ||
| // See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg | ||
| .withIdentifierExpansion(true); | ||
| validator = | ||
| PplValidator.create( | ||
| statement, | ||
| config, | ||
| operatorTableProvider.getOperatorTable(), | ||
| TYPE_FACTORY, | ||
| validatorConfig); | ||
| } | ||
| } | ||
| } | ||
| return validator; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
rg -n "CalciteServerStatement" --type java -B 2 -A 2Repository: opensearch-project/sql
Length of output: 5211
🏁 Script executed:
# Check if CalciteServerStatement is defined in this codebase or imported from Calcite
rg -n "class CalciteServerStatement" --type javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check for try-with-resources usage with statements in similar code
rg -n "createStatement()" --type java -B 2 -A 2 | head -50Repository: opensearch-project/sql
Length of output: 4675
🏁 Script executed:
# Look at the imports and see where CalciteServerStatement comes from
head -50 core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javaRepository: opensearch-project/sql
Length of output: 2005
🏁 Script executed:
# Search for close() calls on statement variables
rg -n "statement\.close()" --type javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check CalciteToolsHelper usage after statement creation
sed -n '160,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javaRepository: opensearch-project/sql
Length of output: 891
🏁 Script executed:
# Check how validator is used after it's created - is it long-lived?
sed -n '129,200p' core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javaRepository: opensearch-project/sql
Length of output: 2766
🏁 Script executed:
# Look at full method context of CalciteToolsHelper where statement is created
sed -n '140,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javaRepository: opensearch-project/sql
Length of output: 1652
Wrap connection.createStatement() in try-with-resources for proper resource cleanup.
The CalciteServerStatement created on line 136 is never explicitly closed. While this pattern is used consistently elsewhere in the codebase (e.g., CalciteToolsHelper.java:168), the coding guidelines require using try-with-resources for proper resource cleanup. Either use try-with-resources or document why this Calcite framework object does not require explicit closing.
Example fix:
final CalciteServerStatement statement;
try (Statement stmt = connection.createStatement()) {
statement = stmt.unwrap(CalciteServerStatement.class);
// rest of initialization
}🤖 Prompt for AI Agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129 - 168, The getValidator method in CalcitePlanContext currently calls
connection.createStatement() without ensuring the Statement is closed; change
the try/catch that obtains CalciteServerStatement to a try-with-resources that
declares a Statement (e.g., try (Statement stmt = connection.createStatement())
{ CalciteServerStatement statement = stmt.unwrap(CalciteServerStatement.class);
... } ), move the validator creation into that try block so the Statement is
closed automatically, and remove the old catch/throw pattern while preserving
the existing IllegalStateException and validator initialization logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/user/ppl/functions/collection.md (1)
165-201: Two TRANSFORM examples are appropriately disabled pending issue #4972, but note that MVMAP has the same pattern with three similarly disabled examples.The TRANSFORM function examples at lines 167–172 and 185–190 are marked with
ppl ignoreand reference issue #4972. This is a consistent pattern—other lambda-based functions like FORALL, EXISTS, FILTER, and REDUCE retain working examples, suggesting the issue specifically affects TRANSFORM and MVMAP transformations. Verify that issue #4972 adequately captures all affected examples (TRANSFORM: 2 examples; MVMAP: 3 examples at lines 819–824, 837–842, 859–864) to ensure comprehensive tracking.
🧹 Nitpick comments (1)
docs/user/ppl/functions/collection.md (1)
818-875: MVMAP examples disabled with same pattern as TRANSFORM.Three MVMAP examples are disabled, consistent with the TRANSFORM failures. Both functions involve array element transformations (lambdas for TRANSFORM, expressions for MVMAP).
Please verify that issue #4972 explicitly covers both TRANSFORM and MVMAP failures, as the root cause may be related to how the new Calcite validation handles array element-level operations.
Consider consolidating the TODO comments or creating a more descriptive issue title that clearly indicates "lambda and array transformation validation" to help future maintainers understand the scope.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/user/ppl/cmd/bin.mddocs/user/ppl/cmd/patterns.mddocs/user/ppl/functions/collection.mddocs/user/ppl/functions/conversion.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/user/ppl/cmd/patterns.md
- docs/user/ppl/functions/conversion.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (1)
docs/user/ppl/cmd/bin.md (1)
501-508: The example correctly uses theppl ignoredirective, which is the documented approach for marking code blocks that should be skipped during testing. This pattern is consistently used throughout the codebase (inlimitations.md,functions/collection.md, and others) to handle incomplete or broken examples—paired with HTML TODO comments explaining the issue. No changes are needed; the implementation follows established conventions.Likely an incorrect or invalid review comment.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)
1014-1017: Test assertion doesn't match documented intent.The test comment (lines 1004-1008) states it should verify "the error message clearly explains the limitation and suggests solutions," but the assertion now checks for an internal Calcite error (
resolving method 'minus[...]') that provides no user-friendly guidance.The TODO referencing issue #4973 acknowledges this is a temporary state. When addressing that issue, consider updating the error handling to surface a user-friendly message explaining that
binson timestamp fields requires pushdown to be enabled, rather than exposing internal Calcite method resolution failures.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
docs/user/ppl/cmd/bin.mddocs/user/ppl/cmd/patterns.mddocs/user/ppl/functions/collection.mddocs/user/ppl/functions/conversion.mdinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.javainteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- docs/user/ppl/cmd/bin.md
- docs/user/ppl/cmd/patterns.md
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
- docs/user/ppl/functions/conversion.md
- docs/user/ppl/functions/collection.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources
- Verify test data files exist in integ-test/src/test/resources/
- Check test assertions are meaningful and specific
- Validate tests clean up resources after execution
- Ensure tests are independent and can run in any order
- Flag tests that reference non-existent indices (e.g., EMP)
- Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)
1004-1017: Test assertion doesn't match stated verification intent.The comment states "Verify the error message clearly explains the limitation and suggests solutions" and the assertion message expects "clear error message about bins parameter requirements," but the actual check validates an internal Calcite error (
resolving method 'minus[class java.lang.String, class java.lang.String]'). This cryptic message won't help users understand that they need to enable pushdown for timestamp binning.The TODO referencing #4973 acknowledges this, but consider updating the assertion message (lines 1011-1013) to reflect what's actually being checked, so the test failure message is accurate if the internal error changes:
assertTrue( - "Expected clear error message about bins parameter requirements on timestamp fields, but" - + " got: " + "Expected Calcite method resolution error (tracked by #4973 for proper user message), but" + + " got: " + errorMessage,
🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/bin.md:
- Around line 501-502: Example 20 in the docs contains a non-functional ppl code
block marked only with an HTML TODO comment; either remove the entire Example 20
block until issue #4973 is resolved or make the limitation obvious to users by
replacing the HTML comment with a visible admonition (warning note) immediately
before the ```ppl ignore``` block that states the feature is not supported and
references issue #4973; ensure the visible note uses the same documentation
styling (admonition/warning) as other docs so users see it when rendered.
🧹 Nitpick comments (5)
integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)
1-1: Consider removing the leading blank line.YAML files typically don't start with a blank line. While this won't affect parsing, it's inconsistent with standard YAML formatting conventions.
Suggested fix
- calcite:integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)
12-15: Physical plan demonstrates proper pushdown optimization.The expected physical plan correctly shows:
- Dual-level
EnumerableLimitfor system limit (10000) and user pagination (offset 100, fetch 10)- Aggregation pushed to OpenSearch with
multi_termssized at 110 to satisfyoffset + fetch- Boolean query structure combining term, range, script, and exists filters
One observation: the embedded base64-encoded scripts make it difficult to verify the serialized expressions are correct if they need future updates. Consider adding a comment in the test or documentation explaining how to decode/encode these blocks for maintenance purposes.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)
1-1: Minor: Leading empty line in YAML file.The file begins with an empty line. While this doesn't affect parsing, it's unconventional for YAML files. Consider removing for consistency with other expected output files, or ignore if this is a generated artifact.
integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (1)
1-1: Unnecessary leading empty line.The empty line at the start of the file is not needed and could be removed for cleaner formatting.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)
1-1: Minor: Spurious blank line at file start.An empty line at the beginning of the YAML file is unnecessary and could be removed for cleaner formatting, though it won't affect functionality.
🧹 Suggested fix
- calcite: logical: |
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (45)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javadocs/user/ppl/cmd/bin.mddocs/user/ppl/cmd/patterns.mddocs/user/ppl/functions/collection.mddocs/user/ppl/functions/conversion.mddocs/user/ppl/interfaces/endpoint.mdinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.javainteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
✅ Files skipped from review due to trivial changes (3)
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter.yaml
🚧 Files skipped from review as they are similar to previous changes (22)
- integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml
- docs/user/ppl/cmd/patterns.md
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
- docs/user/ppl/functions/conversion.md
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yaml
- docs/user/ppl/interfaces/endpoint.md
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
- core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yaml
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources
- Verify test data files exist in integ-test/src/test/resources/
- Check test assertions are meaningful and specific
- Validate tests clean up resources after execution
- Ensure tests are independent and can run in any order
- Flag tests that reference non-existent indices (e.g., EMP)
- Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_output.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.javainteg-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_output.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_output.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (31)
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml (2)
7-7: LGTM! ExplicitNULLS FIRSTaligns with PR objectives.The addition of
NULLS FIRSTto the window function correctly reflects the PR's goal of preserving null direction during RelNode ↔ SqlNode round trips. This makes the null ordering behavior explicit in the explain plan output, which is consistent with Calcite's validation system behavior.The test data is realistic and representative of the expected Calcite explain output format.
13-13: Physical plan update is consistent with logical plan changes.The physical plan correctly mirrors the logical plan's null ordering semantics. The OpenSearch aggregation query DSL with
"order":[{"_count":"desc"},{"_key":"asc"}]is the appropriate translation for the rare/top operation with the preserved null direction.integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml (1)
1-23: LGTM! Type annotation simplification aligns with Calcite validation changes.The removal of explicit
:VARCHARannotations from string literals ('young','adult') is consistent with the PR's migration to Calcite's type validation system. The plan semantics remain correct:
- Boundary conditions are properly handled (age < 30 for 'young', age >= 30 for 'adult' with no overlap).
- Filter pushdown to OpenSearch index scans is preserved.
- Aggregation and projection logic is intact.
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml (2)
7-8: LGTM - LogicalProject addition is expected from SqlNode round-trip.The explicit
LogicalProjectwrapping the index scan is a consequence of the RelNode → SqlNode → RelNode validation round-trip introduced by this PR. The physical plan shows this doesn't impact optimization effectiveness—the push-down still correctly projects only[age]and appliesLIMIT->5at the scan level.
11-11: LGTM - Explicit BIGINT typing reflects Calcite's type coercion.The change from
30to30:BIGINTis the expected result of Calcite's validation system adding explicit type information to literals during coercion. This ensures type-safe comparison with theagefield.integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)
3-10: LGTM!The test data is well-structured and covers the expected behavior for the Calcite integration:
- The logical plan correctly represents the query structure with explicit type-safe range bounds (
>=($28, 1), <=($28, 100)).- The physical plan properly reflects the typed Sarg representation (
Sarg[[1L:BIGINT..100L:BIGINT]]:BIGINT), aligning with the PR's type coercion objectives.- The OpenSearch query builder correctly translates to a boolean query with term and range clauses, including boundary values.
- Edge cases are covered with inclusive range boundaries.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1)
1-11: Test data structure looks correct for dedup expression scenario.The expected output covers:
- Logical plan: Properly demonstrates dedup via
ROW_NUMBER() OVER (PARTITION BY new_gender, new_state)with filter<= 2to keep top 2 rows per partition.- Physical plan: Shows aggregation pushdown to OpenSearch with composite buckets on the partition keys and
top_hitsto retrieve actual documents.A few observations:
- The physical plan on line 11 is extremely long (single line with Base64-encoded scripts), which is typical for these files but makes diff reviews difficult.
- The Base64 payloads encode
LOWERfunction metadata forgender.keywordandstate.keywordfields—consistent with theLOWER($4)andLOWER($7)expressions in the logical plan.Given the PR objectives mention "Investigate dedup optimization failure" as remaining work, this alternative expected output file likely represents an acceptable plan shape during the investigation. Please confirm this test case aligns with the expected behavior after the Calcite validation/coercion changes.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1)
1-11: LGTM - Expected output correctly captures dedup-with-expressions pushdown.The test data is realistic and representative:
- Uses typical account index fields (account_number, firstname, balance, gender, state, etc.)
- Validates the ROW_NUMBER-based dedup logic with
keepempty=2(condition<=($19, 2))- Physical plan correctly shows composite aggregation pushdown with LOWER function scripts for
new_genderandnew_statetop_hitssize of 2 andmissing_bucket=falseproperly align with the logical plan's dedup semantics and IS NOT NULL filtersThe base64-encoded script content is expected for Calcite's serialized script handling.
integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml (2)
1-8: LGTM! Logical plan correctly represents the disjunction query.The expected output properly captures the OR condition between a term match and a numeric range. The plan structure (SystemLimit → Sort → Project → Filter → IndexScan) is appropriate for this query pattern.
Minor: Line 1 adds an empty line at the start of the file, which is a minor formatting inconsistency but doesn't affect functionality.
9-10: LGTM! Physical plan correctly pushes down the disjunction to OpenSearch.The pushdown context properly translates the OR condition:
- Calcite's
Sarg[[1L:BIGINT..100L:BIGINT]]representation for the range is appropriate- The OpenSearch DSL correctly uses
bool.shouldto handle the disjunction with bothtermandrangeclauses- Predicate pushdown is successfully demonstrated, which aligns with the PR objectives for Calcite validation integration
This test case effectively covers the edge case of mixed query types (term + range) in a disjunction.
docs/user/ppl/functions/collection.md (2)
165-201: LGTM! Properly documented known issue.The TRANSFORM examples are consistently marked with TODO comments and
ppl ignoreblocks, referencing issue #4972 for tracking. The expected outputs are preserved, which will facilitate re-enabling these examples once the issue is resolved.
818-875: LGTM! Consistent handling of MVMAP examples.All three MVMAP examples are uniformly marked with TODO comments and
ppl ignoreblocks, referencing the same issue #4972. The approach is consistent with the TRANSFORM examples and preserves all expected outputs for future validation.integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml (1)
1-10: LGTM! Test data is realistic and covers expected edge cases.The expected output correctly validates the Calcite validation/coercion pipeline:
- The logical plan reflects the Sort → Project → Filter ordering consistent with PR objectives.
- The filter predicate uses compound range semantics (
AND(>=($28, 10), <=($28, 20))) and OR disjunction.- Field indices in the physical plan (
$13→metrics.size,$15→aws.cloudwatch.log_stream) correctly map to the projected fields.- The
Sarg[[10L:BIGINT..20L:BIGINT]]:BIGINTnotation aligns with explicit type-casting introduced by the validation system.The numeric bounds appearing as
10.0/20.0(floats) in the ES query vs10L:BIGINTin the Sarg is expected behavior for OpenSearch compatibility.integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (2)
7-11: Redundant LogicalProject nodes appear in the plan.Lines 7 and 9 contain identical
LogicalProjectnodes projecting all 17 fields. This redundancy likely results from the RelNode → SqlNode → RelNode round-trip introduced by the Calcite validation phase. While functionally correct, this may indicate a missed optimization opportunity.Given the PR TODOs mention "Investigate dedup optimization failure," this is expected behavior for now and should be addressed as part of that follow-up.
Confirm the redundant projections don't impact query performance in production and are handled by downstream optimization passes.
13-13: BIGINT-typed Sarg representation is correct and consistent.The physical plan correctly uses
Sarg[(30L:BIGINT..40L:BIGINT)]:BIGINTfor the age range filter, and the JSONsourceBuildercorrectly translates this to:
"include_lower":false, "include_upper":falsematching the exclusive bounds(30..40)- Balance filter
>($0, 10000)correctly maps to"include_lower":falseThe field indexing after
PROJECT->[balance, age]is also correct:$0= balance,$1= age.integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml (1)
4-13: LGTM! Type coercion changes correctly reflect Calcite validation behavior.The updated expected output properly captures the new Calcite-based type coercion:
Logical plan (line 5): The explicit
CAST(1000):DOUBLE NOT NULLfor comparing againstnew_avg(derived fromAVG, which returnsDOUBLE) is the expected implicit cast insertion from Calcite's type coercion.Physical plan (line 12): Typed literals (
1000.0E0:DOUBLE,1:BIGINT) correctly match the operand types of the comparisons.Consistency: The second comparison
>($4, 1)doesn't require a cast in the logical plan sinceCOUNT()returnsBIGINTand the literal1is compatible (typed as1:BIGINTin the physical plan).Test data is realistic, covers the type coercion edge case for numeric comparisons, and aligns with the PR's objective of migrating to Calcite's validation system.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)
1-11: Logical plan structure looks correct.The expected logical plan covers realistic analytics query patterns including:
- Pagination with offset/limit and system query size limits
- NULL handling with
IS NOT NULLpredicates- Complex filter conditions with OR clauses and date range comparisons
- Aggregation with COUNT() grouped by multiple fields
- SAFE_CAST usage aligns with the PR's type coercion approach for handling type mismatches
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (2)
3-10: Logical plan structure is correct and consistent.The dedup plan correctly:
- Partitions
ROW_NUMBER()by$4, $5(new_gender, new_state) which are the LOWER expression results- Filters nulls on partition keys before windowing to ensure correct dedup behavior
- Maintains sort order for deterministic dedup results
The column references align properly with the projection schema defined in line 9.
11-12: Physical plan pushdown is consistent with logical plan.The OpenSearchRequestBuilder correctly reflects:
- Composite aggregation on
new_gender/new_statewith LOWER scripts matching the logical projectiontop_hitswithsize:2matching the dedup limit condition<=($6, 2)- Proper
_sourceincludes andscript_fieldsfor computed columnsThe test data is realistic and representative of Calcite's explain output for dedup with expressions.
integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (2)
1-10: LGTM! Logical plan structure is consistent and correctly represents dedup semantics.The column indices are properly aligned:
$28correctly referencesmetrics.sizefor both the partition key and NULL filter$45correctly references_row_number_dedup_for the dedup filter (<= 1)- Outer projection indices match the inner projection field positions
The plan correctly handles the edge case of NULL partition keys with
IS NOT NULL($28)before windowing.
12-13: Physical plan pushdown correctly implements dedup semantics.The OpenSearch aggregation structure is appropriate:
- Composite aggregation with
top_hits(size=1)correctly deduplicates by taking first row permetrics.sizepartition_sourceincludes field list matches the logical projectionmissing_bucket:falsecorrectly excludes NULL partition keys (aligned with theIS NOT NULLfilter in logical plan)integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (2)
5-9: Field indices correctly reflect added derived field.The field index shifts ([$14]-[$26] and join condition =($13, $20)) are consistent with the addition of the
$f13derived field in line 7. TheCAST(REX_EXTRACT(...)):VARCHARwrapper aligns with the PR's Calcite validation system that automatically inserts type casts during coercion.
20-21: Physical plan correctly propagates CAST to pushdown context.The
EnumerableCalcandCalciteEnumerableIndexScancorrectly incorporate theCAST(REX_EXTRACT(...)):VARCHARin both the projection and theSORT_EXPRpushdown. This ensures the sort-merge join operates on type-coerced values, consistent with the logical plan.integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (2)
6-14: Logical plan structure looks correct.The plan properly handles:
- Empty string filtering on
Referer(line 13)- Domain extraction via
REGEXP_REPLACEwith pattern^https?://(?:www\.)?([^/]+)/.*$(line 12)- NULL filtering on computed
kfield (line 11)- Aggregation with
AVG,COUNT,MIN(line 9)- Post-aggregation filtering on count > 100000 (line 7)
The wide projection at line 12 covering all 111 clickbench fields is expected for the hits table schema.
18-19: Physical plan correctly pushes aggregation to OpenSearch.The pushdown context appropriately includes:
- Filter pushdown:
<>($0, '')and existence check on Referer- Composite aggregation for group-by on computed
k(domain extraction)- Sub-aggregations:
avgwithCHAR_LENGTHscript,top_hitsformin(Referer)The base64-encoded Calcite scripts enable server-side evaluation of
REGEXP_REPLACEandCHAR_LENGTHexpressions, which is the expected pattern for complex expression pushdown.integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (2)
9-14: LGTM!The logical plan structure is internally consistent. Column references correctly trace through the plan:
- Aggregate outputs
state@0, city@1, avg_age@2- Project reorders to
avg_age@0, state@1, age2@2- Sort on
$1correctly sorts bystateThe test data covers relevant scenarios including aggregation, computed columns (
age2=[+($2, 2)]), NULL handling (ASC-nulls-first,IS NOT NULL), and deduplication with window functions.
21-21: LGTM!The physical plan correctly pushes down the query to OpenSearch:
- Filter
>($2, 30)maps to therangequery onage- Composite aggregation with
stateandcitybuckets matches the logical groupingSORT->[0 ASC FIRST]aligns with"order":"asc"and"missing_order":"first"on the state bucket- The
avg_agemetric aggregation is correctly nestedThe expected output accurately represents the Calcite enumerable index scan path with appropriate pushdown context.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (3)
6-14: Logical plan structure looks correct for the Calcite validation changes.The restructured plan correctly:
- Reorders projections to place the final column mapping (
l=[$0], c=[$1], CounterID=[$2]) after filtering.- Applies the filter
>($1, 100000)on the aggregated count before final projection.- Expands all 111 fields in the intermediate
LogicalProject(line 12) to preserve field indices during validation.- Maintains consistent field index references:
$103forCounterIDand$26forURL.The plan flow (scan → filter non-empty URL → project → filter non-null CounterID → project subset → aggregate → reproject → filter count → final project) is logically sound.
18-18: ExplicitBIGINTtype annotation reflects Calcite type coercion.The change from an implicit integer literal to
100000:BIGINTis expected behavior from the Calcite validation system, which now performs explicit type annotation during plan generation. This aligns with the PR's objective of migrating type checking/coercion to Calcite's validation system.
19-19: Script payload is valid and properly encoded.The Base64-encoded script decodes successfully to a valid JSON structure that correctly represents the
CHAR_LENGTHoperation with adynamicParam: 0reference. TheDIGESTSarray["URL"]properly maps to this dynamic parameter at index 0, confirming the field binding is correct.docs/user/ppl/cmd/bin.md (1)
503-508: No changes required. The example is already correctly marked as not-yet-implemented with an explicit TODO comment (<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4973 -->) and theignoreflag on the code block. This is the proper way to document planned features that are not yet functional.
I have updated a profiling result in the PR description. It's profiled with this script. Based on the result, the validation layer brings no significant performance degration. |
|
Hi @ykmr1224 , do you have any concern over this PR? I foresee that it will bring some impact to the schema-on-read solution, be it in a helpful or negative way. I'm worrying that this PR will make your solution difficult to proceed. |
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)
184-197: UnusedString.formatargument.The format string has only two placeholders (
%s) but three arguments are provided. The thirdTEST_INDEX_WORK_INFORMATIONargument at line 196 is never used.🔧 Suggested fix
assertYamlEqualsIgnoreId( expected, explainQueryYaml( String.format( "source = %s" + "| eval count_dept = [" + " source = %s | stats count(name)" + " ]" + "| fields name, count_dept", - TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION))); + TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));
199-212: UnusedString.formatargument.Same issue as above - the format string has only two
%splaceholders but three arguments are provided at line 211.🔧 Suggested fix
assertYamlEqualsIgnoreId( expected, explainQueryYaml( String.format( "source = %s" + "| where id > [" + " source = %s | stats count(name)" + " ] + 999" + "| fields name", - TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION))); + TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));
2016-2037: Missing@Testannotation - test will not execute.The
testComplexDedupmethod is missing the@Testannotation, so it will not be executed as part of the test suite.🔧 Suggested fix
+ @Test public void testComplexDedup() throws IOException { enabledOnlyWhenPushdownIsEnabled();
🧹 Nitpick comments (4)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)
11-12: Physical plan correctly pushes down aggregation and scripted fields.The physical plan appropriately translates to OpenSearch:
- Composite aggregation with
top_hitssize=2 matching the dedup count- Script fields for computed columns with correct DIGESTS referencing source fields (
gender.keyword,state.keyword)- Source includes/excludes properly configured
The single-line format for the physical plan JSON makes diff reviews challenging. Consider using YAML block scalar with literal style and formatting the JSON across multiple lines for improved readability and easier change tracking in future PRs—though this is a low-priority suggestion if exact-match testing requires this format.
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
469-500: LGTM - well-structured overload following existing patterns.The implementation correctly:
- Validates non-empty input with a clear exception message
- Short-circuits on first successful match
- Propagates the last failure for diagnostic purposes
- Has comprehensive JavaDoc
The pattern is consistent with the existing two-argument overload at lines 460-467.
🔧 Optional: Add null validation for defensive programming
For better error messages when debugging test failures, consider adding a null check:
public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) { + if (expectedYamls == null) { + throw new IllegalArgumentException("expectedYamls must not be null"); + } if (expectedYamls.isEmpty()) { throw new IllegalArgumentException("expectedYamls list must not be empty"); }This is optional since the existing methods in this file don't perform extensive null checks either, and test utility callers are expected to provide valid inputs.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)
1811-1818: TODO comment indicates a known regression.The comment "TODO: Fix -- now it gets away from push-down" suggests that push-down optimization is not working as expected for this case. Consider creating an issue to track this.
Would you like me to open a GitHub issue to track this push-down optimization regression for the
chartcommand withuseother=true?
2066-2074: Non-deterministic plan output handling.The use of
List.of()with multiple expected plan files (explain_dedup_expr4.yaml, alternatives) handles legitimate optimizer non-determinism. This is an acceptable pattern, but consider documenting why multiple valid outputs exist (e.g., different but equivalent join orderings, filter placements).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.javainteg-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources
- Verify test data files exist in integ-test/src/test/resources/
- Check test assertions are meaningful and specific
- Validate tests clean up resources after execution
- Ensure tests are independent and can run in any order
- Flag tests that reference non-existent indices (e.g., EMP)
- Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml (1)
1-12: LGTM! Well-structured expected output for dedup-with-expressions scenario.The logical plan correctly models:
- ROW_NUMBER windowing partitioned by computed expressions (
$4,$5=new_gender,new_statefrom LOWER)- Null filtering on partition keys before dedup
- Sort order preservation (ASC-nulls-first, DESC-nulls-last)
The physical plan appropriately shows:
- Composite aggregation with scripted fields for the computed partition keys
- top_hits with size=2 matching the
<= 2dedup filter- Base64-encoded calcite scripts for OpenSearch execution
This alternative1 variant correctly demonstrates dedup partitioning on expression-derived columns.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml (1)
1-12: LGTM! Correctly represents an alternative dedup optimization path.This alternative2 variant differs from alternative1 in a meaningful way:
- Logical plan:
PARTITION BY $1, $3(originalgender,statefields) vs alternative1's$4, $5(computednew_gender,new_state)- Physical plan: Uses direct field terms (
"field":"gender.keyword") instead of scripted aggregation sourcesThe distinction correctly demonstrates that the optimizer may choose to partition on either the source fields or the computed expressions depending on the query context. Both the logical structure and OpenSearch request payload are consistent with this optimization choice.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)
1-10: Logical plan structure is correct and well-formed.The logical plan correctly represents the dedup + rename transformation:
ROW_NUMBER() OVER (PARTITION BY $4, $5)for deduplication partitioning<=($6, 2)filter for allowedDuplicates=2LOWER($4)andLOWER($7)for the rename expressions- Proper null handling with
IS NOT NULLfilters- Sort directions preserved (
ASC-nulls-first,DESC-nulls-last)integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml (2)
1-5: LGTM - Logical plan structure is correct.The logical plan correctly represents a count aggregation scenario:
LogicalSystemLimitenforces the query size limitLogicalAggregate(group=[{}], cnt=[COUNT()])represents a global COUNT aggregationCalciteLogicalIndexScanon the standard test indexThe hierarchy and notation follow Calcite conventions.
6-7: Physical plan correctly represents aggregation pushdown.The OpenSearchRequestBuilder configuration is appropriate for a count aggregation:
size=0correctly indicates no documents should be fetched (aggregation-only)track_total_hits=2147483647(Integer.MAX_VALUE) ensures accurate countingAGGREGATIONandLIMITpushdowns are properly captured in PushDownContextThe
rel#:LogicalAggregate.NONE.[]pattern appears to be ID-normalized for stable test comparisons (consistent withassertYamlEqualsIgnoreIdusage mentioned in the PR).integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1)
6-12: LGTM! Column references are now consistent across the plan.The updated expected output correctly reflects the fix where dedup partitioning and null checks now reference
$1, $3(gender, state) instead of$4, $5(new_gender, new_state). This aligns the PARTITION BY columns with the sort columns used in bothLogicalSystemLimit(line 3) andLogicalSort(line 8), and the physical plan's composite buckets ongender.keywordandstate.keywordare consistent with this change.integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)
32-49: LGTM!The test class properly extends
ExplainIT, enables Calcite mode, and loads the necessary test indices. The structure follows established integration test patterns.
444-482: LGTM!The timechart tests verify the correct calculation formulas for
per_second,per_minute,per_hour, andper_dayfunctions. The expressions correctly include the time unit conversions and TIMESTAMPDIFF calculations.
1306-1331: LGTM - proper resource cleanup pattern.Good use of try-finally blocks to ensure
resetQueryBucketSize()is called even if assertions fail. This pattern is consistently applied throughout the test class for test isolation.
… subclass for validation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1818-1825: Track the TODO for push-down regressionThe new TODO suggests a known push-down regression. Please link it to a tracking issue (or add one) so it doesn’t get lost.
If you want, I can draft the issue text or propose a targeted test to lock in the desired push-down behavior.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1736-1835: Preserve explicit window ORDER BY when embedding collations.
embedExistingCollationsIntoOverunconditionally replaces the window's order keys with the plan collation, overwriting any explicit ORDER BY specified in the window. When a window already has orderKeys, they should be preserved. Consider only injecting plan collations when the window has no existing order keys.🛠️ Proposed fix
public RexNode visitOver(RexOver over) { RexWindow window = over.getWindow(); + ImmutableList<RexFieldCollation> orderKeys = + window.orderKeys.isEmpty() ? rexCollations : window.orderKeys; + if (orderKeys.isEmpty()) { + return over; + } return context.rexBuilder.makeOver( over.getType(), over.getAggOperator(), over.getOperands(), window.partitionKeys, - rexCollations, + orderKeys, window.getLowerBound(), window.getUpperBound(), window.isRows(), true, false, over.isDistinct(), over.ignoreNulls()); }core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
398-425: Fail fast on null external registrations and make duplicate detection atomic.Right now a null operator/aggFunction would surface as an NPE later, and
containsKey+putcan misreport duplicates under concurrency.🐛 Proposed fix (null guard + atomic warn)
public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) { - if (externalFunctionRegistry.containsKey(functionName)) { - logger.warn( - String.format(Locale.ROOT, "Function %s is registered multiple times", functionName)); - } - externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args)); + if (operator == null) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Operator cannot be null for function %s", functionName)); + } + FunctionImp impl = (builder, args) -> builder.makeCall(operator, args); + FunctionImp previous = externalFunctionRegistry.put(functionName, impl); + if (previous != null) { + logger.warn( + String.format(Locale.ROOT, "Function %s is registered multiple times", functionName)); + } } public void registerExternalAggOperator( BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) { - if (aggExternalFunctionRegistry.containsKey(functionName)) { - logger.warn( - String.format( - Locale.ROOT, "Aggregate function %s is registered multiple times", functionName)); - } + if (aggFunction == null) { + throw new IllegalArgumentException( + String.format(Locale.ROOT, "Aggregate function cannot be null for %s", functionName)); + } AggHandler handler = (distinct, field, argList, ctx) -> UserDefinedFunctionUtils.makeAggregateCall( aggFunction, List.of(field), argList, ctx.relBuilder); - aggExternalFunctionRegistry.put(functionName, handler); + AggHandler previous = aggExternalFunctionRegistry.put(functionName, handler); + if (previous != null) { + logger.warn( + String.format( + Locale.ROOT, "Aggregate function %s is registered multiple times", functionName)); + } }
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/AppendFunctionImpl.java`:
- Around line 49-50: Update the operand type checker in AppendFunctionImpl by
changing getOperandMetadata() to use OperandTypes.VARIADIC instead of
OperandTypes.SAME_VARIADIC so mixed-type variadic arguments are allowed
(matching AppendCore.collectElements() behavior); mirror the approach used in
MVAppendFunctionImpl and keep the existing return-type inference logic intact to
handle zero-arg cases.
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`:
- Around line 494-510: The rewriteCall in PPLBuiltinOperators.ATAN unsafely
casts SqlCall to SqlBasicCall causing ClassCastException for other SqlCall
subclasses (e.g., SqlCase); fix by first checking if call instanceof
SqlBasicCall and if so setOperator as before, otherwise rebuild and return a new
SqlBasicCall using the chosen SqlOperator (SqlStdOperatorTable.ATAN2 or
SqlStdOperatorTable.ATAN), passing call.getOperandList() and
call.getParserPosition() so operands and position are preserved; ensure you
reference the SqlOperator selection logic (call.getOperandList().size() == 2)
and preserve semantics when constructing the new SqlBasicCall.
In `@integ-test/src/test/resources/expectedOutput/calcite/agg_filter_nested.yaml`:
- Line 5: Add a test record with a null or missing author.name to the dataset
that generates this expected output so the CASE expression in LogicalProject
($f0=[CASE(<($7, 'K'), 1, null:INTEGER)]) is exercised for the null branch;
update the input test data used by the calcite/agg_filter_nested test (the
sample documents) to include one document where author is absent or author.name
is explicitly null, then rerun the test to regenerate/verify the expected output
includes the null branch behavior.
♻️ Duplicate comments (5)
docs/user/ppl/interfaces/endpoint.md (1)
123-125: Add a brief note explaining logical vs physical cast representation.The extended example now shows
30:BIGINTin the physical plan while the logical plan keeps>($5, 30). A short note would prevent confusion for users comparing plans or parsing explain output.core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
310-317: Add logging when validation is skipped for tolerant exceptions.When
ValidationUtils.tolerantValidationException(e)returns true and the originalrelNodeis returned, there's no logging to indicate that validation was bypassed. This makes it harder to troubleshoot queries that silently skip validation due to known Calcite limitations.🔍 Suggested logging for observability
try { validator.validate(Objects.requireNonNull(rewritten)); } catch (CalciteContextException e) { if (ValidationUtils.tolerantValidationException(e)) { + log.debug("Skipping validation due to known Calcite limitation: {}", e.getMessage()); return relNode; } throw new ExpressionEvaluationException(e.getMessage(), e); }core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
140-145: Close the JDBC Statement created for validator initialization.Line 142 creates a
Statementthat is never closed; this can leak resources under load. Please use try‑with‑resources or retain and close it when the context is disposed.As per coding guidelines, use try-with-resources for resource cleanup.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
500-515: Null guard forregisterOperator(same as earlier note).Still missing a null check on
operatorbefore building the lambda.
844-855: Guard empty-arg ADD dispatch (same as earlier note).
Stream.of(args).allMatch(...)returnstruefor empty args, selecting CONCAT unexpectedly; add anargs.length > 0guard and document the dispatch pattern.
🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
283-292: Add@throwsdocumentation for the exception case.Per coding guidelines, public/private methods with JavaDoc should document exceptions. The method throws
ExpressionEvaluationExceptionon validation failure.📝 Suggested JavaDoc update
/** * Validates a RelNode by converting it to SqlNode, performing validation, and converting back. * * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on * PPL queries. * * `@param` relNode the relation node to validate * `@param` context the Calcite plan context containing the validator * `@return` the validated (and potentially modified) relation node + * `@throws` ExpressionEvaluationException if validation fails with a non-tolerant error */core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java (1)
49-50: Align MAP operand validation with String‑key expectations.Line 50 now uses
OperandTypes.family(SqlTypeFamily.MAP, SqlTypeFamily.MAP), which only checks the MAP family and doesn’t constrain key types. The implementation casts keys toString, so non‑string keys could pass validation but fail at runtime. Please confirm map keys are alwaysVARCHARin this engine; otherwise consider tightening the checker (customSqlOperandTypeChecker) or adding a key‑type guard inverifyMap.Calcite OperandTypes.family(SqlTypeFamily.MAP) does it validate MAP key types? How can a SqlOperandTypeChecker enforce Map<VARCHAR, ?> operands?integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
491-504: Avoid re-cleaning the actual YAML inside the loopYou can precompute the cleaned actual YAML once and add a null/empty guard for clearer failures. This keeps comparisons consistent and slightly cheaper.
♻️ Proposed refactor
- public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) { - if (expectedYamls.isEmpty()) { + public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) { + if (expectedYamls == null || expectedYamls.isEmpty()) { throw new IllegalArgumentException("expectedYamls list must not be empty"); } + String cleanedActual = cleanUpYaml(actualYaml); AssertionError lastError = null; for (String expectedYaml : expectedYamls) { try { - assertYamlEquals(cleanUpYaml(expectedYaml), cleanUpYaml(actualYaml)); + assertYamlEquals(cleanUpYaml(expectedYaml), cleanedActual); return; } catch (AssertionError e) { lastError = e; } } throw lastError; }core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
90-96: Add an explicit setter with JavaDoc for the operator table provider.Lombok’s public static setter won’t carry method‑level JavaDoc, but the guidelines require public methods to be documented. Consider replacing the Lombok setter with an explicit method (or reduce its visibility if it’s not part of the public API).
♻️ Suggested refactor
- `@Setter` private static SqlOperatorTableProvider operatorTableProvider; + private static SqlOperatorTableProvider operatorTableProvider; + + /** + * Sets the SQL operator table provider. This must be called during initialization by the + * opensearch module. + * + * `@param` provider the provider to use for obtaining operator tables + */ + public static void setOperatorTableProvider(SqlOperatorTableProvider provider) { + operatorTableProvider = java.util.Objects.requireNonNull(provider, "provider"); + }As per coding guidelines, public methods should have JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
986-1003: Consider guarding duplicate agg registrations.
AggBuilder.registercurrently allows silent overrides; mirroring theBuilderguard would make mistakes louder.♻️ Suggested change
void register(BuiltinFunctionName functionName, AggHandler aggHandler) { - map.put(functionName, aggHandler); + if (map.containsKey(functionName)) { + throw new IllegalStateException( + String.format( + Locale.ROOT, + "Each aggregate function can only be registered once: %s", + functionName)); + } + map.put(functionName, aggHandler); }
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/AppendFunctionImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Show resolved
Hide resolved
integ-test/src/test/resources/expectedOutput/calcite/agg_filter_nested.yaml
Show resolved
Hide resolved
- use OperandTypes.CHARACTER consistenly - use OperandTypes.VARIADIC for append function - add safeguard for atan rewrite Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
| import org.apache.calcite.sql.type.OperandTypes; | ||
| import org.apache.calcite.sql.type.SqlReturnTypeInference; | ||
| import org.apache.calcite.sql.validate.SqlUserDefinedFunction; | ||
| import org.jspecify.annotations.NonNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lombok.NonNull is not for annotating return value; Calcite adopts org.checkerframework.checker.nullness.qual.NonNull, which is heavier than org.jspecify.annotations.NonNull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java (1)
33-40: Add JavaDoc for public getOperandMetadata().This public method is in
core/src/main/javaand now changed; guidelines require JavaDoc with at least@return. As per coding guidelines, please add JavaDoc.✍️ Suggested update
`@Override` + /** + * {`@inheritDoc`} + * + * `@return` non-null operand metadata for numeric operands + */ public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.NUMERIC; }core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)
65-74: Add JavaDoc for the public override.
getOperandMetadata()is public and changed; it still lacks JavaDoc with@return, which is required forcore/src/main/java/**/*.java.💡 Suggested JavaDoc
`@Override` + /** + * Returns operand metadata for mvzip. + * + * `@return` operand metadata for 2- or 3-argument variants + */ public `@NonNull` UDFOperandMetadata getOperandMetadata() {As per coding guidelines, document public methods with JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVFindFunctionImpl.java (1)
48-53: Add JavaDoc for this public override.
The method is public but has no JavaDoc with@return. Please add a brief block to satisfy the project rules.📝 Proposed JavaDoc
`@Override` - public `@NonNull` UDFOperandMetadata getOperandMetadata() { + /** + * Operand metadata for mvfind. + * + * `@return` operand metadata describing accepted argument families + */ + public `@NonNull` UDFOperandMetadata getOperandMetadata() {As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)
33-45: Add JavaDoc with@returnfor the public method.Public methods must have proper JavaDoc; the current block comment doesn’t include
@return. As per coding guidelines, convert this to JavaDoc and include the return tag.🔧 Proposed fix
- /* - * The first parameter is always required (either fields or query). - * The second parameter is query when fields are present, otherwise it's the first parameter. - * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries. - * Different query has different parameter set, which will be validated in dedicated query builder. - * Query parameter is always required and cannot be null. - */ + /** + * The first parameter is always required (either fields or query). + * The second parameter is query when fields are present, otherwise it's the first parameter. + * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries. + * Different query has different parameter set, which will be validated in dedicated query builder. + * Query parameter is always required and cannot be null. + * + * `@return` operand metadata for relevance query functions + */core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)
88-96: Remove unsupported CHARACTER operand from metadata or implement handling.The operand metadata declares
(CHARACTER, NUMERIC, CHARACTER)as valid (line 63), but the implementor does not support it. When a CHARACTER field is passed with a CHARACTER unit string:
- Line 88 condition is false (unitType is VARCHAR, not null/ANY)
- Line 97 condition is false (plain VARCHAR is not instanceof ExprSqlType)
- Execution falls through to the exception at line 115
All current tests use either numeric fields or fields wrapped in ExprSqlType (EXPR_DATE/EXPR_TIME/EXPR_TIMESTAMP), never plain STRING/CHARACTER fields. Either remove the CHARACTER operand from metadata or implement the missing path to handle string-based spans.
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java`:
- Around line 232-235: getOperandMetadata() is annotated `@NonNull` but returns
the nullable field operandMetadata (set in adaptExprMethodToUDF), violating the
contract; either validate and throw early when operandMetadata is null or return
a non-null default instance. Fix by updating adaptExprMethodToUDF to ensure it
never leaves operandMetadata null (validate inputs and throw
IllegalArgumentException) or change getOperandMetadata() to return a safe
default UDFOperandMetadata (or throw a clear IllegalStateException) when
operandMetadata == null, referencing getOperandMetadata(), operandMetadata, and
adaptExprMethodToUDF to locate the change.
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 55-58: TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.
In
`@core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java`:
- Around line 71-74: getOperandCountRange currently only enforces a minimum of 3
arguments which allows even argument counts and causes IndexOutOfBounds in
JsonSetFunctionImpl.eval when it assumes path/value pairs; update
getOperandCountRange in class JsonSetFunctionImpl to validate that the total
operand count is odd (>=3) — either by returning a custom SqlOperandCountRange
that enforces odd counts or by adding explicit validation at the start of eval
(method eval) to throw a clear validation exception if keys.size() is even,
preventing the keys.get(i+1) access from ever indexing out of bounds.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/binning/MinspanBucketFunction.java`:
- Around line 50-52: Add a JavaDoc block to the public method
getOperandMetadata() in the MinspanBucketFunction class that documents the
method purpose and includes an `@return` tag describing the UDFOperandMetadata
value being returned (specifically that it supplies the operand types
PPLOperandTypes.NUMERIC_NUMERIC_NUMERIC_NUMERIC for this UDF). Keep the comment
concise and placed immediately above the getOperandMetadata() method
declaration.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java`:
- Around line 53-56: Add JavaDoc to the public override method
getOperandMetadata in ToSecondsFunction: describe what the method returns and
include an `@return` tag that specifies it returns the operand metadata
(PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER) used by the UDF; keep the
comment concise and place it immediately above the getOperandMetadata()
override.
- Line 23: Add JavaDoc to the public override getOperandMetadata() in
ToSecondsFunction explaining it "Returns the operand metadata for the function
signature" and annotate the method return with `@NonNull` as suggested; update the
method in class ToSecondsFunction to return
PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER and include the `@Override` and
JavaDoc. Also verify that the org.jspecify.annotations.NonNull import is
available to the build: if not provided by the runtime/plugin, add the jspecify
dependency to core/build.gradle so the import resolves.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/WeekdayFunction.java`:
- Around line 52-55: Add a JavaDoc block to the public override of
getOperandMetadata in WeekdayFunction describing the method and include an
`@return` tag explaining it returns the operand metadata for this UDF; reference
the getOperandMetadata method and the PPLOperandTypes.DATETIME_OPTIONAL_INTEGER
return value so it's clear this documents that the function accepts a DATETIME
and an optional INTEGER operand type.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java`:
- Around line 42-45: The public override EulerFunction.getOperandMetadata lacks
JavaDoc; add a short Javadoc comment (or use {`@inheritDoc`}) above the method and
include an `@return` describing that it returns the UDFOperandMetadata for this
UDF (specifically PPLOperandTypes.NONE), referencing the return type
UDFOperandMetadata and the constant PPLOperandTypes.NONE so readers know this
UDF takes no operands.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 42-44: Add a JavaDoc comment to the public method getKind() in
class ModFunction: document that this method returns the SQL operator kind for
the modulo operation and include an `@return` tag stating it returns SqlKind.MOD;
ensure the JavaDoc also notes it overrides the superclass method (since
getKind() is an override) and follows project formatting conventions.
In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 36-46: Update the Javadoc for getOperandMetadata() to match the
implementation: remove the phrase "default null without checking" and document
that the default returns a non-null UDFOperandMetadata that wraps
OperandTypes.VARIADIC (i.e., uses a permissive type checker and effectively
skips strict type checking). Mention the `@NonNull` return contract and reference
UDFOperandMetadata.wrap(...) and OperandTypes.VARIADIC in the comment so readers
see the actual behavior.
In `@docs/dev/ppl-functions.md`:
- Around line 131-147: The doc example uses an unqualified ADD symbol but the
register signature requires a BuiltinFunctionName enum; update the example to
call register with BuiltinFunctionName.ADD (e.g.,
register(BuiltinFunctionName.ADD, ...)) so it matches the API; ensure the
surrounding references (register, FunctionImp, builder.makeCall,
SqlStdOperatorTable.CONCAT, SqlStdOperatorTable.PLUS,
OpenSearchTypeUtil::isCharacter) remain unchanged.
- Around line 192-196: Update the docs to reflect the new UDAF registration API
by replacing references to AggBuilder::registerOperator(BuiltinFunctionName
functionName, SqlAggFunction aggFunction) with
AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler);
also ensure the subsequent bullets and any example snippets or mentions that
suggest registerOperator are changed to use AggBuilder::register and the
AggHandler/SqlAggFunction parameter names/types consistent with the PR
(including keeping the note about PPLFuncImpTable::registerExternalAggOperator
for engine-dependent functions).
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)
42-48: Javadoc documents STRING operand signatures that the type checker rejects.The Javadoc claims
(IP, STRING)and(STRING, IP)signatures, but the type checker at lines 135-136 only accepts operands whereOpenSearchTypeUtil.isIp(type, true)returns true. This method does not accept VARCHAR/CHAR types. Either update the Javadoc to reflect the actual(IP, IP)only behavior, or modify the type checker to also accept character types if STRING support is intended.
🧹 Nitpick comments (14)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)
51-53: Add JavaDoc for public method.
The public override lacks required JavaDoc with@return. As per coding guidelines, please document it.💡 Suggested JavaDoc
`@Override` + /** + * Returns operand metadata for the EXTRACT function. + * + * `@return` operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING) + */ public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.STRING_DATETIME; }As per coding guidelines, document public methods with JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java (1)
64-73: Consider adding JavaDoc to the public static utility method.The
secToTimemethod is public and could be called directly. Per coding guidelines, public methods should have JavaDoc with@param,@return, and@throwsdocumentation. This would help clarify the expected input range and output format.📝 Suggested JavaDoc
+ /** + * Converts seconds to time string in HH:mm:ss[.nnnnnn] format. + * + * `@param` seconds the number of seconds to convert (integral or decimal) + * `@return` time string representation + */ public static String secToTime(Number seconds) {core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/DatePartFunction.java (1)
52-52: Minor typo in JavaDoc: "MIBUTEOFHOUR" should be "MINUTEOFHOUR".This appears to be a pre-existing typo, but since you're touching this file, consider fixing it.
📝 Suggested fix
- * <li>MIBUTEOFHOUR + * <li>MINUTEOFHOURcore/src/main/java/org/opensearch/sql/expression/function/udf/ToNumberFunction.java (1)
98-100: Empty catch block silently swallows parsing errors.The empty catch block suppresses all exceptions without logging or documentation, making it difficult to diagnose conversion failures. If returning
nullon parse failure is intentional behavior, consider at minimum adding a comment explaining this design choice, or logging at debug/trace level.♻️ Suggested improvement
} catch (Exception e) { - + // Intentionally return null for unparseable input (lenient parsing) + // Callers should handle null return appropriately }Or, for better observability:
+import java.util.logging.Logger; +import java.util.logging.Level; ... + private static final Logger LOGGER = Logger.getLogger(ToNumberFunction.class.getName()); ... } catch (Exception e) { - + LOGGER.log(Level.FINE, "Failed to parse number: " + numStr, e); }core/src/main/java/org/opensearch/sql/expression/function/udf/math/ConvFunction.java (2)
26-29: Pre-existing doc typo: "floor function" should be "conv function".The class-level JavaDoc appears to have been copied from another function and still references "floor function" instead of "conv function".
📝 Suggested fix
/** * Convert number x from base a to base b<br> - * The supported signature of floor function is<br> + * The supported signature of conv function is<br> * (STRING, INTEGER, INTEGER) -> STRING<br> * (INTEGER, INTEGER, INTEGER) -> STRING */
61-74: Pre-existing: JavaDoc does not match implementation behavior.The JavaDoc states the method returns
"0"for invalid input andnullfor out-of-range bases, but the actual implementation will throwNumberFormatExceptionin both cases. Consider either updating the documentation to reflect the actual exception-throwing behavior, or wrapping the logic in a try-catch to match the documented contract.📝 Option 1: Update JavaDoc to match implementation
/** * Convert numStr from fromBase to toBase * * `@param` numStr the number to convert (case-insensitive for alphanumeric digits, may have a * leading '-') * `@param` fromBase base of the input number (2 to 36) * `@param` toBase target base (2 to 36) - * `@return` the converted number in the target base (uppercase), "0" if the input is invalid, or - * null if bases are out of range. + * `@return` the converted number in the target base + * `@throws` NumberFormatException if numStr is invalid for fromBase or if bases are out of range */core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/TimestampFunction.java (1)
75-84: Consider adding JavaDoc to public static methods.The static
timestampmethods are public API entry points called via reflection. Adding JavaDoc with@paramand@returntags would improve maintainability, especially documenting the expected input types and the return value format.📝 Example JavaDoc
+ /** + * Converts a datetime value to a timestamp string. + * + * `@param` properties function properties containing timezone context + * `@param` datetime the datetime value to convert + * `@return` timestamp string representation + */ public static String timestamp(FunctionProperties properties, ExprValue datetime) { return (String) convertToTimestampValue(datetime, properties).valueForCalcite(); } + /** + * Converts a datetime value to timestamp and adds a time offset. + * + * `@param` properties function properties containing timezone context + * `@param` datetime the base datetime value + * `@param` addTime the time value to add to the base datetime + * `@return` timestamp string representation after adding the time offset + */ public static String timestamp( FunctionProperties properties, ExprValue datetime, ExprValue addTime) {core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java (1)
54-56: Consider adding JavaDoc to the publicminmethod.Per coding guidelines, public methods should have JavaDoc documentation. While this is pre-existing code, adding a brief doc comment would improve maintainability:
/** * Returns the minimum value from the given arguments using mixed-type comparison. * * `@param` args array of values to compare * `@return` the minimum value, or null if all values are null */ public static Object min(Object[] args) {core/src/main/java/org/opensearch/sql/expression/function/udf/RexExtractMultiFunction.java (1)
47-58: Add JavaDoc for the public method.
getOperandMetadatais public and should include JavaDoc (with@return, and@throwsif applicable). As per coding guidelines.📝 Suggested update
+ /** + * Returns operand metadata for REX_EXTRACT_MULTI validation. + * + * `@return` operand metadata describing accepted operands + */ `@Override` public `@NonNull` UDFOperandMetadata getOperandMetadata() {core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonArrayLengthFunctionImpl.java (1)
54-63: Consider adding null safety to theevalmethod.The
evalmethod has potential robustness issues in the existing code:
- Using
assertfor argument validation can be disabled at runtime- If
gson.fromJsonreturnsnull(e.g., for JSON literal"null"),target.size()will throw NPESince
NullPolicy.ANYindicates this function handles nulls itself, explicit null handling would be safer.♻️ Suggested improvement
public static Object eval(Object... args) { - assert args.length == 1 : "Json array length only accept one argument"; + if (args.length != 1) { + throw new IllegalArgumentException("Json array length only accepts one argument"); + } String value = (String) args[0]; + if (value == null) { + return null; + } try { List<?> target = gson.fromJson(value, List.class); - return target.size(); + return target != null ? target.size() : null; } catch (JsonSyntaxException e) { return null; } }core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java (1)
44-45: Consider stricter operand count validation.Using
OperandTypes.VARIADICallows calls with 0 or 1 arguments, which theevalmethod handles by returningnull. While this is safe, requiring at least 2 arguments (json_string + path) at validation time would provide earlier feedback to users.♻️ Optional: Enforce minimum 2 operands
`@Override` public `@NonNull` UDFOperandMetadata getOperandMetadata() { - return UDFOperandMetadata.wrap(OperandTypes.VARIADIC); + return UDFOperandMetadata.wrap( + OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY)); }core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java (1)
39-40: Consider requiring at least 2 operands.
OperandTypes.ONE_OR_MOREallows calls with just a JSON string and no paths to delete, which is semantically meaningless. The function should require at least 2 arguments: the JSON string and at least one path to delete.♻️ Proposed fix: Enforce minimum 2 operands
`@Override` public `@NonNull` UDFOperandMetadata getOperandMetadata() { - return UDFOperandMetadata.wrap(OperandTypes.ONE_OR_MORE); + return UDFOperandMetadata.wrap( + OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY)); }core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/UnixTimestampFunction.java (1)
51-53: Add JavaDoc for the public override.
Line 52 changes the public API signature; please add a brief JavaDoc with@returnto comply with project guidelines. As per coding guidelines, ...✍️ Suggested JavaDoc addition
`@Override` + /** + * Returns operand metadata for unix_timestamp. + * + * `@return` non-null operand metadata + */ public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.OPTIONAL_DATE_OR_TIMESTAMP_OR_NUMERIC; }core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)
60-70: Consider adding JavaDoc for this public method.The refactor to
OperandTypes.family(...).or(...)chains looks correct and aligns with the PR's goal of moving away fromCompositeOperandTypeChecker. The@NonNullannotation is a good addition.Per coding guidelines, public methods should have JavaDoc documentation explaining the operand types and their semantics. Consider documenting what each combination represents:
- CHARACTER/NUMERIC/CHARACTER: time-based span on string fields
- DATETIME/NUMERIC/CHARACTER: time-based span on datetime fields
- NUMERIC/NUMERIC/ANY: numeric span (no time unit required)
core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java
Show resolved
Hide resolved
| RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory(); | ||
| RexCallBinding rexCallBinding = (RexCallBinding) sqlOperatorBinding; | ||
| List<RexNode> operands = rexCallBinding.operands(); | ||
| RelDataType lambdaReturnType = ((RexLambda) operands.get(1)).getExpression().getType(); | ||
| RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1); | ||
| return createArrayType( | ||
| typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "TransformFunctionImpl.java" | head -20Repository: opensearch-project/sql
Length of output: 166
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 10999
🏁 Script executed:
# Search for other lambda-based collection functions and how they handle return types
find . -path ./target -prune -o -type f -name "*.java" -print | xargs grep -l "getOperandType.*lambda\|lambda.*getOperandType" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 166
🏁 Script executed:
# Search for similar patterns with SqlReturnTypeInference in CollectionUDF directory
find ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF -type f -name "*.java" | head -20Repository: opensearch-project/sql
Length of output: 1529
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 3150
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 7288
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.javaRepository: opensearch-project/sql
Length of output: 996
🏁 Script executed:
# Search for TransformFunctionImpl tests
find . -path ./target -prune -o -type f -name "*TransformFunctionImplTest*" -o -name "*TransformTest*" | grep -i testRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Broader search for lambda-related tests
find ./core/src/test -type f -name "*.java" | xargs grep -l "TransformFunctionImpl\|transform.*lambda" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for SqlLambda usage and getExpression calls in the codebase
rg "SqlLambda" --type java -A 3 | head -50Repository: opensearch-project/sql
Length of output: 1907
🏁 Script executed:
# Check if there's any validation or type binding code that explains lambda operand types
rg "getOperandType.*1\)" --type java -B 2 -A 2 | head -60Repository: opensearch-project/sql
Length of output: 5563
🏁 Script executed:
# Look at how ReduceFunctionImpl and TransformFunctionImpl differ in handling binding types
grep -B 5 -A 10 "getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 624
🏁 Script executed:
# Search for any tests that use transform function
find ./core/src/test -type f -name "*.java" -o -name "*.sql" | xargs grep -l "transform" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 374
🏁 Script executed:
# Search for any documentation or comments about lambda operand types
rg "lambda.*type\|FUNCTION.*type" --type java -i | grep -i "lambda\|function" | head -20Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check if there are any RexCallBinding vs SqlCallBinding handling patterns for lambdas
rg "RexCallBinding\|SqlCallBinding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 2 -A 5 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for any direct usage of SqlCallBinding in the transform/lambda related code
rg "instanceof.*Binding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 3 -A 3Repository: opensearch-project/sql
Length of output: 1488
🏁 Script executed:
# Check the Calcite version used in the project
find . -name "pom.xml" -o -name "build.gradle" | xargs grep -i "calcite" | grep -i "version" | head -5Repository: opensearch-project/sql
Length of output: 293
🏁 Script executed:
# Search for any commits or comments about TransformFunctionImpl implementation
git log --oneline --all -- "./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 307
🏁 Script executed:
# Look for any integration tests or examples showing transform usage
find . -name "*.sql" -o -name "*.md" | xargs grep -l "transform" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 243
🏁 Script executed:
# Check the ppl-functions.md documentation for transform function
grep -A 10 -B 5 "transform" ./docs/user/ppl/cmd/ppl-functions.md 2>/dev/null || grep -A 10 -B 5 "transform" ./docs/dev/ppl-functions.md 2>/dev/nullRepository: opensearch-project/sql
Length of output: 634
🏁 Script executed:
# Look at the most recent commit details for TransformFunctionImpl
git show b1dca98d --stat 2>/dev/null | head -30Repository: opensearch-project/sql
Length of output: 1723
🏁 Script executed:
# Search for documentation or tests showing transform usage with different return types
find . -name "*.java" -path "*/test/*" | xargs grep -l "TransformFunctionImpl\|transform.*array" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 238
🏁 Script executed:
# Check if there are any comments or code patterns about when return type inference is called
rg "SqlReturnTypeInference\|getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/ --type java -B 2 -A 10 | head -80Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check the array function integration test
cat -n ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java | head -100Repository: opensearch-project/sql
Length of output: 3717
🏁 Script executed:
# Check the PPL array function test
cat -n ./ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java | head -150Repository: opensearch-project/sql
Length of output: 6867
🏁 Script executed:
# Search for transform-specific tests or usage
rg "transform\(" --type java -i ./integ-test ./ppl | head -20Repository: opensearch-project/sql
Length of output: 1956
🏁 Script executed:
# Check the CalciteArrayFunctionIT for transform tests
grep -n "transform" ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javaRepository: opensearch-project/sql
Length of output: 478
Extract lambda's return type explicitly for correct type inference during validation.
At the SqlCallBinding phase (validation), getOperandType(1) returns the FUNCTION type itself, not the lambda's return type. This differs from how ReduceFunctionImpl handles similar lambda-based operations. Follow the ReduceFunctionImpl pattern: distinguish between SqlCallBinding (extract SqlLambda, get expression, derive type via SqlTypeUtil.deriveType()) and RexCallBinding (use simple getOperandType() on already-compiled types). Without this, type inference may produce ARRAY<FUNCTION> instead of ARRAY<actual_return_type>.
🤖 Prompt for AI Agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`
around lines 55 - 58, TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
Show resolved
Hide resolved
.../src/main/java/org/opensearch/sql/expression/function/udf/binning/MinspanBucketFunction.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java
Show resolved
Hide resolved
…mplementation - additionally udpate ppl-functions.md documentation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/src/main/java/org/opensearch/sql/expression/function/udf/condition/EarliestFunction.java (1)
38-45: Add JavaDoc for publicgetOperandMetadata().Public methods under
core/src/main/java/**require JavaDoc with@return. Please add a brief JavaDoc (especially now that the non-null contract is explicit).✍️ Proposed JavaDoc
+ /** + * Returns operand metadata for the {`@code` earliest} function. + * + * `@return` non-null operand metadata + */ `@Override` public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.STRING_TIMESTAMP; }As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java (1)
62-64: Fix copy-paste error in exception message.The error message incorrectly refers to "Json append function" but this is
JsonExtendFunctionImpl.Suggested fix
if (keys.size() % 2 != 0) { throw new RuntimeException( - "Json append function needs corresponding path and values, but current get: " + keys); + "Json extend function needs corresponding path and values, but current get: " + keys); }core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java (1)
66-75: Add unit tests for the new default methods in UserDefinedFunctionBuilder.This interface with its new default methods—
getKind(),getOperandMetadata(),toUDF(String), andtoUDF(String, boolean)—contains new business logic incore/src/main/javathat must be covered by unit tests per coding guidelines. Current test coverage validates specific UDF implementations but does not test the default interface behavior itself.
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 62-63: Add a JavaDoc comment to the public override method
getOperandMetadata in TransformFunctionImpl describing what the method returns
and include an `@return` tag that explains the returned UDFOperandMetadata (e.g.,
metadata about operands for this UDF). Ensure the JavaDoc appears immediately
above the method signature for the public `@NonNull` UDFOperandMetadata
getOperandMetadata() override and follows project JavaDoc style.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java`:
- Around line 48-50: Add a JavaDoc comment to the public override
getOperandMetadata() in SecToTimeFunction that documents the method and its
return value; specifically, describe that it returns the operand metadata for
this UDF and include an `@return` tag stating it returns a UDFOperandMetadata
(e.g., PPLOperandTypes.NUMERIC) applicable to this function. Ensure the JavaDoc
sits immediately above the method declaration and follows project JavaDoc style.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 46-49: Add a JavaDoc comment to the public method
getOperandMetadata() in ModFunction explaining what operand metadata it returns
(i.e., the expected operand types for the modulus UDF) and include a `@return` tag
that states it returns PPLOperandTypes.NUMERIC_NUMERIC; place the comment
immediately above the getOperandMetadata() declaration in ModFunction.java.
- Around line 41-49: Add a new test class ModFunctionTest.java that follows the
ToNumberFunctionTest pattern and verifies ModFunction's metadata (getKind()
returns SqlKind.MOD and getOperandMetadata() returns
PPLOperandTypes.NUMERIC_NUMERIC) and exercises the core logic by calling
integralMod() and floatingMod() with a variety of inputs: integer modulo cases,
floating-point modulo cases, mixed-type coercion (e.g., int % double and double
% int), and edge cases including divisor zero to assert the expected
exception/behavior; use the existing test utilities/assertions used by other UDF
tests to construct inputs and compare results.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java`:
- Around line 39-41: Add a JavaDoc comment to the public override
getOperandMetadata() in NumberToStringFunction describing what the method
returns and any exceptions it may throw; reference the method signature public
`@NonNull` UDFOperandMetadata getOperandMetadata(), state that it returns the
operand metadata (PPLOperandTypes.NUMERIC) via an `@return` tag, and include an
`@throws` tag only if the method can throw checked or runtime exceptions —
otherwise explicitly note no exceptions are thrown. Ensure the JavaDoc follows
project style and is placed immediately above the getOperandMetadata() method.
In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 49-51: Add a JavaDoc comment to the public default method
getKind() in UserDefinedFunctionBuilder describing its purpose (returns the SQL
kind for this function), the return type (SqlKind, defaulting to
SqlKind.OTHER_FUNCTION), and any behavior/override expectations so consumers
know they can override it for other SqlKind values; place the comment
immediately above the getKind() declaration and follow existing project JavaDoc
style and tags (brief description and `@return`).
♻️ Duplicate comments (8)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)
42-47: Class Javadoc documents unsupported STRING operand signatures.The Javadoc claims support for
(IP, STRING)and(STRING, IP)signatures, but the type checker at lines 135-136 only accepts IP types viaOpenSearchTypeUtil.isIp(type, true). ThegetAllowedSignaturesmethod correctly reports<IP>, <IP>.Either update the Javadoc to remove STRING claims, or modify the type checker to accept VARCHAR if STRING support is intended.
,core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java (1)
42-44: Add explicit JavaDoc (or {@inheritDoc}) for the public override.This public method still lacks an explicit JavaDoc/
{@inheritdoc}block. Please add a brief doc (including@return) to meet the public-API documentation requirement. As per coding guidelines, ...core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java (2)
23-23: jspecify dependency concern already flagged.The import addition is appropriate for the
@NonNullannotation. The concern about verifying this dependency incore/build.gradlewas already raised in a prior review.
53-56: Nullability contract improvement looks good.The
@NonNullannotation ongetOperandMetadata()is appropriate since the method always returns a non-null constant. This aligns with the PR-wide pattern of strengthening nullability contracts across UDF operand metadata accessors.The missing JavaDoc concern was already flagged in a prior review.
docs/dev/ppl-functions.md (1)
192-195: Inconsistent UDAF registration API documentation.Lines 192-193 reference
AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)for "standard registration," but line 195 documentsAggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler). According to the AI summary, the API changed fromregisterOperatorwithSqlAggFunctiontoregisterwithAggHandler.The first bullet should be updated to reflect the new
registerAPI to avoid misleading developers.📝 Suggested fix
-- Use `AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)` for standard - registration +- Use `AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler)` for standard + registrationcore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
56-74: Missing validation for odd argument count.The function signature
(json_string, path1, value1, path2, value2, ...)requires an odd total count. With an even count like 4 arguments, theevalmethod will throwIndexOutOfBoundsExceptionat line 105 when accessingkeys.get(i + 1)for the incomplete pair. Add odd-count validation incheckOperandTypes.🐛 Proposed fix
`@Override` public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { final int count = callBinding.getOperandCount(); + // Require odd count: 1 json_string + n pairs of (path, value) + if (count % 2 == 0) { + if (throwOnFailure) { + throw callBinding.newValidationSignatureError(); + } + return false; + } for (int i = 1; i < count; i += 2) { RelDataType nameType = callBinding.getOperandType(i); if (!OpenSearchTypeUtil.isCharacter(nameType)) {core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java (1)
41-44: Add JavaDoc for the publicgetKind()override.
Public methods require JavaDoc with@return. As per coding guidelines, please document this method accordingly.core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
52-59: Derive the lambda return type instead of FUNCTION during validation.Line 56 uses
getOperandType(1), which (underSqlCallBinding) yields the lambda’s FUNCTION type, not its return type—this can inferARRAY<FUNCTION>and propagate incorrect casts. Follow theReduceFunctionImplpattern: when binding isSqlCallBinding, extract theSqlLambdaexpression and derive its return type; fall back togetOperandType(1)forRexCallBinding.🐛 Proposed fix
return sqlOperatorBinding -> { RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory(); - RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1); + RelDataType lambdaReturnType; + if (sqlOperatorBinding instanceof SqlCallBinding) { + SqlCallBinding callBinding = (SqlCallBinding) sqlOperatorBinding; + SqlNode lambdaNode = callBinding.operand(1); + if (lambdaNode instanceof org.apache.calcite.sql.SqlLambda) { + SqlNode lambdaExpr = + ((org.apache.calcite.sql.SqlLambda) lambdaNode).getExpression(); + lambdaReturnType = SqlTypeUtil.deriveType(callBinding, lambdaExpr); + } else { + lambdaReturnType = callBinding.getOperandType(1); + } + } else { + lambdaReturnType = sqlOperatorBinding.getOperandType(1); + } return createArrayType( typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true); };#!/bin/bash # Compare lambda return-type inference patterns (Transform vs Reduce) rg -n "class ReduceFunctionImpl|getReturnTypeInference|SqlLambda|deriveType" \ core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java -C 3
🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/FromUnixTimeFunction.java (1)
56-58: Add JavaDoc for public method per guidelines.
getOperandMetadata()is public and currently has no JavaDoc. Please add a brief JavaDoc with an@returntag. As per coding guidelines.📝 Proposed JavaDoc
+ /** + * Returns operand metadata for FROM_UNIXTIME. + * + * `@return` operand metadata describing numeric or numeric+string arguments + */ `@Override` public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.NUMERIC_OPTIONAL_STRING; }core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/PeriodNameFunction.java (1)
66-77: Remove unused variabledateType.The variable
dateTypeis computed on lines 69-71 but never used. This appears to be dead code.♻️ Suggested fix
`@Override` public Expression implement( RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) { - ExprType dateType = - OpenSearchTypeFactory.convertRelDataTypeToExprType( - call.getOperands().getFirst().getType()); return Expressions.call( PeriodNameFunctionImplementor.class, "name", translatedOperands.getFirst(), Expressions.constant(periodUnit)); }core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java (1)
136-139: Pre-existing: Floating-point modulo equality comparison may be imprecise.This is not part of your changes, but
maxValue % optimalWidth == 0may not behave as expected due to floating-point precision. For example,10.0 % 0.1may not equal exactly0.0. Consider using a tolerance-based comparison if this causes issues.♻️ Optional fix
- if (maxValue % optimalWidth == 0) { + double remainder = maxValue % optimalWidth; + if (Math.abs(remainder) < 1e-10 || Math.abs(remainder - optimalWidth) < 1e-10) {core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)
46-53: Add JavaDoc@returnfor the non-null contract.
Public methods in this module should document their return contract; the new non-null guarantee should be explicitly stated.📝 Suggested JavaDoc
`@Override` + /** + * Returns operand metadata for the EXTRACT function. + * + * `@return` non-null operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING) + */ public `@NonNull` UDFOperandMetadata getOperandMetadata() { return PPLOperandTypes.STRING_DATETIME; }As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java (1)
47-78: Add a defensive arity check before accessing operand(2).If an invalid call slips through, this currently risks an
IndexOutOfBoundsExceptioninstead of a clearer validation error. A small guard improves diagnostics.♻️ Proposed fix
if (sqlOperatorBinding instanceof RexCallBinding) { return sqlOperatorBinding.getOperandType(sqlOperatorBinding.getOperandCount() - 1); } else if (sqlOperatorBinding instanceof SqlCallBinding callBinding) { + if (callBinding.getOperandCount() < 3) { + throw new IllegalArgumentException( + "reduce requires at least 3 operands (array, base, lambda)"); + } RelDataType elementType = callBinding.getOperandType(0).getComponentType(); RelDataType baseType = callBinding.getOperandType(1); SqlLambda reduce1 = callBinding.getCall().operand(2);
Description
This PR migrates from our custom type checking mechanism to Apache Calcite's native validation system by introducing a SqlNode validation layer. This addresses the lack of a proper SQL validation phase and resolves compatibility issues with user-defined types (UDTs).
This implements Approach 1 described in #3865.
How It Works
The PR introduces a validation layer in the query execution pipeline:
This approach leverages Calcite's robust type system while preserving OpenSearch's custom type semantics through careful type mapping and restoration.
Benefits
sqrt(x)→pow(x, 0.5)) before generating physical plansWork Items / Implementation Details
Core Infrastructure
deriveType,coerceOperandType, and type inference methodsType System Enhancements
SqlTypeNameenum with OpenSearch UDTs through dynamic type mapping:EXPR_TIME→SqlTypeName.TIME) before validationcommonTypeForBinaryComparisonto enable datetime cross-type comparisons (DATE vs TIME, etc.)SAFE_CASTfor string-to-number conversions to tolerate malformatted data; useCASTfor literal numbersFunction & Operator Handling
array_slice,reduce,mvappend, etc.) with proper type inferencejson_extract,json_set, etc.)atanoverloading, percentile approximations)DISTINCT_COUNT_APPROX,COUNT(*)rewriting, etc.)ADDoperator: String concatenation vs numeric additionATAN: Single-operand vs two-operand versionsGEOIP: String overrides due to UDT erasureSqlKindforDIVIDEandMODUDFsQuery Construct Support
RelHintthrough conversions (addedSqlHintforLogicalAggregate)bucket_nullableflagsIN/NOT INwith tuple inputs via row constructor rewritingDialect & Compatibility
LogicalValuesfor empty row generationEdge Cases & Fixes
SAFE_CASTRelToSqlConverteris instantiated per-use (stateful component)Test Fixes
CalcitePPLDateTimeBuiltinFunctionIT: Interval semanticsCalcitePPLBuiltinFunctionIT: LOG function, sarg deserializationCalciteArrayFunctionIT: Type checkers, reduce function inferenceCalciteMathematicalFunctionIT,CalcitePPLAggregationITCalciteBinCommandIT: Timestamp operations, windowed aggregates in GROUP BYCalciteStreamstatsCommandIT: Sort columns, bucket_nullableCalcitePPLJsonBuiltinFunctionIT: String conversionCode Cleanup
PPLFuncImpTableUDFOperandMetadata.wrapUDTinterfaceCalciteFuncSignature,PPLTypeCheckerEnhancedCoalesceinto built-inCoalesceOptimizations
SAFE_CASTon non-string literal numbers (useCASTfor better performance)dedupoptimization issuesPerformance Impact
Profiled on personal laptop, each test runs twice, then averaged.
Conclusion: No significant performance degradation. ClickBench shows slight overhead (+12.3%) during analyze phase due to validation, but big5 and TPCH show improvements, likely from better query optimization enabled by proper type information.
Related Issues
Resolves #4636, resolves #3865
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.