Skip to content

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Dec 2, 2025

Description

This PR migrates from our custom type checking mechanism to Apache Calcite's native validation system by introducing a SqlNode validation layer. This addresses the lack of a proper SQL validation phase and resolves compatibility issues with user-defined types (UDTs).

This implements Approach 1 described in #3865.

How It Works

The PR introduces a validation layer in the query execution pipeline:

RelNode → SqlNode → [Validation & Type Coercion] → SqlNode → RelNode

This approach leverages Calcite's robust type system while preserving OpenSearch's custom type semantics through careful type mapping and restoration.

Benefits

  • Comprehensive type checking: Access to Calcite's full type checker suite (variable-length operands, same-operand constraints, OR-relationship type checkers) vs. our limited family-based checking
  • Function optimization: Ability to rewrite functions at the SqlNode level (e.g., sqrt(x)pow(x, 0.5)) before generating physical plans
  • Automatic type coercion: Implicit casting where type-safe and appropriate, improving query compatibility
  • Reduced maintenance: Leverage Calcite's battle-tested validation logic instead of maintaining custom implementations

Work Items / Implementation Details

Core Infrastructure

  • Introduced SqlNode validation layer with round-trip conversion (RelNode ↔ SqlNode)
  • Extended Calcite validator to support OpenSearch UDTs via type substitution
  • Implemented custom deriveType, coerceOperandType, and type inference methods
  • Made validator thread-safe with proper synchronization

Type System Enhancements

  • UDT Support: To support OpenSearch-specific UDTs (which Calcite doesn't natively support), we bridge Calcite's SqlTypeName enum with OpenSearch UDTs through dynamic type mapping:
    1. Convert UDTs to standard SQL types (e.g., EXPR_TIMESqlTypeName.TIME) before validation
    2. Apply Calcite's validation and type coercion
    3. Restore UDTs after validation to maintain plan correctness
  • Cross-type Comparisons: Override commonTypeForBinaryComparison to enable datetime cross-type comparisons (DATE vs TIME, etc.)
  • IP Type Handling: Emulated IP type using standard SQL types for validation, with special handling in operators
  • Composite Types: Extended SQL-UDT conversion to handle nested/composite types
  • Type Coercion: Use SAFE_CAST for string-to-number conversions to tolerate malformatted data; use CAST for literal numbers

Function & Operator Handling

  • Defined operand type checkers for all built-in functions:
    • Array functions (array_slice, reduce, mvappend, etc.) with proper type inference
    • JSON functions (json_extract, json_set, etc.)
    • Mathematical functions (corrected atan overloading, percentile approximations)
    • Aggregation functions (DISTINCT_COUNT_APPROX, COUNT(*) rewriting, etc.)
  • Function overloading support:
    • ADD operator: String concatenation vs numeric addition
    • ATAN: Single-operand vs two-operand versions
    • GEOIP: String overrides due to UDT erasure
  • Arithmetic operations between strings and numerics (implicit coercion)
  • Define SqlKind for DIVIDE and MOD UDFs

Query Construct Support

  • Preserve null ordering and collation through SqlNode round-trips
  • Support SEMI and ANTI joins in SQL conversion
  • Preserve sort orders in subqueries
  • Pass RelHint through conversions (added SqlHint for LogicalAggregate)
  • Handle windowed aggregates and bucket_nullable flags
  • Support IN/NOT IN with tuple inputs via row constructor rewriting

Dialect & Compatibility

  • Extended OpenSearch Spark SQL dialect for custom syntax
  • Fixed interval semantics mismatch between SQL and PPL
  • Fixed quarter interval bug in Calcite
  • Handle identifier expansion in aggregate functions
  • Properly handle LogicalValues for empty row generation

Edge Cases & Fixes

  • Skip validation for unsupported patterns:
    • Bin-on-timestamp operations (not yet implemented)
    • Group-by window functions (return original plan gracefully)
    • Specific aggregation patterns with LogicalValues
  • Fixed nullability attribute preservation through SAFE_CAST
  • Trim unused fields after SQL→Rel conversion
  • Ensure RelToSqlConverter is instantiated per-use (stateful component)
  • Fixed float literal handling with explicit casts
  • Remove JSON_TYPE operator insertion where inappropriate

Test Fixes

  • Fixed integration tests across multiple suites:
    • CalcitePPLDateTimeBuiltinFunctionIT: Interval semantics
    • CalcitePPLBuiltinFunctionIT: LOG function, sarg deserialization
    • CalciteArrayFunctionIT: Type checkers, reduce function inference
    • CalciteMathematicalFunctionIT, CalcitePPLAggregationIT
    • CalciteBinCommandIT: Timestamp operations, windowed aggregates in GROUP BY
    • CalciteStreamstatsCommandIT: Sort columns, bucket_nullable
    • CalcitePPLJsonBuiltinFunctionIT: String conversion
  • Updated all explain integration tests for plan changes
  • Fixed YAML tests, doctests, and unit tests
  • Updated ClickBench query plans

Code Cleanup

  • Removed legacy type checking from PPLFuncImpTable
  • Deprecated UDFOperandMetadata.wrapUDT interface
  • Removed unused classes: CalciteFuncSignature, PPLTypeChecker
  • Consolidated EnhancedCoalesce into built-in Coalesce
  • Removed type checkers from operator registration (now handled by Calcite)

Optimizations

  • Eliminated SAFE_CAST on non-string literal numbers (use CAST for better performance)
  • Investigated and addressed dedup optimization issues

Performance Impact

Dataset Scenario Analyze (ms) Optimize (ms) Execute (ms) Total Avg (ms)
clickbench Without Validation 7.93 8.29 0.95 17.53
With Validation 9.59 8.72 1.00 19.69
Difference +1.66 (+20.9%) +0.43 (+5.2%) +0.05 (+5.3%) +2.16 (+12.3%)
big5 Without Validation 3.52 4.27 0.80 8.99
With Validation 3.37 3.80 0.72 8.22
Difference -0.15 (-4.3%) -0.47 (-11.0%) -0.08 (-10.0%) -0.77 (-8.6%)
tpch Without Validation 8.65 692.00 183.15 1028.80
With Validation 7.91 689.16 175.63 1008.77
Difference -0.74 (-8.6%) -2.84 (-0.4%) -7.52 (-4.1%) -20.03 (-1.9%)

Profiled on personal laptop, each test runs twice, then averaged.

Conclusion: No significant performance degradation. ClickBench shows slight overhead (+12.3%) during analyze phase due to validation, but big5 and TPCH show improvements, likely from better query optimization enabled by proper type information.

Related Issues

Resolves #4636, resolves #3865

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-project opensearch-project deleted a comment from coderabbitai bot Dec 2, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Implicit type coercion (numeric↔string, datetime strings, IP), safer casts in filters, LEFT SEMI/ANTI JOIN support, IP-aware operations, improved string/number arithmetic, and window OVER clauses preserving surrounding ORDER BY.
  • Bug Fixes

    • More consistent datetime/range/bin behavior, improved mixed-type BETWEEN/IN handling, and more stable dedup/aggregation naming and results.
  • Documentation

    • Updated examples and outputs; some samples marked non-executable.
  • Tests

    • Explain-plan expectations moved to YAML and integration coverage expanded.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds a Calcite-based validation pipeline and PPL-specific type-coercion/validation: introduces validator, coercion rules, convertlets, Rel<->Sql converters/shuttles, operator-table provider, and many UDF operand-metadata/type-utility changes. Removes legacy coercion/type-checker framework and updates large test fixtures from JSON to YAML.

Changes

Cohort / File(s) Summary Attention
Calcite validation core
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java, core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java, .../PplTypeCoercion.java, .../PplTypeCoercionRule.java, .../ValidationUtils.java, .../SqlOperatorTableProvider.java, .../PplConvertletTable.java
Adds PPL-aware SqlValidator wiring, lazy validator init, pluggable operator-table provider, PPL validator and coercion implementations, convertlet table. Global provider lifecycle, validator initialization and thread-safety.
Rel↔Sql converters & shuttles
core/src/main/java/org/opensearch/sql/calcite/validate/converters/OpenSearchRelToSqlConverter.java, .../OpenSearchSqlToRelConverter.java, .../shuttles/*
New Rel->Sql and Sql->Rel converters and shuttles to support validation round-trip, preserve hints, and translate SEMI/ANTI joins. Join semantics, hint fidelity, converter recursion.
Query validation flow
core/src/main/java/org/opensearch/sql/executor/QueryService.java
Inserts Rel→Sql→validate→Rel validation step (with shuttles and fallbacks) before Calcite plan conversion; SkipRelValidationShuttle conditional bypass. Error tolerance, skip gating, performance impact.
Type system & utilities
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java, .../OpenSearchTypeUtil.java, .../ValidationUtils.java, ExtendedRexBuilder.java
New centralized type utilities (UDT-aware), leastRestrictive, and PPL-aware casting/return-type derivation. UDT handling and numeric/character arithmetic/casts.
Window/collation embedding
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, CalciteRexNodeVisitor.java
Embed surrounding RelCollation into RexOver windows; simplify window function call construction. Window ORDER BY semantics and determinism.
Dialect & convertlets
core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java, PplConvertletTable.java
Custom cast handling for IP, conformance tweak to allow SEMI/ANTI joins, convertlets for IP-aware operator lowering. IP type casting, conformance impacts.
Function registry & operators
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java, PPLBuiltinOperators.java, UDFOperandMetadata.java, PPLOperandTypes.java, UserDefinedFunctionBuilder.java
Migrate to single-implementation-per-function model, replace custom PPL type-checkers with Calcite OperandTypes, refactor operator → function declarations, default UDF metadata. Overload/backward-compat behavior, function resolution changes.
Removed legacy coercion framework
core/src/main/java/org/opensearch/sql/expression/function/CoercionUtils.java, PPLTypeChecker.java, CalciteFuncSignature.java
Deletes custom coercion and PPLTypeChecker machinery; replaces with Calcite-based coercion/validators. Ensure all call sites migrated; risk of missing behavior.
UDFs: operand metadata and nullability
core/src/main/java/org/opensearch/sql/expression/function/** (many files)
Annotates getOperandMetadata() as @NonNull, supplies explicit OperandTypes or custom SqlOperandTypeChecker wrappers across many UDFs (collection, json, ip, datetime, math, etc.). Ensure operand checks and allowed-signatures remain correct.
IP functions & CIDR
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/*.java
Replace static UDT-signatures with SqlOperandTypeChecker implementations using OpenSearchTypeUtil.isIp. IP detection semantics and error messages.
Calcite tool helpers
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java, PlanUtils.java
Expose preparing statement publicly; replace inline SqlToRelConverter usage; use OpenSearchTypeUtil for time checks. Visibility change, removal of nested converter class.
Rel->Sql fixes
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java, SqlRewriteShuttle.java, SkipRelValidationShuttle.java
Add shuttles to fix literal/intervals, rewrite COUNT()/IN lists to ROW, and skip validation for specific patterns. Correctness of rewritten SQL and skip heuristics.
Tests & fixtures
integ-test/src/test/..., core/src/test/..., api/src/test/..., integ-test/src/test/resources/expectedOutput/calcite/*
Mass migration from JSON to YAML expectations; new/updated expected plans; test adjustments (exceptions, conditional skips, alternative plan matches). Update CI baseline, flaky plan variants, YAML loader correctness.
Docs & examples
docs/user/ppl/*, docs/dev/ppl-functions.md, api/src/test/.../UnifiedQueryTranspilerTest.java
Docs: replace runnable examples with ignore blocks; update function registration guidance; update transpiler test import and expected SQL. Ensure docs match new registration and operand-checker model.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client as Client
    participant QS as QueryService
    participant Rel as PPL RelNode
    participant RelToSql as OpenSearchRelToSqlConverter
    participant Val as PplValidator
    participant SqlToRel as OpenSearchSqlToRelConverter
    participant Planner as CalcitePlanner

    Client->>QS: submit PPL query
    QS->>Rel: build initial RelNode
    QS->>RelToSql: convert RelNode → SqlNode
    RelToSql-->>QS: SqlNode
    QS->>Val: validate SqlNode (PplValidator + PplTypeCoercion + PplTypeCoercionRule)
    alt validation succeeds
      Val-->>QS: validated SqlNode
      QS->>SqlToRel: convert validated SqlNode → RelNode
      SqlToRel-->>QS: validated RelNode
    else validation tolerant failure
      Val-->>QS: tolerated exception
      QS-->>QS: fallback to original RelNode
    end
    QS->>Planner: convert validated RelNode → executable plan
    Planner-->>Client: execute plan / produce results
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • dai-chen
  • anirudha
  • LantaoJin
  • mengweieric
  • vamsimanohar
  • Swiddis
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.60% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately and concisely summarizes the primary change: introducing Calcite validation and type-coercion.
Description check ✅ Passed Description is detailed and directly related to the changeset, describing the validation layer, UDT mapping, coercion approach, and testing.
Linked Issues check ✅ Passed Implements core objectives from [#4636] and [#3865]: Rel→Sql validation round‑trip, Calcite-based coercion, UDT↔SQL mapping/restoration, and PplValidator/PplTypeCoercion.
Out of Scope Changes check ✅ Passed No major out‑of‑scope code detected; most edits implement Calcite validation/coercion and required refactors (UDT mapping, function registry, tests/docs).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@yuancu yuancu added the enhancement New feature or request label Dec 3, 2025
@yuancu yuancu force-pushed the issues/4636 branch 5 times, most recently from e085f81 to fc6dd27 Compare December 15, 2025 13:40
…checking

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- 2 more ITs passed in PPLBuiltinFunctionIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fix testRand, where desrialization of sarg does not restore its type
- todo: update the toRex in ExtendedRelJson to the align with the latest version

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…estamp; (time, timestamp) -> timestamp (1240/1599)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…2/1872)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- allow type cast
- rewrite call to sql compare to custom ip comapre

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…1356/1599 | 1476/1915)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…d in mvindex's implementation (1580/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…iting (1579/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…pe hint

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…e inference (1701/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@qianheng-aws
Copy link
Collaborator

This PR validate the whole SQL including type compatible, SQL grammer, field access and others. However what we need is only type compatibility since the relbuilder will not build an incorrect plan.

I'm thinking if we can only convert recall to sqlnode and invoke validateCall to do that part of validation(should include type coercion), and then convert it back to recall.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 8, 2026

I'm thinking if we can only convert recall to sqlnode and invoke validateCall to do that part of validation(should include type coercion), and then convert it back to recall.

It's kind of tricky, the validation of these are deeply entangled. E.g. if a function call is at a very bottom level of the AST, the validator has to go down the AST and validate all the nodes it comes across to build a proper validation scope for the function call. If we want to try to build them for these function calls, we may need to replicate those steps, which does not differ much from the original validation.

yuancu added 2 commits January 9, 2026 15:55
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/ppl/interfaces/endpoint.md (1)

56-64: Fix duplicate "Description" section and add documentation for SAFE_CAST behavior in explain output.

The Explain section has a duplicate "### Description" heading at lines 58 and 61 that should be consolidated into a single introduction.

Additionally, while the examples show SAFE_CAST in the logical plan output (e.g., SAFE_CAST(30:BIGINT)), there is no explanation of what SAFE_CAST is, how it differs from CAST, or why it appears in the output. For users upgrading from previous versions and any automated tooling that parses explain output, clarification would be valuable:

  • Explain that SAFE_CAST is used in logical plans for type-safe casting with malformed string handling
  • Note that the physical plan shows the optimized form without SAFE_CAST (as seen in examples where logical shows SAFE_CAST(30:BIGINT) but physical shows >($5, 30))
  • Consider a brief note that this represents a change in output format for users parsing explain responses
🤖 Fix all issues with AI agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java:
- Around line 129-168: The getValidator method in CalcitePlanContext currently
calls connection.createStatement() without ensuring the Statement is closed;
change the try/catch that obtains CalciteServerStatement to a try-with-resources
that declares a Statement (e.g., try (Statement stmt =
connection.createStatement()) { CalciteServerStatement statement =
stmt.unwrap(CalciteServerStatement.class); ... } ), move the validator creation
into that try block so the Statement is closed automatically, and remove the old
catch/throw pattern while preserving the existing IllegalStateException and
validator initialization logic.
🧹 Nitpick comments (5)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)

458-465: Add verifySchema() for consistency with other BETWEEN tests.

All other BETWEEN tests in this file (e.g., testBetween, testBetweenWithDifferentTypes) verify the schema before checking data rows. This test should follow the same pattern for consistency.

♻️ Proposed fix
 @Test
 public void testBetweenWithMixedTypes() throws IOException {
   JSONObject actual =
       executeQuery(
           String.format(
               "source=%s | where age between '35' and 38 | fields firstname, age",
               TEST_INDEX_BANK));
+  verifySchema(actual, schema("firstname", "string"), schema("age", "int"));
   verifyDataRows(actual, rows("Hattie", 36), rows("Elinor", 36));
 }
docs/user/ppl/interfaces/endpoint.md (1)

122-124: Clarify the different constant representations across plan formats.

The same literal constant 30 appears in three different forms across the extended format output:

  • Logical plan: SAFE_CAST(30:BIGINT) (line 122)
  • Physical plan: 30:BIGINT (line 123)
  • Extended Java code: 30L (line 124)

While these representations serve different purposes (logical abstraction, physical optimization, runtime code), users may find the variations confusing without explanation.

Consider adding a brief explanation in the documentation about how the same operation is represented differently across plan stages, or reference Calcite's documentation on plan representations.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

1810-1818: Note the TODO comment regarding pushdown behavior.

The TODO on line 1810 indicates that the chart with useother=true scenario currently doesn't achieve pushdown as expected. This is flagged for future investigation.

Consider creating a tracking issue for this if one doesn't already exist, to ensure it's addressed in a follow-up.

Would you like me to help create a GitHub issue to track this pushdown optimization opportunity?

core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

283-341: Well-structured validation pipeline with comprehensive JavaDoc.

The validate method correctly implements the RelNode → SqlNode → validate → RelNode round-trip as per PR objectives. A few observations:

  1. Line 311: Objects.requireNonNull(rewritten) will throw NPE if SqlRewriteShuttle returns null. Consider whether a more descriptive exception would aid debugging.

  2. Line 314: Silent fallback for tolerant validation exceptions may make debugging difficult. Consider adding debug-level logging when falling back.

  3. The method is at the ~50-line threshold per coding guidelines. The current organization is logical, but if it grows further, consider extracting the SqlToRelConverter configuration into a helper.

💡 Optional: Add debug logging for validation fallback
     } catch (CalciteContextException e) {
       if (ValidationUtils.tolerantValidationException(e)) {
+        log.debug("Tolerant validation fallback triggered: {}", e.getMessage());
         return relNode;
       }
       throw new ExpressionEvaluationException(e.getMessage(), e);
     }
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

84-93: Static operatorTableProvider requires careful initialization ordering.

The static operatorTableProvider must be set before any CalcitePlanContext calls getValidator(). The IllegalStateException at line 141-142 guards against this, but consider documenting the initialization requirement in the class JavaDoc.

📝 Optional: Add class-level documentation for initialization order
/**
 * ...existing class doc...
 * 
 * <p><b>Initialization:</b> {@link #setOperatorTableProvider(SqlOperatorTableProvider)} 
 * must be called during module initialization before any context creates a validator.
 */
public class CalcitePlanContext {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff91bdc and a6839e2.

📒 Files selected for processing (76)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
✅ Files skipped from review due to trivial changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_value_syntax.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_fillnull_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_no_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_max.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • integ-test/src/test/resources/expectedOutput/calcite/access_struct_subfield_with_item.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_in_range.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_asc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_keyword_can_match_shortcut.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/sort_numeric_desc_with_match.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_bin_aligntime.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_date_string.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
  • MatcherUtils (43-502)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (9)
core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
  • OpenSearchSparkSqlDialect (28-113)
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java (1)
  • PplRelToSqlNodeConverter (36-172)
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java (1)
  • PplSqlToRelConverter (24-92)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java (1)
  • PplRelToSqlRelShuttle (28-83)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java (1)
  • SkipRelValidationShuttle (85-175)
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java (1)
  • SqlRewriteShuttle (21-69)
common/src/main/java/org/opensearch/sql/common/utils/QueryContext.java (1)
  • QueryContext (18-87)
core/src/main/java/org/opensearch/sql/exception/CalciteUnsupportedException.java (1)
  • CalciteUnsupportedException (8-17)
core/src/main/java/org/opensearch/sql/exception/ExpressionEvaluationException.java (1)
  • ExpressionEvaluationException (9-18)

Comment on lines +129 to +168
public SqlValidator getValidator() {
if (validator == null) {
synchronized (this) {
// Double-Checked Locking for thread-safety
if (validator == null) {
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
if (operatorTableProvider == null) {
throw new IllegalStateException(
"SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
SqlValidator.Config validatorConfig =
SqlValidator.Config.DEFAULT
.withTypeCoercionRules(PplTypeCoercionRule.instance())
.withTypeCoercionFactory(PplTypeCoercion::create)
// Use lenient conformance for PPL compatibility
.withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
// Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST)
.withDefaultNullCollation(NullCollation.LOW)
// This ensures that coerced arguments are replaced with cast version in sql
// select list because coercion is performed during select list expansion during
// sql validation. Affects 4356.yml
// See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg
.withIdentifierExpansion(true);
validator =
PplValidator.create(
statement,
config,
operatorTableProvider.getOperatorTable(),
TYPE_FACTORY,
validatorConfig);
}
}
}
return validator;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "CalciteServerStatement" --type java -B 2 -A 2

Repository: opensearch-project/sql

Length of output: 5211


🏁 Script executed:

# Check if CalciteServerStatement is defined in this codebase or imported from Calcite
rg -n "class CalciteServerStatement" --type java

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check for try-with-resources usage with statements in similar code
rg -n "createStatement()" --type java -B 2 -A 2 | head -50

Repository: opensearch-project/sql

Length of output: 4675


🏁 Script executed:

# Look at the imports and see where CalciteServerStatement comes from
head -50 core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

Repository: opensearch-project/sql

Length of output: 2005


🏁 Script executed:

# Search for close() calls on statement variables
rg -n "statement\.close()" --type java

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check CalciteToolsHelper usage after statement creation
sed -n '160,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Repository: opensearch-project/sql

Length of output: 891


🏁 Script executed:

# Check how validator is used after it's created - is it long-lived?
sed -n '129,200p' core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

Repository: opensearch-project/sql

Length of output: 2766


🏁 Script executed:

# Look at full method context of CalciteToolsHelper where statement is created
sed -n '140,180p' core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Repository: opensearch-project/sql

Length of output: 1652


Wrap connection.createStatement() in try-with-resources for proper resource cleanup.

The CalciteServerStatement created on line 136 is never explicitly closed. While this pattern is used consistently elsewhere in the codebase (e.g., CalciteToolsHelper.java:168), the coding guidelines require using try-with-resources for proper resource cleanup. Either use try-with-resources or document why this Calcite framework object does not require explicit closing.

Example fix:

final CalciteServerStatement statement;
try (Statement stmt = connection.createStatement()) {
  statement = stmt.unwrap(CalciteServerStatement.class);
  // rest of initialization
}
🤖 Prompt for AI Agents
In @core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129 - 168, The getValidator method in CalcitePlanContext currently calls
connection.createStatement() without ensuring the Statement is closed; change
the try/catch that obtains CalciteServerStatement to a try-with-resources that
declares a Statement (e.g., try (Statement stmt = connection.createStatement())
{ CalciteServerStatement statement = stmt.unwrap(CalciteServerStatement.class);
... } ), move the validator creation into that try block so the Statement is
closed automatically, and remove the old catch/throw pattern while preserving
the existing IllegalStateException and validator initialization logic.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/ppl/functions/collection.md (1)

165-201: Two TRANSFORM examples are appropriately disabled pending issue #4972, but note that MVMAP has the same pattern with three similarly disabled examples.

The TRANSFORM function examples at lines 167–172 and 185–190 are marked with ppl ignore and reference issue #4972. This is a consistent pattern—other lambda-based functions like FORALL, EXISTS, FILTER, and REDUCE retain working examples, suggesting the issue specifically affects TRANSFORM and MVMAP transformations. Verify that issue #4972 adequately captures all affected examples (TRANSFORM: 2 examples; MVMAP: 3 examples at lines 819–824, 837–842, 859–864) to ensure comprehensive tracking.

🧹 Nitpick comments (1)
docs/user/ppl/functions/collection.md (1)

818-875: MVMAP examples disabled with same pattern as TRANSFORM.

Three MVMAP examples are disabled, consistent with the TRANSFORM failures. Both functions involve array element transformations (lambdas for TRANSFORM, expressions for MVMAP).

Please verify that issue #4972 explicitly covers both TRANSFORM and MVMAP failures, as the root cause may be related to how the new Calcite validation handles array element-level operations.

Consider consolidating the TODO comments or creating a more descriptive issue title that clearly indicates "lambda and array transformation validation" to help future maintainers understand the scope.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6839e2 and 906d999.

📒 Files selected for processing (4)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/conversion.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (1)
docs/user/ppl/cmd/bin.md (1)

501-508: The example correctly uses the ppl ignore directive, which is the documented approach for marking code blocks that should be skipped during testing. This pattern is consistently used throughout the codebase (in limitations.md, functions/collection.md, and others) to handle incomplete or broken examples—paired with HTML TODO comments explaining the issue. No changes are needed; the implementation follows established conventions.

Likely an incorrect or invalid review comment.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)

1014-1017: Test assertion doesn't match documented intent.

The test comment (lines 1004-1008) states it should verify "the error message clearly explains the limitation and suggests solutions," but the assertion now checks for an internal Calcite error (resolving method 'minus[...]') that provides no user-friendly guidance.

The TODO referencing issue #4973 acknowledges this is a temporary state. When addressing that issue, consider updating the error handling to surface a user-friendly message explaining that bins on timestamp fields requires pushdown to be enabled, rather than exposing internal Calcite method resolution failures.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 906d999 and 38ef160.

📒 Files selected for processing (6)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • docs/user/ppl/functions/conversion.md
  • docs/user/ppl/functions/collection.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java (1)

1004-1017: Test assertion doesn't match stated verification intent.

The comment states "Verify the error message clearly explains the limitation and suggests solutions" and the assertion message expects "clear error message about bins parameter requirements," but the actual check validates an internal Calcite error (resolving method 'minus[class java.lang.String, class java.lang.String]'). This cryptic message won't help users understand that they need to enable pushdown for timestamp binning.

The TODO referencing #4973 acknowledges this, but consider updating the assertion message (lines 1011-1013) to reflect what's actually being checked, so the test failure message is accurate if the internal error changes:

     assertTrue(
-        "Expected clear error message about bins parameter requirements on timestamp fields, but"
-            + " got: "
+        "Expected Calcite method resolution error (tracked by #4973 for proper user message), but"
+            + " got: "
             + errorMessage,
🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/bin.md:
- Around line 501-502: Example 20 in the docs contains a non-functional ppl code
block marked only with an HTML TODO comment; either remove the entire Example 20
block until issue #4973 is resolved or make the limitation obvious to users by
replacing the HTML comment with a visible admonition (warning note) immediately
before the ```ppl ignore``` block that states the feature is not supported and
references issue #4973; ensure the visible note uses the same documentation
styling (admonition/warning) as other docs so users see it when rendered.
🧹 Nitpick comments (5)
integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)

1-1: Consider removing the leading blank line.

YAML files typically don't start with a blank line. While this won't affect parsing, it's inconsistent with standard YAML formatting conventions.

Suggested fix
-
 calcite:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)

12-15: Physical plan demonstrates proper pushdown optimization.

The expected physical plan correctly shows:

  • Dual-level EnumerableLimit for system limit (10000) and user pagination (offset 100, fetch 10)
  • Aggregation pushed to OpenSearch with multi_terms sized at 110 to satisfy offset + fetch
  • Boolean query structure combining term, range, script, and exists filters

One observation: the embedded base64-encoded scripts make it difficult to verify the serialized expressions are correct if they need future updates. Consider adding a comment in the test or documentation explaining how to decode/encode these blocks for maintenance purposes.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

1-1: Minor: Leading empty line in YAML file.

The file begins with an empty line. While this doesn't affect parsing, it's unconventional for YAML files. Consider removing for consistency with other expected output files, or ignore if this is a generated artifact.

integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (1)

1-1: Unnecessary leading empty line.

The empty line at the start of the file is not needed and could be removed for cleaner formatting.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

1-1: Minor: Spurious blank line at file start.

An empty line at the beginning of the YAML file is unnecessary and could be removed for cleaner formatting, though it won't affect functionality.

🧹 Suggested fix
-
 calcite:
   logical: |
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 906d999 and 9ecc8ff.

📒 Files selected for processing (45)
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • docs/user/ppl/cmd/bin.md
  • docs/user/ppl/cmd/patterns.md
  • docs/user/ppl/functions/collection.md
  • docs/user/ppl/functions/conversion.md
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
✅ Files skipped from review due to trivial changes (3)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_dedup_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter.yaml
🚧 Files skipped from review as they are similar to previous changes (22)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml
  • docs/user/ppl/cmd/patterns.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • docs/user/ppl/functions/conversion.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having2.yaml
  • docs/user/ppl/interfaces/endpoint.md
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_then_limit_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_filter_5_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_numeric.yaml
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml
📚 Learning: 2025-12-29T05:32:11.893Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (31)
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml (2)

7-7: LGTM! Explicit NULLS FIRST aligns with PR objectives.

The addition of NULLS FIRST to the window function correctly reflects the PR's goal of preserving null direction during RelNode ↔ SqlNode round trips. This makes the null ordering behavior explicit in the explain plan output, which is consistent with Calcite's validation system behavior.

The test data is realistic and representative of the expected Calcite explain output format.


13-13: Physical plan update is consistent with logical plan changes.

The physical plan correctly mirrors the logical plan's null ordering semantics. The OpenSearch aggregation query DSL with "order":[{"_count":"desc"},{"_key":"asc"}] is the appropriate translation for the rare/top operation with the preserved null direction.

integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_basic.yaml (1)

1-23: LGTM! Type annotation simplification aligns with Calcite validation changes.

The removal of explicit :VARCHAR annotations from string literals ('young', 'adult') is consistent with the PR's migration to Calcite's type validation system. The plan semantics remain correct:

  • Boundary conditions are properly handled (age < 30 for 'young', age >= 30 for 'adult' with no overlap).
  • Filter pushdown to OpenSearch index scans is preserved.
  • Aggregation and projection logic is intact.
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_filter_push.yaml (2)

7-8: LGTM - LogicalProject addition is expected from SqlNode round-trip.

The explicit LogicalProject wrapping the index scan is a consequence of the RelNode → SqlNode → RelNode validation round-trip introduced by this PR. The physical plan shows this doesn't impact optimization effectiveness—the push-down still correctly projects only [age] and applies LIMIT->5 at the scan level.


11-11: LGTM - Explicit BIGINT typing reflects Calcite's type coercion.

The change from 30 to 30:BIGINT is the expected result of Calcite's validation system adding explicit type information to literals during coercion. This ensures type-safe comparison with the age field.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_big_range_big_term_query.yaml (1)

3-10: LGTM!

The test data is well-structured and covers the expected behavior for the Calcite integration:

  • The logical plan correctly represents the query structure with explicit type-safe range bounds (>=($28, 1), <=($28, 100)).
  • The physical plan properly reflects the typed Sarg representation (Sarg[[1L:BIGINT..100L:BIGINT]]:BIGINT), aligning with the PR's type coercion objectives.
  • The OpenSearch query builder correctly translates to a boolean query with term and range clauses, including boundary values.
  • Edge cases are covered with inclusive range boundaries.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1)

1-11: Test data structure looks correct for dedup expression scenario.

The expected output covers:

  • Logical plan: Properly demonstrates dedup via ROW_NUMBER() OVER (PARTITION BY new_gender, new_state) with filter <= 2 to keep top 2 rows per partition.
  • Physical plan: Shows aggregation pushdown to OpenSearch with composite buckets on the partition keys and top_hits to retrieve actual documents.

A few observations:

  1. The physical plan on line 11 is extremely long (single line with Base64-encoded scripts), which is typical for these files but makes diff reviews difficult.
  2. The Base64 payloads encode LOWER function metadata for gender.keyword and state.keyword fields—consistent with the LOWER($4) and LOWER($7) expressions in the logical plan.

Given the PR objectives mention "Investigate dedup optimization failure" as remaining work, this alternative expected output file likely represents an acceptable plan shape during the investigation. Please confirm this test case aligns with the expected behavior after the Calcite validation/coercion changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1)

1-11: LGTM - Expected output correctly captures dedup-with-expressions pushdown.

The test data is realistic and representative:

  • Uses typical account index fields (account_number, firstname, balance, gender, state, etc.)
  • Validates the ROW_NUMBER-based dedup logic with keepempty=2 (condition <=($19, 2))
  • Physical plan correctly shows composite aggregation pushdown with LOWER function scripts for new_gender and new_state
  • top_hits size of 2 and missing_bucket=false properly align with the logical plan's dedup semantics and IS NOT NULL filters

The base64-encoded script content is expected for Calcite's serialized script handling.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_disjunction_big_range_small_term_query.yaml (2)

1-8: LGTM! Logical plan correctly represents the disjunction query.

The expected output properly captures the OR condition between a term match and a numeric range. The plan structure (SystemLimit → Sort → Project → Filter → IndexScan) is appropriate for this query pattern.

Minor: Line 1 adds an empty line at the start of the file, which is a minor formatting inconsistency but doesn't affect functionality.


9-10: LGTM! Physical plan correctly pushes down the disjunction to OpenSearch.

The pushdown context properly translates the OR condition:

  • Calcite's Sarg[[1L:BIGINT..100L:BIGINT]] representation for the range is appropriate
  • The OpenSearch DSL correctly uses bool.should to handle the disjunction with both term and range clauses
  • Predicate pushdown is successfully demonstrated, which aligns with the PR objectives for Calcite validation integration

This test case effectively covers the edge case of mixed query types (term + range) in a disjunction.

docs/user/ppl/functions/collection.md (2)

165-201: LGTM! Properly documented known issue.

The TRANSFORM examples are consistently marked with TODO comments and ppl ignore blocks, referencing issue #4972 for tracking. The expected outputs are preserved, which will facilitate re-enabling these examples once the issue is resolved.


818-875: LGTM! Consistent handling of MVMAP examples.

All three MVMAP examples are uniformly marked with TODO comments and ppl ignore blocks, referencing the same issue #4972. The approach is consistent with the TRANSFORM examples and preserves all expected outputs for future validation.

integ-test/src/test/resources/expectedOutput/calcite/big5/range_field_conjunction_small_range_small_term_query.yaml (1)

1-10: LGTM! Test data is realistic and covers expected edge cases.

The expected output correctly validates the Calcite validation/coercion pipeline:

  • The logical plan reflects the Sort → Project → Filter ordering consistent with PR objectives.
  • The filter predicate uses compound range semantics (AND(>=($28, 10), <=($28, 20))) and OR disjunction.
  • Field indices in the physical plan ($13metrics.size, $15aws.cloudwatch.log_stream) correctly map to the projected fields.
  • The Sarg[[10L:BIGINT..20L:BIGINT]]:BIGINT notation aligns with explicit type-casting introduced by the validation system.

The numeric bounds appearing as 10.0/20.0 (floats) in the ES query vs 10L:BIGINT in the Sarg is expected behavior for OpenSearch compatibility.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push.yaml (2)

7-11: Redundant LogicalProject nodes appear in the plan.

Lines 7 and 9 contain identical LogicalProject nodes projecting all 17 fields. This redundancy likely results from the RelNode → SqlNode → RelNode round-trip introduced by the Calcite validation phase. While functionally correct, this may indicate a missed optimization opportunity.

Given the PR TODOs mention "Investigate dedup optimization failure," this is expected behavior for now and should be addressed as part of that follow-up.

Confirm the redundant projections don't impact query performance in production and are handled by downstream optimization passes.


13-13: BIGINT-typed Sarg representation is correct and consistent.

The physical plan correctly uses Sarg[(30L:BIGINT..40L:BIGINT)]:BIGINT for the age range filter, and the JSON sourceBuilder correctly translates this to:

  • "include_lower":false, "include_upper":false matching the exclusive bounds (30..40)
  • Balance filter >($0, 10000) correctly maps to "include_lower":false

The field indexing after PROJECT->[balance, age] is also correct: $0 = balance, $1 = age.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_having3.yaml (1)

4-13: LGTM! Type coercion changes correctly reflect Calcite validation behavior.

The updated expected output properly captures the new Calcite-based type coercion:

  1. Logical plan (line 5): The explicit CAST(1000):DOUBLE NOT NULL for comparing against new_avg (derived from AVG, which returns DOUBLE) is the expected implicit cast insertion from Calcite's type coercion.

  2. Physical plan (line 12): Typed literals (1000.0E0:DOUBLE, 1:BIGINT) correctly match the operand types of the comparisons.

  3. Consistency: The second comparison >($4, 1) doesn't require a cast in the logical plan since COUNT() returns BIGINT and the literal 1 is compatible (typed as 1:BIGINT in the physical plan).

Test data is realistic, covers the type coercion edge case for numeric comparisons, and aligns with the PR's objective of migrating to Calcite's validation system.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yaml (1)

1-11: Logical plan structure looks correct.

The expected logical plan covers realistic analytics query patterns including:

  • Pagination with offset/limit and system query size limits
  • NULL handling with IS NOT NULL predicates
  • Complex filter conditions with OR clauses and date range comparisons
  • Aggregation with COUNT() grouped by multiple fields
  • SAFE_CAST usage aligns with the PR's type coercion approach for handling type mismatches
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (2)

3-10: Logical plan structure is correct and consistent.

The dedup plan correctly:

  • Partitions ROW_NUMBER() by $4, $5 (new_gender, new_state) which are the LOWER expression results
  • Filters nulls on partition keys before windowing to ensure correct dedup behavior
  • Maintains sort order for deterministic dedup results

The column references align properly with the projection schema defined in line 9.


11-12: Physical plan pushdown is consistent with logical plan.

The OpenSearchRequestBuilder correctly reflects:

  • Composite aggregation on new_gender/new_state with LOWER scripts matching the logical projection
  • top_hits with size:2 matching the dedup limit condition <=($6, 2)
  • Proper _source includes and script_fields for computed columns

The test data is realistic and representative of Calcite's explain output for dedup with expressions.

integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (2)

1-10: LGTM! Logical plan structure is consistent and correctly represents dedup semantics.

The column indices are properly aligned:

  • $28 correctly references metrics.size for both the partition key and NULL filter
  • $45 correctly references _row_number_dedup_ for the dedup filter (<= 1)
  • Outer projection indices match the inner projection field positions

The plan correctly handles the edge case of NULL partition keys with IS NOT NULL($28) before windowing.


12-13: Physical plan pushdown correctly implements dedup semantics.

The OpenSearch aggregation structure is appropriate:

  • Composite aggregation with top_hits(size=1) correctly deduplicates by taking first row per metrics.size partition
  • _source includes field list matches the logical projection
  • missing_bucket:false correctly excludes NULL partition keys (aligned with the IS NOT NULL filter in logical plan)
integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (2)

5-9: Field indices correctly reflect added derived field.

The field index shifts ([$14]-[$26] and join condition =($13, $20)) are consistent with the addition of the $f13 derived field in line 7. The CAST(REX_EXTRACT(...)):VARCHAR wrapper aligns with the PR's Calcite validation system that automatically inserts type casts during coercion.


20-21: Physical plan correctly propagates CAST to pushdown context.

The EnumerableCalc and CalciteEnumerableIndexScan correctly incorporate the CAST(REX_EXTRACT(...)):VARCHAR in both the projection and the SORT_EXPR pushdown. This ensures the sort-merge join operates on type-coerced values, consistent with the logical plan.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (2)

6-14: Logical plan structure looks correct.

The plan properly handles:

  • Empty string filtering on Referer (line 13)
  • Domain extraction via REGEXP_REPLACE with pattern ^https?://(?:www\.)?([^/]+)/.*$ (line 12)
  • NULL filtering on computed k field (line 11)
  • Aggregation with AVG, COUNT, MIN (line 9)
  • Post-aggregation filtering on count > 100000 (line 7)

The wide projection at line 12 covering all 111 clickbench fields is expected for the hits table schema.


18-19: Physical plan correctly pushes aggregation to OpenSearch.

The pushdown context appropriately includes:

  • Filter pushdown: <>($0, '') and existence check on Referer
  • Composite aggregation for group-by on computed k (domain extraction)
  • Sub-aggregations: avg with CHAR_LENGTH script, top_hits for min(Referer)

The base64-encoded Calcite scripts enable server-side evaluation of REGEXP_REPLACE and CHAR_LENGTH expressions, which is the expected pattern for complex expression pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (2)

9-14: LGTM!

The logical plan structure is internally consistent. Column references correctly trace through the plan:

  • Aggregate outputs state@0, city@1, avg_age@2
  • Project reorders to avg_age@0, state@1, age2@2
  • Sort on $1 correctly sorts by state

The test data covers relevant scenarios including aggregation, computed columns (age2=[+($2, 2)]), NULL handling (ASC-nulls-first, IS NOT NULL), and deduplication with window functions.


21-21: LGTM!

The physical plan correctly pushes down the query to OpenSearch:

  • Filter >($2, 30) maps to the range query on age
  • Composite aggregation with state and city buckets matches the logical grouping
  • SORT->[0 ASC FIRST] aligns with "order":"asc" and "missing_order":"first" on the state bucket
  • The avg_age metric aggregation is correctly nested

The expected output accurately represents the Calcite enumerable index scan path with appropriate pushdown context.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (3)

6-14: Logical plan structure looks correct for the Calcite validation changes.

The restructured plan correctly:

  • Reorders projections to place the final column mapping (l=[$0], c=[$1], CounterID=[$2]) after filtering.
  • Applies the filter >($1, 100000) on the aggregated count before final projection.
  • Expands all 111 fields in the intermediate LogicalProject (line 12) to preserve field indices during validation.
  • Maintains consistent field index references: $103 for CounterID and $26 for URL.

The plan flow (scan → filter non-empty URL → project → filter non-null CounterID → project subset → aggregate → reproject → filter count → final project) is logically sound.


18-18: Explicit BIGINT type annotation reflects Calcite type coercion.

The change from an implicit integer literal to 100000:BIGINT is expected behavior from the Calcite validation system, which now performs explicit type annotation during plan generation. This aligns with the PR's objective of migrating type checking/coercion to Calcite's validation system.


19-19: Script payload is valid and properly encoded.

The Base64-encoded script decodes successfully to a valid JSON structure that correctly represents the CHAR_LENGTH operation with a dynamicParam: 0 reference. The DIGESTS array ["URL"] properly maps to this dynamic parameter at index 0, confirming the field binding is correct.

docs/user/ppl/cmd/bin.md (1)

503-508: No changes required. The example is already correctly marked as not-yet-implemented with an explicit TODO comment (<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4973 -->) and the ignore flag on the code block. This is the proper way to document planned features that are not yet functional.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 13, 2026

Curious how much overhead will this validation process introduce?

I have updated a profiling result in the PR description. It's profiled with this script. Based on the result, the validation layer brings no significant performance degration.

@yuancu
Copy link
Collaborator Author

yuancu commented Jan 13, 2026

Hi @ykmr1224 , do you have any concern over this PR? I foresee that it will bring some impact to the schema-on-read solution, be it in a helpful or negative way. I'm worrying that this PR will make your solution difficult to proceed.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

184-197: Unused String.format argument.

The format string has only two placeholders (%s) but three arguments are provided. The third TEST_INDEX_WORK_INFORMATION argument at line 196 is never used.

🔧 Suggested fix
     assertYamlEqualsIgnoreId(
         expected,
         explainQueryYaml(
             String.format(
                 "source = %s"
                     + "| eval count_dept = ["
                     + "    source = %s | stats count(name)"
                     + "  ]"
                     + "| fields name, count_dept",
-                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION)));
+                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));

199-212: Unused String.format argument.

Same issue as above - the format string has only two %s placeholders but three arguments are provided at line 211.

🔧 Suggested fix
     assertYamlEqualsIgnoreId(
         expected,
         explainQueryYaml(
             String.format(
                 "source = %s"
                     + "| where id > ["
                     + "    source = %s | stats count(name)"
                     + "  ] + 999"
                     + "| fields name",
-                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION, TEST_INDEX_WORK_INFORMATION)));
+                TEST_INDEX_WORKER, TEST_INDEX_WORK_INFORMATION)));

2016-2037: Missing @Test annotation - test will not execute.

The testComplexDedup method is missing the @Test annotation, so it will not be executed as part of the test suite.

🔧 Suggested fix
+  @Test
   public void testComplexDedup() throws IOException {
     enabledOnlyWhenPushdownIsEnabled();
🧹 Nitpick comments (4)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)

11-12: Physical plan correctly pushes down aggregation and scripted fields.

The physical plan appropriately translates to OpenSearch:

  • Composite aggregation with top_hits size=2 matching the dedup count
  • Script fields for computed columns with correct DIGESTS referencing source fields (gender.keyword, state.keyword)
  • Source includes/excludes properly configured

The single-line format for the physical plan JSON makes diff reviews challenging. Consider using YAML block scalar with literal style and formatting the JSON across multiple lines for improved readability and easier change tracking in future PRs—though this is a low-priority suggestion if exact-match testing requires this format.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)

469-500: LGTM - well-structured overload following existing patterns.

The implementation correctly:

  • Validates non-empty input with a clear exception message
  • Short-circuits on first successful match
  • Propagates the last failure for diagnostic purposes
  • Has comprehensive JavaDoc

The pattern is consistent with the existing two-argument overload at lines 460-467.

🔧 Optional: Add null validation for defensive programming

For better error messages when debugging test failures, consider adding a null check:

 public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) {
+  if (expectedYamls == null) {
+    throw new IllegalArgumentException("expectedYamls must not be null");
+  }
   if (expectedYamls.isEmpty()) {
     throw new IllegalArgumentException("expectedYamls list must not be empty");
   }

This is optional since the existing methods in this file don't perform extensive null checks either, and test utility callers are expected to provide valid inputs.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)

1811-1818: TODO comment indicates a known regression.

The comment "TODO: Fix -- now it gets away from push-down" suggests that push-down optimization is not working as expected for this case. Consider creating an issue to track this.

Would you like me to open a GitHub issue to track this push-down optimization regression for the chart command with useother=true?


2066-2074: Non-deterministic plan output handling.

The use of List.of() with multiple expected plan files (explain_dedup_expr4.yaml, alternatives) handles legitimate optimizer non-determinism. This is an acceptable pattern, but consider documenting why multiple valid outputs exist (e.g., different but equivalent join orderings, filter placements).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ecc8ff and 35ac017.

📒 Files selected for processing (8)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
🧰 Additional context used
📓 Path-based instructions (6)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative1.yaml (1)

1-12: LGTM! Well-structured expected output for dedup-with-expressions scenario.

The logical plan correctly models:

  • ROW_NUMBER windowing partitioned by computed expressions ($4, $5 = new_gender, new_state from LOWER)
  • Null filtering on partition keys before dedup
  • Sort order preservation (ASC-nulls-first, DESC-nulls-last)

The physical plan appropriately shows:

  • Composite aggregation with scripted fields for the computed partition keys
  • top_hits with size=2 matching the <= 2 dedup filter
  • Base64-encoded calcite scripts for OpenSearch execution

This alternative1 variant correctly demonstrates dedup partitioning on expression-derived columns.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative2.yaml (1)

1-12: LGTM! Correctly represents an alternative dedup optimization path.

This alternative2 variant differs from alternative1 in a meaningful way:

  • Logical plan: PARTITION BY $1, $3 (original gender, state fields) vs alternative1's $4, $5 (computed new_gender, new_state)
  • Physical plan: Uses direct field terms ("field":"gender.keyword") instead of scripted aggregation sources

The distinction correctly demonstrates that the optimizer may choose to partition on either the source fields or the computed expressions depending on the query context. Both the logical structure and OpenSearch request payload are consistent with this optimization choice.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml (1)

1-10: Logical plan structure is correct and well-formed.

The logical plan correctly represents the dedup + rename transformation:

  • ROW_NUMBER() OVER (PARTITION BY $4, $5) for deduplication partitioning
  • <=($6, 2) filter for allowedDuplicates=2
  • LOWER($4) and LOWER($7) for the rename expressions
  • Proper null handling with IS NOT NULL filters
  • Sort directions preserved (ASC-nulls-first, DESC-nulls-last)
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push.yaml (2)

1-5: LGTM - Logical plan structure is correct.

The logical plan correctly represents a count aggregation scenario:

  • LogicalSystemLimit enforces the query size limit
  • LogicalAggregate(group=[{}], cnt=[COUNT()]) represents a global COUNT aggregation
  • CalciteLogicalIndexScan on the standard test index

The hierarchy and notation follow Calcite conventions.


6-7: Physical plan correctly represents aggregation pushdown.

The OpenSearchRequestBuilder configuration is appropriate for a count aggregation:

  • size=0 correctly indicates no documents should be fetched (aggregation-only)
  • track_total_hits=2147483647 (Integer.MAX_VALUE) ensures accurate counting
  • AGGREGATION and LIMIT pushdowns are properly captured in PushDownContext

The rel#:LogicalAggregate.NONE.[] pattern appears to be ID-normalized for stable test comparisons (consistent with assertYamlEqualsIgnoreId usage mentioned in the PR).

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1)

6-12: LGTM! Column references are now consistent across the plan.

The updated expected output correctly reflects the fix where dedup partitioning and null checks now reference $1, $3 (gender, state) instead of $4, $5 (new_gender, new_state). This aligns the PARTITION BY columns with the sort columns used in both LogicalSystemLimit (line 3) and LogicalSort (line 8), and the physical plan's composite buckets on gender.keyword and state.keyword are consistent with this change.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

32-49: LGTM!

The test class properly extends ExplainIT, enables Calcite mode, and loads the necessary test indices. The structure follows established integration test patterns.


444-482: LGTM!

The timechart tests verify the correct calculation formulas for per_second, per_minute, per_hour, and per_day functions. The expressions correctly include the time unit conversions and TIMESTAMPDIFF calculations.


1306-1331: LGTM - proper resource cleanup pattern.

Good use of try-finally blocks to ensure resetQueryBucketSize() is called even if assertions fail. This pattern is consistently applied throughout the test class for test isolation.

… subclass for validation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

1818-1825: Track the TODO for push-down regression

The new TODO suggests a known push-down regression. Please link it to a tracking issue (or add one) so it doesn’t get lost.

If you want, I can draft the issue text or propose a targeted test to lock in the desired push-down behavior.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

1736-1835: Preserve explicit window ORDER BY when embedding collations.

embedExistingCollationsIntoOver unconditionally replaces the window's order keys with the plan collation, overwriting any explicit ORDER BY specified in the window. When a window already has orderKeys, they should be preserved. Consider only injecting plan collations when the window has no existing order keys.

🛠️ Proposed fix
                      public RexNode visitOver(RexOver over) {
                        RexWindow window = over.getWindow();
+                       ImmutableList<RexFieldCollation> orderKeys =
+                           window.orderKeys.isEmpty() ? rexCollations : window.orderKeys;
+                       if (orderKeys.isEmpty()) {
+                         return over;
+                       }
                        return context.rexBuilder.makeOver(
                            over.getType(),
                            over.getAggOperator(),
                            over.getOperands(),
                            window.partitionKeys,
-                           rexCollations,
+                           orderKeys,
                            window.getLowerBound(),
                            window.getUpperBound(),
                            window.isRows(),
                            true,
                            false,
                            over.isDistinct(),
                            over.ignoreNulls());
                      }
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

398-425: Fail fast on null external registrations and make duplicate detection atomic.

Right now a null operator/aggFunction would surface as an NPE later, and containsKey + put can misreport duplicates under concurrency.

🐛 Proposed fix (null guard + atomic warn)
 public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) {
-  if (externalFunctionRegistry.containsKey(functionName)) {
-    logger.warn(
-        String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
-  }
-  externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
+  if (operator == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Operator cannot be null for function %s", functionName));
+  }
+  FunctionImp impl = (builder, args) -> builder.makeCall(operator, args);
+  FunctionImp previous = externalFunctionRegistry.put(functionName, impl);
+  if (previous != null) {
+    logger.warn(
+        String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
+  }
 }

 public void registerExternalAggOperator(
     BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
-  if (aggExternalFunctionRegistry.containsKey(functionName)) {
-    logger.warn(
-        String.format(
-            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
-  }
+  if (aggFunction == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Aggregate function cannot be null for %s", functionName));
+  }
   AggHandler handler =
       (distinct, field, argList, ctx) ->
           UserDefinedFunctionUtils.makeAggregateCall(
               aggFunction, List.of(field), argList, ctx.relBuilder);
-  aggExternalFunctionRegistry.put(functionName, handler);
+  AggHandler previous = aggExternalFunctionRegistry.put(functionName, handler);
+  if (previous != null) {
+    logger.warn(
+        String.format(
+            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
+  }
 }
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/AppendFunctionImpl.java`:
- Around line 49-50: Update the operand type checker in AppendFunctionImpl by
changing getOperandMetadata() to use OperandTypes.VARIADIC instead of
OperandTypes.SAME_VARIADIC so mixed-type variadic arguments are allowed
(matching AppendCore.collectElements() behavior); mirror the approach used in
MVAppendFunctionImpl and keep the existing return-type inference logic intact to
handle zero-arg cases.

In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`:
- Around line 494-510: The rewriteCall in PPLBuiltinOperators.ATAN unsafely
casts SqlCall to SqlBasicCall causing ClassCastException for other SqlCall
subclasses (e.g., SqlCase); fix by first checking if call instanceof
SqlBasicCall and if so setOperator as before, otherwise rebuild and return a new
SqlBasicCall using the chosen SqlOperator (SqlStdOperatorTable.ATAN2 or
SqlStdOperatorTable.ATAN), passing call.getOperandList() and
call.getParserPosition() so operands and position are preserved; ensure you
reference the SqlOperator selection logic (call.getOperandList().size() == 2)
and preserve semantics when constructing the new SqlBasicCall.

In `@integ-test/src/test/resources/expectedOutput/calcite/agg_filter_nested.yaml`:
- Line 5: Add a test record with a null or missing author.name to the dataset
that generates this expected output so the CASE expression in LogicalProject
($f0=[CASE(<($7, 'K'), 1, null:INTEGER)]) is exercised for the null branch;
update the input test data used by the calcite/agg_filter_nested test (the
sample documents) to include one document where author is absent or author.name
is explicitly null, then rerun the test to regenerate/verify the expected output
includes the null branch behavior.
♻️ Duplicate comments (5)
docs/user/ppl/interfaces/endpoint.md (1)

123-125: Add a brief note explaining logical vs physical cast representation.

The extended example now shows 30:BIGINT in the physical plan while the logical plan keeps >($5, 30). A short note would prevent confusion for users comparing plans or parsing explain output.

core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

310-317: Add logging when validation is skipped for tolerant exceptions.

When ValidationUtils.tolerantValidationException(e) returns true and the original relNode is returned, there's no logging to indicate that validation was bypassed. This makes it harder to troubleshoot queries that silently skip validation due to known Calcite limitations.

🔍 Suggested logging for observability
     try {
       validator.validate(Objects.requireNonNull(rewritten));
     } catch (CalciteContextException e) {
       if (ValidationUtils.tolerantValidationException(e)) {
+        log.debug("Skipping validation due to known Calcite limitation: {}", e.getMessage());
         return relNode;
       }
       throw new ExpressionEvaluationException(e.getMessage(), e);
     }
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

140-145: Close the JDBC Statement created for validator initialization.

Line 142 creates a Statement that is never closed; this can leak resources under load. Please use try‑with‑resources or retain and close it when the context is disposed.

As per coding guidelines, use try-with-resources for resource cleanup.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)

500-515: Null guard for registerOperator (same as earlier note).

Still missing a null check on operator before building the lambda.


844-855: Guard empty-arg ADD dispatch (same as earlier note).

Stream.of(args).allMatch(...) returns true for empty args, selecting CONCAT unexpectedly; add an args.length > 0 guard and document the dispatch pattern.

🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)

283-292: Add @throws documentation for the exception case.

Per coding guidelines, public/private methods with JavaDoc should document exceptions. The method throws ExpressionEvaluationException on validation failure.

📝 Suggested JavaDoc update
  /**
   * Validates a RelNode by converting it to SqlNode, performing validation, and converting back.
   *
   * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on
   * PPL queries.
   *
   * `@param` relNode the relation node to validate
   * `@param` context the Calcite plan context containing the validator
   * `@return` the validated (and potentially modified) relation node
+  * `@throws` ExpressionEvaluationException if validation fails with a non-tolerant error
   */
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java (1)

49-50: Align MAP operand validation with String‑key expectations.

Line 50 now uses OperandTypes.family(SqlTypeFamily.MAP, SqlTypeFamily.MAP), which only checks the MAP family and doesn’t constrain key types. The implementation casts keys to String, so non‑string keys could pass validation but fail at runtime. Please confirm map keys are always VARCHAR in this engine; otherwise consider tightening the checker (custom SqlOperandTypeChecker) or adding a key‑type guard in verifyMap.

Calcite OperandTypes.family(SqlTypeFamily.MAP) does it validate MAP key types? How can a SqlOperandTypeChecker enforce Map<VARCHAR, ?> operands?
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)

491-504: Avoid re-cleaning the actual YAML inside the loop

You can precompute the cleaned actual YAML once and add a null/empty guard for clearer failures. This keeps comparisons consistent and slightly cheaper.

♻️ Proposed refactor
-  public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) {
-    if (expectedYamls.isEmpty()) {
+  public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) {
+    if (expectedYamls == null || expectedYamls.isEmpty()) {
       throw new IllegalArgumentException("expectedYamls list must not be empty");
     }
+    String cleanedActual = cleanUpYaml(actualYaml);
     AssertionError lastError = null;
     for (String expectedYaml : expectedYamls) {
       try {
-        assertYamlEquals(cleanUpYaml(expectedYaml), cleanUpYaml(actualYaml));
+        assertYamlEquals(cleanUpYaml(expectedYaml), cleanedActual);
         return;
       } catch (AssertionError e) {
         lastError = e;
       }
     }
     throw lastError;
   }
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

90-96: Add an explicit setter with JavaDoc for the operator table provider.

Lombok’s public static setter won’t carry method‑level JavaDoc, but the guidelines require public methods to be documented. Consider replacing the Lombok setter with an explicit method (or reduce its visibility if it’s not part of the public API).

♻️ Suggested refactor
-  `@Setter` private static SqlOperatorTableProvider operatorTableProvider;
+  private static SqlOperatorTableProvider operatorTableProvider;
+
+  /**
+   * Sets the SQL operator table provider. This must be called during initialization by the
+   * opensearch module.
+   *
+   * `@param` provider the provider to use for obtaining operator tables
+   */
+  public static void setOperatorTableProvider(SqlOperatorTableProvider provider) {
+    operatorTableProvider = java.util.Objects.requireNonNull(provider, "provider");
+  }

As per coding guidelines, public methods should have JavaDoc.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

986-1003: Consider guarding duplicate agg registrations.

AggBuilder.register currently allows silent overrides; mirroring the Builder guard would make mistakes louder.

♻️ Suggested change
 void register(BuiltinFunctionName functionName, AggHandler aggHandler) {
-  map.put(functionName, aggHandler);
+  if (map.containsKey(functionName)) {
+    throw new IllegalStateException(
+        String.format(
+            Locale.ROOT,
+            "Each aggregate function can only be registered once: %s",
+            functionName));
+  }
+  map.put(functionName, aggHandler);
 }

qianheng-aws
qianheng-aws previously approved these changes Jan 20, 2026
- use OperandTypes.CHARACTER consistenly
- use OperandTypes.VARIADIC for append function
- add safeguard for atan rewrite

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.apache.calcite.sql.validate.SqlUserDefinedFunction;
import org.jspecify.annotations.NonNull;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lombok.NonNull is not for annotating return value; Calcite adopts org.checkerframework.checker.nullness.qual.NonNull, which is heavier than org.jspecify.annotations.NonNull.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java (1)

33-40: Add JavaDoc for public getOperandMetadata().

This public method is in core/src/main/java and now changed; guidelines require JavaDoc with at least @return. As per coding guidelines, please add JavaDoc.

✍️ Suggested update
   `@Override`
+  /**
+   * {`@inheritDoc`}
+   *
+   * `@return` non-null operand metadata for numeric operands
+   */
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.NUMERIC;
   }
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)

65-74: Add JavaDoc for the public override.
getOperandMetadata() is public and changed; it still lacks JavaDoc with @return, which is required for core/src/main/java/**/*.java.

💡 Suggested JavaDoc
   `@Override`
+  /**
+   * Returns operand metadata for mvzip.
+   *
+   * `@return` operand metadata for 2- or 3-argument variants
+   */
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {

As per coding guidelines, document public methods with JavaDoc.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVFindFunctionImpl.java (1)

48-53: Add JavaDoc for this public override.
The method is public but has no JavaDoc with @return. Please add a brief block to satisfy the project rules.

📝 Proposed JavaDoc
   `@Override`
-  public `@NonNull` UDFOperandMetadata getOperandMetadata() {
+  /**
+   * Operand metadata for mvfind.
+   *
+   * `@return` operand metadata describing accepted argument families
+   */
+  public `@NonNull` UDFOperandMetadata getOperandMetadata() {

As per coding guidelines, ...

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)

33-45: Add JavaDoc with @return for the public method.

Public methods must have proper JavaDoc; the current block comment doesn’t include @return. As per coding guidelines, convert this to JavaDoc and include the return tag.

🔧 Proposed fix
-  /*
-   * The first parameter is always required (either fields or query).
-   * The second parameter is query when fields are present, otherwise it's the first parameter.
-   * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries.
-   * Different query has different parameter set, which will be validated in dedicated query builder.
-   * Query parameter is always required and cannot be null.
-   */
+  /**
+   * The first parameter is always required (either fields or query).
+   * The second parameter is query when fields are present, otherwise it's the first parameter.
+   * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries.
+   * Different query has different parameter set, which will be validated in dedicated query builder.
+   * Query parameter is always required and cannot be null.
+   *
+   * `@return` operand metadata for relevance query functions
+   */
core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)

88-96: Remove unsupported CHARACTER operand from metadata or implement handling.

The operand metadata declares (CHARACTER, NUMERIC, CHARACTER) as valid (line 63), but the implementor does not support it. When a CHARACTER field is passed with a CHARACTER unit string:

  • Line 88 condition is false (unitType is VARCHAR, not null/ANY)
  • Line 97 condition is false (plain VARCHAR is not instanceof ExprSqlType)
  • Execution falls through to the exception at line 115

All current tests use either numeric fields or fields wrapped in ExprSqlType (EXPR_DATE/EXPR_TIME/EXPR_TIMESTAMP), never plain STRING/CHARACTER fields. Either remove the CHARACTER operand from metadata or implement the missing path to handle string-based spans.

🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java`:
- Around line 232-235: getOperandMetadata() is annotated `@NonNull` but returns
the nullable field operandMetadata (set in adaptExprMethodToUDF), violating the
contract; either validate and throw early when operandMetadata is null or return
a non-null default instance. Fix by updating adaptExprMethodToUDF to ensure it
never leaves operandMetadata null (validate inputs and throw
IllegalArgumentException) or change getOperandMetadata() to return a safe
default UDFOperandMetadata (or throw a clear IllegalStateException) when
operandMetadata == null, referencing getOperandMetadata(), operandMetadata, and
adaptExprMethodToUDF to locate the change.

In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 55-58: TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.

In
`@core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java`:
- Around line 71-74: getOperandCountRange currently only enforces a minimum of 3
arguments which allows even argument counts and causes IndexOutOfBounds in
JsonSetFunctionImpl.eval when it assumes path/value pairs; update
getOperandCountRange in class JsonSetFunctionImpl to validate that the total
operand count is odd (>=3) — either by returning a custom SqlOperandCountRange
that enforces odd counts or by adding explicit validation at the start of eval
(method eval) to throw a clear validation exception if keys.size() is even,
preventing the keys.get(i+1) access from ever indexing out of bounds.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/binning/MinspanBucketFunction.java`:
- Around line 50-52: Add a JavaDoc block to the public method
getOperandMetadata() in the MinspanBucketFunction class that documents the
method purpose and includes an `@return` tag describing the UDFOperandMetadata
value being returned (specifically that it supplies the operand types
PPLOperandTypes.NUMERIC_NUMERIC_NUMERIC_NUMERIC for this UDF). Keep the comment
concise and placed immediately above the getOperandMetadata() method
declaration.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java`:
- Around line 53-56: Add JavaDoc to the public override method
getOperandMetadata in ToSecondsFunction: describe what the method returns and
include an `@return` tag that specifies it returns the operand metadata
(PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER) used by the UDF; keep the
comment concise and place it immediately above the getOperandMetadata()
override.
- Line 23: Add JavaDoc to the public override getOperandMetadata() in
ToSecondsFunction explaining it "Returns the operand metadata for the function
signature" and annotate the method return with `@NonNull` as suggested; update the
method in class ToSecondsFunction to return
PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER and include the `@Override` and
JavaDoc. Also verify that the org.jspecify.annotations.NonNull import is
available to the build: if not provided by the runtime/plugin, add the jspecify
dependency to core/build.gradle so the import resolves.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/WeekdayFunction.java`:
- Around line 52-55: Add a JavaDoc block to the public override of
getOperandMetadata in WeekdayFunction describing the method and include an
`@return` tag explaining it returns the operand metadata for this UDF; reference
the getOperandMetadata method and the PPLOperandTypes.DATETIME_OPTIONAL_INTEGER
return value so it's clear this documents that the function accepts a DATETIME
and an optional INTEGER operand type.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java`:
- Around line 42-45: The public override EulerFunction.getOperandMetadata lacks
JavaDoc; add a short Javadoc comment (or use {`@inheritDoc`}) above the method and
include an `@return` describing that it returns the UDFOperandMetadata for this
UDF (specifically PPLOperandTypes.NONE), referencing the return type
UDFOperandMetadata and the constant PPLOperandTypes.NONE so readers know this
UDF takes no operands.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 42-44: Add a JavaDoc comment to the public method getKind() in
class ModFunction: document that this method returns the SQL operator kind for
the modulo operation and include an `@return` tag stating it returns SqlKind.MOD;
ensure the JavaDoc also notes it overrides the superclass method (since
getKind() is an override) and follows project formatting conventions.

In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 36-46: Update the Javadoc for getOperandMetadata() to match the
implementation: remove the phrase "default null without checking" and document
that the default returns a non-null UDFOperandMetadata that wraps
OperandTypes.VARIADIC (i.e., uses a permissive type checker and effectively
skips strict type checking). Mention the `@NonNull` return contract and reference
UDFOperandMetadata.wrap(...) and OperandTypes.VARIADIC in the comment so readers
see the actual behavior.

In `@docs/dev/ppl-functions.md`:
- Around line 131-147: The doc example uses an unqualified ADD symbol but the
register signature requires a BuiltinFunctionName enum; update the example to
call register with BuiltinFunctionName.ADD (e.g.,
register(BuiltinFunctionName.ADD, ...)) so it matches the API; ensure the
surrounding references (register, FunctionImp, builder.makeCall,
SqlStdOperatorTable.CONCAT, SqlStdOperatorTable.PLUS,
OpenSearchTypeUtil::isCharacter) remain unchanged.
- Around line 192-196: Update the docs to reflect the new UDAF registration API
by replacing references to AggBuilder::registerOperator(BuiltinFunctionName
functionName, SqlAggFunction aggFunction) with
AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler);
also ensure the subsequent bullets and any example snippets or mentions that
suggest registerOperator are changed to use AggBuilder::register and the
AggHandler/SqlAggFunction parameter names/types consistent with the PR
(including keeping the note about PPLFuncImpTable::registerExternalAggOperator
for engine-dependent functions).
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)

42-48: Javadoc documents STRING operand signatures that the type checker rejects.

The Javadoc claims (IP, STRING) and (STRING, IP) signatures, but the type checker at lines 135-136 only accepts operands where OpenSearchTypeUtil.isIp(type, true) returns true. This method does not accept VARCHAR/CHAR types. Either update the Javadoc to reflect the actual (IP, IP) only behavior, or modify the type checker to also accept character types if STRING support is intended.

🧹 Nitpick comments (14)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)

51-53: Add JavaDoc for public method.
The public override lacks required JavaDoc with @return. As per coding guidelines, please document it.

💡 Suggested JavaDoc
   `@Override`
+  /**
+   * Returns operand metadata for the EXTRACT function.
+   *
+   * `@return` operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING)
+   */
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.STRING_DATETIME;
   }

As per coding guidelines, document public methods with JavaDoc.

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java (1)

64-73: Consider adding JavaDoc to the public static utility method.

The secToTime method is public and could be called directly. Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws documentation. This would help clarify the expected input range and output format.

📝 Suggested JavaDoc
+    /**
+     * Converts seconds to time string in HH:mm:ss[.nnnnnn] format.
+     *
+     * `@param` seconds the number of seconds to convert (integral or decimal)
+     * `@return` time string representation
+     */
     public static String secToTime(Number seconds) {
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/DatePartFunction.java (1)

52-52: Minor typo in JavaDoc: "MIBUTEOFHOUR" should be "MINUTEOFHOUR".

This appears to be a pre-existing typo, but since you're touching this file, consider fixing it.

📝 Suggested fix
- *   <li>MIBUTEOFHOUR
+ *   <li>MINUTEOFHOUR
core/src/main/java/org/opensearch/sql/expression/function/udf/ToNumberFunction.java (1)

98-100: Empty catch block silently swallows parsing errors.

The empty catch block suppresses all exceptions without logging or documentation, making it difficult to diagnose conversion failures. If returning null on parse failure is intentional behavior, consider at minimum adding a comment explaining this design choice, or logging at debug/trace level.

♻️ Suggested improvement
     } catch (Exception e) {
-
+      // Intentionally return null for unparseable input (lenient parsing)
+      // Callers should handle null return appropriately
     }

Or, for better observability:

+import java.util.logging.Logger;
+import java.util.logging.Level;
...
+  private static final Logger LOGGER = Logger.getLogger(ToNumberFunction.class.getName());
...
     } catch (Exception e) {
-
+      LOGGER.log(Level.FINE, "Failed to parse number: " + numStr, e);
     }
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ConvFunction.java (2)

26-29: Pre-existing doc typo: "floor function" should be "conv function".

The class-level JavaDoc appears to have been copied from another function and still references "floor function" instead of "conv function".

📝 Suggested fix
 /**
  * Convert number x from base a to base b<br>
- * The supported signature of floor function is<br>
+ * The supported signature of conv function is<br>
  * (STRING, INTEGER, INTEGER) -> STRING<br>
  * (INTEGER, INTEGER, INTEGER) -> STRING
  */

61-74: Pre-existing: JavaDoc does not match implementation behavior.

The JavaDoc states the method returns "0" for invalid input and null for out-of-range bases, but the actual implementation will throw NumberFormatException in both cases. Consider either updating the documentation to reflect the actual exception-throwing behavior, or wrapping the logic in a try-catch to match the documented contract.

📝 Option 1: Update JavaDoc to match implementation
     /**
      * Convert numStr from fromBase to toBase
      *
      * `@param` numStr the number to convert (case-insensitive for alphanumeric digits, may have a
      *     leading '-')
      * `@param` fromBase base of the input number (2 to 36)
      * `@param` toBase target base (2 to 36)
-     * `@return` the converted number in the target base (uppercase), "0" if the input is invalid, or
-     *     null if bases are out of range.
+     * `@return` the converted number in the target base
+     * `@throws` NumberFormatException if numStr is invalid for fromBase or if bases are out of range
      */
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/TimestampFunction.java (1)

75-84: Consider adding JavaDoc to public static methods.

The static timestamp methods are public API entry points called via reflection. Adding JavaDoc with @param and @return tags would improve maintainability, especially documenting the expected input types and the return value format.

📝 Example JavaDoc
+  /**
+   * Converts a datetime value to a timestamp string.
+   *
+   * `@param` properties function properties containing timezone context
+   * `@param` datetime the datetime value to convert
+   * `@return` timestamp string representation
+   */
   public static String timestamp(FunctionProperties properties, ExprValue datetime) {
     return (String) convertToTimestampValue(datetime, properties).valueForCalcite();
   }

+  /**
+   * Converts a datetime value to timestamp and adds a time offset.
+   *
+   * `@param` properties function properties containing timezone context
+   * `@param` datetime the base datetime value
+   * `@param` addTime the time value to add to the base datetime
+   * `@return` timestamp string representation after adding the time offset
+   */
   public static String timestamp(
       FunctionProperties properties, ExprValue datetime, ExprValue addTime) {
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java (1)

54-56: Consider adding JavaDoc to the public min method.

Per coding guidelines, public methods should have JavaDoc documentation. While this is pre-existing code, adding a brief doc comment would improve maintainability:

/**
 * Returns the minimum value from the given arguments using mixed-type comparison.
 *
 * `@param` args array of values to compare
 * `@return` the minimum value, or null if all values are null
 */
public static Object min(Object[] args) {
core/src/main/java/org/opensearch/sql/expression/function/udf/RexExtractMultiFunction.java (1)

47-58: Add JavaDoc for the public method.
getOperandMetadata is public and should include JavaDoc (with @return, and @throws if applicable). As per coding guidelines.

📝 Suggested update
+  /**
+   * Returns operand metadata for REX_EXTRACT_MULTI validation.
+   *
+   * `@return` operand metadata describing accepted operands
+   */
   `@Override`
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonArrayLengthFunctionImpl.java (1)

54-63: Consider adding null safety to the eval method.

The eval method has potential robustness issues in the existing code:

  1. Using assert for argument validation can be disabled at runtime
  2. If gson.fromJson returns null (e.g., for JSON literal "null"), target.size() will throw NPE

Since NullPolicy.ANY indicates this function handles nulls itself, explicit null handling would be safer.

♻️ Suggested improvement
  public static Object eval(Object... args) {
-   assert args.length == 1 : "Json array length only accept one argument";
+   if (args.length != 1) {
+     throw new IllegalArgumentException("Json array length only accepts one argument");
+   }
    String value = (String) args[0];
+   if (value == null) {
+     return null;
+   }
    try {
      List<?> target = gson.fromJson(value, List.class);
-     return target.size();
+     return target != null ? target.size() : null;
    } catch (JsonSyntaxException e) {
      return null;
    }
  }
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java (1)

44-45: Consider stricter operand count validation.

Using OperandTypes.VARIADIC allows calls with 0 or 1 arguments, which the eval method handles by returning null. While this is safe, requiring at least 2 arguments (json_string + path) at validation time would provide earlier feedback to users.

♻️ Optional: Enforce minimum 2 operands
 `@Override`
 public `@NonNull` UDFOperandMetadata getOperandMetadata() {
-  return UDFOperandMetadata.wrap(OperandTypes.VARIADIC);
+  return UDFOperandMetadata.wrap(
+      OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY));
 }
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java (1)

39-40: Consider requiring at least 2 operands.

OperandTypes.ONE_OR_MORE allows calls with just a JSON string and no paths to delete, which is semantically meaningless. The function should require at least 2 arguments: the JSON string and at least one path to delete.

♻️ Proposed fix: Enforce minimum 2 operands
 `@Override`
 public `@NonNull` UDFOperandMetadata getOperandMetadata() {
-  return UDFOperandMetadata.wrap(OperandTypes.ONE_OR_MORE);
+  return UDFOperandMetadata.wrap(
+      OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY));
 }
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/UnixTimestampFunction.java (1)

51-53: Add JavaDoc for the public override.
Line 52 changes the public API signature; please add a brief JavaDoc with @return to comply with project guidelines. As per coding guidelines, ...

✍️ Suggested JavaDoc addition
   `@Override`
+  /**
+   * Returns operand metadata for unix_timestamp.
+   *
+   * `@return` non-null operand metadata
+   */
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.OPTIONAL_DATE_OR_TIMESTAMP_OR_NUMERIC;
   }
core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)

60-70: Consider adding JavaDoc for this public method.

The refactor to OperandTypes.family(...).or(...) chains looks correct and aligns with the PR's goal of moving away from CompositeOperandTypeChecker. The @NonNull annotation is a good addition.

Per coding guidelines, public methods should have JavaDoc documentation explaining the operand types and their semantics. Consider documenting what each combination represents:

  • CHARACTER/NUMERIC/CHARACTER: time-based span on string fields
  • DATETIME/NUMERIC/CHARACTER: time-based span on datetime fields
  • NUMERIC/NUMERIC/ANY: numeric span (no time unit required)

Comment on lines 55 to 58
RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory();
RexCallBinding rexCallBinding = (RexCallBinding) sqlOperatorBinding;
List<RexNode> operands = rexCallBinding.operands();
RelDataType lambdaReturnType = ((RexLambda) operands.get(1)).getExpression().getType();
RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1);
return createArrayType(
typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "TransformFunctionImpl.java" | head -20

Repository: opensearch-project/sql

Length of output: 166


🏁 Script executed:

cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java

Repository: opensearch-project/sql

Length of output: 10999


🏁 Script executed:

# Search for other lambda-based collection functions and how they handle return types
find . -path ./target -prune -o -type f -name "*.java" -print | xargs grep -l "getOperandType.*lambda\|lambda.*getOperandType" 2>/dev/null | head -10

Repository: opensearch-project/sql

Length of output: 166


🏁 Script executed:

# Search for similar patterns with SqlReturnTypeInference in CollectionUDF directory
find ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF -type f -name "*.java" | head -20

Repository: opensearch-project/sql

Length of output: 1529


🏁 Script executed:

cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java

Repository: opensearch-project/sql

Length of output: 3150


🏁 Script executed:

cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java

Repository: opensearch-project/sql

Length of output: 7288


🏁 Script executed:

cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java

Repository: opensearch-project/sql

Length of output: 996


🏁 Script executed:

# Search for TransformFunctionImpl tests
find . -path ./target -prune -o -type f -name "*TransformFunctionImplTest*" -o -name "*TransformTest*" | grep -i test

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Broader search for lambda-related tests
find ./core/src/test -type f -name "*.java" | xargs grep -l "TransformFunctionImpl\|transform.*lambda" 2>/dev/null | head -10

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Search for SqlLambda usage and getExpression calls in the codebase
rg "SqlLambda" --type java -A 3 | head -50

Repository: opensearch-project/sql

Length of output: 1907


🏁 Script executed:

# Check if there's any validation or type binding code that explains lambda operand types
rg "getOperandType.*1\)" --type java -B 2 -A 2 | head -60

Repository: opensearch-project/sql

Length of output: 5563


🏁 Script executed:

# Look at how ReduceFunctionImpl and TransformFunctionImpl differ in handling binding types
grep -B 5 -A 10 "getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java

Repository: opensearch-project/sql

Length of output: 624


🏁 Script executed:

# Search for any tests that use transform function
find ./core/src/test -type f -name "*.java" -o -name "*.sql" | xargs grep -l "transform" 2>/dev/null | head -5

Repository: opensearch-project/sql

Length of output: 374


🏁 Script executed:

# Search for any documentation or comments about lambda operand types
rg "lambda.*type\|FUNCTION.*type" --type java -i | grep -i "lambda\|function" | head -20

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check if there are any RexCallBinding vs SqlCallBinding handling patterns for lambdas
rg "RexCallBinding\|SqlCallBinding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 2 -A 5 | head -60

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Search for any direct usage of SqlCallBinding in the transform/lambda related code
rg "instanceof.*Binding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 3 -A 3

Repository: opensearch-project/sql

Length of output: 1488


🏁 Script executed:

# Check the Calcite version used in the project
find . -name "pom.xml" -o -name "build.gradle" | xargs grep -i "calcite" | grep -i "version" | head -5

Repository: opensearch-project/sql

Length of output: 293


🏁 Script executed:

# Search for any commits or comments about TransformFunctionImpl implementation
git log --oneline --all -- "./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java" 2>/dev/null | head -10

Repository: opensearch-project/sql

Length of output: 307


🏁 Script executed:

# Look for any integration tests or examples showing transform usage
find . -name "*.sql" -o -name "*.md" | xargs grep -l "transform" 2>/dev/null | head -5

Repository: opensearch-project/sql

Length of output: 243


🏁 Script executed:

# Check the ppl-functions.md documentation for transform function
grep -A 10 -B 5 "transform" ./docs/user/ppl/cmd/ppl-functions.md 2>/dev/null || grep -A 10 -B 5 "transform" ./docs/dev/ppl-functions.md 2>/dev/null

Repository: opensearch-project/sql

Length of output: 634


🏁 Script executed:

# Look at the most recent commit details for TransformFunctionImpl
git show b1dca98d --stat 2>/dev/null | head -30

Repository: opensearch-project/sql

Length of output: 1723


🏁 Script executed:

# Search for documentation or tests showing transform usage with different return types
find . -name "*.java" -path "*/test/*" | xargs grep -l "TransformFunctionImpl\|transform.*array" 2>/dev/null | head -5

Repository: opensearch-project/sql

Length of output: 238


🏁 Script executed:

# Check if there are any comments or code patterns about when return type inference is called
rg "SqlReturnTypeInference\|getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/ --type java -B 2 -A 10 | head -80

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check the array function integration test
cat -n ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java | head -100

Repository: opensearch-project/sql

Length of output: 3717


🏁 Script executed:

# Check the PPL array function test
cat -n ./ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java | head -150

Repository: opensearch-project/sql

Length of output: 6867


🏁 Script executed:

# Search for transform-specific tests or usage
rg "transform\(" --type java -i ./integ-test ./ppl | head -20

Repository: opensearch-project/sql

Length of output: 1956


🏁 Script executed:

# Check the CalciteArrayFunctionIT for transform tests
grep -n "transform" ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java

Repository: opensearch-project/sql

Length of output: 478


Extract lambda's return type explicitly for correct type inference during validation.

At the SqlCallBinding phase (validation), getOperandType(1) returns the FUNCTION type itself, not the lambda's return type. This differs from how ReduceFunctionImpl handles similar lambda-based operations. Follow the ReduceFunctionImpl pattern: distinguish between SqlCallBinding (extract SqlLambda, get expression, derive type via SqlTypeUtil.deriveType()) and RexCallBinding (use simple getOperandType() on already-compiled types). Without this, type inference may produce ARRAY<FUNCTION> instead of ARRAY<actual_return_type>.

🤖 Prompt for AI Agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`
around lines 55 - 58, TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.

…mplementation

- additionally udpate ppl-functions.md documentation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/src/main/java/org/opensearch/sql/expression/function/udf/condition/EarliestFunction.java (1)

38-45: Add JavaDoc for public getOperandMetadata().

Public methods under core/src/main/java/** require JavaDoc with @return. Please add a brief JavaDoc (especially now that the non-null contract is explicit).

✍️ Proposed JavaDoc
+  /**
+   * Returns operand metadata for the {`@code` earliest} function.
+   *
+   * `@return` non-null operand metadata
+   */
   `@Override`
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.STRING_TIMESTAMP;
   }

As per coding guidelines, ...

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java (1)

62-64: Fix copy-paste error in exception message.

The error message incorrectly refers to "Json append function" but this is JsonExtendFunctionImpl.

Suggested fix
     if (keys.size() % 2 != 0) {
       throw new RuntimeException(
-          "Json append function needs corresponding path and values, but current get: " + keys);
+          "Json extend function needs corresponding path and values, but current get: " + keys);
     }
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java (1)

66-75: Add unit tests for the new default methods in UserDefinedFunctionBuilder.

This interface with its new default methods—getKind(), getOperandMetadata(), toUDF(String), and toUDF(String, boolean)—contains new business logic in core/src/main/java that must be covered by unit tests per coding guidelines. Current test coverage validates specific UDF implementations but does not test the default interface behavior itself.

🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 62-63: Add a JavaDoc comment to the public override method
getOperandMetadata in TransformFunctionImpl describing what the method returns
and include an `@return` tag that explains the returned UDFOperandMetadata (e.g.,
metadata about operands for this UDF). Ensure the JavaDoc appears immediately
above the method signature for the public `@NonNull` UDFOperandMetadata
getOperandMetadata() override and follows project JavaDoc style.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java`:
- Around line 48-50: Add a JavaDoc comment to the public override
getOperandMetadata() in SecToTimeFunction that documents the method and its
return value; specifically, describe that it returns the operand metadata for
this UDF and include an `@return` tag stating it returns a UDFOperandMetadata
(e.g., PPLOperandTypes.NUMERIC) applicable to this function. Ensure the JavaDoc
sits immediately above the method declaration and follows project JavaDoc style.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 46-49: Add a JavaDoc comment to the public method
getOperandMetadata() in ModFunction explaining what operand metadata it returns
(i.e., the expected operand types for the modulus UDF) and include a `@return` tag
that states it returns PPLOperandTypes.NUMERIC_NUMERIC; place the comment
immediately above the getOperandMetadata() declaration in ModFunction.java.
- Around line 41-49: Add a new test class ModFunctionTest.java that follows the
ToNumberFunctionTest pattern and verifies ModFunction's metadata (getKind()
returns SqlKind.MOD and getOperandMetadata() returns
PPLOperandTypes.NUMERIC_NUMERIC) and exercises the core logic by calling
integralMod() and floatingMod() with a variety of inputs: integer modulo cases,
floating-point modulo cases, mixed-type coercion (e.g., int % double and double
% int), and edge cases including divisor zero to assert the expected
exception/behavior; use the existing test utilities/assertions used by other UDF
tests to construct inputs and compare results.

In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java`:
- Around line 39-41: Add a JavaDoc comment to the public override
getOperandMetadata() in NumberToStringFunction describing what the method
returns and any exceptions it may throw; reference the method signature public
`@NonNull` UDFOperandMetadata getOperandMetadata(), state that it returns the
operand metadata (PPLOperandTypes.NUMERIC) via an `@return` tag, and include an
`@throws` tag only if the method can throw checked or runtime exceptions —
otherwise explicitly note no exceptions are thrown. Ensure the JavaDoc follows
project style and is placed immediately above the getOperandMetadata() method.

In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 49-51: Add a JavaDoc comment to the public default method
getKind() in UserDefinedFunctionBuilder describing its purpose (returns the SQL
kind for this function), the return type (SqlKind, defaulting to
SqlKind.OTHER_FUNCTION), and any behavior/override expectations so consumers
know they can override it for other SqlKind values; place the comment
immediately above the getKind() declaration and follow existing project JavaDoc
style and tags (brief description and `@return`).
♻️ Duplicate comments (8)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)

42-47: Class Javadoc documents unsupported STRING operand signatures.

The Javadoc claims support for (IP, STRING) and (STRING, IP) signatures, but the type checker at lines 135-136 only accepts IP types via OpenSearchTypeUtil.isIp(type, true). The getAllowedSignatures method correctly reports <IP>, <IP>.

Either update the Javadoc to remove STRING claims, or modify the type checker to accept VARCHAR if STRING support is intended.
,

core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java (1)

42-44: Add explicit JavaDoc (or {@inheritDoc}) for the public override.

This public method still lacks an explicit JavaDoc/{@inheritdoc} block. Please add a brief doc (including @return) to meet the public-API documentation requirement. As per coding guidelines, ...

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java (2)

23-23: jspecify dependency concern already flagged.

The import addition is appropriate for the @NonNull annotation. The concern about verifying this dependency in core/build.gradle was already raised in a prior review.


53-56: Nullability contract improvement looks good.

The @NonNull annotation on getOperandMetadata() is appropriate since the method always returns a non-null constant. This aligns with the PR-wide pattern of strengthening nullability contracts across UDF operand metadata accessors.

The missing JavaDoc concern was already flagged in a prior review.

docs/dev/ppl-functions.md (1)

192-195: Inconsistent UDAF registration API documentation.

Lines 192-193 reference AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction) for "standard registration," but line 195 documents AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler). According to the AI summary, the API changed from registerOperator with SqlAggFunction to register with AggHandler.

The first bullet should be updated to reflect the new register API to avoid misleading developers.

📝 Suggested fix
-- Use `AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)` for standard
-  registration
+- Use `AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler)` for standard
+  registration
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)

56-74: Missing validation for odd argument count.

The function signature (json_string, path1, value1, path2, value2, ...) requires an odd total count. With an even count like 4 arguments, the eval method will throw IndexOutOfBoundsException at line 105 when accessing keys.get(i + 1) for the incomplete pair. Add odd-count validation in checkOperandTypes.

🐛 Proposed fix
 `@Override`
 public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
   final int count = callBinding.getOperandCount();
+  // Require odd count: 1 json_string + n pairs of (path, value)
+  if (count % 2 == 0) {
+    if (throwOnFailure) {
+      throw callBinding.newValidationSignatureError();
+    }
+    return false;
+  }
   for (int i = 1; i < count; i += 2) {
     RelDataType nameType = callBinding.getOperandType(i);
     if (!OpenSearchTypeUtil.isCharacter(nameType)) {
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java (1)

41-44: Add JavaDoc for the public getKind() override.
Public methods require JavaDoc with @return. As per coding guidelines, please document this method accordingly.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)

52-59: Derive the lambda return type instead of FUNCTION during validation.

Line 56 uses getOperandType(1), which (under SqlCallBinding) yields the lambda’s FUNCTION type, not its return type—this can infer ARRAY<FUNCTION> and propagate incorrect casts. Follow the ReduceFunctionImpl pattern: when binding is SqlCallBinding, extract the SqlLambda expression and derive its return type; fall back to getOperandType(1) for RexCallBinding.

🐛 Proposed fix
     return sqlOperatorBinding -> {
       RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory();
-      RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1);
+      RelDataType lambdaReturnType;
+      if (sqlOperatorBinding instanceof SqlCallBinding) {
+        SqlCallBinding callBinding = (SqlCallBinding) sqlOperatorBinding;
+        SqlNode lambdaNode = callBinding.operand(1);
+        if (lambdaNode instanceof org.apache.calcite.sql.SqlLambda) {
+          SqlNode lambdaExpr =
+              ((org.apache.calcite.sql.SqlLambda) lambdaNode).getExpression();
+          lambdaReturnType = SqlTypeUtil.deriveType(callBinding, lambdaExpr);
+        } else {
+          lambdaReturnType = callBinding.getOperandType(1);
+        }
+      } else {
+        lambdaReturnType = sqlOperatorBinding.getOperandType(1);
+      }
       return createArrayType(
           typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true);
     };
#!/bin/bash
# Compare lambda return-type inference patterns (Transform vs Reduce)
rg -n "class ReduceFunctionImpl|getReturnTypeInference|SqlLambda|deriveType" \
  core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java -C 3
🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/FromUnixTimeFunction.java (1)

56-58: Add JavaDoc for public method per guidelines.

getOperandMetadata() is public and currently has no JavaDoc. Please add a brief JavaDoc with an @return tag. As per coding guidelines.

📝 Proposed JavaDoc
+  /**
+   * Returns operand metadata for FROM_UNIXTIME.
+   *
+   * `@return` operand metadata describing numeric or numeric+string arguments
+   */
  `@Override`
  public `@NonNull` UDFOperandMetadata getOperandMetadata() {
    return PPLOperandTypes.NUMERIC_OPTIONAL_STRING;
  }
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/PeriodNameFunction.java (1)

66-77: Remove unused variable dateType.

The variable dateType is computed on lines 69-71 but never used. This appears to be dead code.

♻️ Suggested fix
     `@Override`
     public Expression implement(
         RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
-      ExprType dateType =
-          OpenSearchTypeFactory.convertRelDataTypeToExprType(
-              call.getOperands().getFirst().getType());
       return Expressions.call(
           PeriodNameFunctionImplementor.class,
           "name",
           translatedOperands.getFirst(),
           Expressions.constant(periodUnit));
     }
core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java (1)

136-139: Pre-existing: Floating-point modulo equality comparison may be imprecise.

This is not part of your changes, but maxValue % optimalWidth == 0 may not behave as expected due to floating-point precision. For example, 10.0 % 0.1 may not equal exactly 0.0. Consider using a tolerance-based comparison if this causes issues.

♻️ Optional fix
-      if (maxValue % optimalWidth == 0) {
+      double remainder = maxValue % optimalWidth;
+      if (Math.abs(remainder) < 1e-10 || Math.abs(remainder - optimalWidth) < 1e-10) {
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)

46-53: Add JavaDoc @return for the non-null contract.
Public methods in this module should document their return contract; the new non-null guarantee should be explicitly stated.

📝 Suggested JavaDoc
   `@Override`
+  /**
+   * Returns operand metadata for the EXTRACT function.
+   *
+   * `@return` non-null operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING)
+   */
   public `@NonNull` UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.STRING_DATETIME;
   }

As per coding guidelines, ...

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java (1)

47-78: Add a defensive arity check before accessing operand(2).

If an invalid call slips through, this currently risks an IndexOutOfBoundsException instead of a clearer validation error. A small guard improves diagnostics.

♻️ Proposed fix
       if (sqlOperatorBinding instanceof RexCallBinding) {
         return sqlOperatorBinding.getOperandType(sqlOperatorBinding.getOperandCount() - 1);
       } else if (sqlOperatorBinding instanceof SqlCallBinding callBinding) {
+        if (callBinding.getOperandCount() < 3) {
+          throw new IllegalArgumentException(
+              "reduce requires at least 3 operands (array, base, lambda)");
+        }
         RelDataType elementType = callBinding.getOperandType(0).getComponentType();
         RelDataType baseType = callBinding.getOperandType(1);
         SqlLambda reduce1 = callBinding.getCall().operand(2);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Type Conversion - Leverage RelDataType from Calcite in V3 engine. [RFC] Implicit Type Coercion with Calcite

3 participants