feat: support ilike expression #3589

Open
n0r0shi wants to merge 1 commit into apache:main from n0r0shi:ilike-func

n0r0shi commented Feb 25, 2026

Summary

  • Wire ilike from the datafusion-spark crate (SparkILike) to Comet
  • Register in jni_api.rs and add serde mapping in QueryPlanSerde.scala
  • Custom handler CometILike in strings.scala to reject non-default escape characters
  • ILIKE requires case-insensitive comparison, which depends on locale-specific case conversion. When spark.comet.caseConversion.enabled is false (the default), ILIKE falls back to Spark to avoid incompatibilities between Rust's Unicode-based to_lowercase() and Java's locale-aware rules (e.g. Turkish I)
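
The locale concern in the last bullet can be illustrated with a small sketch. It uses only the Rust standard library; the helper `eq_ignore_case_unicode` is a hypothetical name for this illustration and is not part of Comet or datafusion-spark. Under Turkish casing rules, uppercase "I" is expected to lower to dotless "ı" (U+0131), whereas Rust's `to_lowercase()` takes no locale and always yields "i".

```rust
/// Naive case-insensitive equality built on Rust's locale-independent
/// lowercasing. Hypothetical helper, for illustration only.
fn eq_ignore_case_unicode(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    // 'I' (U+0049) always lowers to 'i' (U+0069); there is no locale parameter.
    assert_eq!("I".to_lowercase(), "i");
    // Turkish rules would expect dotless 'ı' (U+0131), which is a different string.
    assert_ne!("I".to_lowercase(), "\u{0131}");
    // 'İ' (U+0130) lowers to 'i' followed by U+0307 (combining dot above).
    assert_eq!("\u{0130}".to_lowercase(), "i\u{0307}");
    // "TITLE" matches "title", but not the Turkish-lowered spelling "tıtle".
    assert!(eq_ignore_case_unicode("TITLE", "title"));
    assert!(!eq_ignore_case_unicode("TITLE", "t\u{0131}tle"));
    println!("ok");
}
```

A JVM running with a Turkish default locale could therefore disagree with this comparison on strings containing "I" or "İ", which is the kind of mismatch the config flag guards against.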

Please let me know if the caseConversion handling is not appropriate.

Register datafusion-spark's SparkILike UDF and add CometILike handler
in stringExpressions. Custom escape characters fall back to Spark.
andygrove (Member) left a comment

Thanks for working on ILIKE support. The escape character check and the general approach look reasonable.

I have a question about whether CometILike actually gets matched during serde. In Spark, ILike is a RuntimeReplaceable expression whose replacement is Like(Lower(left), Lower(right)). Spark's optimizer (the ReplaceExpressions rule) should therefore replace ILike with Like(Lower(left), Lower(right)) before Comet sees the physical plan. If that's the case, the CometILike handler would be dead code, because the plan would already contain Like and Lower nodes instead of ILike.
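
To make the replacement argument concrete, here is a toy model of the rewrite. The `Expr` enum and the functions `replace_ilike` and `contains_ilike` are assumptions made up for this sketch; they are not Spark's or Comet's types, and the real rewrite happens inside Spark on Catalyst expressions.

```rust
// Toy expression tree; all names are illustrative, not Spark's or Comet's types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Lower(Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
    ILike(Box<Expr>, Box<Expr>),
}

/// Recursively replace every ILike(l, r) with Like(Lower(l), Lower(r)).
fn replace_ilike(expr: Expr) -> Expr {
    match expr {
        Expr::ILike(l, r) => Expr::Like(
            Box::new(Expr::Lower(Box::new(replace_ilike(*l)))),
            Box::new(Expr::Lower(Box::new(replace_ilike(*r)))),
        ),
        Expr::Like(l, r) => {
            Expr::Like(Box::new(replace_ilike(*l)), Box::new(replace_ilike(*r)))
        }
        Expr::Lower(e) => Expr::Lower(Box::new(replace_ilike(*e))),
        leaf => leaf,
    }
}

/// True if any ILike node remains in the tree.
fn contains_ilike(expr: &Expr) -> bool {
    match expr {
        Expr::ILike(_, _) => true,
        Expr::Like(l, r) => contains_ilike(l) || contains_ilike(r),
        Expr::Lower(e) => contains_ilike(e),
        Expr::Column(_) | Expr::Literal(_) => false,
    }
}

fn main() {
    let plan = Expr::ILike(
        Box::new(Expr::Column("name".to_string())),
        Box::new(Expr::Literal("A%".to_string())),
    );
    let rewritten = replace_ilike(plan);
    // After the rewrite, a handler that matches only on ILike never fires.
    assert!(!contains_ilike(&rewritten));
    println!("{:?}", rewritten);
}
```

If the real plan behaves like this model, a serde handler keyed on ILike would have nothing left to match.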

The fallback test seems to confirm this. The expected fallback reason is "Comet is not compatible with Spark for case conversion", which comes from CometCaseConversionBase (used by CometLower), not from CometILike. So it looks like the expression is flowing through the existing CometLike + CometLower path. When spark.comet.caseConversion.enabled is true, Lower serializes successfully and Like handles the matching. When it's false, CometLower refuses to serialize and everything falls back.
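
The gating described here can be sketched as follows. This is a simplified model under stated assumptions: the `Expr` enum, the `serialize` function, and the string output format are invented for illustration, and only the fallback message is taken from the test discussed above. It shows why a reason raised by the Lower node surfaces for the whole expression.

```rust
// Post-rewrite toy tree: ILike no longer appears. Names are illustrative only.
enum Expr {
    Column(String),
    Literal(String),
    Lower(Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
}

// Fallback reason quoted in the review; everything else here is hypothetical.
const CASE_CONVERSION_REASON: &str =
    "Comet is not compatible with Spark for case conversion";

/// Ok(serialized form) on success, Err(fallback reason) otherwise.
fn serialize(expr: &Expr, case_conversion_enabled: bool) -> Result<String, String> {
    match expr {
        Expr::Column(name) => Ok(format!("col({})", name)),
        Expr::Literal(value) => Ok(format!("lit({})", value)),
        Expr::Lower(child) => {
            // Lower refuses to serialize unless the flag is on.
            if !case_conversion_enabled {
                return Err(CASE_CONVERSION_REASON.to_string());
            }
            Ok(format!("lower({})", serialize(child, case_conversion_enabled)?))
        }
        Expr::Like(l, r) => Ok(format!(
            "like({}, {})",
            serialize(l, case_conversion_enabled)?,
            serialize(r, case_conversion_enabled)?
        )),
    }
}

fn main() {
    let plan = Expr::Like(
        Box::new(Expr::Lower(Box::new(Expr::Column("name".to_string())))),
        Box::new(Expr::Lower(Box::new(Expr::Literal("a%".to_string())))),
    );
    // Flag on: the whole tree serializes.
    assert!(serialize(&plan, true).is_ok());
    // Flag off: the error raised by Lower propagates up through Like.
    assert_eq!(serialize(&plan, false), Err(CASE_CONVERSION_REASON.to_string()));
}
```

In this model the fallback reason never mentions ILIKE, which matches the observation that the message originates from the case conversion path.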

If that's the case, the CometILike serde handler, the SparkILike UDF registration in jni_api.rs, and the ILike import in strings.scala may all be unnecessary. Could you check whether removing those pieces still makes the tests pass? It's possible that ILIKE already works today through the Like(Lower(...), Lower(...)) replacement path, gated by spark.comet.caseConversion.enabled.

If it turns out ILike does appear in the plan for some reason, then CometILike.convert() should probably check COMET_CASE_CONVERSION_ENABLED the same way CometCaseConversionBase does, since Arrow's ilike kernel uses Rust's to_lowercase(), which has the same locale mismatch.
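
As a rough sketch of that suggestion, the two checks could be combined as below. The function `check_ilike_support`, the constant `DEFAULT_ESCAPE_CHAR`, and the escape-character message are hypothetical; the backslash default is an assumption based on the default escape character of LIKE in Spark, and the actual handler lives in Scala rather than Rust.

```rust
// Assumed default escape character for LIKE/ILIKE (backslash).
const DEFAULT_ESCAPE_CHAR: char = '\\';

/// Hypothetical pre-serialization check: Ok(()) means the expression can run
/// natively, Err(reason) means fall back to Spark.
fn check_ilike_support(
    escape_char: char,
    case_conversion_enabled: bool,
) -> Result<(), String> {
    // Custom escape characters are rejected, as in the PR summary.
    if escape_char != DEFAULT_ESCAPE_CHAR {
        return Err(format!(
            "custom escape character '{}' is not supported",
            escape_char
        ));
    }
    // Same gate as the case conversion expressions use.
    if !case_conversion_enabled {
        return Err("Comet is not compatible with Spark for case conversion".to_string());
    }
    Ok(())
}

fn main() {
    assert!(check_ilike_support('\\', true).is_ok());
    assert!(check_ilike_support('\\', false).is_err());
    assert!(check_ilike_support('#', true).is_err());
}
```

Checking the escape character first keeps the existing behavior for custom escapes and adds the flag as a second, independent condition.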
