feat: support ilike expression by n0r0shi · Pull Request #3589 · apache/datafusion-comet

n0r0shi · 2026-02-25T05:07:18Z

Summary

Wire ilike from the datafusion-spark crate (SparkILike) into Comet.
Register the UDF in jni_api.rs and add the serde mapping in QueryPlanSerde.scala.
Add a custom handler, CometILike, in strings.scala to reject non-default escape characters.
ILIKE requires case-insensitive comparison, which depends on locale-specific case conversion. When spark.comet.caseConversion.enabled is false (the default), ILIKE falls back to Spark to avoid incompatibilities between Rust's Unicode-based to_lowercase() and Java's locale-aware rules (e.g. the Turkish I).
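To make the locale concern concrete, here is a minimal Rust sketch of how the standard library's to_lowercase() treats the Turkish letters. The helper eq_ignore_case is hypothetical and only for illustration. It is not Comet's or Arrow's matching code.

```rust
/// Hypothetical helper: case-insensitive equality built on Rust's
/// locale-independent `str::to_lowercase()`.
fn eq_ignore_case(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    // Plain ASCII behaves the same under every locale.
    assert!(eq_ignore_case("Spark", "SPARK"));

    // U+0130 (capital I with dot above) lowercases to "i" followed by
    // U+0307 (combining dot above); Rust applies no locale tailoring.
    assert_eq!("\u{0130}".to_lowercase(), "i\u{0307}");

    // Rust always maps 'I' to 'i'. Under a Turkish locale, Java's
    // String.toLowerCase maps 'I' to U+0131 (dotless i) instead, so a
    // comparison that matches there does not match here.
    assert_eq!("I".to_lowercase(), "i");
    assert!(!eq_ignore_case("I", "\u{0131}"));

    println!("Rust lowercasing is locale-independent");
}
```

The last assertion is the kind of mismatch the config flag guards against: the two engines can disagree on whether a row matches.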

Please let me know if the caseConversion handling is not appropriate.

Register datafusion-spark's SparkILike UDF and add CometILike handler in stringExpressions. Custom escape characters fall back to Spark.

andygrove

Thanks for working on ILIKE support. The escape character check and the general approach look reasonable.

I have a question about whether CometILike actually gets matched during serde. In Spark, ILike is a RuntimeReplaceable expression. Its replacement is Like(Lower(left), Lower(right)). This means Spark's analyzer should replace ILike with Like(Lower(left), Lower(right)) before Comet sees the physical plan. If that's the case, the CometILike handler would be dead code, because the plan would already contain Like and Lower nodes instead of ILike.
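The replacement described above can be sketched with a toy expression tree. This is a simplified model under stated assumptions, not Spark's or Comet's code: the Expr enum and both functions are hypothetical, and the escape character is left out.

```rust
/// Toy expression tree; all names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Lower(Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
    ILike(Box<Expr>, Box<Expr>),
}

/// Rewrite every ILike(l, r) into Like(Lower(l), Lower(r)).
fn replace_ilike(expr: Expr) -> Expr {
    match expr {
        Expr::ILike(l, r) => Expr::Like(
            Box::new(Expr::Lower(Box::new(replace_ilike(*l)))),
            Box::new(Expr::Lower(Box::new(replace_ilike(*r)))),
        ),
        Expr::Like(l, r) => Expr::Like(
            Box::new(replace_ilike(*l)),
            Box::new(replace_ilike(*r)),
        ),
        Expr::Lower(e) => Expr::Lower(Box::new(replace_ilike(*e))),
        leaf => leaf,
    }
}

/// True if any ILike node is still present in the tree.
fn contains_ilike(expr: &Expr) -> bool {
    match expr {
        Expr::ILike(_, _) => true,
        Expr::Like(l, r) => contains_ilike(l) || contains_ilike(r),
        Expr::Lower(e) => contains_ilike(e),
        Expr::Column(_) | Expr::Literal(_) => false,
    }
}

fn main() {
    let plan = Expr::ILike(
        Box::new(Expr::Column("name".to_string())),
        Box::new(Expr::Literal("%rust%".to_string())),
    );
    let rewritten = replace_ilike(plan);
    // After the rewrite, a handler keyed on ILike has nothing to match.
    assert!(!contains_ilike(&rewritten));
    println!("{:?}", rewritten);
}
```

If the plan Comet receives has already been rewritten this way, a serde handler registered for ILike would never fire.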

The fallback test seems to confirm this. The expected fallback reason is "Comet is not compatible with Spark for case conversion", which comes from CometCaseConversionBase (used by CometLower), not from CometILike. So it looks like the expression is flowing through the existing CometLike + CometLower path. When spark.comet.caseConversion.enabled is true, Lower serializes successfully and Like handles the matching. When it's false, CometLower refuses to serialize and everything falls back.
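A rough sketch of that flow, assuming a toy serializer in which only the Lower node consults the flag. The Expr enum, serialize, and ilike_as_like are hypothetical stand-ins for CometLike and CometLower; only the fallback message is taken from the test.

```rust
/// Toy expression tree after the ILike replacement; names are hypothetical.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(String),
    Lower(Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
}

/// Fallback reason quoted from the test expectation.
const CASE_CONVERSION_REASON: &str =
    "Comet is not compatible with Spark for case conversion";

/// Serialize to a string, or return the reason the whole expression
/// must fall back. Only Lower checks the flag; Like just recurses.
fn serialize(expr: &Expr, case_conversion_enabled: bool) -> Result<String, String> {
    match expr {
        Expr::Column(name) => Ok(format!("col({})", name)),
        Expr::Literal(value) => Ok(format!("lit({})", value)),
        Expr::Lower(child) => {
            if !case_conversion_enabled {
                return Err(CASE_CONVERSION_REASON.to_string());
            }
            Ok(format!("lower({})", serialize(child, case_conversion_enabled)?))
        }
        Expr::Like(l, r) => Ok(format!(
            "like({}, {})",
            serialize(l, case_conversion_enabled)?,
            serialize(r, case_conversion_enabled)?
        )),
    }
}

/// Build the Like(Lower(col), Lower(pattern)) shape that ILIKE becomes.
fn ilike_as_like(column: &str, pattern: &str) -> Expr {
    Expr::Like(
        Box::new(Expr::Lower(Box::new(Expr::Column(column.to_string())))),
        Box::new(Expr::Lower(Box::new(Expr::Literal(pattern.to_string())))),
    )
}

fn main() {
    let plan = ilike_as_like("name", "%rust%");
    println!("{:?}", serialize(&plan, true));
    println!("{:?}", serialize(&plan, false));
}
```

In this model the fallback reason surfaces from the Lower branch even though the query was written with ILIKE, which matches what the test observes.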

If that's the case, the CometILike serde handler, the SparkILike UDF registration in jni_api.rs, and the ILike import in strings.scala may all be unnecessary. Could you check whether removing those pieces still makes the tests pass? It's possible that ILIKE already works today through the Like(Lower(...), Lower(...)) replacement path, gated by spark.comet.caseConversion.enabled.

If it turns out ILike does appear in the plan for some reason, then CometILike.convert() should probably check COMET_CASE_CONVERSION_ENABLED the same way CometCaseConversionBase does, since Arrow's ilike kernel uses Rust's to_lowercase() which has the same locale mismatch.
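If a dedicated handler is kept, the two checks could be combined roughly as follows. This is a sketch under stated assumptions: Decision, plan_ilike, and the escape-character message are hypothetical, and the real handler lives in Scala, not Rust.

```rust
/// Spark's default LIKE escape character.
const DEFAULT_ESCAPE: char = '\\';

/// Hypothetical outcome of planning an ILIKE expression.
#[derive(Debug, PartialEq)]
enum Decision {
    Native,
    Fallback(&'static str),
}

/// Hypothetical guard: run natively only when the escape character is
/// the default and case conversion has been explicitly enabled.
fn plan_ilike(escape_char: char, case_conversion_enabled: bool) -> Decision {
    if escape_char != DEFAULT_ESCAPE {
        // Message text is an assumption, not Comet's actual wording.
        return Decision::Fallback("custom escape character is not supported");
    }
    if !case_conversion_enabled {
        return Decision::Fallback(
            "Comet is not compatible with Spark for case conversion",
        );
    }
    Decision::Native
}

fn main() {
    println!("{:?}", plan_ilike('\\', true));
    println!("{:?}", plan_ilike('\\', false));
    println!("{:?}", plan_ilike('#', true));
}
```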

feat: support ilike expression

24427e0


andygrove reviewed Feb 25, 2026

n0r0shi wants to merge 1 commit into apache:main from n0r0shi:ilike-func
