IBX-9846: Describe Embeddings search API #3029

Open
dabrt wants to merge 13 commits into 5.0 from IBX-9846

dabrt commented Jan 30, 2026

| Question | Answer |
|---|---|
| JIRA Ticket | IBX-9846 |
| Versions | 4.6, 5.0 |
| Edition | All |

Describe Embeddings search API

Checklist

  • Text renders correctly
  • Text has been checked with vale
  • Description metadata is up to date
  • PHP code samples have been fixed with PHP CS fixer
  • Added link to this PR in relevant JIRA ticket or code PR


github-actions bot commented Jan 30, 2026

dabrt requested a review from mikadamczyk February 26, 2026 14:09
dabrt and others added 2 commits February 27, 2026 16:01
Co-authored-by: Mikolaj Adamczyk <mikadamczyk@gmail.com>
dabrt requested a review from mikadamczyk February 27, 2026 16:12

github-actions bot commented Mar 3, 2026

code_samples/ change report

Before (on target branch) | After (in current PR)

code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php



docs/search/search_api.md@406:``` php
docs/search/search_api.md@407:// ...
docs/search/search_api.md@408:[[= include_file('code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php', 10, 21) =]]
docs/search/search_api.md@409:// ...
docs/search/search_api.md@410:[[= include_file('code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php', 26, 28) =]]
docs/search/search_api.md@411:// ...
docs/search/search_api.md@412:[[= include_file('code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php', 33, 78) =]]
docs/search/search_api.md@413:```

// ...
use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\ContentTypeIdentifier;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchHit;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;


// ...
{
    public function __construct(private readonly SearchService $searchService)

// ...
        InputInterface $input,
        OutputInterface $output
    ): int {
        $io = new SymfonyStyle($input, $output);

        // Example embedding vector.
        // In a real-life scenario, generate it with an embedding provider
        // and make sure its dimensions match the configured model.
        $vector = [
            0.0123,
            -0.9876,
            0.4567,
            0.1111,
        ];

        $query = EmbeddingQueryBuilder::create()
            ->withEmbedding(new TaxonomyEmbedding($vector))
            ->setFilter(new ContentTypeIdentifier('article'))
            ->setLimit(10)
            ->setOffset(0)
            ->setPerformCount(true)
            ->build();

        $result = $this->searchService->findContent($query);

        $io->success(sprintf('Found %d items.', $result->totalCount));

        foreach ($result->searchHits as $searchHit) {
            assert($searchHit instanceof SearchHit);

            /** @var \Ibexa\Contracts\Core\Repository\Values\Content\Content $content */
            $content = $searchHit->valueObject;
            $contentInfo = $content->versionInfo->contentInfo;

            $io->writeln(sprintf(
                '%d: %s',
                $contentInfo->id,
                $contentInfo->name
            ));
        }

        return self::SUCCESS;
    }
}


code_samples/api/public_php_api/src/Service/TaxonomyEmbeddingSearchService.php



docs/search/search_api.md@417:``` php
docs/search/search_api.md@418:// ...
docs/search/search_api.md@419:[[= include_file('code_samples/api/public_php_api/src/Service/TaxonomyEmbeddingSearchService.php', 10, 15) =]]
docs/search/search_api.md@420:// ...
docs/search/search_api.md@421:[[= include_file('code_samples/api/public_php_api/src/Service/TaxonomyEmbeddingSearchService.php', 16, 18) =]]
docs/search/search_api.md@422:// ...
docs/search/search_api.md@423:[[= include_file('code_samples/api/public_php_api/src/Service/TaxonomyEmbeddingSearchService.php', 25, 36) =]]
docs/search/search_api.md@424:```

// ...
use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\Content;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchResult;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;

// ...
final class TaxonomyEmbeddingSearchService
{

// ...
     * @return SearchResult<Content>
     */
    public function searchByEmbedding(array $vector): SearchResult
    {
        $query = EmbeddingQueryBuilder::create()
            ->withEmbedding(new TaxonomyEmbedding($vector))
            ->setLimit(10)
            ->setOffset(0)
            ->build();

        return $this->searchService->findContent($query);



sonarqubecloud bot commented Mar 3, 2026

mnocon requested a review from a team March 5, 2026 16:32
@@ -0,0 +1,77 @@
<?php

/**

Please remove the copyright notice; this code sample is meant to be used freely by others.


Embedding queries have been introduced primarily to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature, therefore embedding search integration is provided for `TaxonomyEmbedding`.
To use them in other scenarios that use a custom embedding value type, implement matching embedding
visitors for your search engine.

Wrong formatting


$io = new SymfonyStyle($input, $output);

// Example embedding vector.
// In a real-life scenario, generate it with an embedding provider

We should show people how to do this.

Can we just inject embeddingProviderResolver here, and do:

            $embeddingProvider = $this->embeddingProviderResolver->resolve();
            $embeddings = $embeddingProvider->getEmbedding('example_content');

(and add `private EmbeddingProviderResolverInterface $embeddingProviderResolver` to the constructor)
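
A rough sketch of how that suggestion could look when combined with the query builder from the sample. The class name `TextEmbeddingSearch`, the method name `findSimilar()`, the namespace of `EmbeddingProviderResolverInterface`, and the assumption that `getEmbedding()` returns a plain array of floats are illustrative guesses rather than confirmed API, so verify them against the actual contracts before using this:

``` php
<?php

declare(strict_types=1);

namespace App\Service;

use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchResult;
// Assumption: the resolver lives in the same namespace as EmbeddingProviderInterface.
use Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;

// Hypothetical class name, used only for illustration.
final class TextEmbeddingSearch
{
    public function __construct(
        private readonly SearchService $searchService,
        private readonly EmbeddingProviderResolverInterface $embeddingProviderResolver,
    ) {
    }

    public function findSimilar(string $text, int $limit = 10): SearchResult
    {
        // Resolve the configured provider and turn the input text into a vector.
        // Assumption: getEmbedding() returns an array of floats whose length
        // matches the dimensions of the configured model.
        $embeddingProvider = $this->embeddingProviderResolver->resolve();
        $vector = $embeddingProvider->getEmbedding($text);

        // Build the embedding query the same way the existing sample does.
        $query = EmbeddingQueryBuilder::create()
            ->withEmbedding(new TaxonomyEmbedding($vector))
            ->setLimit($limit)
            ->setOffset(0)
            ->build();

        return $this->searchService->findContent($query);
    }
}
```

This way the sample would no longer need a hard-coded vector, and the dimensions would follow whatever model the provider is configured with.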

*/
public function searchByEmbedding(array $vector): SearchResult
{
$query = EmbeddingQueryBuilder::create()

Is the only difference the missing content type filter?
I'd combine these two into one sample to make maintenance easier for us in the future.


- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): Represents a semantic similarity search request.
It encapsulates an [Embedding](#embedding) instance and supports filtering, pagination, aggregations, and result counting through the same API as standard content queries.
Embedding queries do not support criteria, Sort Clauses, facet builders, or spellcheck

I'd mention the search engines where they are available.

Solr 9, ES, but not Legacy Search?


Faceted search is removed in v5 and deprecated in v4.6.

Let's replace it with aggregations.

echo $customField->getType(); // "custom_embedding_type"
```

For more information, see [Embeddings reference](embeddings_reference.md).

I'd link to https://doc.ibexa.co/en/5.0/search/extensibility/index_custom_elasticsearch_data/ here (I don't see an example for Solr right now).

Once you have the field, you have to listen to the indexing event (ContentIndexCreateEvent) and actually attach it to the index

Models used to resolve embedding queries must be configured in [system configuration](configuration.md).
Each key defines the model's name, vector dimensionality, the field suffix used in the search index, and the embedding provider that generates vectors.

``` yaml

I'd mention here:

  1. The default_embedding_model property
  2. That the field_suffix must be unique for each model


It could be:
Models used to resolve embedding queries must be configured in system configuration. Each entry defines the model name, vector dimensionality, the field suffix used in the search index, and the embedding provider that generates vectors. The default model is selected through `default_embedding_model`. Each model must use a unique `field_suffix`, as it becomes part of the indexed field name.


- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings

- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers

Suggested change
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers or gets one by its identifier

@@ -0,0 +1,50 @@
---

Please link to the search_api page for an example, or just include the example here as well.

In general, there's a lot of duplicated content between these two pages (this one and search_api.md), and it's hard to say which one is the primary one - search api links here for more information, but it's search_api.md that contains more information 🤔

This one does not include:

  1. Description of EmbeddingQueryBuilder
  2. Examples

Do we need this page?

Comment on lines +386 to +387
- [EmbeddingQuery](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html):
Represents a semantic similarity search request.

Do we need the : ? This is not a pattern that was used in the doc previously, ever.

Suggested change
- [EmbeddingQuery](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html):
Represents a semantic similarity search request.
- [EmbeddingQuery](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html) represents a semantic similarity search request.
