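Elasticsearch 7.16 vector search API in Java

I need the Java API and the DSL statement for vector search on ES 7.16.
Use the cosine similarity built into ES.
Support searching with 1 to 6 image vectors (the vectors already exist and have 512 dimensions).
The 1 to 6 images can take part in a search at the same time with an OR relationship, and each one must return its own top N in descending order of score.
A score threshold can be set so that only results at or above it are returned.
The top N results can be taken.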

For reference, here is how to implement vector search on ES 7.16 with the Java API, along with the DSL statement:

  1. Vector search API
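import org.elasticsearch.action.search.MultiSearchRequest;
import org.elasticsearch.action.search.MultiSearchResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
// In releases from before this class moved to the core package,
// import org.elasticsearch.common.unit.TimeValue instead.
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class VectorSearch {
    private final RestHighLevelClient client;

    public VectorSearch(RestHighLevelClient client) {
        this.client = client;
    }

    // Builds one script_score request that ranks documents by cosine similarity to a single query vector.
    private SearchRequest buildRequest(String index, String field, float[] vector, int topN, float minScore) {
        // Script params are serialized as JSON, so pass the vector as a list.
        List<Float> queryVector = new ArrayList<>(vector.length);
        for (float v : vector) {
            queryVector.add(v);
        }
        Map<String, Object> params = Collections.singletonMap("query_vector", queryVector);

        // cosineSimilarity returns a value in [-1, 1]; 1.0 is added because Elasticsearch rejects negative scores.
        Script script = new Script(ScriptType.INLINE, "painless",
                "cosineSimilarity(params.query_vector, '" + field + "') + 1.0", params);

        SearchSourceBuilder source = new SearchSourceBuilder()
                .query(QueryBuilders.scriptScoreQuery(QueryBuilders.matchAllQuery(), script))
                .size(topN)            // hits come back sorted by score, highest first
                .minScore(minScore)    // threshold on the shifted [0, 2] scale
                .timeout(TimeValue.timeValueSeconds(10));
        return new SearchRequest(index).source(source);
    }

    // Searches with a single image vector.
    public SearchResponse search(String index, String field, float[] vector, int topN, float minScore) throws IOException {
        SearchResponse searchResponse = client.search(buildRequest(index, field, vector, topN, minScore), RequestOptions.DEFAULT);
        if (searchResponse.status() != RestStatus.OK) {
            throw new RuntimeException("Failed to execute search");
        }
        return searchResponse;
    }

    // Searches with 1 to 6 image vectors in one round trip. Each vector gets its own
    // request, so every image returns its own top N (OR relationship between images).
    public MultiSearchResponse searchEach(String index, String field, List<float[]> vectors, int topN, float minScore) throws IOException {
        if (vectors.isEmpty() || vectors.size() > 6) {
            throw new IllegalArgumentException("Expected 1 to 6 vectors, got " + vectors.size());
        }
        MultiSearchRequest multiSearchRequest = new MultiSearchRequest();
        for (float[] vector : vectors) {
            multiSearchRequest.add(buildRequest(index, field, vector, topN, minScore));
        }
        // Responses are returned in the same order as the vectors; check isFailure() on each item.
        return client.msearch(multiSearchRequest, RequestOptions.DEFAULT);
    }
}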
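  2. DSL statement
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'field1') + 1.0",
        "params": {
          "query_vector": [0.1, 0.2, ...]
        }
      }
    }
  },
  "_source" : false,
  "size" : topN,
  "min_score" : minScore
}

Here, field1 is the dense_vector field that stores the image vector, and query_vector is the 512-dimensional vector of the query image; change both to match your mapping and data. cosineSimilarity returns a value between -1 and 1, and Elasticsearch does not accept negative scores, so 1.0 is added. Scores therefore fall between 0 and 2, and min_score has to be given on that same shifted scale. Hits are sorted by score in descending order, so size gives the top N. To search with several images in an OR relationship and get a separate top N for each, send one such query per image, for example together in a single _msearch request.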
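To make the "one query per image" idea concrete, here is a minimal sketch that builds the newline-delimited body for the _msearch endpoint using only the JDK. The class name MsearchBodyBuilder and its method names are made up for illustration, and the sketch assumes that the index and field names are plain identifiers that need no JSON escaping and that all vector components are finite numbers.

```java
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch library.
final class MsearchBodyBuilder {
    static final int MAX_IMAGES = 6;   // limit taken from the question
    static final int DIMS = 512;       // vector size taken from the question

    // Returns an NDJSON body: a header line and a query line for every vector.
    static String build(String index, String field, List<float[]> vectors, int topN, float minScore) {
        if (vectors.isEmpty() || vectors.size() > MAX_IMAGES) {
            throw new IllegalArgumentException("Expected 1 to " + MAX_IMAGES + " vectors");
        }
        StringBuilder body = new StringBuilder();
        for (float[] vector : vectors) {
            if (vector.length != DIMS) {
                throw new IllegalArgumentException("Expected " + DIMS + " dimensions, got " + vector.length);
            }
            // Header line: which index this sub-search runs against.
            body.append("{\"index\":\"").append(index).append("\"}\n");
            // Query line: script_score with cosine similarity shifted by 1.0 to stay non-negative.
            body.append("{\"query\":{\"script_score\":{\"query\":{\"match_all\":{}},")
                .append("\"script\":{\"source\":\"cosineSimilarity(params.query_vector, '")
                .append(field).append("') + 1.0\",")
                .append("\"params\":{\"query_vector\":").append(toJsonArray(vector)).append("}}}},")
                .append("\"_source\":false,\"size\":").append(topN)
                .append(",\"min_score\":").append(minScore).append("}\n");
        }
        return body.toString();
    }

    // Renders a float array as a JSON array of numbers.
    static String toJsonArray(float[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.append(']').toString();
    }
}
```

The resulting string can be sent as the body of a POST to _msearch with the content type application/x-ndjson. The entries in the response's responses array come back in the same order as the vectors, so each image keeps its own top N.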

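In Elasticsearch 7.16, vector search with cosine similarity can be done from the Java API through a script_score query. The following sample code shows one way to do it:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
// In releases from before this class moved to the core package,
// import org.elasticsearch.common.unit.TimeValue instead.
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class VectorSearchExample {
    private static final String INDEX_NAME = "your_index_name";
    private static final String VECTOR_FIELD_NAME = "your_vector_field_name";
    private static final int TOP_N = 10;

    public void runVectorSearch(RestHighLevelClient client, float[] targetVector) throws IOException {
        // Script params are serialized as JSON, so pass the vector as a list.
        List<Float> queryVector = new ArrayList<>(targetVector.length);
        for (float v : targetVector) {
            queryVector.add(v);
        }
        Map<String, Object> params = Collections.singletonMap("query_vector", queryVector);

        // Cosine similarity is in [-1, 1]; add 1.0 so the score is never negative.
        Script script = new Script(ScriptType.INLINE, "painless",
                "cosineSimilarity(params.query_vector, '" + VECTOR_FIELD_NAME + "') + 1.0", params);

        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        sourceBuilder.query(QueryBuilders.scriptScoreQuery(QueryBuilders.matchAllQuery(), script));
        sourceBuilder.minScore(0f);          // raise this to filter out low-scoring hits
        sourceBuilder.fetchSource(false);
        sourceBuilder.size(TOP_N);
        sourceBuilder.timeout(new TimeValue(60_000));

        searchRequest.source(sourceBuilder);

        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

        // Hits are already sorted by score in descending order.
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            float score = hit.getScore();
            String id = hit.getId();
            System.out.println("Document ID: " + id + ", Score: " + score);
        }
    }
}

Replace your_index_name with the name of your index and your_vector_field_name with the name of the dense_vector field that holds the vectors. The targetVector parameter is the vector you want to compare against the vectors in the index.

This sample uses SearchSourceBuilder to build the search and a script_score query to compute the cosine similarity score. Because 1.0 is added to the similarity, scores range from 0 to 2. It also calls fetchSource(false) so that the documents' source is not returned, which reduces the size of the response. The size() method sets how many results are returned, minScore() sets the score threshold, and timeout() sets the timeout of the search request. Finally, the loop over the search hits reads the score and ID of each matching document.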
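When choosing a score threshold, it can help to check what the scores mean outside Elasticsearch. The sketch below computes cosine similarity in plain Java and converts a threshold on the raw cosine into a min_score value. It assumes the query script has the common form cosineSimilarity(...) + 1.0, which keeps scores non-negative; the class name CosineScore and its method names are made up for illustration.

```java
// Hypothetical helper for reasoning about scores; not an Elasticsearch API.
final class CosineScore {

    // Cosine similarity of two vectors of equal length, in [-1, 1].
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps a threshold on the raw cosine to the shifted score scale,
    // assuming the script adds 1.0 to the similarity.
    static float toMinScore(double cosineThreshold) {
        if (cosineThreshold < -1.0 || cosineThreshold > 1.0) {
            throw new IllegalArgumentException("Cosine threshold must be in [-1, 1]");
        }
        return (float) (cosineThreshold + 1.0);
    }
}
```

For example, under that assumption, keeping only images whose cosine similarity is at least 0.8 means passing CosineScore.toMinScore(0.8), which is 1.8, as the minimum score of the search.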

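Before running this code, make sure that the Elasticsearch REST high-level client, RestHighLevelClient, is configured and connected. You also need to add the corresponding Maven dependency, for example: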


<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.16.3</version>
</dependency>

Adjust the parameters and logic in the code to fit your needs.
