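I need the Java API and the DSL for vector search in ES 7.16.
Use the cosine similarity built into ES.
Support searching with 1 to 6 images (each image already has a stored 512-dimensional vector).
The 1 to 6 images can take part in one search together with OR semantics, and the top N hits for each image must be returned separately, sorted by score in descending order.
A score threshold can be set so that only results at or above that score are returned.
The number of results (top N) can be set.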
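Before wiring up the client, it helps to know exactly which number ES produces. ES scores may not be negative, so the usual pattern adds 1.0 to the cosine similarity, which maps the range [-1, 1] onto [0, 2]. The sketch below is a plain-Java re-implementation with no Elasticsearch dependency. It computes what a script such as `cosineSimilarity(params.query_vector, 'image_vector') + 1.0` is expected to return, dot(a, b) / (|a| |b|) + 1.0, and the matching min_score for a cosine threshold. The class name `CosineScore` and the field name `image_vector` are hypothetical. Use it to sanity-check scores or pick a threshold, not as a replacement for the server-side function.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the score of "cosineSimilarity(...) + 1.0" in ES 7.16. */
public final class CosineScore {
    public static final int DIMS = 512; // dimension of the stored image vectors (from the requirements)

    private CosineScore() {}

    /** Cosine similarity dot(a, b) / (|a| * |b|), in [-1, 1]. */
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Score the script returns: cosine shifted by +1.0 into [0, 2]. */
    public static double esScore(float[] query, float[] doc) {
        return cosine(query, doc) + 1.0;
    }

    /** min_score value that keeps only hits with cosine >= cosineThreshold. */
    public static float minScoreFor(double cosineThreshold) {
        return (float) (cosineThreshold + 1.0);
    }

    public static void main(String[] args) {
        float[] q = new float[DIMS];
        float[] d = new float[DIMS];
        Arrays.fill(q, 1f);
        Arrays.fill(d, 1f);
        System.out.println(esScore(q, d));    // same direction: close to the maximum score of 2
        System.out.println(minScoreFor(0.8)); // threshold for cosine >= 0.8
    }
}
```

For example, keeping only images with a cosine similarity of at least 0.8 means passing min_score = 1.8 to the search.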
Reference: implementing vector search and the DSL query for ES 7.16 with the Java API:
import org.elasticsearch.action.search.MultiSearchRequest;
import org.elasticsearch.action.search.MultiSearchResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class VectorSearch {
    private final RestHighLevelClient client;

    public VectorSearch(RestHighLevelClient client) {
        this.client = client;
    }

    /**
     * Searches with 1 to 6 query images (OR semantics): one script_score search per image,
     * all sent together in a single multi-search request.
     *
     * @param field    dense_vector field holding the stored 512-dimensional vectors
     * @param vectors  query vectors, one per image
     * @param topN     number of hits to return per image
     * @param minScore threshold on cosine + 1.0 (e.g. 1.8 keeps cosine >= 0.8)
     * @return one response per image, in the same order as vectors; hits sorted by descending score
     */
    public List<SearchResponse> search(String index, String field, List<float[]> vectors,
                                       int topN, float minScore) throws IOException {
        if (vectors == null || vectors.isEmpty() || vectors.size() > 6) {
            throw new IllegalArgumentException("1 to 6 query vectors are required");
        }
        // Built-in cosine similarity; +1.0 because Elasticsearch rejects negative scores
        String source = "cosineSimilarity(params.query_vector, '" + field + "') + 1.0";

        MultiSearchRequest multiRequest = new MultiSearchRequest();
        for (float[] vector : vectors) {
            Map<String, Object> params = Collections.singletonMap("query_vector", toList(vector));
            Script script = new Script(ScriptType.INLINE, Script.DEFAULT_SCRIPT_LANG, source, params);
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
                    .query(QueryBuilders.scriptScoreQuery(QueryBuilders.matchAllQuery(), script))
                    .size(topN)          // top N for this image
                    .minScore(minScore)  // drop hits scoring below the threshold
                    .fetchSource(false)  // only ids and scores are needed
                    .timeout(TimeValue.timeValueSeconds(10));
            multiRequest.add(new SearchRequest(index).source(sourceBuilder));
        }

        MultiSearchResponse multiResponse = client.msearch(multiRequest, RequestOptions.DEFAULT);
        List<SearchResponse> results = new ArrayList<>();
        for (MultiSearchResponse.Item item : multiResponse.getResponses()) {
            if (item.isFailure()) {
                throw new IOException("Vector search failed", item.getFailure());
            }
            results.add(item.getResponse());
        }
        return results;
    }

    // Script params are sent as JSON, so pass the vector as a List of numbers
    private static List<Float> toList(float[] vector) {
        List<Float> list = new ArrayList<>(vector.length);
        for (float v : vector) {
            list.add(v);
        }
        return list;
    }
}
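GET /your_index_name/_search
{
  "size": topN,
  "min_score": minScore,
  "_source": false,
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'image_vector') + 1.0",
        "params": {
          "query_vector": [v1, v2, ..., v512]
        }
      }
    }
  }
}
Here `image_vector` stands for the dense_vector field that holds the stored image vectors (mapped as "type": "dense_vector", "dims": 512). `query_vector` carries the 512 values of one query image. Rename both to match your index. cosineSimilarity returns a value in [-1, 1], but Elasticsearch does not allow negative scores, so the script adds 1.0 and every score lies in [0, 2]. min_score is compared against this shifted score: to keep only hits with a cosine similarity of at least 0.8, set min_score to 1.8.

For 2 to 6 images, send one such body per image in a single _msearch request. Each entry in the response is that image's own top N, sorted by descending score. If you instead put several script_score clauses into one bool/should query, their scores would be added together into a single ranking.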
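With 2 to 6 images, the per-image searches can go to the `_msearch` endpoint in one round trip while each image keeps its own top N. `_msearch` expects newline-delimited JSON (NDJSON). For each image it takes a header line and then a body line that must fit on a single line, and the whole payload ends with a newline. The hypothetical helper `MsearchBody` below only assembles that text in plain Java. `your_index_name` and `image_vector` are placeholders. You would POST the result to `/_msearch` with `Content-Type: application/x-ndjson`. It is a sketch and does no JSON escaping of the index or field names.

```java
import java.util.List;

/** Hypothetical builder for an _msearch NDJSON payload: one script_score search per query image. */
public final class MsearchBody {
    private MsearchBody() {}

    public static String build(String index, String field, List<float[]> vectors,
                               int topN, double minScore) {
        if (vectors.isEmpty() || vectors.size() > 6) {
            throw new IllegalArgumentException("1 to 6 query vectors expected");
        }
        StringBuilder sb = new StringBuilder();
        for (float[] v : vectors) {
            // Header line: the index this search runs against
            sb.append("{\"index\":\"").append(index).append("\"}\n");
            // Body line: the whole search body on a single line
            sb.append("{\"size\":").append(topN)
              .append(",\"min_score\":").append(minScore)
              .append(",\"_source\":false")
              .append(",\"query\":{\"script_score\":{\"query\":{\"match_all\":{}},")
              .append("\"script\":{\"source\":\"cosineSimilarity(params.query_vector, '")
              .append(field).append("') + 1.0\",\"params\":{\"query_vector\":[");
            for (int i = 0; i < v.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(v[i]);
            }
            sb.append("]}}}}}\n"); // closes params, script, script_score, query, body
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<float[]> images = List.of(new float[] {0.1f, 0.2f}, new float[] {0.3f, 0.4f});
        System.out.print(build("your_index_name", "image_vector", images, 10, 1.8));
    }
}
```

The responses come back in the same order as the body lines, so the i-th response belongs to the i-th image.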
In Elasticsearch 7.16, the same cosine-similarity search can also be run through the Java High Level REST Client. Here is an example for a single query vector:
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class VectorSearchExample {
    private static final String INDEX_NAME = "your_index_name";
    private static final String VECTOR_FIELD_NAME = "your_vector_field_name";
    private static final int TOP_N = 10;

    public void runVectorSearch(RestHighLevelClient client, float[] targetVector) throws IOException {
        // Pass the query vector as a script parameter (a List serializes cleanly to JSON)
        List<Float> queryVector = new ArrayList<>(targetVector.length);
        for (float v : targetVector) {
            queryVector.add(v);
        }
        Map<String, Object> params = Collections.singletonMap("query_vector", queryVector);
        // Built-in cosineSimilarity; +1.0 because Elasticsearch does not allow negative scores
        Script script = new Script(ScriptType.INLINE, Script.DEFAULT_SCRIPT_LANG,
                "cosineSimilarity(params.query_vector, '" + VECTOR_FIELD_NAME + "') + 1.0", params);

        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.scriptScoreQuery(QueryBuilders.matchAllQuery(), script));
        sourceBuilder.fetchSource(false);
        sourceBuilder.size(TOP_N);
        sourceBuilder.timeout(new TimeValue(60_000));

        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        // Hits are already sorted by score, highest first
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            System.out.println("Document ID: " + hit.getId() + ", Score: " + hit.getScore());
        }
    }
}
Replace your_index_name with your actual index name and your_vector_field_name with the name of the dense_vector field that holds the vectors. The targetVector parameter is the query vector that is compared against the vectors stored in the index.
This example builds the request with SearchSourceBuilder and uses a script_score query that calls the built-in cosineSimilarity function, plus 1.0 because scores must not be negative. fetchSource(false) leaves each document's _source out of the response, which keeps the response small. size() sets the number of hits returned, and timeout() sets the search timeout. Finally, the loop over the hits prints each matching document's ID and score, highest score first.
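To make the behavior of size, min_score and the descending sort concrete, here is an in-memory model of one such search. In 7.16, dense_vector fields have no approximate-nearest-neighbor index, so a script_score query over match_all scores every document. The sketch does the same brute-force scan. It keeps hits whose score (cosine + 1.0) is at least min_score, sorts them highest first and cuts the list to top N. `LocalTopN` and the sample document ids and vectors are made up for illustration. Unlike Elasticsearch, it does not define how ties in score are ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one script_score cosine search with size and min_score. */
public final class LocalTopN {
    public static final class Hit {
        public final String id;
        public final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "=" + score;
        }
    }

    public static List<Hit> search(Map<String, float[]> docs, float[] query, int topN, double minScore) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, float[]> e : docs.entrySet()) {
            double score = cosine(query, e.getValue()) + 1.0; // same formula as the script
            if (score >= minScore) {                           // min_score drops lower scores
                hits.add(new Hit(e.getKey(), score));
            }
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()); // highest first
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size()))); // size = top N
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        Map<String, float[]> docs = new LinkedHashMap<>();
        docs.put("a", new float[] {1f, 0f});
        docs.put("b", new float[] {1f, 1f});
        docs.put("c", new float[] {0f, 1f});
        docs.put("d", new float[] {-1f, 0f});
        System.out.println(search(docs, new float[] {1f, 0f}, 2, 1.5));
    }
}
```

With query (1, 0) and min_score 1.5, document "a" scores 2.0 and "b" about 1.71, while "c" (1.0) and "d" (0.0) fall below the threshold. Raising the top N therefore cannot return more than two hits here.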
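Before running the code, make sure a RestHighLevelClient is configured and connected to your cluster. RestHighLevelClient has been deprecated since 7.15 in favor of the new Java API Client, but it still works with 7.16. You also need the corresponding Maven dependency, for example: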
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.16.3</version>
</dependency>
Adapt the parameters and logic in the code to your own requirements.