Here is the problem: for the search box on our site's home page I want to imitate Baidu, so that typing
"chelizi" → 车厘子
"cherry" → 车厘子
and the query then matches the keywords stored in ES, "车厘子" or "樱桃".
Is there an open-source jar or API that, given pinyin input, converts it into a list of candidate Chinese words, and leaves the input alone when it is actually an English word?
Try searching Baidu's developer portal for a related API; there should be one.
Keep in mind that one pinyin string can map to several different words: different tones and different contexts give different words. Try an open cloud lexicon, for example the word libraries that input methods use.
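The lexicon idea above can be sketched very simply: map full-pinyin strings to lists of candidate Chinese words, and when the input is not in the lexicon (e.g. plain English such as "cherry"), pass it through unchanged. The tiny dictionary here is a hypothetical stand-in for a real input-method word library, just to show the shape of the lookup:

```python
# Toy lexicon: pinyin string -> candidate Chinese words.
# A real system would load a large input-method word library instead.
PINYIN_LEXICON = {
    "chelizi": ["车厘子"],
    "yingtao": ["樱桃"],
    "che": ["车", "扯", "撤"],  # one pinyin, several candidate words
}

def candidates(query: str) -> list[str]:
    """Return Chinese candidates for a pinyin query, or the query
    itself when nothing matches (e.g. the input is English)."""
    return PINYIN_LEXICON.get(query.lower(), [query])

print(candidates("chelizi"))  # ['车厘子']
print(candidates("cherry"))   # ['cherry'] -- English falls through
```

This only handles exact full-pinyin matches; prefix matching, tones, and ranking among candidates are exactly the hard parts a real cloud lexicon or an ES pinyin analyzer would take care of.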
ES alone can meet your need; what you want is an analyzer, specifically a Chinese one. The author of the IK analyzer (medcl) also developed https://github.com/medcl/elasticsearch-analysis-pinyin, which lets pinyin queries find Chinese results.
You need to add settings like these when creating the index:
{
  "settings": {
    "refresh_interval": "2s",
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        },
        "pinyin_first_letter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 20,
          "lowercase": true
        },
        "pinyin_simple_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_separate_first_letter": true,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 20,
          "lowercase": true
        },
        "pinyin_full_filter": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "none_chinese_pinyin_tokenize": true,
          "keep_original": false,
          "limit_first_letter_length": 20,
          "lowercase": true
        }
      },
      "tokenizer": {
        "ik_smart": {
          "type": "ik",
          "use_smart": true
        }
      },
      "analyzer": {
        "ngramIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["edge_ngram_filter", "lowercase"]
        },
        "ngramSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        },
        "ikIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "ik"
        },
        "ikSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "ik"
        },
        "pinyinSimpleIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["pinyin_simple_filter", "edge_ngram_filter", "lowercase"]
        },
        "pinyinSimpleSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["pinyin_simple_filter", "lowercase"]
        },
        "jianpinIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["pinyin_first_letter", "edge_ngram_filter", "lowercase"]
        },
        "jianpinSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["pinyin_first_letter", "lowercase"]
        },
        "pinyinFullIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["pinyin_full_filter", "lowercase"]
        },
        "pinyinFullSearchAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["pinyin_full_filter", "lowercase"]
        }
      }
    }
  }
}
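Once an index has been created with these settings, you can sanity-check the pinyin analyzers with the `_analyze` API before wiring them into a mapping (the index name `goods` here is just an example):

```
GET /goods/_analyze
{
  "analyzer": "pinyinFullIndexAnalyzer",
  "text": "车厘子"
}
```

If the pinyin filter is working, the response should contain full-pinyin tokens for the characters of 车厘子, which is what lets a query like "chelizi" match the Chinese keyword at search time.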
My feeling is that this is either backed by a dictionary or by deep learning.
What you are describing is basically what an input method does: show Chinese words for pinyin input.
You could also search Gitee or GitHub for related projects.