Text processor: (1) the TreeMap<String,Integer> data type, (2) the split method, (3) counting the words in a text and their frequencies
TreeMap<String,Integer>
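The key stores the word; the value counts how many times it occurs. Because a TreeMap keeps its keys sorted, the words come out in alphabetical order.
The input string is split into an array of strings with split.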
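A minimal sketch of how split behaves with a character class of delimiters. The class name SplitDemo and the helper tokens are hypothetical names used only for illustration. It shows that two adjacent delimiters (here a comma followed by a space) produce an empty string in the array, which is why empty tokens should be skipped before counting.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original program.
class SplitDemo {
    // Split on any single whitespace or punctuation character.
    static String[] tokens(String text) {
        return text.split("[ \n\t\r.,;:!?(){}]");
    }

    public static void main(String[] args) {
        // The comma and the following space are two separate delimiters,
        // so an empty string appears between them. Trailing empty strings
        // (after the final '!') are dropped by split.
        System.out.println(Arrays.toString(tokens("Hello, world!")));
        // prints [Hello, , world]
    }
}
```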
https://blog.csdn.net/qq_40639185/article/details/96035765
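package test;

import java.util.*;

public class project1 {
    public static void main(String[] args) {
        String text = "Good morning.Have a good class." + "Have a good visit.Have fun!";
        // TreeMap keeps the words sorted by key
        Map<String, Integer> map = new TreeMap<>();
        // Split on whitespace and punctuation
        String[] words = text.split("[ \n\t\r.,;:!?(){}]");
        for (int i = 0; i < words.length; i++) {
            // Count case-insensitively
            String key = words[i].toLowerCase();
            // Skip the empty strings produced by adjacent delimiters
            if (key.length() > 0) {
                if (!map.containsKey(key)) {
                    // First occurrence of this word
                    map.put(key, 1);
                } else {
                    // Word seen before: increment its count
                    int value = map.get(key);
                    value++;
                    map.put(key, value);
                }
            }
        }
        // Print each word and its count, separated by a tab
        Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
        for (Map.Entry<String, Integer> entry : entrySet)
            System.out.println(entry.getKey() + "\t" + entry.getValue());
    }
}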
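The containsKey / get / put sequence above can also be written with Map.merge. The following is only a sketch of that alternative, not the original author's version; the class name WordFrequency and the method count are hypothetical. It uses the same delimiter pattern and the same sample text.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical variant of the program above using Map.merge.
class WordFrequency {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("[ \n\t\r.,;:!?(){}]")) {
            if (token.isEmpty()) {
                continue; // adjacent delimiters yield empty tokens
            }
            // Insert 1 for a new word, otherwise add 1 to the stored count.
            counts.merge(token.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Good morning.Have a good class." + "Have a good visit.Have fun!";
        System.out.println(count(text));
        // prints {a=2, class=1, fun=1, good=3, have=3, morning=1, visit=1}
    }
}
```

merge does the lookup and the update in one call, so the explicit branch on containsKey is no longer needed.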
// Count the words in a file and find the most frequent one
final Path FILE = Paths.get("src", "test", "resources", "fileToCount.txt");
final int MIN = 5;
final int MAX = 10;
ConcurrentHashMap<String, Integer> words = new ConcurrentHashMap<>();
AIOFileReader.line(FILE)
        // Drop blank lines, then skip the first 14 remaining lines
        .filter(line -> !line.trim().isEmpty()).skip(14)
        // Split each line on spaces
        .flatMapMerge(line -> Query.of(line.split(" ")))
        // Keep only words longer than MIN and shorter than MAX characters
        .filter(word -> word.length() > MIN && word.length() < MAX)
        // Count the occurrences of each word
        .onNext((w, err) -> words.merge(w, 1, Integer::sum))
        // Block until the whole file has been counted
        .blockingSubscribe();
Map.Entry<String, ? extends Number> common = Collections.max(words.entrySet(),
        Comparator.comparingInt(e -> e.getValue().intValue()));
assertEquals("Hokosa", common.getKey());
assertEquals(183, common.getValue().intValue());
For details, see AIOFileReaderTest.
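For comparison, the same pipeline (drop blank lines, skip some lines, split on spaces, filter by length, count, take the maximum) can be sketched with only the JDK's java.util.stream API. This is a synchronous stand-in under stated assumptions, not the AIOFileReader implementation: the class FileWordCount, its methods count and mostCommon, and the sample lines are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical JDK-only sketch of the word-count pipeline.
class FileWordCount {
    // Counts words whose length is strictly between min and max,
    // after dropping blank lines and skipping the first `skip` remaining lines.
    static Map<String, Integer> count(Stream<String> lines, long skip, int min, int max) {
        Map<String, Integer> words = new HashMap<>();
        lines.filter(line -> !line.trim().isEmpty())
             .skip(skip)
             .flatMap(line -> Arrays.stream(line.split(" ")))
             .filter(word -> word.length() > min && word.length() < max)
             .forEach(word -> words.merge(word, 1, Integer::sum));
        return words;
    }

    // Returns the entry with the highest count.
    // Throws NoSuchElementException if the map is empty.
    static Map.Entry<String, Integer> mostCommon(Map<String, Integer> words) {
        return Collections.max(words.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the lines of a file.
        Stream<String> lines = Stream.of("header line", "", "alpha alpha beta", "gamma alpha");
        Map.Entry<String, Integer> common = mostCommon(count(lines, 1, 3, 10));
        System.out.println(common.getKey() + " " + common.getValue());
        // prints alpha 3
    }
}
```

To read a real file, the in-memory stream could be replaced with Files.lines(path) inside a try-with-resources block so the file handle is closed after counting.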