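How to count how many times each word occurs in a text and print the result in dictionary order
The main logic is as follows:
1. Open the file and read its entire contents.
2. Split the string on spaces and store the pieces in a vector. While splitting, check whether each piece ends with . or , and if it does, remove that trailing character.
3. Iterate over the vector and insert each word and its count into a map. The logic is as follows: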
vector<string> vv;            // words produced by splitting the text
map<string, int> mapString;   // word -> number of occurrences
for (size_t i = 0; i < vv.size(); i++)
{
    map<string, int>::iterator it = mapString.find(vv.at(i));
    if (it == mapString.end())
    {
        // first occurrence of this word
        mapString.insert(pair<string, int>(vv.at(i), 1));
    }
    else
    {
        // word already present: increment its count in place
        ++it->second;
    }
}
4. Because map keeps its keys sorted, iterating over it directly produces the required output.
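I copied a long passage from the web to use as test text.
First convert the whole text to lowercase, so that the same word is not counted as different words just because of differences in case.
Create a word counter word_count of type map<string, int>.
Then scan the text from start to finish, skipping non-letter characters. This keeps punctuation from being treated as part of a word: for example, in Tom said: "I like chocolate". neither "I nor chocolate" is taken as a word. Each run of consecutive letters is extracted as one word and its entry in word_count is incremented by 1. Repeat until the end of the text.
Print the words and their counts in dictionary order.
The code and test results are as follows: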
#include <iostream>
#include <map>
#include <string>
#include <cctype>
#include <algorithm>
using namespace std;
// Count every maximal run of alphabetic characters in text as one word.
void count_words(const string& text, map<string, int>& word_count) {
    const char* p = text.data();
    const char* t = p + text.length();
    while (p < t) {
        // skip non-alphabetic characters
        // (cast to unsigned char: passing a negative char to isalpha is undefined)
        while (p < t && !isalpha(static_cast<unsigned char>(*p)))
            ++p;
        // collect the word starting at position p
        const char* q = p;
        while (q < t && isalpha(static_cast<unsigned char>(*q)))
            ++q;
        if (q > p) {
            string word(p, q - p);
            ++word_count[word];
            p = q;
        }
    }
}
int main()
{
string text("\
C++ Tutorial\
\
C++ is a middle-level programming language developed by Bjarne Stroustrup starting in 1979 at Bell Labs. C++ runs on a variety of platforms, such as Windows, Mac OS, and the various versions of UNIX. This C++ tutorial adopts a simple and practical approach to describe the concepts of C++ for beginners to advanded software engineers.\
\
Why to Learn C++\
C++ is a MUST for students and working professionals to become a great Software Engineer. I will list down some of the key advantages of learning C++:\
\
C++ is very close to hardware, so you get a chance to work at a low level which gives you lot of control in terms of memory management, better performance and finally a robust software development.\
\
C++ programming gives you a clear understanding about Object Oriented Programming. You will understand low level implementation of polymorphism when you will implement virtual tables and virtual table pointers, or dynamic type identification.\
\
C++ is one of the every green programming languages and loved by millions of software developers. If you are a great C++ programmer then you will never sit without work and more importantly you will get highly paid for your work.\
\
C++ is the most widely used programming languages in application and system programming. So you can choose your area of interest of software development.\
\
C++ really teaches you the difference between compiler, linker and loader, different data types, storage classes, variable types their scopes etc.\
\
There are 1000s of good reasons to learn C++ Programming. But one thing for sure, to learn any programming language, not only C++, you just need to code, and code and finally code until you become expert.\
\
Hello World using C++\
Just to give you a little excitement about C++ programming, I'm going to give you a small conventional C++ Hello World program, You can try it using Demo link\
\
C++ is a super set of C programming with additional implementation of object-oriented concepts.\
\
Live Demo\
#include <iostream>\
using namespace std;\
\
// main() is where program execution begins.\
int main() {\
cout << \"Hello World\"; // prints Hello World\
return 0;\
}\
There are many C++ compilers available which you can use to compile and run above mentioned program:\
\
Apple C++. Xcode\
\
Bloodshed Dev-C++\
\
Clang C++\
\
Cygwin (GNU C++)\
\
Mentor Graphics\
\
MINGW - \"Minimalist GNU for Windows\"\
\
GNU CC source\
\
IBM C++\
\
Intel C++\
\
Microsoft Visual C++\
\
Oracle C++\
\
HP C++\
\
It is really impossible to give a complete list of all the available compilers. The C++ world is just too large and too much new is happening.\
\
Applications of C++ Programming\
As mentioned before, C++ is one of the most widely used programming languages. It has it's presence in almost every area of software development. I'm going to list few of them here:\
\
Application Software Development - C++ programming has been used in developing almost all the major Operating Systems like Windows, Mac OSX and Linux. Apart from the operating systems, the core part of many browsers like Mozilla Firefox and Chrome have been written using C++. C++ also has been used in developing the most popular database system called MySQL.\
\
Programming Languages Development - C++ has been used extensively in developing new programming languages like C#, Java, JavaScript, Perl, UNIX’s C Shell, PHP and Python, and Verilog etc.\
\
Computation Programming - C++ is the best friends of scientists because of fast speed and computational efficiencies.\
\
Games Development - C++ is extremely fast which allows programmers to do procedural programming for CPU intensive functions and provides greater control over hardware, because of which it has been widely used in development of gaming engines.\
\
Embedded System - C++ is being heavily used in developing Medical and Engineering Applications like softwares for MRI machines, high-end CAD/CAM systems etc.\
\
This list goes on, there are various areas where software developers are happily using C++ to provide great softwares. I highly recommend you to learn C++ and contribute great softwares to the community.\
");
    // Convert the text to lower case
    for_each(text.begin(), text.end(), [](char& c) {
        c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
    });
    // collect and count words from text
    map<string, int> word_count;
    count_words(text, word_count);
    // map iterates in ascending key order, so the output is already sorted
    for (const auto& p : word_count)
        cout << p.first << ": " << p.second << endl;
    return 0;
}
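// Output (the entries graphicsmingw, linkc, programmingas, sourceibm, tutorialc and xcodebloodshed are two words fused together: the backslash line continuations in the string literal join adjacent lines without inserting a newline or space):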
a: 14
about: 2
above: 1
additional: 1
adopts: 1
advanded: 1
advantages: 1
all: 2
allows: 1
almost: 2
also: 1
and: 21
any: 1
apart: 1
apple: 1
application: 2
applications: 2
approach: 1
are: 5
area: 2
areas: 1
as: 1
at: 2
available: 2
because: 2
become: 2
been: 5
before: 1
beginners: 1
begins: 1
being: 1
bell: 1
best: 1
better: 1
between: 1
bjarne: 1
browsers: 1
but: 1
by: 2
c: 43
cad: 1
called: 1
cam: 1
can: 3
cc: 1
chance: 1
choose: 1
chrome: 1
clang: 1
classes: 1
clear: 1
close: 1
code: 3
community: 1
compile: 1
compiler: 1
compilers: 2
complete: 1
computation: 1
computational: 1
concepts: 2
contribute: 1
control: 2
conventional: 1
core: 1
cout: 1
cpu: 1
cygwin: 1
data: 1
database: 1
demo: 2
describe: 1
dev: 1
developed: 1
developers: 2
developing: 4
development: 7
difference: 1
different: 1
do: 1
down: 1
dynamic: 1
efficiencies: 1
embedded: 1
end: 1
engineer: 1
engineering: 1
engineers: 1
engines: 1
etc: 3
every: 2
excitement: 1
execution: 1
expert: 1
extensively: 1
extremely: 1
fast: 2
few: 1
finally: 2
firefox: 1
for: 7
friends: 1
from: 1
functions: 1
games: 1
gaming: 1
get: 2
give: 3
gives: 2
gnu: 3
goes: 1
going: 2
good: 1
graphicsmingw: 1
great: 4
greater: 1
green: 1
happening: 1
happily: 1
hardware: 2
has: 5
have: 1
heavily: 1
hello: 4
here: 1
high: 1
highly: 2
hp: 1
i: 4
identification: 1
if: 1
implement: 1
implementation: 2
importantly: 1
impossible: 1
in: 9
include: 1
int: 1
intel: 1
intensive: 1
interest: 1
iostream: 1
is: 14
it: 5
java: 1
javascript: 1
just: 3
key: 1
labs: 1
language: 2
languages: 5
large: 1
learn: 4
learning: 1
level: 3
like: 4
linkc: 1
linker: 1
linux: 1
list: 4
little: 1
live: 1
loader: 1
lot: 1
loved: 1
low: 2
m: 2
mac: 2
machines: 1
main: 2
major: 1
management: 1
many: 2
medical: 1
memory: 1
mentioned: 2
mentor: 1
microsoft: 1
middle: 1
millions: 1
minimalist: 1
more: 1
most: 3
mozilla: 1
mri: 1
much: 1
must: 1
mysql: 1
namespace: 1
need: 1
never: 1
new: 2
not: 1
object: 2
of: 25
on: 2
one: 3
only: 1
operating: 2
or: 1
oracle: 1
oriented: 2
os: 1
osx: 1
over: 1
paid: 1
part: 1
performance: 1
perl: 1
php: 1
platforms: 1
pointers: 1
polymorphism: 1
popular: 1
practical: 1
presence: 1
prints: 1
procedural: 1
professionals: 1
program: 3
programmer: 1
programmers: 1
programming: 16
programmingas: 1
provide: 1
provides: 1
python: 1
really: 2
reasons: 1
recommend: 1
return: 1
robust: 1
run: 1
runs: 1
s: 3
scientists: 1
scopes: 1
set: 1
shell: 1
simple: 1
sit: 1
small: 1
so: 2
software: 8
softwares: 3
some: 1
sourceibm: 1
speed: 1
starting: 1
std: 1
storage: 1
stroustrup: 1
students: 1
such: 1
super: 1
sure: 1
system: 3
systems: 3
table: 1
tables: 1
teaches: 1
terms: 1
the: 15
their: 1
them: 1
then: 1
there: 3
thing: 1
this: 2
to: 18
too: 2
try: 1
tutorial: 1
tutorialc: 1
type: 1
types: 2
understand: 1
understanding: 1
unix: 2
until: 1
use: 1
used: 7
using: 5
variable: 1
variety: 1
various: 2
verilog: 1
versions: 1
very: 1
virtual: 2
visual: 1
when: 1
where: 2
which: 4
why: 1
widely: 3
will: 5
windows: 3
with: 1
without: 1
work: 3
working: 1
world: 5
written: 1
xcodebloodshed: 1
you: 17
your: 2
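Fused entries such as tutorialc and xcodebloodshed in the output above come from how the test text is written in the source, not from the counting logic. As a small sketch of the difference (the variable names spliced, with_newlines and raw are made up for illustration, and the raw string literal assumes a C++11 compiler):

```cpp
#include <string>

// A backslash at the very end of a line splices it onto the next line before
// the literal is parsed, so no newline or space ends up between the words.
const std::string spliced("C++ Tutorial\
C++ is a language");

// Ending each line with \n\ puts a real newline character into the string.
const std::string with_newlines("C++ Tutorial\n\
C++ is a language");

// A raw string literal keeps the line breaks exactly as typed.
const std::string raw = R"(C++ Tutorial
C++ is a language)";
```

Here spliced contains "TutorialC++" with nothing in between, which the word scanner sees as the single word tutorialc after lowercasing, while with_newlines and raw both keep Tutorial and C as separate words.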