I have a vector of numbers:
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
453,435,324,34,456,56,567,65,34,435)
How can I have R count the number of times a value x appears in the vector?
Reposted from: https://stackoverflow.com/questions/1923273/counting-the-number-of-elements-with-the-values-of-x-in-a-vector
You can just use table():
> a <- table(numbers)
> a
numbers
4 5 23 34 43 54 56 65 67 324 435 453 456 567 657
2 1 2 2 1 1 2 1 2 1 3 1 1 1 1
Then you can subset it:
> a[names(a)==435]
435
3
Or convert it into a data.frame if you're more comfortable working with that:
> as.data.frame(table(numbers))
numbers Freq
1 4 2
2 5 1
3 23 2
4 34 2
...
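One thing to watch for with the data.frame form: the numbers column comes back as a factor, so comparing it as a number needs a conversion first. A minimal sketch (the names counts and value are illustrative, not part of the answer above):

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
counts <- as.data.frame(table(numbers))
# The first column holds factor labels; go through character to recover the numbers
counts$value <- as.numeric(as.character(counts$numbers))
# Look up the frequency of a single value
counts[counts$value == 435, "Freq"]
```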
Here's one quick and dirty way:
x <- 23
length(subset(numbers, numbers==x))
I would probably do something like this:
length(which(numbers==x))
But really, a better way is:
table(numbers)
The most direct way is sum(numbers == x).
numbers == x creates a logical vector which is TRUE at every location where x occurs, and when summing, the logical vector is coerced to numeric, which converts TRUE to 1 and FALSE to 0.
However, note that for floating point numbers it's better to use something like: sum(abs(numbers - x) < 1e-6).
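To see why the tolerance matters, here is a small sketch with a made-up vector (vals is illustrative only). 0.1 + 0.2 is not stored as exactly 0.3, so the exact comparison misses it:

```r
vals <- c(0.1 + 0.2, 0.3, 0.3)
x <- 0.3
# Exact comparison: the first element differs from 0.3 in its last bits
sum(vals == x)
# Tolerance-based comparison: all three elements count as equal to x
sum(abs(vals - x) < 1e-6)
```

The tolerance 1e-6 is taken from the answer above; the right value depends on the scale of your data.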
There is a standard function in R for that, which applies when the values are positive integers:
tabulate(numbers)
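Note that tabulate returns one bin per integer from 1 up to the largest value, so the count for x sits at position x. A short sketch (the vector small is made up for illustration):

```r
small <- c(2, 3, 3, 5)
# One count per integer 1..5; values that never occur get 0
tabulate(small)

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
# The result has max(numbers) bins; index by the value itself
tabulate(numbers)[435]
```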
My preferred solution uses rle, which will return a value (the label, x in your example) and a length, which represents how many times that value appeared in sequence.
By combining rle with sort, you have an extremely fast way to count the number of times any value appeared. This can be helpful with more complex problems.
Example:
> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
> a <- rle(sort(numbers))
> a
Run Length Encoding
lengths: int [1:15] 2 1 2 2 1 1 2 1 2 1 ...
values : num [1:15] 4 5 23 34 43 54 56 65 67 324 ...
If the value you want doesn't show up, or you need to store that value for later, make a data.frame.
> b <- data.frame(number=a$values, n=a$lengths)
> b
number n
1 4 2
2 5 1
3 23 2
4 34 2
5 43 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
I find it is rare that I want to know the frequency of one value and not all of the values, and rle seems to be the quickest way to get the counts and store them all.
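The sort step matters because rle only measures runs of adjacent equal values. A brief sketch of the difference, plus one way to pull a single count out of the result (the lookup line is my own addition, not part of the answer above):

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
# Unsorted: only values that happen to sit next to each other form a run
rle(numbers)$lengths
# Sorted: equal values are adjacent, so run lengths are total counts
a <- rle(sort(numbers))
# Count for a single value
a$lengths[a$values == 435]
```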
There is also count(numbers) from the plyr package. Much more convenient than table in my opinion.
One more way I find convenient is:
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
(s<-summary (as.factor(numbers)))
This converts the dataset to a factor, and then summary() gives us the control totals (counts of the unique values).
Output is:
4 5 23 34 43 54 56 65 67 324 435 453 456 567 657
2 1 2 2 1 1 2 1 2 1 3 1 1 1 1
This can be stored as a data frame if preferred:
as.data.frame(cbind(Number = names(s), Freq = s), stringsAsFactors = FALSE, row.names = 1:length(s))
Here row.names has been used to rename the row names. Without row.names, the names in s are used as row names in the new data frame.
Output is:
Number Freq
1 4 2
2 5 1
3 23 2
4 34 2
5 43 1
6 54 1
7 56 2
8 65 1
9 67 2
10 324 1
11 435 3
12 453 1
13 456 1
14 567 1
15 657 1
Using table but without comparing with names:
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
x <- 67
numbertable <- table(numbers)
numbertable[as.character(x)]
#67
# 2
table is useful when you are using the counts of different elements several times. If you need only one count, use sum(numbers == x).
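One difference between the two approaches is worth keeping in mind: a value that never occurs has no entry in the table, whereas sum(numbers == x) simply gives 0. A minimal sketch of a lookup that falls back to 0 (count_of is a hypothetical helper name, not a base R function):

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
numbertable <- table(numbers)

# count_of is a hypothetical helper: look up by name, fall back to 0
count_of <- function(tab, x) {
  key <- as.character(x)
  if (key %in% names(tab)) as.integer(tab[key]) else 0L
}

count_of(numbertable, 67)    # value present in the table
count_of(numbertable, 999)   # value absent: falls back to 0
```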
If you want a running count of appearances (how many times each value has occurred so far), you can make use of the sapply function:
index<-sapply(1:length(numbers),function(x)sum(numbers[1:x]==numbers[x]))
cbind(numbers, index)
Output:
numbers index
[1,] 4 1
[2,] 23 1
[3,] 4 2
[4,] 23 2
[5,] 5 1
[6,] 43 1
[7,] 54 1
[8,] 56 1
[9,] 657 1
[10,] 67 1
[11,] 67 2
[12,] 435 1
[13,] 453 1
[14,] 435 2
[15,] 324 1
[16,] 34 1
[17,] 456 1
[18,] 56 2
[19,] 567 1
[20,] 65 1
[21,] 34 2
[22,] 435 3
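As an alternative sketch (a different technique from the sapply approach above, not the answerer's method), base R's ave can produce the same running count by applying seq_along within each group of equal values:

```r
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
# Running count as in the answer above: compare each element with the prefix before it
index_sapply <- sapply(seq_along(numbers),
                       function(i) sum(numbers[1:i] == numbers[i]))
# Same idea via ave: number the occurrences 1, 2, 3, ... within each distinct value
index_ave <- ave(numbers, numbers, FUN = seq_along)
# Both approaches agree element by element
all(index_sapply == index_ave)
```

The sapply version rescans the prefix for every element, so the grouped version may scale better on long vectors.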
You can change the number to whatever you wish in the following line:
length(which(numbers == 4))
> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
> length(grep(435, numbers))
[1] 3
> length(which(435 == numbers))
[1] 3
> require(plyr)
> df = count(numbers)
> df[df$x == 435, ]
x freq
11 435 3
> sum(435 == numbers)
[1] 3
> sum(grepl(435, numbers))
[1] 3
> tabulate(numbers)[435]
[1] 3
> table(numbers)['435']
435
3
> length(subset(numbers, numbers=='435'))
[1] 3
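A caution on the grep and grepl variants: they match 435 as a text pattern, so they also count elements that merely contain those digits. They agree with the other methods here only because no such element exists in numbers. A small sketch with a made-up vector (other is illustrative only):

```r
other <- c(435, 4350, 1435)
# Pattern match: every element contains the text "435"
length(grep(435, other))
# Exact comparison: only the first element equals 435
sum(other == 435)
# Anchoring the pattern restricts grepl to whole-value matches
sum(grepl("^435$", other))
```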