如何计算清单项的出现次数?

Given an item, how can I count its occurrences in a list in Python?

转载于:https://stackoverflow.com/questions/2600191/how-to-count-the-occurrences-of-a-list-item

If you only want one item's count, use the count method:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

Don't use this if you want to count multiple items. Calling count in a loop requires a separate pass over the list for every count call, which can be catastrophic for performance. If you want to count all items, or even just multiple items, use Counter, as explained in the other answers.

list.count(x) returns the number of times x appears in a list

see: http://docs.python.org/tutorial/datastructures.html#more-on-lists

If you are using Python 2.7 or 3 and you want number of occurrences for each element:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
    """
    :param items: iterable of hashable items to count
    :type items: iterable

    :returns: dict of counts like Py2.7 Counter
    :rtype: dict
    """
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


# Python >= 2.2 (generators)
def count_sorted_list_items(items):
    """
    :param items: sorted iterable of items to count
    :type items: sorted iterable

    :returns: generator of (item, count) tuples
    :rtype: generator
    """
    if not items:
        return
    elif len(items) == 1:
        yield (items[0], 1)
        return
    prev_item = items[0]
    count = 1
    for item in items[1:]:
        if prev_item == item:
            count += 1
        else:
            yield (prev_item, count)
            count = 1
            prev_item = item
    yield (item, count)
    return


import unittest
class TestListCounters(unittest.TestCase):
    def test_count_unsorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = count_unsorted_list_items(inp) 
            print inp, exp_outp, counts
            self.assertEqual(counts, dict( exp_outp ))

        inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )


    def test_count_sorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = list( count_sorted_list_items(inp) )
            print inp, exp_outp, counts
            self.assertEqual(counts, exp_outp)

        inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
        # ... [(2,2), (4,1), (2,1)]

To count the number of diverse elements having a common type:

li = ['A0','c5','A8','A2','A5','c2','A3','A9']

print sum(1 for el in li if el[0]=='A' and el[1] in '01234')

gives

3 , not 6

Another way to get the number of occurrences of each item, in a dictionary:

dict((i, a.count(i)) for i in a)

I had this problem today and rolled my own solution before I thought to check SO. This:

dict((i,a.count(i)) for i in a)

is really, really slow for large lists. My solution

def occurDict(items):
    d = {}
    for i in items:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
return d

is actually a bit faster than the Counter solution, at least for Python 2.7.

If you want to count all values at once you can do it very fast using numpy arrays and bincount as follows

import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)

which gives

>>> array([0, 3, 1, 1, 2])

Counting the occurrences of one item in a list

For counting the occurrences of just one list item you can use count()

>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2

Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.

Counting all items with count()

To count the occurrences of items in l one can simply use a list comprehension and the count() method

[[x,l.count(x)] for x in set(l)]

(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)))

Example:

>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}

Counting all items with Counter()

Alternatively, there's the faster Counter class from the collections library

Counter(l)

Example:

>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})

How much faster is Counter?

I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.

Here is the script I used:

from __future__ import print_function
import timeit

t1=timeit.Timer('Counter(l)', \
                'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count():   ", t2.repeat(repeat=3,number=10000)

And the output:

Counter():  [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count():    [7.779430688009597, 7.962715800967999, 8.420845870045014]

Given an item, how can I count its occurrences in a list in Python?

Here's an example list:

>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']

list.count

There's the list.count method

>>> l.count('b')
4

This works fine for any list. Tuples have this method as well:

>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6

collections.Counter

And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.

Usage:

>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4

Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.

Further usage of collections.Counter

You can add or subtract with iterables from your counter:

>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4

And you can do multi-set operations with the counter as well:

>>> c2 = Counter(list('aabbxyz'))
>>> c - c2                   # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2                   # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2                   # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2                   # set intersection
Counter({'a': 2, 'b': 2})

Why not pandas?

Another answer suggests:

Why not use pandas?

Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.

There are builtin solutions for this use-case in the list object itself as well as in the standard library.

If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.

You can also use countOf method of a built-in module operator.

>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3

Why not using Pandas?

import pandas as pd

l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']

# converting the list to a Series and counting the values
my_count = pd.Series(l).value_counts()
my_count

Output:

a    3
d    2
b    1
c    1
dtype: int64

If you are looking for a count of a particular element, say a, try:

my_count['a']

Output:

3
from collections import Counter
country=['Uruguay', 'Mexico', 'Uruguay', 'France', 'Mexico']
count_country = Counter(country)
output_list= [] 

for i in count_country:
    output_list.append([i,count_country[i]])
print output_list

Output list:

[['Mexico', 2], ['France', 1], ['Uruguay', 2]]
sum([1 for elem in <yourlist> if elem==<your_value>])

This will return the amount of occurences of your_value

I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).

Counting one item

For large enough arrays, it turns out that

numpy.sum(numpy.array(a) == 1) 

is slightly faster than the other solutions.

enter image description here

Counting all items

As established before,

numpy.bincount(a)

is what you want.

enter image description here


Code to reproduce the plots:

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

2.

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

If you can use pandas, then value_counts is there for rescue.

>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1    3
4    2
3    1
2    1
dtype: int64

It automatically sorts the result based on frequency as well.

If you want the result to be in a list of list, do as below

>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]

May not be the most efficient, requires an extra pass to remove duplicates.

Functional implementation :

arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x  : (x , list(arr).count(x)) , arr)))

returns :

{('c', 1), ('b', 3), ('a', 2)}

or return as dict :

print(dict(map(lambda x  : (x , list(arr).count(x)) , arr)))

returns :

{'b': 3, 'c': 1, 'a': 2}

if you want a number of occurrences for the particular element:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> single_occurrences = Counter(z)
>>> print(single_occurrences.get("blue"))
3
>>> print(single_occurrences.values())
dict_values([3, 2, 1])
def countfrequncyinarray(arr1):
    r=len(arr1)
    return {i:arr1.count(i) for i in range(1,r+1)}
arr1=[4,4,4,4]
a=countfrequncyinarray(arr1)
print(a)

It was suggested to use numpy's bincount, however it works only for 1d arrays with non-negative integers. Also, the resulting array might be confusing (it contains the occurrences of the integers from min to max of the original list, and sets to 0 the missing integers).

A better way to do it with numpy is to use the unique function with the attribute return_counts set to True. It returns a tuple with an array of the unique values and an array of the occurrences of each unique value.

# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True)  # array([0, 1, 2, 3]), array([2, 3, 1, 2]

and then we can pair them as

dict(zip(a_uniq, counts))  # {0: 2, 1: 3, 2: 1, 3: 2}

It also works with other data types and "2d lists", e.g.

>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}

Count of all elements with itertools.groupby()

Antoher possiblity for getting the count of all elements in the list could be by means of itertools.groupby().

With "duplicate" counts

from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']  # Input list

counts = [(i, len(list(c))) for i,c in groupby(L)]      # Create value-count pairs as list of tuples 
print(counts)

Returns

[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]

Notice how it combined the first three a's as the first group, while other groups of a are present further down the list. This happens because the input list L was not sorted. This can be a benefit sometimes if the groups should in fact be separate.

With unique counts

If unique group counts are desired, just sort the input list:

counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)

Returns

[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]