在熊猫中重命名列

I have a DataFrame using pandas and column labels that I need to edit to replace the original column labels.

I'd like to change the column names in a DataFrame A where the original column names are:

['$a', '$b', '$c', '$d', '$e'] 

to

['a', 'b', 'c', 'd', 'e'].

I have the edited column names stored it in a list, but I don't know how to replace the column names.

转载于:https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

Use the df.rename() function and refer the columns to be renamed. Not all the columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

The rename method can take a function, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

Since you only want to remove the $ sign in all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

OR

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)
old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

This way you can manually edit the new_names as you wish. Works great when you need to rename only a few columns to correct mispellings, accents, remove special characters etc.

As documented in http://pandas.pydata.org/pandas-docs/stable/text.html:

df.columns = df.columns.str.replace('$','')
df.columns = ['a', 'b', 'c', 'd', 'e']

It will replace the existing names with the names you provide, in the order you provide.

Column names vs Names of Series

I would like to explain a bit what happens behind the scenes.

Dataframes are a set of Series.

Series in turn are an extension of a numpy.array

numpy.arrays have a property .name

This is the name of the series. It is seldom that pandas respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.

Naming the list of columns

A lot of answers here talks about the df.columns attribute being a list when in fact it is a Series. This means it has a .name attribute.

This is what happens if you decide to fill in the name of the columns Series:

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index always comes one column lower.

Artifacts that linger

The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.

If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'

BUT

pd.DataFrame(df.one) will return

    three
0       1
1       2
2       3

Because pandas reuses the .name of the already defined Series.

Multi level column names

Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don't see anyone picking up on this here.

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achievable by setting columns to lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

  • The rename method has added the axis parameter which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters but you are no longer forced to use them.
  • The set_axis method with the inplace set to False enables you to rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using rename with axis='columns' or axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts functions that will be applied to each column name.

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using set_axis with a list and inplace=False

You can supply a list to the set_axis method that is equal in length to the number of columns (or index). Currently, inplace defaults to True, but inplace will be defaulted to False in future releases.

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

# new for pandas 0.21+
df.some_method1()
  .some_method2()
  .set_axis()
  .some_method3()

# old way
df1 = df.some_method1()
        .some_method2()
df1.columns = columns
df1.some_method3()

I think this method is useful:

df.rename(columns={"old_column_name1":"new_column_name1", "old_column_name2":"new_column_name2"})

This method allows you to change column names individually.