我想合并df1,df2两个表格。
df1 = pd.read_csv("https://raw.githubusercontent.com/uvacw/teaching-bdaca/main/12ec-course/week03/exercises/wozwaarde-clean.csv")
df1 = df1.melt(id_vars=['wijk', 'code', 'stadsdeel'],
var_name='year',
value_name='wozwaarde')
df1
wijk code stadsdeel year wozwaarde
0 Burgwallen-Oude Zijde A00 Centrum 2014 263417.0
1 Burgwallen-Nieuwe Zijde A01 Centrum 2014 267895.0
2 Grachtengordel-West A02 Centrum 2014 490251.0
3 Grachtengordel-Zuid A03 Centrum 2014 469946.0
... ... ... ... ... ...
486 Nellestein T95 Zuidoost 2018 170441.0
487 Holendrecht/Reigersbos T96 Zuidoost 2018 181189.0
488 Gein T97 Zuidoost 2018 199542.0
489 Driemond T98 Zuidoost 2018 267328.0
df2 = pd.read_excel("https://cms.onderzoek-en-statistiek.nl/uploads/2021_jaarboek_2112_28485510ff.xlsx",
skiprows=2,
skipfooter=2)
df2.drop([0,9], inplace=True)
df2['stadsdeel'] = df2['stadsdeel'].apply(lambda s: s.split()[-1])
df2
stadsdeel 2016 2017 2018 2019 2020 2021
1 Centrum 86499.0 86422.0 86851.0 86862.0 87310.0 87083.0
2 Westpoort 200.0 192.0 198.0 177.0 106.0 145.0
3 West 143964.0 143842.0 144210.0 145908.0 147421.0 148030.0
4 Nieuw-West 149397.0 151677.0 155781.0 157964.0 160124.0 159522.0
5 Zuid 143258.0 144432.0 145548.0 145966.0 146291.0 145301.0
6 Oost 132421.0 135767.0 137454.0 140300.0 142049.0 141453.0
7 Noord 92917.0 94766.0 96275.0 97200.0 99238.0 100887.0
8 Zuidoost 86057.0 87854.0 87999.0 88610.0 89841.0 90076.0
在merge之前,我想先将df2变成长数据。
df2 = df2.melt(id_vars = 'stadsdeel',
var_name = 'year',
value_name = 'population')
df2
stadsdeel year population
0 Centrum 2016 86499.0
1 Westpoort 2016 200.0
2 West 2016 143964.0
3 Nieuw-West 2016 149397.0
4 Zuid 2016 143258.0
5 Oost 2016 132421.0
同时,我增加了一部确保合并变量的数据类型相同。
df1['year'] = df1["year"].map(int)
df2['year'] = df2["year"].map(int)
df1['stadsdeel'] = df1['stadsdeel'].map(str)
df2['stadsdeel'] = df2['stadsdeel'].map(str)
最后,我尝试合并,但结果总是NaN。
df = df1.merge(df2, how = 'left', on = ['stadsdeel', 'year'])
df.head(20)
wijk code stadsdeel year wozwaarde population
0 Burgwallen-Oude Zijde A00 Centrum 2014 263417.0 NaN
1 Burgwallen-Nieuwe Zijde A01 Centrum 2014 267895.0 NaN
2 Grachtengordel-West A02 Centrum 2014 490251.0 NaN
3 Grachtengordel-Zuid A03 Centrum 2014 469946.0 NaN
我已经检查了变量名、变量类型(包括dtype)、去了首尾空格,但都没办法解决。希望有人能帮我看下还有什么问题,谢谢!