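I am trying to scrape fund net asset values (NAV) from 私募排排网 (simuwang.com).
First, the exported data gets overwritten.
Second, how do I iterate over the product names and export them as well?
The code is as follows: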
import requests
from lxml import etree
import json
import pandas as pd
import csv
import openpyxl
main_url="https://dc.simuwang.com/"
headers={
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'
}
response=requests.get(url=main_url,headers=headers)
# response.encoding='gbk'
page_text=response.text
tree=etree.HTML(page_text)
id_list=[]
all_data_list=[]
divs = tree.xpath('//*[@id="tab-1-1"]//div/@name')# product ids
title=tree.xpath('//*[@id="tab-1-1"]//a/@title')# product names
for i in divs:
    headers={
        'Referer': 'https://dc.simuwang.com/',
        'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'
,'cookie':'focus-certification-pop=-1; Hm_lvt_c3f6328a1a952e922e996c667234cdae=1618189142,1618297537,1618310809,1619398821; http_tK_cache=b16cd5c24912be382d0d1ed2818ff61052d56859; cur_ck_time=1619399295; ck_request_key=4Wy8CIVn%2BwKK6tfR%2BRMS8WGfTH7vwsoY6AgOYRTg7iA%3D; passport=773283%09u3457744307362%09AAgMAA1RXAxSUFQFUQ9dAgAHA1cBVlsPCQdTBFRSVFA%3D00ba1ca13f; rz_rem_u_p=6H3dlc4C3f9fGhRVuYbzlrukhq0oUgH2zH4ETGpdofQ%3D%24UuKkSpvRl1u49BZ%2FZQTXr4ewhEWnlkLkc%2FX722YcrJE%3D; certification=1; qualified_investor=1; evaluation_result=3; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22773283%22%2C%22first_id%22%3A%22773283%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_utm_source%22%3A%2210003%22%2C%22%24latest_utm_medium%22%3A%22cpc%22%2C%22%24latest_utm_campaign%22%3A%22S-%E5%93%81%E7%89%8C%E8%AE%A1%E5%88%92-B-PC%22%2C%22%24latest_utm_content%22%3A%22%E5%93%81%E4%B8%93%E8%AF%8D%22%2C%22%24latest_utm_term%22%3A%22%E7%A7%81%E5%8B%9F%E6%8E%92%E6%8E%92%E7%BD%91%22%2C%22_latest_utm_sign%22%3A%22baidu%22%2C%22_latest_utm_platform%22%3A%22pc%22%7D%2C%22%24device_id%22%3A%221786c594b4a64b-04021956498512-1633685c-1296000-1786c594b4b8bd%22%7D; smppw_tz_auth=1; Hm_lpvt_c3f6328a1a952e922e996c667234cdae=1619399323'
    }
    #
    url='https://ppwapi.simuwang.com/chart/fundNavTrend'
    data={
        'fund_id': i,
        'index_type': 0,
        'period': 12,
        'rz_type': 3,
        'nav_flag':1,# number of months back from today
        'muid': 773283,
        'USER_ID': 773283, }
    wbdata=requests.post(url,headers=headers,data=data).text
    j = json.loads(wbdata)
    categories = j['data']['categories']
    value = []
    fillname='/Users/bingtangdunxueli/Desktop/公司/1.csv'
    with open(fillname,'w',encoding='utf-8') as f:
        f.write("name,categories,value\n")
        for i in j['data']['data'][0]:
            value.append(i['value'])
        a = pd.DataFrame()
        a['categories'] = categories
        a['value'] = value
        a['categories'] =a['categories'].map(lambda x:x.replace('-', '') )
        print(a)
        #a.to_csv('fund_id.csv')
        for li in a.values.tolist():
            s = ",".join(map(str,li))
            f.write(s+"\n")
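Export result:
The file contains only the data of the last product in the loop.
While printing, nothing is overwritten; the overwriting only shows up in the exported file.
How can I solve this, so that the product names are exported in the loop and the data is no longer overwritten?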
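The symptom can be reproduced without any network access. The following is a minimal sketch with made-up data and a hypothetical file name (demo.csv), not the code from the post: opening the file in 'w' mode inside the loop truncates it on every iteration, so print shows every product but only the rows of the last one remain in the file.

```python
import os
import tempfile

# Hypothetical stand-in data: two products with one row each.
products = {"fund_A": ["20210101,1.00"], "fund_B": ["20210101,2.00"]}

path = os.path.join(tempfile.mkdtemp(), "demo.csv")  # hypothetical file
for name, rows in products.items():
    # 'w' truncates the file every time it is opened.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            print(name, row)  # every product is printed
            f.write(row + "\n")

with open(path, encoding="utf-8") as f:
    remaining = f.read().splitlines()
print(remaining)  # only the rows of the last product are left
```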
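Change
with open(fillname,'w',encoding='utf-8') as f:
to
with open(fillname,'a',encoding='utf-8') as f:
'a' is append mode. Note that 'w+' would not help: like 'w', it truncates the file when it is opened. With the open call inside the for loop, append mode also writes the header line once per product, so it is cleaner to open the file once before the loop and write the header there.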
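One way to address both questions is to open the file once, before looping over the products, and to pair each product id with its title using zip. The sketch below is not the original poster's code: fetch_nav is a hypothetical stand-in for the requests.post call to fundNavTrend and returns made-up data, and export_nav and the output file name are illustrative names. It assumes the id list and the title list from the two XPath queries have the same length and order, which should be checked against the real page.

```python
import csv
import os
import tempfile


def fetch_nav(fund_id):
    """Hypothetical stand-in for the POST to fundNavTrend; returns fake data."""
    return ["2021-01-01", "2021-02-01"], [1.0, 1.1]


def export_nav(path, ids, titles, fetch=fetch_nav):
    """Write name,categories,value rows for every product into one CSV."""
    # The file is opened once, outside the product loop, so nothing is
    # truncated between products. newline="" is recommended for csv files.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "categories", "value"])  # header, once
        for fund_id, title in zip(ids, titles):
            dates, values = fetch(fund_id)
            for date, value in zip(dates, values):
                writer.writerow([title, date.replace("-", ""), value])


out = os.path.join(tempfile.mkdtemp(), "nav.csv")  # hypothetical output file
export_nav(out, ["id1", "id2"], ["Fund A", "Fund B"])
with open(out, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(len(rows))
```

In the original script, ids and titles would be the divs and title lists, and fetch would wrap the existing POST request and JSON parsing.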