Scraping historical NAV data for selected funds from 私募排排网

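I need to scrape 私募排排网 for work. I have an account there, and my Python is at beginner level.
Can anyone guide me through the project and explain the code?
I'm willing to pay for help.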

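You can parse the page HTML with Beautiful Soup, which is very convenient.
API reference: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html#id32
If the site requires login, attach the identity cookie you receive after logging in to the requests you send.
Reference: https://www.cnblogs.com/liuzhzhao/p/12114453.html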
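
As a rough sketch of the cookie step: one way to do it is to copy the Cookie header from the browser's developer tools after logging in, turn it into name/value pairs, and load those into a requests.Session so that every later request carries them. The cookie names and values below are made up for illustration and the helper names are hypothetical; the real cookies depend on the site.

```python
from http.cookies import SimpleCookie

import requests


def parse_cookie_header(raw):
    """Turn a 'name=value; name2=value2' Cookie header into a dict."""
    jar = SimpleCookie()
    jar.load(raw)
    return {name: morsel.value for name, morsel in jar.items()}


def make_logged_in_session(raw_cookie_header):
    """Build a Session that sends the given cookies with every request."""
    session = requests.Session()
    session.cookies.update(parse_cookie_header(raw_cookie_header))
    # A browser-like User-Agent is often expected; this value is only an example.
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    return session


# Placeholder cookie string (an assumption); paste the one from your own browser.
RAW_COOKIE = 'sessionid=abc123; token=xyz789'
session = make_logged_in_session(RAW_COOKIE)
```

After that, session.get(url) can be used in place of requests.get(url), and the cookies are sent automatically.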

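If you set anti-scraping measures aside, Scrapy alone will do the job.
If speed is not a particular concern, you can use Selenium to drive a real browser (slow, but it works on almost everything).
However, most sites have their own anti-scraping mechanisms, so there is no universal solution. Based on my own project experience, you may have to get past the following: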

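Cookie validation. Countermeasure: extract the cookie generated after you log in with your account and attach it to your requests.
Dynamic pages. Countermeasure: replicate the AJAX requests the page makes.
JS encryption. Countermeasure: locate the encryption module (in most cases the module itself is obfuscated as well, so you have to find it by guesswork, breakpoint debugging, and similar means), then generate the key yourself.
Per-IP request limits. Countermeasure: a pool of IPs.
Per-account request limits. Countermeasure: no real fix other than getting more accounts, though this is fairly uncommon.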
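
The IP-pool idea can be sketched as a simple rotation: each request takes the next proxy from a list and hands it to requests through its proxies argument. The addresses below are placeholders from a documentation-only IP range, ProxyPool is a hypothetical helper, and a real pool would also need to detect and drop dead proxies.

```python
import itertools


class ProxyPool:
    """Hand out proxies in round-robin order (hypothetical helper)."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError('need at least one proxy address')
        self._cycle = itertools.cycle(addresses)

    def next_proxies(self):
        """Return a dict in the shape requests expects for `proxies=`."""
        address = next(self._cycle)
        return {'http': address, 'https': address}


# Placeholder addresses (an assumption); replace with proxies you can actually use.
pool = ProxyPool(['http://203.0.113.10:8080', 'http://203.0.113.11:8080'])
first = pool.next_proxies()
second = pool.next_proxies()
third = pool.next_proxies()  # wraps around to the first address again
```

Each dict can then be passed along as requests.get(url, proxies=pool.next_proxies()).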

I can help you build this for a fee (I write the solution, then walk you through the code).

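type: the data type; historical NAV is lsjz

code: the fund code, six digits

sdate: start date, in yyyy-mm-dd format

edate: end date, in yyyy-mm-dd format

per: number of records per page

To make the page data easier to parse, all NAV records in the chosen date range should fit on a single page, so you can set per to a very large value such as 65535.

The returned page data is fairly simple: just a table of historical NAVs plus the total record count, the total page count, and the current page number.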
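
To make the parameters concrete, here is a small sketch that assembles them into a query string with the standard library. The base URL and the page parameter are taken from the sample code later in this post; build_query is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint used by the sample code in this post.
BASE_URL = 'http://fund.eastmoney.com/f10/F10DataApi.aspx'


def build_query(code, sdate, edate, per=65535, page=1):
    """Build the full request URL for one fund's historical NAV."""
    params = {
        'type': 'lsjz',   # historical NAV
        'code': code,     # six-digit fund code
        'page': page,
        'per': per,       # large value so everything fits on one page
        'sdate': sdate,   # yyyy-mm-dd
        'edate': edate,   # yyyy-mm-dd
    }
    return BASE_URL + '?' + urlencode(params)


url = build_query('110022', '2018-02-22', '2018-03-02')
print(url)
```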

    var apidata={ content:"

    Date        Unit NAV  Cumulative NAV  Daily change  Subscription  Redemption  Dividend
    2018-03-02  2.3580    2.3580           0.17%        Open          Open
    2018-03-01  2.3540    2.3540           0.56%        Open          Open
    2018-02-28  2.3410    2.3410          -1.35%        Open          Open
    2018-02-27  2.3730    2.3730          -2.06%        Open          Open
    2018-02-26  2.4230    2.4230           0.29%        Open          Open
    2018-02-23  2.4160    2.4160          -0.49%        Open          Open
    2018-02-22  2.4280    2.4280           2.58%        Open          Open

    ",records:7,pages:1,curpage:1};

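
Since the response is a JavaScript assignment rather than plain HTML or JSON, the paging fields can be pulled out with a regular expression. This is only a sketch: it assumes the response keeps the exact records:N,pages:N,curpage:N shape shown above, parse_paging is a hypothetical helper, and the sample string is a shortened stand-in for a real response.

```python
import re

# Matches the trailing paging fields of the apidata assignment.
PAGING_RE = re.compile(r'records:(\d+),pages:(\d+),curpage:(\d+)')


def parse_paging(body):
    """Extract the record count, page count and current page as ints."""
    match = PAGING_RE.search(body)
    if match is None:
        raise ValueError('paging fields not found in response')
    records, pages, curpage = (int(g) for g in match.groups())
    return {'records': records, 'pages': pages, 'curpage': curpage}


# Shortened stand-in for a real response body.
sample = 'var apidata={ content:"<table>...</table>",records:7,pages:1,curpage:1};'
paging = parse_paging(sample)
```

If pages comes back greater than 1, the remaining pages could be fetched by increasing the page parameter.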

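Use BeautifulSoup's find_all (spelled findAll in older code) to locate the tbody (table body) tag, then find the tr (table row) tags inside it. The cells in each row are:

td:nth-of-type(1) (the 1st cell) is the NAV date

td:nth-of-type(2) (the 2nd cell) is the unit NAV

td:nth-of-type(3) (the 3rd cell) is the cumulative NAV

td:nth-of-type(4) (the 4th cell) is the daily change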

Sample code:

    # -*- coding: utf-8 -*-
    import requests
    from bs4 import BeautifulSoup
    from prettytable import PrettyTable


    def get_url(url, params=None, proxies=None):
        """Fetch a URL and return the response body as text."""
        rsp = requests.get(url, params=params, proxies=proxies)
        rsp.raise_for_status()  # raise on 4xx/5xx responses
        return rsp.text


    def get_fund_data(code, start='', end=''):
        """Return the historical NAV rows for one fund as a list of dicts."""
        record = {'Code': code}
        url = 'http://fund.eastmoney.com/f10/F10DataApi.aspx'
        params = {'type': 'lsjz', 'code': code, 'page': 1, 'per': 65535,
                  'sdate': start, 'edate': end}
        html = get_url(url, params)
        soup = BeautifulSoup(html, 'html.parser')

        records = []
        tab = soup.find_all('tbody')[0]
        for tr in tab.find_all('tr'):
            # Only full data rows have all 7 cells.
            if len(tr.find_all('td')) == 7:
                record['Date'] = tr.select('td:nth-of-type(1)')[0].get_text().strip()
                record['NetAssetValue'] = tr.select('td:nth-of-type(2)')[0].get_text().strip()
                record['ChangePercent'] = tr.select('td:nth-of-type(4)')[0].get_text().strip()
                records.append(record.copy())
        return records


    def demo(code, start, end):
        """Build a printable table of date, NAV and daily change."""
        table = PrettyTable()
        table.field_names = ['Code', 'Date', 'NAV', 'Change']
        table.align['Change'] = 'r'  # right-align the percentage column
        records = get_fund_data(code, start, end)
        for record in records:
            table.add_row([record['Code'], record['Date'],
                           record['NetAssetValue'], record['ChangePercent']])
        return table


    if __name__ == '__main__':
        print(demo('110022', '2018-02-22', '2018-03-02'))

Output:

    +--------+------------+--------+--------+
    |  Code  |    Date    |  NAV   | Change |
    +--------+------------+--------+--------+
    | 110022 | 2018-03-02 | 2.3580 |  0.17% |
    | 110022 | 2018-03-01 | 2.3540 |  0.56% |
    | 110022 | 2018-02-28 | 2.3410 | -1.35% |
    | 110022 | 2018-02-27 | 2.3730 | -2.06% |
    | 110022 | 2018-02-26 | 2.4230 |  0.29% |
    | 110022 | 2018-02-23 | 2.4160 | -0.49% |
    | 110022 | 2018-02-22 | 2.4280 |  2.58% |
    +--------+------------+--------+--------+
----------------
Copyright notice: this is an original article by CSDN blogger "weixin_39640909", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_39640909/article/details/111457696
If this helps, please accept the answer.

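What difficulty are you running into right now?

So, how would you like to be guided? Remotely? By voice? Or some other way?

Let's talk in a private message.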