Python web scraper question

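Here is the requirement: I need to scrape horse information from a website. After scraping one horse, I need to scrape its parents, then the parents' parents, and so on. Each time a parent's record has been added to the database, the child's parent field should be filled with the corresponding id (the id of the parent record that was just inserted).
There are currently two kinds of pages. The first shows a horse's detailed information, i.e. the page whose data I need to insert into the database. Its URL format is www.mapi.com/edit/<horse name>. The screenshot below shows the page.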

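The second page gives the names of a horse's parents (once I have a name, I can use it to reach the first page with that horse's details). Its URL format is www.mapi.com/<horse name>. The screenshot below shows the page (the circled part is the parents' names; the rest can be ignored).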

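What is the right approach here? After adding the first horse and its parents, how do I go on to add the father's parents, the mother's parents, and so on? I have no idea how to write this.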

I'm new to web scraping, so any help would be appreciated~

You can take a look at the answers to this question and see whether they help. Link: https://ask.csdn.net/questions/729886

The following answer was generated by ChatGPT as a summary of related blog posts:

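First, write the crawler in Python: use the requests library to send HTTP requests, and BeautifulSoup or regular expressions to parse the page content and extract the horse and parent information you need.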

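Next, store the extracted data in a relational database such as SQLite or MySQL. For MySQL, you can connect and run queries through Python's MySQLdb or pymysql library; for SQLite, the standard-library sqlite3 module is enough. Link each horse to its parents' records with foreign keys.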
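To make the foreign-key idea concrete, here is a minimal sketch using SQLite from the standard library. It assumes a single self-referencing `horse` table, so that a parent is simply another horse row; the table name, column names, and the `create_schema` / `insert_horse` helpers are all hypothetical and should be adapted to the fields you actually scrape.

```python
import sqlite3

# Hypothetical schema: one self-referencing table, so a parent is just
# another row in the same table. Column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS horse (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    name      TEXT NOT NULL UNIQUE,
    father_id INTEGER REFERENCES horse(id),
    mother_id INTEGER REFERENCES horse(id)
);
"""


def create_schema(conn):
    """Enable foreign-key checks and create the table if it is missing."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def insert_horse(conn, name, father_id=None, mother_id=None):
    """Insert one horse and return the id SQLite generated for it."""
    cur = conn.execute(
        "INSERT INTO horse (name, father_id, mother_id) VALUES (?, ?, ?)",
        (name, father_id, mother_id),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    sire_id = insert_horse(conn, "Sire")
    dam_id = insert_horse(conn, "Dam")
    foal_id = insert_horse(conn, "Foal", sire_id, dam_id)
    print(conn.execute("SELECT * FROM horse WHERE id = ?", (foal_id,)).fetchone())
```

With one table instead of separate horse and parent tables, grandparents and earlier generations need no extra schema: every ancestor is stored the same way as the horse you started from.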

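When you need a parent's details, build the detail-page URL from the parent's name, fetch that page, and store the result in the database. The id of the stored parent row is what you then write into the child's parent field.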
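As a small illustration of building both URLs from a horse's name, the sketch below follows the two formats given in the question (www.mapi.com/edit/<horse name> and www.mapi.com/<horse name>). The `http://` scheme and the helper names `detail_url` and `pedigree_url` are assumptions; names are percent-encoded so that spaces or non-ASCII characters do not break the URL.

```python
from urllib.parse import quote

# The question only gives the host; the scheme is an assumption.
BASE = "http://www.mapi.com"


def detail_url(name):
    """URL of the page holding a horse's detailed information."""
    return f"{BASE}/edit/{quote(name, safe='')}"


def pedigree_url(name):
    """URL of the page listing the names of a horse's parents."""
    return f"{BASE}/{quote(name, safe='')}"


if __name__ == "__main__":
    print(detail_url("Sea Star"))
    print(pedigree_url("Sea Star"))
```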

Below is a simple example crawler to help you understand the idea:

import requests
from bs4 import BeautifulSoup
import pymysql

# Scrape a horse's name and its parents' information
def get_horse_info(url):
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail early on HTTP errors
    soup = BeautifulSoup(res.content, 'html.parser')
    # Horse name and parent names (these selectors are assumptions; adapt them to the real page)
    horse_name = soup.select('h1')[0].text.strip()
    father_name = soup.select('#father a')[0].text.strip()
    mother_name = soup.select('#mother a')[0].text.strip()
    # Fetch each parent's details by name
    father_url = 'http://xxx.xxx.xxx/xxx?name=' + father_name
    mother_url = 'http://xxx.xxx.xxx/xxx?name=' + mother_name
    father_info = get_parent_info(father_url)
    mother_info = get_parent_info(mother_url)
    return horse_name, father_name, mother_name, father_info, mother_info

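# Scrape one parent's details
def get_parent_info(url):
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail early on HTTP errors
    soup = BeautifulSoup(res.content, 'html.parser')
    # Parse the page content (these selectors are assumptions; adapt them to the real page)
    return {'name': soup.select('.name')[0].text.strip(),
            'age': soup.select('.age')[0].text.strip(),
            'gender': soup.select('.gender')[0].text.strip()}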

# Save to the database
def save_to_db(conn, cursor, horse_name, father_name, mother_name, father_info, mother_info):
    # Insert the parents first: father_info/mother_info carry no 'id' key,
    # so the ids only exist once the rows have been inserted.
    # Parameterized queries (%s placeholders) avoid SQL injection and quoting bugs.
    parent_sql = "INSERT INTO parent(name, age, gender) VALUES(%s, %s, %s);"
    cursor.execute(parent_sql,
                   (father_info['name'], father_info['age'], father_info['gender']))
    father_id = cursor.lastrowid
    cursor.execute(parent_sql,
                   (mother_info['name'], mother_info['age'], mother_info['gender']))
    mother_id = cursor.lastrowid
    # Insert the horse with its parents' ids already filled in
    horse_sql = "INSERT INTO horse(name, father_id, mother_id) VALUES(%s, %s, %s);"
    cursor.execute(horse_sql, (horse_name, father_id, mother_id))
    # Commit the transaction
    conn.commit()

if __name__ == '__main__':
    # Connect to the database
    conn = pymysql.connect(
        host='localhost',
        port=3306,
        user='root',
        password='123456',
        database='test',
        charset='utf8'
    )
    cursor = conn.cursor()
    # Scrape the page
    url = 'http://xxx.xxx.xxx/xxx?name=xxx'
    horse_name, father_name, mother_name, father_info, mother_info = get_horse_info(url)
    # Save to the database
    save_to_db(conn, cursor, horse_name, father_name, mother_name, father_info, mother_info)
    # Close the database connection
    cursor.close()
    conn.close()

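You can design the actual tables and fields to fit your needs. Before saving, open the database connection, then build the SQL statements for the inserts and updates and call commit to commit the transaction. Once a horse and its parents are saved, repeat the same steps for each parent: look up the parent's own parents by name, fetch their details, store them, and link them to the parent row through the foreign keys.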
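The "repeat for each parent" step is naturally a recursive function. The following is a minimal sketch of that idea, not a finished crawler. It assumes one self-referencing `horse` table in SQLite, and it takes the two scraping steps as parameters: `fetch_parents(name)` returns `(father_name, mother_name)` with `None` for an unknown parent, and `fetch_details(name)` returns a dict. Both callables, `make_db`, `crawl`, and the `max_depth` cutoff are hypothetical names introduced for illustration. Parents are stored before the child, so their ids are already known when the child row is inserted.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory database with one self-referencing horse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE horse ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL UNIQUE,"
        " info TEXT,"
        " father_id INTEGER REFERENCES horse(id),"
        " mother_id INTEGER REFERENCES horse(id))"
    )
    return conn


def crawl(name, conn, fetch_parents, fetch_details,
          seen=None, depth=0, max_depth=5):
    """Store `name` and its ancestors; return the row id of `name` or None."""
    if seen is None:
        seen = {}
    # Stop at unknown parents and at the generation limit.
    if not name or depth > max_depth:
        return None
    # A horse can appear several times in a pedigree; store it only once.
    if name in seen:
        return seen[name]
    seen[name] = None  # mark as in progress so bad data cannot loop forever

    # Recurse into both parents first, so their ids exist before the child.
    father_name, mother_name = fetch_parents(name)
    father_id = crawl(father_name, conn, fetch_parents, fetch_details,
                      seen, depth + 1, max_depth)
    mother_id = crawl(mother_name, conn, fetch_parents, fetch_details,
                      seen, depth + 1, max_depth)

    info = json.dumps(fetch_details(name), ensure_ascii=False)
    cur = conn.execute(
        "INSERT INTO horse (name, info, father_id, mother_id)"
        " VALUES (?, ?, ?, ?)",
        (name, info, father_id, mother_id),
    )
    conn.commit()
    seen[name] = cur.lastrowid
    return cur.lastrowid


if __name__ == "__main__":
    # Fake pedigree standing in for the real pages.
    pedigree = {"Foal": ("Sire", "Dam"), "Sire": ("GrandSire", "GrandDam")}
    conn = make_db()
    crawl("Foal", conn,
          fetch_parents=lambda n: pedigree.get(n, (None, None)),
          fetch_details=lambda n: {"name": n})
    for row in conn.execute(
            "SELECT id, name, father_id, mother_id FROM horse ORDER BY id"):
        print(row)
```

In a real crawler, `fetch_parents` would request the www.mapi.com/<horse name> page and `fetch_details` the www.mapi.com/edit/<horse name> page. Passing them in as functions keeps the recursion testable without network access. The depth limit matters because pedigrees can go back many generations, and each level roughly doubles the number of requests.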