
[Pseudo-Crawler] A Bilibili Comment Scraping Script (Not Original Work)

2025-04-24T14:36:00

[info]This script targets the 2024 version of the Bilibili API. If the endpoint or response format changes later, do not rely on this article.[/info]

This post presents a script written by another author that scrapes a Bilibili comment section by driving a real browser. I lightly tweaked it with the help of ChatGPT-4o so that the output is formatted as CSV.

Below is my modified version of StarrySkyVictor's script, which uses DrissionPage to scrape Bilibili comments:

from DrissionPage import ChromiumPage  # pip install DrissionPage
import time
import csv
import os

URL = input("Enter the link to a Bilibili video or dynamic post: ")
num = int(input("Enter the number of pages to scrape: "))

page = ChromiumPage()
page.set.load_mode.none()

# Listen for the specific network request carrying comment data
page.listen.start('https://api.bilibili.com/x/v2/reply/wbi/main?')

# Open the Bilibili page
page.get(f'{URL}')
time.sleep(3)

for _ in range(num + 1):
    page.scroll.to_bottom()
    time.sleep(2)

# Store all captured response bodies
responses = []

try:
    for _ in range(num):
        packet = page.listen.wait()
        page.stop_loading()

        # Read the response body directly (already a dict)
        response_body = packet.response.body
        responses.append(response_body)
        time.sleep(1)

except Exception as e:
    print(f"Error while capturing responses: {e}")

# Export to CSV
total_comments = 0
output_path = 'comments.csv'

with open(output_path, 'w', encoding='utf-8-sig', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Comment', 'Username', 'Gender', 'IP location'])

    for response in responses:
        try:
            if 'data' in response and response['data'] and 'replies' in response['data']:
                datas = response['data']['replies']
                total_comments += len(datas)
                for data in datas:
                    comments = data['content']['message']
                    uname = data['member']['uname']
                    sex = data['member']['sex']
                    IP = data['reply_control'].get('location', 'Unknown')
                    print(f"Comment: {comments}\nUsername: {uname}\nGender: {sex}\nIP location: {IP}\n")
                    writer.writerow([comments, uname, sex, IP])
        except KeyError as e:
            print(f"Error while processing a response: {e}")

page.close()
print(f"Total comments: {total_comments}")
print(f"Comment data saved to: {os.path.abspath(output_path)}")
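The field-extraction logic above can be exercised offline against a mock response dict. The structure below mirrors what the script assumes about the 2024 reply API (data → replies → content/member/reply_control); the field values are invented for illustration only:

```python
import csv
import io

# A minimal mock of one reply-list response, shaped the way the script
# above expects. Values are made up for demonstration purposes.
mock_response = {
    "data": {
        "replies": [
            {
                "content": {"message": "Great video!"},
                "member": {"uname": "alice", "sex": "F"},
                "reply_control": {"location": "IP: Shanghai"},
            },
            {
                "content": {"message": "First!"},
                "member": {"uname": "bob", "sex": "M"},
                "reply_control": {},  # missing location falls back to 'Unknown'
            },
        ]
    }
}

def rows_from_response(response):
    """Extract (comment, username, gender, location) rows, mirroring the script."""
    rows = []
    for reply in response.get("data", {}).get("replies") or []:
        rows.append([
            reply["content"]["message"],
            reply["member"]["uname"],
            reply["member"]["sex"],
            reply["reply_control"].get("location", "Unknown"),
        ])
    return rows

# Write the rows to an in-memory CSV, same header as the script uses
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Comment", "Username", "Gender", "IP location"])
writer.writerows(rows_from_response(mock_response))
print(buf.getvalue())
```

Testing against a fixture like this is also a quick way to check whether a future API change has broken the parsing, without driving a browser.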

Usage:

  • Install Python
  • Install DrissionPage: pip install DrissionPage
  • If Google Chrome is installed, no further configuration is needed
  • If Edge is your only browser, edit line 89 of the script at C:\Users\[Username]\AppData\Local\Programs\Python\[Python-version]\Lib\site-packages\DrissionPage\_configs\chromium_options.py and change the return value to: return 'C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe'
  • Run the script once with any Bilibili link and page count; the browser it opens will show that you are not logged in to Bilibili, so scan the QR code to log in first.
  • Work out the page count as (total comments ÷ 20) + 1, rounded down to an integer, then copy the URL of the post or video you want to inspect and run the script.
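The page-count arithmetic in the last step can be sketched directly. The 20-comments-per-page figure is this post's assumption about the API, not something verified here:

```python
# Page count rule from the steps above: (total comments / 20) + 1, truncated.
# per_page=20 is the post's assumption about how many replies one API page holds.
def pages_to_scrape(total_comments: int, per_page: int = 20) -> int:
    return total_comments // per_page + 1

# e.g. a video with 437 comments needs 437 // 20 + 1 = 22 scroll/capture rounds
print(pages_to_scrape(437))  # → 22
```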

Reference:
