Facebook scraping library: facebook-scraper

Introduction

Scrapes Facebook without registering, logging in, or using an API key. Inspired by twitter-scraper.

Installation:

pip install facebook-scraper

Usage:

from facebook_scraper import get_posts

# First argument: the page's unique identifier, here nintendo (https://www.facebook.com/Nintendo/)
# Second argument (pages): the number of pages to fetch, here 1
for post in get_posts('nintendo', pages=1):
    print(post['text'][:50])

Output:

Take a first look at the super cool Puma x Super M
We’re talking Triforce and discussing Hyrule Warri
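The loop above slices post['text'] directly. As a minimal sketch, assuming that some posts (for example image-only ones) may come back with a missing or empty text field, a small helper can produce the same kind of preview without raising. The name preview_text is hypothetical and not part of facebook-scraper, and the dicts below are stand-ins so the sketch runs without network access.

```python
def preview_text(post, width=50):
    """Return the first `width` characters of a post's text.

    `post` is a dict shaped like the ones yielded by get_posts.
    Assumption: 'text' may be absent or None, in which case an
    empty string is returned instead of raising.
    """
    text = post.get('text') or ''
    return text[:width]


# Stand-in dicts; real ones would come from get_posts(...).
sample_posts = [
    {'text': 'Take a first look at the super cool Puma x Super M'},
    {'text': None},
    {},
]
for post in sample_posts:
    print(repr(preview_text(post, width=20)))
```

With real data, the same helper could be called inside the get_posts loop in place of the direct slice.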

Command-line usage:

$ facebook-scraper --filename nintendo_page_posts.csv --pages 1 nintendo

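Other parameters:

group: a group ID, used to scrape a group instead of a page. Default: None.
pages: how many pages of posts to request; the first page usually has 2 posts and each later page 4. Default: 10.
timeout: the request timeout. Default: 5.
credentials: a tuple of username and password used to log in before requesting the posts. Default: None.
extra_info: boolean; if true, the function makes extra requests to fetch each post's reactions. Default: False.
youtube_dl: boolean; use youtube-dl for (high-quality) video extraction. youtube-dl must be installed in your environment. Default: False.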
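The parameters listed above can be gathered in one place before calling get_posts. The following is only a sketch: build_options, estimate_post_count, and fetch_posts are hypothetical helpers, not part of facebook-scraper, and the defaults are copied from the list above, so they may not match every library version. Which keywords get_posts accepts also depends on the installed version.

```python
# Defaults as documented above; treat them as a snapshot, since they
# may differ between library versions.
DEFAULT_OPTIONS = {
    'group': None,
    'pages': 10,
    'timeout': 5,
    'credentials': None,
    'extra_info': False,
    'youtube_dl': False,
}


def build_options(**overrides):
    """Merge caller overrides into the documented defaults.

    Hypothetical helper: rejects names that are not in the list above
    so that typos fail early.
    """
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError('unknown option(s): ' + ', '.join(sorted(unknown)))
    options = dict(DEFAULT_OPTIONS)
    options.update(overrides)
    return options


def estimate_post_count(pages):
    """Rough post count: about 2 on the first page, about 4 on each later one."""
    if pages <= 0:
        return 0
    return 2 + 4 * (pages - 1)


def fetch_posts(account, **overrides):
    """Validate the option names, then call get_posts (needs network access)."""
    from facebook_scraper import get_posts  # imported lazily, only when called
    build_options(**overrides)  # used here only to validate the names
    return get_posts(account, **overrides)


options = build_options(pages=3, extra_info=True)
print(options['pages'], options['extra_info'])
print(estimate_post_count(options['pages']))
```

The estimate is a rule of thumb derived from the "2 posts, then 4 per page" note, so the number of posts actually returned can differ.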

Example of the structure returned for a post:

{'post_id': '2257188721032235',
 'text': 'Don’t let this diminutive version of the Hero of Time fool you, '
         'Young Link is just as heroic as his fully grown version! Young Link '
         'joins the Super Smash Bros. series of amiibo figures!',
 'time': datetime.datetime(2019, 4, 29, 12, 0, 1),
 'image': 'https://scontent.flim16-1.fna.fbcdn.net'
          '/v/t1.0-0/cp0/e15/q65/p320x320'
          '/58680860_2257182054366235_1985558733786185728_n.jpg'
          '?_nc_cat=1&_nc_ht=scontent.flim16-1.fna'
          '&oh=31b0ba32ec7886e95a5478c479ba1d38&oe=5D6CDEE4',
 'images': ['https://scontent.flim16-1.fna.fbcdn.net'
          '/v/t1.0-0/cp0/e15/q65/p320x320'
          '/58680860_2257182054366235_1985558733786185728_n.jpg'
          '?_nc_cat=1&_nc_ht=scontent.flim16-1.fna'
          '&oh=31b0ba32ec7886e95a5478c479ba1d38&oe=5D6CDEE4'],
 'likes': 2036,
 'comments': 214,
 'shares': 0,
 'reactions': {'like': 135, 'love': 64, 'haha': 10, 'wow': 4, 'anger': 1},  # if `extra_info` was set
 'post_url': 'https://m.facebook.com/story.php'
             '?story_fbid=2257188721032235&id=119240841493711',
 'link': 'https://bit.ly/something'}
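Because each post is a plain dict, it can be flattened with the standard library alone. The sketch below writes a few of the fields shown above to CSV text. posts_to_csv and CSV_FIELDS are hypothetical names, the column choice is arbitrary, and this is not the format produced by the command-line tool's --filename option. The sample dict reuses values from the structure above, with the text shortened.

```python
import csv
import io
from datetime import datetime

# Columns to export; any other keys in the post dict are ignored.
CSV_FIELDS = ['post_id', 'time', 'likes', 'comments', 'shares', 'text']


def posts_to_csv(posts):
    """Serialize an iterable of post dicts to CSV text.

    Missing keys become empty cells, and datetime values are written
    in ISO 8601 form.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator='\n')
    writer.writeheader()
    for post in posts:
        row = {}
        for field in CSV_FIELDS:
            value = post.get(field, '')
            if isinstance(value, datetime):
                value = value.isoformat()
            row[field] = value
        writer.writerow(row)
    return buffer.getvalue()


sample = {
    'post_id': '2257188721032235',
    'time': datetime(2019, 4, 29, 12, 0, 1),
    'likes': 2036,
    'comments': 214,
    'shares': 0,
    'text': 'Young Link joins the Super Smash Bros. series of amiibo figures!',
}
print(posts_to_csv([sample]))
```

The function accepts any iterable, so the generator returned by get_posts could be passed in directly.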

Setting a proxy

In the package's __init__.py, add the following code after line 41:

proxies = {
    "http": '127.0.0.1:1087',
    "https": '127.0.0.1:1087'
}
_scraper.requests_kwargs['proxies'] = proxies
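The same assignment can, in principle, be made from your own script instead of editing the installed package. This is only a sketch under assumptions: make_proxies and apply_proxies are hypothetical helpers, and apply_proxies relies on the private _scraper.requests_kwargs attribute used in the snippet above, which may not exist in every version of the library. The proxy URLs here include an explicit http:// scheme, the form the requests documentation uses for proxies.

```python
def make_proxies(host, port, scheme='http'):
    """Build a requests-style proxies mapping that routes both
    HTTP and HTTPS traffic through one proxy."""
    url = '{}://{}:{}'.format(scheme, host, port)
    return {'http': url, 'https': url}


def apply_proxies(proxies):
    """Set the proxies on the library's module-level scraper at runtime.

    Assumption: the installed version exposes `_scraper.requests_kwargs`
    as in the snippet above; it is a private attribute and may change.
    """
    import facebook_scraper  # imported lazily, only when called
    facebook_scraper._scraper.requests_kwargs['proxies'] = proxies


proxies = make_proxies('127.0.0.1', 1087)
print(proxies)
```

Calling apply_proxies(proxies) once before the first get_posts call would then have the same effect as the edit to __init__.py, provided the assumption about the attribute holds.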
