Facebook scraping library: facebook-scraper
Introduction
Scrape Facebook without registering or logging in, and without an API key; inspired by twitter-scraper.
Installation:
pip install facebook-scraper
Usage:
from facebook_scraper import get_posts
# First argument: the page's unique identifier, here nintendo (https://www.facebook.com/Nintendo/)
# Second argument: the number of pages to request, here 1
for post in get_posts('nintendo', pages=1):
    print(post['text'][:50])
Output:
Take a first look at the super cool Puma x Super M
We’re talking Triforce and discussing Hyrule Warri
Command-line usage:
$ facebook-scraper --filename nintendo_page_posts.csv --pages 1 nintendo
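Other parameters:
- group: group ID, used to scrape a group instead of a page. Default is None.
- pages: how many pages of posts to request. The first page usually has 2 posts and each following page 4. Default is 10.
- timeout: request timeout in seconds. Default is 5.
- credentials: tuple of username and password used to log in before requesting posts. Default is None.
- extra_info: bool. If True, the function makes extra requests to try to fetch the post reactions. Default is False.
- youtube_dl: bool. If True, Youtube-DL is used for (high-quality) video extraction; youtube-dl must be installed in your environment. Default is False.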
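Because pages only controls how many pages are requested, the number of posts that comes back can vary. The following is a minimal sketch of capping the result at a fixed number of posts. Note that take_posts and fake_posts are hypothetical names introduced here for illustration and are not part of facebook-scraper; the helper accepts any iterable of post dicts, so it can be tried without network access.

```python
from itertools import islice


def take_posts(posts, limit):
    """Return at most `limit` items from an iterable of post dicts.

    Hypothetical helper, not part of facebook-scraper. islice stops
    pulling from the iterator once the limit is reached, so if the
    source yields posts lazily, no further items are requested.
    """
    return list(islice(posts, limit))


# Stand-in generator so the sketch runs offline. In real use one would
# pass something like get_posts('nintendo', pages=3) instead.
def fake_posts():
    for i in range(10):
        yield {'post_id': str(i), 'text': f'post number {i}'}


first_three = take_posts(fake_posts(), 3)
for post in first_three:
    print(post['text'][:50])
```

Taking the iterable as an argument keeps the helper independent of how the posts are fetched, which also makes it easy to test.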
Example of a returned post:
{'post_id': '2257188721032235',
 'text': 'Don’t let this diminutive version of the Hero of Time fool you, '
         'Young Link is just as heroic as his fully grown version! Young Link '
         'joins the Super Smash Bros. series of amiibo figures!',
 'time': datetime.datetime(2019, 4, 29, 12, 0, 1),
 'image': 'https://scontent.flim16-1.fna.fbcdn.net'
          '/v/t1.0-0/cp0/e15/q65/p320x320'
          '/58680860_2257182054366235_1985558733786185728_n.jpg'
          '?_nc_cat=1&_nc_ht=scontent.flim16-1.fna'
          '&oh=31b0ba32ec7886e95a5478c479ba1d38&oe=5D6CDEE4',
 'images': ['https://scontent.flim16-1.fna.fbcdn.net'
            '/v/t1.0-0/cp0/e15/q65/p320x320'
            '/58680860_2257182054366235_1985558733786185728_n.jpg'
            '?_nc_cat=1&_nc_ht=scontent.flim16-1.fna'
            '&oh=31b0ba32ec7886e95a5478c479ba1d38&oe=5D6CDEE4'],
 'likes': 2036,
 'comments': 214,
 'shares': 0,
 'reactions': {'like': 135, 'love': 64, 'haha': 10, 'wow': 4, 'anger': 1},  # if `extra_info` was set
 'post_url': 'https://m.facebook.com/story.php'
             '?story_fbid=2257188721032235&id=119240841493711',
 'link': 'https://bit.ly/something'}
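A dict with the structure above can be flattened into CSV rows with the standard library. This is only a sketch: FIELDS, post_to_row and posts_to_csv are hypothetical names chosen here, the column selection is an assumption, and the sample post is an abbreviated copy of the example above rather than live data.

```python
import csv
import io
from datetime import datetime

# Columns to export; an arbitrary selection of the keys shown above.
FIELDS = ['post_id', 'time', 'likes', 'comments', 'shares', 'text']


def post_to_row(post):
    """Pick the exported fields from one post dict.

    Missing keys become None, and the datetime is converted to an
    ISO 8601 string so the value is unambiguous in the CSV.
    """
    row = {name: post.get(name) for name in FIELDS}
    if isinstance(row['time'], datetime):
        row['time'] = row['time'].isoformat()
    return row


def posts_to_csv(posts):
    """Serialize an iterable of post dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for post in posts:
        writer.writerow(post_to_row(post))
    return buffer.getvalue()


# Abbreviated sample taken from the example post above.
sample = {
    'post_id': '2257188721032235',
    'text': 'Young Link joins the Super Smash Bros. series of amiibo figures!',
    'time': datetime(2019, 4, 29, 12, 0, 1),
    'likes': 2036,
    'comments': 214,
    'shares': 0,
}
csv_text = posts_to_csv([sample])
print(csv_text)
```

Keys such as reactions or images are left out of this sketch because they hold nested values and would need their own flattening rule.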
Setting a proxy
Add the following code after line 41 of the package's __init__.py (the exact line may differ between versions):
proxies = {
    "http": '127.0.0.1:1087',
    "https": '127.0.0.1:1087'
}
_scraper.requests_kwargs['proxies'] = proxies