headless-chrome-crawler Tutorial


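headless-chrome-crawler is a crawler library with a built-in JavaScript execution environment. It supports distributed crawling, and because pages are rendered in a real browser it can crawl sites built with modern front-end frameworks such as AngularJS and Vue.js. This article walks through its features, installation, and basic usage.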

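I previously wrote "Webmagic (crawler): a Sina blog crawling example". Webmagic is a crawler framework implemented in Java, and the related tutorials can all be found on my blog.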

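Features

Distributed crawling
Configurable concurrency, delay, and retries
Support for both depth-first and breadth-first search
Pluggable cache storage, such as Redis
Export of results as CSV and JSON Lines
Pause at the maximum request count and resume at any time
Automatic jQuery insertion for scraping
Screenshots saved as evidence of the crawl
Emulation of devices and user agents
Priority queue for better crawling efficiency
Obeys robots.txt
Follows sitemap.xml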
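
To make the difference between the two search orders in the list above concrete, here is a minimal sketch in plain JavaScript. It does not use headless-chrome-crawler and is not how the library is implemented; the `links` graph and the `crawlOrder` function are hypothetical names invented for illustration.

```javascript
// Hypothetical link graph: each URL maps to the links found on that page.
const links = {
  '/': ['/a', '/b'],
  '/a': ['/a/1', '/a/2'],
  '/b': ['/b/1'],
  '/a/1': [],
  '/a/2': [],
  '/b/1': [],
};

// Return the order in which pages are visited, starting from `start`.
// `strategy` is either 'bfs' (breadth-first) or 'dfs' (depth-first).
function crawlOrder(start, strategy) {
  const pending = [start];
  const seen = new Set([start]);
  const order = [];
  while (pending.length > 0) {
    // BFS takes the oldest pending URL (queue); DFS takes the newest (stack).
    const url = strategy === 'bfs' ? pending.shift() : pending.pop();
    order.push(url);
    const next = links[url] || [];
    // For DFS, push in reverse so the first link on a page is visited first.
    const ordered = strategy === 'bfs' ? next : [...next].reverse();
    for (const link of ordered) {
      if (!seen.has(link)) {
        seen.add(link);
        pending.push(link);
      }
    }
  }
  return order;
}

console.log(crawlOrder('/', 'bfs').join(' ')); // / /a /b /a/1 /a/2 /b/1
console.log(crawlOrder('/', 'dfs').join(' ')); // / /a /a/1 /a/2 /b /b/1
```

Breadth-first order finishes every page at one depth before going deeper, which suits shallow site surveys; depth-first order follows one branch to the end first.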

Installation

headless-chrome-crawler can be installed with either yarn or npm:

yarn add headless-chrome-crawler
# or "npm i headless-chrome-crawler"

The crawler bundles Puppeteer. During installation, it automatically downloads a recent version of Chromium.

Usage

Basic usage is simple:

const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  // Function to be evaluated in browsers
  evaluatePage: (() => ({
    title: $('title').text(),
  })),
  // Function to be called with evaluated results from browsers
  onSuccess: (result => {
    console.log(result);
  }),
})
  .then(crawler => {
    // Queue a request
    crawler.queue('https://example.com/');
    // Queue multiple requests
    crawler.queue(['https://example.net/', 'https://example.org/']);
    // Queue a request with custom options
    crawler.queue({
      url: 'https://example.com/',
      // Emulate a tablet device
      device: 'Nexus 7',
      // Enable screenshot by passing options
      screenshot: {
        path: './tmp/example-com.png'
      },
    });
    crawler.onIdle() // Resolved when no queue is left
      .then(() => crawler.close()); // Close the crawler
  });
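
Several items from the feature list (concurrency, delay, retries, robots.txt, sitemap.xml) are controlled through options. The sketch below collects them into a plain object. The option names are as I recall them from the project's README and should be checked against the version you install; `buildCrawlerOptions` is a hypothetical helper name, and the values are arbitrary.

```javascript
// Build an options object for HCCrawler.launch().
// Assumption: option names follow the project's README as I recall it;
// verify them against the version you install.
function buildCrawlerOptions() {
  return {
    maxConcurrency: 1,      // pages opened at once (the README, as I recall, requires 1 when delay is set)
    delay: 1000,            // milliseconds to wait after each request
    retryCount: 2,          // how many times a failed request is retried
    retryDelay: 5000,       // milliseconds to wait before each retry
    maxDepth: 2,            // how deep to follow links from a queued URL
    obeyRobotsTxt: true,    // skip URLs disallowed by robots.txt
    followSitemapXml: true, // also queue URLs listed in sitemap.xml
    onSuccess: (result) => console.log(result),
  };
}

// With the library installed, the object would be passed like this
// (left as a comment so the sketch runs without the dependency):
//
//   const HCCrawler = require('headless-chrome-crawler');
//   HCCrawler.launch(buildCrawlerOptions())
//     .then((crawler) => {
//       crawler.queue('https://example.com/');
//       return crawler.onIdle().then(() => crawler.close());
//     });

console.log(Object.keys(buildCrawlerOptions()).join(', '));
```

Keeping the options in one function makes it easy to reuse the same settings across several crawlers or to override a single value per run.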

The project's GitHub repository also provides many examples and the API documentation. For more advanced usage, refer to the repository.


Original source: 业余草 » headless-chrome-crawler Tutorial
