site-crawler

Site crawler for Node.js.

1.0.4  •  Updated 4 years ago  •  by Yusuke Shibata  •  MIT License

Crawler

A simple site crawler for Node.js.

Install

npm install site-crawler

Example Code

var Crawler = require('site-crawler')

var site = 'https://techcrunch.com'

var crawler = new Crawler({
	// number of URLs crawled in parallel (default: 10)
	concurrency: 10
})
crawler
.on('found', function(url, next) {
	var ok = url.startsWith(site)
	if (ok) console.error('found:', url)
	// pass null to next() to skip this URL, or pass a (possibly modified) URL to crawl it
	next(ok ? url : null)
})
.on('crawl', function(url, res, $, next) {
	// res is the response object from the request module
	// $ is a cheerio object for the fetched page
	console.error('\tcrawl:', $('title').text())
	next()
})
.on('error', function(url, err) {
	// err.statusCode may be undefined for network errors, so fall back to err.message
	console.error('\terror:', url, ':', err.statusCode || err.message)
})
.on('complete', function() {
	console.log('done.')
})
crawler.start(site)
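The `url.startsWith(site)` check in the example also accepts URLs such as `https://techcrunch.com.evil.com/`. A stricter `found` filter can compare origins instead. The following is a minimal sketch using a hypothetical helper, `filterSameSite`, which is not part of site-crawler; it assumes a Node.js version (10 or later) where `URL` is a global, which is newer than the Node version this package was published with. It returns either a cleaned-up URL or `null`, matching the two things the example says `next()` accepts.

```javascript
// Hypothetical helper for the 'found' handler (not part of site-crawler).
// Returns the URL with its #fragment removed when it shares the start URL's
// origin (scheme, host and port), or null so that next(null) skips it.
// Assumes the WHATWG URL class is available as a global (Node.js 10+).
function filterSameSite(candidate, base) {
	var parsed
	try {
		// resolves relative links such as '/about' against the start URL
		parsed = new URL(candidate, base)
	} catch (e) {
		// unparseable URL: reject it
		return null
	}
	// different scheme, host or port: reject it
	if (parsed.origin !== new URL(base).origin) return null
	// '#section' links point at a page that is already being crawled
	parsed.hash = ''
	return parsed.href
}
```

Inside the `found` handler this would replace the `startsWith` check with something like `next(filterSameSite(url, site))`. Stripping the fragment is a design choice: it keeps `/page#a` and `/page#b` from being treated as different pages, at the cost of never crawling fragment-specific URLs.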

Tests

cd crawler
npm test

License

MIT
