A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.
Node Crawler is a web crawling and scraping library for Node.js that provides server-side DOM manipulation using Cheerio, a jQuery-like API. It enables developers to efficiently extract data from websites with features like automatic charset conversion, priority queuing, and configurable concurrency.
Node Crawler is aimed at Node.js developers building web scrapers, crawlers, or data extraction tools who need a feature-rich solution with jQuery-like DOM querying.
Developers choose Node Crawler because it pairs a simple API with advanced features such as rate limiting, proxy rotation, HTTP/2 support, and automatic charset handling.
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Uses Cheerio for server-side jQuery syntax, allowing developers to parse HTML with familiar CSS selectors, as shown in the callback examples.
Offers configurable rate limiting and concurrency with independent rate limiters, enabling ethical scraping and proxy management, detailed in the rateLimiters section.
Supports proxy rotation and HTTP/2 for enhanced performance and compatibility, with hassle-free setup as highlighted in the HTTP/2 documentation.
Detects and converts character sets to UTF-8 automatically via the 'forceUTF8' option, reducing issues with international websites.
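The independent rate limiters mentioned above can be pictured with a minimal sketch. It is not Node Crawler's implementation or API: `LimiterPool`, `reserve`, and the `proxy-a`/`proxy-b` IDs are hypothetical names chosen for illustration, and a fixed minimum gap per limiter is an assumed policy. Each limiter ID keeps its own schedule, so requests sent through one proxy never delay requests sent through another.

```javascript
// Hypothetical sketch of independent rate limiters keyed by ID.
// This is NOT Node Crawler's internal code; all names here are illustrative.
class LimiterPool {
  constructor(minGapMs) {
    this.minGapMs = minGapMs;    // assumed policy: minimum gap between requests on one limiter
    this.nextFreeAt = new Map(); // limiter ID -> earliest allowed start time (ms)
  }

  // Reserve the next slot on the given limiter and return its start time.
  reserve(limiterId, nowMs) {
    const free = this.nextFreeAt.get(limiterId) ?? nowMs;
    const startAt = Math.max(free, nowMs);
    this.nextFreeAt.set(limiterId, startAt + this.minGapMs);
    return startAt;
  }
}

// Three requests arrive at time 0: two for proxy-a, one for proxy-b.
const pool = new LimiterPool(1000);
const schedule = [
  ["proxy-a", pool.reserve("proxy-a", 0)],
  ["proxy-a", pool.reserve("proxy-a", 0)],
  ["proxy-b", pool.reserve("proxy-b", 0)],
];

// The second proxy-a request is pushed back by the gap; proxy-b is unaffected.
console.log(schedule);
```

Passing the clock in as an argument keeps the sketch deterministic. A real crawler would read the current time and wait until the returned start time before sending the request.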
Dropped CommonJS support in v2, forcing migration that can be challenging for legacy codebases, with only a beta version offering dual support.
v2 introduces significant renaming and behavior shifts from v1, such as using 'form' instead of 'body' for POST, requiring code updates and testing.
Relies solely on Cheerio without jsdom support, so it cannot handle pages requiring client-side JavaScript rendering, limiting use for dynamic sites.
The project acknowledges stability issues on Linux with Node.js versions above 18, which restricts environment choices and may cause bugs in production.