Question 1

How does Browsertrix Crawler compare to wget for web archiving?

Accepted Answer

Browsertrix Crawler loads pages in a real browser, so it captures JavaScript-driven and other dynamic content that wget cannot execute. That makes it the better choice for modern websites. wget is faster and simpler, and works well for static HTML sites. Choose Browsertrix for capture fidelity, wget for speed on basic sites.
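To make the difference concrete, here is a minimal sketch that builds a comparable archiving command for each tool. It assumes wget 1.14 or newer (the release that added '--warc-file') and the public webrecorder/browsertrix-crawler Docker image. The function names and the example paths are hypothetical. Nothing is executed: the functions only return argument lists you could pass to subprocess.run().

```python
"""Sketch: comparable archiving commands for wget and Browsertrix Crawler.

Assumptions: wget >= 1.14 (for --warc-file) and the public
webrecorder/browsertrix-crawler Docker image. The helper names are
hypothetical; the functions only build argument lists.
"""
from typing import List


def wget_warc_command(url: str, name: str) -> List[str]:
    # wget fetches raw HTTP responses; it never runs the page's JavaScript.
    return [
        "wget",
        "--mirror",             # recursive crawl with timestamping
        "--page-requisites",    # also fetch CSS, images, and scripts linked from HTML
        f"--warc-file={name}",  # record every response to <name>.warc.gz
        url,
    ]


def browsertrix_command(url: str, collection: str, crawls_dir: str) -> List[str]:
    # Browsertrix loads each page in a real browser inside Docker, so
    # resources requested by JavaScript at runtime are captured too.
    return [
        "docker", "run", "--rm",
        "-v", f"{crawls_dir}:/crawls/",
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", url,
        "--collection", collection,
        "--generateWACZ",       # package the crawl as a WACZ file
    ]


if __name__ == "__main__":
    print(" ".join(wget_warc_command("https://example.com/", "example")))
    print(" ".join(browsertrix_command("https://example.com/", "example", "/tmp/crawls")))
```

Both tools end up with standard web archive files, so the choice is mostly about whether the site needs a browser to render correctly.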

Question 2

Can I use Chrome instead of Brave with Browsertrix Crawler?

Accepted Answer

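The README specifies Brave Browser, which Browsertrix Crawler controls through Puppeteer. Brave is Chromium-based and Puppeteer can also drive Chrome and Chromium, so running with Chrome is plausible, but it is not supported out of the box: the Docker image ships with Brave, and switching would likely mean customizing the image and testing the results yourself.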

Question 3

What are the system requirements for running Browsertrix Crawler?

Accepted Answer

It requires Docker plus enough memory and CPU for the browsers it runs. As a rough rule of thumb, plan on at least 2 GB of RAM per browser window. For parallel crawls with several workers, a machine with 8 GB or more of RAM and multiple CPU cores helps avoid performance problems.
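The sizing arithmetic can be sketched as follows. This is not official guidance: it takes the rough figure of about 2 GB per browser window as an assumption and reserves a hypothetical 2 GB for the operating system and Docker. Both numbers are parameters you should adjust after watching real memory use.

```python
"""Rough sizing sketch. Assumptions (not official guidance): about 2 GB of
RAM per browser window, plus a hypothetical 2 GB reserved for the OS and
Docker. Both values are parameters, so adjust them to measured usage."""


def max_workers(total_ram_gb: float, per_browser_gb: float = 2.0,
                reserved_gb: float = 2.0) -> int:
    """Return how many browser windows fit in memory, never fewer than 1."""
    if per_browser_gb <= 0:
        raise ValueError("per_browser_gb must be positive")
    usable = total_ram_gb - reserved_gb
    return max(1, int(usable // per_browser_gb))


def ram_needed_gb(workers: int, per_browser_gb: float = 2.0,
                  reserved_gb: float = 2.0) -> float:
    """Estimate the RAM a crawl with `workers` browser windows would need."""
    return workers * per_browser_gb + reserved_gb
```

Under these assumptions, max_workers(16) works out to (16 - 2) // 2 = 7 windows, and a 4-worker crawl needs roughly 4 * 2 + 2 = 10 GB.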

Question 4

How do I configure parallel crawling in Browsertrix Crawler?

Accepted Answer

Parallelism is controlled by the number of workers, each of which runs its own browser window. Set it with the '--workers' command-line option or the equivalent 'workers' key in the YAML crawl configuration file, and size it to the memory and CPU you have available.
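A minimal sketch of the config-file route follows. It assumes that the YAML file accepts the same option names as the command line (here 'workers', 'collection', 'generateWACZ') plus a 'seeds' list, and that the file is mounted into the container and passed with '--config'. Check the README for your crawler version before relying on any key. The helper names are hypothetical, and the YAML is written by hand so the sketch needs only the standard library.

```python
"""Sketch: a minimal Browsertrix Crawler YAML config that sets the number of
parallel workers, plus the docker command that uses it.

Assumptions: the config accepts the CLI option names (workers, collection,
generateWACZ) and a `seeds` list; helper names are hypothetical.
"""
import os
from typing import List


def render_config(seeds: List[str], workers: int, collection: str) -> str:
    if workers < 1:
        raise ValueError("workers must be at least 1")
    lines = [
        f"collection: {collection}",
        f"workers: {workers}",   # number of browser windows crawling in parallel
        "generateWACZ: true",
        "seeds:",
    ]
    lines += [f"  - url: {url}" for url in seeds]
    return "\n".join(lines) + "\n"


def crawl_command(config_path: str, crawls_dir: str) -> List[str]:
    # Mount the config file and the output directory, then point --config at the file.
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(config_path)}:/app/crawl-config.yaml",
        "-v", f"{os.path.abspath(crawls_dir)}:/crawls/",
        "webrecorder/browsertrix-crawler", "crawl",
        "--config", "/app/crawl-config.yaml",
    ]


if __name__ == "__main__":
    print(render_config(["https://example.com/"], 4, "my-crawl"))
    print(" ".join(crawl_command("crawl-config.yaml", "crawls")))
```

For a one-off crawl, passing '--workers 4' directly on the command line should have the same effect as the 'workers: 4' line in the file.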

Question 5

Is Browsertrix Crawler suitable for archiving social media sites?

Accepted Answer

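Generally, yes. Browser-based capture handles the dynamic, script-heavy pages these sites use. However, most social media sites require a logged-in session and enforce rate limits, so you will need to set up authentication and keep the crawl rate modest. Login pages, infinite scrolling, and other interactive elements may also require custom scripts or settings.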
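One way to handle authentication is a saved browser profile. The sketch below assumes that the crawler image provides a 'create-login-profile' command, which opens an interactive browser you log in through and saves the session as a tar.gz file, and that 'crawl' accepts that file via '--profile'. The port mapping, the profile filename 'profile.tar.gz', and the helper names are illustrative assumptions; confirm them against the README for your version.

```python
"""Sketch: two-step workflow for crawling a site behind a login.

Assumptions: the image provides `create-login-profile` (interactive login,
session saved as a tar.gz) and `crawl --profile <file>`. Ports, the
profile filename, and helper names are illustrative only.
"""
from typing import List

IMAGE = "webrecorder/browsertrix-crawler"


def create_profile_command(login_url: str, profiles_dir: str) -> List[str]:
    # Step 1: log in interactively; the session is saved into profiles_dir.
    return [
        "docker", "run", "-it",
        "-p", "6080:6080", "-p", "9223:9223",  # assumed ports for the interactive browser
        "-v", f"{profiles_dir}:/crawls/profiles/",
        IMAGE, "create-login-profile",
        "--url", login_url,
    ]


def crawl_with_profile_command(start_url: str, profile_file: str,
                               crawls_dir: str, workers: int = 1) -> List[str]:
    # Step 2: crawl while reusing the saved session. The profile is expected
    # under crawls_dir/profiles/, which appears as /crawls/profiles/ in the
    # container. One worker keeps the request rate low.
    return [
        "docker", "run", "--rm",
        "-v", f"{crawls_dir}:/crawls/",
        IMAGE, "crawl",
        "--url", start_url,
        "--profile", f"/crawls/profiles/{profile_file}",
        "--workers", str(workers),
    ]
```

Using profiles_dir = crawls_dir + "/profiles" keeps the two steps consistent, since the second container sees the saved profile at /crawls/profiles/.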

Question 6

Browsertrix Crawler vs ArchiveBox: which is better?

Accepted Answer

Neither is universally better; they suit different jobs. Browsertrix Crawler is built for customizable, high-fidelity crawls that run in Docker, while ArchiveBox is a simpler, all-in-one tool that needs less setup. Use Browsertrix for complex crawling projects and ArchiveBox for straightforward, automated archiving of lists of URLs.
