A Python package and CLI tool for interacting with the Wayback Machine's Save, CDX, and Availability APIs.
Waybackpy is a Python package and command-line tool for the Wayback Machine's public APIs. It lets developers save web pages to the Internet Archive programmatically, retrieve historical snapshots of URLs, and check whether a URL has been archived, which makes web-archiving tasks and access to historical web data easy to automate.
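To show what the three APIs look like at the HTTP level, here is a minimal standard-library sketch that only builds request URLs and sends nothing. The endpoint paths are the Internet Archive's public ones. The helper names are hypothetical and are not part of waybackpy's API.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical helpers (not waybackpy's API): build request URLs for the
# public Wayback Machine endpoints without sending any request.

def save_url(target: str) -> str:
    # Save Page Now (simple unauthenticated form): requesting this URL asks
    # the archive to capture `target`.
    return f"https://web.archive.org/save/{target}"

def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    # Availability API: returns JSON describing the snapshot closest to `timestamp`.
    params = {"url": target}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

def cdx_url(target: str, **filters: str) -> str:
    # CDX Server API: lists captures; output=json returns a header row plus data rows.
    params = {"url": target, "output": "json", **filters}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)

def snapshot_url(target: str, timestamp: str) -> str:
    # Archived copies are served at /web/<timestamp>/<original URL>.
    return f"https://web.archive.org/web/{timestamp}/{target}"
```

For example, `cdx_url("example.com", limit="5")` limits the listing to five captures. Because `from` is a Python keyword, a date range would be passed as `cdx_url("example.com", **{"from": "2010", "to": "2012"})`.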
Developers, researchers, and archivists who need to automate interactions with the Wayback Machine, such as saving pages, querying historical data, or integrating archive functionality into applications.
Waybackpy offers a unified, well-documented, Pythonic interface to all three Wayback Machine APIs, usable both as a library and from the CLI. It handles timestamp conversion and the request details of each API, so reliable archiving workflows are easier to build than with hand-written API calls.
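As an illustration of the timestamp conversion mentioned above, here is a minimal standard-library sketch. Wayback Machine timestamps are UTC strings in `YYYYMMDDhhmmss` form. The helper names are hypothetical, this is not waybackpy's own code, and naive datetimes are assumed to already be in UTC.

```python
from datetime import datetime, timezone
from typing import Union

# Hypothetical helpers (not waybackpy's API). Wayback Machine timestamps are
# UTC strings in YYYYMMDDhhmmss form, e.g. "20101010101000".
WAYBACK_FORMAT = "%Y%m%d%H%M%S"

def to_wayback(when: Union[datetime, int, float]) -> str:
    """Convert a datetime or a Unix timestamp (seconds) to a Wayback timestamp."""
    if isinstance(when, (int, float)):
        when = datetime.fromtimestamp(when, tz=timezone.utc)
    elif when.tzinfo is not None:
        # Normalise aware datetimes to UTC; naive ones are assumed to be UTC already.
        when = when.astimezone(timezone.utc)
    return when.strftime(WAYBACK_FORMAT)

def from_wayback(ts: str) -> datetime:
    """Parse a 14-digit Wayback timestamp into an aware UTC datetime."""
    return datetime.strptime(ts, WAYBACK_FORMAT).replace(tzinfo=timezone.utc)
```

For example, `to_wayback(0)` gives `"19700101000000"`, and `from_wayback` reverses the conversion for any 14-digit timestamp.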
Wayback Machine API interface & a command-line tool
Provides a unified interface to all three Wayback Machine APIs (SavePageNow, CDX Server, Availability), simplifying access to archival functions as shown in the README examples.
Accepts several timestamp formats, including datetime objects, Unix timestamps, and 14-digit Wayback timestamps, for precise snapshot retrieval, as the CDX API's near() method demonstrates.
Offers both a Python package for programmatic use and a command-line tool for scripting and automation, with separate documentation and demo videos provided.
GitHub badges for unit tests, code coverage, and recent commits indicate that the project is actively maintained and tested.
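To show what "nearest snapshot" means at the data level, this sketch parses CDX Server output (`output=json`, whose first row is a header) and picks the capture closest to a target time. It is a standard-library illustration with hypothetical helper names and hypothetical sample rows, not waybackpy's implementation of near().

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

WAYBACK_FORMAT = "%Y%m%d%H%M%S"

def parse_cdx_json(body: str) -> List[Dict[str, Any]]:
    """Turn CDX `output=json` text (a header row, then data rows) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

def closest_capture(captures: List[Dict[str, Any]], target: datetime) -> Dict[str, Any]:
    """Return the capture nearest to `target` (an aware UTC datetime).

    Raises ValueError if `captures` is empty.
    """
    def distance(capture: Dict[str, Any]) -> float:
        when = datetime.strptime(capture["timestamp"], WAYBACK_FORMAT)
        return abs((when.replace(tzinfo=timezone.utc) - target).total_seconds())
    return min(captures, key=distance)

# Hypothetical sample response, shaped like the CDX server's JSON output.
SAMPLE = json.dumps([
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20090101000000", "http://example.com/", "text/html", "200", "AAAA", "1000"],
    ["com,example)/", "20110601000000", "http://example.com/", "text/html", "200", "BBBB", "1200"],
])

best = closest_capture(parse_cdx_json(SAMPLE), datetime(2010, 10, 10, tzinfo=timezone.utc))
print(best["timestamp"])  # 20110601000000
```

The June 2011 capture wins because it is 234 days from the target, while the January 2009 capture is 647 days away.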
The README explicitly recommends avoiding the Availability API due to performance problems, limiting its usefulness for some queries.
Relies entirely on the Wayback Machine's public APIs, which are subject to rate limits, downtime, and breaking changes outside the library's control.
Every API call requires a user agent string, adding an extra step that can be cumbersome and error-prone for developers.
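Both of these limitations can be partly managed in client code: define the user agent once, and wrap network calls in a retry with exponential backoff. The sketch below uses only the standard library. The helper names and the user-agent string are hypothetical, and the backoff schedule is an assumption, not a documented Internet Archive limit.

```python
import time
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical example value: identify your client and a contact point.
USER_AGENT = "my-archiver/0.1 (+https://example.org/contact)"

def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Attach the User-Agent header in one place so every call reuses it."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on OSError (which includes urllib's URLError/HTTPError).

    Waits base_delay * 2**n seconds between tries (1s, 2s, 4s, ... by default).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except OSError:
            sleep(base_delay * 2 ** attempt)
    return call()  # final attempt: let any error propagate to the caller
```

Nothing is sent until the request is opened, for example with `with_retries(lambda: urllib.request.urlopen(build_request(url), timeout=30).read())`. Passing `sleep` as a parameter keeps the backoff testable without real delays.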