The "Awesome Web Archiving" project is a curated collection of resources for preserving web content for future generations. Web archiving means capturing web pages together with their associated resources and metadata, storing them (often as WARC files), and keeping them available for later replay and research so that digital history is not lost. The list covers tools, software, best practices, case studies, and community initiatives in the field. It is useful for researchers, historians, developers, and anyone interested in digital preservation, and it gives an overview of the methods and technologies currently in use. Readers can also contribute to the list and to the wider effort to safeguard our digital heritage.
A compilation of research materials on data resilience, interactivity, and related topics for the Data Together community.
A comprehensive curated list of open-source and hosted tools for monitoring and detecting changes on websites.
Open-source self-hosted web archiving tool that saves websites in multiple durable formats like HTML, PDF, and WARC.
A Python tool that automatically archives web content (videos, images, social media posts) from URLs listed in Google Sheets and other sources, in a secure and verifiable way.
A standalone Docker container for high-fidelity, browser-based web archiving crawls using Puppeteer and Brave.
An offline-first web browser that archives, searches, and crawls websites for personal use.
Offline full-text search and archiving tool for Chromium-based browsers that saves and indexes every page you visit.
A command-line tool and Python library for archiving Facebook data via the Graph API, supporting recursive retrieval of nodes and edges.
A preconfigured web crawler for backing up websites, producing WARC files with a live dashboard and dynamic ignore patterns.
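Ignore patterns like these are usually regular expressions tested against each discovered URL before it is queued. The following is a minimal, hypothetical sketch of that filtering step: `compile_ignores`, `should_fetch`, and the sample patterns are illustrative assumptions, not grab-site's own code or bundled ignore sets.

```python
import re
from typing import Iterable, List


def compile_ignores(patterns: Iterable[str]) -> List[re.Pattern]:
    """Compile ignore patterns once (hypothetical helper)."""
    return [re.compile(p) for p in patterns]


def should_fetch(url: str, ignores: List[re.Pattern]) -> bool:
    """Return False if any ignore pattern matches anywhere in the URL."""
    return not any(p.search(url) for p in ignores)


# Hypothetical patterns (assumptions): skip calendar pages and session-ID URLs.
ignores = compile_ignores([r"/calendar/", r"[?&]sessionid="])

queue = [
    "https://example.com/about",
    "https://example.com/calendar/2024/01",
    "https://example.com/page?sessionid=abc123",
]
to_crawl = [u for u in queue if should_fetch(u, ignores)]
print(to_crawl)  # ['https://example.com/about']
```

Because the patterns are plain data, a crawler can re-read them between fetches and rebuild the compiled list; that is one general way live edits to ignores can take effect without a restart (an assumption about the technique, not a description of grab-site internals).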
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
CLI tool and library for saving complete web pages as a single, self-contained HTML file.
A Go package and CLI tool that saves web pages as single HTML files with all assets embedded.
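One common way to produce such a single file is to fetch each referenced asset and embed it as a `data:` URI (RFC 2397). The sketch below shows only that embedding step for `<img>` tags, assuming the asset bytes are already downloaded; `inline_images` and the `assets` mapping are hypothetical names, and real tools also handle stylesheets, scripts, fonts, and frames.

```python
import base64
import re


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (RFC 2397)."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, assets: dict) -> str:
    """Replace <img src="..."> values found in `assets` with data URIs.

    `assets` maps a src value to (bytes, mime_type); it is a hypothetical
    stand-in for whatever a real tool downloads.
    """
    def replace(match: re.Match) -> str:
        src = match.group(2)
        if src not in assets:
            return match.group(0)  # leave unknown assets untouched
        content, mime = assets[src]
        return f"{match.group(1)}{to_data_uri(content, mime)}{match.group(3)}"

    # Naive regex for double-quoted src attributes; a real tool would use an HTML parser.
    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', replace, html)


page = '<p>Logo: <img src="logo.png" alt="logo"></p>'
out = inline_images(page, {"logo.png": (b"\x89PNG", "image/png")})
print(out)  # <p>Logo: <img src="data:image/png;base64,iVBORw==" alt="logo"></p>
```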
A high-fidelity, browser-based web archiving library and CLI for capturing single web pages with provenance.
A high-fidelity, user-scriptable archival web crawler using Chrome/Chromium to preserve JavaScript-rendered content.
A command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API.
A graphical desktop application that simplifies web archiving by providing a one-click interface to preserve and replay web pages using Heritrix and OpenWayback.
A privacy-focused web archiving tool with an IM-style interface that captures pages to multiple archival services.
A Python package and CLI tool for interacting with the Wayback Machine's Save, CDX, and Availability APIs.
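For orientation, the Wayback Machine's Availability API answers a GET request such as `https://archive.org/wayback/available?url=example.com&timestamp=20060101` with JSON describing the closest capture. The sketch below builds that URL and parses a response offline using only the standard library; it is not waybackpy's interface, the helper names are hypothetical, and the sample response values are invented for illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API query; timestamp is YYYYMMDDhhmmss (may be truncated)."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest snapshot URL, or None if nothing is archived."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative response in the API's shape (values invented for the example).
sample = json.dumps({
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20060101000000",
            "url": "http://web.archive.org/web/20060101000000/http://example.com:80/",
        }
    },
})

print(availability_url("example.com", "20060101"))
print(closest_snapshot(sample))
```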
A distributed and persistent web archive replay system that uses IPFS to store and serve WARC files.
Legacy web archive replay engine for accessing historical web content from WARC files.
A research-driven web crawler for building and analyzing curated web corpora as networks of web entities.
A toolkit for indexing and exploring web archive content from ARC and WARC files using OpenSearch/Elasticsearch.
A web application for searching, browsing, and analyzing archived web content (ARC/WARC files) with a Solr backend.
A Python toolkit for extracting, filtering, and analyzing data from web archives, JSON files, and imageboards.
A Go tool and library for downloading URLs and files from Common Crawl and Wayback Machine web archives.
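Both archives expose CDX-style indexes that list the captures of a URL. As a hedged sketch (standard library only, not the Go tool's code), the snippet below targets the Wayback Machine's CDX server, whose `output=json` response is a JSON array with the field names in the first row, and turns it into records and replay URLs of the form `https://web.archive.org/web/<timestamp>/<original>`. Common Crawl's index uses a related but differently shaped JSON output, so this parser does not cover it. The helper names are hypothetical and the sample row is invented.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, limit: int = 10) -> str:
    """Build a Wayback CDX query that requests JSON output."""
    params = {"url": target, "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> List[Dict[str, str]]:
    """Turn [[field names...], [values...], ...] into one dict per capture."""
    if not body.strip():
        return []  # treat an empty body as "no captures"
    rows = json.loads(body)
    if not rows:
        return []  # an empty array also means "no captures"
    header, *records = rows
    return [dict(zip(header, row)) for row in records]


def replay_url(record: Dict[str, str]) -> str:
    """Wayback replay URL for a single capture."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"


# Invented sample in the server's JSON shape (values are placeholders).
sample = json.dumps([
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20020120142510", "http://example.com:80/", "text/html", "200",
     "PLACEHOLDERDIGEST", "1792"],
])

print(cdx_query_url("example.com", limit=5))
for rec in parse_cdx_json(sample):
    print(rec["timestamp"], rec["statuscode"], replay_url(rec))
```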
A portable concurrent Memento aggregator CLI and server for retrieving archived web pages from multiple sources.
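An aggregator's core job can be pictured as two steps: merge the memento lists returned by several archives, then answer a request for a given datetime with the closest capture (the TimeGate role in RFC 7089). The sketch below shows only that merge-and-select logic on hypothetical in-memory data; fetching TimeMaps over HTTP concurrently, as a real aggregator does, is left out, and all archive names and URIs are invented.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# (capture datetime, memento URI) as reported by one archive.
Memento = Tuple[datetime, str]
# (capture datetime, memento URI, archive name) after merging.
Merged = Tuple[datetime, str, str]


def merge_timemaps(sources: Dict[str, List[Memento]]) -> List[Merged]:
    """Flatten mementos from every archive into one list sorted by datetime."""
    merged = [
        (when, uri, archive)
        for archive, mementos in sources.items()
        for when, uri in mementos
    ]
    merged.sort(key=lambda item: item[0])
    return merged


def closest(merged: List[Merged], accept_datetime: datetime) -> Merged:
    """Return the memento nearest to the requested datetime.

    min() keeps the first of equally distant items, so ties go to the
    earlier capture because `merged` is sorted.
    """
    if not merged:
        raise LookupError("no mementos available")
    return min(merged, key=lambda item: abs(item[0] - accept_datetime))


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical archives and URIs, for illustration only.
sources = {
    "archive-a": [(utc(2010, 5, 1), "https://archive-a.example/20100501/http://example.com/")],
    "archive-b": [
        (utc(2012, 3, 9), "https://archive-b.example/20120309/http://example.com/"),
        (utc(2015, 7, 20), "https://archive-b.example/20150720/http://example.com/"),
    ],
}

merged = merge_timemaps(sources)
when, uri, archive = closest(merged, utc(2013, 1, 1))
print(archive, when.date(), uri)
```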
WarcDB is an SQLite-based file format that makes web crawl data easier to share and query.
A set of Python tools for downloading and preserving wikis, including MediaWiki wikis and Wikimedia projects.
A collection of robust and fast Python tools for parsing, extracting, and analyzing web archive data, including a high-performance WARC parser.
A Node.js library for parsing and creating Web ARChive (WARC) files with support for Chrome, Puppeteer, and Electron.
Python command-line tools and libraries for handling, validating, and converting WARC and ARC web archive files.
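To make the file format concrete: a WARC record is a version line (e.g. `WARC/1.0`), named header fields, a blank line, a content block of exactly `Content-Length` bytes, and two trailing CRLFs. The sketch below writes and reads one uncompressed `resource` record with the standard library only. It is a simplified illustration with hypothetical helper names, not the API of warctools or any other library, and it ignores per-record gzip compression, digests, and validation.

```python
import io
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple

CRLF = b"\r\n"


def write_record(out: io.BufferedIOBase, target_uri: str, payload: bytes, content_type: str) -> None:
    """Serialize one WARC/1.0 'resource' record (uncompressed)."""
    headers = {
        "WARC-Type": "resource",
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI": target_uri,
        "Content-Type": content_type,
        "Content-Length": str(len(payload)),
    }
    out.write(b"WARC/1.0" + CRLF)
    for name, value in headers.items():
        out.write(f"{name}: {value}".encode("utf-8") + CRLF)
    out.write(CRLF)          # blank line ends the header block
    out.write(payload)       # content block, exactly Content-Length bytes
    out.write(CRLF + CRLF)   # record terminator


def read_record(inp: io.BufferedIOBase) -> Tuple[Dict[str, str], bytes]:
    """Parse one record in the layout produced by write_record."""
    version = inp.readline().rstrip(b"\r\n")
    if not version.startswith(b"WARC/"):
        raise ValueError(f"not a WARC record: {version!r}")
    headers: Dict[str, str] = {}
    while True:
        line = inp.readline().rstrip(b"\r\n")
        if not line:
            break  # blank line: end of headers
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
    payload = inp.read(int(headers["Content-Length"]))
    inp.read(4)  # consume the trailing CRLF CRLF
    return headers, payload


buf = io.BytesIO()
write_record(buf, "http://example.com/", b"<html>hello</html>", "text/html")
buf.seek(0)
hdrs, body = read_record(buf)
print(hdrs["WARC-Type"], hdrs["WARC-Target-URI"], body)
```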
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.
An open-source toolkit for analyzing web archives at scale using Apache Spark.
A collection of Jupyter notebooks for analyzing Common Crawl web archive data using columnar indexes and webgraph datasets.
A powerful, open-source screenshot tool with built-in annotation and editing capabilities for Linux, Windows, and macOS.
A command-line tool for simulating keyboard/mouse input and automating window management on X11 systems.
A curated list of software, literature, and resources for the Memento protocol (RFC 7089), which enables time-based access to archived web content.
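Memento TimeMaps are commonly served as `application/link-format`: a comma-separated list of `<URI>; rel="..."` entries, where each memento carries a `rel` containing `memento` (possibly combined with `first` or `last`) and a `datetime` attribute in HTTP-date format. The following is a deliberately simplified parser sketch that assumes well-formed input with quoted parameter values; `parse_timemap` is a hypothetical name and the sample TimeMap uses an invented archive host. Production code should use a proper link-format parser.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# One link entry: <URI> followed by its ;-separated parameters up to the next '<'.
ENTRY = re.compile(r"<([^>]+)>([^<]*)")
# key="value" parameters; unquoted values are not handled in this sketch.
PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(text: str) -> List[Tuple[datetime, str]]:
    """Return (datetime, URI) for each entry whose rel includes 'memento', oldest first."""
    mementos = []
    for uri, params in ENTRY.findall(text):
        attrs = dict(PARAM.findall(params))
        rels = attrs.get("rel", "").split()  # e.g. "first memento" -> ["first", "memento"]
        if "memento" in rels and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return sorted(mementos)


# Illustrative TimeMap; the archive host is hypothetical.
TIMEMAP = """<http://example.com/>; rel="original",
<http://archive.example/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example/20100501000000/http://example.com/>; rel="first memento"; datetime="Sat, 01 May 2010 00:00:00 GMT",
<http://archive.example/20120309120000/http://example.com/>; rel="last memento"; datetime="Fri, 09 Mar 2012 12:00:00 GMT"
"""

for when, uri in parse_timemap(TIMEMAP):
    print(when.isoformat(), uri)
```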