Open-Awesome

Auto Archiver

MIT · Python · v1.2.7

A Python tool to automatically archive web content (videos, images, social media) from Google Sheets and other sources in a secure, verifiable way.

1.1k stars · 100 forks · 0 contributors

What is Auto Archiver?

Auto Archiver is a Python tool that automatically archives web content (videos, images, social media posts, and webpages) from URLs listed in sources such as Google Sheets or CSV files. It is designed to preserve online content in a secure, verifiable way, so that archived material stays intact and accessible for later reference.
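One common way to make an archive verifiable is to record a cryptographic digest of each saved file together with a timestamp. The sketch below is not Auto Archiver's own code: `fingerprint`, `verify`, and the record layout are hypothetical names chosen for illustration, and only Python's standard library is used.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes, url: str) -> dict:
    """Return a small integrity record for archived bytes (hypothetical layout)."""
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(content: bytes, record: dict) -> bool:
    """Re-hash the bytes and compare against the stored digest."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


record = fingerprint(b"example page body", "https://example.com/post/1")
print(verify(b"example page body", record))   # True: content unchanged
print(verify(b"tampered page body", record))  # False: digest no longer matches
```

Storing the digest separately from the archived file lets a later reader detect whether the file has changed since it was captured.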

Target Audience

Journalists, researchers, archivists, and organizations needing to systematically preserve online content from social media and other web sources for verification or archival purposes.

Value Proposition

Developers choose Auto Archiver for its automation, its support for multiple content types and storage backends, and its focus on secure, verifiable archiving, which suits data preservation workflows where integrity matters.

Overview

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

Use Cases

Best For

  • Automatically archiving social media posts from a Google Sheets list
  • Preserving videos and images from URLs in CSV files for research
  • Securely storing web content to S3 or Google Drive with metadata enrichment
  • Tracking archiving status by appending reports back to source spreadsheets
  • Running verifiable archiving pipelines in Docker containers
  • Batch processing URLs from command-line inputs for content preservation

Not Ideal For

  • Real-time or streaming URL archiving from live feeds or APIs
  • Casual users needing quick, one-off downloads without configuration setup
  • Projects requiring archiving from unsupported or niche websites not listed in the documentation
  • Environments where Docker or Python dependencies cannot be installed or managed

Pros & Cons

Pros

Flexible URL Ingestion

Supports multiple input sources, including Google Sheets, CSV files, and command-line arguments, so it can slot into existing data pipelines and workflows.
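As a rough illustration of what ingesting URLs from several sources can look like, the sketch below merges a CSV column and command-line style arguments into one de-duplicated queue. It does not use Auto Archiver's API: the function names and the `url` column name are assumptions made for this example.

```python
import csv
import io
from typing import Iterable, Iterator


def urls_from_csv(handle: Iterable[str], column: str = "url") -> Iterator[str]:
    """Yield non-empty values from one column of a CSV with a header row."""
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value:
            yield value


def urls_from_args(args: Iterable[str]) -> Iterator[str]:
    """Yield URLs passed directly, e.g. from sys.argv[1:]."""
    for arg in args:
        if arg.strip():
            yield arg.strip()


def merge_sources(*sources: Iterable[str]) -> list[str]:
    """Combine sources, dropping duplicates while keeping first-seen order."""
    seen: dict[str, None] = {}
    for source in sources:
        for url in source:
            seen.setdefault(url, None)
    return list(seen)


# In-memory CSV standing in for a real file; the second data row has no URL.
sample_csv = io.StringIO(
    "url,note\nhttps://example.com/a,first\n,blank\nhttps://example.com/b,second\n"
)
queue = merge_sources(
    urls_from_csv(sample_csv),
    urls_from_args(["https://example.com/b", "https://example.com/c"]),
)
print(queue)
```

Treating every source as a plain iterable of strings keeps the downstream archiving step independent of where a URL came from.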

Broad Content Support

Archives social media posts, videos, images, and webpages from URLs, covering most common web content types for comprehensive preservation.

Cloud Storage Integration

Allows saving to remote storage backends such as S3 buckets and Google Drive, facilitating scalable and accessible archive management.
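The idea of interchangeable storage backends can be sketched with a small interface. Everything here is hypothetical and is not taken from Auto Archiver: `Storage`, `LocalStorage`, and `archive_to_all` are illustrative names, and a local directory stands in for a remote backend such as an S3 bucket or Google Drive.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface: anything that can persist bytes under a key."""

    def save(self, key: str, data: bytes) -> str: ...


class LocalStorage:
    """Writes files under a root directory; stands in for a remote backend."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, key: str, data: bytes) -> str:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return str(target)


def archive_to_all(backends: list[Storage], key: str, data: bytes) -> list[str]:
    """Store the same payload in every configured backend."""
    return [backend.save(key, data) for backend in backends]


with tempfile.TemporaryDirectory() as tmp:
    backends = [LocalStorage(Path(tmp) / "primary"), LocalStorage(Path(tmp) / "mirror")]
    locations = archive_to_all(backends, "posts/post-1.html", b"<html></html>")
    # Read the copies back while the temporary directory still exists.
    stored = [Path(loc).read_bytes() for loc in locations]

print(len(locations))
```

A remote backend would implement the same `save` method, so the calling code would not need to change when the storage destination does.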

Automated Status Reporting

Appends archiving status back to the source spreadsheet or CSV report, providing built-in tracking and audit trails for batch processes.
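To show the general shape of status reporting, the sketch below adds a `status` column to each input row and writes the result as CSV. It is an assumption-laden illustration rather than Auto Archiver's implementation: `write_report`, the `url` key, and the status strings are all hypothetical.

```python
import csv
import io


def write_report(rows: list[dict], results: dict[str, str]) -> str:
    """Return CSV text with a 'status' column added to every input row."""
    if not rows:
        return ""
    fieldnames = list(rows[0].keys()) + ["status"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # URLs that were never attempted are marked explicitly rather than left blank.
        writer.writerow({**row, "status": results.get(row["url"], "not attempted")})
    return out.getvalue()


rows = [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]
results = {"https://example.com/a": "archived"}
report = write_report(rows, results)
print(report)
```

Keeping the status next to the original row means the source list doubles as an audit trail for the batch.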

Cons

Configuration Complexity

Requires setup of configuration files (e.g., orchestration.yaml) and secrets management, which can be time-consuming and error-prone for new users.

Platform Limitations

Archiving is limited to supported content types and platforms; unsupported sites may require custom extensions or development effort.

Documentation Dependency

Users must rely heavily on external documentation for installation and troubleshooting, which might not cover all edge cases or advanced scenarios.

Quick Stats

Stars: 1,074
Forks: 100
Contributors: 0
Open Issues: 8
Last commit: 1 day ago
Created: 2021

Tags

#service #python #s3 #open-source-research #archive #docker #web-archiving #google-drive #automation #scraping #google-sheets

Built With

Python
Docker

Included in

Web Archiving (2.5k)

Related Projects

ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Stars: 27,327
Forks: 1,515
Last commit: 9 days ago

monolith

⬛️ CLI tool and library for saving complete web pages as a single HTML file

Stars: 15,043
Forks: 451
Last commit: 2 days ago

DiskerNet

💾 dn - offline full-text search and archiving for your Chromium-based browser.

Stars: 3,899
Forks: 148
Last commit: 1 month ago

Heritrix

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Stars: 3,220
Forks: 782
Last commit: 6 hours ago