A Python tool that accelerates uploading many small files to Amazon S3 by performing multiple PUT operations in parallel.
s3-parallel-put is a command-line utility written in Python that accelerates the upload of many small files to Amazon AWS S3 by performing multiple PUT operations concurrently. It solves the performance bottleneck of sequential uploads, making it ideal for batch uploads and data synchronization tasks where transfer time is critical.
Developers, system administrators, and data engineers who need to efficiently upload large numbers of small files to AWS S3 from the command line, particularly for tasks like website asset deployment, log file archiving, or dataset synchronization.
Developers choose s3-parallel-put for its straightforward parallel upload architecture that significantly reduces transfer times compared to single-threaded tools, along with flexible upload heuristics, resumable uploads via log files, and tunable parameters for optimizing performance based on specific use cases.
Parallel uploads to Amazon AWS S3
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Uses multiple processes (walker, putter, statter) to upload files concurrently, significantly reducing transfer times for many small files, as detailed in the architecture section.
Supports modes like 'update', 'add', and 'stupid' to optimize uploads based on file existence and changes, avoiding unnecessary HEAD requests or MD5 calculations for speed.
Allows resuming interrupted uploads from a log file with the --resume option, which is crucial for handling network failures in large batch operations.
Can guess Content-Type via 'magic' or filename, apply gzip compression to text files, and set custom headers, improving efficiency and S3 metadata management.
Requires Python 2.X, which is no longer supported, posing security risks and compatibility challenges with modern systems and tooling.
The README explicitly lists 'Limited error checking' as a bug, meaning failures during uploads might not be gracefully handled or logged, risking data integrity.
Lacks automatic chunking for large files, as noted in the 'To Do' section, making it inefficient for uploads exceeding S3's single PUT limits.