A splittable Hadoop InputFormat for efficiently processing concatenated GZIP files and web archive (*.arc.gz and *.warc.gz) data in distributed systems.
A Splitable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz