A command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API.
twarc is a command line tool and Python library specifically designed for collecting and archiving Twitter JSON data via the Twitter API. It addresses the need for systematic data gathering from Twitter for research, analysis, and preservation purposes, supporting both the older v1.1 and newer v2 APIs.
Researchers, data scientists, and developers who need to programmatically collect and archive Twitter data for academic studies, social media analysis, or digital preservation projects.
Developers choose twarc for its focused approach to Twitter data collection, dual API support, and extensible plugin system, which allows for sustainable and high-quality data archiving while maintaining consistency with the Twitter API.
A command line tool (and Python library) for archiving Twitter JSON
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Supports both Twitter API v1.1 and v2, including Academic Access, enabling flexible data collection across historical and current endpoints, as highlighted in the README's separate commands (twarc and twarc2).
Uses click-plugins to allow external packages, keeping the core small and sustainable while enabling community-driven extensions, as per the project's philosophy on sustainability and utility.
Designed specifically for research and archiving, with principles like API consistency and broad applicability, making it ideal for systematic data gathering in academic projects.
Features detailed documentation on ReadTheDocs with active community support via GitHub, Slack, and Matrix, as noted in the README for troubleshooting and contributions.
The project is no longer actively supported due to Twitter's API quota changes, making it potentially unreliable for current use, as explicitly stated in the README warning.
Relies entirely on Twitter's API, which has become more restrictive and costly, limiting practical usability and requiring users to navigate evolving API limitations on their own.
Requires manual configuration of API keys and environment variables, as shown in the development setup instructions, which can be a barrier for quick or casual data collection efforts.