A lightweight MongoDB schema analyzer that reveals document structure, field frequencies, and data outliers.
Variety is a command-line tool that analyzes MongoDB collections to reveal their implicit schema. It scans documents and reports every field, its data types, how often it appears, and any outliers, giving developers a statistical view of document structure so they can quickly make sense of unfamiliar or messy datasets.
Variety is aimed at MongoDB developers, database administrators, and data engineers who need to explore, document, or clean up collections, especially when inheriting legacy codebases or dealing with evolving schemas.
Developers choose Variety because it's simple, dependency-free, and provides immediate insight into MongoDB collections without requiring predefined schemas. Its ability to highlight outliers and field frequencies helps identify data quality issues and legacy cruft that other tools might miss.
Variety: a MongoDB schema analyzer
Automatically scans entire collections to list every key and its BSON types, providing a clear, statistical overview of document structure, as demonstrated in the README's example output with keys like 'pets' showing mixed types.
Highlights rare or legacy keys with low occurrence percentages, such as 'someWeirdLegacyKey' in the example, helping identify data quality issues and unused fields in inherited datasets.
Supports queries, limits, sorting, and depth constraints via parameters like 'maxDepth' and 'query', allowing targeted analysis of document subsets without scanning entire collections.
Offers both ASCII tables for human readability and JSON for programmatic consumption, with 'outputFormat' and 'quiet' options enabling easy integration into scripts or tools.
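To make the idea of key, type, and occurrence reporting concrete, here is a minimal Python sketch of that kind of analysis over in-memory documents. It is not Variety's implementation: the `analyze` function, its `max_depth` argument, and the sample documents are hypothetical, and it reports Python type names rather than BSON types.

```python
from collections import defaultdict


def analyze(docs, max_depth=None):
    """Summarize which key paths occur in `docs`, with what types, and how often.

    Illustrative only: a hypothetical helper, not Variety's own code.
    """
    stats = defaultdict(lambda: {"types": set(), "count": 0})

    def walk(value, prefix, depth, seen):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            stats[path]["types"].add(type(child).__name__)
            seen.add(path)
            # Descend into nested documents until the depth limit is reached.
            if isinstance(child, dict) and (max_depth is None or depth < max_depth):
                walk(child, path, depth + 1, seen)

    for doc in docs:
        seen = set()  # count each key path at most once per document
        walk(doc, "", 1, seen)
        for path in seen:
            stats[path]["count"] += 1

    total = len(docs)
    return {
        path: {
            "types": sorted(entry["types"]),
            "count": entry["count"],
            "percent": 100.0 * entry["count"] / total if total else 0.0,
        }
        for path, entry in stats.items()
    }


# Made-up sample documents, loosely modeled on the key names mentioned above.
docs = [
    {"name": "Tom", "pets": ["monkey", "fish"]},
    {"name": "Dick", "pets": "egret"},
    {"name": "Harry", "someWeirdLegacyKey": "x"},
    {"name": "Jim", "address": {"city": "Oslo"}},
]

report = analyze(docs)
for path, entry in sorted(report.items(), key=lambda item: -item[1]["count"]):
    print(f"{path:22} {', '.join(entry['types']):12} {entry['count']:3} {entry['percent']:6.1f}%")
```

In this sketch, a key such as `pets` shows up with two types, and a key present in only one of four documents surfaces with a low percentage, which is the pattern that makes legacy fields easy to spot.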
Variety lacks a progress bar or percent-complete indicator; users must watch the MongoDB server logs for updates instead, and those logs may not be accessible in every environment, as cautioned in the 'See Progress' section.
Default usage requires passing variables via --eval in the MongoDB shell, which is error-prone and complex, prompting the README to recommend variety-cli for a simpler interface.
Can run out of memory on deeply nested or messy collections; the available workarounds are options such as 'logKeysContinuously' and 'excludeSubkeys', which points to scalability limits on large analyses.
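One way to reduce quoting mistakes when passing variables through --eval is to generate the argument from structured data. The sketch below is a hypothetical helper, not part of Variety or variety-cli. It assumes the invocation has the shape `mongo <db> --eval "var ..." variety.js`, that the shell executable is named `mongo`, and that option values can be written as JSON literals; check all three against the README for your setup.

```python
import json
import shlex


def build_variety_command(db, collection, script="variety.js", shell="mongo", **options):
    """Build an argv list for running Variety through the MongoDB shell.

    Hypothetical helper: the command shape and the `shell` default are
    assumptions, and option names are passed through without validation.
    """
    assignments = {"collection": collection, **options}
    # JSON literals (strings, numbers, booleans, arrays, objects) are also
    # valid JavaScript, so json.dumps takes care of quoting and escaping.
    eval_body = "var " + ", ".join(
        f"{name} = {json.dumps(value)}" for name, value in assignments.items()
    )
    return [shell, db, "--eval", eval_body, script]


cmd = build_variety_command(
    "test",
    "users",
    maxDepth=3,
    query={"caredFor": True},
    excludeSubkeys=["someNestedKey"],  # made-up key name for illustration
)

# shlex.join (Python 3.8+) renders a copy-pasteable, safely quoted command line.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting entirely, since each argument is handed to the process as-is.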