A Clojure/ClojureScript library that parses HTML into Clojure data structures for analysis, transformation, and serialization.
Hickory is a Clojure and ClojureScript library that parses HTML into Clojure data structures, so developers can analyze, transform, and serialize HTML programmatically. Instead of treating HTML as opaque text, it provides a functional, data-oriented interface suited to web scraping, template manipulation, and static site generation.
Hickory is aimed at Clojure and ClojureScript developers who process HTML, such as authors of web scrapers and static site generators, or anyone who needs to manipulate HTML documents in a functional style.
Developers choose Hickory for its dual-format output (Hiccup for convenience, Hickory maps for completeness), its CSS-style selector API, and its integration with Clojure's zipper abstraction for tree editing, all of which work on both the JVM and JavaScript.
HTML as data
Provides both Hiccup vectors for compact, familiar syntax and Hickory maps for lossless parsing of all HTML nodes, including comments and doctypes.
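As a minimal sketch of the two formats, the snippet below parses one small document and views it both ways. It assumes Hickory is on the classpath and runs on the JVM; the sample markup and the name `parsed` are made up for illustration.

```clojure
(require '[hickory.core :as h])

;; Parse once; the parsed value can then be viewed in either format.
(def parsed
  (h/parse "<!DOCTYPE html><html><head></head><body><!-- note --><a href=\"foo\">foo</a></body></html>"))

;; Hiccup view: a sequence of nested vectors such as [:a {:href "foo"} "foo"].
(def as-vectors (h/as-hiccup parsed))

;; Hickory view: nested maps with :type, :tag, :attrs, and :content keys.
(def as-maps (h/as-hickory parsed))

(:type as-maps) ;; => :document
```

The Hiccup form is shorter to read and write by hand, while the map form gives every node an explicit `:type`, which is easier to branch on in code.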
The hickory.select namespace offers CSS-style selectors for querying Hickory-format documents, with combinators such as child and descendant for precise extraction.
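The difference between the two combinators can be sketched on a small hand-written document. The markup, the class name `post`, and the var names are illustrative assumptions, not taken from any real page.

```clojure
(require '[hickory.core :as h]
         '[hickory.select :as s])

(def tree
  (h/as-hickory
    (h/parse (str "<div class=\"post\"><h2>Title</h2><p><a href=\"/a\">A</a></p></div>"
                  "<a href=\"/b\">B</a>"))))

;; descendant: <a> elements at any depth below an element with class "post".
(def nested-links
  (s/select (s/descendant (s/class "post") (s/tag :a)) tree))

;; child: only <a> elements that are direct children of the "post" element.
(def direct-links
  (s/select (s/child (s/class "post") (s/tag :a)) tree))

(map #(get-in % [:attrs :href]) nested-links) ;; => ("/a")
(count direct-links)                          ;; => 0, the link sits inside a <p>
```

Selected nodes are ordinary Hickory maps, so plain functions like `get-in` are enough to pull out attributes or text.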
Built-in zippers for both formats enable immutable traversal and modification through Clojure's zipper abstraction, so a node can be swapped out with zip/replace and the edited tree rebuilt with zip/root.
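A zipper edit might look like the following sketch, which walks a Hickory tree depth-first and replaces the first matching text node. The helper `replace-text` is hypothetical and written only for this example; it is not part of Hickory.

```clojure
(require '[clojure.zip :as zip]
         '[hickory.core :as h]
         '[hickory.zip :as hzip]
         '[hickory.render :as render])

(def original (h/as-hickory (h/parse "<p>old</p>")))

(defn replace-text
  "Hypothetical helper: returns a copy of tree in which the first text node
  equal to from is replaced by to. Returns tree unchanged if nothing matches."
  [tree from to]
  (loop [loc (hzip/hickory-zip tree)]
    (cond
      (zip/end? loc)          (zip/root loc)
      (= from (zip/node loc)) (zip/root (zip/replace loc to))
      :else                   (recur (zip/next loc)))))

(def edited (replace-text original "old" "new"))

;; Serialize both trees; the original is untouched because the edit is immutable.
(def edited-html (render/hickory-to-html edited))
(def original-html (render/hickory-to-html original))
```

After this runs, `edited-html` contains `<p>new</p>` while `original-html` still contains `<p>old</p>`.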
Uses Jsoup on the JVM and the browser DOM in ClojureScript, with Node.js supported through external DOM libraries, so HTML is processed consistently across environments.
The selector API works only on Hickory-format data, not on Hiccup vectors, so Hiccup-focused workflows must convert to Hickory format before querying, which adds complexity.
Node.js has no built-in DOM, so an external library such as jsdom or xmldom must be configured manually, and the documentation cautions about compatibility issues; setup takes more effort than in the browser.
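One way to work around this for data that starts out as Hiccup is to render it to an HTML string and parse it again as Hickory, as in the sketch below. This round trip is an assumption about a workable approach rather than the recommended one; the hickory.convert namespace may offer a more direct conversion. The `nav` data is invented for illustration, and every element carries an explicit attribute map to keep the rendering step simple.

```clojure
(require '[hickory.core :as h]
         '[hickory.render :as render]
         '[hickory.select :as s])

;; Illustrative Hiccup data: a sequence containing one form.
(def nav
  [[:ul {:class "nav"}
    [:li {} [:a {:href "/x"} "X"]]
    [:li {} [:a {:href "/y"} "Y"]]]])

;; Hiccup -> HTML string -> Hickory maps, so that selectors can be applied.
(def nav-tree
  (h/as-hickory (h/parse (render/hiccup-to-html nav))))

(def hrefs
  (mapv #(get-in % [:attrs :href])
        (s/select (s/descendant (s/class "nav") (s/tag :a)) nav-tree)))
;; hrefs => ["/x" "/y"]
```

The extra render and parse pass is the conversion cost described above, so it is worth keeping data in Hickory format end to end when selectors are used heavily.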
Parsing large documents with Jsoup or DOM parsers, plus format conversions, can introduce latency versus native tools, and the functional approach may not suit high-throughput scenarios.