A declarative struct-tag-based HTML unmarshaling and web scraping library for Go built on goquery.
goq is a Go library for declaratively unmarshaling HTML into structured Go types using struct tags with CSS selectors. It solves the problem of verbose and error-prone HTML parsing by allowing developers to define mappings between HTML elements and Go struct fields directly in their code, similar to JSON or XML decoding.
goq is aimed at Go developers who need to scrape or parse HTML from websites, APIs, or documents and want a typed, declarative way to extract structured data.
Developers choose goq because it reduces boilerplate, gives extracted data static Go types through struct definitions (the selectors themselves live in tag strings and are only evaluated at run time), and offers a familiar API akin to Go's standard encoding packages, making HTML parsing more maintainable and less error-prone.
Uses struct tags with CSS selectors to map HTML elements directly to Go fields, similar to JSON/XML decoding, which cuts down on hand-written parsing code.
Mirrors Go's standard encoding/json and encoding/xml patterns, making it intuitive for developers already comfortable with Go's ecosystem.
Provides an Unmarshaler interface that types can implement for manual control over unmarshaling logic, giving flexibility when automatic mapping falls short.
CannotUnmarshalError includes specific field and value information, aiding in debugging selector issues during unmarshaling.
The Decoder cannot stream: because of goquery limitations the whole document is parsed up front, which makes it inefficient for processing large HTML documents in chunks or in real time.
Rules for unmarshaling into maps and nested structs are intricate, with precedence issues that can lead to unexpected behavior, as detailed in the README.
Relies solely on CSS selectors via goquery, which may not handle advanced parsing needs like XPath or custom traversals without manual workarounds.