A simple HTML parser for Elixir that enables search for nodes using CSS selectors.
Floki is an HTML parsing library for the Elixir programming language that allows developers to search and manipulate HTML documents using CSS selectors. It solves the problem of extracting data from HTML by providing a familiar CSS selector syntax instead of requiring manual tree traversal.
Floki is aimed at Elixir developers who need to parse, query, or manipulate HTML documents, such as those building web scrapers, content extractors, or HTML processing tools.
Developers choose Floki for its simple Elixir-native API, comprehensive CSS selector support, and the ability to switch between different parser backends for optimal performance or correctness in their specific use case.
Supports a wide range of CSS selectors including attribute selectors, pseudo-classes, and combinators, as detailed in the extensive selector table, making it versatile for querying HTML.
Allows switching between parsers like fast_html for performance (up to 20x faster) and html5ever for correctness, addressing limitations of the default mochiweb_html parser.
Provides intuitive functions such as find/2, attribute/3, and text/1, which are easy to use and integrate into Elixir workflows, as shown in the usage examples.
Represents HTML nodes as tuples {tag_name, attributes, children_nodes}, which aligns with Elixir's pattern matching for straightforward manipulation and extraction.
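The functions and tuple representation described above can be sketched as a short script. This is a minimal illustration rather than the project's official usage example: the HTML snippet and variable names are made up, and it assumes network access so that `Mix.install/1` can fetch Floki from Hex and that the default parser is in use.

```elixir
# Assumption: this runs as a standalone script (elixir example.exs) and can reach Hex.
Mix.install([:floki])

# Hypothetical sample document used only for illustration.
html = """
<html>
  <body>
    <ul id="links">
      <li><a href="https://elixir-lang.org" class="ext">Elixir</a></li>
      <li><a href="https://hex.pm" class="ext">Hex</a></li>
    </ul>
  </body>
</html>
"""

# Parse the string into a tree of {tag_name, attributes, children_nodes} tuples.
{:ok, document} = Floki.parse_document(html)

# find/2: select nodes with a CSS selector (id, descendant combinator, class).
links = Floki.find(document, "ul#links a.ext")

# attribute/3: collect one attribute's value from every node matching the selector.
hrefs = Floki.attribute(document, "ul#links a.ext", "href")

# text/1: extract the text content of each matched node.
names = Enum.map(links, &Floki.text/1)

# Because nodes are plain tuples, ordinary pattern matching works on them.
[{"a", first_attrs, [first_label]} | _rest] = links
{"href", first_url} = List.keyfind(first_attrs, "href", 0)

IO.inspect(hrefs, label: "hrefs")
IO.inspect(names, label: "names")
IO.inspect({first_label, first_url}, label: "first link")
```

Matching on the tuple shape directly is convenient for small extractions, while `find/2` and `attribute/3` are usually the simpler choice when the position of a node in the tree is not known in advance.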
The built-in mochiweb_html parser is slower and does not always parse according to the HTML5 specification, so users who need better performance or spec-compliant parsing must add an external parser.
Using the recommended fast_html parser requires installing a C compiler, GNU Make, and CMake, adding significant deployment and setup complexity compared to pure-Elixir alternatives.
Pseudo-selectors like :has and :not support only simple selectors, which restricts the more complex queries that developers might expect from a full CSS implementation.
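The parser trade-off mentioned above is typically handled through application configuration. The fragment below is a sketch of that setup, assuming the `:fast_html` package has already been added to the project's dependencies in `mix.exs` and that its native build tools are installed. The module names follow Floki's documented parser adapters, but check them against the version in use.

```elixir
# config/config.exs
import Config

# Use fast_html as the parser for every Floki call in this application.
# Assumption: :fast_html is listed in deps and compiles on this machine.
config :floki, :html_parser, Floki.HTMLParser.FastHtml

# To favour HTML5 correctness instead, the html5ever adapter can be configured
# the same way (assumes the :html5ever package is installed):
# config :floki, :html_parser, Floki.HTMLParser.Html5ever

# A parser can also be chosen for a single call rather than globally:
# Floki.parse_document(html, html_parser: Floki.HTMLParser.Mochiweb)
```

Choosing the parser per call keeps the default configuration free of native dependencies, so only the code paths that need the faster or stricter parser depend on it.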