A standards-compliant HTML5 parser and serializer written entirely in PHP for server-side HTML processing.
Masterminds/html5-php is an HTML5 parser and serializer library written entirely in PHP. It parses HTML5 documents into PHP DOMDocument objects and serializes DOM trees back to well-formed HTML5 output, following the HTML5 specification closely and correcting some malformed HTML automatically. It is intended for production use and has over five million downloads.
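A minimal round-trip sketch is shown below: parse a string into a DOMDocument, edit it with the standard DOM API, and serialize it back to HTML5. It assumes the package was installed with `composer require masterminds/html5` and that Composer's autoloader is at `vendor/autoload.php`; the markup and the `greeting` class are illustrative only.

```php
<?php
// Assumption: installed via Composer (composer require masterminds/html5).
require __DIR__ . '/vendor/autoload.php';

use Masterminds\HTML5;

$html5 = new HTML5();

// Parse an HTML5 string into a standard PHP DOMDocument.
$dom = $html5->loadHTML(
    '<!DOCTYPE html><html><head><title>Demo</title></head><body><p>Hello</p></body></html>'
);

// Manipulate the tree with PHP's ordinary DOM API.
$p = $dom->getElementsByTagName('p')->item(0);
$p->setAttribute('class', 'greeting');

// Serialize the DOM back to an HTML5 string.
$out = $html5->saveHTML($dom);
echo $out, "\n";
```

Because the result is a plain DOMDocument, any code that already works with PHP's DOM extension can consume it without adaptation.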
PHP developers working on server-side applications that need to parse, manipulate, or generate HTML5 content, such as web scrapers, content management systems, or tools processing mixed PHP/HTML documents. It is also suitable for those requiring interoperability with libraries like QueryPath for jQuery-style DOM traversal.
Developers choose HTML5-PHP for its standards compliance, stability, and feature set, which includes both high-level and low-level APIs (such as a SAX-like event-based parser). Its main selling points are integration with PHP's DOMDocument, support for legacy HTML tags, and the ability to handle namespaces and mixed PHP/HTML documents, making it a versatile choice for server-side HTML5 processing.
With over five million downloads and use in many production websites, it is a reliable and stable choice for server-side HTML5 processing.
Offers both high-level DOM-based parsing for easy manipulation and low-level SAX-like event-based parsing for efficient, custom processing.
Seamlessly integrates with QueryPath for jQuery-style DOM traversal, enhancing productivity in PHP projects that need familiar manipulation patterns.
Supports Composer and PHP namespaces, making it easy to install and use in modern PHP applications with minimal setup.
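The event-based API can be sketched roughly as follows. This assumes a recent 2.x release, where `Masterminds\HTML5\Parser\Scanner` accepts a string, `Tokenizer` takes a scanner plus an `EventHandler`, and the `EventHandler` interface declares the nine callbacks implemented here; check these signatures against the installed version. `TagCounter` is a hypothetical handler that only records start-tag names and text, so no DOM is built.

```php
<?php
// Assumption: masterminds/html5 2.x installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Masterminds\HTML5\Parser\EventHandler;
use Masterminds\HTML5\Parser\Scanner;
use Masterminds\HTML5\Parser\Tokenizer;

// Hypothetical handler: collects start-tag names and text content
// as the tokenizer emits events, without constructing a tree.
class TagCounter implements EventHandler
{
    public $tags = array();
    public $text = '';

    public function doctype($name, $idType = 0, $id = null, $quirks = false) {}

    public function startTag($name, $attributes = array(), $selfClosing = false)
    {
        $this->tags[] = $name;
        // No special text mode (as used for <script> or <textarea>) is requested.
        return null;
    }

    public function endTag($name) {}
    public function comment($cdata) {}

    public function text($cdata)
    {
        $this->text .= $cdata;
    }

    public function eof() {}
    public function parseError($msg, $line, $col) {}
    public function cdata($data) {}
    public function processingInstruction($name, $data = null) {}
}

$handler = new TagCounter();
$tokenizer = new Tokenizer(new Scanner('<ul><li>One</li><li>Two</li></ul>'), $handler);
$tokenizer->parse();

print_r($handler->tags); // start-tag names in document order
echo $handler->text, "\n";
```

The library's own DOM builder is itself an event handler of this kind, so a custom handler is the natural extension point when only a stream of tags and text is needed.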
Is not a validating parser: it corrects some malformed HTML automatically but does not check that the input conforms to the HTML5 specification, so syntax errors in input documents can go unnoticed.
Lacks support for some features, including HTML manifests, PLAINTEXT, and the adoption agency algorithm, which limits its usefulness as a complete implementation of the HTML5 standard.
Attribute names that are not valid XML 1.0 names are ignored during parsing, which can silently drop data from non-standard HTML.
XML namespaces are not supported by default; they must be enabled explicitly through options such as 'xmlNamespaces', which adds complexity for such use cases.
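Two of these limitations can be observed directly, as in the sketch below. It assumes a Composer install of masterminds/html5 2.x; the attribute name `1bad` and the namespace URI `http://www.example.com` are illustrative. Invalid attribute names are reported through the parser's error list rather than thrown, and prefixed tags resolve to a namespace only when `xmlNamespaces` is passed as a parse option.

```php
<?php
// Assumption: masterminds/html5 2.x installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Masterminds\HTML5;

$html5 = new HTML5();

// 1. "1bad" is not a valid XML 1.0 name, so DOM cannot store it;
//    the parser skips it and records a parse error instead of failing.
$dom = $html5->loadHTML(
    '<!DOCTYPE html><html><body><div 1bad="x" class="ok">Hi</div></body></html>'
);
$errors = $html5->getErrors(); // collected before the next parse resets them
$div = $dom->getElementsByTagName('div')->item(0);
var_dump($div->hasAttribute('class'));
var_dump($div->hasAttribute('1bad'));
print_r($errors);

// 2. With 'xmlNamespaces' enabled, the xmlns:t declaration is honoured
//    and <t:tag> is created in the declared namespace.
$nsDom = $html5->loadHTML(
    '<!DOCTYPE html><html><body><t:tag xmlns:t="http://www.example.com">Hello</t:tag></body></html>',
    array('xmlNamespaces' => true)
);
$tags = $nsDom->getElementsByTagNameNS('http://www.example.com', 'tag');
echo $tags->length, "\n";
```

Checking `getErrors()` (or `hasErrors()`) after each parse is a cheap way to detect input that was silently corrected or trimmed.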