Actually, you can’t even parse HTML(5) with specialized tools, or by converting it and then running XML linters on it (they quit because of too many errors). The only tools capable of parsing HTML (mostly) reliably are the big 3 browser engines. In my experience converting saved webpages to asciidoctor, it still involves cleaning up manually, despite tidy and pandoc.
This isn’t true. HTML5 defines a very strict set of parsing rules, and there are quite a few compliant parsers. But yes, you absolutely can’t use an XML parser. You can’t even use an XML emitter, because it can emit valid XML that means something completely different in HTML.
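To make that concrete, here is a rough sketch using only Python's standard library `xml.etree.ElementTree`. The sample markup and the helper name `parses_as_xml` are made up for illustration. It shows an XML parser rejecting a perfectly valid HTML fragment, and an XML emitter producing `<div />`, which an HTML5 parser reads as an opening `<div>` that is never closed (the trailing slash is ignored on non-void elements).

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Valid HTML: <br> is a void element and needs no closing tag.
# As XML it is an unclosed element, so the parse fails.
valid_html_fragment = "<p>Hello<br>world</p>"
xml_ok = parses_as_xml(valid_html_fragment)

# Build a tiny tree: an empty <div> followed by a <p>.
body = ET.Element("body")
ET.SubElement(body, "div")
p = ET.SubElement(body, "p")
p.text = "after"

# The XML serializer collapses the empty div into a self-closing tag.
# An HTML parser treats that as an opening tag, so the <p> ends up
# nested inside the div instead of next to it.
as_xml = ET.tostring(body, encoding="unicode")

# ElementTree's HTML output method writes an explicit closing tag instead.
as_html = ET.tostring(body, encoding="unicode", method="html")

print(xml_ok)
print(as_xml)
print(as_html)
```

Note that `method="html"` only changes how the tree is written out. It does not make ElementTree an HTML5-compliant parser, so for reading real-world pages you would still want one of the compliant parsers mentioned above.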
…what a fucking disaster. I still wish XHTML won.
Real question: why? I feel like there’s a story there.