• BluesF@lemmy.world
      2 years ago

      I don’t think that using regex to basically do regex stuff on strings that happen to also be HTML really counts as parsing HTML.

    • hperrin@lemmy.world
      2 years ago

      Technically, regex can’t pull out every link in an HTML document without potentially pulling fake links.

      Take this example (using curly braces instead of angle brackets, because HTML is valid Markdown):

      {template id="link-template"}
          {a href="javascript:void(0);"}link{/a}
      {/template}
      

      That’s perfectly valid HTML, but you wouldn’t want to pull that link out, and POSIX regex can’t really avoid it, at least not with just a single regex. Imagine a link nested within, say, three template tags.
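
      To make the difference concrete, here is a rough sketch in Python. The standard-library `html.parser` module is just one convenient way to track nesting, and `DOC`, `HREF_RE`, and `LinkCollector` are made-up names for this illustration. It uses real angle brackets and nests the link two templates deep.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample document: one real link, one link nested in two templates.
DOC = """
<a href="https://example.com/real">real</a>
<template id="link-template">
    <template>
        <a href="javascript:void(0);">link</a>
    </template>
</template>
"""

# Pattern matching: grabs every href, with no idea what surrounds the tag.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]*)"')
regex_links = HREF_RE.findall(DOC)


class LinkCollector(HTMLParser):
    """Collects hrefs of <a> tags that are not inside any <template>."""

    def __init__(self):
        super().__init__()
        self.template_depth = 0  # how many <template> tags are currently open
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "template":
            self.template_depth += 1
        elif tag == "a" and self.template_depth == 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "template" and self.template_depth > 0:
            self.template_depth -= 1


collector = LinkCollector()
collector.feed(DOC)
collector.close()
parser_links = collector.links

print(regex_links)   # includes the javascript: link from inside the templates
print(parser_links)  # only the link outside any template
```

      The depth counter is the part a single regex has no equivalent for: it has to remember how many templates are still open when it reaches the link.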

        • hperrin@lemmy.world
          2 years ago

          I would argue that that is not parsing. That’s just pattern matching. For something to be parsing a document, it would have to have some “understanding” of the structure of the document. Since regex is not powerful enough to correctly “understand” the document, it’s not parsing.