this post was submitted on 14 Jan 2026
PieFed Meta
The bad thing about regex parsing HTML (or XML) in general is how often it just works. Like 90% of the time it works 100%; it's just the last 10% where shit breaks. In most of my scripts I use regex or grep, or, in languages with string methods, find, and it works so often that it's just so appealing to implement, because all XML parsing libraries suck, and their bindings suck, and it is just way too much work when grep 'title' gets you 90% there.
I feel this.
It's somewhat ok in our situation though because the HTML we're dealing with was generated from Markdown rather than typed by people so it's well structured and the same each time.
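To make the "works 90% of the time" point concrete, here's a rough sketch in Python. It is not PieFed's actual code; the function names, the `<h1>` target, and the sample inputs are all made up for illustration. It compares a naive regex with the standard library's `html.parser` on HTML that looks generated versus HTML that looks hand-typed.

```python
import re
from html.parser import HTMLParser

# Naive pattern: assumes a bare <h1> with no attributes and no newlines inside.
TITLE_RE = re.compile(r"<h1>(.*?)</h1>")


def title_by_regex(html):
    """Return the text of the first <h1>...</h1> the regex can see, or None."""
    m = TITLE_RE.search(html)
    return m.group(1) if m else None


class H1Parser(HTMLParser):
    """Collects the text inside the first <h1> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.found:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.found = True

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def title_by_parser(html):
    """Return the text of the first real <h1> element, or None."""
    parser = H1Parser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


# Generated-looking HTML: both approaches agree.
generated = "<h1>Hello world</h1>\n<p>text</p>"
# Hand-typed-looking HTML: an attribute and a newline defeat the regex.
hand_typed = '<h1 class="big">Hello\nworld</h1>'
# A commented-out heading: the regex happily matches inside the comment.
commented = "<!-- <h1>old</h1> --><h1>New</h1>"
```

On `generated` the two functions return the same thing, which is the appealing 90%. On `hand_typed` the regex finds nothing, and on `commented` it picks up the heading inside the comment, while the parser handles both. If the input really is always machine-generated from Markdown, those cases may simply never show up.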
The code is very not fun to read though, regex is just impossible gibberish.
I mean, it's not that bad if you've spent far far too long on regex101.com...
I guess I'm one of the few weirdos who actually likes messing around with multiple capture groups and complex patterns.
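For what it's worth, named capture groups plus Python's `re.VERBOSE` flag can take some of the gibberish out. A minimal sketch, assuming links always come out as `<a href="...">text</a>` with no other attributes and no nested tags (that shape, and the names `LINK_RE` and `find_links`, are my assumptions, not anything from the PieFed codebase):

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows inline comments.
LINK_RE = re.compile(
    r"""
    <a\s+href="(?P<url>[^"]*)"   # opening tag; capture the href value
    >                            # end of the opening tag
    (?P<text>[^<]*)              # link text, assumed to contain no nested tags
    </a>                         # closing tag
    """,
    re.VERBOSE,
)


def find_links(html):
    """Return (url, text) pairs for every simple link found in html."""
    return [(m.group("url"), m.group("text")) for m in LINK_RE.finditer(html)]


sample = '<p>See <a href="https://example.com">the docs</a> and <a href="/x">x</a></p>'
links = find_links(sample)
```

The pattern does the same work as the one-line version, but each piece has a label and the groups are read back by name instead of by index.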