Alan Quatermain

The Tumblog of one Jim Dovey, iOS Software Chief Architect at Kobo in Toronto, Ontario.
He Twitters, and can occasionally be found on LinkedIn or Facebook.
If you have a query, you can ask it here.

This blog contains personal opinions, and is not endorsed by any company.


I Can Haz HTML Parsur?

So, for a little top-secret project I’m helping out with, I had a need to pull some data from a <select> list while scraping an HTML page. The original source code was a PHP script using preg_match() and preg_match_all(), but I didn’t want to pull in any regex engine larger than RegexKitLite, which doesn’t provide a high-level method doing what preg_match_all() does. My solution was to leave regexes behind and tweak the AQXMLParser to optionally use the libxml HTML parsing routines.
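
To show the shape of the idea without the Objective-C plumbing, here is a minimal sketch in Python. It uses the standard library's event-driven `HTMLParser` as a stand-in for the SAX-style callbacks you would get from libxml's HTML parser. It is not AQXMLParser's API; the class name `SelectOptionParser`, the helper `select_options`, and the sample `name="city"` markup are all hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs from the <option>s of one named <select>.

    Hypothetical helper for illustration; not part of AQXMLParser or libxml.
    """

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = []
        self._in_select = False
        self._value = None
        self._text = None  # a list while inside an <option>, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self._in_select = True
        elif tag == "option" and self._in_select:
            # HTML allows </option> to be omitted, so a new <option>
            # implicitly closes the previous one.
            self._finish_option()
            self._value = attrs.get("value")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._finish_option()
        elif tag == "select":
            self._finish_option()
            self._in_select = False

    def _finish_option(self):
        if self._text is None:
            return
        label = "".join(self._text).strip()
        # An <option> with no value attribute submits its label text.
        value = self._value if self._value is not None else label
        self.options.append((value, label))
        self._text = None
        self._value = None


def select_options(html, select_name):
    """Return the (value, label) pairs of the <select> named select_name."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    parser.close()
    return parser.options
```

Calling `select_options(page_source, "city")` returns a list of `(value, label)` tuples in document order. The point of going event-driven is that sloppy markup, such as unclosed `<option>` tags, gets handled by the parser's callbacks rather than by an ever-growing regex.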