The W3C Semantic Web group's web app for pyRDFa: parse RDFa from a URL, an uploaded file, or a text area, or get bookmarklets that parse RDFa directly from the current page.
My experience with document interchange led me to classify document formats using the essential distinction that some are "programmable" and some are not. [..]
The reason that this distinction is essential with respect to document interchange is that extracting information from documents in "programmable" document formats is equivalent to the halting problem. That is, it is arbitrarily difficult and cannot be automated in a general fashion.
For example, I conjecture that it is impossible to write a program that will extract the third word from a TeX document.
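To make the conjecture concrete, here is a minimal Python sketch. It is not TeX; it assumes a toy model in which a "programmable" document is a program that yields its words one at a time. The names `Document`, `static_doc`, `collatz_doc`, and `third_word` are hypothetical and do not come from the quoted text. The point it illustrates is that the only general way to learn the third word is to run the document, so the extractor halts only if the document does.

```python
from itertools import islice
from typing import Callable, Iterator, Optional

# Assumption: a "programmable" document is modeled as a zero-argument
# function that returns an iterator over the words it renders.
Document = Callable[[], Iterator[str]]


def static_doc() -> Iterator[str]:
    """A non-programmable document: its words are fixed data."""
    yield from "the quick brown fox".split()


def collatz_doc(n: int) -> Document:
    """Build a document whose third word is computed by a loop over n."""
    def render() -> Iterator[str]:
        yield "Hello"
        yield "world"
        steps, m = 0, n
        # The third word exists only if this loop terminates.
        while m != 1:
            m = m // 2 if m % 2 == 0 else 3 * m + 1
            steps += 1
        yield f"steps={steps}"
    return render


def third_word(doc: Document) -> Optional[str]:
    """Run the document and return its third word, or None if it has fewer."""
    return next(islice(doc(), 2, 3), None)


if __name__ == "__main__":
    print(third_word(static_doc))
    print(third_word(collatz_doc(6)))
```

In this sketch `third_word(collatz_doc(6))` returns `steps=8`, but only because the loop happens to finish. Whether it finishes for every positive `n` is the open Collatz conjecture, so even this toy extractor cannot promise an answer without executing the document. A real TeX extractor would face the same issue with macro expansion.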