The dark side of XML

Opinion: The single most pathetic feature of XML is this: when a document is not valid XML, the specs insist you should throw an error instead of parsing it. Next time a coder brags about how useful this is, try this for an answer: laugh at him.

It’s a lot simpler to parse a valid document, they go. And it’s not that difficult to produce a valid one. Moreover, there’s no possible ambiguity when you validate an XML document.

But then, here I am, hapless as a hapless user can be, subscribed to a feed with an ampersand in the content. As it happens, ampersands are not & in XML. They are &amp;. Why not “& unless this could be ambiguous”, you ask? Go figure… The fact remains: I get a parse error.

As a hapless user, I don’t care what the specs say. I want my news reader to parse the news feed whether it is valid or not. If an obvious fix can repair the XML, then do not even notify me. Just fix it. This is like… basic usability.
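To make the idea concrete, here is a minimal sketch of what such a forgiving reader could do, assuming Python and its standard `xml.etree.ElementTree` parser. The function name `parse_leniently` and the regular expression are illustrative assumptions, not part of any real feed reader: the code tries a strict parse first, and only if that fails does it escape ampersands that do not start an entity and try again.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does not start an entity or character reference
# (for example &amp;, &#38; or &#x26;). Hypothetical pattern for illustration.
BARE_AMPERSAND = re.compile(r"&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def parse_leniently(text):
    """Parse strictly first; on failure, escape bare ampersands and retry once."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        repaired = BARE_AMPERSAND.sub("&amp;", text)
        if repaired == text:
            raise  # nothing obvious to fix, so report the original error
        return ET.fromstring(repaired)


feed = "<item><title>Tom & Jerry</title></item>"
title = parse_leniently(feed).find("title").text
print(title)
```

Because the repair only runs after the strict parse has already failed, documents that are fine never get touched. The pattern is deliberately naive: it would also rewrite a bare ampersand inside a CDATA section or a comment of a document that is broken for some other reason.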

Comments on The dark side of XML

  1. Parsing with error recovery is very, very hard, although it’s probably much easier with XML than with a programming language. For a modern example, the IE parser will handle nearly everything you throw at it, but Firefox is much less lenient. And I’m not talking about HTML here… I’m talking about complete garbage…

  2. There must be thousands of potential errors. How are you going to correct them all? And if you manage to do that, how about the rest of the world? Standards make life easier!

  3. Sure, standards make the life of programmers easier. I just think the specs shouldn’t say that error correction should never occur. Obviously, you cannot catch all errors. But `&` and HTML entity problems, for instance… They’re so common. The user is suffering in the end.
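
The HTML entity problem mentioned in the last comment can be sketched in the same spirit. This is only an illustration, assuming Python's standard `html.entities.name2codepoint` table; the helper name `html_entities_to_numeric` is made up for the example. It rewrites named HTML entities that XML does not predefine (such as `&nbsp;`) into numeric character references, which any XML parser accepts.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Replace known HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML's own entities and unknown names
        return "&#{};".format(name2codepoint[name])
    return NAMED_ENTITY.sub(replace, text)


snippet = "<p>caf&eacute;&nbsp;&amp; bar</p>"
fixed = html_entities_to_numeric(snippet)
print(fixed)
print(ET.fromstring(fixed).text)
```

Unknown entity names are left as they are, so a document that uses something the table does not know will still fail to parse rather than be silently altered.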