NodeJs CMS - XML or JSON as storage format when building a CMS

I'm writing a CMS in JavaScript on top of Node.js with Express. My goal is to build something that I've built daily for the past years in .NET, but now purely in JavaScript. At the moment I have the basic flow working: pages are edited inline (using the "contenteditable" attribute), then the HTML of those editable divs is parsed to JSON and stored in MongoDB.

The other way around, of course, the JSON for the needed sections is parsed back to HTML on the server, inserted into a JSDom document with jQuery, and then the whole document is sent to the client.
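To make the JSON-to-HTML step concrete, here is a minimal sketch of what such a renderer could look like. The node shape ({ tag, attrs, children }) and the names renderNode and escapeHtml are assumptions for illustration only; the actual schema stored in MongoDB is not shown here and may well differ.

```javascript
// Hypothetical node shape: { tag, attrs, children }, where children are
// nested nodes or plain strings. The real CMS schema may differ.
const VOID_TAGS = new Set(["br", "hr", "img", "input"]);

// Escape text so stored content cannot break out of the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Recursively turn one JSON node (or text string) into an HTML string.
function renderNode(node) {
  if (typeof node === "string") return escapeHtml(node);
  const attrs = Object.entries(node.attrs || {})
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join("");
  if (VOID_TAGS.has(node.tag)) return `<${node.tag}${attrs}>`;
  const children = (node.children || []).map(renderNode).join("");
  return `<${node.tag}${attrs}>${children}</${node.tag}>`;
}

const section = {
  tag: "div",
  attrs: { class: "intro" },
  children: [
    { tag: "h2", children: ["Welcome"] },
    { tag: "p", children: ["Tom & Jerry", { tag: "br" }, "say hi"] },
  ],
};

console.log(renderNode(section));
// <div class="intro"><h2>Welcome</h2><p>Tom &amp; Jerry<br>say hi</p></div>
```

The resulting string can then be placed into the server-side document in whatever way the existing code already does.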

This all works perfectly, but now I'm in a discussion with a colleague who is questioning the part where HTML is stored as JSON. In his opinion this should be XML, but before changing everything to XML I would like to hear some more opinions on this matter.

Does XML have an advantage over JSON in any part of the process? I would have to use XSLT to transform the XML to HTML, instead of parsing the JSON back to HTML as I do now.

Any opinion on this would be highly appreciated.

I would tend to store the data in XML using an XML database solution like BaseX, for the following reasons:

  • It is very well suited to generating other markup formats like HTML in a standards-compliant way using XSLT
  • It is easy to query using a standards-compliant language like XPath
  • It's relatively easy to convert to almost all other formats (PDF, for example, using XSL-FO)
  • It is easy to read/comprehend in almost any editor

To me it feels complex to convert a markup language like HTML (which I assume is the input format of the CMS) to JSON and then convert the JSON back to HTML when rendering the CMS pages in "read" mode. Assuming that the input is valid XHTML, you could store and retrieve it as native XML, which feels more natural to me.

JSON is definitely a more "node native" storage format. It also works nicely with MongoDB (and CouchDB or Riak), as you've noted. One additional advantage of storing data as JSON rather than as stringified XML is that you can start indexing and querying properties in MongoDB whenever you feel like it.

If you organize your app nicely, it is very simple to replace your toJSON() page serializer with a toXML() method and vice versa. Your decision isn't set in stone, but since it seems to work for you and "it's the node way", I would stick with JSON.
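As a rough sketch of what "replaceable serializers" could mean in practice, the example below keeps the storage format behind one lookup table. The Page class, its title and body fields, and the serializers object are all hypothetical names invented for this illustration, not part of the original CMS.

```javascript
// Hypothetical page model; the field names are illustrative only.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

class Page {
  constructor(title, body) {
    this.title = title;
    this.body = body;
  }

  // JSON.stringify() calls toJSON() automatically when it exists.
  toJSON() {
    return { title: this.title, body: this.body };
  }

  // Body is stored as escaped text here; a real XML store would more
  // likely keep it as nested XHTML elements.
  toXML() {
    return (
      `<page><title>${escapeXml(this.title)}</title>` +
      `<body>${escapeXml(this.body)}</body></page>`
    );
  }
}

// The rest of the app only talks to this table, so switching the
// storage format is a one-line change.
const serializers = {
  json: (page) => JSON.stringify(page),
  xml: (page) => page.toXML(),
};

const page = new Page("A & B", "<p>Hi</p>");
console.log(serializers.json(page));
// {"title":"A & B","body":"<p>Hi</p>"}
console.log(serializers.xml(page));
// <page><title>A &amp; B</title><body>&lt;p&gt;Hi&lt;/p&gt;</body></page>
```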

I'd love to see some answers that are pro-XML, if anyone is up for it.

I suggest using JSON; it doesn't need much storage.

Most importantly, it's very fast. I have tested parsing a query string and a JSON object in Node.js, and JSON parsing was faster than query string parsing. I believe it will be faster than XML parsing too, although I have not measured that.
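A comparison of this kind can be reproduced with a small micro-benchmark along the following lines. This is only a sketch: the sample record, the iteration count, and the timeIt helper are assumptions made for illustration, and the timings depend heavily on the Node.js version, the machine, and the payload. XML is left out because Node.js has no built-in XML parser.

```javascript
const querystring = require("querystring");

// Hypothetical flat record, encoded both ways so each parser gets
// equivalent input.
const record = { id: "42", title: "Home", author: "admin", status: "published" };
const asJson = JSON.stringify(record);
const asQuery = querystring.stringify(record);

// Run fn `iterations` times and return the elapsed time in milliseconds.
function timeIt(fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const iterations = 100000;
const jsonMs = timeIt(() => JSON.parse(asJson), iterations);
const queryMs = timeIt(() => querystring.parse(asQuery), iterations);

console.log(`JSON.parse:        ${jsonMs.toFixed(1)} ms`);
console.log(`querystring.parse: ${queryMs.toFixed(1)} ms`);
```

Treat the numbers as a rough indication only; for a CMS, database and network time will usually matter far more than parsing time.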

My latest open source project, ourjs.org, is also based on JSON. It is a small CMS too, and JSON makes it very easy to cache data in memory, so JSON is my choice.