WordPress export file headaches

Highly irregular blue object with "Your data" label

As my migration from WordPress continues, I want to talk about the things in WordPress’s export file that gave me headaches.

Old links

If you aren’t a developer, this one is for you.

If you have migrated your WordPress blog from one web address to another, you might want to check the URLs [1] on the new one. My blog used to be on DyslexicPanda.com, and after inspecting the XML file I found that internal links that were supposed to point to the new blog address were still pointing to the old one.
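If you are comfortable running a small script, the check can be automated. This is only a minimal sketch: it treats the export as plain text and looks for links to the old address. `new-blog.example` is a made-up placeholder for the new address, and both function names are invented for this illustration.

```python
import re
from collections import Counter

OLD_HOST = "dyslexicpanda.com"   # the old address mentioned above
NEW_HOST = "new-blog.example"    # hypothetical placeholder for the new address


def find_old_links(xml_text, old_host=OLD_HOST):
    """Count every URL in the export text that still points at the old host."""
    pattern = re.compile(
        r"https?://(?:www\.)?" + re.escape(old_host) + r"[^\s\"'<>\]]*",
        re.IGNORECASE,
    )
    return Counter(pattern.findall(xml_text))


def rewrite_old_links(xml_text, old_host=OLD_HOST, new_host=NEW_HOST):
    """Swap the old host for the new one, keeping the scheme and the path."""
    pattern = re.compile(
        r"(https?://)(?:www\.)?" + re.escape(old_host), re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1) + new_host, xml_text)
```

Running `find_old_links` first is the safer step: the rewrite touches every occurrence of the old host in the file, including ones you might prefer to keep, so it is worth reading the list before replacing anything.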

CDATA and PHP objects everywhere

Like any developer, I have an endless love for good data formats, so it is a little disappointing to find data that isn’t 100% XML compliant mixed into the export. In my case it forced me to explore the breadth of Python’s XML libraries; two of them are now in my arsenal, because the most recommended and fastest one doesn’t support CDATA.

As for the PHP objects: as a developer I understand that someone didn’t have the time, budget, or patience to create valid XML for some parts of the data. Hopefully it’s nothing important.
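To see how much serialized PHP is hiding in an export, a rough scan like the one below can help. It is a sketch under assumptions, not the tooling used for this migration: it assumes the export declares the `wp` namespace as `http://wordpress.org/export/1.2/` (check the top of your own file, the version may differ), and the regular expression is only a heuristic for PHP’s `serialize()` output. It uses the standard library’s `xml.etree.ElementTree`, which reads CDATA sections as ordinary text but does not keep the CDATA markers if the tree is written back out.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: WXR 1.2 namespace; adjust to match the xmlns:wp in your export.
WP_NS = "http://wordpress.org/export/1.2/"
NS = {"wp": WP_NS}

# Heuristic: typical beginnings of PHP serialize() output
# (array, object, string, int/bool/double, null).
PHP_SERIALIZED = re.compile(r'^(?:a:\d+:\{|O:\d+:"|s:\d+:"|[ibd]:[-\d.]+;|N;)')


def php_meta_keys(xml_text):
    """Return the meta keys whose values look like serialized PHP data."""
    root = ET.fromstring(xml_text)
    hits = []
    for meta in root.iter("{%s}postmeta" % WP_NS):
        key = meta.findtext("wp:meta_key", default="", namespaces=NS)
        value = meta.findtext("wp:meta_value", default="", namespaces=NS)
        if PHP_SERIALIZED.match(value.strip()):
            hits.append(key)
    return hits
```

The list of keys is usually enough to decide whether any of that data matters for the new CMS or can be dropped.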

Tags and Categories

This one really confuses me. At the beginning of the XML file there are plenty of declarations of categories and tags, so the data gal in me expected something similar in each post item, or at least something clean looking. Instead they are there as if they bothered someone.
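For anyone who needs to pull the terms back out, here is a minimal sketch. It assumes the layout I know from WordPress exports, where each `<item>` carries flat `<category>` elements and a `domain` attribute says whether the term is a category (`category`) or a tag (`post_tag`). The function name `item_terms` is invented for this illustration.

```python
import xml.etree.ElementTree as ET


def item_terms(item):
    """Group an item's <category> children by their domain attribute."""
    terms = {"category": [], "post_tag": []}
    for cat in item.findall("category"):
        domain = cat.get("domain")
        if domain in terms:
            # nicename is the slug; the element text is the display name
            terms[domain].append((cat.get("nicename"), cat.text or ""))
    return terms


# Usage with a small hand-written item (not taken from a real export)
sample_item = ET.fromstring(
    "<item><title>Hello</title>"
    '<category domain="category" nicename="migration"><![CDATA[Migration]]></category>'
    '<category domain="post_tag" nicename="wordpress"><![CDATA[WordPress]]></category>'
    "</item>"
)
print(item_terms(sample_item))
```

Other `domain` values, if present, are simply ignored here; extend the dictionary if your export uses custom taxonomies.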

Everything is a post type…

…including menus.

This one bothers me a lot, because it’s hard to choose what to bring over to a different CMS, and when everything is a post type of one kind or another it feels like noise. (For the time being I chose to ignore every post type other than post and page, for practical reasons.)
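Filtering by post type can be sketched like this. As before, the `wp` namespace URI is an assumption (WXR 1.2; check your own file), and `split_by_post_type` is a name made up for this example. It keeps `post` and `page` items and counts everything else, so you can at least see what is being left behind.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: WXR 1.2 namespace; adjust to match the xmlns:wp in your export.
WP = "{http://wordpress.org/export/1.2/}"
KEEP = {"post", "page"}


def split_by_post_type(xml_text, keep=KEEP):
    """Return (items to migrate, Counter of skipped post types)."""
    root = ET.fromstring(xml_text)
    kept, skipped = [], Counter()
    for item in root.iter("item"):
        post_type = (item.findtext(WP + "post_type") or "").strip()
        if post_type in keep:
            kept.append(item)
        else:
            skipped[post_type] += 1
    return kept, skipped
```

Printing the `skipped` counter once is a cheap way to confirm that nothing important, such as attachments, is being ignored by accident.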

Conclusion

Although I’d like to see a change in how WordPress exports content, I don’t think anything will happen. We have to look at it as just another web technology (~20% of all websites are powered by WordPress), and as with any other web technology, backwards compatibility wins. But if you are working on a CMS that is less influential at the moment, you can learn from my frustrations.

  1. URL (Uniform Resource Locator) is the official name for a web address.