I'm looking for an efficient way to reorganize parts of an XML document that contain multiple children of a type such as 'SmallCat' or 'BigCat'.
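Here are the rules, as illustrated by the example below. A Habitat with more than one BigCat/SmallCat child is split into one sub-habitat per cat. Each sub-habitat gets the HabitatID sub_habitat.N.ID, where N numbers the cats in document order and ID is the parent's HabitatID. It holds copies of the parent's other children (except BodyTemp), followed by its one cat. The parent Habitat keeps its HabitatID and BodyTemp and gains one Child element referencing each sub-habitat. Habitats with a single cat, and all other elements, are copied unchanged.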
The input document looks like:
<Zoo>
  <Habitat HabitatID="habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
The output should look like:
<Zoo>
  <Habitat HabitatID="sub_habitat.1.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
  </Habitat>
  <Habitat HabitatID="sub_habitat.2.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
  </Habitat>
  <Habitat HabitatID="habitat.cage.1">
    <BodyTemp>endothermic</BodyTemp>
    <Child>
      <HabitatID>sub_habitat.1.habitat.cage.1</HabitatID>
    </Child>
    <Child>
      <HabitatID>sub_habitat.2.habitat.cage.1</HabitatID>
    </Child>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
The ideal solution would use XSLT, but any solution (Bash, JavaScript, PHP, Python, Ruby, Go, etc.) that gets the job done is a worthy contender.
Here's an implementation that does ~90% of the work.
This solution does not reconstruct the first Habitat node with references to the new sub_habitat child nodes.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Habitat[count(BigCat|SmallCat) > 1]">
    <xsl:param name="i"/>
    <xsl:for-each select="BigCat|SmallCat">
      <xsl:choose>
        <xsl:when test="self::BigCat">
          <Habitat HabitatID="sub_habitat.{position()}.{../@HabitatID}">
            <xsl:copy-of select="../*[not(self::SmallCat|self::BodyTemp)]"/>
          </Habitat>
        </xsl:when>
        <xsl:when test="self::SmallCat">
          <Habitat HabitatID="sub_habitat.{position()}.{../@HabitatID}">
            <xsl:copy-of select="../*[not(self::BigCat|self::BodyTemp)]"/>
          </Habitat>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
It produces the following output:
<Zoo>
  <Habitat HabitatID="sub_habitat.1.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
  </Habitat>
  <Habitat HabitatID="sub_habitat.2.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
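For comparison, since the question also accepts non-XSLT answers, here is a minimal Python sketch of the full transformation, including the rebuilt parent Habitat that the stylesheet above leaves out. It uses only the standard library's xml.etree.ElementTree. It assumes the rules are exactly what the example input and expected output show. The names split_habitats, CAT_TAGS and INPUT are made up for this sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Element names treated as "cats" (assumption: only these two types).
CAT_TAGS = ("BigCat", "SmallCat")

INPUT = """<Zoo>
  <Habitat HabitatID="habitat.cage.1">
    <Type>Cats</Type><Food>Birds</Food>
    <BigCat AnimalID="Tiger.1"><Type>Bengal</Type></BigCat>
    <SmallCat AnimalID="bobcat.1"><Type>Bobcat</Type></SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type><Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1"><Type>Tabycat</Type></SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand><Type>PopcornStand</Type></ConsessionStand>
</Zoo>"""


def split_habitats(zoo):
    """Return a new <Zoo> in which every Habitat with more than one cat
    is split into one sub-habitat per cat plus a reduced parent."""
    result = ET.Element(zoo.tag, dict(zoo.attrib))
    for child in zoo:
        cats = [c for c in child if c.tag in CAT_TAGS]
        if child.tag != "Habitat" or len(cats) < 2:
            # Everything else is copied unchanged.
            result.append(copy.deepcopy(child))
            continue
        parent_id = child.get("HabitatID")
        # Children shared by every sub-habitat: all but the cats and BodyTemp.
        shared = [c for c in child if c.tag not in CAT_TAGS and c.tag != "BodyTemp"]
        sub_ids = []
        for n, cat in enumerate(cats, start=1):  # N counts cats in document order
            sub_id = f"sub_habitat.{n}.{parent_id}"
            sub_ids.append(sub_id)
            sub = ET.SubElement(result, "Habitat", {"HabitatID": sub_id})
            sub.extend([copy.deepcopy(c) for c in shared + [cat]])
        # Reduced parent: original attributes, BodyTemp, one Child per sub-habitat.
        parent = ET.SubElement(result, "Habitat", dict(child.attrib))
        parent.extend([copy.deepcopy(t) for t in child.findall("BodyTemp")])
        for sub_id in sub_ids:
            ref = ET.SubElement(parent, "Child")
            ET.SubElement(ref, "HabitatID").text = sub_id
    return result


if __name__ == "__main__":
    out = split_habitats(ET.fromstring(INPUT))
    ET.indent(out)  # pretty-printing; ET.indent needs Python 3.9+
    print(ET.tostring(out, encoding="unicode"))
```

The sketch builds a new tree instead of mutating the parsed one, so the sub-habitats can be emitted in front of their parent without disturbing the iteration over the original children.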
What have you tried? Each of the rules in your prose description of the problem translates pretty directly into a template rule. For example, the rule:
This experience contains more than 1 element (Audiovisual and Gallery); it will be reorganized as a set of 2 discrete experience children
becomes something like
<!-- one Experience per AudioVisual/Gallery child, each keeping the shared children -->
<xsl:template match="Experience[count(AudioVisual|Gallery) > 1]">
  <xsl:for-each select="AudioVisual|Gallery">
    <Experience ExperienceID="{../@ExperienceID}.ce.{position()}">
      <xsl:copy-of select="../*[not(self::AudioVisual|self::Gallery)]"/>
      <xsl:copy-of select="."/>
    </Experience>
  </xsl:for-each>
</xsl:template>
Just go through all your rules and write a template rule for each one.
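To make the "one template rule per prose rule" idea concrete outside XSLT, here is a hypothetical Python analogue: each rule is a (predicate, handler) pair, and anything no rule matches falls back to an identity copy, much as the identity template does in a stylesheet. One difference: this sketch simply takes the first matching rule, whereas XSLT chooses among matching templates by priority. The names RULES, ITEM_TAGS, transform and split_experience are made up; the element and attribute names come from the quoted rule.

```python
import copy
import xml.etree.ElementTree as ET

ITEM_TAGS = ("AudioVisual", "Gallery")


def split_experience(elem):
    """Rule: an Experience with more than one AudioVisual/Gallery child
    becomes one Experience per such child."""
    items = [c for c in elem if c.tag in ITEM_TAGS]
    others = [c for c in elem if c.tag not in ITEM_TAGS]
    parts = []
    for n, item in enumerate(items, start=1):
        part = ET.Element("Experience", {"ExperienceID": f"{elem.get('ExperienceID')}.ce.{n}"})
        # Copy the shared children, then the single item for this part.
        part.extend([copy.deepcopy(c) for c in others + [item]])
        parts.append(part)
    return parts


# Rules are tried in order; the first whose predicate matches wins.
RULES = [
    (lambda e: e.tag == "Experience" and sum(c.tag in ITEM_TAGS for c in e) > 1,
     split_experience),
]


def transform(elem):
    """Return the list of elements that replace elem in the output."""
    for matches, handler in RULES:
        if matches(elem):
            return handler(elem)
    # Fallback, like an identity template: copy elem, transform its children.
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        out.extend(transform(child))
    return [out]
```

Adding another prose rule then means appending one more (predicate, handler) pair to RULES.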
Consider the following stylesheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- identity transform: copy everything not matched by a more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- a Habitat with more than one cat: one sub-habitat per cat, then the reduced parent -->
  <xsl:template match="Habitat[count(BigCat|SmallCat) > 1]">
    <xsl:apply-templates select="BigCat | SmallCat" mode="split"/>
    <xsl:copy>
      <xsl:copy-of select="@* | BodyTemp"/>
      <xsl:apply-templates select="BigCat | SmallCat" mode="child"/>
    </xsl:copy>
  </xsl:template>
  <!-- position() counts the cats of the current Habitat only, in both modes -->
  <xsl:template match="BigCat | SmallCat" mode="split">
    <Habitat HabitatID="sub_habitat.{position()}.{../@HabitatID}">
      <xsl:copy-of select="../*[not(self::BigCat or self::SmallCat or self::BodyTemp)]"/>
      <xsl:copy-of select="."/>
    </Habitat>
  </xsl:template>
  <xsl:template match="BigCat | SmallCat" mode="child">
    <Child>
      <HabitatID>
        <xsl:text>sub_habitat.</xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>.</xsl:text>
        <xsl:value-of select="../@HabitatID"/>
      </HabitatID>
    </Child>
  </xsl:template>
</xsl:stylesheet>
Applied to your input example, the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<Zoo>
  <Habitat HabitatID="sub_habitat.1.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
  </Habitat>
  <Habitat HabitatID="sub_habitat.2.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
  </Habitat>
  <Habitat HabitatID="habitat.cage.1">
    <BodyTemp>endothermic</BodyTemp>
    <Child>
      <HabitatID>sub_habitat.1.habitat.cage.1</HabitatID>
    </Child>
    <Child>
      <HabitatID>sub_habitat.2.habitat.cage.1</HabitatID>
    </Child>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
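A common way for this kind of transform to go wrong is a numbering mismatch between the sub-habitat IDs and the Child references. For example, position() might count cats across the whole document in one place but within a single Habitat in another. A quick sanity check is to confirm that every Child/HabitatID resolves to a Habitat that actually exists in the output. Here is a minimal Python sketch using the standard library; the function name dangling_child_refs is hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_child_refs(zoo):
    """Return every Child/HabitatID value that names no Habitat in zoo."""
    # All HabitatID attributes actually defined in the document.
    known = {h.get("HabitatID") for h in zoo.iter("Habitat")}
    # References that point at an ID nobody defines.
    return [ref.text for ref in zoo.iterfind(".//Child/HabitatID")
            if ref.text not in known]


if __name__ == "__main__":
    doc = ET.fromstring(
        '<Zoo><Habitat HabitatID="sub_habitat.3.cage.2"/>'
        '<Habitat HabitatID="cage.2"><Child><HabitatID>sub_habitat.1.cage.2'
        '</HabitatID></Child></Habitat></Zoo>'
    )
    print(dangling_child_refs(doc))  # ['sub_habitat.1.cage.2']
```

An empty list means every reference resolves. A non-empty one lists the IDs to look at.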