Unfortunately I have to work on an older web application on a PHP4 server. It now needs to parse a lot of XML for calling web services (custom protocol, no SOAP/REST).
Under PHP5 I would use SimpleXML, but that isn't available. There is DOM XML in PHP4, but it is no longer bundled by default in PHP5.
What are the other options? I'm looking for a solution that still works on PHP5 once they migrate.
A nice extra would be if the XML can be validated with a schema.
There is a SimpleXML backport available: http://www.ister.org/code/simplexml44/index.html
If you can install that, it will be the best solution.
It might be a bit grass-roots, but if it's applicable to the data you're working with, you could use XSLT to transform your XML into something usable. Once you upgrade to PHP5 your XSLT stylesheets will still work, and you can migrate to DOM parsing as and when you like.
Andrew
I would second Rich Bradshaw's suggestion about the SimpleXML backport, but if that's not an option, then xml_parse will do the job in PHP4, and it still works after migrating to PHP5.
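$xml = ...; // Get your XML data
// _start_element and _end_element are two functions that determine what
// to do when opening and closing tags are found
function _start_element($parser, $name, $attrs) { /* ... */ }
function _end_element($parser, $name) { /* ... */ }
// How to handle each chunk of text (stripping whitespace if need be, etc.)
function _character_data($parser, $data) { /* ... */ }
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "_start_element", "_end_element");
xml_set_character_data_handler($xml_parser, "_character_data");
// TRUE marks this as the final (here: only) chunk of the document
if (!xml_parse($xml_parser, $xml, TRUE)) {
    echo xml_error_string(xml_get_error_code($xml_parser));
}
xml_parser_free($xml_parser);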
There's a good tutorial here about parsing XML in PHP4 that may be of some use to you.
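To make the snippet above concrete, here is a minimal, self-contained sketch with the three handlers filled in. The handler names (demo_start_element and friends), the $GLOBALS['depth'] and $GLOBALS['events'] variables, and the sample XML are illustrative assumptions, not part of the original answer. It sticks to XML Parser functions that exist in both PHP4 and PHP5.

```php
<?php
// Minimal sketch of the event-based xml_parse() API.
// Handler names, globals and sample XML are hypothetical, for illustration only.
$GLOBALS['depth'] = 0;
$GLOBALS['events'] = array();

function demo_start_element($parser, $name, $attrs)
{
    // Record the opening tag, indented by the current nesting depth
    $GLOBALS['events'][] = str_repeat('  ', $GLOBALS['depth']) . 'start ' . $name;
    $GLOBALS['depth']++;
}

function demo_end_element($parser, $name)
{
    $GLOBALS['depth']--;
    $GLOBALS['events'][] = str_repeat('  ', $GLOBALS['depth']) . 'end ' . $name;
}

function demo_character_data($parser, $data)
{
    // Ignore whitespace-only text between tags
    if (trim($data) !== '') {
        $GLOBALS['events'][] = str_repeat('  ', $GLOBALS['depth']) . 'text ' . trim($data);
    }
}

$xml = '<reply><status>ok</status></reply>';
$parser = xml_parser_create();
// Keep tag names as written (by default they are folded to upper case)
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'demo_start_element', 'demo_end_element');
xml_set_character_data_handler($parser, 'demo_character_data');
if (!xml_parse($parser, $xml, TRUE)) {
    echo 'XML error: ' . xml_error_string(xml_get_error_code($parser)) . "\n";
}
xml_parser_free($parser);
echo implode("\n", $GLOBALS['events']) . "\n";
```

For this input it prints one line per event, indented by depth: start reply, start status, text ok, end status, end reply.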
If you can use xml_parse, then go for that. It's robust, fast and compatible with PHP5. It is, however, not a DOM parser but a simpler event-based one (also called a SAX parser), so if you need to access a tree, you will have to marshal the stream into a tree yourself. This is fairly simple to do: use a stack, push an item onto it on start-element and pop it on end-element.
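A minimal sketch of that stack approach, assuming each element becomes an array with name, attrs and children keys (a layout chosen for this example only; the function names tree_start, tree_end and tree_text and the sample XML are hypothetical):

```php
<?php
// Sketch: build a nested array tree from xml_parse() events using a stack.
// Node layout (name/attrs/children) is an assumption made for this example.
$GLOBALS['stack'] = array();
$GLOBALS['root'] = NULL;

function tree_start($parser, $name, $attrs)
{
    // Push a fresh node for every opening tag
    $GLOBALS['stack'][] = array('name' => $name, 'attrs' => $attrs, 'children' => array());
}

function tree_text($parser, $data)
{
    // Append non-blank text to the node currently on top of the stack
    $top = count($GLOBALS['stack']) - 1;
    if ($top >= 0 && trim($data) !== '') {
        $GLOBALS['stack'][$top]['children'][] = trim($data);
    }
}

function tree_end($parser, $name)
{
    // Pop the finished node and attach it to its parent, or make it the root
    $node = array_pop($GLOBALS['stack']);
    $top = count($GLOBALS['stack']) - 1;
    if ($top >= 0) {
        $GLOBALS['stack'][$top]['children'][] = $node;
    } else {
        $GLOBALS['root'] = $node;
    }
}

$xml = '<order id="7"><item>pen</item><item>ink</item></order>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'tree_start', 'tree_end');
xml_set_character_data_handler($parser, 'tree_text');
xml_parse($parser, $xml, TRUE);
xml_parser_free($parser);
print_r($GLOBALS['root']);
```

The character-data handler can be called several times for a single text node (for example around entity references), so real code may want to concatenate adjacent text chunks instead of storing each one as a separate child.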
I would definitely recommend the SimpleXML backport, as long as its performance is good enough for your needs. The demonstrations of xml_parse look simple enough, but it can get very hairy very quickly in my experience. The content handler functions don't get any contextual information about where the parser is in the tree, unless you track it and provide it in the start and end tag handlers. So you're either calling functions for every start/end tag, or throwing around global variables to track where you are in the tree.
Obviously the SimpleXML backport will be a bit slower, as it's written in PHP and has to parse the whole document before it's available, but the ease of coding more than makes up for it.
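As an illustration of the global-variable bookkeeping described above, here is a sketch that keeps the current element path on a global stack so the character-data handler knows where it is. The function names, the path and values globals, and the sample XML are hypothetical; text from repeated elements with the same path would be merged into one string.

```php
<?php
// Sketch: give the character-data handler context by tracking the current
// element path in a global stack. All names here are illustrative only.
$GLOBALS['path'] = array();
$GLOBALS['values'] = array();

function path_start($parser, $name, $attrs)
{
    $GLOBALS['path'][] = $name;
}

function path_end($parser, $name)
{
    array_pop($GLOBALS['path']);
}

function path_text($parser, $data)
{
    if (trim($data) === '') {
        return; // skip whitespace-only chunks
    }
    // Key collected text by its full path, e.g. "response/user/name"
    $key = implode('/', $GLOBALS['path']);
    if (!isset($GLOBALS['values'][$key])) {
        $GLOBALS['values'][$key] = '';
    }
    // Concatenate, because one text node may arrive in several chunks
    $GLOBALS['values'][$key] .= $data;
}

$xml = '<response><user><name>Ann</name><role>admin</role></user>'
     . '<status>ok</status></response>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'path_start', 'path_end');
xml_set_character_data_handler($parser, 'path_text');
xml_parse($parser, $xml, TRUE);
xml_parser_free($parser);
print_r($GLOBALS['values']);
```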
Maybe also consider looking at the XML packages available in PEAR, particularly XML_Util, XML_Parser, and XML_Serializer...
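For reference, xml_parse_into_struct() from the same XML Parser extension does not build a tree either; it returns a flat list of records, each with a tag, a type (open, complete, cdata or close) and a nesting level, plus value and attributes when present. The sample XML below is made up; the loop just prints those records.

```php
<?php
// Sketch: show the flat records xml_parse_into_struct() produces.
$xml = '<a><b>x</b><c/></a>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
// $values receives the flat record list, $index maps tag names to positions in it
xml_parse_into_struct($parser, $xml, $values, $index);
xml_parser_free($parser);
foreach ($values as $v) {
    // Every record has 'tag', 'type' and 'level'; 'value' only when there is text
    echo $v['level'] . ' ' . $v['type'] . ' ' . $v['tag'];
    if (isset($v['value'])) {
        echo ' = ' . $v['value'];
    }
    echo "\n";
}
```

For this sample it prints four records: 1 open a, 2 complete b = x, 2 complete c, 1 close a. Turning such a list back into nested arrays means walking it and using the level to find each record's parent.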
The XML Parser's xml_parse_into_struct() output, turned into a tree-array structure:
<?php
/**
* What to use for XML parsing / reading in PHP4
* @link http://stackoverflow.com/q/132233/367456
*/
$encoding = 'US-ASCII';
// https://gist.github.com/hakre/46386de578619fbd898c
$path = dirname(__FILE__) . '/time-series-example.xml';
$parser_creator = 'xml_parser_create'; // alternative creator is 'xml_parser_create_ns'
if (!function_exists($parser_creator)) {
trigger_error(
"XML Parser's $parser_creator() not found. XML Parser "
. '<http://php.net/xml> is required, activate it in your PHP configuration.'
, E_USER_ERROR
);
return;
}
$parser = $parser_creator($encoding);
if (!$parser) {
trigger_error(sprintf('Unable to create a parser (Encoding: "%s")', $encoding), E_USER_ERROR);
return;
}
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
$data = file_get_contents($path);
if ($data === FALSE) {
trigger_error(sprintf('Unable to open file "%s" for reading', $path));
return;
}
$result = xml_parse_into_struct($parser, $data, $xml_struct_values);
unset($data);
xml_parser_free($parser);
unset($parser);
if ($result === 0) {
trigger_error(sprintf('Unable to parse data of file "%s" as XML', $path));
return;
}
define('TREE_NODE_TAG', 'tagName');
define('TREE_NODE_ATTRIBUTES', 'attributes');
define('TREE_NODE_CHILDREN', 'children');
define('TREE_NODE_TYPE_TAG', 'array');
define('TREE_NODE_TYPE_TEXT', 'string');
define('TREE_NODE_TYPE_NONE', 'NULL');
/**
* XML Parser indices for parse-into-struct values
*/
define('XML_STRUCT_VALUE_TYPE', 'type');
define('XML_STRUCT_VALUE_LEVEL', 'level');
define('XML_STRUCT_VALUE_TAG', 'tag');
define('XML_STRUCT_VALUE_ATTRIBUTES', 'attributes');
define('XML_STRUCT_VALUE_VALUE', 'value');
/**
* XML Parser supported node types
*/
define('XML_STRUCT_TYPE_OPEN', 'open');
define('XML_STRUCT_TYPE_COMPLETE', 'complete');
define('XML_STRUCT_TYPE_CDATA', 'cdata');
define('XML_STRUCT_TYPE_CLOSE', 'close');
/**
* Tree Creator
* @return array
*/
function tree_create()
{
return array(
array(
TREE_NODE_TAG => NULL,
TREE_NODE_ATTRIBUTES => NULL,
TREE_NODE_CHILDREN => array(),
)
);
}
/**
* Add Tree Node into Tree a Level
*
* @param $tree
* @param $level
* @param $node
* @return array|bool Tree with the Node added or FALSE on error
*/
function tree_add_node($tree, $level, $node)
{
$type = gettype($node);
switch ($type) {
case TREE_NODE_TYPE_TEXT:
$level++;
break;
case TREE_NODE_TYPE_TAG:
break;
case TREE_NODE_TYPE_NONE:
trigger_error('Can not add Tree Node of type None, keeping tree unchanged', E_USER_NOTICE);
return $tree;
default:
trigger_error(sprintf('Can not add Tree Node of type "%s"', $type), E_USER_ERROR);
return FALSE;
}
if (!isset($tree[$level - 1])) {
trigger_error("There is no parent for level $level");
return FALSE;
}
$parent = & $tree[$level - 1];
if (isset($parent[TREE_NODE_CHILDREN]) && !is_array($parent[TREE_NODE_CHILDREN])) {
trigger_error("There are no children in parent for level $level");
return FALSE;
}
$parent[TREE_NODE_CHILDREN][] = & $node;
$tree[$level] = & $node;
return $tree;
}
/**
* Creator of a Tree Node
*
* @param $value XML Node
* @return array Tree Node
*/
function tree_node_create_from_xml_struct_value($value)
{
static $xml_node_default_types = array(
XML_STRUCT_VALUE_ATTRIBUTES => NULL,
XML_STRUCT_VALUE_VALUE => NULL,
);
$orig = $value;
$value += $xml_node_default_types;
switch ($value[XML_STRUCT_VALUE_TYPE]) {
case XML_STRUCT_TYPE_OPEN:
case XML_STRUCT_TYPE_COMPLETE:
$node = array(
TREE_NODE_TAG => $value[XML_STRUCT_VALUE_TAG],
// '__debug1' => $orig,
);
if (isset($value[XML_STRUCT_VALUE_ATTRIBUTES])) {
$node[TREE_NODE_ATTRIBUTES] = $value[XML_STRUCT_VALUE_ATTRIBUTES];
}
if (isset($value[XML_STRUCT_VALUE_VALUE])) {
$node[TREE_NODE_CHILDREN] = (array)$value[XML_STRUCT_VALUE_VALUE];
}
return $node;
case XML_STRUCT_TYPE_CDATA:
// TREE_NODE_TYPE_TEXT
return $value[XML_STRUCT_VALUE_VALUE];
case XML_STRUCT_TYPE_CLOSE:
return NULL;
default:
trigger_error(
sprintf(
'Unknown Xml Node Type "%s": %s', $value[XML_STRUCT_VALUE_TYPE], var_export($value, TRUE)
)
);
return FALSE;
}
}
$tree = tree_create();
while ($tree && $value = array_shift($xml_struct_values)) {
$node = tree_node_create_from_xml_struct_value($value);
if (NULL === $node) {
continue;
}
$tree = tree_add_node($tree, $value[XML_STRUCT_VALUE_LEVEL], $node);
unset($node);
}
if (!$tree) {
trigger_error('Parse error');
return;
}
if ($xml_struct_values) {
trigger_error(sprintf('Unable to process whole parsed XML array (%d elements left)', count($xml_struct_values)));
return;
}
// tree root is the first child of level 0
print_r($tree[0][TREE_NODE_CHILDREN][0]);
Output:
Array
(
[tagName] => dwml
[attributes] => Array
(
[version] => 1.0
[xmlns:xsd] => http://www.w3.org/2001/XMLSchema
[xmlns:xsi] => http://www.w3.org/2001/XMLSchema-instance
[xsi:noNamespaceSchemaLocation] => http://www.nws.noaa.gov/forecasts/xml/DWMLgen/schema/DWML.xsd
)
[children] => Array
(
[0] => Array
(
[tagName] => head
[children] => Array
(
[0] => Array
(
[tagName] => product
[attributes] => Array
(
[srsName] => WGS 1984
[concise-name] => time-series
[operational-mode] => official
)
[children] => Array
(
[0] => Array
(
[tagName] => title
[children] => Array
(
[0] => NOAA's National Weather Service Forecast Data
)
)
[1] => Array
(
[tagName] => field
[children] => Array
(
[0] => meteorological
)
)
[2] => Array
(
[tagName] => category
[children] => Array
(
[0] => forecast
)
)
[3] => Array
(
[tagName] => creation-date
[attributes] => Array
(
[refresh-frequency] => PT1H
)
[children] => Array
(
[0] => 2013-11-02T06:51:17Z
)
)
)
)
[1] => Array
(
[tagName] => source
[children] => Array
(
[0] => Array
(
[tagName] => more-information
[children] => Array
(
[0] => http://www.nws.noaa.gov/forecasts/xml/
)
)
[1] => Array
(
[tagName] => production-center
[children] => Array
(
[0] => Meteorological Development Laboratory
[1] => Array
(
[tagName] => sub-center
[children] => Array
(
[0] => Product Generation Branch
)
)
)
)
[2] => Array
(
[tagName] => disclaimer
[children] => Array
(
[0] => http://www.nws.noaa.gov/disclaimer.html
)
)
[3] => Array
(
[tagName] => credit
[children] => Array
(
[0] => http://www.weather.gov/
)
)
[4] => Array
(
[tagName] => credit-logo
[children] => Array
(
[0] => http://www.weather.gov/images/xml_logo.gif
)
)
[5] => Array
(
[tagName] => feedback
[children] => Array
(
[0] => http://www.weather.gov/feedback.php
)
)
)
)
)
)
[1] => Array
(
[tagName] => data
[children] => Array
(
[0] => Array
(
[tagName] => location
[children] => Array
(
[0] => Array
(
[tagName] => location-key
[children] => Array
(
[0] => point1
)
)
[1] => Array
(
[tagName] => point
[attributes] => Array
(
[latitude] => 40.00
[longitude] => -120.00
)
)
)
)
[1] => Array
(
[tagName] => moreWeatherInformation
[attributes] => Array
(
[applicable-location] => point1
)
[children] => Array
(
[0] => http://forecast.weather.gov/MapClick.php?textField1=40.00&textField2=-120.00
)
)
[2] => Array
(
[tagName] => time-layout
[attributes] => Array
(
[time-coordinate] => local
[summarization] => none
)
[children] => Array
(
[0] => Array
(
[tagName] => layout-key
[children] => Array
(
[0] => k-p24h-n4-1
)
)
[1] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-02T08:00:00-07:00
)
)
[2] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-02T20:00:00-07:00
)
)
[3] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-03T07:00:00-08:00
)
)
[4] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-03T19:00:00-08:00
)
)
[5] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-04T07:00:00-08:00
)
)
[6] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-04T19:00:00-08:00
)
)
[7] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-05T07:00:00-08:00
)
)
[8] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-05T19:00:00-08:00
)
)
)
)
[3] => Array
(
[tagName] => time-layout
[attributes] => Array
(
[time-coordinate] => local
[summarization] => none
)
[children] => Array
(
[0] => Array
(
[tagName] => layout-key
[children] => Array
(
[0] => k-p24h-n5-2
)
)
[1] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-01T20:00:00-07:00
)
)
[2] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-02T09:00:00-07:00
)
)
[3] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-02T19:00:00-07:00
)
)
...
[10] => Array
(
[tagName] => end-valid-time
[children] => Array
(
[0] => 2013-11-06T08:00:00-08:00
)
)
)
)
[4] => Array
(
[tagName] => time-layout
[attributes] => Array
(
[time-coordinate] => local
[summarization] => none
)
[children] => Array
(
[0] => Array
(
[tagName] => layout-key
[children] => Array
(
[0] => k-p12h-n9-3
)
)
[1] => Array
(
[tagName] => start-valid-time
[children] => Array
(
[0] => 2013-11-01T17:00:00-07:00
)
)
...