I am using cURL to download a page. Now I want to extract this from the page:
<object classid="clsid:67DABFBF-D0AB-41fa-9C46-CC0F21721616" width="640"
height="303.33333333333"
codebase="http://go.divx.com/plugin/DivXBrowserPlugin.cab"
id="object701207571">
<param name="autoPlay" value="false" />
<param name="custommode" value="Stage6" />
<param name="src" value="" />
<param name="movieTitle" value="Titanic" />
<param name="bannerEnabled" value="false" />
<param name="previewImage"
value="http://stagevu.com/img/thumbnail/oripmqeqzrccbig.jpg" />
<embed type="video/divx" src="" width="640" height="303.33333333333"
autoPlay="false" custommode="Stage6" movieTitle="Titanic"
bannerEnabled="false"
previewImage="http://stagevu.com/img/thumbnail/oripmqeqzrccbig.jpg"
pluginspage="http://go.divx.com/plugin/download/"
id="embed701207571">
</embed>
</object>
Please help!
See "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why this is probably the wrong thing to do.
That said, you might be able to get away with something like /(<object>.*?<\/object>)/s. This matches the string "<object>" followed by as few characters as possible up to the first "</object>". The s modifier on the end tells the dot (.) to match newlines as well (it normally doesn't).
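A minimal PHP sketch of what the s modifier changes. The sample string in $html is made up for illustration and uses an <object> tag without attributes, since that is the only form this pattern can match.

```php
<?php
// Hypothetical sample: an attribute-free <object> block spanning several lines.
$html = "<p>before</p>\n<object>\n<param name=\"src\" value=\"\" />\n</object>\n<p>after</p>";

// Without s, "." stops at newlines, so the multi-line block is not matched.
$withoutS = preg_match('/(<object>.*?<\/object>)/', $html, $m1);

// With s, "." also matches newlines; the lazy ".*?" stops at the first </object>.
$withS = preg_match('/(<object>.*?<\/object>)/s', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[1], "\n";   // the whole <object>...</object> block
```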
This is partially in response to Owens (because I can't put code in a comment very well). That regex will not match the object tag in the question, because the opening <object> tag has attributes in it. Try this one instead:
/(<object[^>]*>)(.*?)(<\/object>)/si
It's case insensitive and split into three capture groups (opening tag, contents, closing tag) for easy reference. It's not 100% perfect, but should help.
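As a rough illustration of how the three groups could be read in PHP, here is a sketch run against a shortened copy of the markup from the question. The variable names are placeholders, not part of the original answer.

```php
<?php
// Shortened copy of the markup from the question (assumed to be in a string already).
$html = <<<'HTML'
<div>
<object classid="clsid:67DABFBF-D0AB-41fa-9C46-CC0F21721616" width="640"
id="object701207571">
<param name="movieTitle" value="Titanic" />
</object>
</div>
HTML;

$openTag = $inner = $closeTag = null;
if (preg_match('/(<object[^>]*>)(.*?)(<\/object>)/si', $html, $m)) {
    $openTag  = $m[1]; // opening tag, including its attributes
    $inner    = $m[2]; // everything between the two tags
    $closeTag = $m[3]; // the closing tag
    echo trim($inner), "\n";
}
```

Note that [^>]* also matches newlines, so an opening tag that wraps over several lines is still captured in group 1.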
This regex also matches the line breaks between the opening and closing tags and captures the entire element in one group:
/(<object[^>]*?>(?:[\s\S]*?)<\/object>)/gi
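The answer does not say which language the pattern is meant for. If it is used from PHP (an assumption), the g flag has to go, because PHP's preg functions do not accept it; preg_match_all() plays that role instead. A small sketch with made-up input:

```php
<?php
// Hypothetical input containing two <object> blocks, one in upper case.
$html = "<OBJECT id=\"a\">\n<param name=\"x\" value=\"1\" />\n</OBJECT>\n"
      . "<p>text</p>\n"
      . "<object id=\"b\">\n</object>";

// Same pattern without the g flag; preg_match_all() already returns every match.
$count = preg_match_all('/(<object[^>]*?>(?:[\s\S]*?)<\/object>)/i', $html, $matches);

echo $count, "\n"; // 2
foreach ($matches[1] as $block) {
    echo $block, "\n---\n";
}
```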
Using SimpleXML:
$sxe = new SimpleXMLElement($xml);
$objects = $sxe->xpath('//object[@id="object701207571"]');
$object = $objects[0];
$params = $object->xpath('param');
foreach ($params as $param)
{
    $attrs = $param->attributes();
    echo $attrs['name'] . ' = ' . $attrs['value'] . "\n";
}
// Get plain XML:
echo $object->asXML();
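Using DOMDocument:
// loadHTML() is an instance method; calling it statically fails on current PHP versions.
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('object') as $object)
{
    echo $doc->saveXML($object);
}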
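If the individual param values are wanted rather than the whole element, the DOM approach can be extended along these lines. This is only a sketch: the $html sample is a shortened copy of the question's markup wrapped in a minimal page, and the $params array is a name introduced here for illustration.

```php
<?php
// Shortened copy of the question's markup inside a minimal page.
$html = <<<'HTML'
<html><body>
<object id="object701207571" width="640">
<param name="movieTitle" value="Titanic" />
<param name="previewImage" value="http://stagevu.com/img/thumbnail/oripmqeqzrccbig.jpg" />
</object>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Build a name => value map from every <param> inside every <object>.
$params = [];
foreach ($doc->getElementsByTagName('object') as $object) {
    foreach ($object->getElementsByTagName('param') as $param) {
        $params[$param->getAttribute('name')] = $param->getAttribute('value');
    }
}

print_r($params);
```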