Need to retrieve a string value from an HTML page and store it in XML under a category

For a project, I need to pull out the value of a query parameter ('v') from an HTML page that I generate.

The HTML page contains links like the following, surrounded by a lot of unrelated data:

/watch?v=blablabla&list=blablabla&index=7&feature=blablabla
/watch?v=blablabla&list=blablabla&index=8&feature=blablabla

The values of 'v' have to be retrieved and stored under categories in an XML file.

Try using a regular expression with preg_match_all:

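// file() returns an array of lines, but preg_match_all() expects a string, so read the whole file at once
$html = file_get_contents('path/file.html');
preg_match_all("/\/watch\?v=([a-z0-9]+)&list=[a-z0-9]*&index=[0-9]*/i", $html, $matches);
// $matches[1] now holds the captured 'v' values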

I'm not sure exactly what the URLs will look like, so the regular expression may need to be adjusted to fit them.
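
As one possible adjustment, here is a minimal sketch that captures only the 'v' value and does not depend on the other parameters being present. It assumes the values consist of letters, digits, hyphens and underscores. The sample HTML in $html is made up for illustration, so the character class may need changing for your real data.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents().
$html = '<a href="/watch?v=abc123&list=PL1&index=7&feature=x">one</a>'
      . '<a href="/watch?v=Def-45_6&amp;list=PL1&amp;index=8">two</a>';

// Capture everything after "v=" up to the first character that is not
// a letter, digit, underscore or hyphen (an assumed alphabet for the values).
preg_match_all('/\/watch\?v=([A-Za-z0-9_-]+)/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$ids = $matches[1];
print_r($ids);
```

Because the pattern stops at the first character outside the assumed alphabet, it works whether the separator in the page is written as & or as &amp;.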

Try http://gskinner.com/RegExr/ to fine-tune your expression.
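
For the second half of the question, storing the values under categories in XML, a rough sketch using PHP's DOMDocument could look like this. The question does not say how a category is determined, so the $idsByCategory array, the element names (videos, category, video) and the output path are all assumptions to be replaced with whatever fits your data.

```php
<?php
// Hypothetical input: 'v' values already extracted and grouped by category name.
$idsByCategory = [
    'music' => ['abc123', 'def456'],
    'talks' => ['ghi789'],
];

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true; // indent the output for readability

// Root element; the name "videos" is an assumption.
$root = $doc->createElement('videos');
$doc->appendChild($root);

foreach ($idsByCategory as $name => $ids) {
    // One <category name="..."> element per category.
    $category = $doc->createElement('category');
    $category->setAttribute('name', $name);

    foreach ($ids as $id) {
        // One <video> element per 'v' value; createTextNode() escapes the text.
        $video = $doc->createElement('video');
        $video->appendChild($doc->createTextNode($id));
        $category->appendChild($video);
    }

    $root->appendChild($category);
}

$xml = $doc->saveXML();
echo $xml;

// To write to disk instead (hypothetical path):
// $doc->save('path/videos.xml');
```

Building the document through DOMDocument rather than concatenating strings means special characters in the values are escaped for you.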