Extract all links ending with the .js extension from an HTML page

I want to extract all links that end with .js from an HTML page. I am able to fetch the links that are inside script tags, but how can I fetch links from properties like {"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js"}? I want to do this in PHP.

Simple HTML DOM is an HTML DOM parser written in PHP5+ that supports invalid HTML and provides a very easy way to handle HTML elements. It finds tags on an HTML page with selectors just like jQuery, and it can extract contents from HTML in a single line.

Here is the link to get it: http://sourceforge.net/projects/simplehtmldom/

...and here is the official web site: http://simplehtmldom.sourceforge.net/

For basic HTML elements you can use phpQuery (http://code.google.com/p/phpquery/) to parse DOM content; it handles jQuery-like CSS selectors and functions such as attr and find. Here is an example of how to use selectors with phpQuery: http://code.google.com/p/phpquery/wiki/Selectors.
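If installing a third-party parser is not an option, the script-tag part can also be sketched with PHP's built-in DOMDocument. This swaps in DOMDocument in place of phpQuery or Simple HTML DOM; the function name scriptSrcLinks and the sample markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: collect the src of every <script> tag whose path ends in .js.
function scriptSrcLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        // Test only the path component so a query string such as ?v=2 is ignored.
        $path = (string) parse_url($src, PHP_URL_PATH);
        if ($src !== '' && preg_match('/\.js$/i', $path)) {
            $links[] = $src;
        }
    }
    return $links;
}

// Illustrative markup, not taken from a real page.
$html = '<html><head><script src="http://example.com/a.js"></script>'
      . '<script src="/b.js?v=2"></script><script>var x = 1;</script></head></html>';
print_r(scriptSrcLinks($html));
```

Inline scripts without a src attribute are skipped here, so URLs embedded in script bodies or properties still need separate handling.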

For properties, it depends:

  • If they are inside JavaScript or some other free-form text, use some kind of regular expression.
  • If they are in data attributes and you know the attribute names, you can read that JSON string and simply run the PHP function json_decode on it.
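Both options can be sketched roughly as follows. The helper names findJsUrls and jsUrlsFromJson, the regular expression, and the sample inputs (other than the yui URL from the question) are assumptions for illustration; the pattern only looks for absolute http/https URLs and will need tuning for a real page.

```php
<?php
// Hypothetical helper: regex scan of raw text (HTML, inline JavaScript, JSON) for .js URLs.
function findJsUrls(string $text): array
{
    // JSON often escapes slashes as \/, so undo that before matching.
    $text = str_replace('\\/', '/', $text);
    // An absolute URL ending in .js, with an optional query string,
    // followed by a quote, whitespace, bracket, or the end of the text.
    $pattern = '~https?://[^\s"\'<>()]+?\.js(?:\?[^\s"\'<>()]*)?(?=["\'\s<>)]|$)~i';
    preg_match_all($pattern, $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Hypothetical helper: decode a JSON string (for example the value of a data
// attribute) and keep every string value that ends in .js.
function jsUrlsFromJson(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return []; // Invalid JSON, or not an object/array.
    }
    $urls = [];
    array_walk_recursive($data, function ($value) use (&$urls) {
        if (is_string($value) && preg_match('/\.js$/i', $value)) {
            $urls[] = $value;
        }
    });
    return $urls;
}

// Illustrative inputs built around the property from the question.
$snippet = '<script>var config = {"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js"};</script>';
print_r(findJsUrls($snippet));

$json = '{"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js","css":"http://example.com/a.css"}';
print_r(jsUrlsFromJson($json));
```

The regex route works on any text but can produce false positives; the json_decode route is stricter and is preferable whenever the property really is valid JSON.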