Script to loop through html / php / js files and build a list of used resources

As some of my websites have evolved, the server has become cluttered with files that are no longer in use, whether because of versioning, jQuery plugins that were dropped, and so on.

I'm thinking about writing a script using grep with some regex, but if there's already something that exists, it would make things easier.

Can someone point me toward a script or program that I can feed a listing of html / php / js files, that loops through them, reads the code, and tells me which .php, .js, .jpg, etc. files are included?

The script could look at src='', include(), require(), etc...

I'm not looking for someone to do it for me; just a starting point on how to proceed or if something already exists.

For images, scripts, etc. you can use Firebug for Firefox (just click the Net tab).

In PHP you could call var_dump(get_included_files()) at the end of your code to get all files included for that particular page.

This thread might give you some answers, although it probably only works for Java.

Be aware that JavaScript code may pull in additional js/css files in ways that are not grep-friendly:

var extraScript = document.createElement('script');
extraScript.src = scriptUrl;
document.head.appendChild(extraScript);

// or (the closing tag is escaped so it cannot end an inline <script> block early)
document.write('<script src="' + scriptUrl + '"><\/script>');

PHP's include*() and require*() can also take a variable or an expression, which grep cannot resolve because the value only exists when the code runs.
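For the parts that are grep-friendly, a rough static pass could look something like the sketch below. It is only a starting point under some loud assumptions: the function name `findReferences` and the three patterns are made up for illustration, the patterns only understand quoted literals on a single line, and anything built from a variable is merely reported, not resolved.

```javascript
// Rough static scan: a regex sketch, not a real HTML/PHP parser.
// All names below are hypothetical and exist only for this example.

// src="..." / href="..." whose value is a plain quoted literal
const ATTR_RE = /\b(?:src|href)\s*=\s*(["'])([^"'<>\s]+)\1/gi;
// any include/require(_once) statement up to the next semicolon on the same line
const PHP_ANY_RE = /(?<![$\w])(?:include|require)(?:_once)?\b[^;\n]*;/gi;
// the subset of those statements whose whole argument is one quoted literal
const PHP_LITERAL_RE = /^(?:include|require)(?:_once)?\s*\(?\s*(["'])([^"']+)\1\s*\)?\s*;$/i;

function findReferences(text) {
  const found = new Set(); // paths that could be read directly from the source
  const dynamic = [];      // statements whose target is only known at runtime

  for (const m of text.matchAll(ATTR_RE)) {
    found.add(m[2]);
  }
  for (const m of text.matchAll(PHP_ANY_RE)) {
    const literal = PHP_LITERAL_RE.exec(m[0]);
    if (literal) {
      found.add(literal[2]);
    } else {
      dynamic.push(m[0].trim());
    }
  }
  return { found: [...found], dynamic };
}

// Small demonstration on an inline sample instead of real files
const sample = [
  '<script src="js/app.js"></script>',
  "<img src='img/logo.png'>",
  "<?php include('header.php'); require_once \"lib/db.php\";",
  "include($dir . '/footer.php'); ?>",
].join('\n');

console.log(findReferences(sample));
```

With Node, the contents of each file in your listing could be read with fs.readFileSync(path, 'utf8') and passed to the function. Expect false positives (the word "include" in ordinary text followed by a semicolon, for example) and treat everything that lands in `dynamic` as something that still has to be checked at runtime.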

To counter all that, use JavaScript to inspect the rendered HTML after all other scripts have run, or use Firebug's Net tab or Chrome DevTools. For PHP, use get_included_files() to get the list of included PHP files, and write the results to a file, a database, or wherever suits you. Doing this for every page may take a while.
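As an illustration of the "inspect the rendered HTML" idea, here is a minimal sketch. The function name `collectResources` and the selector list are assumptions for the example, not an existing tool. It takes the document as a parameter so that it can be tried outside a browser with a stand-in object.

```javascript
// Sketch: list the resources referenced by a rendered page.
// `doc` is expected to behave like the browser's `document` object.
function collectResources(doc) {
  const urls = new Set();
  // Elements that load a file through their src or href attribute
  const nodes = doc.querySelectorAll(
    'script[src], img[src], iframe[src], link[rel="stylesheet"][href]'
  );
  for (const node of nodes) {
    // In a browser, .src and .href return the resolved absolute URL
    const url = node.src || node.href;
    if (url) {
      urls.add(url);
    }
  }
  // De-duplicated and sorted so that lists from different pages are easy to compare
  return [...urls].sort();
}

// In a browser console, once the page has finished loading:
//   console.log(collectResources(document).join('\n'));
```

This only sees elements that are in the DOM when it runs, so images referenced from CSS or files fetched by script without a tag are missed. In browsers that support the Resource Timing API, performance.getEntriesByType('resource') lists what was actually requested, which is closer to what the Net tab shows.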

I'm not sure if such a solution already exists for PHP (probably), but if I find it I'll let you know.