How do I scrape all content from a website? [closed]

I develop websites, and sometimes clients already have a website but need it totally revamped, while most of the content and images need to stay the same. I'm looking for software, even if it costs money or is a desktop application, that will easily let me enter a URL and scrape all the content to a designated folder on my local machine. Any help would be much appreciated.

HTTrack will work just fine for you. It is an offline browser that will pull down websites, and you can configure it however you wish. It obviously will not pull down the PHP itself, since PHP is server-side code; the only things you can pull down are the HTML, JavaScript, and any images pushed to the browser.
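If you use the command-line version, a minimal invocation looks something like this (the URL and output folder are placeholders, not values from the question):

httrack "http://example.com/" -O /path/to/local/folder

That mirrors the site into the given folder; the Windows GUI (WinHTTrack) exposes the same options through a wizard.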

// Fetches a single page and saves it locally; it won't follow links or download images.
file_put_contents('/some/directory/scrape_content.html', file_get_contents('http://google.com'));

Save your money for charity.

By content do you mean the entire page contents? Because you can just "Save As..." the whole page with most of the included media.

Firefox, under Tools -> Page Info -> Media, includes a listing of every piece of media on the page that you can download.

You can achieve this with the browser's save-as option: go to File -> Save Page As in Firefox, and all the images and JS will be saved in one folder.

Don't bother with PHP for something like this. You can use wget to grab an entire site trivially. However, be aware that it won't parse things like CSS for you, so it won't grab any files referenced via (say) background-image: url('/images/pic.jpg'), but it will snag most everything else for you.
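To be concrete, a typical mirroring command looks something like this (example.com is just a placeholder):

wget --mirror --convert-links --page-requisites --no-parent http://example.com/

--mirror recurses through the site, --convert-links rewrites links so the local copy browses offline, --page-requisites pulls the images/CSS/JS each page references, and --no-parent stops it from wandering above the starting path.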

I started using HTTrack a couple of years ago and I'm happy with it. It seems to go out of its way to get pages I wouldn't even see on my own.

This class can help you scrape the content: http://simplehtmldom.sourceforge.net/
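Typical usage looks roughly like this (a sketch assuming you have downloaded simple_html_dom.php next to the script; the URL and selectors are only examples):

<?php
// Sketch: fetch a page with Simple HTML DOM and pull out some of its content.
include 'simple_html_dom.php';

$html = file_get_html('http://example.com/');  // download and parse the page

foreach ($html->find('img') as $img) {         // every <img> element
    echo $img->src . "\n";                     // print its src attribute
}

echo $html->plaintext;                         // the page's visible text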

You can scrape websites with http://scrapy.org and get the content you want.

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
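If you want to try it, the usual workflow is roughly the following (the project and spider names here are made up for the example):

pip install scrapy
scrapy startproject sitegrab
cd sitegrab
scrapy genspider example example.com
scrapy crawl example -o output.json

You then fill in the generated spider's parse() method with the CSS or XPath selectors for the content you want, and the crawl command writes the extracted items to output.json.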