How do I add movie information parsed from a website into my own database?
Here is a simple Java example that scrapes the featured ("强档专区") section of Youku at [url]http://movie.youku.com/[/url]; use it as a reference.
[code="java"]
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {

    // NOTE: the original regex was lost when the post was formatted.
    // This placeholder simply extracts the text of anchor tags; adapt it
    // to the actual markup of the featured ("强档专区") section.
    private static final String regex = "<a[^>]*>([^<]+)</a>";

    public static void main(String[] args) {
        try {
            // Open a plain HTTP connection to the page to be scraped.
            HttpURLConnection urlconn = (HttpURLConnection) new URL(
                    "http://movie.youku.com/").openConnection();
            BufferedReader rd = new BufferedReader(new InputStreamReader(
                    urlconn.getInputStream(), "utf-8"));

            // Read the whole response body into one string.
            StringBuffer sb = new StringBuffer();
            String temp = rd.readLine();
            while (temp != null) {
                sb.append(temp);
                temp = rd.readLine();
            }
            rd.close();
            urlconn.disconnect();

            String content = sb.toString();
            System.out.println(content); // debug: dump the fetched HTML

            // Apply the regex and print every captured movie title.
            Pattern p = Pattern.compile(regex);
            Matcher ma = p.matcher(content);
            System.out.println("Movies in the 强档专区 (featured) section:");
            while (ma.find()) {
                System.out.println(ma.group(1));
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
[/code]
Just call the database API and insert the data with SQL.
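As a rough sketch of that database step (assuming a hypothetical MySQL table named movie with a title column, and JDBC connection settings you would replace with your own), the parsed titles can be inserted with a PreparedStatement:
[code="java"]
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class MovieDao {

    // Hypothetical connection settings -- replace with your own database.
    private static final String URL =
            "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8";
    private static final String USER = "root";
    private static final String PASSWORD = "root";

    // Inserts each parsed movie title into a table such as:
    //   CREATE TABLE movie (id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(200));
    public static void save(List<String> titles) throws SQLException {
        String sql = "INSERT INTO movie (title) VALUES (?)";
        Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
        try {
            PreparedStatement ps = conn.prepareStatement(sql);
            for (String title : titles) {
                ps.setString(1, title);
                ps.executeUpdate();
            }
            ps.close();
        } finally {
            conn.close();
        }
    }
}
[/code]
In the scraping code above you would collect each ma.group(1) into a List&lt;String&gt; and pass it to MovieDao.save(), with the MySQL JDBC driver on the classpath.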
Write a program that requests the page you want to scrape, take the returned HTML, parse it to find the data you need, and save that data to the database. This is exactly how a crawler works, so read up on the basic principles of crawlers.
Of course, you can also use a crawler framework, or the httpclient.jar library.
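If you go with HttpClient, fetching the page looks roughly like this (a minimal sketch against the Apache HttpClient 4.x API, using the same Youku URL as above; from here the HTML can be fed to the regex or parser of your choice):
[code="java"]
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientFetch {

    public static void main(String[] args) throws Exception {
        CloseableHttpClient client = HttpClients.createDefault();
        try {
            // Issue a GET request for the page to be scraped.
            HttpGet get = new HttpGet("http://movie.youku.com/");
            CloseableHttpResponse response = client.execute(get);
            try {
                // Read the response body as a UTF-8 string; parse it the
                // same way as in the HttpURLConnection example above.
                String html = EntityUtils.toString(response.getEntity(), "utf-8");
                System.out.println(html);
            } finally {
                response.close();
            }
        } finally {
            client.close();
        }
    }
}
[/code]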