Parsing a fixed URL with Java to extract part of the page content

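Say I have fetched a page's HTML from its URL, and I now want to extract the value 6.89 from between the tags of <span class="value" id="sku-discount-price" itemprop="price">6.89</span>. How should I write this in Java, and what is the most sensible approach? I have tried writing some code myself and would welcome feedback. If you have a better solution, please share it.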

<div class="inf-pnl-price-detail">   
   <dl>   
   <dt>Price:</dt>   
   <dd>   
   <div class="price price-highlight">   
   <del class="original-price">US $    
   <span class="" id="sku-price">7.66</span>   
   <span class="separator">/</span>   
   <span class="unit">piece</span>   
   </del>   
   </div>   
   </dd>   
   <dt>Discount Price:</dt>   
   <dd>   
   <div class="price price-highlight">   
   <span class="currency" itemprop="priceCurrency" content="USD">US $</span>   
   <span class="value" id="sku-discount-price" itemprop="price">6.89</span>   
   <span class="separator">/</span><span class="unit"> piece </span>   
   <span class="time-left">(7  days left )</span>   
   </div>   
   </dd>   
   </dl>   
   </div>   

The code I tried writing myself:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TestUrl {

    public static void main(String[] args) {
        long l1 = System.currentTimeMillis();

        String string = "http://www.aliexpress.com/item/10pcs-lot-New-arrival-Hot-sale-fashion-hoomia-jonadab-magicpencil-magic-pencil-earphones-in-earfree-shipping/848760252.html";

        // Read the whole page. Skipping a fixed number of characters and copying
        // lines into a fixed-size array breaks as soon as the page layout changes.
        StringBuilder page = new StringBuilder();
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(new URL(string).openStream(), StandardCharsets.UTF_8))) {
            String str2;
            while ((str2 = input.readLine()) != null) {
                // Lines are joined without newlines so that ".*" below can span them.
                page.append(str2);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        String str3 = page.toString();

        // Drop everything up to and including the opening tag of the price span...
        String tempStr2 = str3.replaceAll(".*itemprop=\"price\">", "");
        // ...then drop everything from the first closing </span> onward.
        String tempStr3 = tempStr2.replaceAll("</span>.*", "");
        System.out.println("tempStr3:" + tempStr3);

        long l2 = System.currentTimeMillis();
        System.out.println("time:" + (l2 - l1));
    }
}

 

Just use jsoup and select the element with its CSS selector syntax.
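As a rough illustration of that suggestion, here is a minimal sketch that selects the element by its id. It assumes the jsoup library is on the classpath (for example the Maven artifact org.jsoup:jsoup); the class name JsoupPriceExample and the helper extractDiscountPrice are made up for this example, and the HTML string is a shortened copy of the fragment from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupPriceExample {

    // Hypothetical helper: returns the text of the element whose id is
    // "sku-discount-price", or null if the page has no such element.
    static String extractDiscountPrice(String html) {
        Document doc = Jsoup.parse(html);
        // "#sku-discount-price" is a CSS id selector; selectFirst returns null on no match.
        Element price = doc.selectFirst("#sku-discount-price");
        return price == null ? null : price.text();
    }

    public static void main(String[] args) {
        // Shortened copy of the fragment from the question.
        String html = "<div class=\"price price-highlight\">"
                + "<span class=\"currency\" itemprop=\"priceCurrency\" content=\"USD\">US $</span>"
                + "<span class=\"value\" id=\"sku-discount-price\" itemprop=\"price\">6.89</span>"
                + "</div>";
        System.out.println(extractDiscountPrice(html)); // prints 6.89
    }
}
```

For a live page, Jsoup.connect(url).get() fetches and parses the document in one step, and the same selector can then be applied to the returned Document. Selecting by id keeps working if attributes are reordered or whitespace changes, which is where string-based approaches tend to break.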

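import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "your page content";
// Match the opening tag that carries id="sku-discount-price" and capture the number right after it.
Pattern p = Pattern.compile("<span[^>]+sku-discount-price[^>]+>([0-9.]+)");
Matcher m = p.matcher(content);
if (m.find()) {
    String str = m.group(1); // the result you want
    System.out.println(str);
}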