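The overall situation: I have to travel back and forth between two cities and air fares keep changing, so I want to use .NET's HttpClient to fetch pages from Ctrip (携程网) on a schedule, once every few hours, analyze them in the background, and get notified when a fare comes close to the lowest price I expect.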
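The scheduling half of this plan (check every few hours, compare with an expected lowest fare, alert when close) can be sketched independently of how the fare is actually obtained. A minimal sketch, assuming "close" means within a given percentage of the target and using hypothetical `fetchPrice`/`notify` hooks that are not part of the original code:

```vb
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Public Module FareMonitor
    ' True when the observed fare is at most `tolerance` (0.05 = 5%) above the target lowest fare.
    Public Function IsNearTarget(price As Decimal, target As Decimal, tolerance As Decimal) As Boolean
        Return price <= target * (1D + tolerance)
    End Function

    ' Checks the fare once per `interval` until the token is cancelled.
    ' fetchPrice and notify are hypothetical hooks: the first would query the page or API,
    ' the second would deliver the alert (message box, e-mail, ...).
    Public Async Function RunAsync(fetchPrice As Func(Of Task(Of Decimal)),
                                   notify As Action(Of Decimal),
                                   target As Decimal,
                                   tolerance As Decimal,
                                   interval As TimeSpan,
                                   token As CancellationToken) As Task
        While Not token.IsCancellationRequested
            Dim price As Decimal = Await fetchPrice.Invoke()
            If IsNearTarget(price, target, tolerance) Then notify.Invoke(price)
            ' Cancelling the token ends the wait early; Task.Delay then throws TaskCanceledException.
            Await Task.Delay(interval, token)
        End While
    End Function
End Module
```

For example, `Await FareMonitor.RunAsync(AddressOf FetchLowestFareAsync, Sub(p) MessageBox.Show(p.ToString()), 800D, 0.05D, TimeSpan.FromHours(3), cts.Token)`, where `FetchLowestFareAsync` is a hypothetical function returning the cheapest fare found. An awaited `Task.Delay` loop keeps a form responsive without a timer; a console program run every few hours by Windows Task Scheduler would also work.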
I use VB.NET, but C# is more widespread, so answers in C# are fine too.
Here is the code. First I wrote a class:
```vb
Imports System
Imports System.IO
Imports System.Net.Http
Imports System.Threading.Tasks

' Delegate that hands the downloaded text back to the form.
' (Its declaration was not in the post; the signature is inferred from the Invoke call below.)
Public Delegate Sub CallBackSub(ByVal text As String)

Public Class MySpider
    Dim callback As CallBackSub

    Public Sub CallBackTo(ByVal callclass As CallBackSub)
        callback = callclass
    End Sub

    ' Downloads the page and passes the body to the supplied delegate.
    Public Shared Async Sub GetPage(ByVal url As String, ByVal callclass As CallBackSub)
        Dim str As String = Await GetStringFromUrl(url)
        Dim callback0 As CallBackSub = callclass
        callback0.Invoke(str)
    End Sub

    ' Microsoft's sample: returns the response body, or Nothing if the request fails.
    Private Shared Async Function GetStringFromUrl(ByVal Url As String) As Task(Of String)
        Using client As HttpClient = New HttpClient()
            Try
                Dim response As HttpResponseMessage = Await client.GetAsync(Url)
                response.EnsureSuccessStatusCode()
                Dim responseBody As String = Await response.Content.ReadAsStringAsync()
                Return responseBody
            Catch e As HttpRequestException
                Console.WriteLine(vbLf & "Exception Caught!")
                Console.WriteLine("Message :{0} ", e.Message)
                Return Nothing
            End Try
        End Using
    End Function
End Class
```
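GetStringFromUrl in this class is Microsoft's sample code; I then use a delegate to pass the retrieved string to the Form. (My .NET is barely past beginner level and I am only now starting on Task and threading, so I am using a delegate for the time being. If there is a proper way to do this, please let me know.)
I won't post the form code: it is just a TextBox that receives the data passed through the delegate and a Button that calls GetPage.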
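On the delegate question: with Async/Await the download method can simply return a `Task(Of String)` and the form awaits it, so no callback is needed. A minimal sketch of a reworked `MySpider`, assuming one shared `HttpClient` (Microsoft's guidance is to reuse an instance rather than create one per request); the name `GetPageAsync` is chosen here for illustration:

```vb
Imports System
Imports System.Net.Http
Imports System.Threading.Tasks

Public Class MySpider
    ' One HttpClient reused for every request instead of a new one per call.
    Private Shared ReadOnly Client As New HttpClient()

    ' Returns the page body, or Nothing if the request failed.
    ' The caller awaits the result instead of receiving it through a delegate.
    Public Shared Async Function GetPageAsync(url As String) As Task(Of String)
        Try
            Using response As HttpResponseMessage = Await Client.GetAsync(url)
                response.EnsureSuccessStatusCode()
                Return Await response.Content.ReadAsStringAsync()
            End Using
        Catch e As HttpRequestException
            Console.WriteLine("Request failed: " & e.Message)
            Return Nothing
        End Try
    End Function
End Class
```

In the form, the button handler would then be `Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click` containing `TextBox1.Text = Await MySpider.GetPageAsync(url)`. `Async Sub` is appropriate only for event handlers like this one, and because `Await` resumes on the UI thread in a WinForms app, the TextBox can be set directly.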
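The problem is that when I read the data out of the response, the fares are nowhere to be found.
So I saved the page from the browser and found that it references many JS and CSS files, which made me suspect that some of the data is fetched dynamically by JavaScript.
Finally I tried the Windows Forms WebBrowser control: after setting its URL, this single line gave me the fare data I wanted: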
```vb
TextBox1.Text = WebBrowser1.Document.All(1).InnerText
```
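Although the problem is solved for now, I still feel that HttpClient, being the class Microsoft recommends, ought to be able to handle data that is loaded dynamically like this. So I would like to ask experienced .NET developers: can the HttpClient class handle this?
Code would be hugely appreciated, but a link to relevant material would be just as welcome!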
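Use F12 in Chrome/IE to capture the network traffic. HttpClient only fetches the raw HTML of the page and does not execute its JavaScript, so data the page loads afterwards never appears in its response. Dynamically loaded content like this is most likely fetched via AJAX: the request will show up in the capture, and you can then reproduce it with the HttpWebRequest class.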
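The approach in this answer, sketched with `HttpWebRequest` as the answer suggests (newer .NET versions mark this class obsolete in favour of `HttpClient`). It assumes the captured request is a POST whose Request Payload is JSON; the URL, referer and body would be copied from the F12 Network tab, and the module and function names are illustrative only:

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Threading.Tasks

Public Module CapturedRequest
    ' Builds (but does not send) a POST that mirrors the request seen in the F12 Network tab.
    Public Function CreateJsonPost(url As String, referer As String) As HttpWebRequest
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/json"
        request.Accept = "*/*"
        request.Referer = referer
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36"
        ' Sends Accept-Encoding for gzip/deflate and decompresses the reply automatically.
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Return request
    End Function

    ' Writes the JSON body (copied from "Request Payload") and returns the response text.
    Public Async Function SendAsync(request As HttpWebRequest, json As String) As Task(Of String)
        Dim body As Byte() = Encoding.UTF8.GetBytes(json)
        request.ContentLength = body.Length
        Using stream As Stream = Await request.GetRequestStreamAsync()
            Await stream.WriteAsync(body, 0, body.Length)
        End Using
        Using response As WebResponse = Await request.GetResponseAsync()
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                Return Await reader.ReadToEndAsync()
            End Using
        End Using
    End Function
End Module
```

Building the request and sending it are kept separate so the headers can be checked against the browser's capture before anything goes over the network.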
In my PostAsync call, the HttpContent contains:
```vb
paramList.Add(New KeyValuePair(Of String, String)("airportParams", "[{dcity: ""sha"", acity: ""ckg"", dcityname: ""上海"", acityname: ""重庆"", date: ""2019-02-12""]}"))
paramList.Add(New KeyValuePair(Of String, String)("army", "False"))
paramList.Add(New KeyValuePair(Of String, String)("classType", "ALL"))
paramList.Add(New KeyValuePair(Of String, String)("flightWay", "Oneway"))
paramList.Add(New KeyValuePair(Of String, String)("hasBaby", "False"))
paramList.Add(New KeyValuePair(Of String, String)("hasChild", "False"))
paramList.Add(New KeyValuePair(Of String, String)("params", "[{dcity: ""SHA"", acity: ""NKG"", dcityname: ""上海"", acityname: ""南京"", date: ""2019-02-12"", dcityid: 2,dcityname: ""上海""]}"))
paramList.Add(New KeyValuePair(Of String, String)("searchIndex", "1"))
```
and the request headers I set are:
```vb
client.DefaultRequestHeaders.Add("Accept", "*/*")
client.DefaultRequestHeaders.Add("AcceptEncoding", "gzip, deflate, br")
client.DefaultRequestHeaders.Add("AcceptLanguage", "zh-CN,zh;q=0.9")
client.DefaultRequestHeaders.Add("ContentLength", "340")
client.DefaultRequestHeaders.Add("ContentType", "application/json")
client.DefaultRequestHeaders.Add("UserAgent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36")
client.DefaultRequestHeaders.Add("Authority", "flights.ctrip.com")
client.DefaultRequestHeaders.Add("Method", "POST")
client.DefaultRequestHeaders.Add("Path", "/itinerary/api/12808/products")
client.DefaultRequestHeaders.Add("Scheme", "https")
client.DefaultRequestHeaders.Add("Origin", "https://flights.ctrip.com")
client.DefaultRequestHeaders.Add("Referer", "https://flights.ctrip.com/itinerary/oneway/sha-ckg?date=2019-02-12")
```
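When I run it, the response I get is {"status":0,"data":{"error":{"code":"1004","msg":"查询异常,请稍后再试"},"loginState":0}} (the msg means "Query error, please try again later"). Where did I go wrong?
I built the content from the Request Payload and the headers from the Request Headers shown in the browser.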
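Several things in the code above differ from what the browser sends, and they may explain the error, although the server could also be rejecting the request for reasons not visible here (missing cookies, for example). The key/value pairs were presumably sent as `FormUrlEncodedContent`, i.e. as form fields, whereas the browser posts a single JSON document; the JSON fragments also have unquoted keys and mismatched brackets (`[{ ... ]}` instead of `[{ ... }]`). Header names without hyphens (`ContentType`, `AcceptEncoding`, `UserAgent`) are sent as unknown custom headers, `Content-Type`/`Content-Length` belong on the content rather than `DefaultRequestHeaders`, and the `:authority`, `:method`, `:path`, `:scheme` entries Chrome displays are HTTP/2 pseudo-headers, not headers to copy. A sketch under these assumptions, using System.Text.Json (built into .NET Core 3.0 and later, a NuGet package on .NET Framework); the field set is assembled from the question's own fields and should be compared against the captured payload, and `CtripSearch`, `BuildPayload` and `BuildSearchRequest` are illustrative names:

```vb
Imports System
Imports System.Net
Imports System.Net.Http
Imports System.Text
Imports System.Text.Json
Imports System.Threading.Tasks

Public Module CtripSearch
    ' One client for the app; the handler decompresses gzip/deflate itself,
    ' so no Accept-Encoding header is added by hand.
    Private ReadOnly Client As New HttpClient(New HttpClientHandler With {.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate})

    ' Serializes the payload so brackets, quotes and true/false are valid JSON.
    ' Non-ASCII city names come out as \uXXXX escapes, which is still valid JSON.
    Public Function BuildPayload(fromCode As String, toCode As String, fromName As String,
                                 toName As String, flightDate As String) As String
        Return JsonSerializer.Serialize(New With {
            .flightWay = "Oneway",
            .classType = "ALL",
            .hasChild = False,
            .hasBaby = False,
            .searchIndex = 1,
            .army = False,
            .airportParams = {New With {.dcity = fromCode, .acity = toCode, .dcityname = fromName,
                                        .acityname = toName, .[date] = flightDate}}
        })
    End Function

    ' StringContent sets Content-Type; Content-Length is computed when the request is sent.
    ' Host, method and path come from the request itself, so no pseudo-headers are added.
    Public Function BuildSearchRequest(json As String, referer As String) As HttpRequestMessage
        Dim request As New HttpRequestMessage(HttpMethod.Post, "https://flights.ctrip.com/itinerary/api/12808/products")
        request.Content = New StringContent(json, Encoding.UTF8, "application/json")
        request.Headers.Referrer = New Uri(referer)
        request.Headers.TryAddWithoutValidation("Origin", "https://flights.ctrip.com")
        request.Headers.TryAddWithoutValidation("Accept", "*/*")
        request.Headers.TryAddWithoutValidation("Accept-Language", "zh-CN,zh;q=0.9")
        request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36")
        Return request
    End Function

    ' Sends the search and returns the raw JSON reply.
    Public Async Function SearchAsync(json As String, referer As String) As Task(Of String)
        Using request As HttpRequestMessage = BuildSearchRequest(json, referer)
            Using response As HttpResponseMessage = Await Client.SendAsync(request)
                response.EnsureSuccessStatusCode()
                Return Await response.Content.ReadAsStringAsync()
            End Using
        End Using
    End Function
End Module
```

Usage: `Dim json As String = CtripSearch.BuildPayload("SHA", "CKG", "上海", "重庆", "2019-02-12")`, then `Dim result As String = Await CtripSearch.SearchAsync(json, "https://flights.ctrip.com/itinerary/oneway/sha-ckg?date=2019-02-12")`. If the 1004 error persists, putting the browser's request and this one side by side in a proxy such as Fiddler would show what still differs.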