How to extract different fields from a text file using regular expressions in PHP

207.46.13.93 - - [31/Mar/2012:19:43:19 +0530]  GET /robots.txt HTTP/1.1 404 613 -   Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291

A dedicated log parser is the more robust choice, but you can extract the fields with this regex:

(?<ip>\d+\.\d+\.\d+\.\d+)[\s-]+(?<date>\[[^\]]*\])[\s"]+(?<method>\w+)\s+(?<url>[^\s]+)\s+(?<protocol>[^\s"]+)[\s"]+(?<status>\d+)\s+(?<length>\d+)([ -]+(?<useragent>.*))?

Explanation

(?<ip>\d+\.\d+\.\d+\.\d+) Four dot-separated runs of digits (an IPv4 address) => capture ip

[\s-]+ One or more whitespace characters or hyphens

(?<date>\[[^\]]*\]) A [ followed by any characters except ], then the closing ] => capture date (brackets included)

[\s"]+ One or more whitespace characters or double quotes

(?<method>\w+) One or more word characters => capture method

\s+ One or more whitespace characters

(?<url>[^\s]+) One or more non-whitespace characters => capture url

\s+ One or more whitespace characters

(?<protocol>[^\s"]+) One or more characters that are neither whitespace nor a double quote, so the closing " of a quoted request stays out of the capture => capture protocol

[\s"]+ One or more whitespace characters or double quotes

(?<status>\d+) One or more digits => capture status

\s+ One or more whitespace characters

(?<length>\d+) One or more digits => capture length

([ -]+(?<useragent>.*))? Optional group: one or more spaces or hyphens, then the rest of the line => capture useragent
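
All captures come back as strings, and the date capture keeps its square brackets. Here is a small sketch of turning them into typed values, assuming that is what you want downstream. `normalizeFields` is a hypothetical helper name, and the array passed to it only mimics what `preg_match` would return for the last sample line.

```php
<?php
// Hypothetical helper: converts the raw string captures into typed values.
function normalizeFields(array $m): array
{
    // Strip the surrounding brackets, then parse the Apache-style timestamp.
    // The leading "!" resets any field not present in the format.
    $date = DateTimeImmutable::createFromFormat(
        '!d/M/Y:H:i:s O',
        trim($m['date'], '[]')
    );

    return [
        'ip'        => $m['ip'],
        'date'      => $date ?: null,          // null if the timestamp did not parse
        'method'    => $m['method'],
        'url'       => $m['url'],
        'protocol'  => $m['protocol'],
        'status'    => (int) $m['status'],
        'length'    => (int) $m['length'],
        'useragent' => $m['useragent'] ?? '',  // the group is optional, so the key may be missing
    ];
}

// Assumed input: named captures for the last sample line.
$row = normalizeFields([
    'ip'       => '64.242.88.10',
    'date'     => '[07/Mar/2004:16:10:02 -0800]',
    'method'   => 'GET',
    'url'      => '/mailman/listinfo/hsdivision',
    'protocol' => 'HTTP/1.1',
    'status'   => '200',
    'length'   => '6291',
]);

echo $row['date']->format(DATE_ATOM), "\n"; // prints 2004-03-07T16:10:02-08:00
```

Keeping the conversion separate from the regex means a line with an unparseable timestamp still yields its other fields.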

Demo
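
A minimal PHP sketch of the regex applied to a log file, one line at a time. `parseAccessLog` and `LOG_PATTERN` are names made up for this example, and the temporary file only stands in for your real log. The pattern in the sketch excludes `"` from the protocol class and `]` from the date class so that neither capture runs past its delimiter.

```php
<?php
// '~' is used as the delimiter so the '/' characters in log lines need no escaping.
const LOG_PATTERN = '~(?<ip>\d+\.\d+\.\d+\.\d+)[\s-]+(?<date>\[[^\]]*\])[\s"]+'
    . '(?<method>\w+)\s+(?<url>[^\s]+)\s+(?<protocol>[^\s"]+)[\s"]+'
    . '(?<status>\d+)\s+(?<length>\d+)([ -]+(?<useragent>.*))?~';

// Hypothetical helper: yields one array of named captures per matching line.
function parseAccessLog(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            if (preg_match(LOG_PATTERN, rtrim($line, "\r\n"), $m) === 1) {
                // preg_match returns numeric and named keys; keep the named ones.
                yield array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
            }
            // Lines that do not match are skipped silently in this sketch.
        }
    } finally {
        fclose($handle);
    }
}

// Demo input: two of the sample lines written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($path, implode("\n", [
    '207.46.13.93 - - [31/Mar/2012:19:43:19 +0530]  GET /robots.txt HTTP/1.1 404 613 -   Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)',
    '64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
]));

$rows = iterator_to_array(parseAccessLog($path), false);
unlink($path);

foreach ($rows as $row) {
    // The useragent key may be missing when the optional group did not match.
    printf("%s %s %s %s\n", $row['ip'], $row['status'], $row['url'], $row['useragent'] ?? '(none)');
}
```

Reading with `fgets` keeps memory use flat on large logs; for small files, `file($path, FILE_IGNORE_NEW_LINES)` with a plain `foreach` would do the same job.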