207.46.13.93 - - [31/Mar/2012:19:43:19 +0530] GET /robots.txt HTTP/1.1 404 613 - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
It's far better to use a dedicated log parser, but you could use this regex to extract the fields of each log line:
(?<ip>\d+\.\d+\.\d+\.\d+)[\s-]+(?<date>\[.*\])[\s"]+(?<method>\w+)\s+(?<url>[^\s]+)\s+(?<protocol>[^\s]+)[\s"]+(?<status>\d+)\s+(?<length>\d+)([ -]+(?<useragent>.*))?
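As a minimal sketch of how the pattern might be applied, here it is in Python. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the pattern is transliterated accordingly. The names `LOG_PATTERN` and `parse_line` are illustrative and not part of the pattern itself.

```python
import re

# The pattern above, with named groups rewritten in Python's (?P<name>...) syntax.
LOG_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)[\s-]+(?P<date>\[.*\])[\s"]+'
    r'(?P<method>\w+)\s+(?P<url>[^\s]+)\s+(?P<protocol>[^\s]+)[\s"]+'
    r'(?P<status>\d+)\s+(?P<length>\d+)([ -]+(?P<useragent>.*))?'
)


def parse_line(line):
    """Return the named groups of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # On quoted request lines, [^\s]+ also swallows the closing double quote,
    # so it is stripped here.
    fields["protocol"] = fields["protocol"].rstrip('"')
    return fields


sample = (
    '64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] '
    '"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291'
)
fields = parse_line(sample)
print(fields["ip"], fields["method"], fields["url"], fields["status"])
# prints: 64.242.88.10 GET /mailman/listinfo/hsdivision 200
```

All captured values are strings, and `useragent` is `None` when the line has nothing after the length.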
Explanation
(?<ip>\d+\.\d+\.\d+\.\d+)
An IP => ip group
[\s-]+
Either a whitespace or a hyphen, at least one time
(?<date>\[.*\])
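A [ followed by any characters and a closing ] => capture date, brackets included. Note that .* is greedy and can run past the first ] if another one appears later on the line; [^\]]* would stop at the first ].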
[\s"]+
Either a whitespace or a double-quote, at least one time
(?<method>\w+)
some word characters => capture method
\s+
at least one whitespace
(?<url>[^\s]+)
Any character but a whitespace, at least one time => capture url
\s+
at least one whitespace
(?<protocol>[^\s]+)
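Any character but a whitespace, at least one time => capture protocol. On quoted request lines this also picks up the closing double quote (HTTP/1.1"); use [^\s"]+ instead to leave it out.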
[\s"]+
Either a whitespace or a double-quote, at least one time
(?<status>\d+)
some digit characters => capture status
\s+
at least one whitespace
(?<length>\d+)
some digit characters => capture length
([ -]+(?<useragent>.*))?
Optional spaces or hyphens followed by user-agent => capture useragent
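The breakdown above maps naturally onto a commented pattern. The sketch below assumes Python (`(?P<name>...)` groups and `re.VERBOSE`) and makes two changes that are not in the original pattern: the date uses `[^\]]*` so it stops at the first `]`, and the protocol uses `[^\s"]+` so it excludes the closing quote. The name `VERBOSE_LOG_PATTERN` is illustrative.

```python
import re

# Verbose-mode variant: whitespace outside character classes is ignored,
# and everything after an unescaped # is a comment.
VERBOSE_LOG_PATTERN = re.compile(r'''
    (?P<ip>\d+\.\d+\.\d+\.\d+)     # dotted-quad client address
    [\s-]+                         # separators: whitespace or hyphens
    (?P<date>\[[^\]]*\])           # bracketed timestamp (tightened, see lead-in)
    [\s"]+                         # whitespace and an optional opening quote
    (?P<method>\w+)                # request method, e.g. GET
    \s+
    (?P<url>[^\s]+)                # request target
    \s+
    (?P<protocol>[^\s"]+)          # protocol without the closing quote (tightened)
    [\s"]+
    (?P<status>\d+)                # response status code
    \s+
    (?P<length>\d+)                # response length
    ([ -]+(?P<useragent>.*))?      # optional trailing user agent
''', re.VERBOSE)

line = (
    '207.46.13.93 - - [31/Mar/2012:19:43:19 +0530] GET /robots.txt HTTP/1.1 404 613 '
    '- Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
)
match = VERBOSE_LOG_PATTERN.match(line)
entry = match.groupdict()
# Captures are strings; convert the numeric fields explicitly.
entry["status"] = int(entry["status"])
entry["length"] = int(entry["length"])
print(entry["status"], entry["length"], entry["useragent"])
```

Keeping one comment per fragment makes it easier to adjust a single field, for example if the log format gains a referrer column.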