Regex group selection

I'm stuck with log parsing. I have these rows in a log file. Each row ends with a line break.

[2018.07.10 00:30:03:125] VersionInfo\886
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingTime\16
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingData\397
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->ThreadID\8
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->RequestExecuteStart\16
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->RequestInfo\25
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->CheckUserInfo\139
[2018.07.10 00:30:03:218]->Start RTS
[2018.07.10 00:30:03:640][TraceID: 8HRWSI105YVO91]->StartExecuteTask\35
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->EndExecuteTask\36
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->RequestExecuteEnd\16
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->OutgoingData\26651

I want to parse each row into groups: time, traceid (if it exists), and block name. To select the datetime (which is always there) I use \[(.*?)\]. That's the first group. Next comes the traceid, if it exists. The separator is (?:\[|->| ) - that is, [ or -> or a space. The group itself is selected the same way as the first: \[(.*?)\]. Then comes the third group with the block name, ([a-zA-Z ]+) - any text at the end without numbers.

I'm completely confused about how to connect it all. What I want to get is:

  • group 1 - datetime
  • group 2 - traceid, or nothing if absent
  • group 3 - block name

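This should do the trick: ^\[(.*?)\](?:\[(.*?)\])?->([a-zA-Z ]+). The non-capturing group (?:\[(.*?)\])? makes the second bracketed block optional, so group 2 is simply empty when there is no trace ID. Make sure you're using the multi-line flag so that ^ matches at the start of every line. Here's a Python demo: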

>>> import re
>>> for x in re.finditer(r'^\[(.*?)\](?:\[(.*?)\])?->([a-zA-Z ]+)', file, re.M):
...     print(x.group(1), x.group(2), x.group(3))
...
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 IncomingTime
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 IncomingData
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 ThreadID
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 RequestExecuteStart
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 RequestInfo
2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 CheckUserInfo
2018.07.10 00:30:03:218 None Start RTS
2018.07.10 00:30:03:640 TraceID: 8HRWSI105YVO91 StartExecuteTask
2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 EndExecuteTask
2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 RequestExecuteEnd
2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 OutgoingData

To capture only the trace ID itself, without the "TraceID: " prefix, use ^\[(.*?)\](?:\[TraceID: (.*?)\])?->([a-zA-Z ]+).
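Note that both patterns above require -> before the block name, so the first sample row (VersionInfo, which has a space there instead) is skipped. Below is a minimal sketch of one way to include it. It assumes the separator is always either -> or a single space, which is only inferred from the sample rows. The names LOG, LINE_RE and parse_log are made up for illustration.

```python
import re

# A few sample rows copied from the log in the question.
LOG = "\n".join([
    r"[2018.07.10 00:30:03:125] VersionInfo\886",
    r"[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingTime\16",
    r"[2018.07.10 00:30:03:218]->Start RTS",
])

# Assumption: the block name is preceded by either "->" or a single space.
LINE_RE = re.compile(
    r"^\[(?P<time>.*?)\]"                # timestamp, always present
    r"(?:\[TraceID: (?P<trace>.*?)\])?"  # optional trace ID, prefix not captured
    r"(?:->| )"                          # separator before the block name
    r"(?P<block>[a-zA-Z ]+)",            # block name: letters and spaces only
    re.MULTILINE,                        # let ^ match at the start of each line
)

def parse_log(text):
    """Return a (time, trace_id or None, block_name) tuple per matching row."""
    return [
        (m.group("time"), m.group("trace"), m.group("block"))
        for m in LINE_RE.finditer(text)
    ]

rows = parse_log(LOG)
for row in rows:
    print(row)
```

Named groups are optional here; numbered groups 1 to 3 work the same way. Rows without a trace ID come back with None in the second position, as in the demo above.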