Need help writing a regular expression [closed]

These are some samples of the data I am working with (I added some comments on the side):

TSG MUM 

BS06-312
RQWE. FKB 

BS06-204
NM. JAK 

BS06-E05
DB. FKB 

BS06-312
IGT. resetk 

Wender.   //--> special CASE 
ENG I. 

WEHN BS06-E06 

ENG II 

FLEM BS06-203 //--> special CASE: 2 Subjects
ITSI. MUM 

BS06-E02
PQT. RIE 

BS11-QCR PQT 

MARK BS11-QCR 

PQT FIS 

BS11-QCR //--> special CASE: several Subjects
INC FEY 

BS06-309
FU MAT 

SKU BS06-309
ABS. DOE 

BS06 ABS 

VOG BS06 

ABS HEI 

BS06 ABS 

MOR BS06 

ABS REM 

BS06 ABS 

DEI BS06 

ABS THA 

BS06
ENG III. 

GLIT BS06-209 

ENG II 

WANN BS06-208

These are subjects in a class schedule. The first letters represent the subject being taught. After that come the teacher's initials, separated by a space. The last position is the building and room number.

Sometimes several subjects are taught at the same time.

The data comes from an ics calendar file and I simply copied it here. The new line characters also need to be considered.

I need to extract the subject name, the teacher's initials and the room number so I can work with them. Any ideas on how to proceed? A complete regex pattern would be ideal.

I am working with PHP.

Thank you for your help.

I will exclude this line :

FU MAT 

SKU BS06-309

What have we got here ?

FU : subject
MAT : teacher
SKU : ???
BS06-309 : room

Solution :

Anyway, for the rest of the block you can use this regex:

(?:\s|\\n)*(?<subject>\S+(?:\s[IVX]+\.?)?)(?:\s|\\n)+(?<teacher>\S+)(?:\s|\\n)+(?<room>\S+)(?:\s|\\n)*

Details :

(?:\s|\\n)*                      # whitespace or an escaped \n - not captured
(?<subject>\S+(?:\s[IVX]+\.?)?)  # non-spaces plus I., II., III, IV... -> subject
(?:\s|\\n)+                      # whitespace or an escaped \n - not captured
(?<teacher>\S+)                  # non-spaces -> teacher
(?:\s|\\n)+                      # whitespace or an escaped \n - not captured
(?<room>\S+)                     # non-spaces -> room
(?:\s|\\n)*                      # whitespace or an escaped \n - not captured

Result :

+-------+----------+---------+----------+
| MATCH | SUBJECT  | TEACHER | ROOM     |
+-------+----------+---------+----------+
| 1     | TSG      | MUM     | BS06-312 |
| 2     | RQWE.    | FKB     | BS06-204 |
| 3     | NM.      | JAK     | BS06-E05 |
| 4     | DB.      | FKB     | BS06-312 |
| 5     | IGT.     | resetk  | Wender.  |
| 6     | ENG I.   | WEHN    | BS06-E06 |
| 7     | ENG II   | FLEM    | BS06-203 |
| 8     | ITSI.    | MUM     | BS06-E02 |
| 9     | PQT.     | RIE     | BS11-QCR |
| 10    | PQT      | MARK    | BS11-QCR |
| 11    | PQT      | FIS     | BS11-QCR |
| 12    | INC      | FEY     | BS06-309 |
| 13    | ABS.     | DOE     | BS06     |
| 14    | ABS      | VOG     | BS06     |
| 15    | ABS      | HEI     | BS06     |
| 16    | ABS      | MOR     | BS06     |
| 17    | ABS      | REM     | BS06     |
| 18    | ABS      | DEI     | BS06     |
| 19    | ABS      | THA     | BS06     |
| 20    | ENG III. | GLIT    | BS06-209 |
| 21    | ENG II   | WANN    | BS06-208 |
+-------+----------+---------+----------+
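Since you are working with PHP, here is a minimal sketch of how the pattern could be applied with preg_match_all. It assumes the calendar text has already been read into a string with real line breaks; the variable names ($data, $sep, $entries) are only illustrative, and the sample is a short excerpt of your data.

```php
<?php
// Excerpt of the sample data; real newlines stand in for the ics line breaks.
$data = "TSG MUM \n\nBS06-312\nRQWE. FKB \n\nBS06-204\nENG I. \n\nWEHN BS06-E06 \n\nENG II \n\nFLEM BS06-203";

// Separator: whitespace or a literal backslash-n.
// In a single-quoted PHP string, '\\\\n' becomes \\n in the regex.
$sep = '(?:\s|\\\\n)';
$pattern = '/' . $sep . '*(?<subject>\S+(?:\s[IVX]+\.?)?)'
         . $sep . '+(?<teacher>\S+)'
         . $sep . '+(?<room>\S+)'
         . $sep . '*/';

$entries = [];
// PREG_SET_ORDER yields one array per match, with the named groups as keys.
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $entries[] = [
            'subject' => $m['subject'],
            'teacher' => $m['teacher'],
            'room'    => $m['room'],
        ];
    }
}

foreach ($entries as $e) {
    echo $e['subject'], ' | ', $e['teacher'], ' | ', $e['room'], PHP_EOL;
}
```

With PREG_SET_ORDER each match arrives as its own array, so the three named groups can be read together per schedule entry.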

Try it :

Demo

Improve it !

There are sometimes Roman numerals: ENG I., ENG II...
I assume you'll only have numbers from 1 to 39, which is why I only use [IVX]. You can improve this part by adding L, C, M... or by using a real regex for Roman numerals.
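As a rough sketch of the "real regex" option, the commonly used strict pattern for Roman numerals from 1 to 3999 could replace [IVX]+ in the subject group. The constant ROMAN and the helper isRomanNumeral are hypothetical names introduced here for illustration only.

```php
<?php
// Hypothetical helper: strict Roman numeral fragment (1 to 3999).
// The lookahead rejects the empty string, which the rest would otherwise accept.
const ROMAN = '(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})';

function isRomanNumeral(string $token): bool
{
    return preg_match('/^' . ROMAN . '$/', $token) === 1;
}

// Same subject group as above, with ROMAN in place of [IVX]+.
$subjectPattern = '/^(?<subject>\S+(?:\s' . ROMAN . '\.?)?)/';
preg_match($subjectPattern, 'ENG XLII. WEHN BS06-E06', $m);
echo $m['subject'], PHP_EOL;
```

Keep in mind that the more letters you allow, the more likely it is that a teacher's initials (DC or MC, for example) also form a valid numeral and may be read as part of the subject, so only widen the set as far as your data needs.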