I need to check if a string contains a table, which is like this:
+--------+------+-------+
| <info>number</info> | <info>char</info> | <info>word</info> |
+--------+------+-------+
| 1 | a | alfa |
| 2 | b | beta |
| 3 | c | gamma |
+--------+------+-------+
I do not know the number of columns, nor the number of rows, but this is the structure of the table.
This regex works on Unix, but not on Windows:
[\+\-]+[\r\n](\|(\s+<info>[^<]+<\/info>\s+\|)+)[\r\n](\+|\-)+[\r\n]((\|(\s+[^\|]+\s+\|)+[\r\n])+)(\+|\-)+
This is a test:
https://regex101.com/r/TSxSd7/1
And this is a part of code:
$regexRowDivider = '[\+\-]+';
$regexHeader = '(\|(\s+<info>[^<]+<\/info>\s+\|)+)';
$regexRow = '\|(\s+[^\|]+\s+\|)+';
$regexRows = '((' . $regexRow . '[\r\n])+)';
$regexTable = $regexRowDivider . '[\r\n]' . $regexHeader . '[\r\n]' . $regexRowDivider . '[\r\n]' . $regexRows . $regexRowDivider;
preg_match('/' . $regexTable . '/', $output, $matches);
After hours of testing, I cannot figure out what the problem is. Do you have any idea? preg_last_error() returns 0 (no errors).
This is probably because you're only accepting one of either \r or \n, and Windows uses both (\r\n). You can match both of these newline characters (and more) by using \R.
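As a quick check of that explanation, here is a minimal sketch comparing a one-character newline class with \R. The sample strings are made up for illustration and are not taken from the question's output.

```php
<?php
// Hypothetical sample inputs: the same two lines joined by different line endings.
$samples = [
    'unix'    => "+--+\n|a|",
    'windows' => "+--+\r\n|a|",
    'old mac' => "+--+\r|a|",
];

foreach ($samples as $name => $text) {
    // [\r\n] consumes exactly one character, so it cannot cover the two-character \r\n pair.
    $single = preg_match('/\+[\r\n]\|/', $text);
    // \R treats \n, \r, and the \r\n pair each as one line break.
    $anyEol = preg_match('/\+\R\|/', $text);
    printf("%-8s single=%d \\R=%d\n", $name, $single, $anyEol);
}
```

Only the Windows sample should fail with the one-character class, while \R should match all three.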
I removed a few pieces of complexity, so it doesn't enforce spaces in every cell:
[+-]+\R+\|(\s*<info>[^<]+<\/info>\s*\|)+\R+[+-]+\R+\|([^|]+\|)+\R+[+-]+
\---/   \------------------------------/   \---/   \----------/   \---/
Line             Column Headers            Line      Contents     Line
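For reference, the same pattern can be assembled from named pieces, the way the code in the question does. This is only a sketch: the function name containsTable() and the sample table are assumptions made for the example, not part of the original code.

```php
<?php
// Hypothetical helper; the name containsTable() is not from the original code.
function containsTable(string $output): bool
{
    $divider = '[+-]+';                               // +----+----+ lines
    $header  = '\|(\s*<info>[^<]+<\/info>\s*\|)+';    // | <info>..</info> | ... |
    $rows    = '\|([^|]+\|)+';                        // [^|] also matches line breaks, so this spans every row
    $eol     = '\R+';                                 // \n, \r, or \r\n

    $regexTable = $divider . $eol . $header . $eol . $divider . $eol . $rows . $eol . $divider;

    return preg_match('/' . $regexTable . '/', $output) === 1;
}

// Made-up sample table with Windows line endings.
$table = "+---+---+\r\n| <info>n</info> | <info>c</info> |\r\n+---+---+\r\n| 1 | a |\r\n| 2 | b |\r\n+---+---+";

var_dump(containsTable($table));                            // bool(true)
var_dump(containsTable(str_replace("\r\n", "\n", $table))); // bool(true)
var_dump(containsTable("no table here"));                   // bool(false)
```

Because [^|]+ happily consumes the line break between two rows, a single "contents" group covers any number of rows, which is why no separate per-row repetition is needed.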