Regular expression to match line-separated size strings

I am writing a regular expression to validate an input string, which is a line-separated list of sizes ([width]x[height]).

Valid input example:

300x200
50x80


100x100

The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):

^(\d+x\d+[\r\n|\r|\n]*)+$

This regular expression matches my input but also matches this invalid input (size can't be 100x100x200):

300x200
50x80
100x100x200

Adding a word boundary at the end seems to have fixed this issue:

^(\d+x\d+[\r\n|\r|\n]*\b)+$

My questions:

  1. Why does the initial regular expression, without the word boundary, fail? It looks like I am matching one or more instances of \d+ (a number), followed by the character 'x', followed by \d+ (a number), followed by zero or more newlines from various operating systems.
  2. How can I validate input that has multiple trailing newline characters? The following doesn't work for some inputs, like this one:

    500x500 100x100 384384

    ^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$

Isolate the problem with this target 100x100x200

For now, forget about the anchors in the regex.

The minimum regex is \d+x\d+ since it only has to be satisfied once
for a match to take place.

The maximum is something like this \d+x\d+ (?: (?:\r?\n | \r)* \d+x\d+ )*

Since \r?\n | \r is optional, it can be reduced to this \d+x\d+ (?: \d+x\d+ )*

The result, when applied to the target string, is:

100x100x200 matches.

But, since you've anchored the regex ^$, it is forced to break up
the middle 100 to make it match.

100x10 from \d+x\d+
0x200 from (?: \d+x\d+ )*

So, that is why the first regex seemingly matches 100x100x200.
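
To see that split for yourself, here is a small sketch. It swaps the repeated group for two explicit groups (my own simplification, not the asker's pattern), since JavaScript only keeps the last iteration of a repeated capture group.

```javascript
// Two explicit groups stand in for two iterations of the repeated group.
const split = /^(\d+x\d+)(\d+x\d+)$/.exec("100x100x200");

// The engine backtracks into the first group until the second one can match.
console.log(split[1]); // 100x10
console.log(split[2]); // 0x200

// The asker's original pattern accepts the whole string for the same reason.
const original = /^(\d+x\d+[\r\n|\r|\n]*)+$/;
const originalAccepts = original.test("100x100x200");
console.log(originalAccepts); // true
```

Nothing in the pattern stops one iteration from ending in the middle of a number, which is the gap the word boundary happened to close.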

To avoid all of that, just require a line break between them, and
make the trailing linebreaks optional (if you need to validate the whole
string, otherwise leave it and the end anchor off).

^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$

A better view of it

 ^ 
 \d+ x \d+ 
 (?:
      (?: \r? \n | \r )+
      \d+ x \d+ 
 )*
 (?: \r? \n | \r )*
 $
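
As a rough check of that pattern in JavaScript, something like the following should do. The helper name isValidSizeList is made up for this sketch, and the sample inputs are taken from the question with the line breaks written as \n.

```javascript
// One size, then (linebreaks + size) repeated, then optional trailing linebreaks.
const SIZE_LIST = /^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$/;

// Hypothetical helper: true only if the whole string is a valid size list.
function isValidSizeList(text) {
  return SIZE_LIST.test(text);
}

console.log(isValidSizeList("300x200\n50x80\n\n\n100x100\n")); // true
console.log(isValidSizeList("300x200\r\n50x80\r\n"));          // true
console.log(isValidSizeList("300x200\n50x80\n100x100x200"));   // false
console.log(isValidSizeList("500x500\n100x100\n384384"));      // false
```

Note that this version rejects an empty string and input that starts with a blank line, because the first size must appear right after ^.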

Try this regex out

^[0-9]{1,4}x[0-9]{1,4}|[(\r\n|\r|\n)]+$

It'll match these inputs:

1x1, 10x10, 100x100, 2000x2938. For 100x100x200 it matches only the leading 100x100, not the whole string.

Your initial regular expression "fails" because of the +:

^(\d+x\d+[\r\n|\r|\n]*)+$
-----------------------^ here

Your parenthesized pattern (\d+x\d+[\r\n|\r|\n]*) says: match one or more digits, followed by an "x", followed by one or more digits, followed by zero or more newlines. The + after it says to match the entire parenthesized pattern one or more times. For an input like 100x200x300, the engine can therefore backtrack and split the middle number, matching 100x20 and then 0x300, so it looks like it matches the entire line.

If you're simply trying to extract dimensions from a newline-separated string, I would use the following regular expression with a multiline flag:

^(\d+x\d+)$

https://regex101.com/r/H9JDjA/2
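
A minimal sketch of that extraction in JavaScript, assuming the input uses \n line breaks (the sample string is the invalid example from the question):

```javascript
const input = "300x200\n50x80\n100x100x200";

// With the m flag, ^ and $ match at line boundaries; g collects every match.
const sizes = input.match(/^(\d+x\d+)$/gm);

console.log(sizes); // [ '300x200', '50x80' ]
```

Keep in mind this extracts rather than validates: the 100x100x200 line is silently skipped instead of being reported as an error.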

Side note: In your expression, [\r\n|\r|\n] actually says match any one instance of \r, \n, |, \r, |, or \n (i.e. it's quite redundant, and you probably don't mean to match |). If you want to match a sequence of any combination of \r or \n, you can simply use [\r\n]+.
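
A quick way to confirm the stray | is part of the class (the variable names sloppy and tidy are just for this sketch):

```javascript
const sloppy = /^[\r\n|\r|\n]+$/; // class contains \r, \n and a literal |
const tidy = /^[\r\n]+$/;         // class contains only \r and \n

console.log(sloppy.test("|"));    // true
console.log(tidy.test("|"));      // false
console.log(tidy.test("\r\n\n")); // true
```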

You can use the multiline modifier, which should make life easier:

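var input = "\n\
300x200x400\n\
50x80\n\
\n\
\n\
300x200\n\
50x80\n\
100x100x200x100\n";

// The m flag makes ^ match at the start of every line; g returns all matches.
var allSizes = input.match(/^\d+x\d+/gm);
// Iterate by index rather than for...in, which is meant for object keys.
for (var i = 0; i < allSizes.length; i++)
    console.log(allSizes[i]);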

Prints:

300x200
50x80
300x200
50x80
100x100