Regular expression not working correctly in JavaScript?

In an earlier thread about inserting brackets around "comments" in a chess pgn-like string, I got excellent help finishing a regex that matches move lists and comments separately.

Here is the current regex:

((?:\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}).)+)

The three capture groups are:

  1. "e4 e5 2. f4 exf4 3.Nf3" etc -- i.e. lists of moves
  2. "Blah blah blah" -- i.e. "comments"
  3. comment ") (" comment -- i.e. close and begin parens, when a chess variation with a comment at the end "completes", and another chess variation with a comment at the beginning "starts"

In action here: http://regex101.com/r/dQ9lY5

Everything works correctly for "Your regular expression in" PCRE(PHP): it matches all three groups correctly. When I switch to "Your regular expression in" Javascript, however, it matches everything as Capture Group 1. Is there something in my regex that isn't supported by the Javascript regex engine? I tried to research this, but haven't been able to solve it. There is so much information on this topic, and I've already spent hours and hours.

I know one solution is to use the regex as-is, and pass it to PHP through AJAX, etc, but I don't know how to do that yet (it's on my list to learn).

Question 1: But I am also very curious about what it is in this regex that doesn't work on the Javascript regex engine.

Also, here is my Javascript CleanPgnText function. I am most interested in the while, but if anything else seems wrong, I would appreciate any help.

function CleanPgnText(pgn) {
  var pgnTextEdited = '';
  var str;
  var pgnInputTextArea = document.getElementById("pgnTextArea");
  var pgnOutputArea = document.getElementById("pgnOutputText");
  str = pgnInputTextArea.value;
  str = str.replace(/\[/g,"(");     //sometimes he uses [ incorrectly for variations
  str = str.replace(/\]/g,")"); 
  str = str.replace(/[\r\n¬]*/g,"");  // remove newlines and that weird character that MS Word sticks in
  str = str.replace(/\s{2,}/g," "); // turn more than one space into one space

  while ( str =~ /((?:\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})[^\)\(])+)|((?:\)\s\())/g ) {
    if ($1.length > 0) {  //
      pgnTextEdited += $1;
    }
    else if ($2.length > 0) {
      pgnTextEdited += '{' + $2 + '}';
    }
    else if ($3.length > 0) {
      pgnTextEdited += $3;
    }
  }

  pgnOutputArea.innerHTML = pgnTextEdited;
}

Question 2: Regarding the =~ in the while statement

while ( str =~

I got the =~ from helpful code in my original thread, but it was written in Perl. I don't quite understand how the =~ operator works. Can I use this same operator in Javascript, or should I be using something else?

Question 3: Can I use .length the way I am, when I say

if ($1.length > 0) 

to see if the first capture group had a match?

Thank you in advance for any help. (If the regex101 link doesn't work for you, you can get a sample pgn to test on from the original thread).

I corrected your javascript code and got the following:

http://jsfiddle.net/ZXG2H/

  1. Personally, I think the group-matching problems are specific to http://regex101.com/. Your expression definitely works in JavaScript (see the fiddle) and in Java (with escaping corrections). I simplified your JavaScript slightly and took the pgn data from a function parameter instead of a text input.

  2. As far as I know, =~ is not available in JavaScript. In JavaScript you loop through the matches with something like the following. Note the g flag: without it, exec() starts from the beginning of the string on every call and the loop never ends.

    var pattern = /myregexp/g;
    var match;
    while ((match = pattern.exec(mytext)) !== null) {
      // do something with match
    }

  3. If a group did not take part in the match, its entry in the result is undefined (exec() itself returns null only when there are no more matches). You address the groups by indexing the match variable from above: match[2] is capture group 2.
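As a small, self-contained illustration of points 2 and 3, here is a sketch with a made-up two-group pattern and input (neither is taken from the question):

```javascript
// Hypothetical pattern: group 1 = a move number such as "1.", group 2 = a word such as "e4".
var pattern = /(\d+\.)|([A-Za-z][A-Za-z0-9]*)/g; // g flag so exec() advances through the string
var text = "1. e4 e5";
var kinds = [];
var match;

while ((match = pattern.exec(text)) !== null) {
  // Only one alternative matches per iteration; the other group is undefined.
  if (match[1] !== undefined) {
    kinds.push("number:" + match[1]);
  } else if (match[2] !== undefined) {
    kinds.push("word:" + match[2]);
  }
}

console.log(kinds); // [ 'number:1.', 'word:e4', 'word:e5' ]
```

Testing the group against undefined (or loosely against null) is more direct than checking .length, which would throw on a group that did not match.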

I was looking at your new regex, and it's not quite right, even though it appears to work with @wumpz's JS code.
You can't simply exclude all parentheses from the comments section with [^)(], because
capture group 3 only matches the literal ) ( sequence.
A parenthesis that is matched by neither the moves pattern nor group 3 would be skipped, so it would never become part of the new string
that is constructed. It's not likely to happen, because the moves pattern also matches parentheses.

To fix that, exclude only the ') (' sequence from comments, and match it first (group 1).
I also left some notes on the changes made to your new regex.
Try it out. I think @wumpz deserves the credit.
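The idea of matching the special sequence first and excluding it from the catch-all branch can be seen in isolation with a much smaller pattern. This is only a sketch: the digits-only "moves" branch is a stand-in for the real move grammar, and the tokenize helper is a hypothetical name.

```javascript
// Simplified stand-ins:
//   group 1 = ") (" separator (tried first)
//   group 2 = "moves" (here just a run of digits)
//   group 3 = anything else, consumed one char at a time, stopping
//             before a digit or before a ") (" separator
var pattern = /(\)\s*\()|(\d+)|((?:(?!\d)(?!\)\s*\()[\s\S])+)/g;

function tokenize(text) {
  var out = [];
  var match;
  pattern.lastIndex = 0; // reset because the regex object is reused between calls
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== undefined) {
      out.push(["sep", match[1]]);
    } else if (match[2] !== undefined) {
      out.push(["moves", match[2]]);
    } else {
      out.push(["comment", match[3]]);
    }
  }
  return out;
}

console.log(tokenize("ab) (cd(x)12"));
// [ [ 'comment', 'ab' ], [ 'sep', ') (' ], [ 'comment', 'cd(x)' ], [ 'moves', '12' ] ]
```

The lone parentheses in "cd(x)" stay inside the comment, while the ") (" sequence is split out on its own; with [^)(] in the third branch those lone parentheses would not be matched by any group.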

    #  /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/


    ( \) \s* \( )              # (1), 'Special Comment' configuration (must match first)
 |                           # OR,
    (                          # (2 start), 'Moves' configuration
         (?:
              \s? 
              [()]? \s? [()]? 
              \s? 
              [0-9]{1,3} \.{1,3}
              \s 
              [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] [O-]{0,3} [!?+#=]{0,2} [NBRQ]? 
              [!?+#]{0,2} 
              (?:
                   \s 
                   [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] [O-]{0,3} [!?+#=]{0,2} [NBRQ]? [!?+#]{0,2} 
              )?
              \s? 
              [()]? \s? [()]? 
              \s? 
         )+
    )                          # (2 end)
 |                           # OR,  
    (                          # (3 start), 'Normal Comment' configuration
         (?:
              (?!                        # Not the 'Moves configuration'
                   \s? 
                   [()]? \s? [()]? 
                   \s? 
                   [0-9]{1,3} \.{1,3}
                   \s 
                   [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] 

                   # ---- 
                   # Next line is not needed
                   # because all its items are
                   # optional
                   # ---- 
                   ### [O-]{0,3} [!?+#=]{0,2} [NBRQ]? [!?+#]{0,2}  <-  not needed
              )
              ### [^)(]    <- replaced by   '[\S\s]'  below
              # ---- 
              # The above line is replaced by any char.
              # because it excludes all ()'s and is not appropriate

              (?! \) \s* \( )            # Also, Not the 'Special comment' configuration

              [\S\s]                     # Consume any char
         )+
    )                          # (3 end)

Modifying @wumpz's JS code to use the modified regex, it would look like this:

 function CleanPgnText(pgn) {
     var pgnTextEdited = '';
     var str;
     var pgnOutputArea = document.getElementById("pgnOutputText");
     str = pgn;
     str = str.replace(/\[/g, "("); //sometimes he uses [ incorrectly for variations
     str = str.replace(/\]/g, ")");
     str = str.replace(/[\r\n¬]*/g, ""); // remove newlines and that weird character that MS Word sticks in
     str = str.replace(/\s{2,}/g, " "); // turn more than one space into one space

     //Start regexp processing
     var pattern = /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/g;

     var match; // declare locally instead of creating an implicit global
     while ((match = pattern.exec(str)) != null) {
         if (match[1] != null) {           // Special Comment configuration, don't add '{}'
             pgnTextEdited += match[1];
         } else if (match[2] != null) {    // Moves configuration  
             pgnTextEdited += match[2];
         } else if (match[3] != null) {    // Normal Comment configuration, add '{}'
             pgnTextEdited += '{' + match[3] + '}';
         }
     }
     //end regexp processing

     pgnOutputArea.innerHTML = pgnTextEdited;
 }
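If you want to try the regex outside the browser (for example in Node), one option is to keep the string transformation separate from the DOM access. The following is a sketch using the same regex and loop as CleanPgnText above; the function name bracePgnComments is hypothetical, and it assumes the input has already been normalized (brackets replaced, newlines removed, whitespace collapsed).

```javascript
// Hypothetical helper: same regex and branching as CleanPgnText, but it
// returns the result instead of writing it to the page.
function bracePgnComments(str) {
  var pattern = /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/g;
  var out = '';
  var match;

  while ((match = pattern.exec(str)) !== null) {
    if (match[1] !== undefined) {          // Special Comment configuration, no '{}'
      out += match[1];
    } else if (match[2] !== undefined) {   // Moves configuration
      out += match[2];
    } else if (match[3] !== undefined) {   // Normal Comment configuration, add '{}'
      out += '{' + match[3] + '}';
    }
  }
  return out;
}

console.log(bracePgnComments("Intro 1. e4 e5 Nice"));
// {Intro} 1. e4 e5 {Nice}
```

CleanPgnText could then do its normalization, call this helper, and assign the result to pgnOutputArea.innerHTML.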

Running this regex in a Perl program, the output is:

{Khabarovsk is the capital of Far East of Russia. My 16-year-old opponent was a promising local prodigy. Now he is a very strong FM with a FIDE rating of 2437 and lives... in the USA, too! A small world.} 1. e4 c5 2. Nf3 e6 3. c3 Nf6 4. e5 Nd5 5. d4 cxd4 6. cxd4 d6 7. Nc3 Nc6 8. Bd3!? Nxc3 9. bxc3 dxe5 10. dxe5 Qa5 11. O-O Be7 12. Qb3 Nxe5 13. Nxe5 Qxe5 14. Bb5+ Kf8 15. Ba3 Qc7 16. Rad1 g6 17. c4! Bxa3 18. Qxa3+ Kg7 19. Rd6 Rd8 20. c5 Bd7 21. Bc4 Bc6 22. Rfd1 Rd7 23. Qg3 Rad8 {Finally with accurate, solid play Black has consolidated yet White still keeps some pressure and has some compensation for the pawn.} 24. h4 {A typical march in such positions, simply nothing else to do better.} 24... h5?! ( 24... h6 {would be a more careful response. }) ({ But the best defense was} 24... Rd6! 25. cd6 Qa5 ) 25. Qe5+ Kh7 26. Bd3 {Very natural} 26... Kh6? ( {Missing} 26... Ba4! 27. Qxh5+ Kg7 28. Qe5+ Kg8! {and now Black has many own threats. White would have to force a perpetual after} 29. h5! Bxd1 30. h6 f6 31. Qxf6 Bh5 32. Qxe6+ Kh7 33. Bxg6+ Bxg6 34. Qxg6+ Kh8 35. Qf6+ {Now, after 26...Kh6 everything is ready for preparing a decisive blow.} ) 27. Qf6! Kh7 ( {There is no} 27... Rxd6 28. cxd6 Rxd6? {due to} 29. Qh8# ) 28. g4! hxg4 29. h5 Rxd6 30. cxd6 Rxd6 31. hxg6+ Kg8 32. g7! {This pawn is the vital factor until the end now. With any other move, White loses.} 32... Qd8! {The only defense against Qh6 and Qh8 checkmating or queening.} 33. Qh6 f5 34. Rd2!! {The idea is the white rook cannot be taken with a check anymore. The bishop will be easily unpinned with the crushing Bxf5 or Bc4. The Black pin on d file was an illusion! In fact it's Black's rook that is pinned and cannot leave d file.} 34... Bd5 ( {The best try - to close d file with protecting more e6 pawn. No help is} 34... Rd7 35. Bf5 ef5 36. Qh8 Kf7 37. Rd7 ) ( {But maybe the best practical chance was} 34... g3!? {and now} 35. Bxf5 {doesn't win because of} 35... gxf2+ 36. Kh2 f1=N+! 37. Kh3 Bg2+! 38. Rxg2 Rd3+! 39. 
Bxd3 Qxd3+ {with an amazing perpetual} 40. Kh4 Qe4+ 41. Rg4 Qh1+ 42. Kg5 Qd5+ 43. Kf6 Qd8+ 44. Kg6 Qd3+ ) ( {But after} 34... g3!? {White wins using another wing tactic:} 35. Bc4! Bd5 36. Bxd5 exd5 37. Qh8+ Kf7 38. Rc2 gxf2+ 39. Kf1! {and there is no defense against Rc8. Now after 35...Bd5 again everything looks well protected.} ) 35. Qh8 Kf7 36. Bb5! {The bishop still makes his way breaking through. The coming Be8 is a killer.} 36... Qg8 37. Be8+! Qxe8 38. Qe8+ Kxe8 39. g8=Q+ Kd7 40. Qg7+ {It was White's 40th move Which means time control was over for me. I was short on time. A piece and three pawns for a queen is not enough. Black resigned. 1-0 }