Regular expression for quotes in PHP and JS

I need to change the bad quote pairs “” and "" to „“. Here are a few sample inputs and what each should look like after the replacement:

Bad input: „Do It“: „Another sentence in quotes“ “Bayern”
Good output: „Do It“: „Another sentence in quotes“ „Bayern“

Another example:
Bad input: „Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”
Good output: „Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“

The tricky part is that the same character (“) is the closing mark of a good quotation and the opening mark of a bad one.

Try the following regex (demo):

([“])+|([”])+

Edit: Try something like this in your PHP to replace them individually:

$str = '“Ba“Bayern”yern”';
$replaced = preg_replace(['/“/', '/”/'], ['„', '“'], $str);
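
One thing worth checking before relying on this: the replacement treats every “ as an opening mark, while in the question “ is also the closing mark of an already-correct „…“ pair. A minimal JavaScript sketch of the same two replacements (the name `naiveFix` is made up for this example, it is not from the answer) shows the effect on the question's first input:

```javascript
// Hypothetical helper mirroring the two preg_replace() steps above:
// every “ becomes „, then every ” becomes “.
function naiveFix(str) {
  return str.replace(/“/g, '„').replace(/”/g, '“');
}

const ownSample = naiveFix('“Ba“Bayern”yern”');
const questionSample = naiveFix('„Do It“: „Another sentence in quotes“ “Bayern”');

console.log(ownSample);      // „Ba„Bayern“yern“
console.log(questionSample); // „Do It„: „Another sentence in quotes„ „Bayern“
```

The input that contains only bad quotes comes out right, but the closing “ of each good pair is turned into „, so this approach seems suitable only for text that has no correct „…“ pairs yet.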

To say that I have toiled away at this is an understatement. I put my tinfoil hat on and tested my patterns against as many uncooperative strings and fringe cases as I could. If you discover a valid quoted expression that is not matched by my pattern(s), please leave me a comment and I'll see if I can patch up my solution.

That said, I have made some assumptions:

  • All quotations are balanced (every opening quotation mark is paired/closed by the correct, related quotation mark). e.g. “ is followed by ”, " is followed by ", and „ is followed by “.
  • Nested quotations are never more than 2 layers. This means there are no "grandparent" quotations; only "lone" and "parent - child" quotations will occur.
  • I have built the patterns to accommodate "squished quotes" -- meaning two separate quoted expressions can be written side-by-side without a space separating them.
  • Leading quotation marks MAY NOT have a space immediately following them (This would only occur in sloppy/invalid input).
  • Trailing quotation marks MAY NOT have a space immediately preceding them (This would only occur in sloppy/invalid input), but may have 0 or more punctuation marks. This consideration allows "Then she said..." and “The price is WHAT?!?”
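
To make the last three assumptions concrete, here is a small JavaScript sketch that runs only the two "lone quote" patterns given later in this answer. The `g` flag and the `fixLone` name are assumptions of this example, not part of the original code.

```javascript
// Patterns #1 and #2 from this answer, with the g flag for replace-all.
const loneCurly = /“(\b[^"“”„]+\b[,.!?]*)”/gu;
const loneStandard = /"\b([^"“”„]+\b[,.!?]*)"/gu;

// Hypothetical helper: fix curly pairs first, then standard pairs.
function fixLone(str) {
  return str.replace(loneCurly, '„$1“').replace(loneStandard, '„$1“');
}

console.log(fixLone('"Then she said..."'));     // „Then she said...“
console.log(fixLone('“The price is WHAT?!?”')); // „The price is WHAT?!?“
console.log(fixLone('“One”"Two"'));             // „One“„Two“ (squished quotes)
console.log(fixLone('“ padded ”'));             // “ padded ” (unchanged: space after the opening mark)
```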

The short explanation:

My one-function method corrects quotations in two waves (two patterns each wave).

  • Pattern #1 replaces "bad curly double quotes" around "lone" and "nested" quotes.
  • Pattern #2 replaces "bad standard double quotes" around "lone" and "nested" quotes.
  • Pattern #3 replaces "bad curly double quotes" around "parent" quotes ONLY.
  • Pattern #4 replaces "bad standard double quotes" around "parent" quotes ONLY.

Wave 1 Pattern Demo (I piped #1 and #2 together)

Wave 2 Pattern Demo (I piped #3 and #4 together)
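
The effect of the two waves is easiest to see by printing the intermediate string. The following JavaScript sketch uses the four patterns from this answer; the grouping into `wave1`/`wave2` arrays and the `applyAll` helper are illustrative choices of this example.

```javascript
// Wave 1: patterns #1 and #2 ("lone" and "nested" quotes).
const wave1 = [
  /“(\b[^"“”„]+\b[,.!?]*)”/gu,
  /"\b([^"“”„]+\b[,.!?]*)"/gu,
];
// Wave 2: patterns #3 and #4 ("parent" quotes only).
const wave2 = [
  /“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/gu,
  /"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/gu,
];

// Hypothetical helper: apply each pattern in turn to the running result.
const applyAll = (patterns, str) =>
  patterns.reduce((acc, re) => acc.replace(re, '„$1“'), str);

const input = '"Start Bad Parent “Bad Child,” End Bad Parent"';
const afterWave1 = applyAll(wave1, input);
const afterWave2 = applyAll(wave2, afterWave1);
const wave2Only = applyAll(wave2, input);

console.log(afterWave1); // "Start Bad Parent „Bad Child,“ End Bad Parent"
console.log(afterWave2); // „Start Bad Parent „Bad Child,“ End Bad Parent“
console.log(wave2Only);  // unchanged: the parent patterns need an already-fixed „…“ child
```

The last line suggests why the order matters: the parent patterns only recognise children that wave 1 has already converted.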

Code: (Demo)

$ins_outs=[
    '“Lone Bad Quote” 
    "Lone Bad Quote."
    „Lone Good Quote“
    „Start Good Parent "Bad Child?" „Good Child,“ “Bad Child...” End Good Parent!“
    "Start Bad Parent „Good Child!“ “Bad Child???” "Bad Child," End Bad Parent?"
    “Start Bad Parent „Good Child.“ End Bad Parent”
    „Start Good Parent “Bad Child,” End Good Parent“
    "Start Bad Parent “Bad Child,” End Bad Parent"'
    =>
    '„Lone Bad Quote“ 
    „Lone Bad Quote.“
    „Lone Good Quote“
    „Start Good Parent „Bad Child?“ „Good Child,“ „Bad Child...“ End Good Parent!“
    „Start Bad Parent „Good Child!“ „Bad Child???“ „Bad Child,“ End Bad Parent?“
    „Start Bad Parent „Good Child.“ End Bad Parent“
    „Start Good Parent „Bad Child,“ End Good Parent“
    „Start Bad Parent „Bad Child,“ End Bad Parent“',
    '„Do It“: „Another sentence in quotes“ “Bayern”'=>'„Do It“: „Another sentence in quotes“ „Bayern“',
    '„Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”'=>'„Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“',
    '“Ba“Bayern”yern”'=>'„Ba„Bayern“yern“',
    '„Do “Bayern” It“'=>'„Do „Bayern“ It“',
    '„Do It“: „Another good quotes“ “Bayern”'=>'„Do It“: „Another good quotes“ „Bayern“'
];
// Patterns #1-#4 in the order they are applied: wave 1 (inners), then wave 2 (outers).
$patterns = [
    '/“(\b[^"“”„]+\b[,.!?]*)”/u',                                            // #1 inners, curly
    '/"\b([^"“”„]+\b[,.!?]*)"/u',                                            // #2 inners, standard
    '/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u',   // #3 outers, curly
    '/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u',   // #4 outers, standard
];
foreach ($ins_outs as $input => $expected) {
    echo " input = $input\n";
    // preg_replace() runs the patterns in array order, each on the result of the previous one.
    echo "output = ", preg_replace($patterns, '„$1“', $input), "\n";
    echo "expect = $expected\n---\n";
}

I'll admit the regex patterns look rather convoluted at first glance. The good news is, once you separate the four patterns and break them into logical chunks, reading them is simplified.

/“(\b[^"“”„]+\b[,.!?]*)”/u - match all non-parent curly quotations (no internal quotes) and allow punctuation.

/"\b([^"“”„]+\b[,.!?]*)"/u - match all non-parent standard quotations (no internal quotes) and allow punctuation.

/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u - match all parent curly quotations. Parent quotes must contain one or more valid child quotations with optional leading and trailing text (the text may not contain any type of loose/unmatched double quote characters)

/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u - match all parent standard quotations. Parent quotes must contain one or more valid child quotations with optional leading and trailing text (the text may not contain any type of loose/unmatched double quote characters)
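
As one way to see the "logical chunks", the parent patterns can be assembled from named pieces. This is a JavaScript sketch; the constant names are invented for illustration and do not appear in the original code.

```javascript
// Building blocks (names are illustrative, not from the answer).
const NOT_QUOTE = '[^"“”„]';   // any character that is not a double quote of any style
const TAIL = '\\b[,.!?]*';     // word boundary, then optional trailing punctuation
const CHILD = `„\\b${NOT_QUOTE}+${TAIL}“`; // one already-corrected child quotation
// One or more children, each with optional surrounding text, then the parent's own tail.
const PARENT_BODY = `(?:${NOT_QUOTE}*${CHILD}${NOT_QUOTE}*)+${TAIL}`;

const curlyParent = new RegExp(`“\\b(${PARENT_BODY})”`, 'gu');    // pattern #3
const standardParent = new RegExp(`"\\b(${PARENT_BODY})"`, 'gu'); // pattern #4

// The assembled source should be character-for-character pattern #3.
const literal3 = /“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/gu;
const sameSource = curlyParent.source === literal3.source;

const fixedCurly = '“Start Bad Parent „Good Child.“ End Bad Parent”'.replace(curlyParent, '„$1“');
const fixedStandard = '"Start Bad Parent „Good Child.“ End Bad Parent"'.replace(standardParent, '„$1“');
console.log(sameSource, fixedCurly, fixedStandard);
```

Patterns #1 and #2 are the same idea without the `CHILD` part: an opening mark, `NOT_QUOTE+`, the `TAIL`, and the closing mark.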

Each of these four patterns is replaced by the same string: „$1“. My patterns should work seamlessly in both PHP and JavaScript. ...I didn't bother to code up the JavaScript equivalent; I'll leave you something to play with ;)
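
For readers who would rather skip that exercise, here is one possible JavaScript port, offered as a sketch and not as the author's code. It assumes the `g` flag as the stand-in for preg_replace()'s replace-all behaviour. The `fixQuotes` name is invented, and the test inputs are taken from the question and the PHP array above.

```javascript
// The same four patterns as the PHP code, in the same order.
const patterns = [
  /“(\b[^"“”„]+\b[,.!?]*)”/gu,
  /"\b([^"“”„]+\b[,.!?]*)"/gu,
  /“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/gu,
  /"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/gu,
];

// Hypothetical name; applies the patterns one after another,
// like preg_replace() does with an array of patterns.
function fixQuotes(str) {
  return patterns.reduce((acc, re) => acc.replace(re, '„$1“'), str);
}

const cases = [
  ['„Do It“: „Another sentence in quotes“ “Bayern”', '„Do It“: „Another sentence in quotes“ „Bayern“'],
  ['„Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”', '„Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“'],
  ['“Ba“Bayern”yern”', '„Ba„Bayern“yern“'],
  ['„Do “Bayern” It“', '„Do „Bayern“ It“'],
];
for (const [input, expected] of cases) {
  console.log(fixQuotes(input) === expected ? 'ok  ' : 'FAIL', input);
}
```

One caveat for the JavaScript side: `\b` there is defined on ASCII word characters even with the `u` flag, so a quotation that starts or ends with a non-ASCII letter (for example “Über”) is left untouched by this port. I have not checked whether the PHP version has the same limitation.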