I need to change bad quotes “” and "" to „“. For example, here are a few sentences of bad input and what they should look like after the replacement:
Bad input: „Do It“: „Another sentence in quotes“ “Bayern”
Good output: „Do It“: „Another sentence in quotes“ „Bayern“
Another example:
Bad input: „Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”
Good output: „Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“
The tricky thing is that the same quote “ is used in good quotes (at the end) and in bad quotes (at the beginning).
Try the following regex (DEMO):
([“])+|([”])+
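To show how the two capture groups could drive the replacement, here is a minimal sketch using preg_replace_callback. The helper name fix_quotes_by_group is hypothetical, and the u flag is added on the assumption that the input is UTF-8, since a character class holding a multibyte quote would otherwise be read byte by byte. Note that the + quantifier collapses a run of identical quotes into a single replacement.

```php
<?php
// Hypothetical helper: group 1 holds an opening “, group 2 a closing ”.
function fix_quotes_by_group(string $text): string
{
    return preg_replace_callback(
        '/([“])+|([”])+/u',
        function (array $m): string {
            // If only group 1 matched, $m[2] is absent; if group 2 matched, $m[1] is ''.
            return ($m[1] ?? '') !== '' ? '„' : '“';
        },
        $text
    );
}

echo fix_quotes_by_group('“Ba“Bayern”yern”'), "\n"; // „Ba„Bayern“yern“
```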
Edit: Try something like this in your PHP to replace them individually:
$str = '“Ba“Bayern”yern”';
$replaced = preg_replace(['/“/', '/”/'], ['„', '“'], $str);
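As a quick check of this two-step replacement: the array form applies the patterns one after another, so every “ becomes „ before ” is turned into “. The second sample string below is an added assumption (it is not from the answer). It shows that an already correct closing “ gets rewritten as well, which is the tricky case the question describes.

```php
<?php
$patterns     = ['/“/', '/”/'];
$replacements = ['„', '“'];

// Only bad quotes: each “ becomes „, then each ” becomes “.
$onlyBad = preg_replace($patterns, $replacements, '“Ba“Bayern”yern”');
echo $onlyBad, "\n"; // „Ba„Bayern“yern“

// Mixed input (assumed sample): the good closing “ after "Do It" is also turned into „.
$mixed = preg_replace($patterns, $replacements, '„Do It“ “Bayern”');
echo $mixed, "\n"; // „Do It„ „Bayern“
```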
To say that I have toiled away at this is an understatement. I put my tinfoil hat on and tested my patterns against as many uncooperative strings and fringe cases as I could. If you discover a valid quoted expression that is not matched by my pattern(s), please leave me a comment and I'll see if I can patch up my solution.
That said, I have made some assumptions: “ is followed by ”, " is followed by ", and „ is followed by “, and each closing quote may be preceded by 0 or more punctuation marks. This consideration allows "Then she said..." and “The price is WHAT?!?”
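As a small illustration of these assumptions, the sketch below runs the two simplest patterns from this answer on the sample phrases. The variable names and the third, space-padded input are hypothetical additions. That input is left unchanged because its opening quote is not directly followed by a word character.

```php
<?php
// Patterns #1 and #2 from this answer: quotations with no nested quotes.
$inners = ['/“(\b[^"“”„]+\b[,.!?]*)”/u', '/"\b([^"“”„]+\b[,.!?]*)"/u'];

$samples = [
    '"Then she said..."',
    '“The price is WHAT?!?”',
    '“ padded with spaces ”', // hypothetical input that breaks the word-boundary assumption
];

foreach ($samples as $sample) {
    echo preg_replace($inners, '„$1“', $sample), "\n";
}
```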
The short explanation:
Wave 1 Pattern Demo (I piped #1 and #2 together)
Wave 2 Pattern Demo (I piped #3 and #4 together)
Code: (Demo)
$ins_outs=[
'“Lone Bad Quote”
"Lone Bad Quote."
„Lone Good Quote“
„Start Good Parent "Bad Child?" „Good Child,“ “Bad Child...” End Good Parent!“
"Start Bad Parent „Good Child!“ “Bad Child???” "Bad Child," End Bad Parent?"
“Start Bad Parent „Good Child.“ End Bad Parent”
„Start Good Parent “Bad Child,” End Good Parent“
"Start Bad Parent “Bad Child,” End Bad Parent"'
=>
'„Lone Bad Quote“
„Lone Bad Quote.“
„Lone Good Quote“
„Start Good Parent „Bad Child?“ „Good Child,“ „Bad Child...“ End Good Parent!“
„Start Bad Parent „Good Child!“ „Bad Child???“ „Bad Child,“ End Bad Parent?“
„Start Bad Parent „Good Child.“ End Bad Parent“
„Start Good Parent „Bad Child,“ End Good Parent“
„Start Bad Parent „Bad Child,“ End Bad Parent“',
'„Do It“: „Another sentence in quotes“ “Bayern”'=>'„Do It“: „Another sentence in quotes“ „Bayern“',
'„Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”'=>'„Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“',
'“Ba“Bayern”yern”'=>'„Ba„Bayern“yern“',
'„Do “Bayern” It“'=>'„Do „Bayern“ It“',
'„Do It“: „Another good quotes“ “Bayern”'=>'„Do It“: „Another good quotes“ „Bayern“'
];
foreach($ins_outs as $input=>$expected){
echo " input = $input
";
// $bad_inners=preg_replace(['/“(\b[^"“”„]+\b[,.!?]*)”/u','/"\b([^"“”„]+\b[,.!?]*)"/u'],'„$1“',$input);
// $bad_outers=preg_replace(['/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u','/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u'],'„$1“',$bad_inners);
echo "output = ",preg_replace(['/“(\b[^"“”„]+\b[,.!?]*)”/u','/"\b([^"“”„]+\b[,.!?]*)"/u','/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u','/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u'],'„$1“',$input),"
";
// ^^^------inners1 ------^^^ ^^^------inners2 ------^^^ ^^^-----------------------outers1 -----------------------^^^ ^^^-----------------------outers2 -----------------------^^^
echo "expect = $expected
---
";
}
I'll admit the regex patterns look rather convoluted at first glance. The good news is, once you separate the four patterns and break them into logical chunks, reading them is simplified.
/“(\b[^"“”„]+\b[,.!?]*)”/u
- match all non-parent curly quotations (no internal quotes) and allow punctuation.
/"\b([^"“”„]+\b[,.!?]*)"/u
- match all non-parent standard quotations (no internal quotes) and allow punctuation.
/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u
- match all parent curly quotations. Parent quotes must contain one or more valid child quotations with optional leading and trailing text (the text may not contain any type of loose/unmatched double quote characters)
/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u
- match all parent standard quotations. Parent quotes must contain one or more valid child quotations with optional leading and trailing text (the text may not contain any type of loose/unmatched double quote characters)
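To make the chunks easier to see, here is pattern #3 rewritten as a sketch in PCRE extended mode (the x flag), which ignores whitespace outside character classes and allows # comments. The variable name $outerCurly and the layout are additions for illustration; the tokens themselves are unchanged.

```php
<?php
// Pattern #3 (parent curly quotations) spelled out in extended mode.
$outerCurly = '/
    “ \b                  # bad opening quote, directly followed by a word character
    (                     # group 1: everything between the parent quotes
      (?:                 # one or more child quotations, each with optional text around it
        [^"“”„]*          #   leading text that contains no double quote of any kind
        „ \b [^"“”„]+ \b  #   good child quotation that starts and ends on a word character
        [,.!?]* “         #   optional punctuation, then the good closing quote
        [^"“”„]*          #   trailing text that contains no double quote of any kind
      )+
      \b [,.!?]*          # parent text ends on a word character plus optional punctuation
    )
    ”                     # bad closing quote
/xu';

$fixed = preg_replace($outerCurly, '„$1“', '“Start Bad Parent „Good Child.“ End Bad Parent”');
echo $fixed, "\n"; // „Start Bad Parent „Good Child.“ End Bad Parent“
```

Pattern #4 would follow the same layout with " in place of the outer “ and ”.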
These four patterns are replaced by the same string each time: „$1“. My patterns should work seamlessly in both PHP and JavaScript. I didn't bother to code up the JavaScript equivalent; I'll leave you something to play with ;)
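For reuse, the two waves could be wrapped in a function along these lines. This is only a sketch: the name normalize_quotes is hypothetical, and it simply applies the four patterns above in the same order as the demo code (inner quotations first, then parent quotations).

```php
<?php
// Hypothetical wrapper around the four patterns from this answer.
function normalize_quotes(string $text): string
{
    $inners = [
        '/“(\b[^"“”„]+\b[,.!?]*)”/u', // #1 curly quotation without nested quotes
        '/"\b([^"“”„]+\b[,.!?]*)"/u', // #2 standard quotation without nested quotes
    ];
    $outers = [
        '/“\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)”/u', // #3 curly parent
        '/"\b((?:[^"“”„]*„\b[^"“”„]+\b[,.!?]*“[^"“”„]*)+\b[,.!?]*)"/u', // #4 standard parent
    ];

    // Wave 1 fixes the innermost quotations so that wave 2 can recognise their parents.
    $text = preg_replace($inners, '„$1“', $text);
    return preg_replace($outers, '„$1“', $text);
}

echo normalize_quotes('„Do It“: „Another "sentence" in “Bayer” quotes“ “Ba “yer” n”'), "\n";
// „Do It“: „Another „sentence“ in „Bayer“ quotes“ „Ba „yer“ n“
```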