PHP POST with "weird" characters

So, I've created a little online test portal. Essentially the user clicks a radio button next to what they think is the correct answer on this test. The code then does a string comparison between the answer they clicked and the actual correct answer. If the strings differ, it marks them wrong.

I have a few questions where I have "weird" characters in the questions. Things like em dashes, or even as simple as double quotation marks. It seems that when the user clicks one of these answers, the weird character isn't posted to my scoring page properly, therefore the string compare isn't working, and it's marking them incorrect.

Is there something I'm doing wrong?

Here's a snippet of code I use...

//Question 4
$question[$i]   = 'When are steel or composite toe boots required in the field?';
$answer[$i][1]  = 'Always-unless... actually, there is no &ldquo;unless&rdquo;';
$answer[$i][2]  = 'Never-crocs are truly a groundbreaking innovation appropriate in all settings.';
$correct[$i]    = $answer[$i][1];
$explanation[$i] = '';
$i++;

The code "breaks" at the &ldquo; line.

The comparison code is here:

//Find incorrect answers and print them out with correct answers  formatted for the browser.
    for($i=1; $i<=$totalquest; $i++){
        if($_POST[$i]!=$correct[$i]){
            $WrongAnswers .= "<b>You answered Question $i incorrectly:</b><br>$question[$i]<br>You answered: $_POST[$i]<br>The correct answer is: $correct[$i]<p>";
            $score=$score-1;
        }
    }
echo $WrongAnswers;

And the code that creates the test is here:

for($i=1; $i<=$totalquest; $i++)
{
    echo $i.'. '.$question[$i]."<br>";
    $totalans=count($answer[$i]);
    for($j=1; $j<=$totalans; $j++)
    {
        echo '<input type="radio" name="'.$i.'" value="'.$answer[$i][$j].'" required>'.$answer[$i][$j].'<br>';
    }
    echo '<p>';
    
}


The specific issue comes from HTML entity encoding of those characters.

For instance, your &ldquo; needs to be converted to the actual character before the comparison. To do this you can pass the string through html_entity_decode(), like so:

$var = "Always-unless... actually, there is no &ldquo;unless&rdquo;";
$string = html_entity_decode($var, ENT_QUOTES);

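The browser decodes entities when it parses the value attribute, so the string that arrives in $_POST is not HTML-encoded: it contains the real “ character, while your stored answer still contains the literal text &ldquo;. You therefore need to either decode the comparison string or HTML-encode the submitted value.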
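
To see the mismatch concretely, here is a minimal sketch. It assumes the browser submitted the real curly quotes as UTF-8, which `$posted` simulates, and `answersMatch()` is a hypothetical helper name, not something from the original code.

```php
<?php
// Stored answer, as written in the quiz source (contains HTML entities).
$stored = 'Always-unless... actually, there is no &ldquo;unless&rdquo;';

// Simulated $_POST value: the browser decodes entities in the value
// attribute, so it submits the literal curly quotes (UTF-8 bytes).
$posted = "Always-unless... actually, there is no \u{201C}unless\u{201D}";

// Hypothetical helper: decode entities on the stored side, then compare.
function answersMatch(string $posted, string $stored): bool
{
    $decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
    return $posted === $decoded;
}

var_dump($posted === $stored);            // raw comparison fails
var_dump(answersMatch($posted, $stored)); // comparison after decoding succeeds
```

Decoding the stored side is used here rather than running htmlentities() on the submitted value, because htmlentities() with ENT_QUOTES also rewrites characters such as apostrophes that the stored text may contain unencoded. Either direction can work as long as both strings end up in the same form.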

There may also be a secondary issue with character sets. Declare UTF-8 in the page and on the form:

 <head>
 <meta charset="UTF-8"> <!-- HTML5 -->
 </head>
 <body>
 <form ... accept-charset='UTF-8'>
 ...

It is also worth reading up on PHP's mbstring extension, which typically leads to something like this:

PHP Page (in this instance receiving the form data):

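 mb_internal_encoding('UTF-8');
 mb_http_output('UTF-8');
 // mb_http_input() only reports the detected input encoding (its argument
 // is a type code such as 'P' for POST), so it cannot be used to set one.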

  ...

 $string = html_entity_decode($var, ENT_QUOTES, "UTF-8");
 /*** 
  Or alternatively you encode the $_POSTed value.
  ***/

 if($_POST['radiochoice'] == $string){
       //... correct.
 }

From a logic point of view, it would be better not to pass the whole text values back and forth between pages at all, but simply a reference. The radio button's value="" is what gets passed to the next page, so you can set it to something unique, such as a number, and then check:

  if ((int)$_POST['radiochoice'] === 4) {
     // option 4 picked, this is correct!
  }

The user will still read the same text on the page; from their point of view nothing has changed.

(int) forces the input to an integer, so unexpected or malicious strings never reach the comparison. It does not address CSRF, which needs its own protection such as a per-form token; that is a side point on the much wider topic of form security.
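
The index-based approach could look roughly like the sketch below. It rests on assumptions: `renderQuestion()` and `isCorrect()` are hypothetical helpers, the answer key stores the number of the right option instead of its text, and the field name `q4` is made up for illustration.

```php
<?php
// Hypothetical helper: render one question's radio buttons. The value
// attribute carries only the option number; the text is display-only.
function renderQuestion(int $qNum, array $answers): string
{
    $html = '';
    foreach ($answers as $optNum => $text) {
        $html .= sprintf(
            '<input type="radio" name="q%d" value="%d" required>%s<br>',
            $qNum,
            $optNum,
            $text // assumed to be trusted, already HTML-encoded quiz text
        );
    }
    return $html;
}

// Hypothetical helper: compare the submitted option number with the key.
function isCorrect(array $post, int $qNum, int $correctIndex): bool
{
    $key = 'q' . $qNum;
    return isset($post[$key]) && (int)$post[$key] === $correctIndex;
}

$answers = [
    1 => 'Always-unless... actually, there is no &ldquo;unless&rdquo;',
    2 => 'Never-crocs are truly a groundbreaking innovation appropriate in all settings.',
];

echo renderQuestion(4, $answers);

// Simulated submissions (on a real page this would be $_POST).
var_dump(isCorrect(['q4' => '1'], 4, 1)); // option 1 chosen: correct
var_dump(isCorrect(['q4' => '2'], 4, 1)); // option 2 chosen: wrong
var_dump(isCorrect([], 4, 1));            // nothing submitted: wrong
```

Because only a number travels through the form, entities and character sets in the answer text no longer affect scoring at all.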