This is about json_encode dropping/nullifying values that contain invalid UTF-8 byte sequences, such as accented characters stored in a non-UTF-8 encoding.
I've read many different solutions for this, and I'm thinking of writing my own function that handles it automatically as a blanket solution.
By the way, I only expect plain English strings.
Basically I want a function to replace json_encode, one that runs a routine to correct the strings/objects passed in before handing them to the actual json_encode function.
So the question is, what's the best routine for this? By best, I mean the most practical and efficient.
For example, I've created a routine that breaks the string up into characters, evaluates whether each one is valid UTF-8, and encodes it if not (this was given as an example by one of the users at php.net).
This is a character-by-character check and fix, which seems to be bulletproof.
But my problem is this will impact performance.
Another way is to run utf8_encode on each string passed. I'm thinking about checking each value/string for invalid UTF-8 characters first, before running utf8_encode() on it, to save on overhead. But then again, each "check" routine is also an overhead in itself.
So is checking first really necessary? What if I just run utf8_encode() on every string passed through the function, regardless of whether the string needs any UTF-8 correction?
Is utf8_encode() a lightweight function? Will I see a significant performance impact if I run hundreds of strings through utf8_encode()? (For example, I have a templating engine that uses JSON output.)
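To make the "check first" idea concrete, here is a minimal sketch. It assumes that any string which is not valid UTF-8 is really ISO-8859-1, which is the same assumption utf8_encode() makes. The name ensure_utf8 is made up for illustration. Since utf8_encode() is deprecated as of PHP 8.2, the sketch prefers mb_convert_encoding() when mbstring is loaded. One thing worth noting beyond speed: for pure ASCII the conversion changes nothing, but running utf8_encode() on a string that is already valid UTF-8 and contains accented characters double-encodes it, so the check also protects correctness.

```php
<?php
// ensure_utf8() is a hypothetical helper name, not a PHP built-in.
// Assumption: any string that is not valid UTF-8 is ISO-8859-1 (Latin-1).
function ensure_utf8(string $s): string
{
    // With the /u modifier, preg_match() returns 1 for valid UTF-8
    // and false when the subject contains invalid byte sequences.
    if (preg_match('//u', $s) === 1) {
        return $s; // already valid, leave untouched
    }
    // utf8_encode() is deprecated since PHP 8.2; prefer mbstring when loaded.
    return function_exists('mb_convert_encoding')
        ? mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')
        : utf8_encode($s);
}

var_dump(ensure_utf8("Brad") === "Brad");               // plain ASCII passes through
var_dump(ensure_utf8("caf\xE9") === "caf\xC3\xA9");     // Latin-1 e-acute gets converted
var_dump(ensure_utf8("caf\xC3\xA9") === "caf\xC3\xA9"); // valid UTF-8 is not double-encoded
```

The validity check is a single pass over the string done in C, so it should be far cheaper than a character-by-character loop written in PHP, though that is worth measuring on real data.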
Looking for some solid advice and feedback.
regards
----UPDATE----
So I injected an array_walk_recursive call to go through all the values and utf8_encode all of them.
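For reference, a wrapper along those lines could look like the sketch below. The name safe_json_encode and the sample contact are made up for illustration, and the sketch again assumes that invalid strings are ISO-8859-1. It only fixes string values in nested arrays, not array keys.

```php
<?php
// safe_json_encode() is a hypothetical wrapper name, not part of PHP.
// Assumption: strings that are not valid UTF-8 are ISO-8859-1 (Latin-1).
function safe_json_encode(array $data, int $flags = 0)
{
    // $data is a copy, so the caller's array is left unmodified.
    array_walk_recursive($data, function (&$value) {
        // Only touch strings that fail the UTF-8 validity check.
        if (is_string($value) && preg_match('//u', $value) !== 1) {
            $value = function_exists('mb_convert_encoding')
                ? mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1')
                : utf8_encode($value); // deprecated since PHP 8.2
        }
    });
    return json_encode($data, $flags);
}

// Sample data in the same shape as $response; "Ren\xE9" is a Latin-1 string.
$sample = [
    'stat'     => 'ok',
    'contacts' => [
        ['name' => "Ren\xE9", 'email' => 'rene@domain.com', 'number' => '1800-55850'],
    ],
];

echo safe_json_encode($sample), "\n";
// prints {"stat":"ok","contacts":[{"name":"Ren\u00e9","email":"rene@domain.com","number":"1800-55850"}]}
```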
$response is a multidimensional array containing around 1000 nodes:
$response['stat']="ok";
$response['contacts'][0]['name']="Brad";
$response['contacts'][0]['email']="Brad@domain.com";
$response['contacts'][0]['number']="1800-55850";
$response['contacts'][1]['name']="Johj";
$response['contacts'][1]['email']="Johj@domain.com";
$response['contacts'][1]['number']="1800-7777";
...and so on..
My script time results (in seconds) are as follows:
without utf8_encode hammering:
1st run: 0.86414098739624
2nd run: 0.86342883110046
3rd run: 0.88974404335022
with utf8_encode hammering:
1st run: 0.91330289840698
2nd run: 0.90936899185181
3rd run: 0.89101815223694
That's roughly a 30 ms trade-off on average (the individual runs differ by about 1 to 50 ms).. hmmm
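Three whole-script runs are fairly noisy, so a tighter measurement would time only the encoding step over many iterations. The sketch below is one way to do that. It uses synthetic data of about 1000 leaf values, and average_seconds is a made-up helper name. It compares json_encode alone against json_encode plus a validity check on every string, which isolates the cost of the "check" step. No numbers are claimed here, since they depend on the machine and PHP version.

```php
<?php
// Synthetic data shaped like $response; the values are made up.
$response = ['stat' => 'ok', 'contacts' => []];
for ($i = 0; $i < 333; $i++) {
    $response['contacts'][] = [
        'name'   => "User$i",
        'email'  => "user$i@domain.com",
        'number' => "1800-$i",
    ];
}

// Hypothetical helper: average wall-clock seconds per call over $runs calls.
function average_seconds(callable $fn, int $runs = 200): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Baseline: encode the array as-is.
$plain = average_seconds(function () use ($response) {
    json_encode($response);
});

// Same, but run a UTF-8 validity check on every string value first.
$checked = average_seconds(function () use ($response) {
    $invalid = 0;
    array_walk_recursive($response, function ($value) use (&$invalid) {
        if (is_string($value) && preg_match('//u', $value) !== 1) {
            $invalid++;
        }
    });
    json_encode($response);
});

printf("json_encode only:         %.6f s per call\n", $plain);
printf("validity check + encode:  %.6f s per call\n", $checked);
```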