I have a piece of code here which I need either assurance, or "no no no!" about in regards to if I'm thinking about this in the right or entirely wrong way.
This has to deal with cutting a variable of binary data at a specific spot, and also dealing with multi-byte overloaded functions. For example substr
is actually mb_substr
and strlen
is mb_strlen
etc.
Our server is set to UTF-8
internal encoding, and so theres this weird little thing I do to circumvent it for this binary data manipulation:
// $binary_data is the incoming variable with binary
// $clip_size is generally 16, 32 or 64 etc
$curenc = mb_internal_encoding();// this should be "UTF-8"
mb_internal_encoding('ISO-8859-1');// change so mb_ overloading doesnt screw this up
if (strlen($binary_data) >= $clip_size) {
$first_hunk = substr($binary_data,0,$clip_size);
$rest_of_it = substr($binary_data,$clip_size);
} else {
// skip since its shorter than expected
}
mb_internal_encoding($curenc);// put this back now
I can't really show input and output results, since its binary data. But tests using the above appear to be working just fine and nothing is breaking...
However, parts of my brain are screaming "what are you doing... this can't be the way to handle this"!
Notes:
So, I guess my question is:
MY SOLUTION TO THE WORRY
I dislike answering my own questions... but I wanted to share what I have decided on nonetheless.
Although what I had, "worked", I still wanted to change the hack-job-altering of the charset encoding. It was old code I admit, but for some reason, I never looked at hex2bin
bin2hex
for doing this. So I decided to change it to use those.
The resulting new code:
// $clip_size remains the same value for continuity later,
// only spot-adjusted here... which is why the *2.
$hex_data = bin2hex( $binary_data );
$first_hunk = hex2bin( substr($hex_data,0,($clip_size*2)) );
$rest_of_it = hex2bin( substr($hex_data,($clip_size*2)) );
if ( !empty($rest_of_it) ) { /* process the result for reasons */ }
Using the hex functions, turns the mess into something mb will not screw with either way. A 1 million bench loop, showed the process wasn't anything to be worried about (and its safer to run in parallel to itself than the mb_encoding mangle method).
So I'm going with this. It sits better in my mind, and resolves my question for now... until I revisit this old code again in a few years and go "what was I thinking ?!".
However, parts of my brain are screaming "what are you doing... this can't be the way to handle this"!
Your brain is right, you shouldn't be doing that in PHP in the first place. :)
Is this actually fine to be doing?
It depends the purpose of your code.
I can't see any reason of the top of my head to cut a binary like that. So my first instinct would be "no no no!" use unpack() to properly parse the binary into usable variables.
That being said if you just need to split your binary because reasons, then I guess this is fine. As long as your tests confirm that the code is working for you, I can't see any problem.
As a side note, I don't use mbstring overloading exactly for this kind of use case - i.e. for whenever you need the default string functions.