I have a large number of strings to process in php. I want to "fix" them to be title case (using ucwords(strtolower($str))
) but only if they are all upper or all lower case already. If they are already mixed case, I'd just rather just leave them as they are.
What is the fastest way to check for this? It seems like for
ing through the string would be a rather slow way to go about it.
Here's what I have, which I think will be too slow:
function fixCase($str)
{
$uc = 0;
$lc = 0;
for($i=0;$i<strlen($str);$i++)
{
if ($str[$i] >= 'a' && $str[$i] <= 'z')
$lc++;
else if ($str[$i] >= 'A' && $str[$i] <= 'Z')
$uc++;
}
if ($uc == 0 || $lc == 0)
{
return ucwords(strtolower($str));
}
}
just use a string compare (case sensitive)
function fixCase($str)
{
if (
(strcmp($str, strtolower($str)) === 0) ||
(strcmp($str, strtoupper($str)) === 0) )
{
$str = ucwords(strtolower($str));
}
return $str;
}
Wouldn't it be easier to check if the string = lowercase(string) or string = uppercase(string) and if so then leave it. Otherwise perform your operation.
There's not going to be any amazing optimization, because by the nature of the problem you need to look at every character.
Personally, I would just loop over the characters of the string with this sort of algorithm:
Edit: actual code, I think this is about as good as you're going to get.
// returns 0 if non-alphabetic char, 1 if uppercase, 2 if lowercase
function getCharType($char)
{
if ($char >= 'A' && $char <= 'Z')
{
return 1;
}
else if ($char >= 'a' && $char <= 'z')
{
return 2;
}
else
{
return 0;
}
}
function fixCase($str)
{
for ($i = 0; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != 0)
{
$firstCharType = $charType;
break;
}
}
for ($i = $i + 1; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != $firstCharType && $charType != 0)
{
return $str;
}
}
if ($firstCharType == 1) // uppercase, need to convert to lower first
{
return ucwords(strtolower($str));
}
else if ($firstCharType == 2) // lowercase, can just ucwords() it
{
return ucwords($str);
}
else // there were no letters at all in the string, just return it
{
return $str;
}
}
Well I decided to do a test of the 2 proposed answers thus far and my original solution. I wouldn't have thought the results would turn out this way, but I guess native methods are /that/ much faster over all.
Code:
function method1($str)
{
if (strcmp($str, strtolower($str)) == 0)
{
return ucwords($str);
}
else if (strcmp($str, strtoupper($str)) == 0)
{
return ucwords(strtolower($str));
}
else
{
return $str;
}
}
// returns 0 if non-alphabetic char, 1 if uppercase, 2 if lowercase
function getCharType($char)
{
if ($char >= 'A' && $char <= 'Z')
{
return 1;
}
else if ($char >= 'a' && $char <= 'z')
{
return 2;
}
else
{
return 0;
}
}
function method2($str)
{
for ($i = 0; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != 0)
{
$firstCharType = $charType;
break;
}
}
for ($i = $i + 1; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != $firstCharType && $charType != 0)
{
return $str;
}
}
if ($firstCharType == 1) // uppercase, need to convert to lower first
{
return ucwords(strtolower($str));
}
else if ($firstCharType == 2) // lowercase, can just ucwords() it
{
return ucwords($str);
}
else // there were no letters at all in the string, just return it
{
return $str;
}
}
function method0($str)
{
$uc = 0;
$lc = 0;
for($i=0;$i<strlen($str);$i++)
{
if ($str[$i] >= 'a' && $str[$i] <= 'z')
$lc++;
else if ($str[$i] >= 'A' && $str[$i] <= 'Z')
$uc++;
}
if ($uc == 0 || $lc == 0)
{
return ucwords(strtolower($str));
}
}
function test($func,$s)
{
$start = gettimeofday(true);
for($i = 0; $i < 1000000; $i++)
{
$s4 = $func($s);
}
$end = gettimeofday(true);
echo "$func Time: " . ($end-$start) . " - Avg: ".sprintf("%.09f",(($end-$start)/1000000))."
";
}
$s1 = "first String";
$s2 = "second string";
$s3 = "THIRD STRING";
test("method0",$s1);
test("method0",$s2);
test("method0",$s3);
test("method1",$s1);
test("method1",$s2);
test("method1",$s3);
test("method2",$s1);
test("method2",$s2);
test("method2",$s3);
Results:
method0 Time: 19.2899270058 - Avg: 0.000019290
method0 Time: 20.8679389954 - Avg: 0.000020868
method0 Time: 24.8917310238 - Avg: 0.00002489
method1 Time: 3.07466816902 - Avg: 0.000003075
method1 Time: 2.52559089661 - Avg: 0.000002526
method1 Time: 4.06261897087 - Avg: 0.000004063
method2 Time: 19.2718701363 - Avg: 0.000019272
method2 Time: 35.2485661507 - Avg: 0.000035249
method2 Time: 29.3357679844 - Avg: 0.000029336
Note that anything that looks only at [A-Z] will be incorrect as soon as there are accented or umlaut characters. Optimizing for speed is meaningless if the result is incorrect (hey, if the result doesn't have to be correct, it can write you a REALLY fast implementation...)
You could try the string case test function I posted here
function getStringCase($subject)
{
if (!empty($subject))
{
if (preg_match('/^[^A-Za-z]+$/', $subject))
return 0; // no alphabetic characters
else if (preg_match('/^[^A-Z]+$/', $subject))
return 1; // lowercase
else if (preg_match('/^[^a-z]+$/', $subject))
return 2; // uppercase
else
return 3; // mixed-case
}
else
{
return 0; // empty
}
}
If the reason you want to avoid fixing already mixed-case strings is for efficiency then you are likely wasting your time, convert every string no matter its current condition:
function fixCase($str)
{
return ucwords(strtolower($str));
}
I would be very surprised if it ran any slower than the accepted answer for strings the length of those you would generally want to title case, and it's one less condition you need to worry about.
If, however, there is good reason to avoid converting already mixed-case strings, for example you want to preserve some intended meaning in the casing, then yes, jcinacio's answer is certainly the simplest and very efficient.