PHP - Check whether a string contains only words shorter than 4 characters

I need to check whether a query string is made up solely of words of fewer than 4 characters, and if so, remove all of its whitespace.

So a string like "this has four character words or higher" would return FALSE.

A string like "hd 1 kit" would return TRUE, as no word in it is longer than 3 characters.

I'd try coding it, but I haven't the slightest clue how to write a regex for something like this.

I hope this simple solution helps.

Regex: /\b[a-zA-Z0-9]{4,}\b/

1. \b[a-zA-Z0-9]{4,}\b matches any word of four or more letters or digits; \b asserts a word boundary on each side.
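To see exactly which words this pattern flags, a small sketch can collect every match with preg_match_all(), which stores the full-pattern matches in $matches[0]. The helper name longWords() is hypothetical; an empty result means the string qualifies.

```php
<?php

// Hypothetical helper: returns every word of 4+ letters/digits found in $string
function longWords(string $string): array
{
    // preg_match_all() fills $matches[0] with all full-pattern matches
    preg_match_all('/\b[a-zA-Z0-9]{4,}\b/', $string, $matches);
    return $matches[0];
}

print_r(longWords("this has four character words or higher"));
print_r(longWords("hd 1 kit")); // empty array: no word has 4+ characters
```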

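<?php

$string1 = "this has four character words or higher";
$string2 = "hd 1 kit";

foreach ([$string1, $string2] as $string) {
    // A match means the string contains a word of 4 or more alphanumeric characters
    if (!preg_match_all("/\b[a-zA-Z0-9]{4,}\b/", $string)) {
        // Only short words: remove the spaces (prints "hd1kit" for $string2)
        echo str_replace(" ", "", $string);
    }
}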

You can do this with regex like @SahilGulati proposed, but it is probably more efficient to use explode():

$string = "this has four character words or higher";
$array = explode(" ", $string);
$success = true;
foreach ($array as $word) {
    // A single word of 4 or more characters disqualifies the string
    if (strlen($word) >= 4) {
        $success = false;
        break;
    }
}
if($success) {
    echo "ok";
} else {
    echo "nok";
}
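The same explode()-based check can be condensed by comparing the length of the longest word against the limit. This is a sketch with a hypothetical helper name, allWordsShorterThan(); it gives up the early break (array_map() measures every word) in exchange for shorter code. Note that strlen() counts bytes, so multibyte UTF-8 text would need mb_strlen() instead.

```php
<?php

// Hypothetical helper: true when every space-separated word is shorter than $limit bytes
function allWordsShorterThan(string $string, int $limit = 4): bool
{
    // explode() always returns at least one element, so max() never receives an empty array
    return max(array_map('strlen', explode(' ', $string))) < $limit;
}

$query = "hd 1 kit";
if (allWordsShorterThan($query)) {
    // Only short words: strip the spaces
    $query = str_replace(' ', '', $query);
}
echo $query; // hd1kit
```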

And here is a comparison of the regex and non-regex approaches (the non-regex version ran about 35% faster):

<?php
function noRegex() {
    $string = "this has four character words or higher";
    $array = explode(" ", $string);
    $success = true;
    foreach ($array as $word) {
        if (strlen($word) >= 4) {
            $success = false;
            break;
        }
    }
    return $success;
}
function regex() {
    $string = "this has four character words or higher";
    $success = false;
    if(!preg_match_all("/\b[a-zA-Z0-9]{4,}\b/", $string)) {
        $success = true;
    }
    return $success;
}

$before = microtime(true);
for($i=0; $i<2000000; $i++) {
    noRegex();
}
echo "no regex: ";
echo $noRegexTime = microtime(true) - $before;
echo "\n";

$before = microtime(true);
for($i=0; $i<2000000; $i++) {
    regex();
}
echo "regex: ";
echo $regexTime = microtime(true) - $before;
echo "\n";

echo "Not using regex is " . round((($regexTime / $noRegexTime) - 1) * 100, 2) . "% faster than using regex.";
?>
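For micro-benchmarks like the one above, hrtime() (available since PHP 7.3) is arguably a better clock than microtime(), because it reads a monotonic timer that is not affected by system clock adjustments. The sketch below wraps the timing loop in a hypothetical timeIt() helper and times condensed versions of the two checks; arrow functions require PHP 7.4+. Timings depend on the machine and PHP version, so no particular result is implied.

```php
<?php

// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
// hrtime(true) returns a monotonic timestamp in nanoseconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$string = "this has four character words or higher";

// Non-regex check: is the longest word shorter than 4 characters?
$noRegexTime = timeIt(fn() => max(array_map('strlen', explode(' ', $string))) < 4, 100000);
// Regex check: no run of 4+ alphanumerics anywhere in the string
$regexTime = timeIt(fn() => !preg_match('/[a-zA-Z0-9]{4,}/', $string), 100000);

printf("no regex: %.4fs, regex: %.4fs\n", $noRegexTime, $regexTime);
```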

If your string contains no punctuation marks, the most efficient way is to use strpos(). For this question, call the function with a $limit of 3:

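function checkWordsLength($string, $limit)
{
    $offset = 0;
    // Append a space so the last word is also followed by a delimiter
    $string .= ' ';

    // The distance between consecutive spaces is the length of the word between them
    while(($position = strpos($string, ' ', $offset)) !== false) {
        if (($position - $offset) > $limit) {
            return false;
        }

        $offset = $position + 1;
    }

    return true;
}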
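Under the same no-punctuation assumption, strtok() offers another non-regex way to walk the string word by word, since it skips empty tokens between consecutive spaces. This is a minimal sketch with a hypothetical name, wordsWithinLimit(); the loop compares against false with !== because a token such as "0" is falsy but is still a word.

```php
<?php

// Hypothetical helper: true when no space-separated token is longer than $limit bytes
function wordsWithinLimit(string $string, int $limit): bool
{
    // The first call passes the string; later calls continue from strtok()'s internal state
    $token = strtok($string, ' ');
    while ($token !== false) {
        if (strlen($token) > $limit) {
            return false;
        }
        $token = strtok(' ');
    }
    return true;
}

var_dump(wordsWithinLimit("hd 1 kit", 3));                                // bool(true)
var_dump(wordsWithinLimit("this has four character words or higher", 3)); // bool(false)
```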

When providing regex-based solutions, it is important that the answer deemed "best" is the most refined. This means providing the most accurate result; when there is a tie on accuracy, performance should be the next criterion, followed by pattern brevity if it comes to that.

For this reason, I am compelled to post an answer that is superior to the currently accepted answer. I will use the variable name that V_RocKs uses in a comment under ssc-hrep3's answer.

Code using the first sample string:

$query = "this has four character words or higher";
// Leave the string unchanged if it contains a run of 4+ non-space characters;
// otherwise remove the spaces
$query = preg_match("/[^ ]{4,}/", $query) ? $query : str_replace(" ", "", $query);
echo $query;

Output:

this has four character words or higher

Code using the second sample string:

$query = "hd 1 kit";
$query = preg_match("/[^ ]{4,}/", $query) ? $query : str_replace(" ", "", $query);
echo $query;

Output:

hd1kit
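str_replace(" ", ...) removes only the space character, while the question asks to remove all whitespace. A sketch that generalizes the same idea uses \S (any non-whitespace character) for the check and \s+ (spaces, tabs, newlines) for the removal; compactShortQuery() is a hypothetical name.

```php
<?php

// Hypothetical helper: strip all whitespace only when no run of 4+ non-whitespace
// characters exists; otherwise return the query unchanged
function compactShortQuery(string $query): string
{
    return preg_match('/\S{4,}/', $query) ? $query : preg_replace('/\s+/', '', $query);
}

echo compactShortQuery("hd\t1  kit"), "\n";                               // hd1kit
echo compactShortQuery("this has four character words or higher"), "\n"; // unchanged
```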

Not only is my regex pattern equally accurate, it is also shorter and more efficient (it requires fewer steps). For this question, the word boundaries (\b) are unnecessary, and they negatively impact performance by nearly 50%.

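After dropping the word boundaries from the pattern, there are several ways to target the desired substrings. For input made up only of letters, digits and spaces (like the sample strings), the following patterns behave identically and take the same number of steps; note that [^ ] also counts punctuation as part of a word: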

  • /[a-zA-Z0-9]{4,}/
  • /[a-z0-9]{4,}/i
  • /[a-z\d]{4,}/i
  • /[^ ]{4,}/
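A quick way to check that equivalence is to run all four patterns against the same inputs. In this sketch, patternResults() is a hypothetical helper, and "a.b, c" is an illustrative input added to show where [^ ] diverges: it treats "a.b," as a four-character word, while the alphanumeric classes do not.

```php
<?php

// Hypothetical helper: preg_match() result (1 = match, 0 = no match) for each pattern
function patternResults(array $patterns, string $sample): array
{
    return array_map(fn(string $p): int => preg_match($p, $sample), $patterns);
}

$patterns = ['/[a-zA-Z0-9]{4,}/', '/[a-z0-9]{4,}/i', '/[a-z\d]{4,}/i', '/[^ ]{4,}/'];

foreach (["this has four character words or higher", "hd 1 kit", "a.b, c"] as $sample) {
    echo json_encode($sample), " => ", implode(" ", patternResults($patterns, $sample)), "\n";
}
```

The first two samples give the same result for every pattern; only the punctuated input separates [^ ] from the others.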

My point is: readers don't come to SO in search of "meh, it's good enough" answers; they come here to draw inspiring and educational approaches from the vast knowledge base of the talented and diverse SO community. Let's strive for the best possible approach in every answer so that future readers can learn from our insights and discover all that coding languages have to offer.

When sub-optimal patterns are upvoted/green-ticked on SO, there is a missed opportunity to properly educate readers as to the best way to accomplish coding tasks.