Finding hashtags and tagged users in a string

I am working on a project and am trying to add the ability to detect hashtags and tagged users.

The problem is that I don't know how to make it stop reading when it reaches a symbol or emoji (except underscores), or how to keep the length from going over 20 characters.

For hashtags:

#HelloWorld -> helloworld, #Hello_W0rld -> hello_w0rld, #Hello(World -> hello

Also for tagged users (which only allow A-Z, a-z, 0-9 and _):

@HelloWorld -> helloworld, @Hello_W0rl.d -> hello_w0rl

My attempted code is (basically the same for users or hashtags):

$words = explode(" ", $body);

foreach ($words as $word) {
    if (substr($word, 0, 1) == "@") {
        $tagged_user = DB::query('SELECT id FROM users WHERE username=:username', array(':username' => ltrim($word, '@')))[0];
        $users .= $tagged_user['id'] . ",";
    }
}

$users = rtrim($users, ',');

Also, would it know not to save #% as an empty tag?

Edit: I updated it to this. Is this correct?

$postid = "test_id";
        $matches = [];
        preg_replace_callback("/#([a-z_0-9]+)/i", function($res) use(&$matches) {
            $matches[] = strtolower($res[1]);
        }, $body);

        $matches2 = [];

        $tagholder = array_fill(0, count($matches), "?");
        $tagholderString = implode(", ", $tagholder);

        foreach($matches as $tagstring){
            if(DB::query('SELECT * FROM tags WHERE tag=:tag', array(':tag' => $tagstring))){
                $tag = DB::query('SELECT * FROM tags WHERE tag=:tag', array(':tag' => $tagstring))[0];
                DB::query ( "INSERT INTO post_tags VALUES(:tagid, :postid)", array (':tagid' => $tag['id'], ':postid' => $postid) );
            }else{
                $id = hash('sha256', $tagstring);
                DB::query ( "INSERT INTO tags VALUES(:id, :tag, :mode)", array (':id' => $id, ':tag' => $tagstring, ':mode' => 0) );
                DB::query ( "INSERT INTO post_tags VALUES(:tagid, :postid)", array (':tagid' => $id, ':postid' => $postid) );
            }
        }

        preg_replace_callback("/@([a-z_0-9]+)/i", function($res) use(&$matches2) {
            $matches2[] = strtolower($res[1]);
        }, $body);

        $userholder = array_fill(0, count($matches2), "?");
        $userholderString = implode(", ", $userholder);
        $user_query = DB::query("SELECT * FROM users WHERE username IN (".$userholderString.")", $matches2);

        $users_result = "";
        foreach($user_query as $result){
            $users_result .= $result['id'].",";
        }
        $users_result = rtrim($users_result, ',');

        //User string result
        $users_result;

You can use preg_replace_callback() to pass each result into strtolower(). You need two patterns, one for each of your requirements. For hashtags:

/#([a-z_0-9]+)/i

And for tagged users:

/@([a-z_0-9]+)/i

With each of those you're asking for a starting @ or #, then one or more occurrences of either a letter, a number or an underscore, case-insensitive.

The resulting code looks like this:

$matches = [];
$string = "#HelloWorld -> helloworld, #Hello_W0rld -> hello_w0rld, #Hello(World -> hello,";

preg_replace_callback("/#([a-z_0-9]+)/i", function($res) use(&$matches) {
    $matches[] = strtolower($res[1]);
}, $string);

var_dump($matches);

$matches2 = [];
$string2 = "@HelloWorld -> helloworld, @Hello_W0rl.d -> hello_w0rl,";

preg_replace_callback("/@([a-z_0-9]+)/i", function($res) use(&$matches2) {
    $matches2[] = strtolower($res[1]);
}, $string2);

var_dump($matches2);

Result:

array (size=3)
0 => string 'helloworld' (length=10)
1 => string 'hello_w0rld' (length=11)
2 => string 'hello' (length=5)

array (size=2)
0 => string 'helloworld' (length=10)
1 => string 'hello_w0rl' (length=10)
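
The patterns above do not cap the length at 20 characters, which the question also asks for. A minimal sketch of one way to do that is a bounded quantifier, `{1,20}`, in place of `+`. The helper name `extractTags` and its `$maxLength` parameter are made up for illustration, and the sketch assumes that an over-long name should be truncated to its first 20 characters rather than rejected. It also drops duplicates and uses preg_match_all() instead of a callback.

```php
<?php
// Hypothetical helper (not from the original answer): returns the lowercase
// names that follow $prefix ('#' or '@') in $text.
// Assumption: names longer than $maxLength are truncated, not rejected.
function extractTags(string $text, string $prefix, int $maxLength = 20): array
{
    // {1,N} requires at least one allowed character, so "#%" never matches.
    $pattern = '/' . preg_quote($prefix, '/') . '([a-z0-9_]{1,' . $maxLength . '})/i';
    preg_match_all($pattern, $text, $found);

    // Lowercase every capture, drop duplicates, and reindex from 0.
    return array_values(array_unique(array_map('strtolower', $found[1])));
}

var_dump(extractTags('#HelloWorld #Hello(World #% #helloworld', '#'));
var_dump(extractTags('@Hello_W0rl.d', '@'));
```

Because the character class needs at least one match, a lone `#%` produces no entry at all, so nothing blank gets saved.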


As a side note, you shouldn't be doing a query for each tag found. This will rapidly grow out of control and may severely hamper your database performance. Since you have all the tags in an array, do just one query with a WHERE IN clause, something like this:

$placeholders = array_fill(0, count($matches), "?"); // get a ? for each match
$placeholdersString = implode(", ", $placeholders); // make it a string
DB::query("SELECT id FROM users WHERE username IN (".$placeholdersString.")", $matches); // bind each value
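
For a version of the WHERE IN idea that can be run on its own, here is a rough sketch. It assumes plain PDO with an in-memory SQLite database as a stand-in for the asker's `DB::query()` wrapper, whose implementation is not shown in the question. The function name `findUserIds` and the sample rows are invented for the example.

```php
<?php
// Assumption: PDO + in-memory SQLite replaces the unknown DB::query() wrapper.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)');
$pdo->exec("INSERT INTO users (username) VALUES ('helloworld'), ('hello_w0rl'), ('someone_else')");

// Hypothetical helper: look up all ids for the given usernames in one query.
function findUserIds(PDO $pdo, array $usernames): array
{
    if ($usernames === []) {
        return []; // an empty "IN ()" list is a SQL syntax error, so skip the query
    }

    // One "?" per username, e.g. "?, ?, ?"
    $placeholders = implode(', ', array_fill(0, count($usernames), '?'));
    $stmt = $pdo->prepare("SELECT id FROM users WHERE username IN ($placeholders) ORDER BY id");
    $stmt->execute(array_values($usernames)); // positional binding needs a 0-indexed list

    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

echo implode(',', findUserIds($pdo, ['helloworld', 'hello_w0rl', 'nobody'])); // prints 1,2
```

The early return matters in practice: a post with no mentions yields an empty array, and building the query anyway would produce `IN ()`.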

<?php

// Return every hashtag in $in, without the leading '#'.
function hashtag($in) {
    preg_match_all('/#(\w+)/', $in, $found);

    return $found[1]; // empty array when nothing matches
}

// Return every tagged username in $in, without the leading '@'.
function username($in) {
    preg_match_all('/@(\w+)/', $in, $found);

    return $found[1]; // empty array when nothing matches
}

$string = "#hash1 #hash2 @user1 @user2 #hash3";
var_dump(hashtag($string));
var_dump(username($string));
?>

Here are two functions I just wrote that use regex to extract the hashtags and usernames. \w matches letters, digits and the underscore, so matching stops at any other symbol. Hope it helps.