I am working on a project and am trying to add the ability to detect hashtags and tagged users.
The problem is that I don't know how to make it stop reading when it reaches a symbol or emoji (except an underscore), or how to keep the length from going over 20 characters.
For hashtags:
#HelloWorld -> helloworld, #Hello_W0rld -> hello_w0rld, #Hello(World -> hello,
Also for tagged users (which only allow A-Z, a-z, 0-9 and _):
@HelloWorld -> helloworld, @Hello_W0rl.d -> hello_w0rl
My attempted code is below (it is basically the same for users and hashtags):
$users = "";
$words = explode(" ", $body);
foreach ($words as $word) {
    if (substr($word, 0, 1) == "@") {
        // Look up the tagged user by the name that follows the @
        $tagged_user = DB::query('SELECT id FROM users WHERE username=:username', array(':username' => ltrim($word, '@')))[0];
        $users .= $tagged_user['id'] . ",";
    }
}
$users = rtrim($users, ',');
Also, would it know not to save #% as a blank space?
Edit: I updated it to this. Is this correct?
$postid = "test_id";
$matches = [];
preg_replace_callback("/#([a-z_0-9]+)/i", function($res) use(&$matches) {
$matches[] = strtolower($res[1]);
}, $body);
$matches2 = [];
$tagholder = array_fill(0, count($matches), "?");
$tagholderString = implode(", ", $tagholder);
foreach ($matches as $tagstring) {
    // Run the lookup once and reuse the result
    $rows = DB::query('SELECT * FROM tags WHERE tag=:tag', array(':tag' => $tagstring));
    if ($rows) {
        $tag = $rows[0];
        DB::query("INSERT INTO post_tags VALUES(:tagid, :postid)", array(':tagid' => $tag['id'], ':postid' => $postid));
    } else {
        // The algorithm name must be a quoted string
        $id = hash('sha256', $tagstring);
        DB::query("INSERT INTO tags VALUES(:id, :tag, :mode)", array(':id' => $id, ':tag' => $tagstring, ':mode' => 0));
        DB::query("INSERT INTO post_tags VALUES(:tagid, :postid)", array(':tagid' => $id, ':postid' => $postid));
    }
}
preg_replace_callback("/@([a-z_0-9]+)/i", function($res) use(&$matches2) {
$matches2[] = strtolower($res[1]);
}, $body);
$userholder = array_fill(0, count($matches2), "?");
$userholderString = implode(", ", $userholder);
$user_query = DB::query("SELECT * FROM users WHERE username IN (".$userholderString.")", $matches2);
$users_result = "";
foreach($user_query as $result){
$users_result .= $result['id'].",";
}
$users_result = rtrim($users_result, ',');
//User string result
$users_result;
You can use preg_replace_callback() to pass each result into strtolower(). You need two patterns, one for each of your requirements. For hashtags:
/#([a-z_0-9]+)/i
And for tagged users:
/@([a-z_0-9]+)/i
With each of those you're asking for a starting @ or #, then one or more occurrences of a letter, a number or an underscore, case-insensitively.
The resulting code looks like this:
$matches = [];
$string = "#HelloWorld -> helloworld, #Hello_W0rld -> hello_w0rld, #Hello(World -> hello,";
preg_replace_callback("/#([a-z_0-9]+)/i", function($res) use(&$matches) {
$matches[] = strtolower($res[1]);
}, $string);
var_dump($matches);
$matches2 = [];
$string2 = "@HelloWorld -> helloworld, @Hello_W0rl.d -> hello_w0rl,";
preg_replace_callback("/@([a-z_0-9]+)/i", function($res) use(&$matches2) {
$matches2[] = strtolower($res[1]);
}, $string2);
var_dump($matches2);
Result:
array (size=3)
0 => string 'helloworld' (length=10)
1 => string 'hello_w0rld' (length=11)
2 => string 'hello' (length=5)
array (size=2)
0 => string 'helloworld' (length=10)
1 => string 'hello_w0rl' (length=10)
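The patterns above do not enforce the 20-character limit mentioned in the question. A minimal sketch of one way to add it, assuming that over-long names should be cut off at 20 characters rather than rejected: replace the `+` quantifier with `{1,20}`. The helper name `extractTags` and its parameters are hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper: extract lowercase names that follow $prefix ('#' or '@'),
// keeping at most $maxLen characters of each name (assumption: truncate, not reject).
function extractTags(string $text, string $prefix, int $maxLen = 20): array
{
    // {1,N} means "between 1 and N allowed characters", so a bare "#%" never matches.
    $pattern = '/' . preg_quote($prefix, '/') . '([a-z0-9_]{1,' . $maxLen . '})/i';
    preg_match_all($pattern, $text, $found);

    // $found[1] holds the captured names; lowercase them and drop duplicates.
    return array_values(array_unique(array_map('strtolower', $found[1])));
}

echo implode(',', extractTags('#HelloWorld #Hello(World #% #hello', '#')), PHP_EOL;
// prints: helloworld,hello
```

Because the character class requires at least one letter, digit or underscore, input such as `#%` produces no entry at all, so nothing blank is saved.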
As a side note, you shouldn't run a query for each tag found. The number of queries grows with the number of tags and may severely hamper your database performance. Since you have all the tags in an array, do just one query with a WHERE IN clause, something like this:
$placeholders = array_fill(0, count($matches), "?"); // get a ? for each match
$placeholdersString = implode(", ", $placeholders); // make it a string
DB::query("SELECT id FROM users WHERE username IN (".$placeholdersString.")", $matches); // bind each value
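One case the snippet above does not cover is a post with no matches at all: the placeholder string is then empty, and MySQL rejects an empty `IN ()` list as a syntax error. A small sketch of a guard, assuming the query wrapper accepts a SQL string plus an array of values as in the code above; `buildUserLookup` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: build the SQL and parameter list for a WHERE ... IN lookup.
// Returns null when there is nothing to look up, so the caller can skip the query.
function buildUserLookup(array $usernames): ?array
{
    // Remove duplicates so each name is bound only once.
    $usernames = array_values(array_unique($usernames));
    if (count($usernames) === 0) {
        return null;
    }

    // One "?" per username, joined into "?, ?, ?".
    $placeholders = implode(', ', array_fill(0, count($usernames), '?'));

    return [
        'sql'    => "SELECT id FROM users WHERE username IN ($placeholders)",
        'params' => $usernames,
    ];
}

$lookup = buildUserLookup(['helloworld', 'hello_w0rl']);
if ($lookup !== null) {
    echo $lookup['sql'], PHP_EOL;
    // Pass $lookup['sql'] and $lookup['params'] to your own query wrapper here.
}
```

Returning the SQL and the values together keeps the placeholder count and the bound array in sync, which is where the two are most likely to drift apart.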
<?php
// Return every hashtag in $in, without the leading #
function hashtag($in) {
    preg_match_all('/#(\w+)/', $in, $found);
    // $found[1] holds the captured names; it is an empty array when nothing matches
    return $found[1];
}
// Return every tagged username in $in, without the leading @
function username($in) {
    preg_match_all('/@(\w+)/', $in, $found);
    return $found[1];
}
$string = "#hash1 #hash2 @user1 @user2 #hash3";
var_dump(hashtag($string));
var_dump(username($string));
?>
Here are two functions I just wrote that use regex to extract the hashtags and usernames. Hope it helps.