For my users I need to present a screen where they can enter multiple domain names in a textarea. They can put the domain names on separate lines, or separate them with spaces or commas (maybe even semicolons, I don't know!).
I need to parse the input and identify the individual domain names with their extension (which will be .com; anything else can be ignored).
User input can look like:
asdf.com
qwer.com
AND/OR
wqer.com, gwew.com
AND/OR
ertert.com gdfgdf.com
No one should input a three-level domain like www.abczone.com, but if they do I'm only interested in extracting the abczone.com part. (I can use a separate regex to verify/extract that from each match.)
This will do it:
(\b[a-zA-Z][a-zA-Z0-9-]*)(?=\.com\b)
"Find all sequences of a letter followed by letters, digits, or hyphens, followed by .com then a word break."
(You need the last bit to protect against picking up bim.com from bim.command.com.)
Python test case because I don't have a PHP test environment to hand:
import re
# Triple quotes keep the line break that a textarea would submit.
DATA = """asdf.com
x-123.com, gwew.com bim.command.com 123.com, x_x.com"""
print(re.findall(r'(\b[a-zA-Z][a-zA-Z0-9-]*)(?=\.com\b)', DATA))
# Prints ['asdf', 'x-123', 'gwew', 'command']
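Because .com is matched in a lookahead, the results contain only the name part. As a minimal sketch of turning them back into full domains, assuming you also want them lowercased and de-duplicated (both my assumptions, as is the helper name extract_com_domains), something like this should work. It also shows that a three-level name such as www.abczone.com comes out as abczone.com without a second regex:

```python
import re

# Pattern from the answer above: a letter, then letters/digits/hyphens,
# followed (via lookahead) by ".com" and a word break.
COM_NAME = re.compile(r'(\b[a-zA-Z][a-zA-Z0-9-]*)(?=\.com\b)')

def extract_com_domains(text):
    """Hypothetical helper: unique lowercased .com domains, in order of first appearance."""
    seen = []
    for name in COM_NAME.findall(text):
        domain = name.lower() + ".com"  # the lookahead leaves ".com" out of the match
        if domain not in seen:          # assumption: duplicates are not wanted
            seen.append(domain)
    return seen

print(extract_com_domains("www.abczone.com; ASDF.com asdf.com"))
# ['abczone.com', 'asdf.com']
```

The same pattern should carry over to PHP's preg_match_all, but I have only run it in Python.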
Here it is. You can use the i modifier and drop the uppercase A-Z ranges if you want to:
\b([a-zA-Z][0-9a-zA-Z\-]{1,62})\.com\b
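A quick way to try this variant, sketched in Python rather than PHP, with re.IGNORECASE standing in for the i modifier. The function name find_names and the sample input are mine. Note that {1,62} means the name must be 2 to 63 characters long, so a one-letter name like x.com is skipped:

```python
import re

# Second pattern with the uppercase ranges dropped; re.IGNORECASE
# plays the role of the "i" modifier.
PATTERN = re.compile(r'\b([a-z][0-9a-z\-]{1,62})\.com\b', re.IGNORECASE)

def find_names(text):
    """Hypothetical helper: return the captured name part (without ".com") of each match."""
    return PATTERN.findall(text)

print(find_names("ASDF.com x.com bim.command.com"))
# ['ASDF', 'command']
```

Here .com is part of the match itself instead of a lookahead, but the capture group still returns only the name part.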