PHP regex to scrape a street address that contains line breaks

I've been searching for two days now with Google, and a lot here on SO, but I can't solve this regex preg_match problem. I simply want to scrape a street address, and normally I can do this easily, but because some street addresses have a line break in the middle of them followed by around 25 characters of whitespace, my code displays an empty array or just NULL.

Below I have included the source code to show an example of what I'm trying to scrape, and also the failed code I have so far. Any help from someone with more experience than I have would be greatly appreciated this Sunday morning.

Sample of source code here:

<span style="font-size:14px;">736 
                  E 17th St</span><br />

My attempt so far:

$new_data = file_get_contents('someURLaddress');

$street_address_regex = '~14px\;\"\>(.*?)\<\/span\>\<br\s\/\>\s~s';

preg_match($street_address_regex,$new_data,$extracted_street_address);

var_dump ($extracted_street_address);

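I'm only answering because using a dot here is bad practice. The giveaway that a regular expression is doing more work than it should is the Single-Line option (the s flag). A lazy .*? under that flag has to check for the closing tag at every single character, it will happily run across tag boundaries, and it is bound to break at some point when the markup changes.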

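This is almost certainly what you need to use. The negated class [^<] matches newlines too, so the line break inside the address is not a problem: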

$street_address_regex = '~14px;">([^<]*)~i';
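A minimal sketch of that pattern applied to the sample from the question. The inline $new_data string stands in for the downloaded page, and collapsing the whitespace with preg_replace and trim is my own assumption about how you want the final address to look:

```php
<?php
// Sample markup from the question: the address is split across two lines.
$new_data = "<span style=\"font-size:14px;\">736 \n                  E 17th St</span><br />";

// [^<] matches any character except '<', newlines included, so no 's' flag is needed.
$street_address_regex = '~14px;">([^<]*)~i';

$street_address = null;
if (preg_match($street_address_regex, $new_data, $matches)) {
    // Collapse the line break and the padding into single spaces.
    $street_address = trim(preg_replace('~\s+~', ' ', $matches[1]));
}

var_dump($street_address); // string(13) "736 E 17th St"
```

Checking the return value of preg_match before touching $matches avoids the empty array / NULL confusion when a page has no address at all.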

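Or, if you are (for some reason) expecting a < as a legitimate character, whether as a less-than sign or as part of formatting tags like bold or italics, then you can do this. The inner group is non-capturing and wrapped in one outer capture, because a repeated capturing group only keeps its last iteration. The captured string will end with one extra <:

$street_address_regex = '~14px;">((?:[^<]*<)*?)\/span~i';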

And if it bothers you enough that you don't want to have to strip that trailing < from the captured string, you can do this:

$street_address_regex = '~14px;">((?:[^<]*(?(?!<\/span)<))*)~i';
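Here is a rough sketch of how the conditional version behaves when there is markup inside the span. The <b> tag in the sample is hypothetical (it is not in the question's source), and running strip_tags on the result is just one assumed way to get back to plain text:

```php
<?php
// Assumed sample: the question's markup with a hypothetical <b> tag inside the span.
$new_data = '<span style="font-size:14px;">736 <b>E</b> 17th St</span><br />';

// The conditional consumes a '<' only when it does not start '</span'.
$street_address_regex = '~14px;">((?:[^<]*(?(?!<\/span)<))*)~i';

$inner_html = null;
$street_address = null;
if (preg_match($street_address_regex, $new_data, $matches)) {
    $inner_html = $matches[1];                  // inner markup, tags included
    $street_address = strip_tags($inner_html);  // plain text only
}

var_dump($inner_html);     // string(20) "736 <b>E</b> 17th St"
var_dump($street_address); // string(13) "736 E 17th St"
```

The match stops right before </span, so there is no trailing < to clean up afterwards.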

But honestly, you shouldn't even be using Regex. Find the stripos of <span style="font-size:14px;"> and add its length (to get the address starting point). Then find the stripos of </span>, passing the starting point as the offset (to get the address ending point). Subtract the two to get the length. Then pull the substr using the original string, the start index, and the length.

Sounds like a lot, but make that a small function that you use instead of Regex, and just input the OriginalString, StartString, and EndString... then return the contents between StartString and EndString using the method I just said. Make the function re-usable.

With that function, that portion of your code should run noticeably faster. Regex is powerful as hell for patterns, but you don't have a pattern: you have two static strings and you want the contents between them. Plain string functions are much cheaper than a regex engine for static string manipulation... Especially compared to using the Dot with Single-Line ~Shiver~

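$Input = '<span style="font-size:14px;">736 E 17th St</span><br />';
echo GetBetween($Input, '14px;">', '</span'); // 736 E 17th St

// Returns the text between $StartStr and $EndStr (case-insensitive), or null if either is missing.
function GetBetween($OrigStr, $StartStr, $EndStr) {
    $StartPos = stripos($OrigStr, $StartStr);
    if ($StartPos === false) { return null; }
    $StartPos += strlen($StartStr);
    $EndPos = stripos($OrigStr, $EndStr, $StartPos);
    if ($EndPos === false) { return null; }
    return substr($OrigStr, $StartPos, $EndPos - $StartPos);
}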
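
The sample input above has no line break in it, so here is a sketch of the same stripos/substr idea run against the multi-line markup from the question. The function name getBetweenNormalized is hypothetical, and the whitespace clean-up is an assumption about the output you want. It does use one tiny preg_replace for that clean-up, but only on the short extracted string, not on the whole page:

```php
<?php
// Hypothetical helper: returns the text between two markers with runs of
// whitespace collapsed, or null when either marker is missing.
function getBetweenNormalized(string $haystack, string $start, string $end): ?string
{
    $startPos = stripos($haystack, $start);
    if ($startPos === false) {
        return null;
    }
    $startPos += strlen($start);

    // Search for the end marker only after the start marker.
    $endPos = stripos($haystack, $end, $startPos);
    if ($endPos === false) {
        return null;
    }

    $raw = substr($haystack, $startPos, $endPos - $startPos);
    return trim(preg_replace('~\s+~', ' ', $raw));
}

// Sample markup from the question, line break and padding included.
$html = "<span style=\"font-size:14px;\">736 \n                  E 17th St</span><br />";

$address = getBetweenNormalized($html, '14px;">', '</span');
$missing = getBetweenNormalized($html, '16px;">', '</span');

var_dump($address); // string(13) "736 E 17th St"
var_dump($missing); // NULL
```

Returning null for a missing marker keeps a page without an address from silently producing a garbage substring.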