I'm not very comfortable with RegEx.
I use three variables, namely $url, $pattern and $replacement, and intend to use them as follows:
$url = $node->attr("href");
$resource = ExtractResourceWithoutHtmlExtension($url); // This is just to abstract stripping off the prepended path and cutting the `.html` (see Edit 2 & 3 below).
$pattern = ...
$replacement = '${1}'; // Not very sure of this value
$partno = preg_replace($pattern, $replacement, $resource);
echo '"'.$partno.'";"'.$node->attr("title").'";"'.$url.'"'."\n";
35000-0295 => designation-of-the-products-as-slug-35000-0295
27021-0012 => designation-of-the-products-as-slug-27021-0012
38811 => designation-of-the-products-as-slug-38811
Last but not least, the edge case (nothing to extract): when no part number is available, the resource substring is simply
designation-of-the-products-as-slug
I still prefer a RegEx solution because the length of the numbers within the segments constituting the part number may vary.
What should I assign to $pattern and $replacement?
The substring designation-of-the-products-as-slug is mutable, and path/to/ could be of arbitrary depth.
On second thought, I realise there is no need to use RegEx for the whole URL path: http://path/to/ can be stripped off using parse_url, explode and array_pop. I have edited my post accordingly.
The complexity can also be reduced by cutting off the immutable trailing substring .html. Cf. @bloodyKnuckles's comment below. Post edited accordingly.
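For illustration, here is a minimal sketch of what the helper described in the two edits above might look like. The function body is my assumption (only its name appears in the post); it relies on nothing beyond parse_url, explode, array_pop and substr, and the example.com URL is a placeholder.

```php
<?php
// Hypothetical implementation of the helper named in the question.
function ExtractResourceWithoutHtmlExtension(string $url): string
{
    // Keep only the path component; parse_url() may return null or false.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }

    // Split on "/" and take the last segment, whatever the depth of path/to/.
    $segments = explode('/', $path);
    $resource = array_pop($segments);

    // Cut the trailing ".html" only when it is actually present.
    if (substr($resource, -5) === '.html') {
        $resource = substr($resource, 0, -5);
    }

    return $resource;
}

// Placeholder URL, used only to exercise the sketch.
echo ExtractResourceWithoutHtmlExtension(
    'http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.html'
), "\n";
```

Because parse_url is asked for PHP_URL_PATH only, a query string or fragment after the file name does not end up in the resource.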
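To start with, I'd use a combination of parse_url and pathinfo to strip the extraneous bits from the string, then use preg_filter with a regex like /.*?(\d+[\d-]*)$/ to grab the last chunk of digits plus any hyphens and digits that follow it up to the end of the string. Unlike preg_replace, preg_filter returns null when the pattern does not match, which covers the edge case with no part number.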
$urls = [
    "http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.extension",
    "http://example.com/path/to/designation-of-the-products-as-slug-35000.html",
    "http://example.com/path/to/designation-of-the-products-as-slug.ext?foo=bar.baz"
];
$regex = '/.*?(\d+[\d-]*)$/';
foreach ($urls as $url) {
    $resource = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
    echo preg_filter($regex, '$1', $resource), "\n";
}
35000-0295
35000
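If you would rather keep the $pattern / $replacement shape from the question, the same regex can be plugged into preg_replace, with one caveat: preg_replace hands back the subject unchanged when nothing matches, so the edge case would yield the whole slug instead of nothing. The sketch below contrasts the two and adds a small preg_match wrapper; extractPartNumber is a hypothetical name of mine, not something from the question.

```php
<?php
$regex = '/.*?(\d+[\d-]*)$/';

// Hypothetical helper: returns the part number, or null when there is none.
function extractPartNumber(string $resource, string $regex): ?string
{
    // preg_match() returns 1 on a match; $m[1] holds the captured group.
    return preg_match($regex, $resource, $m) === 1 ? $m[1] : null;
}

$withNumber    = 'designation-of-the-products-as-slug-27021-0012';
$withoutNumber = 'designation-of-the-products-as-slug';

// No match: preg_replace() returns the subject as-is ...
var_dump(preg_replace($regex, '$1', $withoutNumber));
// ... whereas preg_filter() returns null.
var_dump(preg_filter($regex, '$1', $withoutNumber));

// The wrapper makes the "nothing to extract" case explicit.
var_dump(extractPartNumber($withNumber, $regex));
var_dump(extractPartNumber($withoutNumber, $regex));
```

Whichever function you pick, checking for null (or comparing the result against the input) lets you decide what to print for products without a part number.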