regex / preg_replace to extract a part number (substring)

I'm not very comfortable with RegEx.


The Use Case

I use three variables, namely $url, $pattern and $replacement and intend to use them as follows:

$url = $node->attr("href");

$resource = ExtractResourceWithoutHtmlExtension($url); // This just abstracts stripping off the prepended path and cutting the `.html` (see Edit 2 & 3 below).

$pattern =  ...
$replacement = '${1}'; // Not very sure of this value

$partno = preg_replace($pattern, $replacement, $resource);

echo '"'.$partno.'";"'.$node->attr("title").'";"'.$url.'"'."\n";

The Part number to Resource mapping scheme (string)

  • most of the time

35000-0295 => designation-of-the-products-as-slug-35000-0295

27021-0012 => designation-of-the-products-as-slug-27021-0012

  • or rarely

38811 => designation-of-the-products-as-slug-38811

  • last but not least (edge case => nothing to extract)

When no Part number is available, the Resource substring is simply

designation-of-the-products-as-slug

I still prefer a RegEx solution because the number of digits in each segment of the Part number may vary.


The Question

What should I assign to $pattern and $replacement?


Edit 1 (for reference)

The substring designation-of-the-products-as-slug is mutable, and path/to/ could be of any arbitrary depth.

Edit 2 (for reference)

On second thought, I realise that there is no need to use RegEx for the whole URL path: http://path/to/ could be stripped off using parse_url, explode and array_pop. I have edited my post accordingly.

Edit 3 (for reference)

The complexity could also be reduced by cutting the immutable trailing substring .html. Cf. @bloodyKnuckles's comment below. Post edited accordingly.
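
For reference, the helper described in Edit 2 and Edit 3 could look roughly like the following. This is only a sketch: the post never shows the body of ExtractResourceWithoutHtmlExtension, so the implementation and the example.com URL are assumptions.

```php
<?php
// Hypothetical body for the helper named in the question; the original
// post does not show how it is implemented.
function ExtractResourceWithoutHtmlExtension(string $url): string
{
    // Keep only the path component, dropping scheme, host and query string.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // The last path segment is the resource, whatever the depth of path/to/.
    $segments = explode('/', $path);
    $resource = (string) array_pop($segments);

    // Cut the trailing ".html" only when it is actually present.
    return (string) preg_replace('/\.html$/', '', $resource);
}

// Assumed example URL, modelled on the mapping scheme above.
echo ExtractResourceWithoutHtmlExtension(
    'http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.html'
), "\n";
// prints: designation-of-the-products-as-slug-35000-0295
```

With the path and extension gone, $pattern only has to deal with the slug and the optional trailing Part number.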

To start with I'd use a combination of parse_url and pathinfo to strip off extraneous bits from the string, then use preg_filter with a regex like /.*?(\d+[\d-]*)$/ to grab the last chunk of digits plus optional following hyphens and digits.


Example:

$urls = [
    "http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.extension",
    "http://example.com/path/to/designation-of-the-products-as-slug-35000.html",
    "http://example.com/path/to/designation-of-the-products-as-slug.ext?foo=bar.baz"
];

$regex = '/.*?(\d+[\d-]*)$/';

foreach ($urls as $url) {
    $resource = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
    echo preg_filter($regex, '$1', $resource), "\n";
}

Output:

35000-0295
35000
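
The third URL prints an empty line because preg_filter returns null when nothing matches, whereas preg_replace would hand back the subject unchanged. The sketch below shows that difference on the edge case from the question, and wraps the same regex in preg_match so the "no Part number" case is explicit. The function name extractPartNumber is hypothetical and not part of the original post.

```php
<?php
$regex = '/.*?(\d+[\d-]*)$/';
$slug  = 'designation-of-the-products-as-slug'; // edge case: nothing to extract

// preg_replace leaves a non-matching subject untouched ...
var_dump(preg_replace($regex, '$1', $slug) === $slug); // bool(true)
// ... while preg_filter reports "no match" by returning null.
var_dump(preg_filter($regex, '$1', $slug));            // NULL

// Hypothetical helper: returns the Part number, or null when there is none.
function extractPartNumber(string $resource): ?string
{
    // Same pattern as above; group 1 is the trailing run of digits and hyphens.
    if (preg_match('/.*?(\d+[\d-]*)$/', $resource, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(extractPartNumber('designation-of-the-products-as-slug-38811')); // string(5) "38811"
var_dump(extractPartNumber($slug));                                        // NULL
```

Returning null rather than an empty string lets the caller tell "no Part number" apart from a malformed one before writing the CSV line.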