Morphing a string to fit a pattern

I'm working on an issue where users (truck drivers in this case) use SMS to send in information about work status. I want to keep the keying simple, as not all users have smartphones, so I have adopted some simple short codes for their input. Here are some examples and their meanings:

  • P#123456-3 (This is for picking up load 123456-3)
  • D#456789-1 (For the dropping of load 456789-1)
  • L#345678-9 (Load 345678-9 is going to be late)
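For reference, the canonical codes above fit a single anchored regular expression, which is the check every "deviant" message fails. This is a sketch that assumes a load number is always six digits, a hyphen and one digit, as in the examples; `is_canonical` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true only for exact canonical messages such as 'P#123456-3'.
// Assumes load numbers are six digits, a hyphen, then one check digit.
function is_canonical($msg) {
    return preg_match('/^[PDL]#\d{6}-\d$/', $msg) === 1;
}

var_dump(is_canonical('D#456789-1'));    // bool(true)
var_dump(is_canonical('D# 456789 - 1')); // bool(false)
```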

This is pretty simple, but users (and truck drivers), being what they are, will key in updates in somewhat deviant ways, such as:

  • #D 456789-1
  • D# 456789 - 1
  • D#.456789-1 This load looks wet to me do weneed to cancelthis order

You can probably come up with a dozen other permutations, and it's not hard for me to catch and fix the ones I can imagine.

I mostly use regular expressions to test the input against all my imagined "bad" patterns and then extract what I assume are the good parts, reassembling them into the correct order.
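That approach can be kept manageable by storing the known bad patterns in one table, each capturing the same three pieces so they can be reassembled the same way. A minimal sketch: the patterns cover only the variants listed above, and `normalise_known` is a hypothetical name.

```php
<?php
// Hypothetical sketch: try each known "bad" pattern in turn and rebuild the canonical form.
// Every pattern captures (1) the status letter, (2) the six-digit load, (3) the check digit.
function normalise_known($msg) {
    $patterns = array(
        '/^([PDL])#(\d{6})-(\d)$/',          // already canonical: P#123456-3
        '/^#([PDL])\s*(\d{6})-(\d)$/',       // hash first: #D 456789-1
        '/^([PDL])#\s*(\d{6})\s*-\s*(\d)$/', // stray spaces: D# 456789 - 1
        '/^([PDL])#\.(\d{6})-(\d)\b/',       // stray dot, possibly followed by a comment
    );
    foreach ($patterns as $p) {
        if (preg_match($p, $msg, $m)) {
            return $m[1] . '#' . $m[2] . '-' . $m[3];
        }
    }
    return null; // a new, unanticipated variant
}

echo normalise_known('D# 456789 - 1'), "\n"; // D#456789-1
```

Every new variant still means another table entry, which is exactly the maintenance problem described next.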

It's the new errors that cause me problems, so I got to wondering if there was a more generic method where I can pass a "pattern" and a "message" to a function that would do its best to turn the "message" into something matching the "pattern".

My searches have not found anything that really fits what I'm trying to do and I'm not even sure if there is a good general way to do this. I happen to be using PHP for this implementation but any type of example should help. Do any of you have a method?
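One generic way to approach this is to treat the pattern as a template: placeholder characters stand for "a letter" or "a digit", and everything else is punctuation the program can supply itself. The function then pulls letters and digits out of the message in order and pours them into the template. This is a sketch under stated assumptions: `A` means one letter, `9` means one digit, `fit_to_pattern` is a hypothetical name, and whatever is left over is treated as a free-text comment.

```php
<?php
// Hypothetical sketch: 'A' = one letter, '9' = one digit; any other pattern
// character is a literal that the function inserts itself.
// Returns array(normalised, comment), or null if the message runs out first.
function fit_to_pattern($pattern, $message) {
    $out = '';
    $pos = 0;
    $len = strlen($message);
    for ($i = 0; $i < strlen($pattern); $i++) {
        $want = $pattern[$i];
        if ($want !== 'A' && $want !== '9') {
            $out .= $want; // literal: supply it rather than look for it
            continue;
        }
        while ($pos < $len) {
            $ch = $message[$pos];
            if ($want === 'A' ? ctype_alpha($ch) : ctype_digit($ch)) {
                break;
            }
            $pos++; // skip characters that can't fill this slot
        }
        if ($pos >= $len) {
            return null; // message too short to fill the pattern
        }
        $out .= strtoupper($message[$pos++]);
    }
    return array($out, trim(substr($message, $pos)));
}

list($code, $comment) = fit_to_pattern('A#999999-9', 'D#.456789-1 This load looks wet to me');
echo $code, "\n"; // D#456789-1
```

The trade-off is that it is greedy and forgiving: if a driver keys a load number that is one digit short, it will borrow a digit from the comment, so the result still needs validating before it is trusted.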

Try something like this:

function parse($input) {
    // Clean up the input: 'D#.456789 - 1 foo bar' becomes 'D 456789 1 foo bar'
    $clean = trim(preg_replace('/\W+/', ' ', $input));
    // Take the first 3 words; array_pad avoids an undefined-offset warning on short input.
    list($status, $loadId1, $loadId2) = array_pad(explode(' ', $clean), 3, '');
    // Glue the load ID back together: '456789-1'
    $loadId = $loadId1 . '-' . $loadId2;
    return compact('status', 'loadId');
}

Example:

$inputs = array(
    'P#123456-3',
    '#D 456789-1',
    'D# 456789 - 1',
    'D#.456789-1 This load looks wet to me do weneed to cancelthis order',
);
echo '<pre>';
foreach ($inputs as $s) {
    print_r(parse($s));
}

Output:

Array
(
    [status] => P
    [loadId] => 123456-3
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
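The parse() above trusts whatever the first three words are, so a message like "Running late" comes back with a status of "Running". A slightly stricter sketch (hypothetical name `parse_strict`) uses the same clean-up step but checks the pieces before accepting them and returns null otherwise. Treating P, D and L as the only valid codes, and six digits plus one as the only valid load shape, are assumptions taken from the question's examples.

```php
<?php
// Hypothetical stricter variant: same clean-up, but validate before accepting.
function parse_strict($input) {
    // 'D#.456789 - 1 foo bar' becomes 'D 456789 1 foo bar'
    $clean = trim(preg_replace('/\W+/', ' ', $input));
    $words = array_pad(explode(' ', $clean), 3, '');
    $status = strtoupper($words[0]);
    // Assumes only P, D and L are valid, and a load is six digits plus one digit.
    if (!in_array($status, array('P', 'D', 'L'), true)
        || !preg_match('/^\d{6}$/', $words[1])
        || !preg_match('/^\d$/', $words[2])) {
        return null; // e.g. reply to the driver asking them to resend
    }
    return array('status' => $status, 'loadId' => $words[1] . '-' . $words[2]);
}

var_dump(parse_strict('Running late')); // NULL
```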

If the user has problems with your software, fix the software, not the user!

The problem arises because your format looks unnecessarily complicated. Why do you need the hash in the first place? How about simplifying it to the following:

 operation-code maybe-space load-number maybe-space and comment

Operation codes are assigned to phone keys rather than to letters, so J, K and L (all on the 5 key) mean the same thing. Load numbers can be sent as digits or as letters, e.g. agja means 2452. With this format it's hard for the user to make a mistake.

Here's some code to illustrate this approach:

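function parse($msg) {

    // Phone-key digit => operation code
    $codes = array(
        3 => 'DROP',
        5 => 'LOAD',
        // etc
    );

    // One key for the opcode, optional spaces, the load number, then an optional comment.
    if (!preg_match('~(\S)\s*(\S+)(\s+.+)?~', $msg, $m))
        return null; // cannot parse

    // Map punctuation to key 1 and letters to their keypad digit
    // (pqrs => 7, tuv => 8, wxyz => 9). Both strings are 31 characters long.
    $a = '.,"?!abcdefghijklmnopqrstuvwxyz';
    $d = '1111122233344455566677778889999';

    // Lower-case first so 'J' works as well as 'j'.
    $key = strtr(strtolower($m[1]), $a, $d);
    if (!isset($codes[$key]))
        return null; // unknown operation code

    return array(
        'opcode'  => $codes[$key],
        'load'    => intval(strtr(strtolower($m[2]), $a, $d)),
        'comment' => isset($m[3]) ? trim($m[3]) : ''
    );
}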

print_r(parse(' j ww03 This load looks wet to me'));
//[opcode] => LOAD
//[load] => 9903
//[comment] => This load looks wet to me

print_r(parse('dxx0123'));
//[opcode] => DROP
//[load] => 990123
//[comment] => 
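The two strtr() strings have to line up character for character, which is easy to get wrong when typing them by hand. A sketch that derives them from the standard keypad letter groups instead; `keypad_tables` is a hypothetical helper, and putting punctuation on key 1 follows this answer's own convention.

```php
<?php
// Hypothetical helper: build the $a/$d translation strings from keypad groups
// so the two strings cannot drift out of alignment.
function keypad_tables() {
    $keys = array(
        '1' => '.,"?!',
        '2' => 'abc', '3' => 'def', '4' => 'ghi', '5' => 'jkl',
        '6' => 'mno', '7' => 'pqrs', '8' => 'tuv', '9' => 'wxyz',
    );
    $from = '';
    $to = '';
    foreach ($keys as $digit => $chars) {
        $from .= $chars;
        // One copy of the key digit per character on that key.
        $to .= str_repeat((string) $digit, strlen($chars));
    }
    return array($from, $to);
}

list($a, $d) = keypad_tables();
echo strtr('agja', $a, $d), "\n"; // 2452
echo strtr('ww03', $a, $d), "\n"; // 9903
```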

Something like

/^[#\s]*([PDL])[#\s]*(\d+[\s-]+\d)/

or to be even more relaxed,

/^[^\d]*([PDL])[^\d]*(\d+)[^\d]+(\d)/

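would get you most of what you want. Note that the first only tolerates '#' and whitespace around the code letter, so it won't match 'D#.456789-1', and for 'D# 456789 - 1' it captures '456789 - 1' with the spaces still in. The second accepts any non-digit noise and captures the two halves of the load number separately. But I'd prefer HamZa's comment as a solution: throw it back and tell them to get their act together :)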
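A short sketch of how the more relaxed pattern might be used, rebuilding the canonical form from its three capture groups. The function name `relaxed_parse` and the choice to return a canonical string are assumptions for illustration.

```php
<?php
// Hypothetical wrapper around the relaxed pattern.
// Groups: (1) status letter, (2) load digits, (3) check digit.
function relaxed_parse($msg) {
    if (!preg_match('/^[^\d]*([PDL])[^\d]*(\d+)[^\d]+(\d)/', $msg, $m)) {
        return null;
    }
    return $m[1] . '#' . $m[2] . '-' . $m[3];
}

foreach (array('#D 456789-1', 'D# 456789 - 1', 'D#.456789-1 This load looks wet') as $s) {
    echo relaxed_parse($s), "\n"; // D#456789-1 each time
}
```

The pattern is case-sensitive, so a lower-case 'd' is rejected; adding the /i flag would accept it, at the cost of normalising the letter afterwards.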

First, remove stuff that shouldn't be there:

$str = preg_replace('/[^PDL\d-]/i', '', $str);

That gives you the following normalised results for the three deviant inputs:

D456789-1
D456789-1
D456789-1ldlddld

Then, attempt to match the data you want:

if (preg_match('/^([PDL])(\d+-\d)/i', $str, $match)) {
    $code = $match[1];
    $load = $match[2];
} else {
    // uh oh, something wrong with the format!
}
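Put together, the two steps fit in one small function. This sketch, with the hypothetical name `parse_stripped`, upper-cases the code because the /i flag lets a lower-case letter through. Note that the stripping step also throws away the driver's free-text comment, so keep the original message if you need it.

```php
<?php
// Hypothetical wrapper combining the two steps above.
function parse_stripped($msg) {
    // Step 1: keep only P, D, L (any case), digits and hyphens.
    $str = preg_replace('/[^PDL\d-]/i', '', $msg);
    // Step 2: the code letter must come first, then the load number.
    if (!preg_match('/^([PDL])(\d+-\d)/i', $str, $match)) {
        return null; // something wrong with the format
    }
    // /i lets 'd' through, so normalise the code to upper case.
    return array('code' => strtoupper($match[1]), 'load' => $match[2]);
}

print_r(parse_stripped('d# 456789 - 1'));
```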