MySQL:使用查询中的特殊正则表达式规则修复时间格式

I have one wierd issue regarding time field inside MySQL.

In database we have 2 fields: date and time. Data looks like this:

+-----------------------+
|   date   |    time    |
+-----------------------+
|27/04/2017|    11:30   |
+-----------------------+
|01/05/2017|    20u     |
+-----------------------+
|02/05/2017|    20u30   |
+-----------------------+
|03/05/2017|    21h     |
+-----------------------+

What this data mean? Well:

  • 20u is 20:00
  • 20u30 is 20:30
  • 21h is 21:00

Main problem is that I MUST keep all this data untouched for some reasons and must find solution to display data properly and from this grab unix timestamp or proper date or match proper dates.

For you to understand what I need, some old part of Query what is used look like this:

UNIX_TIMESTAMP(CONCAT(DATE_FORMAT(STR_TO_DATE(`date`,'%d/%m/%Y'),'%Y-%m-%d'),' ',`time`))

But that fail inside matching what is normal.

I wrote one PHP function for displaying proper time on some parts of code and looks like this:

function bo_time($string)
{   
    if(preg_match("/(\d{1,2})(u|h)/Ui",$string, $match))
    {
        $string = sprintf("%s:%s", sprintf('%02d', $match[1]), '00' );
    }
    else if(preg_match("/(\d{1,2})(u|h|\:)(\d{2})/Ui",$string, $match))
    {
        $string = sprintf("%s:%s", sprintf('%02d', $match[1]), ( isset($match[3]) ? sprintf('%02d', $match[3]) : '00' ));
    }

    return $string;
}

With that PHP function some parts of system works fine but in some parts of system I have matching between tables by date/time and everything fail and break apart.

My main question is:

Is there a way to wrote some regex for converting time inside MySQL query like I do inside PHP?

Thanks!

I have created a query which seems to work, at least for the data you provided us. It only required some logic to handle the various edge cases with your data. I transform your time data into actual timestamp time strings, then concatenate with the date, also converted into an ISO format. With a proper date and time in hand, we can then call UNIX_TIMESTAMP on this timestamp string to get the data you want.

SELECT
    date, time, UNIX_TIMESTAMP(CONCAT(date, ' ', time)) AS ts
FROM
(
    SELECT
        CONCAT(RIGHT(date, 4), '-', SUBSTRING(date, 4, 2), '-',
               LEFT(date, 2)) AS date,
        CASE WHEN time REGEXP '.*[uh]$'
             THEN CONCAT(LEFT(time, CHAR_LENGTH(time) - 1), ':00:00')
             ELSE CONCAT(REPLACE(REPLACE(time, 'h', ':'),
                         'u', ':'), ':00') END AS time
    FROM yourTable
) t;

Output:

enter image description here

Demo here:

Rextester

There is no regex function out of the box for MySQL.

If you must leave all existing data untouched, can you create a new field with a proper datetime format and start using that instead?

You can continue to use your PHP function to transform the data and seeing as it fails in some cases you may want to fix those issues first.

I was able to do what your PHP function does in less lines with:

// $time = 20u30
preg_replace(['/[uh]/i','/:(?![0-9]+)/'], [':',':00'], $time);

Firstly, it seems completely unreasonable to leave this representation in the DB. It seems that there is a canonical representation for the time column (e.g. 20:11) and that you could adjust each field to only have that representation. If you did that, you wouldn't have to create hacks to work around this representation nonsense.

Now, concerning the steps for a workaround that you asked for:

  • Firstly, it seems that if you leave off the minutes, they are implicitly "00", so "20u" is "20u00". Now, if you just add two zeroes and then truncate after the fifth character, you will end up with a normalized tuple. E.g. "20u" -> "20u00" -> "20u00" or "20:00" -> "20:0000" -> "20:00".
  • Secondly, it seems that the separator between the hours and minutes is either a colon or the Latin small letter u, so "20:15" is "20u15". That means that if you just erase this letter (concat the first, second, fourth and fifth letter), you would get a canonical representation. E.g. "20:30" -> "2030" or "20u30" -> "2030".

As additional advantage, this doesn't even try to use regular expressions, which are expensive and error-prone. However, it relies on the column content being exactly what you wrote without exceptions.