Handling large volumes of data in MySQL

I am having the following problem processing big data in the database:

Basically, the readings from digital sensors are stored in the database once per second. The reports should show only the changes that occurred in all that data; for example, at time X register #1 changed value from 0 to 1.

I have created a procedure that returns only the data I need (the changes), which saves me a lot of processing in PHP. BUT the big problem is that for the current 4 days of data, the query takes 6 * N seconds to complete, where N is the number of selected registers.

Now I am wondering what the best solution to overcome this problem would be.

Another thought was to add a trigger on each new insert of metering data, but the problem is that this would be more complicated, since the trigger would need to look up the previous reading, which was submitted at an earlier time.

So I thought to create views that would somehow be updated automatically when new data arrives. That way, when a report is requested, the data would already be prepared and could be fetched straight from the view.

Will this be a good solution?

Identifying status changes from your existing data is possible with a single query, but (as you have discovered) quite expensive. I would urge you to store each status change in a cache.

As @Fluffeh explained, looking up the latest status from your existing table won't be very expensive if you use a suitable index; so the trigger approach ought to be quite reasonable.

Therefore:

  1. Define a suitable index (if it does not already exist):

    ALTER TABLE existing_table ADD INDEX (register_id, timestamp);
    
  2. Create a table for the cache (and optionally set user permissions so that it cannot be directly modified by your application):

    CREATE TABLE status_changes (
      register_id ...,
      timestamp   TIMESTAMP,
      old_status  ...,
      new_status  ...,
    
      PRIMARY KEY                (register_id, timestamp),
    
      FOREIGN KEY                (register_id, timestamp, old_status)
       REFERENCES existing_table (register_id, timestamp, status),
    
      FOREIGN KEY                (register_id, timestamp, new_status)
       REFERENCES existing_table (register_id, timestamp, status)
    );
    
  3. Define the trigger as a user that has permission to modify the new table:

    DELIMITER ;;
    
    CREATE TRIGGER record_change AFTER INSERT ON existing_table FOR EACH ROW
    BEGIN
      DECLARE  _last_status ... ;
    
      SELECT   last.status
      INTO     _last_status
      FROM     existing_table AS last
      WHERE    last.register_id <=> NEW.register_id
           AND last.timestamp    <  NEW.timestamp
      ORDER BY last.timestamp DESC
      LIMIT    1;
    
      IF NOT NEW.status <=> _last_status THEN
        INSERT INTO status_changes (
          register_id,
          timestamp,
          old_status,
          new_status
        ) VALUES (
          NEW.register_id,
          NEW.timestamp,
          _last_status,
          NEW.status
        );
      END IF;
    END;;
    
    DELIMITER ;
    
  4. Populate the new table from the historical data:

    INSERT IGNORE INTO status_changes (
      register_id,
      timestamp,
      old_status,
      new_status
    )
    SELECT NEW.register_id,
           NEW.timestamp,
           (
             SELECT   last.status
             FROM     existing_table AS last
             WHERE    last.register_id <=> NEW.register_id
                  AND last.timestamp    <  NEW.timestamp
             ORDER BY last.timestamp DESC
             LIMIT    1
           ) AS _last_status,
           NEW.status
    FROM   existing_table AS NEW
    WHERE  NOT NEW.status <=> (
             SELECT   last.status
             FROM     existing_table AS last
             WHERE    last.register_id <=> NEW.register_id
                  AND last.timestamp    <  NEW.timestamp
             ORDER BY last.timestamp DESC
             LIMIT    1
           )
    ;
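Once the cache table is populated, the reports can read the changes directly instead of scanning the raw per-second data. A sketch of such a report query, assuming the columns outlined above and some placeholder register IDs and dates:

```sql
SELECT   register_id, timestamp, old_status, new_status
FROM     status_changes
WHERE    register_id IN (1, 2, 3)                       -- the selected registers
     AND timestamp BETWEEN '2012-09-01' AND '2012-09-05'
ORDER BY register_id, timestamp;
```

Because the table's primary key is `(register_id, timestamp)`, this query can fetch each register's changes via the index, so its cost is proportional to the number of changes rather than the number of raw readings.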
    

First, I am assuming that your tables are properly indexed and that your queries actually make use of those indexes?

In this case, you are likely to benefit most from a composite index - one covering both the date and register columns. An index on each column separately will help, but a composite index on both will help much more.

The syntax to add a composite index is:

alter table yourTableName add index yourIndexName(col1, col2);

mysql> select * from table1;

+---------+------+------+-------------+
| autonum | ID   | name | metavalue   |
+---------+------+------+-------------+
|       1 |    1 | Rose | Drinker     |
|       2 |    1 | Rose | Nice Person |
|       3 |    1 | Rose | Runner      |
|       4 |    2 | Gary | Player      |
|       5 |    2 | Gary | Funny       |
|       6 |    2 | Gary | NULL        |
|       7 |    2 | Gary | Smelly      |
+---------+------+------+-------------+
7 rows in set (0.01 sec)

mysql> alter table table1 add index autoNumID(autonum, ID);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
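To check whether a query actually uses the new index, prefix it with `EXPLAIN`; the `key` column of the output should name the composite index. The table and column names here are just placeholders for your own schema:

```sql
EXPLAIN SELECT *
FROM    yourTableName
WHERE   col1 = '2012-09-01'
    AND col2 = 42;
```

Note that a composite index on `(col1, col2)` is only usable when the query filters on `col1` (or on both columns); a filter on `col2` alone cannot use it.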

The other thing you could look into is making a summary table that is updated periodically (per hour, per day, etc.). Use a cron job or something similar to run a query that summarizes your data into a much smaller table that your reporting can work from.
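A minimal sketch of that summary-table idea. The table and column names here are assumptions based on the question; the `REPLACE INTO` statement is what the cron job would run each hour:

```sql
-- hypothetical summary table: one row per register per hour
CREATE TABLE metering_summary (
  register_id INT,
  hour        DATETIME,
  readings    INT,       -- number of raw rows in that hour
  min_status  INT,
  max_status  INT,
  PRIMARY KEY (register_id, hour)
);

-- run from cron every hour; re-running is safe because
-- REPLACE overwrites the existing row for each (register_id, hour)
REPLACE INTO metering_summary
SELECT   register_id,
         DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00'),
         COUNT(*), MIN(status), MAX(status)
FROM     existing_table
WHERE    timestamp >= NOW() - INTERVAL 1 HOUR
GROUP BY register_id, DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00');
```

The reports then aggregate over `metering_summary`, which holds one row per register per hour instead of 3600.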