I am having the following problem processing large amounts of data in the database:
Basically, all the readings from digital sensors are stored in the database once per second. What the reports should show from all that data is only the changes that occurred, for example: at time X, register #1 changed value from 0 to 1.
I have created a procedure that returns only the data I need (the changes), and that saves me a lot of processing in PHP, BUT the big problem is that for the current 4 days' worth of data the query takes 6 * N seconds to complete, where N is the number of selected registers.
Now I am wondering what the best solution to overcome this problem would be.
Another thought is to add a trigger on each new insert of metering data, but the problem is that this will be more complicated, since I will need to look at the previous readings that were submitted at an earlier time.
So I thought about creating views that would somehow be updated automatically when new data arrives. That way, when a report is requested, the data would already be prepared and could simply be fetched from the view.
Would this be a good solution?
Identifying status changes from your existing data is possible with a single query, but (as you have discovered) quite expensive. I would urge you to store each status change in a cache.
As @Fluffeh explained, looking up the latest status from your existing table won't be very expensive if you use a suitable index; so the trigger approach ought to be quite reasonable.
Therefore:
Define a suitable index (if it does not already exist):
ALTER TABLE existing_table ADD INDEX (register_id, timestamp);
Create a table for the cache (and optionally set user permissions so that it cannot be directly modified by your application):
CREATE TABLE status_changes (
  register_id ...,
  timestamp   TIMESTAMP,
  old_status  ...,
  new_status  ...,
  PRIMARY KEY (register_id, timestamp),
  FOREIGN KEY (register_id, timestamp, old_status)
    REFERENCES existing_table (register_id, timestamp, status),
  FOREIGN KEY (register_id, timestamp, new_status)
    REFERENCES existing_table (register_id, timestamp, status)
);
Define a trigger, as a user that has permission to modify the new table:
DELIMITER ;;
CREATE TRIGGER record_change AFTER INSERT ON existing_table FOR EACH ROW
BEGIN
  DECLARE _last_status ...;

  SELECT last.status
    INTO _last_status
    FROM existing_table AS last
   WHERE last.register_id <=> NEW.register_id
     AND last.timestamp < NEW.timestamp
ORDER BY last.timestamp DESC
   LIMIT 1;

  IF NOT NEW.status <=> _last_status THEN
    INSERT INTO status_changes (
      register_id,
      timestamp,
      old_status,
      new_status
    ) VALUES (
      NEW.register_id,
      NEW.timestamp,
      _last_status,
      NEW.status
    );
  END IF;
END;;
DELIMITER ;
Populate the new table from the historical data:
INSERT IGNORE INTO status_changes (
  register_id,
  timestamp,
  old_status,
  new_status
)
SELECT NEW.register_id,
       NEW.timestamp,
       (
         SELECT last.status
           FROM existing_table AS last
          WHERE last.register_id <=> NEW.register_id
            AND last.timestamp < NEW.timestamp
       ORDER BY last.timestamp DESC
          LIMIT 1
       ) AS _last_status,
       NEW.status
  FROM existing_table AS NEW
 WHERE NOT NEW.status <=> (
         SELECT last.status
           FROM existing_table AS last
          WHERE last.register_id <=> NEW.register_id
            AND last.timestamp < NEW.timestamp
       ORDER BY last.timestamp DESC
          LIMIT 1
       );
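Once the cache is populated and the trigger is keeping it up to date, your report only has to read the (much smaller) status_changes table. A sketch of the report query, with purely illustrative register IDs and date range:

SELECT register_id, timestamp, old_status, new_status
  FROM status_changes
 WHERE register_id IN (1, 2, 3)          -- illustrative register IDs
   AND timestamp >= '2012-09-01'         -- illustrative 4-day window
   AND timestamp <  '2012-09-05'
 ORDER BY register_id, timestamp;

Because of the PRIMARY KEY (register_id, timestamp) on status_changes, this is a simple range scan per register rather than a pass over every second of raw readings.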
First of all, are your tables properly indexed, and are your queries actually using those indexes?
In this case, you would likely benefit most from a composite index, one covering both the date and the register columns. An index on each column individually will help, but a composite index on both will help much more.
The syntax to add a composite index is:
alter table yourTableName add index yourIndexName(col1, col2);
mysql> select * from table1;
+---------+------+------+-------------+
| autonum | ID   | name | metavalue   |
+---------+------+------+-------------+
|       1 |    1 | Rose | Drinker     |
|       2 |    1 | Rose | Nice Person |
|       3 |    1 | Rose | Runner      |
|       4 |    2 | Gary | Player      |
|       5 |    2 | Gary | Funny       |
|       6 |    2 | Gary | NULL        |
|       7 |    2 | Gary | Smelly      |
+---------+------+------+-------------+
7 rows in set (0.01 sec)
mysql> alter table table1 add index autoNumID(autonum, ID);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
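If you want to confirm that the optimizer actually picks the new index up, EXPLAIN will tell you. For example (the literal values are only placeholders, substitute your own columns and values):

mysql> explain select * from yourTableName where col1 = '2012-09-01' and col2 = 1;

The key column in the EXPLAIN output names the index the optimizer chose; seeing yourIndexName there confirms the composite index is being used for that query.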
The other thing you could look into is making a summary table that is updated periodically (once per hour or per day, etc.). Use a cron job or something similar to run a query that condenses your data into a much smaller table for your reporting to work off, along the lines of the sketch below.
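A minimal sketch of that idea, assuming the raw readings live in existing_table with register_id, timestamp and status columns (those names, the hourly_summary table, and the choice of hourly buckets and aggregates are all just assumptions to illustrate the shape of it):

-- Hypothetical summary table: one row per register per hour.
CREATE TABLE IF NOT EXISTS hourly_summary (
  register_id INT          NOT NULL,
  hour_start  DATETIME     NOT NULL,
  readings    INT UNSIGNED NOT NULL,
  min_status  INT,
  max_status  INT,
  PRIMARY KEY (register_id, hour_start)
);

-- Run from cron; (re)builds the summary for the previous full hour.
REPLACE INTO hourly_summary (register_id, hour_start, readings, min_status, max_status)
SELECT register_id,
       DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00'),
       COUNT(*),
       MIN(status),
       MAX(status)
  FROM existing_table
 WHERE timestamp >= DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y-%m-%d %H:00:00')
   AND timestamp <  DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
 GROUP BY register_id, DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00');

Your reports would then read from hourly_summary instead of the second-by-second table, which for per-second readings means roughly 3600 times fewer rows per register.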