I’ve been working on some MySQL stuff recently, specifically storing a snapshot of the price of bitcoin every 5 minutes from an online exchange. As you can imagine, the amount of data this can generate is quite large, and becomes more than I need. I mean, I don’t care what the price for Bitcoin was exactly 3 weeks, 2 days, 11 hours and 20 minutes ago was. I do care about averages and trends though.
So, I decided to aggregate my data to hourly averages, along with a min/max for each hour. Then I can delete any of the 5 minute interval data older than a week or so, and use the aggregated data to do my analysis and graphs.
In order to do this I wanted to run a query that would average all the entries within the current hour, and then update a row with these updated averages, along with min/max values. I also wanted the query to insert a new row if it was a new hour. After a little research I knew I needed to figure out INSERT INTO… ON DUPLICATE KEY UPDATE as my pattern.
Everything seemed to be going great, until I needed to store calculated fields in an updated row. I couldn’t get the SQL statement to actually update the row.
It turns out, when you use a SELECT to generate the data to be INSERTed, MySQL creates a temporary table with the data in it, and I couldn’t find the name of the table, so trying to update the values with entries from the SELECT would fail with ‘No such column’ error.
I fixed this by storing the calculated results in specific variables, and in turn using those variables to update the existing row.
INSERT INTO tickerHourly (ID, time, product_id, priceAvg, priceMax, priceMin, volumeAvg, counter) SELECT (select ID from tickerHourly WHERE product_id = ticker.product_id AND hour(time) = hour(UTC_TIMESTAMP()) and date(time) = date(UTC_TIMESTAMP())), FROM_UNIXTIME(60*60*ROUND(UNIX_TIMESTAMP(time)/(60*60))), product_id, @avg := AVG(price), @max := MAX(price), @min := MIN(price), @volumeAvg := AVG(volume), @counter := count(*) FROM ticker where hour(time) = hour(UTC_TIMESTAMP()) AND date(time) = date(UTC_TIMESTAMP()) AND product_id = ? group by date(time),hour(time),product_id ON DUPLICATE KEY UPDATE tickerHourly.counter = @counter, tickerHourly.priceAvg = @avg, tickerHourly.priceMax = @max, tickerHourly.priceMin = @min, tickerHourly.volumeAvg = @volumeAvg;
So the line ‘@avg := AVG(price)’ takes the average of all the price entries and stores it in the local variable avg, which I then use as part of the UPDATE clause. This was my missing information.