I have an app that needs to update a large amount of data over a large number of entries. Basically it does some 7,000 inserts and/or updates but it takes a looooong time (like almost 9 minutes... averaging about 0.08 seconds per query). Essentially I'm looking for general speed ups to making multiple such requests (I'm not expecting a specific answer to my vague example... that's just to, hopefully, help explain).
Here are some samples from profiling the requests:
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:30:00-07:00') AND (planet_id = '2010_Gl_581_c')
INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:45:00-07:00') AND (planet_id = '2010_Gl_581_c')
INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Repeat ad nauseam (well, about 7,000 times). This is an update that collects data generated at intervals over a 24 hour period and then makes a massive update to the database once per day. Given the limited bit I've shown you are there any suggestions for speeding this process up?
For example... would it make sense to, rather than do a select for each timestamp, do one select for a range all at once and then iterate over them in the script?
Vaguely like:
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (planet_id = '2010_Gl_581_c')
assign that result to $foo
and then do:
foreach ($foo as $bar)
{
if ($bar['timestamp'] == $baz) // where $baz is the needed timestamp
{
// do the insert here
}
}
EDIT: To add a bit to this, one thing that improved responsiveness in my situation was to change a bunch of code that checked for a record existing and either did an insert or an update depending on the result into using an INSERT... ON DUPLICATE KEY UPDATE
sql query. This resulted in about a 30% speed gain in my particular case because it cut at least one trip to the database out of the equation and over thousands of requests this really adds up.
See Question&Answers more detail:
os