Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
576 views
in Technique[技术] by (71.8m points)

optimization - How to optimize mysql indexes so that INSERT operations happen quickly on a large table with frequent writes and reads?

I have a table watchlist containing today almost 3Mil records.

mysql>  select count(*) from watchlist;
+----------+
| count(*) |
+----------+
|  2957994 |
+----------+

It is used as a log to record product-page-views on a large e-commerce site (50,000+ products). It records the productID of the viewed product, the IP address and USER_AGENT of the viewer. And a timestamp of when it happens:

mysql> show columns from watchlist;
+-----------+--------------+------+-----+-------------------+-------+
| Field     | Type         | Null | Key | Default           | Extra |
+-----------+--------------+------+-----+-------------------+-------+
| productID | int(11)      | NO   | MUL | 0                 |       |
| ip        | varchar(16)  | YES  |     | NULL              |       |
| added_on  | timestamp    | NO   | MUL | CURRENT_TIMESTAMP |       |
| agent     | varchar(220) | YES  | MUL | NULL              |       |
+-----------+--------------+------+-----+-------------------+-------+

The data is then reported on several pages throughout the site on both the back-end (e.g. checking what GoogleBot is indexing), and front-end (e.g. a side-bar box for "Recently Viewed Products" and a page showing users what "People from your region also liked" etc.).

So that these "report" pages and side-bars load quickly I put indexes on relevant fields:

mysql> show indexes from watchlist;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| watchlist |          1 | added_on  |            1 | added_on    | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| watchlist |          1 | productID |            1 | productID   | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| watchlist |          1 | agent     |            1 | agent       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Without the INDEXES, pages with the side-bar for example would spend about 30-45sec executing a query to get the 7 most-recent ProductIDs. With the indexes it takes <0.2sec.

The problem is that with the INDEXES the product pages themselves are taking longer and longer to load because as the table grows the write operations are taking upwards of 5sec. In addition there is a spike on the mysqld process amounting to 10-15% of available CPU each time a product page is viewed (roughly once every 2sec). We already had to upgrade the server hardware because on a previous server it was reaching 100% and caused mysqld to crash.

My plan is to attempt a 2-table solution. One table for INSERT operations, and another for SELECT operations. I plan to purge the INSERT table whenever it reaches 1000 records using a TRIGGER, and copy the oldest 900 records into the SELECT table. The report pages are a mixture of real-time (recently viewed) and analytics (which region), but the real-time pages tend to only need a handful of fresh records while the analytical pages don't need to know about the most recent trend (i.e. last 1000 views). So I can use the small table for the former and the large table for the latter reports.


My question: Is this an ideal solution to this problem?

Also: With TRIGGERS in MySQL is it possible to nice the trigger_statement so that it takes longer, but doesn't consume much CPU? Would running a cron job every 30min that is niced, and which performs the purging if required be a better solution?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Write operations for a single row into a data table should not take 5 seconds, regardless how big the table gets.

Is your clustered index based on the timestamp field? If not, it should be, so you're not writing into the middle of your table somewhere. Also, make sure you are using InnoDB tables - MyISAM is not optimized for writes.

I would propose writing into two tables: one long-term table, one short-term reporting table with little or no indexing, which is then dumped as needed.

Another solution would be to use memcached or an in-memory database for the live reporting data, so there's no hit on the production database.

One more thought: exactly how "live" must either of these reports be? Perhaps retrieving a new list on a timed basis versus once for every page view would be sufficient.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...