Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
159 views
in Technique[技术] by (71.8m points)

php - Database design for answer table in quiz app

I am designing a quiz app and i am stuck on how to design the answers table.

Assume i have the following tables:

User(user_id,...other columns)
Question(question_id,user_id,...other columns)
QuestionAnswers(question_id,answer_id... other columns)

Now what to do about the UserAnswers table? The structure I have in mind is this:

UserAnswers(user_id,question_id,answer_id,.. other columns)

The structure that I have made works great at start, but the performance starts degrading once I reach 10 million rows. Considering my app, if 10,000 questions are present, and there are 1000 users in the system and each user answers each of the question. I will easily get to 10 million rows, and as users and questions grow the table size will grow dramatically.

What is a better approach to store these answers?

Moreover, I designed the system in MySQL. Do you think the same table structure would work better in some other DBMS?

mysql> explain select count(*) from user_answer where question_id = 9845;
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | user_answer | NULL       | ref  | question_id   | question_id | 4       | const |  645 |   100.00 | Using index |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


mysql> explain select count(*) from user_answer;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)


mysql> select count(*) from user_answer;
+----------+
| count(*) |
+----------+
| 20042126 |
+----------+
1 row in set (11 min 30.33 sec)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A general indexing concept is what is left-most in a key. Let's take for example the following key (whether or not it is primary is not the focus here)

key(a,b,c)

For queries such as

select region from myTable where c='Turkey'

the above key is not used. You may endure a tablescan.

For queries such as

select region from myTable where a=17 and c='Turkey'

The key is used up to the most left-most part used, which is a, as b is not in the query. So the key is useful, but not entirely useful. Meaning, at least it quickly gets you to the segmented a rows, but performs a where from there.

Let me say the above another way: In that query, it does not fully use the index to get to c. It knows b is not in the mix of the query, and does not magically jump over b to get to c while fully using the index. But at least the index is partially used.

This is why on thin index width such as ints, and with composites, I often create the second composite index "going the other way", as shown in this answer for Junction tables:

unique key(studentId,courseId,term), -- no duplicates allowed for the combo (note student can re-take it next term)
key (courseId,studentId),

ignore term in there for this discussion. Point being, those are thin ints (relatively low overhead). The second key will require overhead. So it comes at a cost, a cost I am willing to pay. But for queries going in the other direction, I am covered. Meaning, queries involving courseId without the studentId.

Note, my composite flip in the above is not a stellar one. It has often been pointed out to me that having it as shown causes unnecessary overhead. In particular, for the 2nd key, it should just be on courseId (a non-composite). If on the first key, for whatever reason I had term wedged into 2nd place, then it would be a valid example.

A better example would be

key (a,b,c),
key (c,b)

The above, among other things, would be useful for queries going against just c, and also b and c together. But not just b.

The Takeaway:

Resist the urge to splatter new indexes into your schema foolishly thinking they will be used. Especially for non-left-most columns not picked up in actual and frequent queries. And certainly not for those just mentioned and wider columns like varchar(100) times several flips in ordering in multiple indexes. All they do potentially is slow down the inserts and updates and offer, many times, zero performance gains in actual queries. So scrutinize it all.

All index choices come at a cost. Only you should make that determination for what is right for your system.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...