Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
476 views
in Technique[技术] by (71.8m points)

select - What's the canonical way to pull a record from a MySQL database that has a least/greatest field?

Say you're sitting at home on a Sunday afternoon and you want to know, from your database, the student with the lowest class ranking. Suppose your database table looks like this:

+----------------+-----------+------------+------------+----------------------------+------+-------+
| uid            | last_name | first_name | dob        | email                      | rank | grade |
+----------------+-----------+------------+------------+----------------------------+------+-------+
| 13428700000001 | Smith     | John       | 1990-12-03 | [email protected]              |   99 |     4 |
| 13428721960000 | Li        | Kai Li     | 1979-02-15 | [email protected]           |   12 |     2 |
| 13428722180001 | Zhang     | Xi Xiong   | 1993-11-09 | [email protected]                |    5 |     5 |
| 13428739950000 | Zhou      | Ji Hai     | 1991-06-06 | [email protected]                |  234 |     1 |
| 13428739950001 | Pan       | Yao        | 1992-05-12 | [email protected]              |   43 |     2 |
| 13428740010001 | Jin       | Denny      | 1994-06-02 | [email protected]              |  198 |     3 |
| 13428740010002 | Li        | Fonzie     | 1991-02-02 | [email protected]            |   75 |     3 |
| 13428743370000 | Ma        | Haggar     | 1991-08-16 | [email protected]            |   47 |     4 |
| 13428743590001 | Ren       | Jenny      | 1990-03-29 | [email protected]              |    5 |     2 |
| 13428774040000 | Chen      | Dragon     | 1999-04-12 | [email protected]                |   23 |     5 |
| 13428774260001 | Wang      | Doctor     | 1996-09-30 | [email protected]              |    1 |     5 |
| 13430100000000 | Chanz     | Heyvery    | 1994-04-04 | [email protected]              |  107 |     2 |
+----------------+-----------+------------+------------+----------------------------+------+-------+

I can do this via a SELECT * FROM students WHERE rank = (SELECT MAX(rank) FROM students); but is this the easiest, most common way to do it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This way is also not uncommon:

SELECT s1.*
FROM students s1
LEFT JOIN students s2 ON s1.rank < s2.rank
WHERE s2.uid IS NULL;

The LEFT JOIN works on the basis that when s1.rank is at its maximum value, there is no s2.rank with a greater value and the s2 rows values will be NULL.

But I'd say that your way of doing it is the most common, easiest to understand way of doing it, yes.

EDIT: On the question why it is slower sometimes:

The performance of this query depends on "how careful it is written". Taken your data as example:

drop table if exists students;
CREATE TABLE students
    (`uid` bigint, `last_name` varchar(5), `first_name` varchar(8), `dob` varchar(10), `email` varchar(16), `rank` int, `grade` int)
;

INSERT INTO students
    (`uid`, `last_name`, `first_name`, `dob`, `email`, `rank`, `grade`)
VALUES
    (13428700000001, 'Smith', 'John', '1990-12-03', '[email protected]', 99, 4),
    (13428721960000, 'Li', 'Kai Li', '1979-02-15', '[email protected]', 12, 2),
    (13428722180001, 'Zhang', 'Xi Xiong', '1993-11-09', '[email protected]', 5, 5),
    (13428739950000, 'Zhou', 'Ji Hai', '1991-06-06', '[email protected]', 234, 1),
    (13428739950001, 'Pan', 'Yao', '1992-05-12', '[email protected]', 43, 2),
    (13428740010001, 'Jin', 'Denny', '1994-06-02', '[email protected]', 198, 3),
    (13428740010002, 'Li', 'Fonzie', '1991-02-02', '[email protected]', 75, 3),
    (13428743370000, 'Ma', 'Haggar', '1991-08-16', '[email protected]', 47, 4),
    (13428743590001, 'Ren', 'Jenny', '1990-03-29', '[email protected]', 5, 2),
    (13428774040000, 'Chen', 'Dragon', '1999-04-12', '[email protected]', 23, 5),
    (13428774260001, 'Wang', 'Doctor', '1996-09-30', '[email protected]', 1, 5),
    (13430100000000, 'Chanz', 'Heyvery', '1994-04-04', '[email protected]', 107, 2)
;

The explain of your query looks like this:

| ID | SELECT_TYPE |    TABLE | TYPE | POSSIBLE_KEYS |    KEY | KEY_LEN |    REF | ROWS |       EXTRA |
-------------------------------------------------------------------------------------------------------
|  1 |     PRIMARY | students |  ALL |        (null) | (null) |  (null) | (null) |   12 | Using where |
|  2 |    SUBQUERY | students |  ALL |        (null) | (null) |  (null) | (null) |   12 |             |

The one from my query like this:

| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS |    KEY | KEY_LEN |    REF | ROWS |       EXTRA |
----------------------------------------------------------------------------------------------------
|  1 |      SIMPLE |    s1 |  ALL |        (null) | (null) |  (null) | (null) |   12 |             |
|  1 |      SIMPLE |    s2 |  ALL |        (null) | (null) |  (null) | (null) |   12 | Using where |

Almost the same. Neither query uses an index, all rows are scanned. Now we're adding an index on column rank.

drop table if exists students;
CREATE TABLE students
    (`uid` bigint, `last_name` varchar(5), `first_name` varchar(8), `dob` varchar(10), `email` varchar(16), `rank` int, `grade` int
    , key rankkey(rank)
    )
;

The explain from your query:

| ID | SELECT_TYPE |    TABLE |   TYPE | POSSIBLE_KEYS |     KEY | KEY_LEN |    REF |   ROWS |                        EXTRA |
-----------------------------------------------------------------------------------------------------------------------------
|  1 |     PRIMARY | students |    ref |       rankkey | rankkey |       5 |  const |      1 |                  Using where |
|  2 |    SUBQUERY |   (null) | (null) |        (null) |  (null) |  (null) | (null) | (null) | Select tables optimized away |

versus mine:

| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS |    KEY | KEY_LEN |    REF | ROWS |       EXTRA |
----------------------------------------------------------------------------------------------------
|  1 |      SIMPLE |    s1 |  ALL |        (null) | (null) |  (null) | (null) |   12 |             |
|  1 |      SIMPLE |    s2 |  ALL |       rankkey | (null) |  (null) | (null) |   12 | Using where |

Your query uses the index, mine doesn't.

Now we're adding a primary key to the table.

drop table if exists students;
CREATE TABLE students
    (`uid` bigint, `last_name` varchar(5), `first_name` varchar(8), `dob` varchar(10), `email` varchar(16), `rank` int, `grade` int
    , key rankkey(rank)
    , primary key(uid)
    );

Explain from your query:

| ID | SELECT_TYPE |    TABLE |   TYPE | POSSIBLE_KEYS |     KEY | KEY_LEN |    REF |   ROWS |                        EXTRA |
-----------------------------------------------------------------------------------------------------------------------------
|  1 |     PRIMARY | students |    ref |       rankkey | rankkey |       5 |  const |      1 |                  Using where |
|  2 |    SUBQUERY |   (null) | (null) |        (null) |  (null) |  (null) | (null) | (null) | Select tables optimized away |

and from mine:

| ID | SELECT_TYPE | TABLE |  TYPE | POSSIBLE_KEYS |     KEY | KEY_LEN |    REF | ROWS |                                EXTRA |
-------------------------------------------------------------------------------------------------------------------------------
|  1 |      SIMPLE |    s1 |   ALL |        (null) |  (null) |  (null) | (null) |   12 |                                      |
|  1 |      SIMPLE |    s2 | index |       rankkey | rankkey |       5 | (null) |   12 | Using where; Using index; Not exists |

This way they are most likely equally fast. And this is how the query and the table is usually built. Every table should have a primary key and if you're running a query filtering on the rank column very often you should of course have an index on it. So there's almost no difference. It all depends now on how much rows you have in your table, if it's a unique index and/or a clustered one. But that would lead now a bit too far. But note, that in this example there's a difference in how much rows are examined. With small data there's no difference, with large data volumes there sure is. But(!) this behaviour might change for both queries, depending on the index.

What if the one who writes the query makes a mistake? What if he writes it like this:

SELECT s1.*
FROM students s1
LEFT JOIN students s2 ON s1.rank < s2.rank
WHERE s2.last_name IS NULL;

The query still works and is valid, but

| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS |    KEY | KEY_LEN |    REF | ROWS |       EXTRA |
----------------------------------------------------------------------------------------------------
|  1 |      SIMPLE |    s1 |  ALL |        (null) | (null) |  (null) | (null) |   12 |             |
|  1 |      SIMPLE |    s2 |  ALL |       rankkey | (null) |  (null) | (null) |   12 | Using where |

again the index is not used.

What if we remove the primary key again and write the query like this:

SELECT s1.*
FROM students s1
LEFT JOIN students s2 ON s1.rank < s2.rank
WHERE s2.rank IS NULL;

| ID | SELECT_TYPE | TABLE |  TYPE | POSSIBLE_KEYS |     KEY | KEY_LEN |    REF | ROWS |                    EXTRA |
-------------------------------------------------------------------------------------------------------------------
|  1 |      SIMPLE |    s1 |   ALL |        (null) |  (null) |  (null) | (null) |   12 |                          |
|  1 |      SIMPLE |    s2 | index |       rankkey | rankkey |       5 | (null) |   12 | Using where; Using index |

Index is used again.

Conclusion: Both queries should run equally fast, if done right. Yours is fast as long as an index is on rank column. Same applies for mine if written with indexes in mind.

Hope this helps.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...