Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
193 views
in Technique[技术] by (71.8m points)

sql - Oracle <> , != , ^= operators

I want to know the difference of those operators, mainly their performance difference.

I have had a look at Difference between <> and != in SQL, it has no performance related information.

Then I found this on dba-oracle.com, it suggests that in 10.2 onwards the performance can be quite different.

I wonder why? does != always perform better then <>?

NOTE: Our tests, and performance on the live system shows, changing from <> to != has a big impact on the time the queries return in. I am here to ask WHY this is happening, not whether they are same or not. I know semantically they are, but in reality they are different.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I have tested the performance of the different syntax for the not equal operator in Oracle. I have tried to eliminate all outside influence to the test.

I am using an 11.2.0.3 database. No other sessions are connected and the database was restarted before commencing the tests.

A schema was created with a single table and a sequence for the primary key

CREATE TABLE loadtest.load_test (
  id NUMBER NOT NULL,
  a VARCHAR2(1) NOT NULL,
  n NUMBER(2) NOT NULL,
  t TIMESTAMP NOT NULL
);

CREATE SEQUENCE loadtest.load_test_seq
START WITH 0
MINVALUE 0;

The table was indexed to improve the performance of the query.

ALTER TABLE loadtest.load_test
ADD CONSTRAINT pk_load_test
PRIMARY KEY (id)
USING INDEX;

CREATE INDEX loadtest.load_test_i1
ON loadtest.load_test (a, n);

Ten million rows were added to the table using the sequence, SYSDATE for the timestamp and random data via DBMS_RANDOM (A-Z) and (0-99) for the other two fields.

SELECT COUNT(*) FROM load_test;

COUNT(*)
----------
10000000

1 row selected.

The schema was analysed to provide good statistics.

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'LOADTEST', estimate_percent => NULL, cascade => TRUE);

The three simple queries are:-

SELECT a, COUNT(*) FROM load_test WHERE n <> 5 GROUP BY a ORDER BY a;

SELECT a, COUNT(*) FROM load_test WHERE n != 5 GROUP BY a ORDER BY a;

SELECT a, COUNT(*) FROM load_test WHERE n ^= 5 GROUP BY a ORDER BY a;

These are exactly the same with the exception of the syntax for the not equals operator (not just <> and != but also ^= )

First each query is run without collecting the result in order to eliminate the effect of caching.

Next timing and autotrace were switched on to gather both the actual run time of the query and the execution plan.

SET TIMING ON

SET AUTOTRACE TRACE

Now the queries are run in turn. First up is <>

> SELECT a, COUNT(*) FROM load_test WHERE n <> 5 GROUP BY a ORDER BY a;

26 rows selected.

Elapsed: 00:00:02.12

Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |    26 |   130 |  6626   (9)| 00:01:20 |
|   1 |  SORT GROUP BY        |              |    26 |   130 |  6626   (9)| 00:01:20 |
|*  2 |   INDEX FAST FULL SCAN| LOAD_TEST_I1 |  9898K|    47M|  6132   (2)| 00:01:14 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("N"<>5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      22376  consistent gets
      22353  physical reads
          0  redo size
        751  bytes sent via SQL*Net to client
        459  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         26  rows processed

Next !=

> SELECT a, COUNT(*) FROM load_test WHERE n != 5 GROUP BY a ORDER BY a;

26 rows selected.

Elapsed: 00:00:02.13

Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |    26 |   130 |  6626   (9)| 00:01:20 |
|   1 |  SORT GROUP BY        |              |    26 |   130 |  6626   (9)| 00:01:20 |
|*  2 |   INDEX FAST FULL SCAN| LOAD_TEST_I1 |  9898K|    47M|  6132   (2)| 00:01:14 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("N"<>5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      22376  consistent gets
      22353  physical reads
          0  redo size
        751  bytes sent via SQL*Net to client
        459  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         26  rows processed

Lastly ^=

> SELECT a, COUNT(*) FROM load_test WHERE n ^= 5 GROUP BY a ORDER BY a;

26 rows selected.

Elapsed: 00:00:02.10

Execution Plan
----------------------------------------------------------
Plan hash value: 2978325580

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |    26 |   130 |  6626   (9)| 00:01:20 |
|   1 |  SORT GROUP BY        |              |    26 |   130 |  6626   (9)| 00:01:20 |
|*  2 |   INDEX FAST FULL SCAN| LOAD_TEST_I1 |  9898K|    47M|  6132   (2)| 00:01:14 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("N"<>5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      22376  consistent gets
      22353  physical reads
          0  redo size
        751  bytes sent via SQL*Net to client
        459  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         26  rows processed

The execution plan for the three queries is identical and the timings 2.12, 2.13 and 2.10 seconds.

It should be noted that whichever syntax is used in the query the execution plan always displays <>

The tests were repeated ten times for each operator syntax. These are the timings:-

<>

2.09
2.13
2.12
2.10
2.07
2.09
2.10
2.13
2.13
2.10

!=

2.09
2.10
2.12
2.10
2.15
2.10
2.12
2.10
2.10
2.12

^=

2.09
2.16
2.10
2.09
2.07
2.16
2.12
2.12
2.09
2.07

Whilst there is some variance of a few hundredths of the second it is not significant. The results for each of the three syntax choices are the same.

The syntax choices are parsed, optimised and are returned with the same effort in the same time. There is therefore no perceivable benefit from using one over another in this test.

"Ah BC", you say, "in my tests I believe there is a real difference and you can not prove it otherwise".

Yes, I say, that is perfectly true. You have not shown your tests, query, data or results. So I have nothing to say about your results. I have shown that, with all other things being equal, it doesn't matter which syntax you use.

"So why do I see that one is better in my tests?"

Good question. There a several possibilities:-

  1. Your testing is flawed (you did not eliminate outside factors - other workload, caching etc You have given no information about which we can make an informed decision)
  2. Your query is a special case (show me the query and we can discuss it).
  3. Your data is a special case (Perhaps - but how - we don't see that either).
  4. There is some other outside influence.

I have shown via a documented and repeatable process that there is no benefit to using one syntax over another. I believe that <> != and ^= are synonymous.

If you believe otherwise fine, so

a) show a documented example that I can try myself

and

b) use the syntax which you think is best. If I am correct and there is no difference it won't matter. If you are correct then cool, you have an improvement for very little work.

"But Burleson said it was better and I trust him more than you, Faroult, Lewis, Kyte and all those other bums."

Did he say it was better? I don't think so. He didn't provide any definitive example, test or result but only linked to someone saying that != was better and then quoted some of their post.

Show don't tell.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...