Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

sql - Performance considerations for temporary data in Oracle

I'm evaluating various options to run a bunch of high-performing queries against a single temporary data set in Oracle. In T-SQL, I'd probably use in-memory temporary tables, but Oracle doesn't have an exact equivalent of this feature.

I'm currently seeing these options:

1. Global temporary tables

CREATE GLOBAL TEMPORARY TABLE test_temp_t (
  n NUMBER(10), 
  s VARCHAR2(10)
) ON COMMIT DELETE ROWS; -- Other configurations are possible, too

DECLARE
  t test_t;
  n NUMBER(10);
BEGIN

  -- Replace this with the actual temporary data set generation
  INSERT INTO test_temp_t
  SELECT MOD(level, 10), '' || MOD(level, 12)
  FROM dual
  CONNECT BY level < 1000000;

  -- Replace this example query with more interesting statistics
  SELECT COUNT(DISTINCT t.n)
  INTO n 
  FROM test_temp_t t;

  DBMS_OUTPUT.PUT_LINE(n);
END;

Plan:

----------------------------------------------------
| Id  | Operation            | A-Rows |   A-Time   |
----------------------------------------------------
|   0 | SELECT STATEMENT     |      1 |00:00:00.27 |
|   1 |  SORT AGGREGATE      |      1 |00:00:00.27 |
|   2 |   VIEW               |     10 |00:00:00.27 |
|   3 |    HASH GROUP BY     |     10 |00:00:00.27 |
|   4 |     TABLE ACCESS FULL|    999K|00:00:00.11 |
----------------------------------------------------

2. Unnesting of PL/SQL table type variables

CREATE TYPE test_o AS OBJECT (n NUMBER(10), s VARCHAR2(10));
CREATE TYPE test_t AS TABLE OF test_o;

DECLARE
  t test_t;
  n NUMBER(10);
BEGIN

  -- Replace this with the actual temporary data set generation
  SELECT test_o(MOD(level, 10), '' || MOD(level, 12))
  BULK COLLECT INTO t
  FROM dual
  CONNECT BY level < 1000000;

  -- Replace this example query with more interesting statistics
  SELECT COUNT(DISTINCT n)
  INTO n 
  FROM TABLE(t) t;

  DBMS_OUTPUT.PUT_LINE(n);
END;

Plan:

------------------------------------------------------------------
| Id  | Operation                          | A-Rows |   A-Time   |
------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |      1 |00:00:00.68 |
|   1 |  SORT GROUP BY                     |      1 |00:00:00.68 |
|   2 |   COLLECTION ITERATOR PICKLER FETCH|    999K|00:00:00.22 |
------------------------------------------------------------------

3. Materialised views

I'm ruling them out for this use-case, because the temporary data set in question is rather complex and the implications on updating the materialised view would be too significant.

Real data considerations

The above are examples of what I'm trying to do. The real data sets involve:

  • The temp data is denormalised from around 15 joined tables.
  • It is produced around 2-20x / seconds.
  • The actual amount of rows per temp data set is around 10-200 (not as big as in the above example).
  • Each user of the system has their own temp data set (1M users overall, 10k concurrent users).
  • Once the data set is established, around 10-50 analytical queries should be run against it.
  • These analyses must be run online, i.e. they cannot be deferred to a batch job.

Questions

From my intuition, the temp table query "should" be slower because it (probably) involves I/O and disk access, whereas the PL/SQL collection query is a mere in-memory solution. But in my trivial benchmark, this is not the case as the temp table query beats the PL/SQL collection query by factor 3x. Why is this the case? Is there some PL/SQL <-> SQL context switch happening?

Do I have other options for fast (yet extensive) "in-memory" data analysis on a well-defined temporary data set? Are there any significant publicly available benchmarks comparing the various options?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Temporary tables are effectively the same as in-memory tables thanks to caching and asynchronous I/O, and the temporary table solution does not require any overhead for converting between SQL and PL/SQL.

Confirming the results

Comparing the two versions with RunStats, the temporary table version looks much worse. All that junk for the temporary table version in Run1, and only a little extra memory for the PL/SQL version in Run2. At first it seems like PL/SQL should be the clear winner.

Type  Name                              Run1 (temp) Run2 (PLSQL)         Diff
----- -------------------------------- ------------ ------------ ------------
...
STAT  physical read bytes                    81,920            0      -81,920
STAT  physical read total bytes              81,920            0      -81,920
LATCH cache buffers chains                  104,663          462     -104,201
STAT  session uga memory                    445,488      681,016      235,528
STAT  KTFB alloc space (block)            2,097,152            0   -2,097,152
STAT  undo change vector size             2,350,188            0   -2,350,188
STAT  redo size                           2,804,516            0   -2,804,516
STAT  temp space allocated (bytes)       12,582,912            0  -12,582,912
STAT  table scan rows gotten             15,499,845            0  -15,499,845
STAT  session pga memory                    196,608   19,857,408   19,660,800
STAT  logical read bytes from cache     299,958,272            0 -299,958,272

But at the end of the day only the wall clock time matters. Both the loading and the querying steps run much faster with temporary tables.

The PL/SQL version can be improved by replacing the BULK COLLECT with cast(collect(test_o(MOD(a, 10), '' || MOD(a, 12))) as test_t) INTO t. But it's still significantly slower than the temporary table version.

Optimized Reads

Reading from the small temporary table only uses the buffer cache, which is in memory. Run only the query part many times, and watch how the consistent gets from cache (memory) increase while the physical reads cache (disk) stay the same.

select name, value
from v$sysstat
where name in ('db block gets from cache', 'consistent gets from cache', 
'physical reads cache');

Optimized Writes

Ideally there would be no physical I/O, especially since the temporary table is ON COMMIT DELETE ROWS. And it sounds like the next version of Oracle may introduce such a mechanism. But it doesn't matter much in this case, the disk I/O does not seem to slow things down.

Run the load step multiple times, and then run select * from v$active_session_history order by sample_time desc;. Most of the I/O is BACKGROUND, which means nothing is waiting on it. I assume the temporary table internal logic is just a copy of regular DML mechanisms. In general, new table data may need to be written to disk, if it's committed. Oracle may start working on it, for example by moving data from the log buffer to disk, but there is no rush until there is an actual COMMIT.

Where does the PL/SQL time go?

I have no clue. Are there multiple context switches, or a single conversion between the SQL and PL/SQL engines? As far as I know none of the available metrics show the time spent on switching between SQL and PL/SQL.

We may never know exactly why PL/SQL code is slower. I don't worry about it too much. The general answer is, the vast majority of database work has to be done in SQL anyway. It would make a lot of sense if Oracle spent more time optimizing the core of their database, SQL, than the add-on language, PL/SQL.

Additional notes

For performance testing it can be helpful to remove the connect by logic into a separate step. That SQL is a great trick for loading data, but it can be very slow and resource intensive. It's more realistic to load a sample table once with that trick, and then insert from that table.

I tried using the new Oracle 12c feature, temporary undo, and the new 18c feature, private temporary tables. Neither one improved performance over regular temporary tables.

I wouldn't bet on it, but I can see a way that the results would completely change as the data gets larger. The log buffer and the buffer cache can only get so large. And eventually that background I/O could add up and overwhelm some processes, turning the BACKGROUND wait into a FOREGROUND wait. On the other hand, there's only so much PGA memory for the PL/SQL solution, and then things crash.

Finally, this partially confirms my skepticism of "in-memory databases". Caching is nothing new, databases have been doing it for decades.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...