From My other question, Using REGEXP_SUBSTR with Strings Qualifier, I'm trying to decide which approach would be better to use.
The Resulting Data set should only show the strings before the delimters PLE
, #
and ALL
in proper order.
The Current Query already in the package is something like this (DDL and DML is in the Bottom of the Post):
SELECT DATA1
, DECODE(SIGN(0 - instr(DATA1, 'PLE')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'PLE') - 1)) GET_DATA_TILL_FIRST_PLE
, DECODE(SIGN(0 - instr(DATA1, '#')), -1, SUBSTR(DATA1, 1, instr(DATA1, '#') - 1)) GET_DATA_TILL_FIRST_NUM_SIGN
, DECODE(SIGN(0 - instr(DATA1, 'ALL')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'ALL') - 1)) GET_DATA_TILL_FIRST_ALL
, NVL(DECODE(SIGN(0 - instr(DATA1, 'PLE')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'PLE') - 1), 0,
DECODE(SIGN(0 - instr(DATA1, '#')), -1, SUBSTR(DATA1, 1, instr(DATA1, '#') - 1), 0,
DECODE(SIGN(0 - instr(DATA1, 'ALL')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'ALL') - 1), DATA1), DATA1), DATA1), DATA1) PUT_THEM_ALL_TOGETHER
FROM table_x;
Which Results into a Data Set below:
DATA1 | GET_DATA_TILL_FIRST_PLE | GET_DATA_TILL_FIRST_#_SIGN | GET_DATA_TILL_FIRST_ALL | PUT_THEM_ALL_TOGETHER
----------------------- | ----------------------- | --------------------------- | -------------------------- | ----------------------
STRING_EXAMPLE | STRING_EXAM | | | STRING_EXAM
TREE_OF_APPLES | TREE_OF_AP | | | TREE_OF_AP
FIRST_EXAMPLE | FIRST_EXAM | | | FIRST_EXAM
IMPLEMENTATION | IM | | | IM
PARIS | | | | PARIS
PLEONASM | | | | PLEONASM
XXXX 1 | | | | XXXX 1
XXXX YYYYYY 2 FFFFFFFFF | | | | XXXX YYYYYY 2 FFFFFFFFF
XXXX YYYYYY 5FFFFFFFFF | | | | XXXX YYYYYY 5FFFFFFFFF
OPOPOPOPO #09090 APPLE | OPOPOPOPO #09090 AP | OPOPOPOPO | OPOPOPOPO # | OPOPOPOPO #09090 AP
OPOPOPOPO BALL# | | OPOPOPOPO BALL | OPOPOPOPO B | OPOPOPOPO BALL
BALL IS #LIFE | | BALL IS | B | BALL IS
PS. I only need column PUT_THEM_ALL_TOGETHER
but i included the other columns as well do add context.
I find the query bit confusing and hard to read so i tried using REGEXP_SUBSTR
and with the suggestion of @vkp, i came up with the below query
that results into the same data set above.
SELECT DATA1
, REGEXP_SUBSTR(DATA1, '(.+?)PLE',1,1,null,1) GET_DATA_TILL_FIRST_PLE
, REGEXP_SUBSTR(DATA1, '(.+?)#',1,1,null,1) GET_DATA_TILL_FIRST_#_SIGN
, REGEXP_SUBSTR(DATA1, '(.+?)ALL',1,1,null,1) GET_DATA_TILL_FIRST_ALL
, COALESCE(REGEXP_SUBSTR(DATA1, '(.+?)PLE',1,1,null,1),
REGEXP_SUBSTR(DATA1, '(.+?)#',1,1,null,1),
REGEXP_SUBSTR(DATA1, '(.+?)ALL',1,1,null,1),
DATA1) PUT_THEM_ALL_TOGETHER
FROM table_x;
However, From @MathGuy's Answer, it seems that INSTR
and SUBSTR
is much more efficient.
i tested this to some extent and here's what i got:
Using INSTR
and SUBSTR
:
SET TIMING ON;
BEGIN
UPDATE table_x
SET DATA2 = NVL(DECODE(SIGN(0 - instr(DATA1, 'PLE')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'PLE') - 1), 0,
DECODE(SIGN(0 - instr(DATA1, '#')), -1, SUBSTR(DATA1, 1, instr(DATA1, '#') - 1), 0,
DECODE(SIGN(0 - instr(DATA1, 'ALL')), -1, SUBSTR(DATA1, 1, instr(DATA1, 'ALL') - 1), DATA1), DATA1), DATA1), DATA1);
ROLLBACK;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.234
Using REGEXP_SUBSTR
:
SET TIMING ON;
BEGIN
UPDATE table_x
SET DATA2 = COALESCE(REGEXP_SUBSTR(DATA1, '(.+?)PLE',1,1,null,1)
,REGEXP_SUBSTR(DATA1, '(.+?)#',1,1,null,1)
,REGEXP_SUBSTR(DATA1, '(.+?)ALL',1,1,null,1)
,DATA1);
ROLLBACK;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.236
Although this is a very limited test data, it shows that the combination of INSTR
and SUBSTR
is a little bit faster than REGEXP_SUBSTR
.
Would it be negligible to use REGEXP_SUBSTR
instead of INSTR
and SUBSTR
for readability's sake?
DML and DDL:
create table table_x
(
data1 varchar2(100)
,data2 varchar2(100)
);
INSERT INTO table_x (DATA1) VALUES ('STRING_EXAMPLE');
INSERT INTO table_x (DATA1) VALUES ('TREE_OF_APPLES');
INSERT INTO table_x (DATA1) VALUES ('FIRST_EXAMPLE');
INSERT INTO table_x (DATA1) VALUES ('IMPLEMENTATION');
INSERT INTO table_x (DATA1) VALUES ('PARIS');
INSERT INTO table_x (DATA1) VALUES ('PLEONASM');
INSERT INTO table_x (DATA1) VALUES ('XXXX 1');
INSERT INTO table_x (DATA1) VALUES ('XXXX YYYYYY 2 FFFFFFFFF');
INSERT INTO table_x (DATA1) VALUES ('XXXX YYYYYY 5FFFFFFFFF');
INSERT INTO table_x (DATA1) VALUES ('OPOPOPOPO #09090 APPLE');
INSERT INTO table_x (DATA1) VALUES ('OPOPOPOPO BALL#');
INSERT INTO table_x (DATA1) VALUES ('BALL IS #LIFE');
Thank you.
See Question&Answers more detail:
os