This is a classic use of a simple recursive common table expression (WITH RECURSIVE
), available in PostgreSQL 8.4 and later.
Demonstrated here: http://sqlfiddle.com/#!12/78e15/9
Given the sample data as SQL:
CREATE TABLE Table1
("ID1" text, "ID2" text)
;
INSERT INTO Table1
("ID1", "ID2")
VALUES
('vc1', 'vc2'),
('vc2', 'vc3'),
('vc3', 'vc4'),
('vc4', 'rc7')
;
You could write:
WITH RECURSIVE chain(from_id, to_id) AS (
SELECT NULL, 'vc2'
UNION
SELECT c.to_id, t."ID2"
FROM chain c
LEFT OUTER JOIN Table1 t ON (t."ID1" = to_id)
WHERE c.to_id IS NOT NULL
)
SELECT from_id FROM chain WHERE to_id IS NULL;
What this does is iteratively walk the chain, adding each row to the chain
table as from- and to-pointers. When it encounters a row for which the 'to' reference doesn't exist it will add a null 'to' reference for that row. The next iteration will notice that the 'to' reference is null and produce zero rows, which causes the iteration to end.
The outer query then picks up rows that've been determined to be the end of the chain by having a non-existent to_id.
It takes a bit of effort to get your head around recursive CTEs. They key things to understand are:
They start with the output of an initial query, which they repeatedly union with the output of the "recursive part" (the query after the UNION
or UNION ALL
) until the recursive part adds no rows. That stops iteration.
They aren't really recursive, more iterative, though they're good for the sorts of things you might use recursion for.
So you're basically building a table in a loop. You can't delete rows or change them, only add new ones, so you generally need an outer query that filters the results to get the result rows you want. You'll often add extra columns containing intermediate data that you use to track the state of the iteration, control stop-conditions, etc.
It can help to look at the unfiltered result. If I replace the final summary query with a simple SELECT * FROM chain
I can see the table that's been generated:
from_id | to_id
---------+-------
| vc2
vc2 | vc3
vc3 | vc4
vc4 | rc7
rc7 |
(5 rows)
The first row is the manually added starting point row, where you specify what you want to look up - in this case that was vc2
. Each subsequent row was added by the UNION
ed recursive term, which does a LEFT OUTER JOIN
on the previous result and returns a new set of rows that pair up the previous to_id
(now in the from_id
column) to the next to_id
. If the LEFT OUTER JOIN
doesn't match then the to_id
will be null, causing the next invocation to return now rows and end iteration.
Because this query doesn't attempt to add only the last row each time, it's actually repeating a fair bit of work each iteration. To avoid that you would need to use an approach more like Gordon's, but additionally filter on the previous depth field when you scanned input table, so you joined only the most recent row. In practice this usually isn't necessary, but it can be a concern for very big data sets or where you can't create appropriate indexes.
More can be learned in the PostgreSQL documentation on CTEs.