Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
242 views
in Technique[技术] by (71.8m points)

relational database - What is the difference between theta join and inner join?

theta join and inner join look identical to me: they are Cartesian product followed by an arbitrary selection. Or am I missing their differences? Thanks.

The above wikipedia link says a theta-join takes a comparison & two attributes. But that is not correct. Instead, a theta-join can take any selection condition.

From Database System Concept which is supposed to follow the SQL standard and more coherent and reliable than wikipedia:

The theta join operation is a variant of the natural-join operation that allows us to combine a selection and a Cartesian product into a single operation. Consider relations r(R) and s(S), and let theta be a predicate on attributes in the schema R ∪ S. The theta join operation r join_theta s is de?ned as follows:

r join_theta s = sigma_theta(r × s)

....

the join operations we studied earlier that do not preserve nonmatched tuples are called inner join operations, to distinguish them from the outer-join operations.

and I still feel they are the same concept.

Difference between a theta join, equijoin and natural join doesn't explain the difference between a theta join and an inner join.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is no single "relational algebra". They differ even in what a relation is. Codd originally defined theta join as taking a binary operator (the theta) & two attributes. That is what people usually mean by the term. From Codd's 1992 book The Relational Model for Database Management Version 2:

RB-14-RB-23 The Theta-Join Operator

The theta-join operator employs two R-tables as its operands. It generates as a result an R-table that contains rows of one operand (say S) concatenated with rows of the second operand (say T), but only where the specified condition is found to hold true. For brevity, this operator is often referred to as join.

The condition expressed in the join operator involves comparing each value from a column of S with each value from a column of T. The columns to be compared are indicated explicitly in the join command; these columns are called the comparand columns. This condition can involve any of the 10 comparators cited in the list [...]

(Often the term is used for an operator like that but where a value can take the place of an attribute.)

But Codd contradicts himself there, because there are "two operands" but also "the specified condition". Later he contradicts himself again, saying there are a bunch of joins collectively called the theta-joins, and equi-join is one of them:

RB-25 The Natural Join Operator

As described in the last section, an equi-join generates a result in which two of the columns are identical in values, although different in column names. These two columns are derived from the comparand columns of the operands; of course, the columns may be either simple or composite. Of the 10 types of theta-join, equi-join is the only one that yields a result in which the comparand columns are completely redundant, one with the other. The natural join behaves just like the equi-join except that one of the redundant columns, simple or composite, is omitted from the result.

The textbook you quote uses the term to refer to a different thing, where theta is a boolean expression combining comparisons. That is reminiscent of SQL INNER JOIN ON complex conditions. For Codd it is:

RB-24 The Boolean Extension of Theta-Join

Observe that your quoted textbook's theta-join is not an algebraic operator (mapping argument values to a result value)! It is a language non-terminal (in which a comparison expression does not denote a value). The textbook doesn't actually define sufficient algebra operators to implement its algebra-style query language. It's alleged set of basic algebra operators includes select but it actually only defines that as a language non-terminal since one operand is a condition expression.

(This sort of sloppiness is typical of relational database textbooks. There is a culture of fuzziness & confusion in the relational database field. Which, like the field, was begun by Codd.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...