Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

sql - Misnamed field in subquery leads to join

I faced a problem with a data loss, caused by a wrong query.
Data restored, but now I'd like to understand the problem.

I encountered the problem on SQL Server 2014, but I replicated it on SQL Server 2000 and PostgreSQL. Specifically, there was a DELETE. In the following scenario I use a SELECT.

The tables creation for sql server 2014:

CREATE TABLE [dbo].[tmp_color](
    [color_id] [int] NOT NULL,
    [color_name] [nvarchar](50) NOT NULL,
    [color_cat] [int] NOT NULL,
 CONSTRAINT [PK_tmp_color] PRIMARY KEY CLUSTERED (
    [color_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF
      , ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tmp_color_cat](
    [catid] [int] NOT NULL,
    [catname] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_tmp_color_cat] PRIMARY KEY CLUSTERED (
    [catid] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF
      , ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

And the Postgres version:

CREATE TABLE tmp_color (
  color_id integer NOT NULL,
  color_name text,
  color_cat integer,
  CONSTRAINT tmp_color_pkey PRIMARY KEY (color_id)
);

CREATE TABLE tmp_color_cat (
  catid integer NOT NULL,
  catname text,
  CONSTRAINT tmp_color_cat_pkey PRIMARY KEY (catid)
);

Data population (works on both RDBMS):

INSERT INTO tmp_color_cat (catid, catname) VALUES (1, 'magic color');
INSERT INTO tmp_color_cat (catid, catname) VALUES (2, 'normal color');

INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (1, 'red', 1);
INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (2, 'green', 2);
INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (3, 'black', 1);

The following SELECT is wrong:

SELECT color_cat
FROM tmp_color_cat;

Because color_cat does not exists in tmp_color_cat.
BUT, the moment you take this in a subquery:

SELECT * FROM tmp_color
WHERE color_cat IN(
    SELECT color_cat
    FROM tmp_color_cat
    WHERE catname = 'magic color'
);

It returns every single record from tmp_color.
The logical error in script is obvious: developer wrote the wrong column to identify category. If you are deleting records instead of selecting them, you will delete entire table. Not good.

Is this desired behavior? Or it is a consequence of subquery design?

By observing the execution plan of SQL Server, the logical operation is a Left Semi Join.

I found a couple of posts, one for PostgreSQL and one for SQL Server. Is there any good documentation I could send to the developer group explaining why this is not an error?

How can I avoid this kind of problems? My first idea is to use an alias. Aliases are good.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is known behaviour with SQL Server. Using Aliases will prevent this

SELECT * FROM tmp_color
WHERE color_cat IN(
    SELECT A.color_cat
    FROM tmp_color_cat As A
    WHERE A.catname = 'magic color'
);

The above query will throw an error

Msg 207, Level 16, State 1, Line 3
Invalid column name 'color_cat'.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...