SQL Query Help

Kntx

Platinum Member
Dec 11, 2000
2,270
0
71
Hello,

I am writing an SQL query. Maybe I am having a brain fart. Maybe I just don't know what I'm doing. Whatever.

Here is a generic representation of what I'm trying to accomplish.

TABLE A
~~~~~~
ID
OTHER...

TABLE B
~~~~~~
ID
FK_A
OTHER...

So A has a one to many relationship to B. I want to get all records of A that have child rows in B. So I think, OK, join. But of course this approach gives me duplicate records of table A for as many child records exist in B. I am not interested in any data in table B. Just A. Just those records that have one or many child records. No records that don't have child records.

Note, that I have accomplished this by using multiple queries, the distinct clause and temporary tables. What I really want though, is a single query.

Thoughts?









 

mugs

Lifer
Apr 29, 2003
48,920
46
91
Yeah, select distinct would do what you're looking for. You generally want to avoid using DISTINCT, but in this case I'm not sure if there is a more efficient way to do it.

Other possibilities are:
SELECT *
FROM a
WHERE a.id IN (select fk_a from b)

or

SELECT *
FROM a
WHERE a.id IN (select fk_a from b group by fk_a)

or even

SELECT *
FROM a
WHERE a.id IN (select distinct fk_a from b)

Yeah I know I said it's best to avoid distinct, but it could make a difference if it's at a different point in the execution process. Depending on your RDBMS and whether you have an index on b.fk_a, those last two options might be more efficient than joining and then doing a select distinct. Note that I am not a DBA and I am not an expert on query efficiency, this is just me thinking logically that a group by or a distinct on an indexed column might be faster. As for the first one, I'm not sure if that would be faster than a select distinct or not, but I have had surprising efficiency from queries formatted like that.

In any case, my post serves to highlight that there are often several ways to do something in SQL, and what seems like the best way or the most obvious way may not always be the most efficient way.
 

mugs

Lifer
Apr 29, 2003
48,920
46
91
FWIW, I ran the equivalent of all of the above queries on a pair of tables of work where the parent table has 32k records and the child table has 42 million.

Execution times:
select id from a where id IN (select fk_a from b);
Elapsed: 00:02:07.40


select distinct a.id from a, b where a.id = b.fk_a;
Elapsed: 00:06:23.54


select id from a where id IN (select fk_a from b group by fk_a);
Elapsed: 00:04:45.27


select id from a where id IN (select distinct fk_a from b);
Elapsed: 00:01:55.62

The order I executed them is the order that is shown. I executed my first query first to try to ensure that it didn't benefit from any caching. I suspect my last query (the distinct in the subquery) may have benefited from caching of the query before it, because they are essentially the same.

This is in Oracle by the way (9i to be precise). No doubt the results could be different in SQL Server or MySQL or PostgreSQL or what have you... or even in a more recent version of Oracle.

Moral of the story is, respect your DBA, because he has to know this stuff.

Edit: In the example above, there IS an index on b.fk_a
 

troytime

Golden Member
Jan 3, 2006
1,996
1
0
Originally posted by: mugs
FWIW, I ran the equivalent of all of the above queries on a pair of tables of work where the parent table has 32k records and the child table has 42 million.

Execution times:
select id from a where id IN (select fk_a from b);
Elapsed: 00:02:07.40


select distinct a.id from a, b where a.id = b.fk_a;
Elapsed: 00:06:23.54


select id from a where id IN (select fk_a from b group by fk_a);
Elapsed: 00:04:45.27


select id from a where id IN (select distinct fk_a from b);
Elapsed: 00:01:55.62

The order I executed them is the order that is shown. I executed my first query first to try to ensure that it didn't benefit from any caching. I suspect my last query (the distinct in the subquery) may have benefited from caching of the query before it, because they are essentially the same.

This is in Oracle by the way (9i to be precise). No doubt the results could be different in SQL Server or MySQL or PostgreSQL or what have you... or even in a more recent version of Oracle.

Moral of the story is, respect your DBA, because he has to know this stuff.

Edit: In the example above, there IS an index on b.fk_a


nice post!

what platform is that from?
also, whats the average number of childs per row? (in the IN statement or in the subquery)
 

Kntx

Platinum Member
Dec 11, 2000
2,270
0
71
Originally posted by: mugs
FWIW, I ran the equivalent of all of the above queries on a pair of tables of work where the parent table has 32k records and the child table has 42 million.

Execution times:
select id from a where id IN (select fk_a from b);
Elapsed: 00:02:07.40


select distinct a.id from a, b where a.id = b.fk_a;
Elapsed: 00:06:23.54


select id from a where id IN (select fk_a from b group by fk_a);
Elapsed: 00:04:45.27


select id from a where id IN (select distinct fk_a from b);
Elapsed: 00:01:55.62

The order I executed them is the order that is shown. I executed my first query first to try to ensure that it didn't benefit from any caching. I suspect my last query (the distinct in the subquery) may have benefited from caching of the query before it, because they are essentially the same.

This is in Oracle by the way (9i to be precise). No doubt the results could be different in SQL Server or MySQL or PostgreSQL or what have you... or even in a more recent version of Oracle.

Moral of the story is, respect your DBA, because he has to know this stuff.

Edit: In the example above, there IS an index on b.fk_a

You're the man. Now on Monday I can write my query!
 
sale-70-410-exam    | Exam-200-125-pdf    | we-sale-70-410-exam    | hot-sale-70-410-exam    | Latest-exam-700-603-Dumps    | Dumps-98-363-exams-date    | Certs-200-125-date    | Dumps-300-075-exams-date    | hot-sale-book-C8010-726-book    | Hot-Sale-200-310-Exam    | Exam-Description-200-310-dumps?    | hot-sale-book-200-125-book    | Latest-Updated-300-209-Exam    | Dumps-210-260-exams-date    | Download-200-125-Exam-PDF    | Exam-Description-300-101-dumps    | Certs-300-101-date    | Hot-Sale-300-075-Exam    | Latest-exam-200-125-Dumps    | Exam-Description-200-125-dumps    | Latest-Updated-300-075-Exam    | hot-sale-book-210-260-book    | Dumps-200-901-exams-date    | Certs-200-901-date    | Latest-exam-1Z0-062-Dumps    | Hot-Sale-1Z0-062-Exam    | Certs-CSSLP-date    | 100%-Pass-70-383-Exams    | Latest-JN0-360-real-exam-questions    | 100%-Pass-4A0-100-Real-Exam-Questions    | Dumps-300-135-exams-date    | Passed-200-105-Tech-Exams    | Latest-Updated-200-310-Exam    | Download-300-070-Exam-PDF    | Hot-Sale-JN0-360-Exam    | 100%-Pass-JN0-360-Exams    | 100%-Pass-JN0-360-Real-Exam-Questions    | Dumps-JN0-360-exams-date    | Exam-Description-1Z0-876-dumps    | Latest-exam-1Z0-876-Dumps    | Dumps-HPE0-Y53-exams-date    | 2017-Latest-HPE0-Y53-Exam    | 100%-Pass-HPE0-Y53-Real-Exam-Questions    | Pass-4A0-100-Exam    | Latest-4A0-100-Questions    | Dumps-98-365-exams-date    | 2017-Latest-98-365-Exam    | 100%-Pass-VCS-254-Exams    | 2017-Latest-VCS-273-Exam    | Dumps-200-355-exams-date    | 2017-Latest-300-320-Exam    | Pass-300-101-Exam    | 100%-Pass-300-115-Exams    |
http://www.portvapes.co.uk/    | http://www.portvapes.co.uk/    |