SQL

Are these queries equivalent? (lemmy.ca)

submitted 7 months ago* (last edited 7 months ago) by roadrunner_ex@lemmy.ca to c/sql@programming.dev

Putting aside any opinions on performance, I've been trying to test a notion about whether a couple queries would output the same data (ordering doesn't matter).

SELECT *
FROM articles
WHERE (
  last_updated >= %s
  OR id IN (1, 2, 3)
  )
  AND created_at IS NOT NULL

SELECT *
FROM articles
WHERE last_updated >= %s
  AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
  AND created_at IS NOT NULL

I think they're equivalent, but I can't prove it to myself.

Edit: Aye, looking at the replies, I'm becoming aware that I left out a couple key assumptions I've made. Assuming:

a) id is a PRIMARY KEY (or otherwise UNIQUE)

b) I mean equivalent insofar as "the rows returned will contain the same data (though maybe ordered differently)"
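One way to sanity-check this under assumptions (a) and (b) is to run both queries against a small table and compare the sorted results. The sketch below uses Python's built-in sqlite3 module rather than whatever database the post targets, so the `?` placeholder stands in for `%s`; the column types, sample rows and cutoff date are made up for illustration. A passing check on one data set is evidence, not a proof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns named in the post, with id as PRIMARY KEY.
conn.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " last_updated TEXT,"
    " created_at TEXT)"
)
rows = [
    (1, "2024-01-01", "2023-01-01"),  # in the id list only
    (2, "2024-06-01", "2023-01-01"),  # in the id list and recently updated
    (3, "2024-06-01", None),          # created_at is NULL, excluded by both queries
    (4, "2024-06-01", "2023-01-01"),  # recently updated only
    (5, None, "2023-01-01"),          # NULL last_updated, not in the id list
    (6, "2024-01-01", "2023-01-01"),  # matches neither condition
]
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", rows)

cutoff = "2024-03-01"  # made-up value for the placeholder

or_rows = conn.execute(
    "SELECT * FROM articles"
    " WHERE (last_updated >= ? OR id IN (1, 2, 3))"
    " AND created_at IS NOT NULL",
    (cutoff,),
).fetchall()

union_rows = conn.execute(
    "SELECT * FROM articles"
    " WHERE last_updated >= ? AND created_at IS NOT NULL"
    " UNION"
    " SELECT * FROM articles"
    " WHERE id IN (1, 2, 3) AND created_at IS NOT NULL",
    (cutoff,),
).fetchall()

# Ordering doesn't matter, so compare the sorted row lists.
same_rows = sorted(or_rows) == sorted(union_rows)
print(same_rows)
```

The reasoning behind it: (A OR B) AND C is the same filter as (A AND C) OR (B AND C), and when every row is unique, the de-duplication done by UNION only removes rows that matched both branches.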


[–] rollin@piefed.social 5 points 7 months ago (2 children)

I don't think they're technically the same because UNION implicitly removes duplicates.

In the case of your specific data, the queries are probably functionally the same, as you probably wouldn't have duplicates in the first query because each row most likely has a unique ID column. Even if it didn't, last_updated and created_at are probably timestamps, which would in practice make the rows unique, not to mention other fields such as headline and article body - unless there had been a glitch causing a row to be inserted twice.

If you were to use UNION ALL in place of UNION, duplicates would no longer be removed from the second query. In that case, even if you had duplicate rows in the first query, the second query would return the same rows unless any rows with ID 1, 2 or 3 had also been updated in the given timespan (those rows match both halves of the UNION ALL, so the second query would return them twice).

Pretty sure that's how UNION works, so in practice, I think you'd get the same rows 99.9% of the time.
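To make the UNION vs UNION ALL difference concrete, here is a rough sketch using Python's sqlite3 module. The cut-down articles table, its three rows and the cutoff date are invented for the example; id 1 is deliberately both "recently updated" and in the ID list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down table: just enough columns to show the overlap.
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, last_updated TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "2024-06-01"), (2, "2024-01-01"), (4, "2024-06-01")],
)

def ids(sql):
    """Run the query with a made-up cutoff date and return the sorted ids."""
    return sorted(r[0] for r in conn.execute(sql, ("2024-03-01",)))

branches = (
    "SELECT * FROM articles WHERE last_updated >= ? {op} "
    "SELECT * FROM articles WHERE id IN (1, 2, 3)"
)
union_ids = ids(branches.format(op="UNION"))
union_all_ids = ids(branches.format(op="UNION ALL"))

print(union_ids)      # [1, 2, 4]
print(union_all_ids)  # [1, 1, 2, 4]  (id 1 matches both branches)
```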

[–] RobertTableaux@programming.dev 8 points 7 months ago (1 children)

The UNION removing any dups here is what makes them the same - the top query would never have duplicates as written.

[–] rollin@piefed.social 2 points 7 months ago

Ah but that's true ONLY IF the table doesn't itself contain duplicates. Quick example:

CREATE TEMPORARY TABLE animal (species VARCHAR(255) NOT NULL, colour VARCHAR(255) NOT NULL);
INSERT INTO animal VALUES ('monkey', 'green'), ('rabbit', 'orange'), ('elephant', 'pink'),('monkey','blue'),('rabbit','orange'),('monkey','green'),('monkey','green');

SELECT * FROM animal WHERE species = 'monkey' OR colour = 'green';
+---------+--------+
| species | colour |
+---------+--------+
| monkey  | green  |
| monkey  | blue   |
| monkey  | green  |
| monkey  | green  |
+---------+--------+
SELECT * FROM animal WHERE species = 'monkey' UNION SELECT * FROM animal WHERE colour = 'green';
+---------+--------+
| species | colour |
+---------+--------+
| monkey  | green  |
| monkey  | blue   |
+---------+--------+

So we could change the query to use UNION ALL, which does include duplicates. In that case, the returned rows are the same ONLY IF the rows returned by the left side of the UNION do not overlap those returned by the right side, otherwise it will return more rows.

SELECT * FROM animal WHERE species = 'monkey' UNION ALL SELECT * FROM animal WHERE colour = 'green';
+---------+--------+
| species | colour |
+---------+--------+
| monkey  | green  |
| monkey  | blue   |
| monkey  | green  |
| monkey  | green  |
| monkey  | green  |
| monkey  | green  |
| monkey  | green  |
+---------+--------+

For completeness, here's an example where the two queries in the UNION do not return any of the same rows:

SELECT * FROM animal WHERE species = 'monkey' OR colour = 'orange';
+---------+--------+
| species | colour |
+---------+--------+
| monkey  | green  |
| rabbit  | orange |
| monkey  | blue   |
| rabbit  | orange |
| monkey  | green  |
| monkey  | green  |
+---------+--------+
SELECT * FROM animal WHERE species = 'monkey' UNION ALL SELECT * FROM animal WHERE colour = 'orange';
+---------+--------+
| species | colour |
+---------+--------+
| monkey  | green  |
| monkey  | blue   |
| monkey  | green  |
| monkey  | green  |
| rabbit  | orange |
| rabbit  | orange |
+---------+--------+
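If anyone wants to reproduce the tables above without a database server, the following sketch replays the same data through Python's sqlite3 module and compares the results as multisets with collections.Counter, so row order is ignored but duplicate counts are not. It assumes SQLite behaves like the database used above for these queries; the `bag` helper is just a name made up for this example, and TEXT replaces VARCHAR(255).

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animal (species TEXT NOT NULL, colour TEXT NOT NULL)")
conn.executemany("INSERT INTO animal VALUES (?, ?)", [
    ("monkey", "green"), ("rabbit", "orange"), ("elephant", "pink"),
    ("monkey", "blue"), ("rabbit", "orange"), ("monkey", "green"),
    ("monkey", "green"),
])

def bag(sql):
    """Return the result as a multiset: row -> number of times it was returned."""
    return Counter(conn.execute(sql).fetchall())

monkey = "SELECT * FROM animal WHERE species = 'monkey'"
green = "SELECT * FROM animal WHERE colour = 'green'"
orange = "SELECT * FROM animal WHERE colour = 'orange'"

or_green = bag("SELECT * FROM animal WHERE species = 'monkey' OR colour = 'green'")
union_green = bag(f"{monkey} UNION {green}")
union_all_green = bag(f"{monkey} UNION ALL {green}")

or_orange = bag("SELECT * FROM animal WHERE species = 'monkey' OR colour = 'orange'")
union_all_orange = bag(f"{monkey} UNION ALL {orange}")

print(or_green == union_green)        # False: UNION collapses the three green monkeys
print(or_green == union_all_green)    # False: the overlapping rows come back twice
print(or_orange == union_all_orange)  # True: the two branches share no rows
```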

[–] roadrunner_ex@lemmy.ca 2 points 7 months ago* (last edited 7 months ago)

Aye, looking at the replies, I'm becoming aware that I left out a couple key assumptions I've made. Assuming:

a) id is a PRIMARY KEY (or otherwise UNIQUE)

b) I mean equivalent insofar as "the rows returned will contain the same data (though maybe ordered differently)"