Joining MySQL tables revisited - finding nonmatching records, etc
Archive - Originally posted on "The Horse's Mouth" - 2008-03-15 08:47:38 - Graham Ellis
A join lets me connect two tables, one to the right of the other. Here's an example. First table - estate agents:
mysql> select agent,town,aid from agents;
+--------------------------+------------------+-----+
| agent                    | town             | aid |
+--------------------------+------------------+-----+
| Alder King               | Trowbridge       |   1 |
| Connells                 | Trowbridge       |   2 |
| DK Residential           | Trowbridge       |   3 |
| Jayson Kent              | Melksham         |   4 |
| Halifax                  | Trowbridge       |   5 |
| Halifax                  | Melksham         |   6 |
| Greg Pullen              | Devizes          |   7 |
| Town and Country Estates | Westbury         |   8 |
| Davies and Davies        | Bradford on Avon |   9 |
| Kavanaghs                | Melksham         |  10 |
+--------------------------+------------------+-----+
10 rows in set (0.17 sec)
And second table - homes for sale:
mysql> select agid,locate,asking,sid from sales;
+------+------------+--------+-----+
| agid | locate     | asking | sid |
+------+------------+--------+-----+
|   10 | Semington  | 225000 |   1 |
|   10 | Melksham   | 195000 |   2 |
|   10 | Atworth    | 237500 |   3 |
|    8 | Westbury   | 152000 |   4 |
|    8 | Trowbridge | 205000 |   5 |
|    6 | Melksham   | 229950 |   6 |
|    5 | Trowbridge | 335000 |   7 |
|    1 | Trowbridge |  96950 |  17 |
|    5 | Hilperton  | 279950 |   8 |
|    4 | Melksham   | 149950 |   9 |
|    1 | Trowbridge | 220000 |  18 |
|    4 | Melksham   | 127500 |  10 |
|   11 | Melksham   | 465000 |  11 |
|    3 | Semington  | 222000 |  12 |
|    3 | Hilperton  | 465000 |  13 |
|    2 | Westbury   | 140000 |  14 |
|    2 | Trowbridge | 116000 |  15 |
|   11 | Melksham   | 275000 |  16 |
|    4 | Semington  | 250000 |  20 |
+------+------------+--------+-----+
19 rows in set (0.17 sec)
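To try the queries that follow, the two tables can be recreated along these lines. This is only a sketch: the post shows the data but not the table definitions, so the column types and keys below are my assumptions, and only the columns that appear in the listings above are included.

```sql
-- Assumed schema: the column types and primary keys are guesses,
-- not taken from the original post. The data rows are as listed above.
create table agents (
    aid   int primary key,
    agent varchar(40),
    town  varchar(40)
);

create table sales (
    sid    int primary key,
    agid   int,          -- matches agents.aid; note that 11 has no agent row
    locate varchar(40),
    asking int
);

insert into agents (agent, town, aid) values
    ('Alder King', 'Trowbridge', 1),
    ('Connells', 'Trowbridge', 2),
    ('DK Residential', 'Trowbridge', 3),
    ('Jayson Kent', 'Melksham', 4),
    ('Halifax', 'Trowbridge', 5),
    ('Halifax', 'Melksham', 6),
    ('Greg Pullen', 'Devizes', 7),
    ('Town and Country Estates', 'Westbury', 8),
    ('Davies and Davies', 'Bradford on Avon', 9),
    ('Kavanaghs', 'Melksham', 10);

insert into sales (agid, locate, asking, sid) values
    (10, 'Semington', 225000, 1),
    (10, 'Melksham', 195000, 2),
    (10, 'Atworth', 237500, 3),
    (8, 'Westbury', 152000, 4),
    (8, 'Trowbridge', 205000, 5),
    (6, 'Melksham', 229950, 6),
    (5, 'Trowbridge', 335000, 7),
    (1, 'Trowbridge', 96950, 17),
    (5, 'Hilperton', 279950, 8),
    (4, 'Melksham', 149950, 9),
    (1, 'Trowbridge', 220000, 18),
    (4, 'Melksham', 127500, 10),
    (11, 'Melksham', 465000, 11),
    (3, 'Semington', 222000, 12),
    (3, 'Hilperton', 465000, 13),
    (2, 'Westbury', 140000, 14),
    (2, 'Trowbridge', 116000, 15),
    (11, 'Melksham', 275000, 16),
    (4, 'Semington', 250000, 20);
```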
By default, a join connects every record in the first table with every record in the second table, so with 19 rows in one and 10 rows in the other, a join gives me no fewer than 190 records!
(If the following outputs clip or fold - see here for an alternative display)
mysql> select agid,locate,asking,sid,agent,town,aid
from sales join agents;
+------+------------+--------+-----+--------------------------+------------------+-----+
| agid | locate     | asking | sid | agent                    | town             | aid |
+------+------------+--------+-----+--------------------------+------------------+-----+
|   10 | Semington  | 225000 |   1 | Alder King               | Trowbridge       |   1 |
|   10 | Melksham   | 195000 |   2 | Alder King               | Trowbridge       |   1 |
|   10 | Atworth    | 237500 |   3 | Alder King               | Trowbridge       |   1 |
|    8 | Westbury   | 152000 |   4 | Alder King               | Trowbridge       |   1 |
|    8 | Trowbridge | 205000 |   5 | Alder King               | Trowbridge       |   1 |
|    6 | Melksham   | 229950 |   6 | Alder King               | Trowbridge       |   1 |
|    5 | Trowbridge | 335000 |   7 | Alder King               | Trowbridge       |   1 |
|    1 | Trowbridge |  96950 |  17 | Alder King               | Trowbridge       |   1 |
|    5 | Hilperton  | 279950 |   8 | Alder King               | Trowbridge       |   1 |
|    4 | Melksham   | 149950 |   9 | Alder King               | Trowbridge       |   1 |
|    1 | Trowbridge | 220000 |  18 | Alder King               | Trowbridge       |   1 |
|    4 | Melksham   | 127500 |  10 | Alder King               | Trowbridge       |   1 |
(etc)
|    3 | Hilperton  | 465000 |  13 | Kavanaghs                | Melksham         |  10 |
|    2 | Westbury   | 140000 |  14 | Kavanaghs                | Melksham         |  10 |
|    2 | Trowbridge | 116000 |  15 | Kavanaghs                | Melksham         |  10 |
|   11 | Melksham   | 275000 |  16 | Kavanaghs                | Melksham         |  10 |
|    4 | Semington  | 250000 |  20 | Kavanaghs                | Melksham         |  10 |
+------+------------+--------+-----+--------------------------+------------------+-----+
190 rows in set (0.36 sec)
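A quick way to confirm the size of that result without printing every row is to count it. In MySQL, JOIN, INNER JOIN and CROSS JOIN are interchangeable when there is no on clause, so the explicit form in this sketch should be equivalent; other database engines are stricter about this.

```sql
-- Count the combinations rather than listing them: 19 sales x 10 agents.
select count(*) from sales join agents;

-- The same product written out explicitly, which makes the intent clearer
-- to anyone reading the query later.
select count(*) from sales cross join agents;
```

With the data shown above, both queries should report 190.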
Clearly that's not what I want - I usually want to see what connects to what, so I add an on clause to give me only the records that match:
mysql> select agid,locate,asking,sid,agent,town,aid
from sales join agents on aid = agid;
+------+------------+--------+-----+--------------------------+------------+-----+
| agid | locate     | asking | sid | agent                    | town       | aid |
+------+------------+--------+-----+--------------------------+------------+-----+
|   10 | Semington  | 225000 |   1 | Kavanaghs                | Melksham   |  10 |
|   10 | Melksham   | 195000 |   2 | Kavanaghs                | Melksham   |  10 |
|   10 | Atworth    | 237500 |   3 | Kavanaghs                | Melksham   |  10 |
|    8 | Westbury   | 152000 |   4 | Town and Country Estates | Westbury   |   8 |
|    8 | Trowbridge | 205000 |   5 | Town and Country Estates | Westbury   |   8 |
|    6 | Melksham   | 229950 |   6 | Halifax                  | Melksham   |   6 |
|    5 | Trowbridge | 335000 |   7 | Halifax                  | Trowbridge |   5 |
|    1 | Trowbridge |  96950 |  17 | Alder King               | Trowbridge |   1 |
|    5 | Hilperton  | 279950 |   8 | Halifax                  | Trowbridge |   5 |
|    4 | Melksham   | 149950 |   9 | Jayson Kent              | Melksham   |   4 |
|    1 | Trowbridge | 220000 |  18 | Alder King               | Trowbridge |   1 |
|    4 | Melksham   | 127500 |  10 | Jayson Kent              | Melksham   |   4 |
|    3 | Semington  | 222000 |  12 | DK Residential           | Trowbridge |   3 |
|    3 | Hilperton  | 465000 |  13 | DK Residential           | Trowbridge |   3 |
|    2 | Westbury   | 140000 |  14 | Connells                 | Trowbridge |   2 |
|    2 | Trowbridge | 116000 |  15 | Connells                 | Trowbridge |   2 |
|    4 | Semington  | 250000 |  20 | Jayson Kent              | Melksham   |   4 |
+------+------------+--------+-----+--------------------------+------------+-----+
17 rows in set (0.18 sec)
Which gives me just 17 out of 190 records.
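The column names here happen to be unique across the two tables, which is why the on clause works without saying which table each column comes from. As a sketch of how the same query might be written when names could clash, here it is with table aliases (s and a are names I've picked purely for illustration):

```sql
-- Same join as above, with every column qualified by a table alias.
select s.agid, s.locate, s.asking, s.sid, a.agent, a.town, a.aid
  from sales as s
  join agents as a on a.aid = s.agid
 order by s.sid;   -- optional: a join on its own does not guarantee row order
```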
I can even ask for the other 173 records if I wish by switching to a not-equals operator, which gets me a list of who is NOT selling what - the estate agents you can walk into and enquire about a specific house, only to be told "Sorry - that one's NOT on our books". Sounds silly, doesn't it - but the query IS possible:
mysql> select agid,locate,asking,sid,agent,town,aid
from sales join agents on aid != agid;
+------+------------+--------+-----+--------------------------+------------------+-----+
| agid | locate     | asking | sid | agent                    | town             | aid |
+------+------------+--------+-----+--------------------------+------------------+-----+
|   10 | Semington  | 225000 |   1 | Alder King               | Trowbridge       |   1 |
|   10 | Melksham   | 195000 |   2 | Alder King               | Trowbridge       |   1 |
|   10 | Atworth    | 237500 |   3 | Alder King               | Trowbridge       |   1 |
(etc)
|    2 | Trowbridge | 116000 |  15 | Kavanaghs                | Melksham         |  10 |
|   11 | Melksham   | 275000 |  16 | Kavanaghs                | Melksham         |  10 |
|    4 | Semington  | 250000 |  20 | Kavanaghs                | Melksham         |  10 |
+------+------------+--------+-----+--------------------------+------------------+-----+
173 rows in set (0.35 sec)
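The arithmetic can be checked without reading through the listing. Every one of the 190 combinations either matches or doesn't (there are no NULLs in aid or agid in this data), so the two counts should add up to 190. In this sketch I've used <>, which is the standard SQL spelling of != and which MySQL accepts as well.

```sql
select count(*) from sales join agents on aid = agid;    -- matching pairs
select count(*) from sales join agents on aid <> agid;   -- nonmatching pairs

-- Or all three figures in one pass over the full product. In MySQL a
-- comparison evaluates to 1 or 0, so it can be summed directly.
select sum(aid = agid)  as matching,
       sum(aid <> agid) as nonmatching,
       count(*)         as combinations
  from sales cross join agents;
```

With the data shown above, that should come out as 17, 173 and 190.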
You may have noticed that we had 19 properties for sale, but only 17 of them came up on the "reasonable" join. That's because my data includes two properties which are not listed with any agent in the agents table - they're for sale by owner. If I want to be sure of at least one output row for EVERY incoming row in the leftmost of the tables in my join, I can specify a LEFT JOIN rather than a JOIN, and the extra records are generated with the columns from the other table filled with NULL. There's a full explanation and example of left and right joins [here], and that page also shows you how to find ONLY those records which don't match - the "for sale by owner" orphans, and the estate agents with nothing for sale.
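As a sketch of what that looks like with the two tables on this page (these particular queries are mine, written for illustration, rather than copied from the linked page): a LEFT JOIN keeps every row from the table on the left, and testing the other table's key for NULL then picks out just the rows that found no partner.

```sql
-- Every sale, with NULL in agent, town and aid where no agent matches.
select agid, locate, asking, sid, agent, town, aid
  from sales left join agents on aid = agid;

-- Only the sales with no matching agent (agid 11 in the data above).
select agid, locate, asking, sid
  from sales left join agents on aid = agid
 where aid is null;

-- Only the agents with nothing for sale: put agents on the left instead,
-- then keep the rows where no sale was found.
select agent, town, aid
  from agents left join sales on aid = agid
 where agid is null;
```

With the data shown earlier, the first query should return all 19 properties, the second the two Melksham properties listed under agid 11, and the third Greg Pullen and Davies and Davies.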