SQL - Data v Metadata, and the various stages of data selection
Archive - Originally posted on "The Horse's Mouth" - 2011-04-29 11:00:22 - Graham Ellis
MySQL is a system for managing data ... so it's natural for it to manage its own configuration data too in the same way. However, there are a few elements which have to be kept outside those internal tables - such as the data about which port number the daemon is to listen on, and the directory name (file path) for the databases to be stored.
There are defaults values built in to each of the MySQL processes which will be used in the absence of any other settings. But they are overridden as follows:
• The file /etc/my.cnf - a system wide file that contains settings for MySQL server and client programs for all user accounts
• Files ~username/.my.cnf - settings for all server and client programs on a "per user" basis. So there's (potentaiily) a file called .my.cnf in every user's home directory, containing instructions for all client and servers run by that user which will override any conflicing directives in /etc/my.cnf, and defaults.
• From the command line - specific MySQL settings can be made for individual processes, irrespective of defaults and overrides in /etc/mt.cnf and ~username/.my.cnf
Database engines look after data - and they also look after the structure of how that data is held. That's known as the Metadata. So the metadata is "how" and the data is "what".
The diagram shows the separation between data and metadata. As a comparison on this week's MySQL course, I equated the metadata to the cages and enclosures at a zoo, and the data to the animals themselves. SQL instructions too differ between those that run on the data and metadata ... to look at data, you'll use a SELECT, but to look at metadata, you'll use SHOW. And so it goes on.
MySQL databases are accessed through logins - unlike web servers which are often open to all (at least at a protocol level), users of MySQL need to establish credentials. Not only is there the traditional user name and password associated with an account (and the authorisation or denial of a request), but there's an additonal layer that can be used with is the remote location from which the connection is being made.
By using the remote location on database accounts, MySQL can be locked down to respond to requests only from the subnet on which the server islocated, or indeed purely from other processes on the very host on which it is running. Very often these restrictions will be backed up by firewall settings that would (in any case) prevent information on MySQL ports getting diretly to the database server - but of course it's better to have two levels of access control in place than none at all.
Even when I'm running a course during the week, other things are going on too. Early on Thursday morning, I had to drop off some papers in our neighbouring town of Trowbridge. Looking briefly around Trowbridge (and carefully pushing the camera's shutter button when there was no traffic), I marvelled at some of the hidden beauty that there's to see in our area, even in towns which are not renowned for the tourist attraction and great beauty.
There are quite a few steps from data being extracted from database tables to it being returned to the user who made the SQL request, and each of the steps may be specified by a clause on a SELECT command.
• Tables are JOINED to link columns in one table to columns in another table - linking rows, if you like.
• Records are filtered with a WHERE clause, in which individual records that don't meet required criteria.
• If there's a need to link a whole series of rows into a single row, it's done by a GROUP clause. You may require, for example to know how many soldiers are based at a certain barracks rather than all their names!
• The HAVING clause allows for further records to be eliminated after they have been grouped - so that (following on from the previous example) we could eliminate all barracks with less than 20 soldiers based there.
• the remaining records are now SORTed into order
• and finally a LIMIT is applied if required so that only the first / top records are returned.