Databases are where serious applications and servers hold all their data. A program that allows databases to be created and modified is called a Database Management System (DBMS).
The language to communicate with a DBMS is SQL: Structured Query Language.Most DBMSes accept all SQL commands. Some have quirks for SQL commands. Some have non standard SQL commands. But for basic stuff, any DBMS will do.
There are a number of floss DBMSes:
You may have also heard of some proprietary ones:
- Microsoft SQL "sequel" Server
Inside a database you will find a number of tables (think spreadsheets, not furniture). Each table has some fields/columns that store different types of data, and a number of entries/rows which hold that data. Tables will also have one or more Primary Keys, which are used to find a unique row. Often the primary key will just be a number, like a customer ID.
To make the utterly boring topic of databases interesting, let's mess about with the Ashley Madison database. Ashley Madison is/was a dating site for married people. It was owned in 2015 and there is a torrent available.
Installing a DBMS
MariaDB is cross platform, so let's have a play. First step is to grab the binaries:
- On Linux, install the mysql package.
- On Windows, download the standalone zipfile from the MariaDB downloads page. Extract it somewhere when it finished downloading. For reference, this guide is using v 10.0.21.
With the binaries downloaded and extracted, you need to do some initial setup:
- On Linux, following the ArchWiki MariaDB page we must:
- # mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
- your database files will be in /var/lib/mysql
- # mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
- On Windows:
- Open a command prompt by hitting Windows Key + R and typing cmd.exe and hitting enter.
- Navigate to your extracted mariadb/bin directory, e.g.
- cd ..
- cd downloads/mariadb/bin
- mysql_install_db.exe -d C:/am
- your database files will be in C:/am
Now it's time to run the mysqld daemon. This will run the DBMS as a service on your machine.
- Linux: # systemctl start mysqld
- Windows: run mysqld.exe from the mariadb/bin directory.
- A black terminal window will pop up briefly and fade away. You can check that mysqld.exe is running in Task Manager.
- If your get a firewall prompt asking for network access, feel free to deny it as we don't need our database accessable on the network.
You can also set some security parameters in Linux:
- # mysql_secure_installation
- disable everything and set a root password for the database.
At this point the DBMS is installed, configured, and running.
- Now it's time to create a database.
- Windows: in your command prompt, run mysql.exe -u root
- Linux: mysql -u root -p
- At the mysql prompt, enter:
- create database am;
- this will make a new database called am. the semicolon denotes the end of a command.
- create database am;
- Quit by typing exit.
Now let's import the Ashley Madison data to our new "am" database.
- Importing the data will take many hours. It's at least an overnight job. It's not a small database. If the Ashley Madison webserver caught on fire and you were the database administrator, this is the exact same shit you'd do.
- Extract a database dump from the AM torrent. For this example we'll use the email table dump.
- Windows: Extract aminno_member_email.dump.gz with 7zip. Then move the .dump file to the mariadb/bin directory (or setup your PATH variable to know about mariadb).
- Linux: $ 7z x aminno_member_email.dump.gz
With the .dump extracted, import it into the databasse.
- Windows: in your command prompt, enter:
- mysql.exe am -v -u root < aminno_member_email.dump
- mysql -v -u root -p am < aminno_member_email.dump
- For both of these commands, -v denotes verbose mode, which will output updates on what the database is doing. Without this, you will have no output and start to wonder if anything is happening.
- Database imports are bottlenecked by your HDD. If you have an SSD, do everything on the SSD. You will not notice more than 5% or so of your CPU or RAM being used.
Basic SQL Syntax
With one or more tables imported into the database we can have a look at what we have.
Grab yourself a MariaDB prompt by running:
- Windows: mysql.exe am -u root
- Linux: mysql am -u root -p
Tables and Table Properties
The commands to show tables and their properties differ from DBMS to DBMS. Here we'll be using MariaDB's cxommands.
This will list all the tables in the database. If you've imported the aminno_member_email.dump file, you should see the aminno_member_email table listed;
This will list the fields (columns) in the table.
- Field is the field's name.
- Type is the datatype (varchar is text, int is a number, tinyint is used for true/false).
- Null defines whether or not a field can be empty or not.
- Key lists whether a field is used as a key. PRI means Primary Key, and this field must be unique (in our case it's the "pnum" field).
- Default lists any default value that a field may have.
Now that we know what tables we have and what they look like, we can view some of the entries.
select * from aminno_member_email limit 5;
This command will list the first 5 entries in the table. Let's break it down:
- Select tells the DBMS that we're going to search for entries.
- The asterix says we want all of the fields to be displayed in our search results.
- from aminno_member_email tells the DBMS what table to search.
- limit 5 tells the DBMS to stop searching once 5 results have been found (this is just here so that we didn't output the millions and millions of entries.
select pnum, email from aminno_member_email limit 5;
This query is the same as the above one, except we're only listing the pnum and email fields in the results.
select pnum, email from aminno_member_email where email="email@example.com";
This time we're searching for a specific email address. This query will return any entries that have this exact email address. Everything in inverted commas is case sensitive. We've also dropped the limit on search results, since we're not expecting to see too many results.
select pnum, email from aminno_member_email where email like "%@housecapades.com";
This time we're searching for email addresses which match a pattern. This query will take a little longer to do it's work.
Searching Across Multiple Tables
If you've imported multiple tables into the database, you can search across multiple tables to find more useful data. This is known as "joining" the tables.
You need to tell the DBMS how to match up data from one table to another. This is where Keys come in.
Earlier we saw a field in the aminno_member_email table called pnum which was listed as the Primary Key. Other tables are likely to refer to this pnum, or even use an identical pnum field themselves.
If you have only imported on table so far, import the member_details.dump file. We'll use it as an example here, but if you have imported something else, the steps will be roughly the same.
Here we can see that the Primary Key is also pnum, so our two tables will be dead easy to match up. If you're working with different tables, there could be a pnum field which isn't the Primary Key. This is known as a Foreign Key and it's there to match up data from other (foreign) tables.
In member_details there is a dob (date of birth) field, and a profile caption. Let's look at these for our good friend firstname.lastname@example.org who has a pnum of 9.
select dob, profile_caption, email from aminno_member_email, member_details where aminno_member_email.pnum=member_details.pnum and aminno_member_email.pnum = 9;
- We want the dob, profile_caption and email fields displayed in the results. All of these fields are uniquely named, so we don't need to tell the DBMS which tables they're on.
- We're selecting data from both the aminno_member_email and member_details tables;
- We're matching the pnum field from aminno_member_email and the pnum from member_details. Without this the DBMS won't know how the tables relate to each other.
- We're matching the pnum of 9, our buddy darren. Here we specified the use of the pnum from aminno_member_email, but could have just as easily matched it from member_details.
So now we know that Darren had a shitty childhood because his birthday is close to christmas, and that he's too old to be using "4" instead of "for" on the internet.
Other fields in the member_details are worth noting:
- An eye color and hair color must be specified, but profile_smoke and profile_drink can remain blank.
- Numbers are used for many fields which aren't logically numbers. Do you know anyone with a hair color of 6? Numbers are used in place of text for faster processing. Computers blast through numbers in a fraction of the time they compute text.
- How do you know what color hair is 6? This might be listed in another table, or more likely is defined in the software that uses this database. You could perhaps do a statistical analysis and find which colors are most popular in the database and match that against color frequency in english speaking society (i.e. people who use the ashley madison website). Or go on the AM website, make a profile and presume that the drop down list with hair colors are numbered from 0 onwards.
Darren's email was used to create two accounts: 9 and 1796677. Let's have a look at his other account, but this time rather than specifying his pnum, let's just search by his email address. This query will be a little slower than using his pnum, but it will serve as a nice little wrapup of basic sql queries and will also serve as a copy/paste for tl;dr types:
select dob, profile_caption, email from aminno_member_email, member_details where aminno_member_email.pnum=member_details.pnum and email="email@example.com";
Perhaps his childhood wasn't so shitty. Perhaps our mate Darren is smart enough not to give ashley madison his true dob. Nice one Dazza.