2015 UK General Election using SQL Server
This tutorial assumes you have access to powershell and sqlcmd or Microsoft SQL Server Management Studio
Get to the powershell prompt and download the csv
Go into mysql and create a table for the results
Here you create a single flat table that can store all of the unnormalised data.
The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:
mysql -u 40000036 -ptiger 40000036
The create table statement could be:
CREATE TABLE ge( ons_id VARCHAR(10), ons_region_id VARCHAR(10), constituency_name VARCHAR(50), county_name VARCHAR(50), region_name VARCHAR(50), country_name VARCHAR(50), constituency_type VARCHAR(10), party_name VARCHAR(50), party_abbreviation VARCHAR(50), firstname VARCHAR(50), surname VARCHAR(50), gender VARCHAR(6), sitting_mp VARCHAR(3), former_mp VARCHAR(3), votes INT, `share` FLOAT, `change` VARCHAR(20), PRIMARY KEY(ons_id,firstname,surname) );
Import the csv file into your flat table
You can import the data using this line
LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
The above line is complicated but it instructs MySQL about the format of the file.
- CSV stands for "comma separated values", each field is terminated by a comma
- Each line is ended with the newline, carriage return \r\n
- Sometimes a value includes a comma in such cases the field is enclosed by quotes.
- The constituency "Ayr, Carrick and Cumnock" is an example.
- The first row contains column headings not data and so we "IGNORE 1 LINES"
Run some queries
Now let's look at some data. How many female candidates were there?
select count(1) from ge where gender='female';
Who stood in Edinburgh South?
select surname,votes from ge where constituency_name='Edinburgh South';
You can now move to the next stage: 2015 UK General Election Normalising Data