Difference between revisions of "2015 UK General Election using SQL Server"
(Created page with "This tutorial assumes you have access to powershell and sqlcmd or Microsoft SQL Server Management Studio ==Get to the powershell prompt and download the csv== (new-object sy...") |
|||
Line 2: | Line 2: | ||
==Get to the powershell prompt and download the csv== | ==Get to the powershell prompt and download the csv== | ||
− | (new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv"," | + | (new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv","$pwd\e2015.csv") |
− | |||
==Go into mysql and create a table for the results== | ==Go into mysql and create a table for the results== |
Revision as of 12:35, 16 June 2017
This tutorial assumes you have access to powershell and sqlcmd or Microsoft SQL Server Management Studio
Contents
Get to the powershell prompt and download the csv
(new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv","$pwd\e2015.csv")
Go into mysql and create a table for the results
Here you create a single flat table that can store all of the unnormalised data.
The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:
mysql -u 40000036 -ptiger 40000036
The create table statement could be:
CREATE TABLE ge( ons_id VARCHAR(10), ons_region_id VARCHAR(10), constituency_name VARCHAR(50), county_name VARCHAR(50), region_name VARCHAR(50), country_name VARCHAR(50), constituency_type VARCHAR(10), party_name VARCHAR(50), party_abbreviation VARCHAR(50), firstname VARCHAR(50), surname VARCHAR(50), gender VARCHAR(6), sitting_mp VARCHAR(3), former_mp VARCHAR(3), votes INT, `share` FLOAT, `change` VARCHAR(20), PRIMARY KEY(ons_id,firstname,surname) );
Import the csv file into your flat table
You can import the data using this line
LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
The above line is complicated but it instructs MySQL about the format of the file.
- CSV stands for "comma separated values", each field is terminated by a comma
- Each line is ended with the newline, carriage return \r\n
- Sometimes a value includes a comma in such cases the field is enclosed by quotes.
- The constituency "Ayr, Carrick and Cumnock" is an example.
- The first row contains column headings not data and so we "IGNORE 1 LINES"
Run some queries
Now let's look at some data. How many female candidates were there?
select count(1) from ge where gender='female';
Who stood in Edinburgh South?
select surname,votes from ge where constituency_name='Edinburgh South';
You can now move to the next stage: 2015 UK General Election Normalising Data