Difference between revisions of "2015 UK General Election using SQL Server"

From SQLZOO
Jump to navigation Jump to search
Line 4: Line 4:
 
  (new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv","$pwd\e2015.csv")
 
  (new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv","$pwd\e2015.csv")
  
==Go into mysql and create a table for the results==
+
==Go into sqlcmd or SSMS and create a table for the results==
 
Here you create a single flat table that can store all of the unnormalised data.
 
Here you create a single flat table that can store all of the unnormalised data.
  
Line 26: Line 26:
 
   former_mp VARCHAR(3),
 
   former_mp VARCHAR(3),
 
   votes INT,
 
   votes INT,
   `share` FLOAT,
+
   share FLOAT,
   `change` VARCHAR(20),
+
   change VARCHAR(20),
 
   PRIMARY KEY(ons_id,firstname,surname)
 
   PRIMARY KEY(ons_id,firstname,surname)
  );
+
  )
 +
GO
 +
 
 
==Import the csv file into your flat table==
 
==Import the csv file into your flat table==
 
You can import the data using this line
 
You can import the data using this line

Revision as of 12:36, 16 June 2017

This tutorial assumes you have access to powershell and sqlcmd or Microsoft SQL Server Management Studio

Get to the powershell prompt and download the csv

(new-object system.net.webclient).downloadfile("http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv","$pwd\e2015.csv")

Go into sqlcmd or SSMS and create a table for the results

Here you create a single flat table that can store all of the unnormalised data.

The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:

mysql -u 40000036 -ptiger 40000036

The create table statement could be:

CREATE TABLE ge(
  ons_id VARCHAR(10),
  ons_region_id VARCHAR(10),
  constituency_name VARCHAR(50),
  county_name VARCHAR(50),
  region_name VARCHAR(50),
  country_name VARCHAR(50),
  constituency_type VARCHAR(10),
  party_name VARCHAR(50),
  party_abbreviation VARCHAR(50),
  firstname VARCHAR(50),
  surname VARCHAR(50),
  gender VARCHAR(6),
  sitting_mp VARCHAR(3),
  former_mp VARCHAR(3),
  votes INT,
  share FLOAT,
  change VARCHAR(20),
  PRIMARY KEY(ons_id,firstname,surname)
)
GO

Import the csv file into your flat table

You can import the data using this line

LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;

The above line is complicated but it instructs MySQL about the format of the file.

  • CSV stands for "comma separated values", each field is terminated by a comma
  • Each line is ended with the newline, carriage return \r\n
  • Sometimes a value includes a comma in such cases the field is enclosed by quotes.
    • The constituency "Ayr, Carrick and Cumnock" is an example.
  • The first row contains column headings not data and so we "IGNORE 1 LINES"

Run some queries

Now let's look at some data. How many female candidates were there?

select count(1) from ge where gender='female';

Who stood in Edinburgh South?

select surname,votes from ge where constituency_name='Edinburgh South';

You can now move to the next stage: 2015 UK General Election Normalising Data