2015 UK General Election Normalising Data

From SQLZOO
Revision as of 21:29, 19 October 2016 by Andr3w (talk | contribs) (Created page with "The unnormalised data from the CSV file looks like this: File:ge2015.png Notice that it includes redundancy - constituency names are repeated for example. We can see that...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The unnormalised data from the CSV file looks like this: Ge2015.png Notice that it includes redundancy - constituency names are repeated for example. We can see that link between the ons_id and the constituency_name is repeated over and over. Our first job is to identify a primary key. There are several options to consider:

  • First we notice that there is no single column that is unique and so we will need to use at least a pair of columns.
  • (firstname,surname) would be good
    • The candidates are unique. By law, no one is allowed to stand as a candidate in two constituencies
    • Unfortunately candidate names are not unique - there are two candidates called "Alan Johnson" for example. You can confirm this with a query such as
      SELECT firstname,surname COUNT(1) FROM ge HAVING COUNT(1)>1