Sharpe Analytics

Stockholm · Edinburgh · Yorkshire · Remote · contact@sharpeanalytics.com

I do data science. For fun, for good and/or for money. Ideally all three.

I like music/podcasts, geography (especially northern or rural geography), houses, healthcare, literature, and football.


Main Employment

Staff Data Scientist

Spotify · Stockholm/Yorkshire (remote)
Just about tied for the most senior data scientist at Spotify. Mainly working on making Spotify the destination for podcasts, working with content, product, marketing, personalisation and anybody else I can find to try to make things happen. All the stuff I was doing before at Spotify, just a bit more of it: coordinating 30+ data scientists and user researchers, working with more senior stakeholders (C-suite), and generating and driving patentable inventions. Senior data scientist++.

Python (Jupyter notebooks, Sklearn, Pandas etc.), Google BigQuery, Scala, TensorFlow

September 2019 - Present

Senior Data Scientist

Spotify · Stockholm

Still podcasts. A little more AB testing, a fair amount of metric setting, smatterings of machine learning, and then loads and loads of data visualisations and storytelling. Mentoring other data scientists, coordinating the efforts of 10+ data scientists and user researchers, influencing senior people and getting them to rely on insights more. Basically just being useful with data.


Python (Jupyter notebooks, Sklearn, Pandas etc.), Google BigQuery, JavaScript

October 2017 - September 2019

Senior Data Scientist

Skyscanner · Edinburgh

Hacking on HR data to uncover opportunities where data science hadn't really been used before, building and measuring recommendations seen by tens of millions of people, setting metrics that hundreds of colleagues are working towards, and working on growth. Plus all the usual stuff of working in and contributing to a data science community.


Python (Jupyter notebooks, Sklearn, Pandas etc.), Scala, AWS (EC2, Redshift), Spark

June 2016 - October 2017

Data Scientist

Arachnys · Birmingham/London (remote)

Building a really cool, really custom B2B sales risk model and sales predictor using Bayesian probabilities and caffeine. Also building fairly large (multi-TB) NLP pipelines from scratch.


Python (Stanford-NLP, NLTK, Sklearn), Java, Storm, Spark, Hadoop, Postgres

January 2015 - May 2016

Data Scientist

The Hut Group · Northwich

Properly setting up the company's first ever Hadoop cluster, being that AB testing guy (setting up the systems, reporting the results, running the big site-wide redesign test), creating automated pricing algorithms, and sending an awful lot of marketing emails (automatically, based on user attributes). As one of the first two data scientists at the company, basically setting up the whole data science discipline at what is now a multi-billion dollar company.


Python, Bash, SQL Server, Hadoop, Hive, Excel

June 2012 - January 2015

Education & Side Hustles

DPhil Social Data Science

Oxford University

Studying habits with online media at the Oxford Internet Institute

October 2022 - Present

Skills Advisory Board Member

The Data Lab
February 2020 - August 2022

Honorary Fellow

Edinburgh University
September 2018 - September 2021

CEO & Founder

Sharpe Analytics

Raised around £30k to build a prototype machine learning system designed to predict the life expectancy of terminal cancer patients. Engaged with patient support groups, wrote and ran surveys, worked with a number of external partners and built a nifty lightweight ML system.

January 2018 - Present

Course Writer - Big Data

IKMNet

Wrote an online course/test for professionals wanting to gain a certification in Cassandra (NoSQL database).

May 2016 - June 2016

MSci Natural Sciences (Hons)

Durham University

Maths, Physics & Chemistry

October 2007 - June 2011

Skills

Programming Languages & Tools
  • Python, Java, Scala, Bash, JavaScript
  • SQL & NoSQL (BigQuery, Redshift, SQL Server, MySQL, PostgreSQL, Redis, Cassandra, Hive)
  • Machine learning, AB Testing, Metrics Setting, Data Visualisation, Causal Inference
  • Google Cloud, AWS, Spark, Hadoop

Projects

Rural England in Data

Everybody seems to like studying cities using data, and that's all well and good, but I live in a small village now. Besides, I bought a house and saw about 15 different villages in Yorkshire in doing so, and there's clearly some really interesting relationship between the population of a village and the number of facilities it has. So, can we work out at what population size a village typically has a pub? A church? A school? First you spend 10 days straight downloading OpenStreetMap data for the country. Then you download a whole bunch of stuff from the Office for National Statistics. Then you do some boundary jiggery-pokery and hey presto, you've got the number of pubs per parish for every rural parish in England. Maths maths maths and there you have it - regional disparities in the relationship between population size and facilities. Check it out - there's an academic-looking paper and everything.
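One way to do the "maths maths maths" step, sketched here as an assumption rather than what the paper actually did: fit a logistic regression of whether a parish has a given facility on log10 of its population, then read off the population at which the fitted probability crosses 50%. The function names are hypothetical, and Newton's method is used so the sketch runs without NumPy or scikit-learn.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Fit P(y = 1) = sigmoid(a + b * x) by Newton's method (IRLS)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p          # gradient of the log-likelihood
            gb += (y - p) * x
            haa += w             # entries of the negated Hessian
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: [a, b] += H^-1 g, solving the 2x2 system by hand.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def population_threshold(populations, has_facility):
    """Population at which the fitted chance of having the facility is 50%."""
    xs = [math.log10(p) for p in populations]
    a, b = fit_logistic(xs, has_facility)
    # sigmoid(a + b * x) = 0.5 exactly where a + b * x = 0.
    return 10 ** (-a / b)
```

Called with one population and one 0/1 "has a pub" flag per parish, this gives a rough "typical pub village" size; running it separately for each region would be one way to look at regional disparities.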

Rightmove Scraper

I'm looking at buying a house. So, you download all of Rightmove. Then you build something that lets you search much more specifically. For example, say you want to be within 20 minutes' walk of a pub, 20 minutes' walk of a primary school and 3 hours by public transport from London on a Monday morning - that's no stress and easy enough given Google's APIs. Of course, you can always build the typical predictive model to work out how much you think each house is worth. Or even use your wife's favourite houses to predict which houses might interest her. But realistically, I'll just build something that allows custom searching.
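The custom search boils down to filtering listings on constraints. A minimal sketch, assuming each listing and amenity already has a latitude and longitude: it estimates walking time from straight-line (haversine) distance at an assumed 5 km/h, which is only a rough stand-in for proper routing. The public-transport constraint is left out because it needs real journey planning (the Google APIs mentioned above), and `Listing`, `search` and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

WALKING_SPEED_KMH = 5.0  # assumed average walking pace


@dataclass
class Listing:
    address: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def walk_minutes(listing, point):
    # Straight-line distance underestimates real walking routes.
    km = haversine_km(listing.lat, listing.lon, point[0], point[1])
    return km / WALKING_SPEED_KMH * 60


def within_walk(listing, amenities, max_minutes):
    return any(walk_minutes(listing, p) <= max_minutes for p in amenities)


def search(listings, pubs, schools, max_walk=20):
    """Keep listings within max_walk minutes of at least one pub and one school."""
    return [
        house for house in listings
        if within_walk(house, pubs, max_walk) and within_walk(house, schools, max_walk)
    ]
```

Each extra requirement is just another predicate in `search`, which is what makes this kind of tool easy to keep extending.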

Betfair Better

I love a get-rich-quick scheme as much as the next man, so here's mine. Download the last 5 years' worth of football matches across every country, and use that to build a model that predicts how many goals a team will score in a game. Then predict scores for all the games coming up in the next few days. Then compare your odds to the bookies' odds and place bets when there's enough of an edge that you might make some money. Set it up on a cron job and watch the money roll in. Except it seems to be slightly net negative in terms of the returns it's generated so far. It's not dreadful, but it's certainly not positive. I'd put a warning here about not running this blindly because it'll cost you money, but if you're smart enough to run it, you're probably smart enough not to.
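The "predict goals, compare odds, bet on the edge" loop can be sketched in plain Python. This is an illustrative assumption, not the actual Betfair Better code: it treats each team's goals as independent Poisson variables (a common textbook choice), takes the expected goals as given from some upstream model, turns them into home/draw/away probabilities, and flags outcomes whose expected profit at the bookies' decimal odds beats a hypothetical `min_edge` threshold.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(home_xg, away_xg, max_goals=10):
    """Home/draw/away probabilities from expected goals, assuming independent Poisson scores."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Scores above max_goals are ignored; their probability is tiny for typical xG.
    return {"home": home, "draw": draw, "away": away}


def value_bets(probs, decimal_odds, min_edge=0.05):
    """Outcomes where expected profit per unit stake exceeds min_edge."""
    bets = {}
    for outcome, p in probs.items():
        edge = p * decimal_odds[outcome] - 1.0  # expected profit per 1 unit staked
        if edge > min_edge:
            bets[outcome] = edge
    return bets
```

With decimal odds, `p * odds - 1` is the expected profit per unit staked, so an edge of 0.05 means roughly 5p per £1 if the model's probabilities are right, which, judging by the returns so far, is a big if.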

Wedding Website

Fairly simple, this one - I just built a website for my wedding. Much more fun than using a template, and it let me do cool stuff like letting people submit song requests and whatnot.

Politics and Manifestos

What better way to bring in an election than analysing the two main political parties' manifestos and seeing how they compare to each other and to manifestos from previous elections? How similar are David Cameron and Tony Blair? Is Corbyn really bringing back the 1970s? Natural language processing to the rescue! Download all the manifestos, turn them into text, and then play around with NLTK (and some custom stuff) to your heart's content.
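One simple way to put a number on "how similar are two manifestos" is cosine similarity between their word counts. The sketch below uses plain Python rather than NLTK, and the tiny stopword list is purely illustrative; a real comparison would want proper tokenisation, a full stopword list and probably TF-IDF weighting.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full one.
STOPWORDS = {"the", "and", "a", "of", "to", "in", "we", "will", "for", "our"}


def word_counts(text):
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(text_a, text_b):
    """1.0 = identical word proportions, 0.0 = no (non-stopword) words in common."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Computing this for every pair of manifestos gives a similarity matrix, which is one way to ask whether one year's manifesto sits closer to a 1970s one than to its immediate predecessor.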

Centre of the Fringe Festival

I lived in Edinburgh and really liked the Fringe festival, so I thought it'd be interesting to work out where the geographic centre of the festival was. Basically, that meant scraping the Fringe website to get all of the show locations, times and categories, then finding the weighted average and sticking them all on a filterable map (so you can pick your category and choose your accommodation accordingly).
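A minimal sketch of the weighted-centre calculation, assuming each show has venue coordinates, a category and a count of performances used as its weight (the `Show` fields and the choice of weight are assumptions, not taken from the Fringe website). Averaging latitude and longitude directly is a reasonable approximation over an area the size of Edinburgh, though it would not hold up over continental distances.

```python
from dataclasses import dataclass


@dataclass
class Show:
    venue_lat: float
    venue_lon: float
    category: str
    performances: int  # assumed weight: more performances pull the centre harder


def weighted_centre(shows, category=None):
    """Performance-weighted mean (lat, lon), optionally for one category only."""
    selected = [s for s in shows if category is None or s.category == category]
    total = sum(s.performances for s in selected)
    if total == 0:
        raise ValueError("no performances in the selected category")
    lat = sum(s.venue_lat * s.performances for s in selected) / total
    lon = sum(s.venue_lon * s.performances for s in selected) / total
    return lat, lon
```

The `category` argument is what the filterable map needs: one centre per category, so you can see where, say, the comedy is clustered before picking somewhere to stay.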

Harry Potter Cluster Generation

I do love Harry Potter, so it seemed like it'd be fun to play around with Harry Potter and data science to create something. K-means clustering plus all of the Harry Potter books leads to a bunch of cool graphs and some 'topic clusters' that show thematically similar Harry Potter chapters across books. Honestly, it worked pretty well, and now I'm wondering why I didn't do the same thing for a bunch of other books.
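Here's a rough, dependency-free sketch of the chapter-clustering idea, assuming each chapter is available as a plain string: TF-IDF vectors, then k-means. It is not the original code; in practice scikit-learn's `TfidfVectorizer` and `KMeans` would do the heavy lifting, and this version swaps k-means++ for a simpler deterministic farthest-point initialisation so the result is repeatable.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Sparse, L2-normalised TF-IDF vectors (dicts of word -> weight)."""
    tokenised = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    df = Counter(word for toks in tokenised for word in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {w: c * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def squared_distance(u, v):
    keys = u.keys() | v.keys()
    return sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys)


def mean_vector(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds values key by key
    return {k: val / len(vectors) for k, val in total.items()}


def kmeans(vectors, k, iters=20):
    """Cluster label for each vector."""
    # Farthest-point initialisation: start at the first chapter, then repeatedly
    # add the chapter farthest from all centroids chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean_vector(members)
    return labels
```

Because the TF-IDF vectors are normalised, squared Euclidean distance between them tracks cosine similarity, so chapters that share distinctive vocabulary (Quidditch chapters, Potions chapters) end up in the same cluster regardless of which book they come from.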