Sharpe Analytics

Yorkshire · Remote · contact@sharpeanalytics.com

I do data science. For fun, for good and/or for money. Ideally all three.


Main Employment

Staff Data Scientist

Automattic (WordPress, Tumblr)
We kind of run a huge chunk of the internet. I lead data science projects to figure out how people use these platforms and how we can make them better and safer. It's a mix of big data, ML, and working with lots of different teams to make cool stuff happen.

Python, SQL (BigQuery, Spark), LLMs

September 2024 - Present

AI Consultant

Credal.ai (YC)
Jumped in with a friend's YC-backed AI startup to help build their data systems from the ground up. A classic startup experience: working directly with the founders, figuring out what to measure, and doing a bunch of research on LLMs to help point the company in the right direction.

Python, SQL, LLMs, RAG

January 2024 - July 2024

Staff Data Scientist

Spotify
Started as a Senior Data Scientist and eventually became one of the most senior data scientists at the company, focused on making Spotify the destination for podcasts. This involved everything from mentoring data scientists and coordinating research, to leading C-suite level strategy with predictive models, and spearheading the data side of a major homepage redesign. And my favourite: crafting data stories for some very famous people.

Python (Jupyter notebooks, Sklearn, Pandas etc.), Google BigQuery, Causal Inference

October 2017 - January 2024

Senior Data Scientist

Skyscanner

Hacking on HR data to uncover opportunities where data science hadn't really been used before, building and measuring recommendations seen by tens of millions of people, and setting growth metrics that hundreds of colleagues are working towards. Plus all the usual working in and contributing to a data science community.


Python (Jupyter notebooks, Sklearn, Pandas etc.), Scala, AWS (EC2, Redshift), Spark

June 2016 - October 2017

Data Scientist

Arachnys

Building a really cool, really custom B2B sales risk model and sales predictor using Bayesian probabilities and caffeine. Also, fairly large (multi-TB) NLP processing pipelines from scratch.


Python (Stanford-NLP, NLTK, Sklearn), Java, Storm, Spark, Hadoop, Postgres

January 2015 - May 2016

Data Scientist

The Hut Group

Properly setting up the company's first ever Hadoop cluster, being that A/B testing guy (setting up the systems, reporting the results, running the big site-wide redesign test), creating automated pricing algorithms, and sending an awful lot of marketing emails (automatically, based on user attributes). As one of the two first data scientists at the company, basically setting up the whole data science discipline at what is now a multi-billion dollar company.


Python, Bash, SQL Server, Hadoop, Hive, Excel

June 2012 - January 2015

Education & Side Hustles

DPhil Social Data Science

University of Oxford

Currently doing a part-time DPhil (what Oxford calls a PhD) at the Oxford Internet Institute, supervised by the wonderful Renaud Lambiotte and Andy Przybylski. My research is all about habits - how to find them, what causes them, how they spread - using huge datasets from places like Spotify, social media and the web.

October 2022 - Present

CEO & Founder

Sharpe Analytics

Raised around £30k to build a prototype machine learning system designed to predict the life expectancy of terminal cancer patients. Engaged with patient support groups, wrote and ran surveys, worked with a number of external partners and built a nifty lightweight ML system.

January 2018 - Present

MSci Natural Sciences (Hons)

Durham University

Maths, Physics & Chemistry

October 2007 - June 2011

Skills

Programming Languages & Tools
  • Python, Java, Scala, Bash, JavaScript
  • SQL & NoSQL (BigQuery, Redshift, SQL Server, MySQL, PostgreSQL, Redis, Cassandra, Hive)
  • Machine learning, A/B Testing, Metrics Setting, Data Visualisation, Causal Inference
  • Google Cloud, AWS, Spark, Hadoop

Projects

Supporting Rural England

House hunting in rural Yorkshire, I was struck by the similarity of villages. If a village has a church, it's got 100 people or thereabouts. If it's got a pub, it's probably got at least 300 people. If it's got a school, it's probably got more than around 600 people. Being a data scientist, I wondered whether these 'village scaling laws' were universal, and were in any way indicative of a region's health and wealth. It turns out, with OpenStreetMap, you can get all of the 'facilities' in England (and their location). And with the Government Geographic services, you can get population and boundaries of every parish in England. Put it all together (and tie it in with the housing sales data) and you've got a pretty comprehensive picture of every rural parish in England! I did a talk on this at a conference, and you can see all the details and all the fun things you can do on the GitHub.
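As a rough illustration of the 'village scaling laws' idea (a minimal sketch, not the code from the GitHub repo): for each facility type, take a low quantile of the populations of parishes that have it, and treat that as the population above which the facility tends to appear. The function name, the choice of a 10% quantile and the toy numbers in the usage note are all assumptions made for this example.

```python
from collections import defaultdict


def facility_thresholds(parishes, quantile=0.1):
    """Estimate a population threshold per facility type.

    parishes: iterable of (population, set_of_facility_names) pairs.
    Returns {facility: population at the given low quantile among
    parishes that have that facility}. Hypothetical helper.
    """
    populations = defaultdict(list)
    for population, facilities in parishes:
        for facility in facilities:
            populations[facility].append(population)

    thresholds = {}
    for facility, values in populations.items():
        values.sort()
        # Simple floor-index quantile, no interpolation.
        index = int(quantile * (len(values) - 1))
        thresholds[facility] = values[index]
    return thresholds
```

Fed with made-up parishes such as `[(90, {"church"}), (350, {"church", "pub"}), (700, {"church", "pub", "school"})]`, this returns one threshold per facility, which can then be compared across regions.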

Rightmove Scraper

I was looking at buying a house. So, you download all of Rightmove. Then, you build something that lets you search much more specifically. As an example, let's say you want to be within 20 minutes' walk of a pub, 20 minutes' walk of a primary school and 3 hours by public transport from London on a Monday morning - that's no stress and easy enough given Google's APIs. Of course, you can always build the typical predictive model to work out how much you think each house is worth. Or even use your wife's favourite houses to predict which houses might be interesting to her. But realistically, I'll just build something that allows custom searching.
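The custom search boils down to filtering listings on travel times. A minimal sketch, assuming the travel times have already been looked up (for example via a routing API) and stored on each listing; the `Listing` class and its field names are hypothetical and not taken from the actual project.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # All fields are illustrative; times are in minutes.
    address: str
    walk_to_pub_min: float
    walk_to_school_min: float
    transit_to_london_min: float


def custom_search(listings, max_pub=20, max_school=20, max_london=180):
    """Keep only listings within every travel-time limit."""
    return [
        listing
        for listing in listings
        if listing.walk_to_pub_min <= max_pub
        and listing.walk_to_school_min <= max_school
        and listing.transit_to_london_min <= max_london
    ]
```

The defaults mirror the example above: 20 minutes to a pub, 20 minutes to a primary school and 3 hours to London.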

Betfair Better

I love a get-rich-quick scheme as much as the next man, so here's mine. Download the last 5 years' worth of football matches across every country, and use that to build a model that predicts how many goals a team will score in a game. Then, predict scores for all the games coming up in the next few days. Then, compare your odds to the bookies' odds and place bets when there's sufficient edge that you might make some money. Set it up on a cron job and watch the money roll in. Except, it seems to be slightly net negative in terms of the returns it's generated so far. It's not dreadful, but it's certainly not positive. I'd have a warning here about not running this blindly because it'll cost you money, but if you're smart enough to be able to run it, you're probably smart enough to not do so.
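The 'compare your odds to the bookies' odds' step can be sketched roughly as follows. This is an illustration under stated assumptions, not the model actually used: it assumes each team's goals follow independent Poisson distributions with the predicted means, and that the bookmaker quotes decimal odds. The function names and the 5% minimum edge are made up for the example.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected count lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(home_goals, away_goals, max_goals=10):
    """Turn two expected-goal figures into home/draw/away probabilities.

    Assumes independent Poisson scoring (an assumption of this sketch).
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals) * poisson_pmf(a, away_goals)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Renormalise to account for the truncation at max_goals.
    total = home + draw + away
    return {"home": home / total, "draw": draw / total, "away": away / total}


def value_bets(probabilities, decimal_odds, min_edge=0.05):
    """Return outcomes whose expected return per unit stake beats min_edge."""
    bets = {}
    for outcome, p in probabilities.items():
        edge = p * decimal_odds[outcome] - 1.0
        if edge >= min_edge:
            bets[outcome] = edge
    return bets
```

A positive edge here only means the model disagrees with the bookmaker, which, as the results above suggest, is not the same as the model being right.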

Wedding Website

Fairly simple this one - I just built a website for my wedding. Much more fun than using a template and allowed me to do cool stuff with letting people submit song requests and whatnot.