I do data science. For fun, for good and/or for money. Ideally all three.
Python, SQL (BigQuery, Spark), LLMs
Python, SQL, LLMs, RAG
Python (Jupyter notebooks, Sklearn, Pandas etc.), Google BigQuery, Causal Inference
Hacking on HR data to uncover opportunities where data science hadn't really been used before, building and measuring recommendations seen by tens of millions of people, setting metrics that hundreds of colleagues are working towards, and working on growth. Plus all the usual working in and contributing to a data science community.
Python (Jupyter notebooks, Sklearn, Pandas etc.), Scala, AWS (EC2, Redshift), Spark
Building a really cool, really custom B2B sales risk model and sales predictor using Bayesian probabilities and caffeine. Also, fairly large (multi-TB) NLP processing pipelines from scratch.
Python (Stanford-NLP, NLTK, Sklearn), Java, Storm, Spark, Hadoop, Postgres
Properly setting up the company's first ever Hadoop cluster, being that A/B testing guy (setting up the systems, reporting the results, running the big site-wide redesign test), creating automated pricing algorithms, sending an awful lot of marketing emails (automatically, based on user attributes). As one of the first two data scientists at the company, basically setting up the whole data science discipline at what is now a multi-billion-dollar company.
Python, Bash, SQL Server, Hadoop, Hive, Excel
Currently doing a part-time DPhil (what Oxford calls a PhD) at the Oxford Internet Institute, supervised by the wonderful Renaud Lambiotte and Andy Przybylski. My research is all about habits - how to find them, what causes them, how they spread - using huge datasets from places like Spotify, social media and the web.
Raised around £30k to build a prototype machine learning system designed to predict the life expectancy of terminal cancer patients. Engaged with patient support groups, wrote and ran surveys, worked with a number of external partners and built a nifty lightweight ML system.
Maths, Physics & Chemistry
House hunting in rural Yorkshire, I was struck by the similarity of villages. If a village has a church, it's got 100 people or thereabouts. If it's got a pub, it's probably got at least 300 people. If it's got a school, it's probably got more than around 600 people. Being a data scientist, I wondered whether these 'village scaling laws' were universal, and were in any way indicative of a region's health and wealth. It turns out, with OpenStreetMap, you can get all of the 'facilities' in England (and their location). And with the Government Geographic services, you can get population and boundaries of every parish in England. Put it all together (and tie it in with the housing sales data) and you've got a pretty comprehensive picture of every rural parish in England! I did a talk on this at a conference, and you can see all the details and all the fun things you can do on the GitHub.
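One crude way to look for these thresholds, as a sketch rather than the analysis from the talk: group parishes by the facilities they contain and summarise the populations in each group. The input shape (a population paired with a set of facility names) and the function name are assumptions made for illustration; the real OpenStreetMap and parish data would need joining first.

```python
from collections import defaultdict
from statistics import median


def population_by_facility(parishes):
    """Summarise parish populations for each facility type.

    `parishes` is an iterable of (population, facilities) pairs, where
    `facilities` is a set of names such as {"church", "pub"}. This shape
    is hypothetical, chosen to keep the sketch small.

    Returns {facility: (smallest_population, median_population)}.
    """
    populations = defaultdict(list)
    for population, facilities in parishes:
        for facility in facilities:
            populations[facility].append(population)
    return {
        facility: (min(values), median(values))
        for facility, values in populations.items()
    }
```

The smallest population is a noisy stand-in for "you need at least this many people to have a pub"; a low percentile would probably be more robust on real data.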
I was looking at buying a house. So, you download all of Rightmove. Then, you build something that lets you search much more specifically. As an example, let's say you want to be within 20 minutes' walk of a pub, 20 minutes' walk of a primary school and 3 hours by public transport from London on a Monday morning - that's no stress and easy enough given Google's APIs. Of course, you can always build the typical predictive model to work out how much you think each house is worth. Or even use your wife's favourite houses to predict which houses might be interesting to her. But realistically, I'll just build something that allows custom searching.
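The custom search itself can be sketched as a plain filter over listings, assuming the travel times have already been looked up (for example from a routing API) and stored on each listing. The `Listing` fields and the function names below are hypothetical, not taken from the actual project.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # All field names are illustrative assumptions.
    address: str
    price: int
    walk_to_pub_min: float        # walking time to nearest pub, minutes
    walk_to_school_min: float     # walking time to nearest primary school
    transit_to_london_min: float  # public transport to London, Monday morning


def matches(listing, max_pub=20, max_school=20, max_london=180):
    """Return True if the listing is within every travel-time limit."""
    return (
        listing.walk_to_pub_min <= max_pub
        and listing.walk_to_school_min <= max_school
        and listing.transit_to_london_min <= max_london
    )


def search(listings, **limits):
    """Keep only the listings that satisfy the given limits."""
    return [listing for listing in listings if matches(listing, **limits)]
```

Calling `search(listings, max_pub=10)` would tighten just the pub limit and leave the other two at their defaults.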
I love a get-rich-quick scheme as much as the next man, so here's mine. Download the last five years' worth of football matches across every country, and use that to build a model that predicts how many goals a team will score in a game. Then, predict scores for all the games coming up in the next few days. Then, compare your odds to the bookies' odds and place bets when there's sufficient edge that you might make some money. Set it up on a cron job and watch the money roll in. Except, it seems to be slightly net negative in terms of the returns it's generated so far. It's not dreadful, but it's certainly not positive. I'd have a warning here about not running this blindly because it'll cost you money, but if you're smart enough to be able to run it, you're probably smart enough to not do so.
Fairly simple this one - I just built a website for my wedding. Much more fun than using a template and allowed me to do cool stuff with letting people submit song requests and whatnot.
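The odds-comparison step can be sketched as follows. This assumes each team's goals follow independent Poisson distributions, which is a common simplification and not necessarily what the actual model does; the function names, the 5% threshold and the dictionary layout are all illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals given an expected goal rate."""
    return exp(-rate) * rate ** k / factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson goal counts and truncates the score grid
    at max_goals per side, renormalising the small leftover tail.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


def expected_return(prob, decimal_odds):
    """Expected profit per unit staked at the given decimal odds."""
    return prob * decimal_odds - 1.0


def pick_bets(model_probs, bookie_odds, min_edge=0.05):
    """List (outcome, edge) pairs where the model's edge meets min_edge."""
    bets = []
    for outcome, prob in model_probs.items():
        edge = expected_return(prob, bookie_odds[outcome])
        if edge >= min_edge:
            bets.append((outcome, edge))
    return bets
```

A positive edge here only means the model disagrees with the bookmaker, not that the model is right, which is consistent with the slightly negative returns described above.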