I do data science. For fun, for good and/or for money. Ideally all three.
Still podcasts. A little more AB testing, a fair amount of metric setting, smatterings of machine learning, and then loads and loads of data visualisations and storytelling. Mentoring other data scientists, coordinating the efforts of 10+ data scientists and user researchers, influencing senior people and getting them to rely on insights more. Basically just being useful with data.
Python (Jupyter notebooks, Sklearn, Pandas etc.), Google BigQuery
Hacking on HR data to uncover opportunities where data science hadn't really been used before, building and measuring recommendations seen by tens of millions of people, setting metrics that hundreds of colleagues are working towards, and working on growth. Plus all the usual working in and contributing to a data science community.
Python (Jupyter notebooks, Sklearn, Pandas etc.), Scala, AWS (EC2, Redshift), Spark
Building a really cool, really custom B2B sales risk model and sales predictor using Bayesian probabilities and caffeine. Also, fairly large (multi-TB) NLP processing pipelines from scratch.
Python (Stanford-NLP, NLTK, Sklearn), Java, Storm, Spark, Hadoop, Postgres
Properly setting up the company's first ever Hadoop cluster, being that AB testing guy (setting up the systems, reporting the results, running the big site-wide redesign test), creating automated pricing algorithms, sending an awful lot of marketing emails (automatically, based on user attributes). As one of the two first data scientists at the company, basically setting up the whole data science discipline at what is now a multi-billion dollar company.
Python, Bash, SQL Server, Hadoop, Hive, Excel
Raised around £30k to build a prototype machine learning system designed to predict the life expectancy of terminal cancer patients. Engaged with patient support groups, wrote and ran surveys, worked with a number of external partners and built a nifty lightweight ML system.
Wrote an online course/test for professionals wanting to gain a certification in Cassandra (NoSQL database).
Maths, Physics & Chemistry
I'm looking at buying a house. So, you download all of Rightmove. Then, you build something that lets you search much more specifically. As an example, let's say you want to be within 20 minutes' walk of a pub, 20 minutes' walk of a primary school and 3 hours by public transport from London on a Monday morning - that's no stress and easy enough given Google's APIs. Of course, you can always build the typical predictive model to work out how much you think each house is worth. Or even use your wife's favourite houses to predict which houses might be interesting to her. But realistically, I'll just build something that allows custom searching.
I love a get-rich-quick scheme as much as the next man, so here's mine. Download the last 5 years' worth of football matches across every country, and use that to build a model that predicts how many goals a team will score in a game. Then, predict scores for all the games coming up in the next few days. Then, compare your odds to the bookies' odds and place bets when there's sufficient edge that you might make some money. Set it up on a cron job and watch the money roll in. Except, it seems to be slightly net negative in terms of the returns it's generated so far. It's not dreadful, but it's certainly not positive. I'd have a warning here about not running this blindly because it'll cost you money, but if you're smart enough to be able to run it, you're probably smart enough to not do so.
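For a flavour of the "compare your odds to the bookies' odds" step, here's a minimal sketch rather than the actual code. It assumes goals follow independent Poisson distributions (a common simplification, not necessarily what the real model does), and the function names, the `max_goals` cut-off and the `min_edge` threshold are all made up for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probabilities(home_goals, away_goals, max_goals=10):
    """Turn two expected-goal figures into home/draw/away probabilities.

    Assumes the two teams' goal counts are independent Poisson variables
    and ignores scorelines above max_goals (then renormalises).
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals) * poisson_pmf(a, away_goals)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return {"home": home_win / total, "draw": draw / total, "away": away_win / total}


def value_bets(probs, decimal_odds, min_edge=0.05):
    """Return outcomes whose expected return per unit staked is at least min_edge.

    edge = model probability * decimal odds - 1
    """
    bets = {}
    for outcome, p in probs.items():
        edge = p * decimal_odds[outcome] - 1.0
        if edge >= min_edge:
            bets[outcome] = edge
    return bets
```

Calling `value_bets(outcome_probabilities(1.6, 1.1), odds)` with a dict of decimal odds keyed by `"home"`, `"draw"` and `"away"` gives back only the outcomes where the model thinks the bookie is being generous. As noted above, a positive edge on paper is no guarantee of a positive return.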
Fairly simple this one - I just built a website for my wedding. Much more fun than using a template and allowed me to do cool stuff with letting people submit song requests and whatnot.
What better way to bring in an election than analysing the two main political parties' manifestos and seeing how they compare to each other, and to manifestos from previous elections? How similar are David Cameron and Tony Blair? Is Corbyn really bringing back the 1970s? Natural language processing to the rescue! Download all the manifestos, turn them into text, and then play around using NLTK (and some custom stuff) to your heart's content.
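One simple way to put a number on "how similar are two manifestos" is cosine similarity over word counts. The sketch below uses only the standard library as a stand-in for the NLTK-based analysis, so the tokenisation is deliberately crude and the function names are illustrative assumptions, not the project's own.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (very rough tokenisation)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors, from 0.0 to 1.0."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

In practice you would probably strip stop words and weight rarer terms more heavily before comparing, otherwise every manifesto looks alike because they all say "the" and "we" a lot.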
I lived in Edinburgh and really liked the Fringe festival and so thought it'd be interesting to work out where the geographic centre of the festival was. Basically, scraping the Fringe website to get all of the show locations, times and categories, then finding the weighted average and sticking them all on a filterable map (so you can pick your category and choose your accommodation accordingly).
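The "weighted average" here can be as simple as a weighted mean of the venue coordinates. This is a minimal sketch, assuming each venue is a `(latitude, longitude, weight)` tuple where the weight might be the number of shows at that venue; the tuple layout and the function name are assumptions for illustration. Averaging raw latitude and longitude is only reasonable over a small area like a single city.

```python
def weighted_centre(venues):
    """Weighted mean of (lat, lon, weight) tuples.

    Treats latitude and longitude as flat coordinates, which is a fair
    approximation across one city but not across large distances.
    """
    total_weight = sum(weight for _, _, weight in venues)
    if total_weight <= 0:
        raise ValueError("need at least one venue with a positive weight")
    lat = sum(la * weight for la, _, weight in venues) / total_weight
    lon = sum(lo * weight for _, lo, weight in venues) / total_weight
    return lat, lon
```

Filtering the venue list by category before calling `weighted_centre` gives a separate centre for each category.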
I do love Harry Potter, and so it seemed like it'd be fun to play around with Harry Potter and data science to create something. K-means clustering plus all of the Harry Potter books leads to a bunch of cool graphs and some 'topic clusters' that show the thematically similar Harry Potter chapters across books. Honestly, it worked pretty well and now I'm wondering why I didn't do the same thing for a bunch of other books.
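For the curious, the general shape of chapter clustering looks something like the sketch below. It assumes scikit-learn is installed and that TF-IDF features feed the k-means step, which is a common pairing but an assumption about this project; the function name and parameter values are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_chapters(chapters, n_clusters):
    """Assign each chapter text to one of n_clusters 'topic clusters'.

    chapters: list of strings, one per chapter.
    Returns a list of integer cluster labels in the same order.
    """
    # Turn each chapter into a TF-IDF vector, ignoring common English words.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(chapters)
    # Fixed random_state so repeated runs give the same grouping.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(vectors).tolist()
```

Feed it one string per chapter and pick `n_clusters` by eye; chapters that land in the same cluster are the ones sharing distinctive vocabulary.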