Best things of 2015

Here are a few things I appreciated in 2015:

arXiv papers:

  • Batch Normalization- a simple conceptual idea beautifully executed. We've known for years that neural networks respond better to input data that follow certain distributions. Since the output of each layer is the input to the next, it's a good idea to ensure that each layer is getting "well-behaved" data. The procedure quickly made it into deep learning standard practice. (I'm putting this article in 2015 since that's when I first read it.)

  • Deep Residual Networks- 2015 looks like the beginning of the "truly deep" deep learning era with this ImageNet-winning paper from Microsoft Research Asia that uses a convolutional neural network with 153(!) layers. (Highway networks are another good way to train very deep networks, but they didn't have the same dramatic success). Most layers in the Deep Residual Network are relatively compact, which makes training more feasible, and an identity mapping is used to keep the signal strong through all the layers. The results are impressive.

  • Text Understanding from Scratch: Using words as the building blocks of NLP for so long may partially be a result of much work being done in English (English is not a morphologically rich language, so a lot of our meaning is communicated using noun phrases and verb phrases rather than declined or inflected nouns and verbs). But a convolutional neural network trained at a character level proves very effective at solving NLP tasks. It's also fun to see Yann LeCun, who is primarily known for his deep learning work on computer vision, involved in fun work on text.

  • State of the Art Control of Atari Games Using Shallow Reinforcement Learning: This paper contains a very careful analysis of the evaluation methods used in the deep Q learning papers and compares those results to the proposed shallow strategy wherever possible. It points out some weaknesses in the previous evaluation methods, but it does so in a way that is refreshingly respectful. The paper also succeeds at its goal of demonstrating that a non-deep-learning approach can be similarly effective at many of the Atari games.

Favorite Data Visualization Method:

  • t-Distributed Stochastic Neighbor Embedding. t-sne

It's a great unsupervised visualization technique and this implementation from Laurens van der Maaten, the first author of the t-sne paper doesn't require a precalculated similarity matrix, so it can scale to millions of points.

General Data Science Blog Posts:

More Technical Blog Posts:

  • Peter Norvig's post on Swype-style keyboards. The most impressive thing about Norvig's Jupyter notebooks are that he makes you feel like you could have solved the problem if you'd just thought a bit more carefully about it. It's an unusual skill.

  • Deep Dream used learned transformations from CNN layers to highlight what the network thought it was looking at. The post was one of the first instances of deep learning as a cultural phenomenon.

  • In the great data science language wars, Rob Story is COBOL's lone champion.

  • Bot detection is an important part of my day job, so I really enjoyed Erin Shellman's piece on identifying bots on Twitter

  • Andrej Karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks. It's a terrific advertisement for RNNs.

  • The past few months have seen an explosion of generative models for basically everything: Donald Trump speeches, anime characters, paintings, etc. Samim's post has some great videos on how generative models and multi-modal embeddings can impact the artistic process. He's an interesting person to follow on twitter, too.


  • Machine Learning: A Probabilistic Perspective. Sidesteps the frequentist vs Bayesian battles by focusing on the applications of probability. This book borrows liberally from other books and educational materials to present the concepts in a structured and cohesive way.

  • Data Science from Scratch by Joel Grus (an increasingly famous author) is a nice walkthrough of a lot of the basic concepts in data science. It's written in Python (2.7) with very few dependencies, so it feels very immediate. I would especially recommend it to students studying statistics or information, as it ties together the concepts with actual code.