Machine learning basics
There are some incredible interactive visual resources on the web that explain in more depth how Machine Learning works. A great place to start is R2D3’s visual introduction to ML.
The ideas that make up modern Machine Learning have been around since the 1950s. At its core, it’s very simple: data has patterns, and you can use those patterns to predict what’s going to happen next. In fact, you already do this every day.
Imagine you’ve got a friend who is constantly late. You’ve got a party coming up, so your expectation is that he’s going to, shocker, be late again. You don’t know that for sure, but given that he has always been late, you figure there’s a good chance he will be this time. And if he shows up on time, you’re surprised, and you keep that new information in the back of your head; maybe next time you’ll adjust your expectations about how likely he is to be late.
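That intuition can be sketched in a few lines of code: estimate the chance of an outcome from how often it has happened before, then fold in new observations as they arrive. The history here is made up for illustration.

```python
# Crude sketch of the "late friend" intuition: predict from historical
# frequency, and update the estimate when new data comes in.

history = ["late", "late", "late", "on time", "late"]

def chance_of(outcome, observations):
    """Fraction of past observations that match the outcome."""
    return observations.count(outcome) / len(observations)

print(chance_of("late", history))   # 4 out of 5 past parties → 0.8

# He surprises you by showing up on time; fold that in.
history.append("on time")
print(chance_of("late", history))   # expectation adjusts downward
```

Real models do something far more sophisticated than counting, but the shape is the same: past data in, adjusted expectation out.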
Your brain has millions of these models working all the time, but their actual internal mechanics are beyond our scientific understanding for now. So in the real world, we need to settle for algorithms – some crude, and some highly complex – that learn from data and extrapolate what’s going to happen in unknown situations. Models are usually trained to work for specific domains (predicting stock prices, or generating an image) but increasingly they’re becoming more general purpose.
Mechanically, a Machine Learning model is sort of like an API: it takes in some inputs, and you teach it to give you some outputs. Here’s how it works:
- Curate some data – gather data on the problem you’re trying to predict and get it ready for a model to look over.
- Train the model – choose an algorithm (or two), and try to fit a model to the data.
- Predict – show new data to the model, and it spits out what it thinks will happen.
You design the model’s interface – what kind of data it takes, and what kind of data it returns – to match whatever your task is.
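Here’s a minimal sketch of that curate → train → predict loop, using a toy nearest-neighbour “model”. The data, labels, and function names are all invented for illustration.

```python
# 1. Curate: (feature, label) pairs -- say, minutes of traffic delay
#    versus whether our perpetually-late friend actually arrived late.
data = [(5, "on time"), (10, "on time"), (30, "late"), (45, "late")]

# 2. Train: for this trivial algorithm, "training" is just memorising
#    the examples. Real algorithms compress the data into parameters.
def train(examples):
    return list(examples)

# 3. Predict: answer with the label of the closest memorised example.
def predict(model, x):
    nearest = min(model, key=lambda example: abs(example[0] - x))
    return nearest[1]

model = train(data)
print(predict(model, 40))   # → "late"
print(predict(model, 8))    # → "on time"
```

Notice the interface: numbers in, a label out. Swapping in a different algorithm changes `train` and `predict` internally, but the shape of the API stays the same.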
Central to using Machine Learning is figuring out how to frame your task as a prediction problem. For old school examples like predicting stock prices (or whether your friend is going to be late), it’s straightforward: the inputs are historical stock prices, and the output is the price at a future date.
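Framing it concretely means turning the raw price history into (input, target) pairs a model can learn from. A common sketch, with made-up prices: each input is a short window of consecutive prices, and the target is the price that came next.

```python
# Hypothetical framing of price prediction as (input window, next value)
# training pairs. The prices below are invented for illustration.

prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]

def make_examples(series, window=3):
    """Each input is `window` consecutive prices; the target is the next one."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

for inputs, target in make_examples(prices):
    print(inputs, "->", target)
# e.g. [101.0, 102.5, 101.8] -> 103.2
```

Once the data is shaped like this, any supervised learning algorithm can be pointed at it.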
But what about something less obvious, like image classification? Imagine you’re a corn (and soybean) farmer in Iowa, and you want to develop a model that allows you to detect whether images of your crop have harmful pests in them or not. How would you frame that as a prediction problem?
Images, as represented on computers, are a bunch of pixels. Each pixel has a color value, and put several thousand of them in specific positions and you’ve got an image. This is how a computer sees a picture.