At the core of many machine learning models lies a very simple mathematical problem: fit an appropriate function to a given set of data points, then make sure this function performs well on new data.
Structure of the machine learning part of an AI problem
Our goal in this blog is to internalize the following structure of the machine learning part of an AI problem:
Identify the problem
The problem depends on the specific use case: classify images, classify documents, predict house prices, detect fraud or anomalies, recommend the next product, predict the likelihood of a criminal reoffending, predict the internal structure of a building given external images, convert speech to text, generate audio, generate images, generate video, etc.
Acquire the appropriate data
This is about training our models to do the right thing; we say that our models learn from data. We have to make sure the data is clean, complete, and, if necessary, depending on the specific model we are implementing, transformed (normalized, standardized, some features aggregated, etc.). This step is usually far more time-consuming than implementing and training the machine learning models themselves.
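One of the transformations mentioned above, standardization, can be sketched in a few lines. This is a minimal illustration with made-up numbers, not a full preprocessing pipeline: each feature column is shifted to zero mean and scaled to unit variance.

```python
import numpy as np

# Hypothetical raw feature matrix: rows are samples, columns are features
# (the numbers are made up purely for illustration)
X = np.array([[150.0, 3.0],
              [200.0, 4.0],
              [120.0, 2.0],
              [180.0, 3.0]])

# Standardize each feature: subtract the column mean, divide by the column std
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # each column now has mean ~0
print(X_std.std(axis=0))   # each column now has std ~1
```

In practice a library utility would handle this (and remember the training-set mean and std so the same shift and scale can be applied to new data), but the arithmetic is exactly this simple.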
Create a hypothesis function
We use the terms hypothesis function, learning function, prediction function, training function, and model interchangeably. Our main assumption is that this input/output mathematical function explains the observed data and can later be used to make predictions on new data.
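For concreteness, here is about the simplest possible hypothesis function: a linear map from a feature vector to a number. The weights `w` and bias `b` below are arbitrary placeholder values; in a real model they would be learned from data.

```python
import numpy as np

# Parameters of a linear hypothesis h(x) = w . x + b
# (values are placeholders; learning them is the job of training)
w = np.array([2.0, -1.0])
b = 0.5

def hypothesis(x):
    """Map an input feature vector to a prediction."""
    return w @ x + b

# Prediction on a new input: 2*1 + (-1)*3 + 0.5 = -0.5
print(hypothesis(np.array([1.0, 3.0])))  # -0.5
```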
I’ve always been amazed by how (supervised) machine learning algorithms are built from simple ideas:
fit an appropriate function to a given set of data points, then make sure this function performs well on new data.
Training Function (Model)
Loss Function
Optimization
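The three ingredients above can be sketched end to end in a toy example. The sketch below, on synthetic data invented for illustration, fits a one-variable linear model by minimizing a mean-squared-error loss with plain gradient descent; real systems swap in richer models, losses, and optimizers, but the structure is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: y = 3x + 1 plus a little noise
x = rng.uniform(-1, 1, 50)
y = 3 * x + 1 + rng.normal(0, 0.1, 50)

# 1. Training function (model): h(x) = w*x + b
w, b = 0.0, 0.0

# 2. Loss function: mean squared error between predictions and targets
def loss(w, b):
    return np.mean((w * x + b - y) ** 2)

# 3. Optimization: gradient descent on the loss
lr = 0.1
for _ in range(500):
    err = w * x + b - y
    w -= lr * np.mean(2 * err * x)  # d(loss)/dw
    b -= lr * np.mean(2 * err)      # d(loss)/db

print(round(w, 1), round(b, 1))  # recovers values close to 3 and 1
```

Every supervised method in this post, from linear regression to neural networks, is some elaboration of this loop: propose a parameterized function, measure how badly it fits, and nudge the parameters to fit better.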
Classical Machine Learning Algorithms:
Bias / Variance Trade-Off
The inability of a machine learning method to capture the true relationship in the data is called bias.
In machine learning lingo, the amount by which a model's fit changes from one data set to another is called variance.
In machine learning, the ideal algorithm has low bias, so it can accurately model the true relationship, and low variance, so it produces consistent predictions across different data sets.
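The trade-off is easy to see numerically. In the sketch below (a toy setup, not from the original text), a rigid straight line and a flexible high-degree polynomial are each fit to many different noisy samples of the same underlying curve; the flexible model's predictions swing far more from sample to sample, which is exactly what high variance means.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
true_y = np.sin(2 * np.pi * x)  # the "true relationship" (chosen for illustration)

def fit_predict(degree, x0=0.5):
    """Fit a degree-`degree` polynomial to one noisy sample of the data,
    then predict at the point x0."""
    y = true_y + rng.normal(0, 0.3, x.size)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x0)

# Predictions at x = 0.5 across 100 different data sets
line_preds = [fit_predict(1) for _ in range(100)]  # high bias, low variance
poly_preds = [fit_predict(9) for _ in range(100)]  # low bias, high variance

print(np.var(line_preds) < np.var(poly_preds))
```

The straight line misses the sine curve (bias) but barely moves between data sets, while the degree-9 polynomial tracks each noisy sample closely and so its predictions scatter widely: the sweet spot is a model complex enough to capture the true relationship but no more.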