Artificial intelligence vs machine learning
AI is a field of computer science that aims to build systems that imitate human intelligence (reasoning, decision-making, problem-solving, etc.).
ML is a branch of AI in which a model learns patterns from data and uses them to produce outputs (predictions) for new inputs.
Supervised learning
Advantages:
- usually accurate and reliable when the labelled training data is large and clean
- performance is easy to measure against the known correct answers
Disadvantages:
- training data can be expensive to get
- loses effectiveness with noisy, complex or small sets of data.
Unsupervised learning
Advantages:
- doesn’t require labelled data
- can simplify complex, high-dimensional data
Disadvantages:
- computationally intensive
- harder to evaluate results with no ‘right’ answer
- liable to false positives (can report patterns or groupings that are not actually meaningful)
Semi-supervised learning
Trains on a small amount of labelled data together with a larger amount of unlabelled data.
Advantages:
- suitable for when labelled data is expensive
Disadvantages:
- the small amount of labelled data might not be representative of the overall data, which can cause incorrect results
Reinforcement learning
Involves an agent performing actions in an environment, receiving rewards or penalties based on those actions, and adjusting its behaviour accordingly. Over time, the agent’s behaviour is optimised to produce positive outcomes.
Advantages:
- powerful for decision-making tasks (e.g. in games), especially when outcomes change or are uncertain
- Learns strategies through experience
Disadvantages:
- Complex (esp. troubleshooting and debugging)
- Requires a lot of training time and computational resources
- Reliant on the quality of the feedback (a poorly designed reward leads to poorly learned behaviour)
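The act, receive reward, adjust loop described above can be sketched with a very simplified form of reinforcement learning: an epsilon-greedy multi-armed bandit (no environment state, just a choice between actions). Everything here is an assumption for illustration, including the function name `run_bandit`, the reward probabilities, and the values of `epsilon` and `steps`.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent choosing between actions with unknown reward chances."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions      # agent's current estimate of each action's reward
    counts = [0] * n_actions           # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action currently believed to be best
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment gives a reward (1) or nothing (0)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Adjust behaviour: move the estimate towards the observed reward (running average)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical environment: three actions that pay out 20%, 50% and 80% of the time
estimates = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

The occasional random action (exploration) is what lets the agent discover that a different action is better than the one it currently prefers.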
Common ML algorithms and concepts
Linear regression
Linear regression is used to predict unknown numeric values from the features of the data, using a line learned from the training data (the relationship between features and output must be roughly linear).
Regression: fitting a line to data points.
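As a minimal sketch of fitting a line to data points, the example below uses the standard least-squares formulas for a single feature. The function names `fit_line` and `predict` and the training data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept

# Hypothetical training data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)               # 2.0 1.0
print(predict(5.0, slope, intercept)) # 11.0
```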
Gradient descent
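An optimisation algorithm that iteratively improves the model's parameters (e.g. the slope and intercept of the line). At each step it takes the gradient of the error at the current position and moves the parameters in the opposite (downhill) direction, by an amount scaled by the learning rate. The learning rate itself usually stays fixed; the steps eventually become insignificant because the gradient approaches zero near the minimum. This finds a minimum of the curve representing the errors (the loss function).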
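A minimal sketch of gradient descent fitting a line, assuming mean squared error as the error curve. The function name `gradient_descent`, the data, and the values of `learning_rate` and `steps` are illustrative choices, not fixed rules.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by repeatedly stepping downhill on the MSE."""
    slope, intercept = 0.0, 0.0   # arbitrary starting position
    n = len(xs)
    for _ in range(steps):
        # Error of the current line on every training point
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / n) * sum(errors)
        # Step in the opposite direction of the gradient, scaled by the learning rate
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Hypothetical training data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
gd_slope, gd_intercept = gradient_descent(xs, ys)
print(round(gd_slope, 3), round(gd_intercept, 3))
```

If the learning rate is too large for the data, the steps overshoot the minimum and the values grow instead of settling; if it is too small, many more steps are needed.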
Polynomial regression (+multivariate polynomial regression)
Same idea as linear regression, but it fits a polynomial curve (e.g. y = ax^2 + bx + c) so it can model more complex, non-linear patterns.
When the value of y depends on multiple input variables in a polynomial relationship, this is called multivariate polynomial regression.
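One way to see the link with linear regression: each x is expanded into the features [1, x, x^2, ...], and the coefficients are then found by ordinary least squares. The sketch below does this in plain Python by solving the normal equations; the helper names `solve` and `fit_polynomial` and the data are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    k = degree + 1
    # Expand each x into the feature row [1, x, x^2, ...]
    rows = [[x ** p for p in range(k)] for x in xs]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data lying exactly on y = x^2 - 2x + 3
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 - 2 * x + 3 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # constant term first, then x, then x^2
```

For the multivariate case the same approach applies, with feature rows that also contain powers and products of the different inputs (e.g. x1^2, x1 * x2).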
Logistic regression
Classification algorithm: predicts the probability of something happening (i.e. of an input belonging to a class).
Given an input, it first computes a raw output value (a weighted sum of the features), which then goes through a mathematical function called the sigmoid function, which ‘squashes’ this value into the range 0-1 so it can be read as a probability.
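A minimal sketch of the prediction step only (not the training), assuming the weights have already been learned. The weights, bias, input values and the 0.5 decision threshold are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Weighted sum of the features, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical, already-trained parameters for a model with two features
weights = [0.8, -0.4]
bias = -0.2

p = predict_probability([2.0, 1.0], weights, bias)  # raw output z is about 1.0
label = 1 if p >= 0.5 else 0                        # threshold turns probability into a class
print(round(p, 3), label)                           # 0.731 1
```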
K-nearest neighbour
Used for classification or regression tasks. It is based on the idea that the observations closest to a given data point are the most ‘similar’, and we can therefore classify new points based on the labels of the closest existing points (a majority vote for classification, an average for regression). K = the number of nearby data points to consider.
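A minimal sketch of k-nearest neighbour classification using straight-line (Euclidean) distance and a majority vote. The function name `knn_classify`, the points and the labels are made up for illustration; `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label a new point by majority vote among its k closest training points."""
    # Distance from the query to every training point, closest first
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Most common label among the k nearest neighbours
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two clusters labelled "A" and "B"
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2, 2)))  # A
print(knn_classify(points, labels, (6, 5)))  # B
```

An odd K is a common choice for two-class problems because it avoids tied votes.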
Loss functions (e.g. MSE)
How ML algorithms measure error (they quantify the difference between the model’s predictions and the actual values).
A common one is mean squared error (MSE), which calculates the average of the squared differences between predicted and actual values. Squaring is beneficial because it ensures all errors are positive AND penalises large errors more.
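Loss functions of this kind need known actual values to compare against, so they are not used in the same way for unsupervised learning, where there are no labels.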
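A short worked example of mean squared error, with made-up values, showing how a single large error dominates the result. The function name `mean_squared_error` is an illustrative choice.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values: errors of 1, 0 and -3
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 10.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # (1 + 0 + 9) / 3, roughly 3.33
```

The error of 3 contributes 9 to the total while the error of 1 contributes only 1, which is the extra penalty on large errors.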