Machine learning is a branch of artificial intelligence in which algorithms are trained on data sets rather than programmed with explicit rules. By analyzing a large number of examples of similar problems, a system learns to identify patterns and to propose solutions to new problems of the same kind.
How machine learning works
A machine learning project typically goes through four steps: preparing a data set, training algorithms on it (the actual learning step), evaluating the results, and correcting the algorithms.
- Preparing the data set involves collecting data from sources relevant to the task, cleaning the data, and, where needed, drawing a sample. Sampling helps when the full data set is too large and a smaller subset is sufficient to solve the problem. The data is also split so that one sample is used to train the algorithm and another, the control sample, is held back to evaluate the result.
- Training is the stage where a mathematical function (the model) is selected and its parameters are fitted to the data so that it solves the problem. The process differs depending on the machine learning approach: supervised, unsupervised, or so-called deep learning.
- Evaluation takes place once training is complete: the correctness and efficiency of the algorithms are checked on the control sample set aside at the preparation stage.
- Correction is the refinement of the algorithms to make them more accurate, efficient, and compact.
After the last phase, the whole process is repeated if necessary.
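The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production workflow: the data is synthetic, the two candidate models (a constant and a straight line) are chosen only for simplicity, and "correction" is reduced to keeping whichever model makes the smaller error on the control sample. All function names are hypothetical.

```python
# Toy walk-through of the four steps; all names and data are illustrative.

def prepare(n=20):
    """Step 1: build a data set and split it into two samples."""
    # Synthetic points near the line y = 2x + 1 with a small alternating error.
    data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(n)]
    # Every fourth point goes to the control sample, the rest to training.
    train = [p for i, p in enumerate(data) if i % 4 != 0]
    control = [p for i, p in enumerate(data) if i % 4 == 0]
    return train, control

def fit_constant(train):
    """Step 2, candidate A: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Step 2, candidate B: least-squares straight line y = a*x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def evaluate(model, control):
    """Step 3: mean squared error on the control sample."""
    return sum((model(x) - y) ** 2 for x, y in control) / len(control)

train, control = prepare()
constant_error = evaluate(fit_constant(train), control)
line_error = evaluate(fit_line(train), control)

# Step 4: correction here simply means keeping the model with the lower control error.
best_name = "line" if line_error < constant_error else "constant"
print(best_name, round(constant_error, 2), round(line_error, 2))
```

Because the control sample is never used for fitting, its error gives a fairer picture of how each model would behave on new data than the training error would.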
Types of machine learning
Machine learning algorithms fall into three broad types:
- Supervised machine learning. In this version, the training data is labeled, so the correct answer for each example is known in advance. The algorithm looks not for a solution to the problem but rather for correlations within the data that indicate the solution. When it is exposed to a new data sample, it can then find a solution in similar cases based on those correlations. An example is the analysis of bank transactions that have already been identified as safe or as suspicious. Based on that analysis, the algorithm learns to identify common features of safe transactions and what distinguishes them from suspicious ones. Using what it has learned, it will be able to identify suspicious transactions in the future.
- Unsupervised machine learning. For this type of machine learning, the algorithm operates without knowing the answers in advance. The algorithm’s task is to find connections between individual data points and to build patterns and relationships based on them. This kind of machine learning is widely used in user-recommendation systems (for example, people who like movie X will also like movie Y).
- Deep learning. This method uses multi-layer neural networks and can be either supervised or unsupervised. In either case, it typically requires very large data sets and a large amount of computing power. One application of deep learning is image recognition.
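The difference between the first two types can be shown on a toy version of the bank-transaction example. The sketch below is an assumption-laden illustration: each transaction is reduced to two made-up numeric features, the supervised part uses a nearest-centroid classifier, and the unsupervised part uses a two-cluster k-means with a fixed starting point for repeatability. Neither technique is prescribed by the text above; they are simply among the simplest representatives of each type. Deep learning is left out because it would need a neural network library.

```python
# Illustrative only: each transaction is a made-up pair of numeric features.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

# Supervised: the labels ("safe", "suspicious") are known in advance.
labeled = {
    "safe": [(1.0, 1), (1.5, 2), (0.8, 1), (1.2, 1)],
    "suspicious": [(9.0, 8), (8.5, 9), (9.5, 7)],
}
centroids = {label: centroid(pts) for label, pts in labeled.items()}

def classify(point):
    """Assign the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))

# Unsupervised: the same points without labels, grouped by similarity alone.
def two_means(points, rounds=10):
    """Two-cluster k-means with a deterministic start (first and last point)."""
    centers = [points[0], points[-1]]
    for _ in range(rounds):
        clusters = [[], []]
        for p in points:
            idx = min((0, 1), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

unlabeled = labeled["safe"] + labeled["suspicious"]
clusters = two_means(unlabeled)

print(classify((1.1, 2)), classify((8.8, 8)), [len(c) for c in clusters])
```

The supervised classifier returns a named label because it was given names to learn from. The clustering step only returns two anonymous groups, and deciding what each group means is left to a person or a later stage.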
How machine learning is used in information security
Machine learning methods of various kinds are used throughout information security: to filter spam, analyze traffic, and detect fraud or malware. Machine learning can also help reduce the number of false positives in security systems, improve the interpretability of results, and increase software’s resilience against the actions of a potential attacker.
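As one concrete illustration of the spam-filtering case, the sketch below trains a small naive Bayes classifier on a handful of labeled messages. Naive Bayes is used here only because it is compact; the text above does not say which technique any particular filter uses, and the training messages, labels, and function names are invented for the example.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label; messages is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training data; "ham" is the usual name for legitimate mail.
model = train([
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the quarterly report", "ham"),
])

print(classify("free prize money", model))
print(classify("review the agenda for monday", model))
```

A real filter would be trained on far more messages and would usually combine such a score with other signals, such as sender reputation or message headers.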