# From Theory to Application: A Brief Introduction to Logistic Regression

Everyone who works with machine learning should be familiar with logistic regression, since its basic principle also carries over to neural networks. In this article you will learn what logistic regression is, how it works, and what its advantages and disadvantages are.

#### Contents

* What is logistic regression?
* How does it work?
* Logistic regression vs. linear regression
* Advantages and disadvantages
* When is it applicable?
* Multi-class tasks (OvA, OvO)
* Other classification algorithms
* Summary

### What is logistic regression?

Like many other machine learning algorithms, logistic regression is borrowed from statistics. Although its name contains the word "regression", it is not a regression algorithm that predicts continuous outcomes.

Instead, logistic regression is the go-to method for binary classification tasks. It produces a discrete binary outcome; put simply, the result is either 1 or 0.

A cancer detection algorithm is a simple example of a logistic regression problem: given pathology images as input, it should identify whether the patient has cancer (1) or not (0).

### How does it work?

Logistic regression measures the relationship between the dependent variable (the label we want to predict) and one or more independent variables (the features) by estimating probabilities with its underlying logistic function.

These probabilities must then be converted into binary values before an actual prediction can be made. This is the job of the logistic function, also known as the sigmoid function. The sigmoid function is an S-shaped curve that maps any real value to a value between 0 and 1, without ever reaching 0 or 1 exactly. A threshold classifier then converts the value between 0 and 1 into a 0 or a 1.
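As a minimal sketch (the function names here are illustrative, not from the original article), the sigmoid and a 0.5 threshold look like this in Python:

```python
import math

def sigmoid(z):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Turn the sigmoid output into a discrete class: 0 or 1."""
    return 1 if sigmoid(z) >= threshold else 0
```

For example, `sigmoid(0)` is exactly 0.5, large positive inputs approach 1, and large negative inputs approach 0.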

(Figure: the steps logistic regression performs to produce a prediction.)

(Figure: the logistic (sigmoid) function.)

We want to maximize the probability that a random data point is classified correctly; this is maximum likelihood estimation. Maximum likelihood estimation is a general method for estimating the parameters of a statistical model.

The likelihood can be maximized with different optimization algorithms. Newton's method is one of them; it can find the maximum (or minimum) of many different functions, including the likelihood function. Gradient descent can be used instead of Newton's method.
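As an illustrative sketch (not the article's own code), gradient descent on the negative log-likelihood of logistic regression can be written in a few lines of NumPy:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    """Fit logistic regression weights by gradient descent on the
    negative log-likelihood. X: (n_samples, n_features), y: 0/1 labels."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)      # gradient of the mean NLL
        w -= lr * grad                      # one descent step
    return w

def predict(X, w):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(int)

# A tiny linearly separable problem as a smoke test.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = fit_logistic(X, y)
```

Newton's method would replace the fixed learning rate with a step scaled by the inverse Hessian, typically converging in far fewer iterations.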

### Logistic regression vs. linear regression

You may wonder how logistic regression differs from linear regression. Logistic regression yields a discrete outcome, while linear regression yields a continuous one. A model that predicts house prices is a good example of a continuous outcome: the value varies with parameters such as the size or location of the house. A discrete outcome is always one thing (you have cancer) or the other (you don't have cancer).

### Advantages and disadvantages

Logistic regression is a widely used algorithm because it is very efficient, does not demand heavy computation, is easy to interpret, does not require input features to be scaled or any tuning, is easy to regularize, and outputs well-calibrated predicted probabilities.

Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable as well as attributes that are highly correlated with each other. Feature engineering therefore plays an important role in the performance of both logistic and linear regression.

Another advantage of logistic regression is that it is very easy to implement and efficient to train. In my research I usually start with a logistic regression model as a baseline before trying more sophisticated algorithms.

Because of its simplicity and speed, logistic regression also makes a good benchmark against which to measure the performance of other, more complex algorithms.
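Such a baseline takes only a few lines with scikit-learn; here is a sketch on a synthetic dataset (the data and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic binary classification problem stands in for real data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = baseline.score(X_test, y_test)
```

Any more complex model would then have to beat this accuracy to justify its extra cost.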

One of its disadvantages is that logistic regression cannot solve nonlinear problems, because its decision boundary is linear. Consider an example with two instances of each of two classes arranged so that the classes interleave.

Obviously we cannot draw a straight line that distinguishes the two classes without error; a simple decision tree would be the better choice here. Logistic regression is not among the most powerful algorithms and can easily be outperformed by more complex ones. Another disadvantage is that it depends heavily on a correct representation of the data.

This means that logistic regression is not a useful tool until you have identified all the important independent variables. Because its output is discrete, logistic regression can only predict categorical results. It is also known for being prone to overfitting.

### When is it applicable?

As mentioned above, logistic regression separates your input into two "regions" with a linear boundary, one for each class. Your data should therefore be linearly separable.

In other words: when the Y variable takes only two values, that is, when you face a binary classification problem, you should consider using logistic regression. Note that logistic regression can also be used for multi-class classification, which is discussed in the next section.

### Multi-class tasks

Many algorithms handle multiple classes natively, such as random forest or naive Bayes classifiers. Others, like logistic regression, do not seem suited to multi-class tasks at first glance, but can handle them through a few standard techniques.

Let's discuss the most common of these techniques using the MNIST dataset of handwritten digit images (0 through 9). This is a multi-class task: our algorithm should tell us which digit each image shows.

#### 1) One-vs-all (OvA)

With this strategy you train 10 binary classifiers, one per digit: one classifier to detect 0s, one to detect 1s, one to detect 2s, and so on. To classify an image, you simply pick the classifier with the highest prediction score.

#### 2) One-vs-one (OvO)

With this strategy you train one binary classifier for each pair of digits: one that distinguishes 0s from 1s, one for 0s versus 2s, one for 1s versus 2s, and so on. With N classes, N × (N − 1) / 2 classifiers must be trained; for the MNIST dataset that means 45 classifiers.

To classify an image, you run all 45 classifiers and pick the class that wins the most duels. The big advantage of this strategy is that each classifier only has to be trained on the portion of the training set containing the two classes it distinguishes.
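The classifier count follows from counting unordered pairs of classes; a one-line helper makes the arithmetic explicit:

```python
def ovo_classifier_count(n_classes):
    """Number of pairwise (one-vs-one) binary classifiers needed."""
    return n_classes * (n_classes - 1) // 2
```

For the 10 MNIST digits this gives 10 × 9 / 2 = 45 classifiers.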

Algorithms such as support vector machine classifiers do not scale well to large datasets, so for such algorithms the OvO strategy is preferable, because training many classifiers on small training sets is faster than training one classifier on a big dataset.

For most binary classification algorithms, sklearn recognizes when you apply one to a multi-class task and automatically uses the OvA strategy. One exception: when you use a support vector machine classifier, it automatically runs OvO instead.
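You can also force either strategy explicitly through sklearn's multiclass wrappers; the sketch below uses synthetic 4-class data in place of MNIST:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class data stands in for the MNIST digits.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=4, random_state=0)

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

n_ova = len(ova.estimators_)  # one binary classifier per class: 4
n_ovo = len(ovo.estimators_)  # one per pair of classes: 4 * 3 / 2 = 6
```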

### Other classification algorithms

Other common classification algorithms include naive Bayes, decision trees, random forests, support vector machines, and k-nearest neighbors. We will discuss them in other articles, but don't be intimidated by the sheer number of machine learning algorithms. Note that it is better to truly understand four or five algorithms and to focus on feature engineering, which will also be the subject of future work.

### Summary

In this article you learned what logistic regression is and how it works. You now have a solid understanding of its advantages and disadvantages and know when to use it.

You also explored how to use logistic regression with sklearn for multi-class classification, and why logistic regression makes a good baseline against which to measure other machine learning algorithms.

Original: https://www.jiqizhixin.com/articles/2018-05-13-3
Author: 李泽南
Title: 从原理到应用：简述Logistics回归算法
