Kernel Functions for Machine Learning Applications

In recent years, kernel methods have received major attention, particularly due to the increased popularity of Support Vector Machines. Kernel functions can be used in many applications, as they provide a simple bridge from linearity to non-linearity for any algorithm that can be expressed in terms of dot products. In this article, we will list a few kernel functions and some of their properties.

Many of these functions have been incorporated in Accord.NET, a framework for creating machine learning, statistics, and computer vision applications.

Contents

  1. Kernel Methods
  2. The Kernel Trick
  3. Kernel Properties
  4. Choosing the Right Kernel
  5. Kernel Functions
      1. Linear Kernel
      2. Polynomial Kernel
      3. Gaussian Kernel
      4. Exponential Kernel
      5. Laplacian Kernel
      6. ANOVA Kernel
      7. Hyperbolic Tangent (Sigmoid) Kernel
      8. Rational Quadratic Kernel
      9. Multiquadric Kernel
      10. Inverse Multiquadric Kernel
      11. Circular Kernel
      12. Spherical Kernel
      13. Wave Kernel
      14. Power Kernel
      15. Log Kernel
      16. Spline Kernel
      17. B-Spline Kernel
      18. Bessel Kernel
      19. Cauchy Kernel
      20. Chi-Square Kernel
      21. Histogram Intersection Kernel
      22. Generalized Histogram Intersection Kernel
      23. Generalized T-Student Kernel
      24. Bayesian Kernel
      25. Wavelet Kernel
  6. Source Code
  7. See also
  8. References

Kernel Methods

Kernel methods are a class of algorithms for pattern analysis or recognition, whose best-known element is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (such as clusters, rankings, principal components, correlations, classifications) in general types of data (such as sequences, text documents, sets of points, vectors, images, graphs, etc.) (Wikipedia, 2010a).

The main characteristic of kernel methods, however, is their distinct approach to this problem. Kernel methods map the data into higher-dimensional spaces in the hope that, in this higher-dimensional space, the data could become more easily separated or better structured. There are also no constraints on the form of this mapping, which could even lead to infinite-dimensional spaces. The mapping function, however, rarely needs to be computed explicitly, thanks to a tool called the kernel trick.

The Kernel Trick

The kernel trick is a very interesting and powerful tool. It is powerful because it provides a bridge from linearity to non-linearity for any algorithm that can be expressed solely in terms of dot products between two vectors. It comes from the fact that, if we first map our input data into a higher-dimensional space, a linear algorithm operating in this space will behave non-linearly in the original input space.

Now, the kernel trick is really interesting because that mapping never needs to be computed. If our algorithm can be expressed only in terms of an inner product between two vectors, all we need to do is replace this inner product with the inner product from some other suitable space. That is where the "trick" resides: wherever a dot product is used, it is replaced with a kernel function. The kernel function denotes an inner product in feature space, and is usually denoted as:

K(x, y) = \langle \varphi(x), \varphi(y) \rangle

Using the kernel function, the algorithm can then be carried into a higher-dimensional space without explicitly mapping the input points into this space. This is highly desirable, as sometimes our higher-dimensional feature space could even be infinite-dimensional and thus infeasible to compute.
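
To make this concrete, here is a small illustrative sketch in Python/NumPy (my own addition, not part of the original article): an algorithm written purely in terms of dot products can have that dot product swapped for any kernel function, without the mapping φ ever being computed.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Pairwise kernel evaluations: K[i, j] = k(x_i, x_j)."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# A linear algorithm sees the ordinary dot product...
linear = lambda x, y: np.dot(x, y)

# ...and becomes non-linear in input space simply by swapping in a kernel,
# here a Gaussian kernel whose implicit feature space is infinite-dimensional.
gaussian = lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * 1.0 ** 2))

X = np.random.randn(5, 3)
K_lin = gram_matrix(X, linear)
K_rbf = gram_matrix(X, gaussian)
```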

Kernel Properties

Kernel functions must be continuous, symmetric, and most preferably should have a positive (semi-)definite Gram matrix. Kernels which are said to satisfy Mercer's theorem are positive semi-definite, meaning their kernel matrices have only non-negative eigenvalues. The use of a positive definite kernel ensures that the optimization problem will be convex and the solution will be unique.

However, many kernel functions which are not strictly positive definite have also been shown to perform very well in practice. An example is the Sigmoid kernel which, despite its wide use, is not positive semi-definite for certain values of its parameters. Boughorbel (2005) also experimentally demonstrated that kernels which are only conditionally positive definite can possibly outperform most classical kernels in some applications.

Kernels can also be classified as anisotropic stationary, isotropic stationary, compactly supported, locally stationary, nonstationary, or separable nonstationary. Moreover, kernels can be labeled scale-invariant or scale-dependent, which is an interesting property, as scale-invariant kernels make the training process invariant to a scaling of the data.
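
As an informal illustration of this property (again my own addition), Mercer's condition can be checked empirically by inspecting the eigenvalues of a Gram matrix computed on a data sample; a positive semi-definite kernel should produce no eigenvalue below zero, up to numerical error.

```python
import numpy as np

def is_psd_on_sample(kernel, X, tol=1e-10):
    """Empirical Mercer check: the Gram matrix K[i, j] = k(x_i, x_j)
    of a positive semi-definite kernel has no eigenvalue below -tol."""
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    eigvals = np.linalg.eigvalsh((K + K.T) / 2)  # symmetrize against round-off
    return bool(np.all(eigvals >= -tol))

# Example: the sigmoid kernel may fail this test for some parameter choices.
X = np.random.randn(20, 3)
sigmoid = lambda x, y: np.tanh(2.0 * np.dot(x, y) + 1.0)
print(is_psd_on_sample(sigmoid, X))
```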

Choosing the Right Kernel

Choosing the most appropriate kernel highly depends on the problem at hand – and fine tuning its parameters can easily become a tedious and cumbersome task. Automatic kernel selection is possible and is discussed in the works by Tom Howley and Michael Madden.

The choice of a kernel depends on the problem at hand because it depends on what we are trying to model. A polynomial kernel, for example, allows us to model feature conjunctions up to the order of the polynomial. Radial basis functions allow us to pick out circles (or hyperspheres), in contrast with the linear kernel, which allows only lines (or hyperplanes) to be picked out.

The motivation behind the choice of a particular kernel can be very intuitive and straightforward depending on what kind of information we are expecting to extract about the data. Please see the final notes on this topic from Introduction to Information Retrieval, by Manning, Raghavan and Schütze for a better explanation on the subject.

Kernel Functions

Below is a list of some kernel functions available in the existing literature. As was the case with previous articles, the formulas below are given in LaTeX notation. I cannot guarantee that all of them are perfectly correct, so use them at your own risk. Most of them have links to the articles where they were originally used or proposed.

1. Linear Kernel

The Linear kernel is the simplest kernel function. It is given by the inner product \langle x, y \rangle plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts, i.e. KPCA with a linear kernel is the same as standard PCA.

k(x, y) = x^T y + c
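
A one-line NumPy sketch of this kernel (illustrative only; the constant c is optional):

```python
import numpy as np

def linear_kernel(x, y, c=0.0):
    # k(x, y) = x^T y + c
    return np.dot(x, y) + c
```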

2. Polynomial Kernel

The Polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited for problems where all the training data is normalized.

k(x, y) = (\alpha x^T y + c)^d
Adjustable parameters are the slope alpha, the constant term c and the polynomial degree d.
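
A minimal sketch with the three parameters exposed (the default values here are purely illustrative):

```python
import numpy as np

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=2):
    # k(x, y) = (alpha * x^T y + c)^d
    return (alpha * np.dot(x, y) + c) ** d
```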

3. Gaussian Kernel

The Gaussian kernel is an example of a radial basis function (RBF) kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)

Alternatively, it could also be implemented using

k(x, y) = \exp\left(-\gamma \|x - y\|^2\right) \quad \text{with } \gamma = \frac{1}{2\sigma^2}

The adjustable parameter sigma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in the training data.
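
Both parameterizations take only a line of NumPy each; this sketch (mine) uses the relation gamma = 1 / (2 sigma^2) so the two forms agree:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def gaussian_kernel_gamma(x, y, gamma=0.5):
    # equivalent form with gamma = 1 / (2 sigma^2)
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)
```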

4. Exponential Kernel

The exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. It is also a radial basis function kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|}{2\sigma^2}\right)

5. Laplacian Kernel

The Laplace kernel is completely equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also a radial basis function kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|}{\sigma}\right)

It is important to note that the observations made about the sigma parameter for the Gaussian kernel also apply to the Exponential and Laplacian kernels.
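
The two kernels differ only in how sigma enters the denominator, as this illustrative sketch of both (following the formulas above) makes explicit:

```python
import numpy as np

def exponential_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y|| / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - y) / (2 * sigma ** 2))

def laplacian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y|| / sigma)
    return np.exp(-np.linalg.norm(x - y) / sigma)
```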

6. ANOVA Kernel

The ANOVA kernel is also a radial basis function kernel, just as the Gaussian and Laplacian kernels. It is said to perform well in multidimensional regression problems (Hofmann, 2008).

k(x, y) = \sum_{k=1}^{n} \exp\left(-\sigma (x^k - y^k)^2\right)^d
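
A vectorized sketch of my own reading of the formula, with x^k taken to mean the k-th component of x:

```python
import numpy as np

def anova_kernel(x, y, sigma=1.0, d=2):
    # k(x, y) = sum_k exp(-sigma * (x_k - y_k)^2)^d
    return np.sum(np.exp(-sigma * (x - y) ** 2) ** d)
```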

7. Hyperbolic Tangent (Sigmoid) Kernel

The Hyperbolic Tangent Kernel is also known as the Sigmoid Kernel and as the Multilayer Perceptron (MLP) kernel. The Sigmoid Kernel comes from the Neural Networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

k(x, y) = \tanh(\alpha x^T y + c)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice.

There are two adjustable parameters in the sigmoid kernel: the slope alpha and the intercept constant c. A common value for alpha is 1/N, where N is the data dimension. A more detailed study on sigmoid kernels can be found in the works by Hsuan-Tien Lin and Chih-Jen Lin.
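
A short sketch that defaults alpha to 1/N as suggested above (illustrative, not a reference implementation):

```python
import numpy as np

def sigmoid_kernel(x, y, alpha=None, c=1.0):
    # k(x, y) = tanh(alpha * x^T y + c), with alpha defaulting to 1/N
    if alpha is None:
        alpha = 1.0 / len(x)
    return np.tanh(alpha * np.dot(x, y) + c)
```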

8. Rational Quadratic Kernel

The Rational Quadratic kernel is less computationally intensive than the Gaussian kernel and can be used as an alternative when using the Gaussian becomes too expensive.

k(x, y) = 1 - \frac{\|x - y\|^2}{\|x - y\|^2 + c}

9. Multiquadric Kernel

The Multiquadric kernel can be used in the same situations as the Rational Quadratic kernel. As is the case with the Sigmoid kernel, it is also an example of a non-positive definite kernel.

k(x, y) = \sqrt{\|x - y\|^2 + c^2}

10. Inverse Multiquadric Kernel

The Inverse Multiquadric kernel, as with the Gaussian kernel, results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

k(x, y) = \frac{1}{\sqrt{\|x - y\|^2 + c^2}}
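
The kernels of sections 8 to 10 all build on the same squared distance and a constant c; a compact sketch of the three, following the formulas above:

```python
import numpy as np

def rational_quadratic_kernel(x, y, c=1.0):
    d2 = np.linalg.norm(x - y) ** 2
    return 1.0 - d2 / (d2 + c)

def multiquadric_kernel(x, y, c=1.0):
    return np.sqrt(np.linalg.norm(x - y) ** 2 + c ** 2)

def inverse_multiquadric_kernel(x, y, c=1.0):
    return 1.0 / np.sqrt(np.linalg.norm(x - y) ** 2 + c ** 2)
```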

11. Circular Kernel

The circular kernel is used in geostatistical applications. It is an example of an isotropic stationary kernel and is positive definite in R².

k(x, y) = \frac{2}{\pi} \arccos\left(\frac{\|x - y\|}{\sigma}\right) - \frac{2}{\pi} \frac{\|x - y\|}{\sigma} \sqrt{1 - \left(\frac{\|x - y\|}{\sigma}\right)^2}

\text{if } \|x - y\| < \sigma \text{, zero otherwise}

12. Spherical Kernel

The spherical kernel is similar to the circular kernel, but is positive definite in R³.

k(x, y) = 1 - \frac{3}{2} \frac{\|x - y\|}{\sigma} + \frac{1}{2} \left(\frac{\|x - y\|}{\sigma}\right)^3

\text{if } \|x - y\| < \sigma \text{, zero otherwise}
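
Both kernels are compactly supported: they vanish whenever ||x - y|| >= sigma. An illustrative sketch following the formulas above:

```python
import numpy as np

def circular_kernel(x, y, sigma=1.0):
    d = np.linalg.norm(x - y)
    if d >= sigma:
        return 0.0
    r = d / sigma
    return (2 / np.pi) * (np.arccos(r) - r * np.sqrt(1 - r ** 2))

def spherical_kernel(x, y, sigma=1.0):
    d = np.linalg.norm(x - y)
    if d >= sigma:
        return 0.0
    r = d / sigma
    return 1.0 - 1.5 * r + 0.5 * r ** 3
```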

13. Wave Kernel

The Wave kernel is also symmetric positive semi-definite (Huang, 2008).

k(x, y) = \frac{\theta}{\|x - y\|} \sin\left(\frac{\|x - y\|}{\theta}\right)

14. Power Kernel

The Power kernel is also known as the (unrectified) triangular kernel. It is an example of scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

k(x, y) = -\|x - y\|^d

15. Log Kernel

The Log kernel seems to be particularly interesting for images, but is only conditionally positive definite.

k(x, y) = -\log\left(\|x - y\|^d + 1\right)

16. Spline Kernel

The Spline kernel is given as a piece-wise cubic polynomial, as derived in the works by Gunn (1998).

k(x, y) = 1 + xy + xy \min(x, y) - \frac{x + y}{2} \min(x, y)^2 + \frac{1}{3} \min(x, y)^3

However, what it actually means is:

k(x, y) = \prod_{i=1}^{d} k(x_i, y_i)

With

k(x_i, y_i) = 1 + x_i y_i + x_i y_i \min(x_i, y_i) - \frac{x_i + y_i}{2} \min(x_i, y_i)^2 + \frac{1}{3} \min(x_i, y_i)^3
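
A direct NumPy translation of the product form (my own sketch):

```python
import numpy as np

def spline_kernel(x, y):
    m = np.minimum(x, y)
    terms = 1 + x * y + x * y * m - (x + y) / 2 * m ** 2 + m ** 3 / 3
    return np.prod(terms)
```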

17. B-Spline (Radial Basis Function) Kernel

The B-Spline kernel is defined on the interval [−1, 1]. It is given by the recursive formula:

k(x, y) = B_{2p+1}(x - y)

\text{where } B_{i+1} := B_i \otimes B_0

In the work by Bart Hamers it is given by:

k(x, y) = \prod_{p=1}^{d} B_{2n+1}(x_p - y_p)

Alternatively, B_n can be computed using the explicit expression (Fomel, 2000):

B_n(x) = \frac{1}{n!} \sum_{k=0}^{n+1} \binom{n+1}{k} (-1)^k \left(x + \frac{n+1}{2} - k\right)_+^{n}

where x_+^d is defined as the truncated power function:

x_+^d = \begin{cases} x^d, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}
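
Putting the explicit expression and the truncated power function together gives a small sketch (mine; the order parameter p and its default are just for illustration):

```python
import numpy as np
from math import comb, factorial

def truncated_power(t, n):
    # t_+^n = t^n if t > 0, else 0
    return t ** n if t > 0 else 0.0

def bspline(n, x):
    # explicit expression for B_n (Fomel, 2000), as given above
    return sum((-1) ** k * comb(n + 1, k) * truncated_power(x + (n + 1) / 2 - k, n)
               for k in range(n + 2)) / factorial(n)

def bspline_kernel(x, y, p=1):
    # k(x, y) = prod_i B_{2p+1}(x_i - y_i)
    return float(np.prod([bspline(2 * p + 1, d) for d in np.asarray(x) - np.asarray(y)]))
```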

18. Bessel Kernel

The Bessel kernel is well known in the theory of function spaces of fractional smoothness. It is given by:

k(x, y) = \frac{J_{v+1}\left(\sigma \|x - y\|\right)}{\|x - y\|^{-n(v+1)}}

where J is the Bessel function of the first kind. However, in the Kernlab for R documentation, the Bessel kernel is said to be:

k(x, y) = -\mathrm{Bessel}_{(\nu+1)}^{n}\left(\sigma \|x - y\|^2\right)

19. Cauchy Kernel

The Cauchy kernel comes from the Cauchy distribution (Basak, 2008). It is a long-tailed kernel and can be used to give long-range influence and sensitivity over the high-dimensional space.

k(x, y) = \frac{1}{1 + \frac{\|x - y\|^2}{\sigma^2}}

20. Chi-Square Kernel

The Chi-Square kernel comes from the Chi-Square distribution:

k(x, y) = 1 - \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{\frac{1}{2}(x_i + y_i)}

However, as noted by commenter Alexis Mignon, this version of the kernel is only conditionally positive-definite (CPD). A positive-definite version of this kernel is given in (Vedaldi and Zisserman, 2011) as

k(x, y) = \sum_{i=1}^{n} \frac{2 x_i y_i}{x_i + y_i}

and is suitable to be used by methods other than support vector machines.
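
Both versions assume non-negative, histogram-like inputs with strictly positive denominators; a sketch of the two forms (my own illustration):

```python
import numpy as np

def chi_square_kernel(x, y):
    # conditionally positive-definite form: 1 - sum (x_i - y_i)^2 / (0.5 (x_i + y_i))
    return 1.0 - np.sum((x - y) ** 2 / (0.5 * (x + y)))

def additive_chi_square_kernel(x, y):
    # positive-definite form (Vedaldi and Zisserman, 2011): sum 2 x_i y_i / (x_i + y_i)
    return np.sum(2.0 * x * y / (x + y))
```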

21. Histogram Intersection Kernel

The Histogram Intersection Kernel is also known as the Min Kernel and has been proven useful in image classification.

k(x, y) = \sum_{i=1}^{n} \min(x_i, y_i)
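
A one-line sketch for non-negative, histogram-like inputs (illustrative only):

```python
import numpy as np

def histogram_intersection_kernel(x, y):
    # k(x, y) = sum_i min(x_i, y_i)
    return np.sum(np.minimum(x, y))
```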

22. Generalized Histogram Intersection

The Generalized Histogram Intersection kernel is built based on the Histogram Intersection Kernel for image classification, but applies in a much larger variety of contexts (Boughorbel, 2005). It is given by:

k(x, y) = \sum_{i=1}^{m} \min\left(|x_i|^{\alpha}, |y_i|^{\beta}\right)

23. Generalized T-Student Kernel

The Generalized T-Student Kernel has been proven to be a Mercer kernel, thus having a positive semi-definite kernel matrix (Boughorbel, 2004). It is given by:

k(x, y) = \frac{1}{1 + \|x - y\|^d}

24. Bayesian Kernel

The Bayesian kernel could be given as:

k(x, y) = \prod_{l=1}^{N} \kappa_l(x_l, y_l)

where

\kappa_l(a, b) = \sum_{c \in \{0, 1\}} P(Y = c \mid X_l = a) \, P(Y = c \mid X_l = b)

However, it really depends on the problem being modeled. For more information, please see the work by Alashwal, Deris and Othman, in which they used an SVM with Bayesian kernels in the prediction of protein-protein interactions.

25. Wavelet Kernel

The Wavelet kernel (Zhang et al, 2004) comes from Wavelet theory and is given as:

k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - c_i}{a}\right) h\left(\frac{y_i - c_i}{a}\right)

Where a and c are the wavelet dilation and translation coefficients, respectively (the form presented above is a simplification, please see the original paper for details). A translation-invariant version of this kernel can be given as:

k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - y_i}{a}\right)

Where in both cases h(x) denotes a mother wavelet function. In the paper by Li Zhang, Weida Zhou, and Licheng Jiao, the authors suggest a possible h(x) as:

h(x) = \cos(1.75 x) \exp\left(-\frac{x^2}{2}\right)

Which they also prove to be an admissible kernel function.
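
A sketch of the translation-invariant form using the suggested mother wavelet (my own illustration; the dilation a is arbitrary here):

```python
import numpy as np

def mother_wavelet(t):
    # h(t) = cos(1.75 t) * exp(-t^2 / 2)
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def wavelet_kernel(x, y, a=1.0):
    # translation-invariant form: prod_i h((x_i - y_i) / a)
    return np.prod(mother_wavelet((x - y) / a))
```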

Source Code

The latest version of the source code for almost all of the kernels listed above is available in the Accord.NET Framework. Some are also available in the sequel to this article, Kernel Support Vector Machines for Classification and Regression in C#. They are provided together with a comprehensive and simple implementation of SVMs (Support Vector Machines) in C#. However, for the latest sources, which may contain bug fixes and other enhancements, please download the most recent available version of Accord.NET.

See also

References

http://crsouza.com/2010/03/kernel-functions-for-machine-learning-applications/
