Kernel Functions for Machine Learning Applications


In recent years, kernel methods have received major attention, particularly due to the increased popularity of support vector machines. Kernel functions can be used in many applications because they provide a simple bridge from linearity to non-linearity for algorithms that can be expressed in terms of dot products. In this article, we will list a few kernel functions and some of their properties.

Check the source code for all kernel functions here.

Many of these functions have been incorporated in Accord.NET, a framework for creating machine learning, statistics, and computer vision applications.

Kernel Methods

Kernel methods are a class of algorithms for pattern analysis or recognition, whose best known element is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (such as clusters, rankings, principal components, correlations, classifications) in general types of data (such as sequences, text documents, sets of points, vectors, images, graphs, etc) (Wikipedia, 2010a).

The main characteristic of Kernel Methods, however, is their distinct approach to this problem. Kernel methods map the data into higher dimensional spaces in the hope that in this higher-dimensional space the data could become more easily separated or better structured. There are also no constraints on the form of this mapping, which could even lead to infinite-dimensional spaces. This mapping function, however, hardly needs to be computed because of a tool called the kernel trick.

The Kernel trick

The kernel trick is a very interesting and powerful tool. It is powerful because it provides a bridge from linearity to non-linearity for any algorithm that can be expressed solely in terms of dot products between two vectors. It comes from the fact that, if we first map our input data into a higher-dimensional space, a linear algorithm operating in this space will behave non-linearly in the original input space.

Now, the kernel trick is really interesting because that mapping never needs to be computed. If our algorithm can be expressed only in terms of an inner product between two vectors, all we need to do is replace this inner product with the inner product from some other suitable space. That is where the “trick” resides: wherever a dot product is used, it is replaced with a kernel function. The kernel function denotes an inner product in feature space and is usually denoted as:

$$K(x, y) = \langle \varphi(x), \varphi(y) \rangle$$

where φ is the mapping from the input space into the feature space.

Using the kernel function, the algorithm can then be carried into a higher-dimensional space without explicitly mapping the input points into this space. This is highly desirable, as sometimes our higher-dimensional feature space could even be infinite-dimensional and thus infeasible to compute.
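As a minimal sketch of the trick, consider the homogeneous degree-2 polynomial kernel K(x, y) = (x·y)^2 on 2-D inputs, whose feature map is small enough to write out by hand. The names `phi` and `poly2_kernel` are illustrative and not taken from any library. The point is only that the kernel value matches the inner product in feature space, even though the kernel never builds the mapped vectors.

```python
import math

def dot(x, y):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel (2-D inputs only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Same quantity computed in input space: no mapping is ever built."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, y)    # kernel evaluated on the original 2-D points
```

Both values agree up to floating-point rounding. For kernels such as the Gaussian, the corresponding feature space is infinite-dimensional, so only the second route is available.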

Kernel Properties

Kernel functions must be continuous, symmetric, and preferably should have a positive (semi-)definite Gram matrix. Kernels that satisfy Mercer’s theorem are positive semi-definite, meaning their kernel matrices have only non-negative eigenvalues. The use of a positive definite kernel ensures that the optimization problem will be convex and that the solution will be unique.
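These properties can be checked numerically on a small sample. The sketch below builds the Gram matrix of a Gaussian kernel on four points and tests it for symmetry and positive definiteness by attempting a Cholesky factorisation. Note that this tests strict positive definiteness, which is a stronger condition than the positive semi-definiteness Mercer’s theorem asks for. All function names here are our own, and the sample points are arbitrary.

```python
import math

def gaussian(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram_matrix(kernel, points):
    """Matrix of all pairwise kernel evaluations."""
    return [[kernel(p, q) for q in points] for p in points]

def is_symmetric(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))

def is_positive_definite(m, tol=1e-12):
    """Attempt a Cholesky factorisation; it succeeds only for positive definite matrices."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:        # non-positive pivot: not positive definite
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
K = gram_matrix(gaussian, points)
symmetric = is_symmetric(K)
positive_definite = is_positive_definite(K)
```

A check on one sample can only refute positive definiteness, never prove it for the kernel in general, but it is a cheap sanity test when trying out a new kernel.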

However, many kernel functions that are not strictly positive definite have also been shown to perform very well in practice. An example is the Sigmoid kernel, which, despite its wide use, is not positive semi-definite for certain values of its parameters. Boughorbel (2005) also experimentally demonstrated that kernels which are only conditionally positive definite can outperform most classical kernels in some applications.

Kernels can also be classified as anisotropic stationary, isotropic stationary, compactly supported, locally stationary, nonstationary or separable nonstationary. Moreover, kernels can be labeled scale-invariant or scale-dependent, which is an interesting property because scale-invariant kernels make the training process invariant to a scaling of the data.

Choosing the Right Kernel

Choosing the most appropriate kernel highly depends on the problem at hand – and fine tuning its parameters can easily become a tedious and cumbersome task. Automatic kernel selection is possible and is discussed in the works by Tom Howley and Michael Madden.

The choice of a kernel depends on the problem at hand because it depends on what we are trying to model. A polynomial kernel, for example, allows us to model feature conjunctions up to the order of the polynomial. Radial basis functions allow us to pick out circles (or hyperspheres), in contrast with the Linear kernel, which only allows us to pick out lines (or hyperplanes).

The motivation behind the choice of a particular kernel can be very intuitive and straightforward depending on what kind of information we are expecting to extract about the data. Please see the final notes on this topic from Introduction to Information Retrieval, by Manning, Raghavan and Schütze for a better explanation on the subject.

Kernel Functions

Below is a list of some kernel functions available from the existing literature. As was the case with previous articles, the LaTeX notation for each formula below is available from the alternate text of its image. I cannot guarantee that all of them are perfectly correct, so use them at your own risk. Most of them have links to articles where they were originally used or proposed.

1. Linear Kernel

The Linear kernel is the simplest kernel function. It is given by the inner product of the two vectors, plus an optional constant c:

$$k(x, y) = x^T y + c$$

2. Polynomial Kernel

The Polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited for problems where all the training data is normalized.

$$k(x, y) = (\alpha x^T y + c)^d$$

Adjustable parameters are the slope alpha, the constant term c and the polynomial degree d.
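A small sketch of the linear and polynomial kernels follows, assuming the common forms k(x, y) = x·y + c and k(x, y) = (alpha x·y + c)^d. The function names and default parameter values are our own choices for illustration.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y, c=0.0):
    """Inner product plus an optional constant."""
    return dot(x, y) + c

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=2):
    """Slope alpha, constant term c, polynomial degree d."""
    return (alpha * dot(x, y) + c) ** d

x, y = (1.0, 2.0), (3.0, 4.0)
lin = linear_kernel(x, y)        # 1*3 + 2*4 = 11
poly = polynomial_kernel(x, y)   # (11 + 1)^2 = 144
```

With d = 1, alpha = 1 and the same constant c, the polynomial kernel reduces to the linear one.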

3. Gaussian Kernel

The Gaussian kernel is an example of a radial basis function kernel.

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)$$

Alternatively, it could also be implemented using

$$k(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right)$$

which matches the first form when gamma = 1/(2 sigma^2).

The adjustable parameter sigma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in the training data.
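The effect of sigma is easy to see numerically. The sketch below assumes the usual forms exp(-||x - y||^2 / (2 sigma^2)) and exp(-gamma ||x - y||^2); the function names are ours. With a very large sigma every pair of points looks almost identical, and with a very small sigma every pair looks unrelated, so the Gram matrix approaches the identity.

```python
import math

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))

def gaussian_kernel_gamma(x, y, gamma=0.5):
    """Same kernel, parameterised by gamma = 1 / (2 * sigma^2)."""
    return math.exp(-gamma * squared_distance(x, y))

x, y = (0.0, 0.0), (1.0, 1.0)
wide = gaussian_kernel(x, y, sigma=100.0)    # overestimated sigma: close to 1
narrow = gaussian_kernel(x, y, sigma=0.01)   # underestimated sigma: close to 0
```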

4. Exponential Kernel

The exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. It is also a radial basis function kernel.

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert}{2\sigma^2}\right)$$

5. Laplacian Kernel

The Laplacian kernel is completely equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also a radial basis function kernel.

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert}{\sigma}\right)$$

It is important to note that the observations made about the sigma parameter for the Gaussian kernel also apply to the Exponential and Laplacian kernels.
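The equivalence between the two kernels is a matter of reparameterisation. Assuming the forms exp(-||x - y|| / (2 sigma^2)) for the exponential kernel and exp(-||x - y|| / sigma) for the Laplacian kernel, a Laplacian kernel with parameter s gives the same values as an exponential kernel with sigma = sqrt(s / 2). The sketch below checks one such pair; the function names are illustrative.

```python
import math

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def exponential_kernel(x, y, sigma=1.0):
    return math.exp(-distance(x, y) / (2.0 * sigma ** 2))

def laplacian_kernel(x, y, sigma=1.0):
    return math.exp(-distance(x, y) / sigma)

x, y = (0.0, 0.0), (3.0, 4.0)                 # Euclidean distance 5
lap = laplacian_kernel(x, y, sigma=2.0)       # exp(-5 / 2)
expo = exponential_kernel(x, y, sigma=1.0)    # exp(-5 / (2 * 1^2))
```

Because sigma enters the Laplacian kernel linearly rather than squared, a given change in sigma moves its values less sharply.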

6. ANOVA Kernel

The ANOVA kernel is also a radial basis function kernel, just as the Gaussian and Laplacian kernels. It is said to perform well in multidimensional regression problems (Hofmann, 2008).

7. Hyperbolic Tangent (Sigmoid) Kernel

The Hyperbolic Tangent Kernel is also known as the Sigmoid Kernel and as the Multilayer Perceptron (MLP) kernel. The Sigmoid Kernel comes from the Neural Networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

$$k(x, y) = \tanh(\alpha x^T y + c)$$

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice.

There are two adjustable parameters in the sigmoid kernel, the slope alpha and the intercept constant c. A common value for alpha is 1/N, where N is the data dimension. A more detailed study on sigmoid kernels can be found in the work by Hsuan-Tien Lin and Chih-Jen Lin.
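A short sketch of the sigmoid kernel follows, assuming the form tanh(alpha x·y + c). It also shows one simple way the kernel can fail to be positive semi-definite: any positive semi-definite kernel must satisfy k(x, x) >= 0, yet with a negative intercept the sigmoid kernel can return a negative self-similarity. The parameter values are arbitrary and chosen only to trigger that case.

```python
import math

def sigmoid_kernel(x, y, alpha=1.0, c=0.0):
    """Hyperbolic tangent kernel with slope alpha and intercept c."""
    return math.tanh(alpha * sum(a * b for a, b in zip(x, y)) + c)

x = (0.5,)
# tanh(0.25 - 1.0) is negative, so the 1x1 Gram matrix [k(x, x)] is not PSD.
self_similarity = sigmoid_kernel(x, x, alpha=1.0, c=-1.0)

# Following the 1/N heuristic for 2-D data, alpha = 0.5.
value = sigmoid_kernel((1.0, 2.0), (3.0, 4.0), alpha=0.5, c=0.0)
```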

8. Rational Quadratic Kernel

The Rational Quadratic kernel is less computationally intensive than the Gaussian kernel and can be used as an alternative when using the Gaussian becomes too expensive.

$$k(x, y) = 1 - \frac{\lVert x - y \rVert^2}{\lVert x - y \rVert^2 + c}$$

9. Multiquadric Kernel

The Multiquadric kernel can be used in the same situations as the Rational Quadratic kernel. As is the case with the Sigmoid kernel, it is also an example of a non-positive definite kernel.

$$k(x, y) = \sqrt{\lVert x - y \rVert^2 + c^2}$$

10. Inverse Multiquadric Kernel

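The Inverse Multiquadric kernel is the reciprocal of the Multiquadric kernel. As with the Gaussian kernel, it results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

$$k(x, y) = \frac{1}{\sqrt{\lVert x - y \rVert^2 + c^2}}$$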
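The three kernels above can be sketched together, assuming the forms 1 - d^2 / (d^2 + c), sqrt(d^2 + c^2) and 1 / sqrt(d^2 + c^2), with d = ||x - y||. The constant is written as c in all three cases for simplicity, which is our own naming choice. None of them needs an exponential, which is where the saving over the Gaussian kernel comes from. The last lines illustrate why the multiquadric kernel is not positive definite: the Gram matrix of any two distinct points has a negative determinant.

```python
import math

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rational_quadratic_kernel(x, y, c=1.0):
    sq = squared_distance(x, y)
    return 1.0 - sq / (sq + c)

def multiquadric_kernel(x, y, c=1.0):
    return math.sqrt(squared_distance(x, y) + c ** 2)

def inverse_multiquadric_kernel(x, y, c=1.0):
    return 1.0 / math.sqrt(squared_distance(x, y) + c ** 2)

x, y = (0.0,), (3.0,)
rq = rational_quadratic_kernel(x, y, c=4.0)      # 1 - 9/13
mq = multiquadric_kernel(x, y, c=4.0)            # sqrt(9 + 16) = 5
imq = inverse_multiquadric_kernel(x, y, c=4.0)   # 1/5

# Determinant of the 2x2 multiquadric Gram matrix on {x, y}: c^2 - (d^2 + c^2) = -d^2
gram_det = (multiquadric_kernel(x, x, c=4.0) * multiquadric_kernel(y, y, c=4.0)) - mq ** 2
```

A symmetric 2x2 matrix with positive diagonal and negative determinant has one negative eigenvalue, so it cannot be positive semi-definite.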

11. Circular Kernel

The circular kernel is used in geostatistical applications. It is an example of an isotropic stationary kernel and is positive definite in R^2.

12. Spherical Kernel

The spherical kernel is similar to the circular kernel, but is positive definite in R^3.
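Both kernels are compactly supported: they drop to exactly zero once two points are further apart than sigma. The sketch below assumes the circular and spherical forms listed by Genton (2001), written in terms of r = ||x - y|| / sigma. Treat these formulas as an assumption of this example rather than a definitive statement. The function names are ours.

```python
import math

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def circular_kernel(x, y, sigma=1.0):
    r = distance(x, y) / sigma
    if r >= 1.0:
        return 0.0   # outside the support
    return (2.0 / math.pi) * (math.acos(r) - r * math.sqrt(1.0 - r * r))

def spherical_kernel(x, y, sigma=1.0):
    r = distance(x, y) / sigma
    if r >= 1.0:
        return 0.0   # outside the support
    return 1.0 - 1.5 * r + 0.5 * r ** 3

origin = (0.0, 0.0)
near = spherical_kernel(origin, (0.5, 0.0))   # 1 - 0.75 + 0.0625
far = spherical_kernel(origin, (2.0, 0.0))    # beyond sigma
```

Compact support yields sparse Gram matrices when most pairs of points are far apart, which can matter for large data sets.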

13. Wave Kernel

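The Wave kernel is also symmetric positive semi-definite (Huang, 2008).

$$k(x, y) = \frac{\theta}{\lVert x - y \rVert} \sin\left(\frac{\lVert x - y \rVert}{\theta}\right)$$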

14. Power Kernel

The Power kernel is also known as the (unrectified) triangular kernel. It is an example of a scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

$$k(x, y) = -\lVert x - y \rVert^d$$

15. Log Kernel

The Log kernel seems to be particularly interesting for images, but is only conditionally positive definite.

$$k(x, y) = -\log\left(\lVert x - y \rVert^d + 1\right)$$

16. Spline Kernel

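The Spline kernel is given as a piece-wise cubic polynomial, as derived in the works by Gunn (1998).

$$k(x, y) = 1 + xy + xy \min(x, y) - \frac{x + y}{2} \min(x, y)^2 + \frac{1}{3} \min(x, y)^3$$

However, what it actually means is:

$$k(x, y) = \prod_{i=1}^{d} \left( 1 + x_i y_i + x_i y_i \min(x_i, y_i) - \frac{x_i + y_i}{2} \min(x_i, y_i)^2 + \frac{1}{3} \min(x_i, y_i)^3 \right)$$

with $x, y \in \mathbb{R}^d$.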

17. B-Spline (Radial Basis Function) Kernel

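The B-Spline kernel is defined on the interval [−1, 1]. It is given by the recursive formula:

$$k(x, y) = B_{2p+1}(x - y), \quad p \in \mathbb{N}, \quad B_{i+1} := B_i \otimes B_0$$

In the work by Bart Hamers it is given by:

$$k(x, y) = \prod_{p=1}^{d} B_{2n+1}(x_p - y_p)$$

Alternatively, B_n can be computed using the explicit expression (Fomel, 2000):

$$B_n(x) = \frac{1}{n!} \sum_{k=0}^{n+1} \binom{n+1}{k} (-1)^k \left( x + \frac{n+1}{2} - k \right)_+^n$$

where x_+ is defined as the truncated power function:

$$x_+^d = \begin{cases} x^d, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$$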
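The explicit expression is straightforward to evaluate. The sketch below assumes the standard closed form of the centered B-spline of degree n, B_n(x) = (1/n!) sum_{k=0}^{n+1} C(n+1, k) (-1)^k (x + (n+1)/2 - k)_+^n, and a product kernel over the coordinates, k(x, y) = prod_p B_{2n+1}(x_p - y_p). The function names are ours, and the degree is restricted to n >= 1 to avoid the 0^0 corner case of the truncated power.

```python
import math

def truncated_power(x, d):
    """x_+^d: x^d for positive x, zero otherwise."""
    return x ** d if x > 0 else 0.0

def b_spline(x, n):
    """Centered B-spline of degree n >= 1, evaluated by the explicit sum."""
    if n < 1:
        raise ValueError("degree n must be at least 1")
    total = 0.0
    for k in range(n + 2):
        total += math.comb(n + 1, k) * (-1) ** k * truncated_power(x + (n + 1) / 2.0 - k, n)
    return total / math.factorial(n)

def b_spline_kernel(x, y, n=1):
    """Product of B_{2n+1} over the coordinate-wise differences."""
    result = 1.0
    for a, b in zip(x, y):
        result *= b_spline(a - b, 2 * n + 1)
    return result

hat_peak = b_spline(0.0, 1)      # degree 1 is the hat function, peak value 1
cubic_peak = b_spline(0.0, 3)    # cubic B-spline at the origin, 2/3
```

The alternating sum can lose precision for high degrees, so a recursive evaluation may be preferable when n is large.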

18. Bessel Kernel

The Bessel kernel is well known in the theory of function spaces of fractional smoothness. It is given by:

where J is the Bessel function of the first kind. However, in the kernlab for R documentation, the Bessel kernel is said to be:

19. Cauchy Kernel

The Cauchy kernel comes from the Cauchy distribution (Basak, 2008). It is a long-tailed kernel and can be used to give long-range influence and sensitivity over the high-dimensional space.

$$k(x, y) = \frac{1}{1 + \frac{\lVert x - y \rVert^2}{\sigma^2}}$$
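The long tail is easiest to appreciate next to the Gaussian kernel. The sketch below assumes the form 1 / (1 + ||x - y||^2 / sigma^2) for the Cauchy kernel and compares both kernels for two points five sigma apart. The function names are illustrative.

```python
import math

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cauchy_kernel(x, y, sigma=1.0):
    return 1.0 / (1.0 + squared_distance(x, y) / sigma ** 2)

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))

x, y = (0.0,), (5.0,)
cauchy_far = cauchy_kernel(x, y)       # 1 / 26
gaussian_far = gaussian_kernel(x, y)   # exp(-12.5)
```

At this distance the Cauchy value is still a few percent of its maximum, while the Gaussian value is several orders of magnitude smaller. Distant training points therefore keep some influence under the Cauchy kernel.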

20. Chi-Square Kernel

The Chi-Square kernel comes from the Chi-Square distribution:

$$k(x, y) = 1 - \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{\frac{1}{2}(x_i + y_i)}$$

However, as noted by commenter Alexis Mignon, this version of the kernel is only conditionally positive-definite (CPD). A positive-definite version of this kernel is given in (Vedaldi and Zisserman, 2011) as

$$k(x, y) = \sum_{i=1}^{n} \frac{2 x_i y_i}{x_i + y_i}$$

and is suitable to be used by methods other than support vector machines.
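Both variants can be sketched in a few lines, assuming the forms 1 - sum_i (x_i - y_i)^2 / (0.5 (x_i + y_i)) for the CPD version and sum_i 2 x_i y_i / (x_i + y_i) for the additive, positive-definite version. The inputs are assumed to be non-negative histograms. Skipping bins where both entries are zero is our own convention to avoid a division by zero.

```python
def chi_square_kernel(x, y):
    """Conditionally positive definite variant."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:                       # skip empty bins
            total += (a - b) ** 2 / (0.5 * (a + b))
    return 1.0 - total

def additive_chi_square_kernel(x, y):
    """Positive definite (additive) variant."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:                       # skip empty bins
            total += 2.0 * a * b / (a + b)
    return total

same = (0.5, 0.5)
left, right = (1.0, 0.0), (0.0, 1.0)        # histograms with disjoint support

cpd_same = chi_square_kernel(same, same)             # 1.0
cpd_disjoint = chi_square_kernel(left, right)        # 1 - (2 + 2) = -3
pd_same = additive_chi_square_kernel(same, same)     # 1.0
pd_disjoint = additive_chi_square_kernel(left, right)  # 0.0
```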

21. Histogram Intersection Kernel

The Histogram Intersection Kernel is also known as the Min Kernel and has proven useful in image classification.

$$k(x, y) = \sum_{i=1}^{n} \min(x_i, y_i)$$

22. Generalized Histogram Intersection

The Generalized Histogram Intersection kernel is built on the Histogram Intersection Kernel for image classification, but applies in a much larger variety of contexts (Boughorbel, 2005). It is given by:

$$k(x, y) = \sum_{i=1}^{m} \min\left(|x_i|^\alpha, |y_i|^\beta\right)$$
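A sketch of both intersection kernels follows, assuming the forms sum_i min(x_i, y_i) and sum_i min(|x_i|^alpha, |y_i|^beta). The function names and sample vectors are our own. The generalized version takes absolute values, so it is not restricted to non-negative histograms.

```python
def histogram_intersection_kernel(x, y):
    """Sum of bin-wise minima; expects non-negative histograms."""
    return sum(min(a, b) for a, b in zip(x, y))

def generalized_histogram_intersection_kernel(x, y, alpha=1.0, beta=1.0):
    """Bin-wise minima of |x_i|^alpha and |y_i|^beta; accepts signed features."""
    return sum(min(abs(a) ** alpha, abs(b) ** beta) for a, b in zip(x, y))

h1, h2 = (0.2, 0.8), (0.5, 0.5)
overlap = histogram_intersection_kernel(h1, h2)      # 0.2 + 0.5

signed_a, signed_b = (-4.0, 9.0), (16.0, -1.0)
generalized = generalized_histogram_intersection_kernel(
    signed_a, signed_b, alpha=0.5, beta=0.5)         # min(2, 4) + min(3, 1)
```

With alpha = beta = 1 and non-negative inputs the generalized kernel gives the same value as the plain histogram intersection.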

23. Generalized T-Student Kernel

The Generalized T-Student Kernel has been proven to be a Mercer kernel, thus having a positive semi-definite kernel matrix (Boughorbel, 2004). It is given by:

$$k(x, y) = \frac{1}{1 + \lVert x - y \rVert^d}$$

24. Bayesian Kernel

The Bayesian kernel could be given as:

where

However, it really depends on the problem being modeled. For more information, please see the work by Alashwal, Deris and Othman, in which they used an SVM with Bayesian kernels in the prediction of protein-protein interactions.

25. Wavelet Kernel

The Wavelet kernel (Zhang et al, 2004) comes from Wavelet theory and is given as:

$$k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - c}{a}\right) h\left(\frac{y_i - c}{a}\right)$$

where a and c are the wavelet dilation and translation coefficients, respectively (the form presented above is a simplification, please see the original paper for details). A translation-invariant version of this kernel can be given as:

$$k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - y_i}{a}\right)$$

where in both h(x) denotes a mother wavelet function. In the paper by Li Zhang, Weida Zhou, and Licheng Jiao, the authors suggest a possible h(x) as:

$$h(x) = \cos(1.75x) \exp\left(-\frac{x^2}{2}\right)$$

which they also prove to be an admissible kernel function.
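The translation-invariant wavelet kernel can be sketched as follows, assuming the product form prod_i h((x_i - y_i) / a) and the mother wavelet h(x) = cos(1.75 x) exp(-x^2 / 2) suggested by Zhang et al. The function names and the default dilation a = 1 are our own choices.

```python
import math

def mother_wavelet(x):
    """h(x) = cos(1.75 x) * exp(-x^2 / 2)."""
    return math.cos(1.75 * x) * math.exp(-x * x / 2.0)

def wavelet_kernel(x, y, a=1.0):
    """Translation-invariant wavelet kernel with dilation a."""
    result = 1.0
    for xi, yi in zip(x, y):
        result *= mother_wavelet((xi - yi) / a)
    return result

p, q = (0.3, -1.2), (1.0, 0.4)
k_pq = wavelet_kernel(p, q)
k_qp = wavelet_kernel(q, p)
k_pp = wavelet_kernel(p, p)   # h(0) = 1 in every coordinate
```

Because h is an even function, the kernel is symmetric in its arguments. Unlike the Gaussian kernel, it can take negative values because of the cosine factor.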

Source Code

The latest version of the source code for almost all of the kernels listed above is available in the Accord.NET Framework. Some are also available in the sequel to this article, Kernel Support Vector Machines for Classification and Regression in C#. They are provided together with a comprehensive and simple implementation of SVMs (Support Vector Machines) in C#. However, for the latest sources, which may contain bug fixes and other enhancements, please download the most recent version of Accord.NET.

References

On-Line Prediction Wiki Contributors. “Kernel Methods.” On-Line Prediction Wiki. http://onlineprediction.net/?n=Main.KernelMethods (accessed March 3, 2010).
Genton, Marc G. “Classes of Kernels for Machine Learning: A Statistics Perspective.” Journal of Machine Learning Research 2 (2001) 299-312.
Hofmann, T., B. Schölkopf, and A. J. Smola. “Kernel methods in machine learning.” Ann. Statist. Volume 36, Number 3 (2008), 1171-1220.
Gunn, S. R. (1998, May). “Support vector machines for classification and regression.” Technical report, Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science.
Karatzoglou, A., Smola, A., Hornik, K. and Zeileis, A. “Kernlab – an R package for kernel Learning.” (2004).
Karatzoglou, A., Smola, A., Hornik, K. and Zeileis, A. “Kernlab – an S4 package for kernel methods in R.” J. Statistical Software, 11, 9 (2004).
Karatzoglou, A., Smola, A., Hornik, K. and Zeileis, A. “R: Kernel Functions.” Documentation for package ‘kernlab’ version 0.9-5. http://rss.acs.unt.edu/Rdoc/library/kernlab/html/dots.html (accessed March 3, 2010).
Howley, T. and Madden, M.G. “The genetic kernel support vector machine: Description and evaluation.” Artificial Intelligence Review. Volume 24, Number 3 (2005), 379-395.
Shawkat Ali and Kate A. Smith. “Kernel Width Selection for SVM Classification: A Meta-Learning Approach.” International Journal of Data Warehousing & Mining, 1(4), 78-97, October-December 2005.
Hsuan-Tien Lin and Chih-Jen Lin. “A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods.” Technical report, Department of Computer Science, National Taiwan University, 2003.
Boughorbel, S., Jean-Philippe Tarel, and Nozha Boujemaa. “Project-Imedia: Object Recognition.” INRIA – INRIA Activity Reports – RalyX. http://ralyx.inria.fr/2004/Raweb/imedia/uid84.html (accessed March 3, 2010).
Huang, Lingkang. “Variable Selection in Multi-class Support Vector Machine and Applications in Genomic Data Analysis.” PhD Thesis, 2008.
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. “Nonlinear SVMs.” The Stanford NLP (Natural Language Processing) Group. http://nlp.stanford.edu/IR-book/html/htmledition/nonlinear-svms-1.html (accessed March 3, 2010).
Fomel, Sergey. “Inverse B-spline interpolation.” Stanford Exploration Project, 2000. http://sepwww.stanford.edu/public/docs/sep105/sergey2/paper_html/node5.html (accessed March 3, 2010).
Basak, Jayanta. “A least square kernel machine with box constraints.” International Conference on Pattern Recognition 2008 1 (2008): 1-4.
Alashwal, H., Safaai Deris, and Razib M. Othman. “A Bayesian Kernel for the Prediction of Protein – Protein Interactions.” International Journal of Computational Intelligence 5, no. 2 (2009): 119-124.
Hichem Sahbi and François Fleuret. “Kernel methods and scale invariance using the triangular kernel.” INRIA Research Report, N-5143, March 2004.
Sabri Boughorbel, Jean-Philippe Tarel, and Nozha Boujemaa. “Generalized histogram intersection kernel for image recognition.” Proceedings of the 2005 Conference on Image Processing, volume 3, pages 161-164, 2005.
Micchelli, Charles. Interpolation of scattered data: Distance matrices and conditionally positive definite functions. Constructive Approximation 2, no. 1 (1986): 11-22.
Wikipedia contributors, “Kernel methods,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Kernel_methods&oldid=340911970 (accessed March 3, 2010).
Wikipedia contributors, “Kernel trick,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Kernel_trick&oldid=269422477 (accessed March 3, 2010).
Weisstein, Eric W. “Positive Semidefinite Matrix.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/PositiveSemidefiniteMatrix.html
Hamers B. “Kernel Models for Large Scale Applications.” Ph.D., Katholieke Universiteit Leuven, Belgium, 2004.
Li Zhang, Weida Zhou, Licheng Jiao. “Wavelet Support Vector Machine.” IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2004, 34(1): 34-39.
Vedaldi, A. and Zisserman, A. “Efficient Additive Kernels via Explicit Feature Maps.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. XX, No. XX, June, 2011.

http://crsouza.com/2010/03/kernel-functions-for-machine-learning-applications/

Original: https://www.cnblogs.com/chenying99/p/5226126.html
Author: 刺猬的温驯
Title: Kernel Functions for Machine Learning Applications
