张志华教授:机器学习-统计与计算之恋

编者按:本文摘自张志华老师在第九届中国R语言会议和上海交通大学的两次讲座。张志华教授是上海交通大学计算机科学与工程系教授,上海交通大学数据科学研究中心兼职教授,计算机科学与技术与统计学博士生导师。在加入上海交通大学之前,他是浙江大学计算机学院教授,浙江大学统计科学中心兼职教授。张先生主要从事人工智能、机器学习和应用统计学领域的教学和研究。到目前为止,他已经在国际重要学术期刊和重要计算机科学会议上发表了70多篇论文。他是美国《数学评论》的特邀评论员,也是国际机器学习的旗舰出版物《机器学习研究杂志》的执行编辑委员会成员。其公开课《机器学习概论》和《统计机器学习》受到广泛关注。

[En]

Editor’s note: This article is compiled from two lectures given by Professor Zhang Zhihua, one at the Ninth China R Language Conference and one at Shanghai Jiao Tong University. Zhang Zhihua is a professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, an adjunct professor at the university’s Data Science Research Center, and a doctoral supervisor in both computer science and technology and statistics. Before joining Shanghai Jiao Tong University, he was a professor in the College of Computer Science at Zhejiang University and an adjunct professor at its Center for Statistical Science. His teaching and research focus on artificial intelligence, machine learning, and applied statistics. To date he has published more than 70 papers in leading international journals and major computer science conferences. He is an invited reviewer for the American journal Mathematical Reviews and a member of the executive editorial board of the Journal of Machine Learning Research, the flagship journal of the machine learning field. His open courses “Introduction to Machine Learning” and “Statistical Machine Learning” have attracted wide attention.

张志华老师和他的学生

[En]

Professor Zhang Zhihua and his students

大家好。我今天演讲的主题是《机器学习:对统计和计算的热爱》。我用了一个非常浪漫的名字,但我的心情很害怕。一方面,我担心控制不了这么大的题材,另一方面,我其实是一个不懂风情的人,我的一些观点可能与国内学术界的主流声音不符。

[En]

Hello, everyone. The title of my talk today is “Machine Learning: A Love Affair Between Statistics and Computation”. I chose a very romantic title, but I approach it with some trepidation. On the one hand, I worry that I cannot do justice to such a big topic; on the other hand, I am not a romantic person at all, and some of my views may not agree with the mainstream voices in domestic academia.

最近,人工智能或机器学习的强势崛起,特别是AlphaGo和韩国棋手李世石之间的人机大战,再次让我们领略到人工智能或机器学习技术的巨大潜力,也深深地触动了我。面对这场史无前例的技术变革,作为一名从事统计机器学习一线教学研究10余年的学者,我愿借此机会与大家分享我个人的一些思考和感悟。

[En]

Recently, the strong rise of artificial intelligence and machine learning, and especially the match between AlphaGo and the South Korean Go player Lee Sedol, has once again shown us the great potential of these technologies, and it has touched me deeply. Facing this unprecedented technological change, and as a scholar who has spent more than ten years on front-line teaching and research in statistical machine learning, I would like to take this opportunity to share some personal thoughts and reflections with you.

在这场人工智能发展的盛会中,我突然发现,在我们中国学者看来,似乎是一群旁观者。不管你承认与否,事实是,我这一代人或更早的学者只能是旁观者。我们能做的就是帮助你们–中国的年轻一代,让你们在人工智能发展的大潮中具有竞争力,做出标杆成就,创造人类文明的价值,让我有一支欢呼的主场球队。

[En]

In this grand event of artificial intelligence, I suddenly realized that we Chinese scholars seem to be a group of onlookers. Whether we admit it or not, the fact is that scholars of my generation and earlier can only be bystanders. What we can do is help you, China’s younger generation, become competitive in the tide of AI development, produce landmark achievements, and create value for human civilization, so that I will have a home team to cheer for.

我的发言主要由两部分组成。第一部分简要回顾了机器学习的发展历程,探讨了机器学习现象的内在本质。特别是讨论了它与统计学、计算机科学、运筹学和其他学科的关系,以及它与工业和创业的互补关系。在第二部分中,我们尝试使用多层次、自适应和平均的概念来简化丰富多彩的机器学习模型和计算方法背后的一些研究思路或想法。

[En]

My talk consists of two parts. In the first part, I briefly review the development of machine learning and explore the underlying nature of the machine learning phenomenon. In particular, I discuss its relationship with statistics, computer science, operations research and optimization, and other disciplines, as well as its complementary relationship with industry and entrepreneurship. In the second part, I try to use the concepts of “hierarchical”, “adaptive” and “averaging” to distill some of the research ideas behind the rich variety of machine learning models and computational methods.

第一部分:回顾和反思 (Part I: Review and Reflection)

1、 什么是机器学习 (What Is Machine Learning)

毫无疑问,大数据和人工智能是当今最时尚的术语,它们将给我们未来的生活带来深刻的变化。数据是燃料,智能是目标,机器学习是火箭,也就是通向智能的技术路径。机器学习大师迈克·乔丹和汤姆·米切尔认为,机器学习是计算机科学和统计学的交集,也是人工智能和数据科学的核心。

[En]

There is no doubt that big data and artificial intelligence are the most fashionable terms today, and they will bring profound changes to our future lives. Data is the fuel, intelligence is the goal, and machine learning is the rocket, that is, the technical path to intelligence. Michael Jordan and Tom Mitchell, two leading figures in machine learning, describe it as lying at the intersection of computer science and statistics and at the core of artificial intelligence and data science.

“It is one of today’s rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science” (M. I. Jordan)

总的来说,机器学习就是从数据中挖掘出有用的价值。数据本身是死的,它不能自动提供有用的信息。我们怎么才能找到有价值的东西呢?第一步是给出数据的抽象表示,然后基于该表示建立模型,然后估计模型的参数,即计算。为了应对大规模数据带来的问题,我们还需要设计一些有效的实现手段。

[En]

Broadly speaking, machine learning is about extracting useful value from data. Data by itself is inert; it does not automatically reveal useful information. How, then, do we find something valuable? The first step is to give the data an abstract representation, then build a model on that representation, and then estimate the model’s parameters, which is computation. To cope with the problems posed by large-scale data, we also need to design efficient implementations.

我将这个过程解释为机器学习等于矩阵+统计+优化+算法。首先,当数据被定义为抽象表示时,它往往形成一个矩阵或图形,而图形也可以理解为矩阵。统计是建模的主要工具和方法,模型求解大多被定义为优化问题,尤其是频率统计方法实际上是一个优化问题。当然,贝叶斯模型的计算涉及到随机抽样的方法。当涉及到大数据问题的具体实施时,我们需要一些有效的方法。在计算机科学中,在算法和数据结构方面有许多很好的技能可以帮助我们解决这个问题。

[En]

I summarize this process as: machine learning = matrix + statistics + optimization + algorithms. First, once data is given an abstract representation, it usually takes the form of a matrix or a graph, and a graph can itself be understood as a matrix. Statistics is the main tool and approach for modeling, and solving the model is mostly formulated as an optimization problem; frequentist methods in particular are in effect optimization problems. Bayesian computation, of course, involves random sampling methods. When it comes to implementation on big data, we need efficient methods, and computer science offers many good techniques in algorithms and data structures that help here.
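As a rough illustration of “matrix + statistics + optimization + algorithms”, here is a minimal sketch: the data is a matrix, the statistical model is linear regression with Gaussian noise, and the frequentist estimate is found by minimizing squared error with gradient descent. The synthetic data, the function name `fit_linear`, and the step size are assumptions made for this example, not something taken from the talk.

```python
import random

def fit_linear(X, y, lr=0.1, steps=500):
    """Least-squares fit by gradient descent.

    X is the data matrix (a list of rows) and y the targets. Minimizing the
    mean squared error is the maximum-likelihood estimate under Gaussian noise.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residuals r = Xw - y.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of (1/2n) * ||Xw - y||^2 is X^T r / n.
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        w = [w[j] - lr * grad[j] for j in range(d)]
    return w

# Synthetic data (assumed for illustration): y = 2*x1 - 1*x2 + small noise.
rng = random.Random(0)
true_w = [2.0, -1.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [sum(a * b for a, b in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]
w_hat = fit_linear(X, y)
print(w_hat)  # close to true_w
```

For large data the same estimate would normally be computed with stochastic gradients or a linear-algebra routine; the plain loops here only keep the sketch dependency-free.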

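借鉴Marr的关于计算机视觉的三级论定义,我把机器学习也分为三个层次:初级、中级和高级。初级阶段是数据获取以及特征的提取。中级阶段是数据处理与分析,它又包含三个方面,首先是应用问题导向,简单地说,它主要应用已有的模型和方法解决一些实际问题,我们可以理解为数据挖掘;第二,根据应用问题的需要,提出和发展模型、方法和算法以及研究支撑它们的数学原理或理论基础等,我理解这是机器学习学科的核心内容。第三,通过推理达到某种智能。最后,高级阶段是智能与认知,即实现智能的目标。从这里,我们看到,数据挖掘和机器学习本质上是一样的,其区别是数据挖掘更接地于数据库端,而机器学习则更接近于智能端。

[En]

Borrowing from Marr’s three-level account of computer vision, I also divide machine learning into three levels: low, middle, and high. The low level is data acquisition and feature extraction. The middle level is data processing and analysis, which has three aspects. The first is application-driven: put simply, it applies existing models and methods to practical problems, which we can think of as data mining. The second is to propose and develop models, methods, and algorithms as the applications require, and to study the mathematical principles and theory that support them; I regard this as the core of machine learning as a discipline. The third is to achieve some form of intelligence through inference. Finally, the high level is intelligence and cognition, that is, achieving the goal of intelligence. From this we see that data mining and machine learning are essentially the same; the difference is that data mining sits closer to the database end, while machine learning sits closer to the intelligence end.

2、 机器学习的发展历程 (The Development of Machine Learning)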

让我们梳理一下机器学习的发展历程。在20世纪90年代之前,我对它了解得还不够多,但我认为当时机器学习正处于一个平庸的发展时期。从1996年到2006年的黄金时期,学术界涌现了一批重要的成果,如基于统计学习理论的支持向量机和Boost等分类方法,基于再生核理论的非线性数据分析和处理方法,以套索为代表的稀疏学习模型和应用等。这些成就应该是统计界和计算机科学界共同努力取得的。

[En]

Let us trace the development of machine learning. I do not know enough about the period before the 1990s, but my sense is that machine learning was developing unremarkably then. The golden period ran from 1996 to 2006 and was marked by a series of important results from academia: classification methods such as SVMs and boosting grounded in statistical learning theory, nonlinear data analysis and processing methods based on reproducing kernel theory, sparse learning models and applications represented by the lasso, and so on. These results were the joint work of the statistics and computer science communities.

然而,机器学习也经历了短暂的徘徊。我对此表示同情,因为当时我在伯克利完成了博士后工作,正在找工作,所以我的导师迈克·乔丹教授当时多次与我沟通。一方面,他认为机器学习正处于困难时期,工作岗位已经满员。另一方面,他反复向我强调,将统计学引入机器学习是正确的。因为基于统计的机器学习作为一门学科的地位已经确立。主要的问题是,机器学习是一门应用型学科,需要在行业中发挥作用,可以为他们解决实际问题。幸运的是,这段时间很快就过去了。也许你们大多数人对这一时期没有印象,因为中国的学术发展往往是缓慢的。

[En]

However, machine learning also went through a brief period of uncertainty. I felt this personally, because I had just finished my postdoc at Berkeley and was looking for a job, and my advisor, Professor Michael Jordan, talked with me about it many times. On the one hand, he felt that machine learning was going through a difficult time and that the available positions had filled up. On the other hand, he stressed to me repeatedly that bringing statistics into machine learning was the right direction, because statistics-based machine learning had been established as a discipline. The main problem was that machine learning is an applied discipline and needs to prove itself in industry by solving real problems there. Fortunately, this period passed quickly. Most of you probably have no impression of it, because academic trends tend to reach China with some delay.

现在我们可以自信地说,机器学习已经成为计算机科学和人工智能的主流学科。它主要体现在以下三个标志性事件上。

[En]

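Today we can say with confidence that machine learning has become a mainstream discipline within computer science and artificial intelligence. This is reflected in three landmark events.

首先,2010年2月,伯克利的Mike Jordan教授和CMU的Tom Mitchell教授同时被选为美国工程院院士,同年5月份,Mike Jordan和斯坦福的统计学家Jerome Friedman又被选为美国科学院院士。我们知道许多著名机器学习算法比如CART、MARS 和GBM等是 Friedman教授等提出。

[En]

First, in February 2010, Professor Michael Jordan of Berkeley and Professor Tom Mitchell of CMU were both elected to the US National Academy of Engineering, and in May of the same year Michael Jordan and the Stanford statistician Jerome Friedman were elected to the US National Academy of Sciences. As we know, many well-known machine learning algorithms, such as CART, MARS, and GBM, were proposed by Professor Friedman and his collaborators.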

在接下来的几年里,一批对机器学习做出重要贡献的学者被选为美国科学院或工程院院士。例如,人工智能专家达芙妮·科勒、Boosting的罗伯特·沙佩尔、拉索的罗伯特·蒂布希拉尼、中国著名统计学习专家俞斌、统计机器学习专家拉里·沃瑟曼、著名优化算法专家斯蒂芬·博伊德等;同时,机器学习专家、深度学习领军人物杰弗里·辛顿、多伦多大学统计学习专家南希·里德分别入选今年美国工程院和美国科学院外籍院士。

[En]

In the following years, a number of scholars who made important contributions to machine learning were elected to the US National Academy of Sciences or the National Academy of Engineering. Examples include the AI expert Daphne Koller, Robert Schapire of boosting, Robert Tibshirani of the lasso, the well-known Chinese statistical learning expert Bin Yu, the statistical machine learning expert Larry Wasserman, and the well-known optimization expert Stephen Boyd. In addition, Geoffrey Hinton, a machine learning expert and leader in deep learning, and Nancy Reid, a statistician at the University of Toronto, were elected this year as foreign members of the National Academy of Engineering and the National Academy of Sciences, respectively.

这是当时Mike给我祝贺他当选为院士时的回信:

[En]

This is the reply Mike sent when I congratulated him on his election:

Thanks for your congratulations on my election to the National Academy. It’s nice to have machine learning recognized in this way.

因此,我理解,在美国,一个学科能否被接受为主流学科的一个重要标志,就是它的代表性科学家能否被选为院士。我们知道,汤姆·米切尔是机器学习的早期创始人和守护者,而迈克·乔丹是统计机器学习的创始人和推动者。

[En]

So my understanding is that in the United States, an important sign that a discipline has been accepted as mainstream is whether its representative scientists are elected to the national academies. Tom Mitchell is an early founder and guardian of machine learning, while Michael Jordan is the founder and driving force of statistical machine learning.

这种选拔机制无疑是先进的,它能够促进学科的健康发展,适应社会的动态发展和需求。相反,如果某某以某种方式被选为国家院士,那么他们将拥有国家话语权和资源分配权。这种机制可能会产生一些问题,比如一些过剩的学科或夕阳学科会获得太多的发展资源,而主流学科则会被边缘化。

[En]

This selection mechanism is undoubtedly a progressive one: it promotes the healthy development of disciplines and adapts to society’s changing needs. By contrast, where someone becomes a national academician by whatever route and thereby gains the power to set the agenda and allocate resources nationally, problems can arise: oversupplied or sunset disciplines may receive too many resources, while mainstream disciplines are marginalized.

其次,2011年的图灵奖授予了UCLA的Judea Pearl教授,他主要的研究领域是概率图模型和因果推理,这是机器学习的基础问题。我们知道,图灵奖通常颁给做纯理论计算机科学的学者,或者早期建立计算机架构的学者,而把图灵奖授予Judea Pearl教授具有方向标的意义。

[En]

Second, the 2011 Turing Award went to Professor Judea Pearl of UCLA, whose main research areas are probabilistic graphical models and causal inference, both foundational problems in machine learning. The Turing Award usually goes to scholars in pure theoretical computer science or to those who built early computer architectures, so awarding it to Professor Pearl signals a change of direction.

第三,有当前的热点,如深度学习、AlphaGo、自动驾驶汽车、人工智能助手等。机器学习实际上可以用来帮助行业解决问题。行业对机器学习领域的人才需求很大,既有代码能力强的工程师,也有数学建模和解决问题的科学家。

[En]

Third, there are the current hot topics, such as deep learning, AlphaGo, self-driving cars, and AI assistants. Machine learning can genuinely help industry solve problems. Industry has a great demand for machine learning talent, both engineers with strong coding skills and scientists skilled in mathematical modeling and problem solving.

让我们具体看看工业和机器学习之间的关系。在此之前的一年里,我一直是谷歌研究院的客座科学家。我的许多同事和以前的学生都在IT领域工作。平时,实验室经常接待一些公司的参观和交流,所以我对IT行业有了一些了解。

[En]

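Let us look more closely at the relationship between industry and machine learning. I previously spent a year as a visiting scientist at Google Research. Many of my colleagues and former students work in IT, and my lab regularly hosts visits and exchanges with companies, so I know something about the IT industry.

我理解当今IT的发展已从传统的微软模式转变到谷歌模式。传统的微软模式可以理解为制造业,而谷歌模式则是服务业。谷歌搜索完全是免费的,服务社会,他们的搜索做得越来越极致,同时创造的财富也越来越丰厚。

[En]

As I see it, IT has shifted from the traditional Microsoft model to the Google model. The traditional Microsoft model can be understood as manufacturing, while the Google model is a service industry. Google search is completely free and serves society; the better their search becomes, the more wealth it creates.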

财富蕴藏在数据中,挖掘财富的核心技术是机器学习。深度学习作为最具活力的机器学习方向之一,是计算机视觉、自然语言理解、语音识别、智能游戏等领域的颠覆性成果。它创造了许多新的初创企业。

[En]

Wealth is hidden in data, and the core technology for mining it is machine learning. Deep learning, one of the most dynamic directions in machine learning, has produced disruptive results in computer vision, natural language understanding, speech recognition, game playing, and other fields, and it has given rise to many new start-ups.

3、 统计与计算 (Statistics and Computation)

我的重点是回到学术界。让我们来关注统计学和计算机科学之间的关系。卡内基梅隆大学(CMU)统计学教授拉里·沃瑟曼最近当选为美国国家科学院院士。他写了一本书,书名非常霸气,叫《统计学的一切》。在这本书的引言中,有一个关于统计学和机器学习的非常有趣的描述。他认为,原来的统计在统计系,计算机在计算机系,两者没有关联,也互不认同对方的价值。计算机科学家认为这些统计理论毫无用处,解决不了问题,而统计学家认为计算机科学家只是在重新发明轮子,没有什么新意。然而,他认为,随着统计学家认识到计算机科学家的贡献,计算机科学家认识到统计理论和方法的普遍意义,这种情况现在已经改变了。因此,拉里写了这本书,这是一本面向统计学家的计算机书籍,也是一本面向计算机学者的统计书籍。

[En]

Now let me turn back to academia and focus on the relationship between statistics and computer science. Larry Wasserman, a professor of statistics at CMU, was recently elected to the National Academy of Sciences. He wrote a book with a very bold title, All of Statistics. Its introduction contains a very interesting description of statistics and machine learning. He observes that statistics used to live in statistics departments and computing in computer science departments; the two had little to do with each other and did not recognize each other’s value. Computer scientists thought statistical theory was useless and solved no problems, while statisticians thought computer scientists were merely reinventing the wheel. He believes this has now changed: statisticians recognize the contributions being made by computer scientists, and computer scientists recognize the generality of statistical theory and methodology. So Larry wrote this book, which is at once a computing book for statisticians and a statistics book for computer scientists.

现在有一个共识:如果你在使用一种机器学习方法,而不理解它的基本原理,这是一件非常可怕的事情。也正因为如此,学术界对深度学习仍存有疑虑。深度学习已经显示出其强大的实际应用效果,但其原理尚不清楚。

[En]

There is now a consensus: using a machine learning method without understanding its underlying principles is a very dangerous thing. This is why academia still has reservations about deep learning. Deep learning has demonstrated powerful practical results, but the principles behind it are not yet well understood.

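让我们进一步分析统计学和计算机之间的关系。计算机科学家通常具有较强的计算能力和解决问题的直觉,而统计学家擅长理论分析和较强的建模能力,因此两者具有很好的互补性。

[En]

Let us look further at the relationship between statistics and computer science. Computer scientists usually have strong computational skills and good intuition for solving problems, while statisticians excel at theoretical analysis and modeling, so the two complement each other well.

Boosting, SVM 和稀疏学习是机器学习界也是统计界,在近十年或者是近二十年来,最活跃的方向,现在很难说谁比谁在其中做的贡献更大。比如,SVM的理论其实很早被Vapnik等提出来了,但计算机界发明了一个有效的求解算法,而且后来又有非常好的实现代码被陆续开源给大家使用,于是SVM就变成分类算法的一个基准模型。再比如,KPCA是由计算机学家提出的一个非线性降维方法,其实它等价于经典MDS。而后者在统计界是很早就存在的,但如果没有计算机界从新发现,有些好的东西可能就被埋没了。

[En]

Boosting, SVMs, and sparse learning have been the most active directions in both the machine learning and statistics communities over the past ten or twenty years, and it is hard to say which community has contributed more. For example, the theory of SVMs was proposed quite early by Vapnik and others, but the computer science community devised an efficient algorithm for solving them, and very good implementations were later released as open source, so the SVM became a benchmark model for classification. As another example, KPCA is a nonlinear dimensionality reduction method proposed by computer scientists that is in fact equivalent to classical MDS. The latter has long existed in statistics, but without its rediscovery by the computer science community, some good ideas might have been buried.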

机器学习已经成为统计学的主流方向,许多著名的统计系都聘请了机器学习领域的博士作为教师。计算在统计学中变得越来越重要。传统的多元统计分析以矩阵为计算工具,而现代高维统计则以最优化为计算工具。另一方面,计算机科学提供高级统计课程,例如统计学的核心课程“经验过程”。

[En]

Machine learning has become a mainstream direction within statistics, and many leading statistics departments now recruit PhDs in machine learning as faculty. Computation has become increasingly important in statistics: traditional multivariate statistical analysis uses matrices as its computational tool, while modern high-dimensional statistics uses optimization. Conversely, computer science departments now offer advanced statistics courses, such as “empirical processes”, a core course in statistics.

让我们来看看机器学习在计算机科学中的地位。最近有一本尚未出版的书《数据科学基础》,作者是Avrim Blum、John Hopcroft和Ravindran Kannan,其中John Hopcroft是图灵奖得主。在本书的前言部分,提到了计算机科学的发展可以分为三个阶段:早期、中期和现在。在早期,目标是让计算机运行起来,专注于编程语言、编译原理、操作系统和支撑它们的数学理论的发展。中期目标是让计算机变得有用和高效。重点是对算法和数据结构的研究。第三个阶段是让计算机得到更广泛的应用,发展的重点从离散数学转向概率统计。然后我们可以看到,第三个阶段实际上是机器学习所关注的。

[En]

Let us look at the place of machine learning within computer science. There is a recent, as yet unpublished book, Foundations of Data Science, by Avrim Blum, John Hopcroft, and Ravindran Kannan; Hopcroft, one of the authors, is a Turing Award winner. The preface of the book describes the development of computer science in three stages: early, middle, and present. In the early stage the goal was to get computers to run, and the focus was on programming languages, compilers, operating systems, and the mathematical theory supporting them. In the middle stage the goal was to make computers useful and efficient, and the focus was on algorithms and data structures. The third stage is to make computers more widely applicable, and the emphasis shifts from discrete mathematics to probability and statistics. The third stage, then, is precisely what machine learning is concerned with.

现在计算机界戏称机器学习是“万能的学科”,它无所不在。一方面,机器学习有自己的学科体系;另一方面,它具有两个重要的辐射功能。一是为应用学科提供解决问题的方法和途径。更通俗地说,对于一门应用学科来说,机器学习的目的是将一些困难的数学转化为伪代码,使工程师能够编写程序。二是为一些传统学科发现新的研究问题,如统计学、理论计算机科学、运筹学等。

[En]

Computer scientists now jokingly call machine learning an “all-purpose discipline”: it is everywhere. On the one hand, machine learning has its own body of knowledge; on the other, it has two important outward-reaching roles. The first is to provide applied disciplines with methods for solving problems. Put more plainly, for an applied discipline the job of machine learning is to turn difficult mathematics into pseudocode that engineers can implement. The second is to uncover new research problems for traditional disciplines such as statistics, theoretical computer science, and operations research.

4、 机器学习发展的启示 (Lessons from the Development of Machine Learning)

机器学习的发展告诉我们,一门学科的发展需要务实的态度。时尚的概念和名字无疑对学科的普及起到了一定的推动作用,但学科的根本仍然是所研究的问题、方法、技术和支持的基础,以及对社会的价值。

[En]

The development of machine learning tells us that a discipline needs a pragmatic attitude to grow. Fashionable concepts and names certainly help popularize a discipline, but its foundation remains the problems it studies, its methods and techniques, its supporting theory, and the value it creates for society.

机器学习是一个很酷的名字,简单地从字面上理解,它的目的是让机器像人类一样有学习的能力。但正如我们之前所看到的,在其十年的黄金发展期,机器学习领域并没有太多地炒作“智能”,而是更注重引入统计学来建立学科的理论基础,面向数据分析和处理。以无监督学习和监督学习为主要研究对象,提出并发展了一系列模型、方法和计算算法。实事求是地解决一些行业面临的实际问题。近年来,由于大数据的驱动和计算能力的大幅提升,一批机器学习的底层结构相继开发出来。深度神经网络的强势崛起,给行业带来了深刻的变革和机遇。

[En]

Machine learning is a cool name. Taken literally, its goal is to give machines the ability to learn as humans do. But as we saw earlier, during its ten-year golden period the machine learning community did not hype “intelligence” much. Instead it concentrated on bringing in statistics to build the theoretical foundation of the discipline and on data analysis and processing. With unsupervised and supervised learning as its two main research problems, it proposed and developed a series of models, methods, and algorithms, and it solved real problems faced by industry in a down-to-earth way. In recent years, driven by big data and large gains in computing power, a range of low-level infrastructure for machine learning has been developed, and the strong rise of deep neural networks has brought profound change and opportunity to industry.

机器学习的发展也说明了跨学科研究的重要性和必要性。然而,这种交集并不是简单地知道几个名词或概念,而是真正融为一体。迈克·乔丹教授既是一流的计算机科学家,也是一流的统计学家,所以他可以承担建立统计机器学习的任务。而且他非常务实,从不提及那些空洞的概念和框架。他遵循的是自下而上的方法,即从具体问题、模型、方法、算法等入手,然后一步步系统化。杰弗里·辛顿教授是世界上最著名的认知心理学家和计算机科学家。虽然他很早就取得了很大的成就,在学术界享有很高的声誉,但他一直活跃在一线,编写着自己的代码。他提出的许多想法简单、可行、非常有效,因此被称为伟大的思想家。正是因为他的智慧和能力,深度学习技术才迎来了革命性的突破。

[En]

The development of machine learning also shows the importance and necessity of interdisciplinary work. But crossing disciplines is not a matter of knowing a few terms or concepts; it requires genuine, thorough mastery of both sides. Professor Michael Jordan is both a first-rate computer scientist and a first-rate statistician, which is why he could take on the task of building statistical machine learning. He is also very pragmatic and never deals in empty concepts and frameworks. He follows a bottom-up approach: start from concrete problems, models, methods, and algorithms, then systematize step by step. Professor Geoffrey Hinton is a world-renowned cognitive psychologist and computer scientist. Although he achieved great things early on and enjoys a high reputation in academia, he has remained active on the front line and writes his own code. Many of the ideas he has proposed are simple, workable, and highly effective, and he is rightly called a great thinker. It is thanks to his insight and hands-on ability that deep learning achieved its revolutionary breakthrough.

机器学习的主题是兼容的,也是可以接受的。可以说,机器学习是学术界、产业界、创业(或竞赛)等多方合力打造的。学术是引擎,产业是动力,创业是活力和未来。学术界和产业界应该有自己的责任和分工。学术界的责任是建立和发展机器学习学科,培养机器学习领域的专业人才,而大项目和大项目应该由市场驱动,由行业实施和完成。

[En]

Machine learning as a discipline is both inclusive and open. It is fair to say that machine learning has been built jointly by academia, industry, and entrepreneurship (or competitions). Academia is the engine, industry is the driving force, and entrepreneurship is the vitality and the future. Academia and industry should each have their own responsibilities and division of labor. The responsibility of academia is to build and develop machine learning as a discipline and to train professionals in the field, while large projects and large-scale engineering should be driven by the market and carried out by industry.

5、国内外发展现状 (Current State at Home and Abroad)

让我们来看看机器学习在世界上的发展情况。我主要看了几所著名的大学。在伯克利,一个发人深省的举动是,机器学习教授在计算机科学和统计学方面都有正式的职位,据我所知,他们并不是兼职,在这两个系教授课程和研究。伯克利是美国统计的发源地,可以说是今天的统计圣地,但它包容一切,没有固步自封。迈克·乔丹教授是统计机器学习的主要创始人和推动者。他在机器学习领域培养了一大批优秀的学生。统计部的负责人现在是迈克,但他在早年接受的教育中没有统计学或数学背景。可以说,伯克利的统计部门造就了迈克,进而为伯克利的统计发展创造了新的活力和不可替代的功绩。

[En]

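Let us look at how machine learning is developing internationally, focusing on a few well-known universities. At Berkeley, one thought-provoking arrangement is that machine learning professors hold full appointments in both computer science and statistics; as far as I know, these are not courtesy appointments, and they teach and do research in both departments. Berkeley is the birthplace of American statistics and can be called the mecca of statistics today, yet it is open-minded and not complacent. Professor Michael Jordan is the principal founder and promoter of statistical machine learning and has trained a large number of outstanding students in the field. He is now the head of the statistics department, although his early education did not include a background in statistics or mathematics. One could say that Berkeley’s statistics department made Mike, and that he in turn has brought new vitality and irreplaceable contributions to statistics at Berkeley.

斯坦福和伯克利的统计是公认世界最好的两个。我们看到,斯坦福统计系的主流方向就是统计学习,比如我们熟知的《Elements of statistical learning》一书就是统计系几位著名教授撰写的。Stanford计算机科学的人工智能方向一直在世界占主导地位,特别在不确定推理、概率图模型、概率机器人等领域成就斐然,他们的网络公开课 《机器学习》、《概率图模型》以及《人工智能》等让世界受益。

[En]

Stanford and Berkeley are recognized as having the two best statistics departments in the world. At Stanford, the mainstream direction of the statistics department is statistical learning; for example, the well-known book The Elements of Statistical Learning was written by several famous professors in that department. Stanford’s artificial intelligence group in computer science has long been a world leader, with notable achievements in uncertain reasoning, probabilistic graphical models, and probabilistic robotics, and its online open courses “Machine Learning”, “Probabilistic Graphical Models”, and “Artificial Intelligence” have benefited the whole world.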

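CMU是一个非常独特的学校,她并不是美国传统的常春藤大学。可以说,它是以计算机科学为立校之本,它是世界第一个建立机器学习系的学校。Tom Mitchell 教授是机器学习的早期建立者之一和守护者,他一直为该校本科生教《机器学习》课程。然而,这个学校统计学同样强,尤其,她是贝叶斯统计学的世界研究中心。

[En]

CMU is a very distinctive school; it is not one of the traditional American Ivy League universities. It is fair to say that computer science is its foundation, and it was the first university in the world to establish a machine learning department. Professor Tom Mitchell is one of the early founders and guardians of machine learning, and he has long taught the “Machine Learning” course to undergraduates there. Its statistics is equally strong; in particular, it is a world center for research in Bayesian statistics.

在机器学习领域,多伦多大学有着举足轻重的地位,她们机器学习研究组云集了一批世界级的学者,在”Science” 和”Nature”发表多篇论文,实属罕见。Geoffrey Hinton 教授是伟大的思想家,但更是践行者。他是神经网络的建立者之一,是BP算法和深度学习的主要贡献者。正是由于他的不懈努力,神经网络迎来了大爆发。Radford Neal 教授是Hinton学生,他在贝叶斯统计领域,特别是关于MCMC做出了一系列的重要工作。

[En]

The University of Toronto holds a pivotal position in machine learning. Its machine learning group brings together a number of world-class scholars and has published multiple papers in Science and Nature, which is rare. Professor Geoffrey Hinton is a great thinker, but even more a practitioner. He is one of the founders of neural networks and a major contributor to the backpropagation (BP) algorithm and to deep learning. It is thanks to his persistent efforts that neural networks finally took off. Professor Radford Neal, a student of Hinton, has done a series of important work in Bayesian statistics, especially on MCMC.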

国际发展现状 (International Development Status)

那么让我们来看看中国目前的情况。总体而言,统计学和计算机科学这两个学科正处于拉里所说的独立战争的早期阶段。统计学和计算机科学面向大数据的交叉研究,既是机遇,也是挑战。

[En]

Now let us look at the current situation in China. Overall, statistics and computer science are still at the early stage that Larry described, with each discipline going its own way. Interdisciplinary research between statistics and computer science aimed at big data is both an opportunity and a challenge.

我之前曾在浙江大学参与过统计跨学科中心的建立,所以我对统计界有一些了解。统计学在中国应该仍然是一门薄弱学科,直到最近才被国家指定为一级学科。中国的统计数据有两个极端。一是认为它是数学的一个分支,主要研究概率论、随机过程和数理统计理论。其次,它被归类为经济学的一个分支,主要研究经济分析的应用。然而,机器学习在统计学领域并没有得到深入的关注。因此,信息技术与统计学的深度融合对数据处理和分析具有巨大的潜力。

[En]

I took part in establishing an interdisciplinary statistics center at Zhejiang University, so I know something about the statistics community. Statistics is still a relatively weak discipline in China and was only recently designated a first-level discipline by the state. Statistics in China sits at two extremes: either it is treated as a branch of mathematics, studying mainly probability theory, stochastic processes, and mathematical statistics, or it is classified under economics, studying mainly applications to economic analysis. Machine learning has not yet received serious attention within statistics. A deep integration of information technology and statistics for data processing and analysis therefore has great potential.

虽然我与国内机器学习或人工智能学术界没有深入接触,但我在国内计算机系工作了近8年,一直从事与机器学习相关的教学和研究。在机器学习的现状下,应该有一定的发言权。机器学习在中国确实得到了广泛的关注并取得了一些成果,但我认为高质量的研究成果是稀缺的。他们热衷于一些关于机器学习高级阶段的概念性炒作,通常无法执行;他们更喜欢大项目和大集成,这些应该由行业来实施;而理论和方法等基础研究没有得到重视,理论无用的观点仍然有很大的市场。

[En]

Although I do not have deep contacts with the domestic machine learning or artificial intelligence community, I worked in a domestic computer science department for nearly eight years and have been teaching and doing research in machine learning throughout, so I should have some standing to comment on the current state of machine learning here. Machine learning has indeed received wide attention in China and has produced some results, but I think high-quality research is scarce. There is enthusiasm for conceptual hype about the high-level stage of machine learning, which usually cannot be put into practice; there is a preference for large projects and large integration efforts, which ought to be carried out by industry; and basic research on theory and methods is not valued, with the view that theory is useless still widely held.

计算机专业的人才培养体系还处于发展的初级阶段。大多数学校都开设了人工智能和机器学习课程,但无论是深度还是前沿都滞后于学科发展,不能满足时代需要。人才培养在质量和数量上都不能满足产业的需求。这也是国内IT企业与国际同类企业存在较大技术差距的关键原因。

[En]

The system for training computer science talent is still at an early stage of development. Most universities offer courses in artificial intelligence and machine learning, but in both depth and currency they lag behind the discipline and cannot meet the needs of the times. Talent training falls short of industry’s needs in both quality and quantity. This is also a key reason for the large technology gap between domestic IT companies and their international counterparts.

第二部分:几个简单的研究思路 (Part II: A Few Simple Research Ideas)

在这一部分,我的注意力回到了机器学习本身的研究上。机器学习的内容博大精深,新方法、新技术不断被提出和发现。在这里,我尝试用多层次、自适应和平均的概念来简化丰富多彩的机器学习模型和计算方法背后的一些研究思路和想法。希望这些对大家了解机器学习的一些现有模型、方法和未来的研究有所启发。

[En]

In this part, I turn back to research in machine learning itself. Machine learning is broad and deep, and new methods and techniques are constantly being proposed and discovered. Here I try to use the concepts of “hierarchical”, “adaptive” and “averaging” to distill some of the research ideas behind the rich variety of machine learning models and computational methods. I hope this helps you understand some existing models and methods and offers some inspiration for future research.

1. 多级 (Hierarchical)

首先,我们来重点介绍一下“多层次”的技术思路。让我们来看三个具体的例子。

[En]

First, let us focus on the “hierarchical” idea, through three concrete examples.

第一个例子是隐式数据模型,它是一个多层次模型。隐含数据模型是概率图模型的扩展,是一种重要的多元数据分析方法。隐含变量有三个重要属性。首先,可以用弱条件独立相关代替强边界独立相关。著名的de Finetti表示定理支持这一点。这个定理说,当且仅当给定一个参数时,一组可交换的随机变量可以表示为一组条件随机变量的混合。这给出了一组可交换随机变量的多级表示。也就是说,我们首先从分布中提取一个参数,然后基于该参数从分布中独立地提取这组随机变量。其次,可以引入隐变量技术来方便计算,例如期望最大值算法和更广泛的数据扩展技术就是基于这种思想。具体地说,一些复杂的分布,如t分布和拉普拉斯分布,可以通过将它们表示为高斯尺度混合来简化。第三,隐含变量本身可能具有某种可解释的物理含义,这与应用场景不谋而合。例如,在隐含狄利克雷分布(LDA)模型中,隐含变量具有主题的含义。

[En]

The first example is the latent variable model, which is a hierarchical model. As an extension of probabilistic graphical models, the latent variable model is an important method for multivariate data analysis. Latent variables have three important properties. First, they let us replace a strong assumption of marginal independence with the weaker assumption of conditional independence. The famous de Finetti representation theorem supports this: an infinite sequence of random variables is exchangeable if and only if it can be represented as a mixture of variables that are conditionally independent and identically distributed given a parameter. This gives a hierarchical representation of exchangeable random variables: first draw a parameter from some distribution, then draw the random variables independently from a distribution conditioned on that parameter. Second, latent variables can be introduced to make computation easier; the expectation-maximization (EM) algorithm and, more broadly, data augmentation techniques are built on this idea. Specifically, some complicated distributions, such as the t distribution and the Laplace distribution, become easier to handle when expressed as Gaussian scale mixtures. Third, a latent variable may itself have an interpretable meaning that matches the application. For example, in the latent Dirichlet allocation (LDA) model, the latent variables have the meaning of topics.
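To make the first two properties concrete, the hierarchical representations can be written out as follows. The notation (a mixing density \pi, a location \mu, a scale \sigma, degrees of freedom \nu, and a Laplace scale b) is chosen here for illustration and does not come from the talk; the Gamma distribution is in shape-rate form.

```latex
\begin{align*}
% de Finetti: exchangeable x_1, x_2, \dots admit a mixture representation
p(x_1, \dots, x_n) &= \int \Big[ \prod_{i=1}^{n} p(x_i \mid \theta) \Big] \, \pi(\theta) \, d\theta, \\
% equivalently, a two-level sampling scheme
\theta \sim \pi(\theta), &\qquad x_i \mid \theta \overset{\text{iid}}{\sim} p(x \mid \theta), \quad i = 1, \dots, n. \\[6pt]
% Student t as a Gaussian scale mixture
\tau \sim \mathrm{Gamma}(\nu/2, \, \nu/2), &\qquad x \mid \tau \sim \mathcal{N}(\mu, \, \sigma^2/\tau)
  \;\Longrightarrow\; x \sim t_\nu(\mu, \sigma^2). \\[6pt]
% Laplace as a Gaussian scale mixture; s is exponential with mean 2b^2
s \sim \mathrm{Exp}\big(\text{mean } 2b^2\big), &\qquad x \mid s \sim \mathcal{N}(0, \, s)
  \;\Longrightarrow\; x \sim \mathrm{Laplace}(0, \, b).
\end{align*}
```

In both scale mixtures the latent variable (\tau or s) makes the conditional model Gaussian, which is what lets EM or a Gibbs sampler work with simple closed-form updates.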

第一个例子是隐式数据模型,它是一个多层次模型。隐含数据模型是概率图模型的扩展,是一种重要的多元数据分析方法。隐含变量有三个重要属性。首先,可以用弱条件独立相关代替强边界独立相关。著名的de Finetti表示定理支持这一点。这个定理说,当且仅当给定一个参数时,一组可交换的随机变量可以表示为一组条件随机变量的混合。这给出了一组可交换随机变量的多级表示。也就是说,我们首先从分布中提取一个参数,然后基于该参数从分布中独立地提取这组随机变量。其次,可以引入隐变量技术来方便计算,例如期望最大值算法和更广泛的数据扩展技术就是基于这种思想。具体地说,一些复杂的分布,如t分布和拉普拉斯分布,可以通过将它们表示为高斯尺度混合来简化。第三,隐含变量本身可能具有某种可解释的物理含义,这与应用场景不谋而合。例如,在隐含狄利克雷分布(LDA)模型中,隐含变量具有主题的含义。

[En]

The first example is the implicit data model, which is a multi-level model. As an extension of probability graph model, hidden data model is an important multivariate data analysis method. Implied variables have three important properties. First, the weak conditional independent correlation can be used to replace the strong boundary independent correlation. The famous de Finetti representation theorem supports this. This theorem says that a group of exchangeable random variables can be expressed as a mixture of a set of conditional random variables if and only if a parameter is given. This gives a multi-level representation of a set of exchangeable random variables. That is, we first extract a parameter from a distribution, and then extract this set of random variables independently from a distribution based on this parameter. Second, the technology of implicit variables can be introduced to facilitate the calculation, such as the expected maximum algorithm and the broader data expansion technology is based on this idea. Specifically, some complex distributions, such as t-distribution and Laplace distribution, can be simplified by expressing them as Gaussian-scale mixtures. Third, the implied variable itself may have some interpretable physical meaning, which coincides with the application scenario. For example, in the implied Dirichlet distribution (LDA) model, the implied variable has the meaning of a topic.


Latent Dirichlet Allocation

The second example is the hierarchical (multi-level) Bayesian model. When the posterior is estimated by MCMC sampling, the hyperparameters at the top level always have to be specified by hand. Naturally, the convergence behavior of the MCMC algorithm depends on these given hyperparameters. If we have no good experience to guide their choice, one option is to add another layer: the more layers there are, the weaker the dependence on the choice of hyperparameters becomes.
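In symbols, a generic three-level model of this kind might be written as follows. The notation (data y_i, parameter θ, hyperparameter α, top-level constant λ) is assumed here for illustration and does not come from the talk.

```latex
\begin{aligned}
y_i \mid \theta &\sim p(y \mid \theta), \quad i = 1, \dots, n, \\
\theta \mid \alpha &\sim p(\theta \mid \alpha), \\
\alpha \mid \lambda &\sim p(\alpha \mid \lambda), \qquad \lambda \text{ fixed by hand},
\end{aligned}
\qquad
p(\theta \mid y, \lambda) = \int p(\theta \mid y, \alpha)\, p(\alpha \mid y, \lambda)\, d\alpha .
```

In a two-level model α itself would be fixed by hand. With the extra layer, the posterior of θ averages over α, so the hand-chosen quantity λ influences θ only through that average.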


Hierarchical Bayesian Model

The third example is deep learning, which also embodies the multi-level idea. If all the nodes are flattened into a single layer and connected to each other, the result is a fully connected graph. A deep CNN, by contrast, can be viewed as a structural regularization of that fully connected graph. Regularization is a core idea in statistical learning. CNNs and RNNs are two deep neural network models, used mainly in image processing and natural language processing respectively. Research shows that multi-level structures have stronger learning ability.
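One way to read "structural regularization" is that a convolutional layer is a fully connected layer whose weight matrix is constrained to be banded with shared entries. The sketch below checks this for a 1-D convolution; the helper names and the toy input are assumptions made for illustration.

```python
def conv1d_valid(x, kernel):
    """Direct 1-D 'valid' cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def conv_as_dense_matrix(n_inputs, kernel):
    """The same layer written as a fully connected weight matrix.

    Structural constraints: each row is zero outside a window of len(kernel)
    entries (local connectivity), and every row reuses the same kernel
    values (weight sharing).
    """
    k = len(kernel)
    rows = []
    for i in range(n_inputs - k + 1):
        row = [0.0] * n_inputs
        row[i:i + k] = kernel
        rows.append(row)
    return rows

def dense_forward(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
kernel = [0.5, -1.0, 2.0]
W = conv_as_dense_matrix(len(x), kernel)
free_dense_params = len(W) * len(W[0])   # weights of an unconstrained dense layer
free_conv_params = len(kernel)           # weights left after the constraints
```

Here `dense_forward(W, x)` and `conv1d_valid(x, kernel)` give the same output, while the number of free parameters drops from 24 to 3.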


Deep Learning

2. Adaptive

Next is the adaptive idea. We will look at what it does through several examples.

The first example is adaptive importance sampling. Importance sampling can usually improve on uniform sampling, and adaptive sampling can improve on importance sampling further.
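As a minimal sketch of the adaptive step, consider estimating the small probability P(X > 3) for a standard normal X. The proposal is a unit-variance Gaussian whose mean is re-estimated from weighted samples in each round, in the style of the cross-entropy method; this particular update rule, the function names, and all constants are assumptions chosen for illustration, not a method named in the talk.

```python
import math
import random

def normal_pdf(x, mean=0.0):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def adaptive_importance_sampling(threshold=3.0, rounds=4, n=5000, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), with an adapted proposal N(mu, 1)."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(rounds):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        ws = [normal_pdf(x) / normal_pdf(x, mu) for x in xs]
        # Intermediate level: never demand more than the top 10% of the samples.
        level = min(threshold, sorted(xs)[int(0.9 * n)])
        num = sum(w * x for w, x in zip(ws, xs) if x >= level)
        den = sum(w for w, x in zip(ws, xs) if x >= level)
        mu = num / den          # adapt the proposal toward the important region
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    hits = (normal_pdf(x) / normal_pdf(x, mu) for x in xs if x > threshold)
    return sum(hits) / n, mu

def plain_monte_carlo(threshold=3.0, n=5000, seed=0):
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold) / n

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))   # about 0.00135
estimate, adapted_mean = adaptive_importance_sampling()
```

With the same sample size, `plain_monte_carlo` sees only a handful of samples beyond the threshold, whereas the adapted proposal places most of its samples there and corrects for this through the weights.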

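The second example is the adaptive column selection problem. Given a matrix A, we want to select a subset of its columns to form a matrix C and then approximate A by CC^+A, where C^+ is the pseudoinverse of C, with the approximation error as small as possible. This is an NP-hard problem. In practice, one can proceed adaptively: first sample a very small set of columns C_1, use it to construct a residual, define a probability distribution from that residual, sample a further set of columns C_2 according to that distribution, and combine C_1 and C_2 to form C.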
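The two-stage procedure can be sketched in plain Python as follows. Two details are assumptions, since the talk does not specify them: the first stage samples columns with probability proportional to their squared norms, and the second stage uses the squared norms of the residual columns of A - C_1 C_1^+ A. The matrix is stored as a list of columns, and the projection CC^+A is computed with Gram-Schmidt.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual_columns(columns, selected):
    """Columns of A - C C^+ A, where C consists of the selected columns of A."""
    basis = []                                  # orthonormal basis of span(C)
    for j in selected:
        v = list(columns[j])
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    residual = []
    for col in columns:
        r = list(col)
        for q in basis:                         # subtract the projection onto span(C)
            c = dot(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        residual.append(r)
    return residual

def squared_error(residual):
    """Squared Frobenius norm of the residual matrix."""
    return sum(dot(r, r) for r in residual)

def adaptive_column_selection(columns, c1, c2, rng):
    n = len(columns)
    # Stage 1 (assumed rule): probability proportional to squared column norm.
    norms = [dot(c, c) for c in columns]
    first = set(rng.choices(range(n), weights=norms, k=c1))
    # Stage 2: probability proportional to squared residual column norm.
    res = residual_columns(columns, sorted(first))
    res_norms = [dot(r, r) for r in res]
    second = set(rng.choices(range(n), weights=res_norms, k=c2))
    return sorted(first), sorted(first | second)

rng = random.Random(0)
A_columns = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(10)]  # 6 x 10
C1, C = adaptive_column_selection(A_columns, c1=2, c2=2, rng=rng)
err_stage1 = squared_error(residual_columns(A_columns, C1))
err_final = squared_error(residual_columns(A_columns, C))
```

Sampling is done with replacement, so C may contain fewer than c1 + c2 distinct columns. Columns already well explained by C_1 have a near-zero residual and are therefore unlikely to be drawn again in the second stage.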

The third example is the adaptive stochastic iterative algorithm. Consider a regularized empirical risk minimization problem. When the training set is large, batch computation is very time-consuming, so stochastic methods are usually used instead. Existing stochastic gradient and stochastic dual gradient algorithms can give unbiased estimates of the parameters. Introducing adaptive techniques can reduce the variance of these estimates.
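The talk does not name a specific variance-reduction method. As one well-known instance, the sketch below compares the plain stochastic gradient with an SVRG-style estimator, which corrects each sampled gradient using a full gradient computed at a reference point. Applying the idea to the gradient estimate, the scalar ridge-regression loss, and the data are all assumptions made for illustration.

```python
def grad_i(w, x, y, lam):
    """Gradient of the i-th regularized loss 0.5*(w*x - y)**2 + 0.5*lam*w**2."""
    return (w * x - y) * x + lam * w

def full_grad(w, data, lam):
    return sum(grad_i(w, x, y, lam) for x, y in data) / len(data)

def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]   # made-up (x, y) pairs
lam = 0.1
w_snapshot = 2.0     # reference point where the full gradient is computed once
w = 1.98             # current iterate, close to the reference point
anchor = full_grad(w_snapshot, data, lam)

# One estimate per possible choice of the uniformly sampled index i.
sgd_estimates = [grad_i(w, x, y, lam) for x, y in data]
svrg_estimates = [grad_i(w, x, y, lam) - grad_i(w_snapshot, x, y, lam) + anchor
                  for x, y in data]

sgd_mean, sgd_var = mean_and_variance(sgd_estimates)
svrg_mean, svrg_var = mean_and_variance(svrg_estimates)
true_grad = full_grad(w, data, lam)
```

Both estimators average to the full gradient, so both are unbiased, but the corrected one has a much smaller variance when the iterate is close to the reference point. The benefit shrinks as the iterate moves away, which is why such methods refresh the reference point periodically.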

The fourth example is boosting for classification. It adaptively adjusts the weight of each sample: the weights of misclassified samples are increased and the weights of correctly classified samples are decreased.
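The reweighting rule can be seen in a compact AdaBoost sketch with threshold classifiers ("stumps") on 1-D data. AdaBoost is used here as the standard representative of boosting; the toy data set and helper names are assumptions for illustration.

```python
import math

def stump_predict(x, threshold, sign):
    return sign if x < threshold else -sign

def best_stump(xs, ys, weights):
    """Threshold classifier with the smallest weighted error on 1-D inputs."""
    points = sorted(xs)
    thresholds = [points[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble, history = [], [list(weights)]
    for _ in range(rounds):
        err, t, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)     # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, sign))
        # Misclassified samples (y * h(x) = -1) are scaled up, the others down.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(list(weights))
    return ensemble, history

def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Labels follow a +, -, + pattern that no single stump can reproduce.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1]
ensemble, history = adaboost(xs, ys, rounds=3)
accuracy = sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
```

A single stump gets at most six of the nine points right, while the weighted vote of three stumps classifies all of them correctly. The final prediction is a weighted average of weak classifiers, which is how boosting connects to the averaging idea.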

3. Averaging

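In fact, boosting already embodies the idea of averaging, which is the last technical idea I want to discuss. Put simply, boosting combines a set of weak classifiers into a strong classifier. The first benefit is a lower risk of overfitting. The second is a lower risk of getting trapped in a local optimum. The third is that it enlarges the hypothesis space. Bagging is likewise a classical ensemble learning algorithm: it divides the training data into several groups, trains a model on each of the smaller data sets, and combines these models into a strong classifier. It is also a two-layer ensemble learning scheme.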


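The classical Anderson acceleration technique uses averaging to speed up convergence. Specifically, it is a mixing process in which a weighted combination of iterates is obtained by solving a residual minimization problem. The benefit of this technique is that it adds little computation and often makes the numerical iteration more stable.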
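A minimal sketch of Anderson acceleration with memory one is given below: at each step the two most recent values of the map g are averaged with a weight that minimizes the norm of the combined residual. The memory depth, the test problem x = cos(x), and the function names are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fixed_point_plain(g, x0, tol=1e-10, max_evals=500):
    """Plain iteration x <- g(x); returns the result and the number of g evaluations."""
    x = list(x0)
    for k in range(1, max_evals + 1):
        gx = g(x)
        if max(abs(a - b) for a, b in zip(gx, x)) < tol:
            return gx, k
        x = gx
    return x, max_evals

def fixed_point_anderson(g, x0, tol=1e-10, max_evals=500):
    """Anderson acceleration with memory 1 (mixes the last two values of g)."""
    gx_prev = g(list(x0))
    f_prev = [a - b for a, b in zip(gx_prev, x0)]      # residual g(x) - x
    x = list(gx_prev)                                  # first step is a plain one
    for k in range(2, max_evals + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]
        if max(abs(v) for v in f) < tol:
            return gx, k
        # Weight alpha minimizes || alpha * f + (1 - alpha) * f_prev ||^2.
        d = [a - b for a, b in zip(f, f_prev)]
        dd = dot(d, d)
        alpha = 1.0 if dd == 0.0 else -dot(f_prev, d) / dd
        x_next = [alpha * a + (1.0 - alpha) * b for a, b in zip(gx, gx_prev)]
        gx_prev, f_prev, x = gx, f, x_next
    return x, max_evals

def g(x):
    return [math.cos(x[0])]

plain_x, plain_evals = fixed_point_plain(g, [1.0])
anderson_x, anderson_evals = fixed_point_anderson(g, [1.0])
```

The only extra work per step is a couple of inner products, yet on this problem the accelerated iteration needs far fewer evaluations of g than the plain one to reach the same tolerance.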

Another example of averaging comes from distributed computing. In many cases distributed computation is not synchronous but asynchronous. How should the asynchronous case be handled? The simplest approach is to let every worker run independently, average all the results at some point, distribute the average back to every worker, let the workers run independently again, and so on. This works like a warm-start process.
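The run-independently, average, redistribute loop can be simulated in a few lines. The sketch below executes the workers sequentially in a single process, and each worker minimizes a simple least-squares loss on its own shard; the loss, the data, and the function names are assumptions made for illustration.

```python
def local_steps(w, shard, lr, steps):
    """Gradient steps on the local loss 0.5 * mean((w - c)**2) over the shard."""
    local_mean = sum(shard) / len(shard)
    for _ in range(steps):
        w = w - lr * (w - local_mean)      # gradient of the local loss is w - mean
    return w

def periodic_averaging(shards, rounds, lr=0.1, steps=5, w0=0.0):
    w = w0
    trace = [w]
    for _ in range(rounds):
        # Every worker starts from the shared value (warm start) and runs alone.
        local = [local_steps(w, shard, lr, steps) for shard in shards]
        # Synchronization point: average the results and redistribute them.
        w = sum(local) / len(local)
        trace.append(w)
    return w, trace

shards = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [4.0, 4.0, 7.0]]   # made-up data
w_final, trace = periodic_averaging(shards, rounds=60)
global_mean = sum(sum(s) for s in shards) / sum(len(s) for s in shards)
```

Between two averaging points each worker drifts toward the optimum of its own shard. Because the shards here have equal size, the averaged value converges to the minimizer of the global loss, which is the overall mean of the data.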

As we have seen, these ideas are often used in combination, as in boosting. The multi-level, adaptive, and averaging ideas are simple and direct, but they are genuinely useful.

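In the match between AlphaGo and Lee Sedol (9-dan), one detail worth noting is that the flag displayed for the AlphaGo side was the British flag. We know that AlphaGo was developed by the DeepMind team, and DeepMind is a British company that was later acquired by Google. Scientific achievements are wealth owned and shared by people all over the world, but scientists have their own national feeling and sense of belonging.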

I believe that the fundamental way forward for the development of artificial intelligence in China lies in education. As the sages say, "sharpening the axe does not delay the cutting of firewood." Only by training generation after generation of people with a deep mathematical foundation, strong hands-on computing skills, genuine interdisciplinary ability, and an international outlook can we achieve great things.

Acknowledgments

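The content above was compiled from two recent talks I gave, one at the 9th China R Conference (http://china-r.org/bj2016/) and one at Shanghai Jiao Tong University. In particular, the students at 统计之都 (Capital of Statistics), the organizer of the R conference, recorded that talk for me. I thank 太云, 凌秉, and 象宇 of 统计之都 for the invitation. They and their colleagues are doing academic public-interest work of far-reaching significance, and their dedication gave me the confidence to speak publicly about what I have truly come to understand and think over the years. I thank my students for helping me prepare this talk: they gave me enormous support in choosing the theme, selecting the content, collecting material, and making the slides. More importantly, they have kept me from ever being alone in my pursuit of machine learning. Thank you all!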



Original: https://www.cnblogs.com/yymn/p/5616061.html
Author: 菜鸡一枚
Title: 张志华教授:机器学习-统计与计算之恋
