Talk Excerpt: On the Nature of Data Science

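About the Talk

Jeffrey David Ullman (born November 22, 1942) is an American computer scientist and professor emeritus at Stanford University. His textbooks on compilers (the various editions are popularly known as the Dragon Book), theory of computation (also known as the Cinderella book), data structures, and databases are regarded as standards in their fields. He and his long-time collaborator Alfred Aho received the 2020 Turing Award, generally recognized as the highest distinction in computer science. (Adapted from Wikipedia)

This talk was the last of the keynote talks at KDD 2021.

Excerpts from the Talk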

The talk focuses on how data science relates to machine learning and to statistics, giving the audience a clearer picture of what data science actually is.

From data mining (or knowledge discovery) in the first decade of this century, through big data in the second decade, to data science today, the goal of the field has not changed: combine the fastest and largest hardware with the fastest algorithms and the most efficient programs to solve problems in business and science.

The speaker argues that data science is a natural evolution of database systems.

He also points out that making a real contribution in data science requires a command of core computer science together with expertise in handling massive amounts of data.

The speaker rejects Drew Conway's Venn diagram and offers his own to show how data science relates to other fields. In his version, data science is the combination of computer science with an application domain; it involves machine learning but is not limited to it. Moreover, from the data science perspective, mathematics and statistics do not influence the application domain directly; they influence it indirectly, through the algorithms of computer science.

Compared with statistics, data science is largely an experimental discipline. Data scientists usually validate a method by implementing and running an algorithm or model, rather than by ruling out model errors through analysis and derivation. For data science, therefore, the criteria used to measure error and the means of reducing it matter more than theoretical analysis.
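The experimental workflow described above (implement the method, run it, and judge it by a measured error) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the talk: the synthetic data, the 150/50 train/test split, and the helper names `fit_line`, `mean_squared_error`, and `run_experiment`.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(predict, points):
    """Average squared difference between predictions and observed y values."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)


def run_experiment(seed=0):
    """Fit on a training split, then measure error on held-out points."""
    rng = random.Random(seed)
    # Synthetic data (hypothetical): y = 3x + 2 plus Gaussian noise.
    data = []
    for _ in range(200):
        x = rng.uniform(0, 10)
        data.append((x, 3 * x + 2 + rng.gauss(0, 1)))
    rng.shuffle(data)
    train, test = data[:150], data[150:]

    a, b = fit_line(train)
    train_mean = sum(y for _, y in train) / len(train)

    # The held-out error is the criterion; a constant predictor is the baseline.
    model_mse = mean_squared_error(lambda x: a * x + b, test)
    baseline_mse = mean_squared_error(lambda x: train_mean, test)
    return model_mse, baseline_mse


if __name__ == "__main__":
    model_mse, baseline_mse = run_experiment()
    print(f"held-out MSE, fitted line: {model_mse:.3f}")
    print(f"held-out MSE, constant baseline: {baseline_mse:.3f}")
```

The point of the sketch is the procedure rather than the model: the method is accepted or rejected by comparing measured held-out error against a baseline, not by deriving its error analytically.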

Compared with machine learning, not every data science problem is solved by building a model; locality-sensitive hashing and approximate counting are examples (here the speaker recommends the book "Mining of Massive Datasets"). Interpretability of the method also matters a great deal in some domains, for example when an insurance company sets premiums.
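As an illustration of a technique that solves a problem without training any model, here is a minimal Python sketch of MinHash signatures with banding, one common form of locality-sensitive hashing for set similarity. It is a sketch under stated assumptions, not the formulation used in the talk: the salted SHA-1 hash family, the default parameters, and the function names `minhash_signature`, `estimate_jaccard`, and `band_keys` are all chosen here for illustration.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed (salted SHA-1)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """For each of num_hashes hash functions, keep the minimum hash over the set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def band_keys(signature, rows_per_band=4):
    """Split a signature into bands; sets sharing any band key become candidates."""
    return [
        (start // rows_per_band, tuple(signature[start:start + rows_per_band]))
        for start in range(0, len(signature), rows_per_band)
    ]


if __name__ == "__main__":
    # Hypothetical example sets with true Jaccard similarity 50/150 = 1/3.
    a = {f"doc{i}" for i in range(100)}
    b = {f"doc{i}" for i in range(50, 150)}
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    print("estimated Jaccard similarity:", estimate_jaccard(sig_a, sig_b))
    shared = set(band_keys(sig_a)) & set(band_keys(sig_b))
    print("candidate pair:", bool(shared))
```

Nothing here is learned from data: the similarity estimate follows from the property that two sets agree on a given minimum hash with probability equal to their Jaccard similarity, and banding only decides which pairs are worth comparing in full.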

When to use machine learning:

  • The problem needs to be solved by building a model.

  • The results do not need to be explained.

  • Little is understood about the domain the problem comes from.

Conclusion

  • Data science is the result of the natural evolution of many branches of computer science, especially those that help science or industry advance by working with large data sets.

  • Statisticians bring a distinctive perspective, but they focus too much on analyzing data and too little on solving practical problems.

  • Machine learning is an important part of data science, but it is far from the whole of it.

Note: This post is excerpted from the talk; all content and figures are taken from it. Comments and discussion are welcome.

Original: https://www.cnblogs.com/yc0806/p/16200083.html
Author: 多事鬼间人
Title: 演讲摘录:数据科学的本质 On the Nature of Data Science
