39 Machine Learning Libraries for Spark

Spark originally came out of the Berkeley AMPLab, and even today AMPLab projects, even when they are not part of the Apache Software Foundation's Spark project, enjoy a status a bit above your everyday GitHub project.

Spark’s own MLlib forms the bottom layer of the three-layer ML Base, with MLI as the middle layer and ML Optimizer as the most abstract layer.

Ghostware was described in 2014 but never released. Of the 39 machine learning libraries, this is the only one that is vaporware; it is included only because of its AMPLab and ML Base status.

A recent project from June 2015, this set of stochastic learning algorithms claims 25x to 75x faster performance than Spark MLlib on Stochastic Gradient Descent (SGD). Plus it’s an AMPLab project that begins with the letters “sp”, so it’s worth watching.

Brought machine learning pipelines to Spark, but pipelines have matured in recent versions of Spark. Also promises some computer vision capability, but there are limitations I previously blogged about.

A server to manage a large collection of machine learning models.

Brand new and frankly why I started this list for this blog post. Provides an interface to Keras.

Parameter server for model-parallel rather than data-parallel (as Spark’s MLlib is).

From Airbnb, where it is used in their automated pricing.

Logistic regression, LDA, Factorization machines, Neural Network, Restricted Boltzmann Machines

Similar to Spark DataFrames, but agnostic to engine (i.e. will run on engines other than Spark in the future). Includes cross-validation and interfaces to external machine learning libraries.

Export PMML, an industry standard XML format for transporting machine learning models.

Adds arbitrary distance functions to K-Means
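
To make the idea of a pluggable distance concrete, here is a minimal plain-Python sketch, not the library's actual code or API. The names kmeans, euclidean and manhattan are illustrative assumptions. It keeps the usual mean-based center update, which is only guaranteed to converge for certain distance families, so treat it as a toy.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, distance: Distance,
           iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Lloyd-style k-means whose assignment step uses the supplied distance."""
    rng = random.Random(seed)
    # Hypothetical initialization: pick k distinct input points as centers.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center under `distance`.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: coordinate-wise mean; an empty cluster keeps its center.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(data, k=2, distance=manhattan)))
```

Swapping manhattan for euclidean (or any other function with the same signature) changes only the assignment step, which is the whole point of making the distance a parameter.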

Visualize the Streaming Machine Learning algorithms built into Spark MLlib

Factorization Machines
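
For readers unfamiliar with the model, a second-order factorization machine scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below is plain Python, not this library's API, and the function names are assumptions made for illustration. It computes that score both directly and with the standard rewrite that brings the pairwise term down to O(k * n).

```python
from typing import List


def fm_predict_naive(x: List[float], w0: float, w: List[float],
                     v: List[List[float]]) -> float:
    """Direct evaluation: loops over every feature pair, O(k * n^2)."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            interaction = sum(a * b for a, b in zip(v[i], v[j]))
            score += interaction * x[i] * x[j]
    return score


def fm_predict(x: List[float], w0: float, w: List[float],
               v: List[List[float]]) -> float:
    """Same score via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n, k = len(x), len(v[0])
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Toy parameters chosen only for illustration.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, 0.3, 0.4]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(fm_predict(x, w0, w, v), fm_predict_naive(x, w0, w, v))
```

Both functions should agree up to floating-point rounding; the second form is what makes the model practical on wide, sparse inputs.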

Recursive Neural Networks (RNNs)

SVM based on the performant Spark communication framework CoCoA listed above.

Matrix Factorization Recommendation System

40x faster clustering than Spark MLlib K-Means

Build graphs using k-nearest-neighbors and locality sensitive hashing (LSH)
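
One common way to combine the two techniques, sketched here in plain Python under my own assumptions rather than taken from the library, is to hash vectors with random hyperplanes so that similar directions tend to share a bucket, then run an exact nearest-neighbor search only inside each bucket. The names signature, cosine and knn_graph_lsh are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def signature(vec: Vector, planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p_i * x_i for p_i, x_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_graph_lsh(points: List[Vector], k: int, num_planes: int = 4,
                  seed: int = 0) -> Dict[int, List[int]]:
    """Approximate k-NN graph: candidates are limited to the same LSH bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_planes)]
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[signature(p, planes)].append(idx)
    graph: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for members in buckets.values():
        for i in members:
            others = [j for j in members if j != i]
            # Rank bucket-mates by cosine similarity, most similar first.
            others.sort(key=lambda j: -cosine(points[i], points[j]))
            graph[i] = others[:k]
    return graph


points = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-3.0, 0.0)]
print(knn_graph_lsh(points, k=1))
```

A single hash table like this can miss true neighbors that land in different buckets; practical systems usually use several tables and merge the candidates, which this sketch leaves out for brevity.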

Online Latent Dirichlet Allocation (LDA), Gibbs Sampling LDA, Online Hierarchical Dirichlet Process (HDP)

Adaboost and MP-Boost

Linear algebra operators to work with Spark MLlib’s linalg package

Sparse feature vectors
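
As a reminder of what a sparse representation buys, the following plain-Python sketch stores only the non-zero entries as an index-to-value mapping. It illustrates the general idea and is not this library's data structure; the names from_dense and dot are assumptions.

```python
from typing import Dict, Sequence

SparseVector = Dict[int, float]


def from_dense(values: Sequence[float]) -> SparseVector:
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(values) if v != 0.0}


def dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that iterates over the smaller vector only."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


features = from_dense([0.0, 3.0, 0.0, 0.0, 2.0])
weights = {4: 5.0, 7: 1.0}
print(features, dot(features, weights))
```

The cost of the dot product depends on the number of stored entries rather than the nominal dimension, which matters when feature spaces run to millions of mostly empty columns.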

K-Means, Regression, and Statistics

Original: https://www.cnblogs.com/timssd/p/12651209.html
Author: xxxxxxxx1x2xxxxxxx
Title: 39 Machine Learning Libraries for Spark

