AutoML | Usage Examples of AutoSklearn on Basic Classification, Regression, Multi-output Regression, and Multi-label Classification Datasets

July 13, 2023, 1:56 PM • Artificial Intelligence

Most of the content below comes from the official documentation. If you find any errors, please point them out.

Table of Contents

0. Brief Introduction to the Concepts
1. Classification Test
2. Regression Test
3. Multi-label Classification Test
4. Multi-output Regression Test

Brief Introduction to the Concepts

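This section first summarizes the concepts of multiclass, multilabel, and multioutput classification and regression.

The goal is to distinguish multiclass classification, multilabel classification, multiclass-multioutput classification, and multioutput regression. In sklearn, two modules handle all of these problems: sklearn.multiclass and sklearn.multioutput. Their relationship is laid out in the following tree diagram:

[Figure: tree diagram of the sklearn.multiclass and sklearn.multioutput modules]

The differences between these concepts are summarized as follows: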

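1. Multiclass classification

Multiclass classification is a classification task with more than two classes. Each sample can be labeled with only one class.

For example, consider classification using features extracted from a set of fruit images, where each image shows an orange, an apple, or a pear. Each image is one sample and is labeled as one of the 3 possible classes. Multiclass classification assumes that each sample is assigned to one and only one label; for instance, a sample cannot be both a pear and an apple.

A valid multiclass target y is a 1d or column vector containing more than two discrete values. An example of a vector y for 4 samples: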

>>> import numpy as np
>>> y = np.array(['apple', 'pear', 'apple', 'orange'])
>>> print(y)
['apple' 'pear' 'apple' 'orange']

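OneVsRestClassifier
The one-vs-rest strategy, also known as one-vs-all, is implemented in OneVsRestClassifier. The strategy consists of fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy and is a fair default choice.
OneVsOneClassifier
OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class that received the most votes is selected. In the event of a tie (between two classes with an equal number of votes), it selects the class with the highest aggregate classification confidence by summing the pair-wise classification confidence levels computed by the underlying binary classifiers.
Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest because of its O(n_classes^2) complexity. However, it can be advantageous for algorithms, such as kernel algorithms, that do not scale well with n_samples. This is because each individual learning problem involves only a small subset of the data, whereas with one-vs-the-rest the complete dataset is used n_classes times. The decision function is the result of a monotonic transformation of the one-vs-one classification.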
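To make the classifier counts concrete, here is a minimal sketch comparing the two strategies. The digits dataset and the SVC base estimator are assumptions chosen only for illustration; they are not part of the original text.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Illustrative dataset (assumption): handwritten digits, 10 classes.
X, y = load_digits(return_X_y=True)
n_classes = len(set(y))

# One-vs-rest: one binary classifier per class, each trained on the full dataset.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

# One-vs-one: one binary classifier per pair of classes,
# each trained only on the samples of those two classes.
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

print(n_classes)             # number of classes in the dataset
print(len(ovr.estimators_))  # n_classes fitted classifiers
print(len(ovo.estimators_))  # n_classes * (n_classes - 1) / 2 fitted classifiers
```

With 10 classes, one-vs-rest fits 10 classifiers while one-vs-one fits 45, but each of the 45 sees only two classes' worth of data.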

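2. Multilabel classification

Multilabel classification (closely related to multioutput classification) is a classification task that labels each sample with m labels from n_classes possible classes, where m can range from 0 to n_classes inclusive. This can be thought of as predicting properties of a sample that are not mutually exclusive. Formally, a binary output is assigned to each class, for every sample. Positive classes are indicated with 1 and negative classes with 0 or -1. It is thus comparable to running n_classes binary classification tasks, for example with MultiOutputClassifier. That approach treats each label independently, whereas multilabel classifiers may treat the multiple classes simultaneously, accounting for correlated behavior among them.
For example, consider predicting the topics relevant to a text document or video. The document or video may be about one of "religion", "politics", "finance", or "education", several of the topic classes, or all of the topic classes.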
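The "one binary task per label" view can be sketched with MultiOutputClassifier. The synthetic data and the LogisticRegression base estimator below are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data (assumption): 100 samples, 5 features, 4 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Label k is positive when feature k is positive, so labels are not mutually exclusive.
Y = (X[:, :4] > 0).astype(int)

# MultiOutputClassifier fits one independent binary classifier per label column.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

print(len(clf.estimators_))      # one fitted estimator per label
print(clf.predict(X[:3]).shape)  # one row per sample, one column per label
```

Because each column is fitted separately, this wrapper cannot exploit correlations between labels, which is the limitation the paragraph above points out.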

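A valid representation of a multilabel y is a dense or sparse binary matrix of shape (n_samples, n_classes). Each column represents a class. The 1's in each row denote the positive classes the sample has been labeled with. An example of a dense matrix y for 3 samples: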

>>> y = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]])
>>> print(y)
[[1 0 0 1]
 [0 0 1 1]
 [0 0 0 0]]
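As a small sketch of where such a matrix can come from, scikit-learn's MultiLabelBinarizer converts per-sample label sets into this indicator format. The topic names reuse the example above; the assignment of topics to the three samples is an assumption made to reproduce the same matrix.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Fix the column order explicitly so that every class gets a column,
# even one ("finance") that no sample is labeled with.
classes = ["education", "finance", "politics", "religion"]

# Hypothetical label sets for 3 samples; the last sample has no label at all.
label_sets = [{"education", "religion"}, {"politics", "religion"}, set()]

mlb = MultiLabelBinarizer(classes=classes)
y = mlb.fit_transform(label_sets)
print(y)
```

Passing classes explicitly matters here: without it, the binarizer would only create columns for labels that actually occur in the data.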

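3. Multiclass-multioutput classification

Multiclass-multioutput classification (also known as multitask classification) is a classification task that labels each sample with a set of non-binary properties. Both the number of properties and the number of classes per property are greater than 2. A single estimator thus handles several joint classification tasks. This is both a generalization of the multilabel classification task, which only considers binary attributes, and a generalization of the multiclass classification task, where only one property is considered.
For example, consider classifying the properties "type of fruit" and "colour" for a set of fruit images. The property "type of fruit" has the possible classes "apple", "pear", and "orange". The property "colour" has the possible classes "green", "red", "yellow", and "orange". Each sample is an image of a fruit, a label is output for both properties, and each label is one of the possible classes of the corresponding property.
Note that all classifiers handling multiclass-multioutput (also known as multitask classification) tasks support the multilabel classification task as a special case. Multitask classification is similar to the multioutput classification task with different model formulations.

A valid representation of a multioutput y is a dense matrix of class labels of shape (n_samples, n_classes), that is, a column-wise concatenation of 1d multiclass variables. An example of y for 3 samples: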

>>> y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])
>>> print(y)
[['apple' 'green']
 ['orange' 'orange']
 ['pear' 'green']]
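A minimal sketch of fitting one estimator on such a two-column target follows. The tiny feature matrix is invented for illustration, and DecisionTreeClassifier is used as one example of an estimator that accepts a 2D label matrix.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: six distinct 2D points, one per fruit image.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]])

# Two properties per sample: column 0 is the fruit type, column 1 is the colour.
Y = np.array([
    ["apple", "green"],
    ["apple", "red"],
    ["orange", "orange"],
    ["orange", "orange"],
    ["pear", "green"],
    ["pear", "yellow"],
])

# A single tree predicts both properties jointly.
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
pred = clf.predict(X)

print(pred.shape)    # one column per property
print(clf.classes_)  # one array of class labels per property
```

The fitted classes_ attribute is a list with one array per output column, which reflects the "column-wise concatenation of 1d multiclass variables" described above.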

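4. Multioutput regression

Multioutput regression predicts multiple numerical properties for each sample. Each property is a numerical variable, and the number of properties to be predicted for each sample is greater than or equal to 2. Some estimators that support multioutput regression are faster than just running n_output estimators.
For example, consider predicting both wind speed and wind direction (in degrees) from data obtained at a certain location. Each sample is the data obtained at one location, and both wind speed and wind direction are output for each sample.

A valid representation of a multioutput y is a dense matrix of floats of shape (n_samples, n_output), that is, a column-wise concatenation of continuous variables. An example of y for 3 samples: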

>>> y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])
>>> print(y)
[[ 31.4  94. ]
 [ 40.5 109. ]
 [ 25.   30. ]]
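The difference between native multioutput support and "just running n_output estimators" can be sketched as follows. The synthetic data and the choice of LinearRegression and GradientBoostingRegressor are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data (assumption): 50 samples, 3 features, 2 numeric targets
# standing in for quantities such as wind speed and wind direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = np.column_stack([X @ np.array([1.0, 2.0, 0.5]), X @ np.array([-1.0, 0.3, 2.0])])

# LinearRegression accepts a 2D target directly and fits both outputs in one call.
native = LinearRegression().fit(X, Y)

# GradientBoostingRegressor is single-output, so it is wrapped:
# MultiOutputRegressor fits one separate regressor per target column.
wrapped = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)

print(native.predict(X[:2]).shape)   # one column per target
print(len(wrapped.estimators_))      # one fitted regressor per target
```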

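Summary:
The material above should give a fairly detailed understanding of these concepts. In a nutshell: multiclass classification is the ordinary multi-class problem; multilabel classification performs one binary classification per label type for each sample; multiclass-multioutput classification performs one multiclass classification per label type. Multioutput regression is simpler: it just regresses several output values.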
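As a quick check of the four target formats, scikit-learn's type_of_target utility reports how a given y is interpreted. The sketch below simply feeds it the four example arrays used in this section.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The four example targets from the sections above.
y_multiclass = np.array(["apple", "pear", "apple", "orange"])
y_multilabel = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]])
y_multiclass_multioutput = np.array(
    [["apple", "green"], ["orange", "orange"], ["pear", "green"]]
)
y_multioutput_regression = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])

kinds = {
    "multiclass": type_of_target(y_multiclass),
    "multilabel": type_of_target(y_multilabel),
    "multiclass-multioutput": type_of_target(y_multiclass_multioutput),
    "multioutput regression": type_of_target(y_multioutput_regression),
}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Running this kind of check on your own y before fitting is a cheap way to confirm which of the four problem types an estimator will see.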

Classification Test

import numpy as np
import sklearn.metrics
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
from autosklearn.classification import AutoSklearnClassifier


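# Load the breast cancer dataset as (features, labels) arrays
X, y = datasets.load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples as a test set; fix the seed for reproducibility
SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)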

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((455, 30), (114, 30), (455,), (114,))

While fit is running, an out-of-memory error like the following may appear:

[ERROR] [2022-03-05 16:51:41,118:Client-AutoML(1):breast_cancer] Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 3072 MB).', 'configuration_origin': 'DUMMY'}.

If the code below produces this error, the run is out of memory, that is, not enough memory was allocated. Raise the limit by setting a larger value for memory_limit.

automl = AutoSklearnClassifier(
    ...
    # default: memory_limit=3072 (MB)
    memory_limit=6144,
    ...
)

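In addition, an error is also raised if the tmp folder already exists:

FileExistsError: [Errno 17] File exists: './autosklearn_classification_example_tmp'

Delete this folder before retraining. If tmp_folder is not set, the folder is created by default at /tmp/autosklearn_tmp_$pid_$random_number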
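One way to avoid the FileExistsError is to clear the folder before each run. The helper below is a hypothetical convenience function (its name is not part of auto-sklearn) that uses only the Python standard library.

```python
import shutil
from pathlib import Path


def reset_tmp_folder(path: str) -> None:
    """Delete a leftover tmp folder, if present, so a new run can create it again."""
    folder = Path(path)
    if folder.is_dir():
        # Remove the folder and everything inside it.
        shutil.rmtree(folder)


# Call this before constructing the classifier with the same tmp_folder.
reset_tmp_folder("./autosklearn_classification_example_tmp")
```

Note that this discards all results of the previous run stored in that folder, so only use it when those results are no longer needed.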


automl = AutoSklearnClassifier(
    time_left_for_this_task=180,  # total search budget, in seconds
    per_run_time_limit=30,        # time limit for a single model fit, in seconds
    memory_limit=8192,            # memory limit, in MB
    tmp_folder='./autosklearn_classification_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

AutoSklearnClassifier(memory_limit=8192, per_run_time_limit=30,
                      time_left_for_this_task=180,
                      tmp_folder='./autosklearn_classification_example_tmp')
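Once fit has finished, the held-out test split can be scored. The helper below is a minimal sketch: evaluate is a hypothetical function name, and it works with any fitted estimator that exposes predict(), so it does not depend on auto-sklearn itself.

```python
from sklearn.metrics import accuracy_score


def evaluate(model, X_test, y_test) -> float:
    """Return the test-set accuracy of a fitted estimator exposing predict()."""
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)
```

With the objects defined in this section, the call would be evaluate(automl, X_test, y_test), which scores the final ensemble on the 114 held-out samples.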


print(automl.leaderboard())

          rank  ensemble_weight                 type      cost  duration
model_id
54           1             0.08                  mlp  0.013245  0.925545
6            2             0.02                  mlp  0.019868  0.813444
4            3             0.04                  mlp  0.026490  1.093411
46           4             0.04                  sgd  0.026490  0.950709
7            5             0.02          extra_trees  0.033113  1.047053
10           6             0.04    gradient_boosting  0.033113  0.852145
21           7             0.12                  mlp  0.033113  1.647795
2            8             0.02        random_forest  0.046358  1.156928
53           9             0.04                  mlp  0.046358  0.866593
12          10             0.02    gradient_boosting  0.046358  1.059940
14          11             0.04                  mlp  0.046358  1.426609
15          12             0.04                  mlp  0.046358  2.378096
5           13             0.04        random_forest  0.052980  1.392029
40          14             0.06                  lda  0.052980  0.701233
33          15             0.02                  mlp  0.052980  1.394135
19          16             0.06          extra_trees  0.059603  2.198166
16          17             0.04        random_forest  0.059603  1.387480
8           18             0.02        random_forest  0.059603  1.359141
11          19             0.02        random_forest  0.066225  2.323715
9           20             0.02          extra_trees  0.066225  1.251749
57          21             0.02                  mlp  0.066225  0.806810
42          22             0.02  k_nearest_neighbors  0.079470  0.627965
30          23             0.02                  mlp  0.099338  1.350389
20          24             0.02   passive_aggressive  0.099338  0.538802
31          25             0.02                  mlp  0.112583  1.907936
38          26             0.02                  mlp  0.119205  0.795489
36          27             0.02                  mlp  0.125828  0.937948
51          28             0.06                  lda  0.139073  1.157996


pprint(automl.show_models(), indent=4)

{   2: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc91862250>,
           'cost': 0.04635761589403975,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc3839ac10>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc91862190>,
           'model_id': 2,
           'rank': 8,
           'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)},
    4: {   'balancing': Balancing(random_state=1, strategy='weighting'),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc38307850>,
           'cost': 0.026490066225165587,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdd0115b400>,
           'ensemble_weight': 0.04,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc383078b0>,
           'model_id': 4,
           'rank': 3,
           'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.00021148999718383549, beta_1=0.999,
              beta_2=0.9, hidden_layer_sizes=(113, 113, 113),
              learning_rate_init=0.0007452270241186694, max_iter=64,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    5: {   'balancing': Balancing(random_state=1, strategy='weighting'),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc910d8580>,
           'cost': 0.052980132450331174,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc91680be0>,
           'ensemble_weight': 0.04,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc912b3a60>,
           'model_id': 5,
           'rank': 13,
           'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=3, min_samples_leaf=2,
                       n_estimators=512, n_jobs=1, random_state=1,
                       warm_start=True)},
    6: {   'balancing': Balancing(random_state=1, strategy='weighting'),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc38913370>,
           'cost': 0.019867549668874163,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc386a3250>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc38913730>,
           'model_id': 6,
           'rank': 2,
           'sklearn_classifier': MLPClassifier(alpha=0.0017940473175767063, beta_1=0.999, beta_2=0.9,
              early_stopping=True, hidden_layer_sizes=(101, 101),
              learning_rate_init=0.0004684917334431039, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    7: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdd01155f10>,
           'cost': 0.0331125827814569,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc93e00c40>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc3853b160>,
           'model_id': 7,
           'rank': 5,
           'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
                     n_estimators=512, n_jobs=1, random_state=1,
                     warm_start=True)},
    8: {   'balancing': Balancing(random_state=1, strategy='weighting'),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90da5790>,
           'cost': 0.05960264900662249,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc91288760>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90e3b6d0>,
           'model_id': 8,
           'rank': 16,
           'sklearn_classifier': RandomForestClassifier(max_features=2, min_samples_leaf=2, n_estimators=512,
                       n_jobs=1, random_state=1, warm_start=True)},
    9: {   'balancing': Balancing(random_state=1, strategy='weighting'),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc907eb4c0>,
           'cost': 0.06622516556291391,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc90e51f10>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90898fd0>,
           'model_id': 9,
           'rank': 19,
           'sklearn_classifier': ExtraTreesClassifier(max_features=6, min_samples_split=10, n_estimators=512,
                     n_jobs=1, random_state=1, warm_start=True)},
    10: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc940396a0>,
            'cost': 0.0331125827814569,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc38692a00>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc940399d0>,
            'model_id': 10,
            'rank': 6,
            'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=True,
                               l2_regularization=0.005326508887463406,
                               learning_rate=0.060800813211425456, max_iter=512,
                               max_leaf_nodes=6, min_samples_leaf=5,
                               n_iter_no_change=5, random_state=1,
                               validation_fraction=None, warm_start=True)},
    11: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90697640>,
            'cost': 0.06622516556291391,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc90d61a00>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90697340>,
            'model_id': 11,
            'rank': 20,
            'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=23, min_samples_leaf=7,
                       n_estimators=512, n_jobs=1, random_state=1,
                       warm_start=True)},
    12: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc918473d0>,
            'cost': 0.04635761589403975,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc93bae550>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc91847250>,
            'model_id': 12,
            'rank': 9,
            'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=False,
                               l2_regularization=1.0647401999412075e-10,
                               learning_rate=0.08291320147381159, max_iter=512,
                               max_leaf_nodes=39, n_iter_no_change=0,
                               random_state=1, validation_fraction=None,
                               warm_start=True)},
    14: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc913928b0>,
            'cost': 0.04635761589403975,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc918d66a0>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc91665fa0>,
            'model_id': 14,
            'rank': 10,
            'sklearn_classifier': MLPClassifier(activation='tanh', alpha=2.5550223982458062e-06, beta_1=0.999,
              beta_2=0.9, hidden_layer_sizes=(54, 54, 54),
              learning_rate_init=0.00027271287919467994, max_iter=256,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    15: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc913921f0>,
            'cost': 0.04635761589403975,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc91862e50>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc91392070>,
            'model_id': 15,
            'rank': 11,
            'sklearn_classifier': MLPClassifier(alpha=4.2841884333778574e-06, beta_1=0.999, beta_2=0.9,
              hidden_layer_sizes=(263, 263, 263),
              learning_rate_init=0.0011804284312897009, max_iter=128,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    16: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90c58880>,
            'cost': 0.05960264900662249,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc910d8ee0>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90e12610>,
            'model_id': 16,
            'rank': 17,
            'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=3, n_estimators=512,
                       n_jobs=1, random_state=1, warm_start=True)},
    19: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90cf7a00>,
            'cost': 0.05960264900662249,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc90f01280>,
            'ensemble_weight': 0.06,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90cf7640>,
            'model_id': 19,
            'rank': 18,
            'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=448, min_samples_leaf=2,
                     min_samples_split=20, n_estimators=512, n_jobs=1,
                     random_state=1, warm_start=True)},
    20: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc9023e610>,
            'cost': 0.09933774834437081,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc905ab160>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc9023e490>,
            'model_id': 20,
            'rank': 23,
            'sklearn_classifier': PassiveAggressiveClassifier(C=0.14268277711454813, max_iter=32, random_state=1,
                            tol=0.0002600768160857831, warm_start=True)},
    21: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc93baefa0>,
            'cost': 0.0331125827814569,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc37a46af0>,
            'ensemble_weight': 0.12,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc93baeee0>,
            'model_id': 21,
            'rank': 7,
            'sklearn_classifier': MLPClassifier(alpha=0.02847755502162456, beta_1=0.999, beta_2=0.9,
              hidden_layer_sizes=(123, 123),
              learning_rate_init=0.000421568792103947, max_iter=256,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    30: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc901c8c40>,
            'cost': 0.09933774834437081,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc909841c0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc901c8940>,
            'model_id': 30,
            'rank': 24,
            'sklearn_classifier': MLPClassifier(activation='tanh', alpha=8.05325583028895e-05, beta_1=0.999,
              beta_2=0.9, hidden_layer_sizes=(140, 140),
              learning_rate_init=0.0005706565389402362, max_iter=128,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    31: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90163940>,
            'cost': 0.11258278145695366,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc902a40a0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90163820>,
            'model_id': 31,
            'rank': 25,
            'sklearn_classifier': MLPClassifier(alpha=0.0001363185819149026, beta_1=0.999, beta_2=0.9,
              hidden_layer_sizes=(139, 139, 139),
              learning_rate_init=0.00018009776276177523, max_iter=256,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    33: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc910d80a0>,
            'cost': 0.052980132450331174,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc91392d60>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc91158e20>,
            'model_id': 33,
            'rank': 14,
            'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.000807743464484268, beta_1=0.999,
              beta_2=0.9, hidden_layer_sizes=(139,),
              learning_rate_init=0.00021433050558430938, max_iter=256,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    36: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90072730>,
            'cost': 0.1258278145695364,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc901d3a60>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc900725b0>,
            'model_id': 36,
            'rank': 27,
            'sklearn_classifier': MLPClassifier(alpha=0.05657753566180125, beta_1=0.999, beta_2=0.9,
              early_stopping=True, hidden_layer_sizes=(150, 150, 150),
              learning_rate_init=0.0284552208272282, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    38: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc900dc5b0>,
            'cost': 0.11920529801324509,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc901c04c0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90169a30>,
            'model_id': 38,
            'rank': 26,
            'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.03530075517934556, beta_1=0.999,
              beta_2=0.9, early_stopping=True, hidden_layer_sizes=(151, 151),
              learning_rate_init=0.012624724152433505, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    40: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90fb6490>,
            'cost': 0.052980132450331174,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc912abfd0>,
            'ensemble_weight': 0.06,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc90fb6310>,
            'model_id': 40,
            'rank': 15,
            'sklearn_classifier': LinearDiscriminantAnalysis(tol=8.850809824093198e-05)},
    42: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90382400>,
            'cost': 0.07947019867549665,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc9096c6a0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc9036cdc0>,
            'model_id': 42,
            'rank': 22,
            'sklearn_classifier': KNeighborsClassifier(n_neighbors=27, p=1)},
    46: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc37a46f10>,
            'cost': 0.026490066225165587,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc3805a700>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc37a461c0>,
            'model_id': 46,
            'rank': 4,
            'sklearn_classifier': SGDClassifier(alpha=0.0028239629801064844, average=True,
              epsilon=0.01391093587699247, eta0=0.01, loss='modified_huber',
              max_iter=128, penalty='l1', random_state=1,
              tol=0.0005283535863021666, warm_start=True)},
    51: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc8ff7ca90>,
            'cost': 0.13907284768211925,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc90163fa0>,
            'ensemble_weight': 0.06,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc8ff7c9a0>,
            'model_id': 51,
            'rank': 28,
            'sklearn_classifier': LinearDiscriminantAnalysis(shrinkage=0.2362694848390572, solver='lsqr',
                           tol=4.087618610024571e-05)},
    53: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc912ab940>,
            'cost': 0.04635761589403975,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc916655e0>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc912ab850>,
            'model_id': 53,
            'rank': 12,
            'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.00011205455217546472, beta_1=0.999,
              beta_2=0.9, early_stopping=True, hidden_layer_sizes=(113, 113),
              learning_rate_init=0.0010157011622160305, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    54: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc93e00fa0>,
            'cost': 0.013245033112582738,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc384d8e50>,
            'ensemble_weight': 0.08,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc93e002e0>,
            'model_id': 54,
            'rank': 1,
            'sklearn_classifier': MLPClassifier(alpha=0.00016472833354638788, beta_1=0.999, beta_2=0.9,
              early_stopping=True, hidden_layer_sizes=(113, 113),
              learning_rate_init=0.0007607734350660931, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    57: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7fdc90327940>,
            'cost': 0.06622516556291391,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7fdc909420d0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7fdc904db340>,
            'model_id': 57,
            'rank': 21,
            'sklearn_classifier': MLPClassifier(alpha=0.0023369498985981963, beta_1=0.999, beta_2=0.9,
              early_stopping=True, hidden_layer_sizes=(103, 103),
              learning_rate_init=0.0004684917334431039, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}}
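The dictionary printed above is long, so a compact summary can be easier to read. The two helpers below are hypothetical (not part of auto-sklearn); they rely only on the 'rank' and 'sklearn_classifier' keys visible in the output above.

```python
from collections import Counter


def count_model_types(models: dict) -> Counter:
    """Count ensemble members by the class name of their fitted scikit-learn estimator."""
    return Counter(
        type(entry["sklearn_classifier"]).__name__ for entry in models.values()
    )


def ids_by_rank(models: dict) -> list:
    """Return the model ids ordered from best rank (1) to worst."""
    return [
        model_id
        for model_id, entry in sorted(models.items(), key=lambda kv: kv[1]["rank"])
    ]
```

For the run shown here, the calls would be count_model_types(automl.show_models()) and ids_by_rank(automl.show_models()).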
classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipe
line.components.classification.classifierchoice></autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice></autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice></autosklearn.pipeline.components.classification.classifierchoice>


predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score: 0.9824561403508771

correct = np.equal(predictions, y_test).sum()
totals = predictions.size
score = correct / totals
print("correct:{}, totals:{}, scores:{}".format(correct, totals, score))

correct:112, totals:114, scores:0.9824561403508771

The results show strong performance: only 2 of the 114 test samples were misclassified, for an accuracy of 98.2%.
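The manual check above can be reproduced without any library. The following is a minimal sketch, not the code used in the run: the `accuracy` helper and the toy label lists are made up for illustration, chosen only so that 112 of 114 predictions match.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (hypothetical): 114 samples, the last 2 predicted wrongly
y_true = [1] * 114
y_pred = [1] * 112 + [0] * 2

acc = accuracy(y_true, y_pred)
print("correct:{}, totals:{}, scores:{}".format(112, len(y_true), acc))
```

This is the same quantity that `sklearn.metrics.accuracy_score` reports for single-label targets, which is why the two printed scores above agree.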

Regression Test

import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
import autosklearn.regression


X, y = sklearn.datasets.load_diabetes(return_X_y=True)

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((353, 10), (89, 10), (353,), (89,))


automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=180,
    per_run_time_limit=30,
    memory_limit=8192,
    tmp_folder='./autosklearn_regression_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='diabetes')

AutoSklearnRegressor(memory_limit=8192, per_run_time_limit=30,
                     time_left_for_this_task=180,
                     tmp_folder='./autosklearn_regression_example_tmp')


print(automl.leaderboard())

          rank  ensemble_weight              type      cost   duration
model_id
59           1             0.52        libsvm_svr  0.496368   0.820438
62           2             0.14    ard_regression  0.503867   0.474854
34           3             0.04     liblinear_svr  0.506597   0.465134
5            4             0.04  gaussian_process  0.571439  11.650054
22           5             0.14        libsvm_svr  0.580072   0.481025
29           6             0.10  gaussian_process  0.596072   0.694429
36           7             0.02     liblinear_svr  0.680804   0.522072
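To make the `ensemble_weight` column concrete, here is a small sketch. It assumes the final prediction is a weighted average of the member models' predictions, which is the usual way a weighted ensemble of regressors is combined; the per-model predictions are made-up numbers, and `ensemble_predict` is a hypothetical helper, not part of auto-sklearn.

```python
# Ensemble weights copied from the leaderboard above (model_id -> weight)
weights = {59: 0.52, 62: 0.14, 34: 0.04, 5: 0.04, 22: 0.14, 29: 0.10, 36: 0.02}


def ensemble_predict(member_predictions, weights):
    """Weighted average of per-model predictions for one sample."""
    return sum(weights[m] * member_predictions[m] for m in weights)


# Hypothetical predictions of each member model for a single sample
member_predictions = {
    59: 150.0, 62: 160.0, 34: 155.0, 5: 140.0,
    22: 170.0, 29: 145.0, 36: 180.0,
}

total_weight = sum(weights.values())
blended = ensemble_predict(member_predictions, weights)
print("sum of weights:", round(total_weight, 6))
print("blended prediction:", round(blended, 6))
```

The weights in the leaderboard sum to 1, so the blended value always lies between the smallest and largest member prediction, and the model with weight 0.52 dominates the result.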


pprint(automl.show_models(), indent=4)

{   5: {   'cost': 0.5714392217171937,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f4355a10730>,
           'ensemble_weight': 0.04,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f4355ab9d90>,
           'model_id': 5,
           'rank': 4,
           'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f4355ab9f70>,
           'sklearn_regressor': GaussianProcessRegressor(alpha=0.283161627129086,
                         kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
                         n_restarts_optimizer=10, normalize_y=True,
                         random_state=1)},
    22: {   'cost': 0.5800720723074761,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f436cd95b80>,
            'ensemble_weight': 0.14,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f435597b130>,
            'model_id': 22,
            'rank': 5,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f435597b820>,
            'sklearn_regressor': SVR(C=1.4272136443763257, cache_size=5357.580729166667,
    coef0=0.2694141260648879, degree=2, epsilon=0.10000000000000006,
    gamma=0.05757315877344016, kernel='poly', shrinking=False,
    tol=0.0010000000000000002, verbose=0)},
    29: {   'cost': 0.596072394456454,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f428a455640>,
            'ensemble_weight': 0.1,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f428a40e310>,
            'model_id': 29,
            'rank': 6,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f428a40e790>,
            'sklearn_regressor': GaussianProcessRegressor(alpha=0.22788692419220857,
                         kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1]),
                         n_restarts_optimizer=10, normalize_y=True,
                         random_state=1)},
    34: {   'cost': 0.5065968734118893,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f428a1b7940>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f43558ff490>,
            'model_id': 34,
            'rank': 3,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f43558ffd30>,
            'sklearn_regressor': LinearSVR(C=25232.12061129609, dual=False, epsilon=0.002019395600869544,
          loss='squared_epsilon_insensitive', random_state=1,
          tol=0.009223250275815446)},
    36: {   'cost': 0.6808038917513319,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f43558b9d30>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f43559bc040>,
            'model_id': 36,
            'rank': 7,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f4355840070>,
            'sklearn_regressor': LinearSVR(C=113.58659319519185, dual=False, epsilon=0.953621220533319,
          loss='squared_epsilon_insensitive', random_state=1,
          tol=0.006172262678900209)},
    59: {   'cost': 0.496368425910942,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f428a3fe490>,
            'ensemble_weight': 0.52,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f428a3dcc10>,
            'model_id': 59,
            'rank': 1,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f428a3dcd00>,
            'sklearn_regressor': SVR(C=0.8411452049277826, cache_size=5349.908854166667,
    coef0=0.028890874524519994, epsilon=0.00577061356609876, gamma=0.1,
    kernel='sigmoid', tol=0.0006935969948540294, verbose=0)},
    62: {   'cost': 0.5038673768126611,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f4289fed160>,
            'ensemble_weight': 0.14,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f4355a10550>,
            'model_id': 62,
            'rank': 2,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f4355a10e20>,
            'sklearn_regressor': ARDRegression(alpha_1=0.0009920132163129295, alpha_2=1.7797740837908024e-05,
              copy_X=False, lambda_1=4.023304088550062e-09,
              lambda_2=3.759668315507968e-08,
              threshold_lambda=72842.75949581455, tol=0.0667287949732316)}}


train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))

Train R2 score: 0.5538993106240657
Test R2 score: 0.48404488514413047
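For readers unfamiliar with the metric, R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that standard definition in plain Python; the helper name `r2` and the toy values are illustrative and are not taken from the diabetes run.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets (hypothetical values)
y_true = [100.0, 150.0, 200.0, 250.0]

perfect = r2(y_true, y_true)                          # exact predictions
mean_only = r2(y_true, [175.0] * 4)                   # always predict the mean
rough = r2(y_true, [120.0, 140.0, 190.0, 230.0])      # imperfect predictions

print(perfect, mean_only, rough)
```

A score of 1 means a perfect fit and 0 means no better than predicting the mean, so the test score of about 0.48 above says the ensemble explains roughly half of the variance in the held-out targets.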


plt.scatter(train_predictions, y_train, label="Train samples", c='#d95f02')
plt.scatter(test_predictions, y_test, label="Test samples", c='#7570b3')
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()

plt.plot([30, 400], [30, 400], c='k', zorder=0)
plt.xlim([30, 400])
plt.ylim([30, 400])
plt.tight_layout()
plt.show()

Multi-label Classification Test

import numpy as np
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
from pprint import pprint
from sklearn.utils.multiclass import type_of_target
from sklearn.model_selection import train_test_split


X, y = sklearn.datasets.fetch_openml(data_id=40594, return_X_y=True, as_frame=False)

y[y == 'TRUE'] = 1
y[y == 'FALSE'] = 0
y = y.astype(int)

print(f"type_of_target={type_of_target(y)}")

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

type_of_target=multilabel-indicator

((1600, 243), (400, 243), (1600, 7), (400, 7))


automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,
    per_run_time_limit=30,
    initial_configurations_via_metalearning=0,
    memory_limit=8192,
    tmp_folder='./autosklearn_multi_classification_example_tmp',
)

automl.fit(X_train, y_train, dataset_name='reuters')

AutoSklearnClassifier(initial_configurations_via_metalearning=0,
                      memory_limit=8192, per_run_time_limit=30,
                      time_left_for_this_task=180,
                      tmp_folder='./autosklearn_multi_classification_example_tmp')


print(automl.leaderboard())

          rank  ensemble_weight                 type      cost  duration
model_id
31           1             0.34  k_nearest_neighbors  0.398509  3.560389
18           2             0.18          gaussian_nb  0.461616  0.523785
11           3             0.18          gaussian_nb  0.488684  0.543580
2            4             0.02        random_forest  0.489481  2.743526
9            5             0.08         bernoulli_nb  0.513874  3.612643
8            6             0.02                  mlp  0.515206  3.365863
23           7             0.04          gaussian_nb  0.540634  0.765986
25           8             0.04         bernoulli_nb  0.547112  0.543565
10           9             0.02       multinomial_nb  0.577200  1.838750
21          10             0.08          gaussian_nb  0.599070  0.575137


pprint(automl.show_models(), indent=4)

{   2: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4b9da60>,
           'cost': 0.48948102811225125,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9aeacd01c0>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4b9d0a0>,
           'model_id': 2,
           'rank': 4,
           'sklearn_classifier': RandomForestClassifier(max_features=15, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)},
    8: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4a200a0>,
           'cost': 0.5152055408242915,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4a00760>,
           'ensemble_weight': 0.02,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4a20280>,
           'model_id': 8,
           'rank': 6,
           'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0029548979739433792, beta_1=0.999,
              beta_2=0.9, hidden_layer_sizes=(31, 31),
              learning_rate_init=0.00022421940958541154, max_iter=256,
              n_iter_no_change=32, random_state=1, validation_fraction=0.0,
              verbose=0, warm_start=True)},
    9: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4a60f10>,
           'cost': 0.5138737715667154,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4ba8400>,
           'ensemble_weight': 0.08,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4a60340>,
           'model_id': 9,
           'rank': 5,
           'sklearn_classifier': OneVsRestClassifier(estimator=BernoulliNB(alpha=0.3379748507977488), n_jobs=1)},
    10: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4c3f160>,
            'cost': 0.5772002498640842,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4a95ee0>,
            'ensemble_weight': 0.02,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4a76400>,
            'model_id': 10,
            'rank': 9,
            'sklearn_classifier': OneVsRestClassifier(estimator=MultinomialNB(alpha=4.603485200325942), n_jobs=1)},
    11: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4a00c40>,
            'cost': 0.48868433214909013,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4b04580>,
            'ensemble_weight': 0.18,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4a00580>,
            'model_id': 11,
            'rank': 3,
            'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)},
    18: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9aeb297ca0>,
            'cost': 0.46161569155493276,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9ae8df3310>,
            'ensemble_weight': 0.18,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9aeb2972e0>,
            'model_id': 18,
            'rank': 2,
            'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)},
    21: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc440aeb0>,
            'cost': 0.5990703229537311,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc49eb190>,
            'ensemble_weight': 0.08,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4b785e0>,
            'model_id': 21,
            'rank': 10,
            'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)},
    23: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4b3f4f0>,
            'cost': 0.540634450557514,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4b82a90>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4b3f250>,
            'model_id': 23,
            'rank': 7,
            'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)},
    25: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9bc4bf13d0>,
            'cost': 0.5471123873198415,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9bc4a60e50>,
            'ensemble_weight': 0.04,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9bc4bf19a0>,
            'model_id': 25,
            'rank': 8,
            'sklearn_classifier': OneVsRestClassifier(estimator=BernoulliNB(alpha=0.1588461793645986), n_jobs=1)},
    31: {   'balancing': Balancing(random_state=1),
            'classifier': <autosklearn.pipeline.components.classification.classifierchoice object at 0x7f9ae8df3820>,
            'cost': 0.3985087938580333,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f9ae8df36d0>,
            'ensemble_weight': 0.34,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f9ae8df38b0>,
            'model_id': 31,
            'rank': 1,
            'sklearn_classifier': OneVsRestClassifier(estimator=KNeighborsClassifier(n_neighbors=1, p=1),
                    n_jobs=1)}}


print(automl.sprint_statistics())

auto-sklearn results:
  Dataset name: reuters
  Metric: f1_macro
  Best validation score: 0.601491
  Number of target algorithm runs: 32
  Number of successful target algorithm runs: 24
  Number of crashed target algorithm runs: 3
  Number of target algorithms that exceeded the time limit: 3
  Number of target algorithms that exceeded the memory limit: 2
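The `cost` column of the leaderboard and the "Best validation score" line are two views of the same number. The sketch below assumes that, for a metric whose best value is 1 (such as `f1_macro`), cost equals 1 minus the validation score; this assumption is consistent with the figures printed above. `cost_to_score` is a hypothetical helper.

```python
# Costs of the three best models, copied from the leaderboard above
costs = {31: 0.398509, 18: 0.461616, 11: 0.488684}


def cost_to_score(cost, optimum=1.0):
    """Convert a cost back to a score, assuming cost = optimum - score."""
    return optimum - cost


best_id = min(costs, key=costs.get)          # lowest cost is rank 1
best_score = cost_to_score(costs[best_id])
print(best_id, round(best_score, 6))
```

Under that assumption the lowest cost, 0.398509 for model 31, corresponds to a validation score of 0.601491, matching the statistics output.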


predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score 0.6025
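Note that for multi-label targets, `sklearn.metrics.accuracy_score` computes subset accuracy: a sample counts as correct only if every one of its labels is predicted correctly. The plain-Python sketch below contrasts this with a per-label accuracy; both helper functions and the predictions are made up for illustration, and the label matrix is the small example from the concepts section.

```python
def subset_accuracy(Y_true, Y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(Y_true, Y_pred) if list(t) == list(p))
    return exact / len(Y_true)


def per_label_accuracy(Y_true, Y_pred):
    """Fraction of individual label entries that are predicted correctly."""
    pairs = [(a, b) for t, p in zip(Y_true, Y_pred) for a, b in zip(t, p)]
    return sum(1 for a, b in pairs if a == b) / len(pairs)


Y_true = [[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
# Hypothetical predictions: one wrong entry in the second sample
Y_pred = [[1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0]]

subset = subset_accuracy(Y_true, Y_pred)
per_label = per_label_accuracy(Y_true, Y_pred)
print(subset, per_label)
```

A single wrong label makes the whole sample count as an error, so the 0.6025 reported above is a strict measure: about 60% of test samples had all 7 labels predicted correctly.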

Multi-output Regression Test

import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
import autosklearn.regression


X, y = sklearn.datasets.make_regression(
    n_samples=1000, n_features=10, n_informative=5, n_targets=3
)

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((800, 10), (200, 10), (800, 3), (200, 3))


automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=180,
    per_run_time_limit=30,
    memory_limit=8192,
    tmp_folder='./autosklearn_multi_regression_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='synthetic')

[WARNING] [2022-03-05 21:31:30,473:Client-AutoMLSMBO(1)::synthetic] Could not find meta-data directory /home/fs/anaconda3/envs/automl/lib/python3.9/site-packages/autosklearn/metalearning/files/r2_multioutput.regression_dense

AutoSklearnRegressor(memory_limit=8192, per_run_time_limit=30,
                     time_left_for_this_task=180,
                     tmp_folder='./autosklearn_regression_example_tmp')


print(automl.leaderboard())

          rank  ensemble_weight              type          cost  duration
model_id
22           1              1.0  gaussian_process  1.211008e-09  4.055037


pprint(automl.show_models(), indent=4)

{   22: {   'cost': 1.2110078495553012e-09,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.datapreprocessorchoice object at 0x7f62ad156d00>,
            'ensemble_weight': 1.0,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.featurepreprocessorchoice object at 0x7f62ad266250>,
            'model_id': 22,
            'rank': 1,
            'regressor': <autosklearn.pipeline.components.regression.regressorchoice object at 0x7f62ad1543d0>,
            'sklearn_regressor': GaussianProcessRegressor(alpha=1.4980082486136626e-11,
                         kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
                         n_restarts_optimizer=10, normalize_y=True,
                         random_state=1)}}


train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))

Train R2 score: 0.9999999996059499
Test R2 score: 0.9999999995397361

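This result looks almost too good: R2 is effectively 1.0 on both the training and test sets. The likely reason is that the dataset is synthetic. By default make_regression builds the targets as a noise-free linear combination of the features (noise=0.0), so the data lacks the irregularities of a real-world distribution and is far easier to fit.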
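The effect of noise-free synthetic targets can be illustrated with a deliberately simplified sketch: one feature, ordinary least squares, and hand-picked coefficients. It is not what `make_regression` or auto-sklearn does internally; all names and numbers here are assumptions made for the illustration.

```python
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_and_score(xs, ys):
    slope, intercept = fit_line(xs, ys)
    return r2(ys, [slope * x + intercept for x in xs])


rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
clean = [3.0 * x + 2.0 for x in xs]              # noise-free linear target
noisy = [y + rng.gauss(0.0, 1.0) for y in clean]  # same target plus noise

r2_clean = fit_and_score(xs, clean)
r2_noisy = fit_and_score(xs, noisy)
print("noise-free R2:", r2_clean)
print("noisy R2:", r2_noisy)
```

With a noise-free linear target the fit is essentially exact, while adding noise pulls R2 well below 1. Passing a non-zero `noise` argument to `make_regression` would be one way to make the benchmark above more realistic.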


print(automl.get_configuration_space(X_train, y_train))

Configuration space object:
  Hyperparameters:
    data_preprocessor:__choice__, Type: Categorical, Choices: {feature_type}, Default: feature_type
    data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Type: Categorical, Choices: {encoding, no_encoding, one_hot_encoding}, Default: one_hot_encoding
    data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Type: Categorical, Choices: {minority_coalescer, no_coalescense}, Default: minority_coalescer
    data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Type: UniformFloat, Range: [0.0001, 0.5], Default: 0.01, on log-scale
    data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Type: Categorical, Choices: {mean, median, most_frequent}, Default: mean
    data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Type: Categorical, Choices: {minmax, none, normalize, power_transformer, quantile_transformer, robust_scaler, standardize}, Default: standardize
    data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Type: UniformInteger, Range: [10, 2000], Default: 1000
    data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Type: Categorical, Choices: {uniform, normal}, Default: uniform
    data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_max, Type: UniformFloat, Range: [0.7, 0.999], Default: 0.75
    data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_min, Type: UniformFloat, Range: [0.001, 0.3], Default: 0.25
    feature_preprocessor:__choice__, Type: Categorical, Choices: {extra_trees_preproc_for_regression, fast_ica, feature_agglomeration, kernel_pca, kitchen_sinks, no_preprocessing, nystroem_sampler, pca, polynomial, random_trees_embedding}, Default: no_preprocessing
    feature_preprocessor:extra_trees_preproc_for_regression:bootstrap, Type: Categorical, Choices: {True, False}, Default: False
    feature_preprocessor:extra_trees_preproc_for_regression:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse
    feature_preprocessor:extra_trees_preproc_for_regression:max_depth, Type: Constant, Value: None
    feature_preprocessor:extra_trees_preproc_for_regression:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0
    feature_preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes, Type: Constant, Value: None
    feature_preprocessor:extra_trees_preproc_for_regression:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
    feature_preprocessor:extra_trees_preproc_for_regression:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
    feature_preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf, Type: Constant, Value: 0.0
    feature_preprocessor:extra_trees_preproc_for_regression:n_estimators, Type: Constant, Value: 100
    feature_preprocessor:fast_ica:algorithm, Type: Categorical, Choices: {parallel, deflation}, Default: parallel
    feature_preprocessor:fast_ica:fun, Type: Categorical, Choices: {logcosh, exp, cube}, Default: logcosh
    feature_preprocessor:fast_ica:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100
    feature_preprocessor:fast_ica:whiten, Type: Categorical, Choices: {False, True}, Default: False
    feature_preprocessor:feature_agglomeration:affinity, Type: Categorical, Choices: {euclidean, manhattan, cosine}, Default: euclidean
    feature_preprocessor:feature_agglomeration:linkage, Type: Categorical, Choices: {ward, complete, average}, Default: ward
    feature_preprocessor:feature_agglomeration:n_clusters, Type: UniformInteger, Range: [2, 400], Default: 25
    feature_preprocessor:feature_agglomeration:pooling_func, Type: Categorical, Choices: {mean, median, max}, Default: mean
    feature_preprocessor:kernel_pca:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0
    feature_preprocessor:kernel_pca:degree, Type: UniformInteger, Range: [2, 5], Default: 3
    feature_preprocessor:kernel_pca:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.01, on log-scale
    feature_preprocessor:kernel_pca:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf
    feature_preprocessor:kernel_pca:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100
    feature_preprocessor:kitchen_sinks:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 1.0, on log-scale
    feature_preprocessor:kitchen_sinks:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale
    feature_preprocessor:nystroem_sampler:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0
    feature_preprocessor:nystroem_sampler:degree, Type: UniformInteger, Range: [2, 5], Default: 3
    feature_preprocessor:nystroem_sampler:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.1, on log-scale
    feature_preprocessor:nystroem_sampler:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf
    feature_preprocessor:nystroem_sampler:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale
    feature_preprocessor:pca:keep_variance, Type: UniformFloat, Range: [0.5, 0.9999], Default: 0.9999
    feature_preprocessor:pca:whiten, Type: Categorical, Choices: {False, True}, Default: False
    feature_preprocessor:polynomial:degree, Type: UniformInteger, Range: [2, 3], Default: 2
    feature_preprocessor:polynomial:include_bias, Type: Categorical, Choices: {True, False}, Default: True
    feature_preprocessor:polynomial:interaction_only, Type: Categorical, Choices: {False, True}, Default: False
    feature_preprocessor:random_trees_embedding:bootstrap, Type: Categorical, Choices: {True, False}, Default: True
    feature_preprocessor:random_trees_embedding:max_depth, Type: UniformInteger, Range: [2, 10], Default: 5
    feature_preprocessor:random_trees_embedding:max_leaf_nodes, Type: Constant, Value: None
    feature_preprocessor:random_trees_embedding:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
    feature_preprocessor:random_trees_embedding:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
    feature_preprocessor:random_trees_embedding:min_weight_fraction_leaf, Type: Constant, Value: 1.0
    feature_preprocessor:random_trees_embedding:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 10
    regressor:__choice__, Type: Categorical, Choices: {decision_tree, extra_trees, gaussian_process, k_nearest_neighbors, random_forest}, Default: random_forest
    regressor:decision_tree:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse
    regressor:decision_tree:max_depth_factor, Type: UniformFloat, Range: [0.0, 2.0], Default: 0.5
    regressor:decision_tree:max_features, Type: Constant, Value: 1.0
    regressor:decision_tree:max_leaf_nodes, Type: Constant, Value: None
    regressor:decision_tree:min_impurity_decrease, Type: Constant, Value: 0.0
    regressor:decision_tree:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
    regressor:decision_tree:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
    regressor:decision_tree:min_weight_fraction_leaf, Type: Constant, Value: 0.0
    regressor:extra_trees:bootstrap, Type: Categorical, Choices: {True, False}, Default: False
    regressor:extra_trees:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse
    regressor:extra_trees:max_depth, Type: Constant, Value: None
    regressor:extra_trees:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0
    regressor:extra_trees:max_leaf_nodes, Type: Constant, Value: None
    regressor:extra_trees:min_impurity_decrease, Type: Constant, Value: 0.0
    regressor:extra_trees:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
    regressor:extra_trees:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
    regressor:extra_trees:min_weight_fraction_leaf, Type: Constant, Value: 0.0
    regressor:gaussian_process:alpha, Type: UniformFloat, Range: [1e-14, 1.0], Default: 1e-08, on log-scale
    regressor:gaussian_process:thetaL, Type: UniformFloat, Range: [1e-10, 0.001], Default: 1e-06, on log-scale
    regressor:gaussian_process:thetaU, Type: UniformFloat, Range: [1.0, 100000.0], Default: 100000.0, on log-scale
    regressor:k_nearest_neighbors:n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 1, on log-scale
    regressor:k_nearest_neighbors:p, Type: Categorical, Choices: {1, 2}, Default: 2
    regressor:k_nearest_neighbors:weights, Type: Categorical, Choices: {uniform, distance}, Default: uniform
    regressor:random_forest:bootstrap, Type: Categorical, Choices: {True, False}, Default: True
    regressor:random_forest:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse
    regressor:random_forest:max_depth, Type: Constant, Value: None
    regressor:random_forest:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0
    regressor:random_forest:max_leaf_nodes, Type: Constant, Value: None
    regressor:random_forest:min_impurity_decrease, Type: Constant, Value: 0.0
    regressor:random_forest:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
    regressor:random_forest:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
    regressor:random_forest:min_weight_fraction_leaf, Type: Constant, Value: 0.0
  Conditions:
    data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__ | data_preprocessor:__choice__ == 'feature_type'
    data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__ | data_preprocessor:__choice__ == 'feature_type'
    data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction | data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__ == 'minority_coalescer'
    data_preprocessor:feature_type:numerical_transformer:imputation:strategy | data_preprocessor:__choice__ == 'feature_type'
    data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ | data_preprocessor:__choice__ == 'feature_type'
    data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'quantile_transformer'
    data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'quantile_transformer'
    data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_max | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'robust_scaler'
    data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_min | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'robust_scaler'
    feature_preprocessor:extra_trees_preproc_for_regression:bootstrap | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:criterion | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:max_depth | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:max_features | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:min_samples_leaf | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:min_samples_split | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:extra_trees_preproc_for_regression:n_estimators | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression'
    feature_preprocessor:fast_ica:algorithm | feature_preprocessor:__choice__ == 'fast_ica'
    feature_preprocessor:fast_ica:fun | feature_preprocessor:__choice__ == 'fast_ica'
    feature_preprocessor:fast_ica:n_components | feature_preprocessor:fast_ica:whiten == 'True'
    feature_preprocessor:fast_ica:whiten | feature_preprocessor:__choice__ == 'fast_ica'
    feature_preprocessor:feature_agglomeration:affinity | feature_preprocessor:__choice__ == 'feature_agglomeration'
    feature_preprocessor:feature_agglomeration:linkage | feature_preprocessor:__choice__ == 'feature_agglomeration'
    feature_preprocessor:feature_agglomeration:n_clusters | feature_preprocessor:__choice__ == 'feature_agglomeration'
    feature_preprocessor:feature_agglomeration:pooling_func | feature_preprocessor:__choice__ == 'feature_agglomeration'
    feature_preprocessor:kernel_pca:coef0 | feature_preprocessor:kernel_pca:kernel in {'poly', 'sigmoid'}
    feature_preprocessor:kernel_pca:degree | feature_preprocessor:kernel_pca:kernel == 'poly'
    feature_preprocessor:kernel_pca:gamma | feature_preprocessor:kernel_pca:kernel in {'poly', 'rbf'}
    feature_preprocessor:kernel_pca:kernel | feature_preprocessor:__choice__ == 'kernel_pca'
    feature_preprocessor:kernel_pca:n_components | feature_preprocessor:__choice__ == 'kernel_pca'
    feature_preprocessor:kitchen_sinks:gamma | feature_preprocessor:__choice__ == 'kitchen_sinks'
    feature_preprocessor:kitchen_sinks:n_components | feature_preprocessor:__choice__ == 'kitchen_sinks'
    feature_preprocessor:nystroem_sampler:coef0 | feature_preprocessor:nystroem_sampler:kernel in {'poly', 'sigmoid'}
    feature_preprocessor:nystroem_sampler:degree | feature_preprocessor:nystroem_sampler:kernel == 'poly'
    feature_preprocessor:nystroem_sampler:gamma | feature_preprocessor:nystroem_sampler:kernel in {'poly', 'rbf', 'sigmoid'}
    feature_preprocessor:nystroem_sampler:kernel | feature_preprocessor:__choice__ == 'nystroem_sampler'
    feature_preprocessor:nystroem_sampler:n_components | feature_preprocessor:__choice__ == 'nystroem_sampler'
    feature_preprocessor:pca:keep_variance | feature_preprocessor:__choice__ == 'pca'
    feature_preprocessor:pca:whiten | feature_preprocessor:__choice__ == 'pca'
    feature_preprocessor:polynomial:degree | feature_preprocessor:__choice__ == 'polynomial'
    feature_preprocessor:polynomial:include_bias | feature_preprocessor:__choice__ == 'polynomial'
    feature_preprocessor:polynomial:interaction_only | feature_preprocessor:__choice__ == 'polynomial'
    feature_preprocessor:random_trees_embedding:bootstrap | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:max_depth | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:max_leaf_nodes | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:min_samples_leaf | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:min_samples_split | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:min_weight_fraction_leaf | feature_preprocessor:__choice__ == 'random_trees_embedding'
    feature_preprocessor:random_trees_embedding:n_estimators | feature_preprocessor:__choice__ == 'random_trees_embedding'
    regressor:decision_tree:criterion | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:max_depth_factor | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:max_features | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:max_leaf_nodes | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:min_impurity_decrease | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:min_samples_leaf | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:min_samples_split | regressor:__choice__ == 'decision_tree'
    regressor:decision_tree:min_weight_fraction_leaf | regressor:__choice__ == 'decision_tree'
    regressor:extra_trees:bootstrap | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:criterion | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:max_depth | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:max_features | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:max_leaf_nodes | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:min_impurity_decrease | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:min_samples_leaf | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:min_samples_split | regressor:__choice__ == 'extra_trees'
    regressor:extra_trees:min_weight_fraction_leaf | regressor:__choice__ == 'extra_trees'
    regressor:gaussian_process:alpha | regressor:__choice__ == 'gaussian_process'
    regressor:gaussian_process:thetaL | regressor:__choice__ == 'gaussian_process'
    regressor:gaussian_process:thetaU | regressor:__choice__ == 'gaussian_process'
    regressor:k_nearest_neighbors:n_neighbors | regressor:__choice__ == 'k_nearest_neighbors'
    regressor:k_nearest_neighbors:p | regressor:__choice__ == 'k_nearest_neighbors'
    regressor:k_nearest_neighbors:weights | regressor:__choice__ == 'k_nearest_neighbors'
    regressor:random_forest:bootstrap | regressor:__choice__ == 'random_forest'
    regressor:random_forest:criterion | regressor:__choice__ == 'random_forest'
    regressor:random_forest:max_depth | regressor:__choice__ == 'random_forest'
    regressor:random_forest:max_features | regressor:__choice__ == 'random_forest'
    regressor:random_forest:max_leaf_nodes | regressor:__choice__ == 'random_forest'
    regressor:random_forest:min_impurity_decrease | regressor:__choice__ == 'random_forest'
    regressor:random_forest:min_samples_leaf | regressor:__choice__ == 'random_forest'
    regressor:random_forest:min_samples_split | regressor:__choice__ == 'random_forest'
    regressor:random_forest:min_weight_fraction_leaf | regressor:__choice__ == 'random_forest'
  Forbidden Clauses:
    (Forbidden: feature_preprocessor:feature_agglomeration:affinity in {'cosine', 'manhattan'} && Forbidden: feature_preprocessor:feature_agglomeration:linkage == 'ward')
    (Forbidden: feature_preprocessor:__choice__ == 'random_trees_embedding' && Forbidden: regressor:__choice__ == 'gaussian_process')
    (Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
    (Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
    (Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
    (Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
    (Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
    (Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
    (Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
    (Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
    (Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
    (Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
    (Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
    (Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
    (Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
    (Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
    (Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
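The "Conditions" and "Forbidden Clauses" sections of the dump above can be read as two simple rules: a hyperparameter is only active when its parent holds one of the listed values, and certain combinations of values are never allowed. The following is a minimal plain-Python sketch of that reading. It does not use the ConfigSpace API; the names `CONDITIONS`, `is_active`, and `is_forbidden` are hypothetical helpers introduced for illustration, and only the two `fast_ica` conditions are copied from the dump.

```python
# Hypothetical helpers for illustration only; this is NOT the ConfigSpace API.

# child hyperparameter -> (parent hyperparameter, allowed parent values),
# copied from two lines of the "Conditions" section of the dump.
CONDITIONS = {
    "feature_preprocessor:fast_ica:whiten":
        ("feature_preprocessor:__choice__", {"fast_ica"}),
    "feature_preprocessor:fast_ica:n_components":
        ("feature_preprocessor:fast_ica:whiten", {"True"}),
}

# Values taken from the "Forbidden Clauses" section of the dump.
REGRESSORS = {"decision_tree", "extra_trees", "gaussian_process",
              "k_nearest_neighbors", "random_forest"}
KERNEL_PREPROCESSORS = {"kitchen_sinks", "kernel_pca", "nystroem_sampler"}


def is_active(name, config):
    """A hyperparameter is active if it has no condition, or if its parent
    is active and currently holds one of the allowed values."""
    if name not in CONDITIONS:
        return True
    parent, allowed = CONDITIONS[name]
    return is_active(parent, config) and config.get(parent) in allowed


def is_forbidden(config):
    """Return True if the configuration matches any forbidden clause."""
    reg = config.get("regressor:__choice__")
    pre = config.get("feature_preprocessor:__choice__")
    # Each listed regressor is forbidden with these three preprocessors.
    if reg in REGRESSORS and pre in KERNEL_PREPROCESSORS:
        return True
    if pre == "random_trees_embedding" and reg == "gaussian_process":
        return True
    affinity = config.get("feature_preprocessor:feature_agglomeration:affinity")
    linkage = config.get("feature_preprocessor:feature_agglomeration:linkage")
    return affinity in {"cosine", "manhattan"} and linkage == "ward"


config = {
    "regressor:__choice__": "random_forest",
    "feature_preprocessor:__choice__": "fast_ica",
    "feature_preprocessor:fast_ica:whiten": "False",
}
print(is_forbidden(config))
print(is_active("feature_preprocessor:fast_ica:n_components", config))
```

With `whiten` set to `"False"`, `n_components` is inactive, which matches the condition `feature_preprocessor:fast_ica:n_components | feature_preprocessor:fast_ica:whiten == 'True'` in the dump.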


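import matplotlib.pyplot as plt  # harmless if already imported above

# Predicted vs. true values for the training and test samples
plt.scatter(train_predictions, y_train, label="Train samples", c='#d95f02')
plt.scatter(test_predictions, y_test, label="Test samples", c='#7570b3')
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()

# Diagonal y = x: a perfect prediction lies exactly on this line
plt.plot([-500, 500], [-500, 500], c='k', zorder=0)

plt.tight_layout()
plt.show()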

The plot shows the final fit: the fit score is close to 100%, so almost all points lie on the diagonal line.
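To put a number on "almost all points lie on the line", one common choice is the coefficient of determination R². The sketch below assumes that R² is the fit score meant here, which the text does not state explicitly. It computes R² for a single output in plain Python; the function name `r2` and the sample values are made up for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (single output)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration, not results from the run above.
y_true = [-200.0, -50.0, 0.0, 120.0, 300.0]
y_pred = [-198.0, -52.0, 1.0, 119.0, 301.0]

score = r2(y_true, y_pred)
print(f"R2 = {score:.4f}")
```

A score of exactly 1 means every prediction equals its true value, so every point sits on the diagonal; for a multi-output target, the same quantity can be computed per output column and then averaged.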

References:

More auto-sklearn usage examples: https://automl.github.io/auto-sklearn/master/examples/index.html
auto-sklearn API documentation: https://automl.github.io/auto-sklearn/master/api.html
More on multiclass and multi-output algorithms: https://scikit-learn.org/stable/modules/multiclass.html

Original: https://blog.csdn.net/weixin_44751294/article/details/123302017
Author: Clichong
Title: AutoML | Usage examples of AutoSklearn on basic classification, regression, multi-output regression, and multi-label classification datasets
