mlens.ensemble.base module

ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017
licence: MIT

Base classes for ensemble layer management.

class mlens.ensemble.base.BaseEnsemble(shuffle=False, random_state=None, scorer=None, raise_on_exception=True, verbose=False, n_jobs=-1, layers=None, array_check=2, backend=None)

Bases: mlens.externals.sklearn.base.BaseEstimator

BaseEnsemble class.

Core ensemble class methods used to add ensemble layers and manipulate parameters.

fit(X, y=None)

Fit ensemble.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for fitting.
  • y (array-like of shape = [n_samples, ] or None (default = None)) – output vector to train estimators on.
Returns:

self – class instance with fitted estimators.

Return type:

instance

predict(X)

Predict with fitted ensemble.

Parameters: X (array-like, shape=[n_samples, n_features]) – input matrix to be used for prediction.
Returns: y_pred – predictions for provided input array.
Return type: array-like, shape=[n_samples, ]

predict_proba(X)

Predict class probabilities with fitted ensemble.

Compatibility method for Scikit-learn. This method checks that the final layer has proba=True, then calls the regular predict method.

Parameters: X (array-like, shape=[n_samples, n_features]) – input matrix to be used for prediction.
Returns: y_pred – predicted class membership probabilities for provided input array.
Return type: array-like, shape=[n_samples, n_classes]
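
The check described above can be sketched in a few lines. This is a minimal illustration, not the mlens implementation: ToyLayer and ToyEnsemble are hypothetical stand-ins, the placeholder predict returns fixed probabilities, and raising ValueError when the final layer has proba=False is an assumed behavior.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer; only the proba flag matters here."""

    def __init__(self, proba):
        self.proba = proba


class ToyEnsemble:
    """Hypothetical stand-in for an ensemble holding an ordered list of layers."""

    def __init__(self, layers):
        self.layers = layers

    def predict(self, X):
        # Placeholder prediction: one row of two class probabilities per sample.
        return [[0.5, 0.5] for _ in X]

    def predict_proba(self, X):
        # Probabilities only make sense if the final layer calls
        # predict_proba on its estimators (proba=True).
        if not self.layers[-1].proba:
            raise ValueError("predict_proba requires proba=True in the final layer")
        # Otherwise, defer to the regular prediction path.
        return self.predict(X)
```

Because the method only validates the configuration and then delegates, any ensemble whose final layer was built with proba=True returns the same array from predict and predict_proba.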

set_verbosity(verbose)

Adjust the level of verbosity.

class mlens.ensemble.base.Layer(estimators, cls, indexer=None, preprocessing=None, proba=False, partitions=1, propagate_features=None, scorer=None, raise_on_exception=False, name=None, dtype=None, verbose=False, cls_kwargs=None)

Bases: mlens.externals.sklearn.base.BaseEstimator

Layer of preprocessing pipes and estimators.

Layer is an internal class that holds a layer and its associated data, including an estimation procedure. From the Scikit-learn API point of view, it behaves as an estimator.

Parameters:
  • estimators (dict of lists or list) –

    estimators constituting the layer. If preprocessing is None or a list, estimators should be a list. The list can contain estimator instances, named tuples of estimator instances, or a combination of both.

    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]
    

    If different preprocessing pipelines are desired, a dictionary that maps preprocessing case names to lists of estimators must be passed. The keys of the estimator dictionary must correspond to the keys of the preprocessing dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2],
                           "case-2": [alt_trans_1, alt_trans_2]}
    
    estimators = {"case-1": [est_a, est_b],
                  "case-2": [est_c, est_d]}
    

    The lists for each dictionary entry can be any of option_1, option_2 and option_3.

  • cls (str) – type of layer. Should be the name of an accepted estimator class.
  • indexer (instance, optional) – Indexer instance to use. Defaults to the layer class indexer instantiated with default settings. Required arguments depend on the indexer. See mlens.base for details.
  • preprocessing (dict of lists or list, optional (default = None)) –

    preprocessing pipelines for given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, named tuples of transformers, or a combination of both.

    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1),
                ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]
    

    If different preprocessing pipelines are desired, a dictionary that maps case names to preprocessing pipelines must be passed. The keys of the preprocessing dictionary must correspond to the keys of the estimator dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2],
                           "case-2": [alt_trans_1, alt_trans_2]}
    
    estimators = {"case-1": [est_a, est_b],
                  "case-2": [est_c, est_d]}
    

    The lists for each dictionary entry can be any of option_1, option_2 and option_3.

  • proba (bool (default = False)) – whether to call predict_proba on the estimators in the layer when predicting.
  • partitions (int (default = 1)) – Number of subset-specific fits to generate from the learner library.
  • propagate_features (list, optional) – Features to propagate from the input array to the output array. Carries input features to the output of the layer, useful for propagating original data through several stacked layers. Propagated features are stored in the left-most columns.
  • raise_on_exception (bool (default = False)) – whether to raise an error on soft exceptions; otherwise, issue a warning.
  • verbose (int or bool (default = False)) –

    level of verbosity.

    • verbose = 0 silent (same as verbose = False)
    • verbose = 1 messages at start and finish (same as verbose = True)
    • verbose = 2 messages for each layer

    If verbose >= 50 prints to sys.stdout, else sys.stderr. For verbosity in the layers themselves, use fit_params.

  • dtype (numpy dtype class, default = numpy.float32) – dtype format of prediction array.
  • cls_kwargs (dict or None) – optional arguments to pass to the layer type class.
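
The accepted estimators and preprocessing formats can be illustrated with a small normalization sketch. name_estimators and check_cases are hypothetical helpers, not mlens functions. For illustration only, the sketch assumes that unnamed instances are named after their lowercased class name (with a numeric suffix for repeats) and that a single shared pipeline is stored under a hypothetical "default" case; the naming scheme mlens actually uses may differ.

```python
from collections import OrderedDict


def name_estimators(items):
    """Normalize option_1/2/3 style lists into (name, instance) pairs.

    Hypothetical helper: unnamed instances are named after their lowercased
    class name; repeated names get a numeric suffix ("ridge-2", ...).
    """
    named, seen = [], set()
    for item in items:
        if isinstance(item, tuple):
            name, obj = item  # option_2 style: explicit name
        else:
            obj = item  # option_1 style: derive a name from the class
            name = type(obj).__name__.lower()
        candidate, i = name, 2
        while candidate in seen:
            candidate = "%s-%d" % (name, i)
            i += 1
        seen.add(candidate)
        named.append((candidate, obj))
    return named


def check_cases(preprocessing, estimators):
    """Pair preprocessing cases with their estimators (hypothetical helper)."""
    if isinstance(estimators, dict) or isinstance(preprocessing, dict):
        # Case-specific pipelines: both arguments must be dicts with equal keys.
        if not (isinstance(estimators, dict) and isinstance(preprocessing, dict)):
            raise ValueError("estimators and preprocessing must both be dicts")
        if set(estimators) != set(preprocessing):
            raise ValueError("case names differ between estimators and preprocessing")
        return OrderedDict(
            (case, (name_estimators(preprocessing[case]),
                    name_estimators(estimators[case])))
            for case in estimators
        )
    # A plain list of estimators shares one (possibly empty) pipeline.
    return OrderedDict(
        [("default", (name_estimators(preprocessing or []),
                      name_estimators(estimators)))]
    )
```

Validating the keys up front means a typo in a case name fails immediately rather than silently leaving a pipeline without estimators.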
estimators_

OrderedDict, list – container for fitted estimators, possibly mapped to preprocessing cases and / or folds.

preprocessing_

OrderedDict, list – container for fitted preprocessing pipelines, possibly mapped to preprocessing cases and / or folds.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean (default = True)) – If True, will return the layers separately as individual parameters. If False, will return the collapsed dictionary.
Returns: params – mapping of parameter names to their values.
Return type: dict

class mlens.ensemble.base.LayerContainer(layers=None, n_jobs=-1, backend=None, raise_on_exception=False, verbose=False)

Bases: mlens.externals.sklearn.base.BaseEstimator

Container class for layers.

The LayerContainer class stores all layers in an ordered dictionary and has a get_params method so that it appears as an estimator in the Scikit-learn API. This allows correct cloning and parameter updating.

Parameters:
  • layers (OrderedDict, None (default = None)) – An ordered dictionary of Layer instances. To initialize an empty LayerContainer, set layers = None.
  • n_jobs (int (default = -1)) – Number of CPUs to use. Set n_jobs = -1 for all available CPUs, n_jobs = -2 for all available CPUs except one, etc.
  • backend (str, (default="threading")) – the joblib backend to use (i.e. “multiprocessing” or “threading”).
  • raise_on_exception (bool (default = False)) – whether to raise an error on soft exceptions; otherwise, issue a warning.
  • verbose (int or bool (default = False)) –

    level of verbosity.

    • verbose = 0 silent (same as verbose = False)
    • verbose = 1 messages at start and finish (same as verbose = True)
    • verbose = 2 messages for each layer

    If verbose >= 50 prints to sys.stdout, else sys.stderr. For verbosity in the layers themselves, use fit_params.
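
The n_jobs and verbose rules above can be expressed as two small functions. This is a hypothetical sketch rather than mlens code: resolve_n_jobs follows joblib's convention for negative values (n_cpus + 1 + n_jobs workers), the clamp to at least one worker is an assumption, and verbosity_stream simply encodes the stated stdout/stderr threshold.

```python
import os
import sys


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Map an n_jobs setting to a worker count (joblib-style convention).

    Negative values count back from the number of CPUs:
    -1 -> all CPUs, -2 -> all but one, and so on (clamped to at least 1).
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


def verbosity_stream(verbose):
    """Pick the output stream: verbose >= 50 -> stdout, else stderr."""
    return sys.stdout if int(verbose) >= 50 else sys.stderr
```

For example, on an 8-CPU machine resolve_n_jobs(-2, n_cpus=8) gives 7 workers, matching "all available CPUs except one".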

add(estimators, cls, indexer=None, preprocessing=None, **kwargs)

Method for adding a layer.

Parameters:
  • estimators (dict of lists or list) –

    estimators constituting the layer. If preprocessing is None or a list, estimators should be a list. The list can contain estimator instances, named tuples of estimator instances, or a combination of both.

    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]
    

    If different preprocessing pipelines are desired, a dictionary that maps preprocessing case names to lists of estimators must be passed. The keys of the estimator dictionary must correspond to the keys of the preprocessing dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2],
                           "case-2": [alt_trans_1, alt_trans_2]}
    
    estimators = {"case-1": [est_a, est_b],
                  "case-2": [est_c, est_d]}
    

    The lists for each dictionary entry can be any of option_1, option_2 and option_3.

  • cls (str) – Type of layer, as defined by the estimation class to instantiate when processing a layer. See mlens.ensemble for available classes.
  • indexer (instance or None (default = None)) – Indexer instance to use. Defaults to the layer class indexer with default settings. See mlens.base for details.
  • preprocessing (dict of lists or list, optional (default = None)) –

    preprocessing pipelines for given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, named tuples of transformers, or a combination of both.

    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1),
                ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]
    

    If different preprocessing pipelines are desired, a dictionary that maps case names to preprocessing pipelines must be passed. The keys of the preprocessing dictionary must correspond to the keys of the estimator dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2],
                           "case-2": [alt_trans_1, alt_trans_2]}
    
    estimators = {"case-1": [est_a, est_b],
                  "case-2": [est_c, est_d]}
    

    The lists for each dictionary entry can be any of option_1, option_2 and option_3.

  • **kwargs (optional) – keyword arguments to be passed on to the layer at instantiation.
Returns:

self – the instance, with the new layer instantiated.

Return type:

instance, optional
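
The container behavior described above (layers held in insertion order, layers = None meaning "start empty", extra keyword arguments forwarded to the layer) can be sketched with an OrderedDict. ToyLayerContainer is hypothetical, layers are plain dicts instead of Layer instances, and the "layer-1", "layer-2", ... naming scheme and the chaining via return self are assumptions for illustration.

```python
from collections import OrderedDict


class ToyLayerContainer:
    """Hypothetical sketch: layers kept in insertion order, add() returns self."""

    def __init__(self, layers=None):
        self.layers = layers

    def add(self, estimators, cls, preprocessing=None, **kwargs):
        if self.layers is None:
            self.layers = OrderedDict()  # layers=None means "start empty"
        # Assumed naming scheme: layer-1, layer-2, ...
        name = "layer-%d" % (len(self.layers) + 1)
        self.layers[name] = {
            "estimators": estimators,
            "cls": cls,
            "preprocessing": preprocessing,
            **kwargs,  # forwarded to the layer at instantiation
        }
        return self  # returning self allows chained calls
```

With this design, ToyLayerContainer().add(["est_a"], "cls-1").add(["meta"], "cls-2") builds a two-layer container in one expression; "cls-1" and "cls-2" are placeholder class names.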

fit(X=None, y=None, return_preds=None, **process_kwargs)

Fit instance by calling predict_proba in the first layer.

Similar to fit, but calls the predict_proba method on the estimators. Thus, the n_test_samples * n_labels prediction matrix of each estimator is stacked and used as input to the subsequent layer.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for fitting and predicting.
  • y (array-like of shape = [n_samples, ]) – training labels.
  • return_preds (bool) – whether to return the final prediction array.
  • **process_kwargs (optional) – optional arguments to initialize processor with.
Returns:

  • out (dict) – dictionary of output data (possibly empty) generated through fitting. Keys correspond to layer names and values to the output generated by calling the layer’s fit_function.

    out = {'layer-i-estimator-j': some_data,
           ...
           'layer-s-estimator-q': some_data}
    
  • X (array-like, optional) – predictions from final layer’s fit_proba call.
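
The data flow through the container can be sketched with plain lists: each layer's output matrix becomes the next layer's input, each estimator contributes one column (or n_labels columns when proba=True), and propagated input features occupy the left-most columns, as described for Layer. layer_output and run_layers are hypothetical helpers; estimators are plain callables standing in for fitted estimators, so no training or cross-validated prediction takes place.

```python
def layer_output(X, estimators, proba=False, propagate_features=None):
    """Build one layer's output matrix (hypothetical sketch).

    Each estimator contributes one column (predict) or n_labels columns
    (predict_proba); propagated input columns go on the far left.
    """
    rows = []
    for x in X:
        row = [x[j] for j in (propagate_features or [])]
        for est in estimators:
            pred = est(x)  # estimators are plain callables here
            row.extend(pred if proba else [pred])
        rows.append(row)
    return rows


def run_layers(X, layers):
    """Feed each layer's output to the next; collect outputs by layer name."""
    out = {}
    for name, spec in layers:
        X = layer_output(X, **spec)
        out[name] = X
    return out, X
```

For instance, a first layer with estimators [sum, max] and propagate_features=[0] maps the row [1.0, 2.0] to [1.0, 3.0, 2.0]: the propagated feature first, then one column per estimator.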

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the layers separately as individual parameters. If False, will return the collapsed dictionary.
Returns: params – mapping of parameter names to their values.
Return type: dict
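
One way to picture the deep/shallow distinction is scikit-learn's nested-parameter convention, in which a component's settings are exposed as <component>__<parameter>. The sketch below is a hypothetical, stdlib-only illustration: it assumes layers are represented as plain dicts of parameters and that the container follows this naming, which is not confirmed here.

```python
def get_params(own_params, layers, deep=True):
    """Hypothetical sketch of shallow vs. deep parameter listing.

    own_params: the container's own constructor arguments.
    layers: mapping of layer name -> dict of that layer's parameters.
    """
    params = dict(own_params)
    params["layers"] = layers  # shallow view: one collapsed entry
    if deep:
        for layer_name, layer_params in layers.items():
            # Expose each layer as its own parameter ...
            params[layer_name] = layer_params
            # ... and its settings with the scikit-learn style prefix.
            for key, value in layer_params.items():
                params["%s__%s" % (layer_name, key)] = value
    return params
```

Flattened keys such as "layer-1__proba" are what lets cloning and set_params-style updates address a single layer setting without rebuilding the whole container.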

predict(X=None, *args, **kwargs)

Generic method for predicting through all layers in the container.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
  • *args (optional) – optional arguments.
  • **kwargs (optional) – optional keyword arguments.
Returns:

X_pred – predictions from final layer.

Return type:

array-like of shape = [n_samples, n_fitted_estimators]

transform(X=None, *args, **kwargs)

Generic method for reproducing predictions of the fit call.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
  • *args (optional) – optional arguments.
  • **kwargs (optional) – optional keyword arguments.
Returns:

X_pred – predictions from fit call to final layer.

Return type:

array-like of shape = [n_test_samples, n_fitted_estimators]

mlens.ensemble.base.print_job(lc, start_message)

Print job details.

Parameters:
  • lc (LayerContainer) – The LayerContainer instance running the job.
  • start_message (str) – Initial message.