mlens.ensemble.base module
ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017
licence: MIT
Base classes for ensemble layer management.
class mlens.ensemble.base.BaseEnsemble(shuffle=False, random_state=None, scorer=None, raise_on_exception=True, verbose=False, n_jobs=-1, layers=None, array_check=2, backend=None)

Bases: mlens.externals.sklearn.base.BaseEstimator
BaseEnsemble class.
Core ensemble class providing the methods used to add ensemble layers and manipulate parameters.

fit(X, y=None)
Fit ensemble.
Parameters:
- X (array-like of shape = [n_samples, n_features]) – input matrix to be used for fitting.
- y (array-like of shape = [n_samples, ] or None (default = None)) – output vector to train estimators on.
Returns: self – class instance with fitted estimators.
Return type: instance

predict(X)
Predict with fitted ensemble.
Parameters: X (array-like, shape=[n_samples, n_features]) – input matrix to be used for prediction.
Returns: y_pred – predictions for provided input array.
Return type: array-like, shape=[n_samples, ]

predict_proba(X)
Predict class probabilities with fitted ensemble.
Compatibility method for Scikit-learn. This method checks that the final layer has proba=True, then calls the regular predict method.
Parameters: X (array-like, shape=[n_samples, n_features]) – input matrix to be used for prediction.
Returns: y_pred – predicted class membership probabilities for provided input array.
Return type: array-like, shape=[n_samples, n_classes]
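The guard described for predict_proba can be pictured with the sketch below. ToyLayer, ToyEnsemble, the placeholder predict output and the ValueError are all hypothetical stand-ins. They are not the mlens classes, and the exception mlens actually raises may differ. The sketch only shows the pattern: check the final layer's proba flag, then delegate to predict.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer; only the proba flag matters here."""

    def __init__(self, proba):
        self.proba = proba


class ToyEnsemble:
    """Hypothetical ensemble holding an ordered list of layers."""

    def __init__(self, layers):
        self.layers = layers

    def predict(self, X):
        # Placeholder prediction pass: two class probabilities per sample.
        return [[0.5, 0.5] for _ in X]

    def predict_proba(self, X):
        # Only the final layer decides whether outputs are probabilities.
        if not self.layers[-1].proba:
            raise ValueError("final layer must be built with proba=True")
        return self.predict(X)


ensemble = ToyEnsemble([ToyLayer(proba=False), ToyLayer(proba=True)])
print(ensemble.predict_proba([[1.0], [2.0]]))  # [[0.5, 0.5], [0.5, 0.5]]
```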

class mlens.ensemble.base.Layer(estimators, cls, indexer=None, preprocessing=None, proba=False, partitions=1, propagate_features=None, scorer=None, raise_on_exception=False, name=None, dtype=None, verbose=False, cls_kwargs=None)

Bases: mlens.externals.sklearn.base.BaseEstimator
Layer of preprocessing pipes and estimators.
Layer is an internal class that holds a layer and its associated data, including an estimation procedure. From the Scikit-learn API's point of view, it behaves as an estimator.
Parameters:
- estimators (dict of lists or list) – estimators constituting the layer. If preprocessing is None or a list, estimators should be a list. The list can contain estimator instances, (name, estimator) tuples, or a combination of both.

    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]

  If different preprocessing pipelines are desired, a dictionary that maps each preprocessing case name to a list of estimators must be passed. The keys of the estimator dictionary must correspond to the keys of the preprocessing dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}

  The list for each dictionary entry can follow any of option_1, option_2 and option_3.
- cls (str) – type of layer. Should be the name of an accepted estimator class.
- indexer (instance, optional) – Indexer instance to use. Defaults to the layer class indexer instantiated with default settings. Required arguments depend on the indexer. See mlens.base for details.
- preprocessing (dict of lists or list, optional (default = None)) – preprocessing pipelines for the given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, (name, transformer) tuples, or a combination of both.

    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1), ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]

  If different preprocessing pipelines are desired, a dictionary that maps each case name to a preprocessing pipeline must be passed. The keys of the preprocessing dictionary must correspond to the keys of the estimator dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}

  The list for each dictionary entry can follow any of option_1, option_2 and option_3.
- proba (bool (default = False)) – whether to call predict_proba on the estimators in the layer when predicting.
- partitions (int (default = 1)) – Number of subset-specific fits to generate from the learner library.
- propagate_features (list, optional) – Features to propagate from the input array to the output array. Carries input features to the output of the layer, useful for propagating original data through several stacked layers. Propagated features are stored in the left-most columns.
- raise_on_exception (bool (default = False)) – whether to raise an error on soft exceptions, else issue warning.
- verbose (int or bool (default = False)) – level of verbosity.
    verbose = 0: silent (same as verbose = False)
    verbose = 1: messages at start and finish (same as verbose = True)
    verbose = 2: messages for each layer
  If verbose >= 50, prints to sys.stdout, else sys.stderr. For verbosity in the layers themselves, use fit_params.
- dtype (numpy dtype class, default = numpy.float32) – dtype format of prediction array.
- cls_kwargs (dict or None) – optional arguments to pass to the layer type class.
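The estimators and preprocessing formats above can be normalized as in the sketch below. The functions name_estimators and map_cases are hypothetical helpers. The naming rule is an assumption made for illustration: unnamed instances get their lower-cased class name, and duplicates get a numeric suffix. mlens may name estimators differently. The sketch turns option_1, option_2 and option_3 lists into (name, estimator) pairs, and checks that the preprocessing and estimator dictionaries share the same case keys.

```python
from collections import OrderedDict


def name_estimators(estimators):
    """Turn an option_1/2/3 style list into a list of (name, estimator) pairs.

    Assumed naming rule (hypothetical): an unnamed instance gets its
    lower-cased class name, and any duplicate name gets a "-<n>" suffix.
    """
    named, used = [], set()
    for item in estimators:
        if isinstance(item, tuple):
            name, est = item
        else:
            est, name = item, type(item).__name__.lower()
        base, i = name, 2
        while name in used:
            name = "%s-%d" % (base, i)
            i += 1
        used.add(name)
        named.append((name, est))
    return named


def map_cases(preprocessing, estimators):
    """Pair each preprocessing case with its named estimators by dictionary key."""
    if set(preprocessing) != set(estimators):
        raise ValueError("preprocessing and estimator keys differ: %r vs %r"
                         % (sorted(preprocessing), sorted(estimators)))
    return OrderedDict(
        (case, (preprocessing[case], name_estimators(estimators[case])))
        for case in sorted(estimators)
    )


class DummyEstimator:  # stand-in for any estimator instance
    pass


est_1, est_2 = DummyEstimator(), DummyEstimator()
print([name for name, _ in name_estimators([est_1, est_2])])
# ['dummyestimator', 'dummyestimator-2']
```

With this rule, a name supplied in a tuple always wins, and generated names only need to be unique within one case.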

estimators_ (OrderedDict, list) – container for fitted estimators, possibly mapped to preprocessing cases and / or folds.
preprocessing_ (OrderedDict, list) – container for fitted preprocessing pipelines, possibly mapped to preprocessing cases and / or folds.

get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (boolean (default = True)) – If True, will return the layers separately as individual parameters. If False, will return the collapsed dictionary.
Returns: params – mapping of parameter names to their values.
Return type: dict
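The difference between deep=True and deep=False is sketched below using Scikit-learn's convention for nested parameters. With deep=True, each nested object's parameters are added under "<name>__<param>" keys alongside the object itself. flatten_params and ToyParamLayer are hypothetical and are not mlens functions. The sketch assumes only that nested objects expose a get_params(deep=True) method.

```python
def flatten_params(params, deep=True):
    """Expand nested get_params() results using the '<name>__<param>' convention.

    `params` maps parameter names to values. Values that expose get_params
    (such as nested layers or estimators) are expanded only when deep=True.
    """
    out = dict(params)
    if not deep:
        return out
    for key, value in params.items():
        if hasattr(value, "get_params"):
            for sub_key, sub_value in value.get_params(deep=True).items():
                out["%s__%s" % (key, sub_key)] = sub_value
    return out


class ToyParamLayer:
    """Hypothetical nested object with two parameters."""

    def __init__(self, proba=False, partitions=1):
        self.proba, self.partitions = proba, partitions

    def get_params(self, deep=True):
        return {"proba": self.proba, "partitions": self.partitions}


flat = flatten_params({"layer-1": ToyParamLayer(proba=True), "n_jobs": -1})
print(sorted(flat))  # ['layer-1', 'layer-1__partitions', 'layer-1__proba', 'n_jobs']
```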

class mlens.ensemble.base.LayerContainer(layers=None, n_jobs=-1, backend=None, raise_on_exception=False, verbose=False)

Bases: mlens.externals.sklearn.base.BaseEstimator
Container class for layers.
The LayerContainer class stores all layers as an ordered dictionary and possesses a get_params method, so that it appears as an estimator in the Scikit-learn API. This allows correct cloning and parameter updating.
Parameters:
- layers (OrderedDict, None (default = None)) – An ordered dictionary of Layer instances. To initiate a new LayerContainer instance, set layers = None.
- n_jobs (int (default = -1)) – Number of CPUs to use. Set n_jobs = -1 for all available CPUs, n_jobs = -2 for all available CPUs except one, etc.
- backend (str, (default="threading")) – the joblib backend to use (e.g. "multiprocessing" or "threading").
- raise_on_exception (bool (default = False)) – raise error on soft exceptions. Otherwise issue warning.
- verbose (int or bool (default = False)) – level of verbosity.
    verbose = 0: silent (same as verbose = False)
    verbose = 1: messages at start and finish (same as verbose = True)
    verbose = 2: messages for each layer
  If verbose >= 50, prints to sys.stdout, else sys.stderr. For verbosity in the layers themselves, use fit_params.
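The n_jobs and verbose settings above can be resolved as in the sketch below. It follows joblib's documented convention for negative n_jobs: n_cpus + 1 + n_jobs workers are used, never fewer than one. The helper names resolve_n_jobs and message_stream are hypothetical and are not part of mlens. The stream rule follows the text: sys.stdout when verbose >= 50, otherwise sys.stderr.

```python
import os
import sys


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Map an n_jobs setting to a worker count (joblib-style convention)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 uses all CPUs, -2 all but one, and so on; clip at one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


def message_stream(verbose):
    """Return sys.stdout if verbose >= 50, else sys.stderr (bools count as 0/1)."""
    return sys.stdout if int(verbose) >= 50 else sys.stderr


print(resolve_n_jobs(-2, n_cpus=8))  # 7
```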

add(estimators, cls, indexer=None, preprocessing=None, **kwargs)
Method for adding a layer.
Parameters:
- estimators (dict of lists or list) – estimators constituting the layer. If preprocessing is None or a list, estimators should be a list. The list can contain estimator instances, (name, estimator) tuples, or a combination of both.

    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]

  If different preprocessing pipelines are desired, a dictionary that maps each preprocessing case name to a list of estimators must be passed. The keys of the estimator dictionary must correspond to the keys of the preprocessing dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}

  The list for each dictionary entry can follow any of option_1, option_2 and option_3.
- cls (str) – Type of layer, as defined by the estimation class to instantiate when processing a layer. See mlens.ensemble for available classes.
- indexer (instance or None (default = None)) – Indexer instance to use. Defaults to the layer class indexer with default settings. See mlens.base for details.
- preprocessing (dict of lists or list, optional (default = None)) – preprocessing pipelines for the given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, (name, transformer) tuples, or a combination of both.

    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1), ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]

  If different preprocessing pipelines are desired, a dictionary that maps each case name to a preprocessing pipeline must be passed. The keys of the preprocessing dictionary must correspond to the keys of the estimator dictionary.

    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}

  The list for each dictionary entry can follow any of option_1, option_2 and option_3.
- **kwargs (optional) – keyword arguments to be passed on to the layer at instantiation.
Returns: self – if in_place = True, returns self with the layer instantiated.
Return type: instance, optional
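Storing layers in an ordered dictionary, as described for LayerContainer, can be sketched as below. ToyLayerContainer is hypothetical. The "layer-<n>" naming scheme and the dict used in place of a Layer instance are illustrative assumptions, and the cls values in the usage line are placeholders. The point is that insertion order gives the order in which layers are later fitted and predicted, and that returning self allows chained add() calls.

```python
from collections import OrderedDict


class ToyLayerContainer:
    """Hypothetical container keeping layers in insertion order."""

    def __init__(self, layers=None):
        # layers=None starts an empty container.
        self.layers = OrderedDict() if layers is None else layers

    def add(self, estimators, cls, indexer=None, preprocessing=None, **kwargs):
        # Assumed naming scheme: layer-1, layer-2, ...
        name = "layer-%d" % (len(self.layers) + 1)
        # A plain dict stands in for a Layer instance built from these arguments.
        self.layers[name] = dict(estimators=estimators, cls=cls, indexer=indexer,
                                 preprocessing=preprocessing, **kwargs)
        return self


lc = ToyLayerContainer().add(["est_a"], "cls-a").add(["est_b"], "cls-b", proba=True)
print(list(lc.layers))  # ['layer-1', 'layer-2']
```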
fit
(X=None, y=None, return_preds=None, **process_kwargs)[source]¶ Fit instance by calling
predict_proba
in the first layer.Similar to
fit
, but will call thepredict_proba
method on estimators. Thus, each then_test_samples * n_labels
prediction matrix of each estimator will be stacked and used as input in the subsequent layer.Parameters: - X (array-like of shape = [n_samples, n_features]) – input matrix to be used for fitting and predicting.
- y (array-like of shape = [n_samples, ]) – training labels.
- return_preds (bool) – whether to return final prediction array
- **process_kwargs (optional) – optional arguments to initialize processor with.
Returns: out (dict) – dictionary of output data (possibly empty) generated through fitting. Keys correspond to layer names and values to the output generated by calling the layer’s
fit_function
.out = {'layer-i-estimator-j': some_data, ... 'layer-s-estimator-q': some_data}
X (array-like, optional) – predictions from final layer’s
fit_proba
call.
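The stacking step described for fit can be sketched in plain Python as follows. Each estimator contributes an n_samples x n_labels probability matrix, and these matrices are concatenated column-wise. Any propagated input features, as described for the Layer propagate_features parameter, occupy the left-most columns. stack_predictions is a hypothetical helper, not an mlens function. The row width is len(propagate_features) + n_estimators * n_labels.

```python
def stack_predictions(X, estimator_probas, propagate_features=None):
    """Build the next layer's input from per-estimator probability matrices.

    X: input rows, n_samples x n_features (list of lists).
    estimator_probas: one n_samples x n_labels matrix per estimator.
    propagate_features: column indices of X to carry over, placed left-most.
    """
    n_samples = len(X)
    for P in estimator_probas:
        if len(P) != n_samples:
            raise ValueError("every prediction matrix needs n_samples rows")
    out = []
    for i in range(n_samples):
        row = [X[i][j] for j in (propagate_features or [])]
        for P in estimator_probas:
            row.extend(P[i])  # n_labels columns per estimator
        out.append(row)
    return out


X = [[1.0, 2.0], [3.0, 4.0]]
p_a = [[0.9, 0.1], [0.2, 0.8]]
p_b = [[0.6, 0.4], [0.3, 0.7]]
print(stack_predictions(X, [p_a, p_b], propagate_features=[0]))
# [[1.0, 0.9, 0.1, 0.6, 0.4], [3.0, 0.2, 0.8, 0.3, 0.7]]
```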

get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (boolean, optional) – If True, will return the layers separately as individual parameters. If False, will return the collapsed dictionary.
Returns: params – mapping of parameter names to their values.
Return type: dict

predict(X=None, *args, **kwargs)
Generic method for predicting through all layers in the container.
Parameters:
- X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
- *args (optional) – optional arguments.
- **kwargs (optional) – optional keyword arguments.
Returns: X_pred – predictions from final layer.
Return type: array-like of shape = [n_samples, n_fitted_estimators]
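Predicting "through all layers" amounts to a sequential pass, where each layer's output becomes the next layer's input. In the sketch below, predict_through is a hypothetical helper, and the two lambdas stand in for fitted layers. Only the final layer's output is returned.

```python
from collections import OrderedDict


def predict_through(layers, X):
    """Feed X through each layer in insertion order and return the final output.

    `layers` maps names to callables X -> predictions; the callables are
    hypothetical stand-ins for fitted layers.
    """
    for layer in layers.values():
        X = layer(X)
    return X


layers = OrderedDict([
    ("layer-1", lambda X: [[2 * v for v in row] for row in X]),
    ("layer-2", lambda X: [[sum(row)] for row in X]),
])
print(predict_through(layers, [[1, 2], [3, 4]]))  # [[6], [14]]
```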

transform(X=None, *args, **kwargs)
Generic method for reproducing predictions of the fit call.
Parameters:
- X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
- *args (optional) – optional arguments.
- **kwargs (optional) – optional keyword arguments.
Returns: X_pred – predictions from fit call to final layer.
Return type: array-like of shape = [n_test_samples, n_fitted_estimators]

mlens.ensemble.base.print_job(lc, start_message)
Print job details.
Parameters:
- lc (LayerContainer) – The LayerContainer instance running the job.
- start_message (str) – Initial message.