mlens.ensemble.sequential module
ML-ENSEMBLE
| author: | Sebastian Flennerhag |
|---|---|
| copyright: | 2017 |
| licence: | MIT |
Sequential Ensemble class. Fully integrable with Scikit-learn.
class mlens.ensemble.sequential.SequentialEnsemble(shuffle=False, random_state=None, scorer=None, raise_on_exception=True, array_check=2, verbose=False, n_jobs=-1, backend=None, layers=None)
Bases: mlens.ensemble.base.BaseEnsemble
Sequential Ensemble class.
The Sequential Ensemble class allows users to build ensembles with different classes of layers. The type of layer and its parameters are specified when added to the ensemble. See respective ensemble class for details on parameters.
See also
BlendEnsemble, Subsemble, SuperLearner
Parameters:
- shuffle (bool (default = False)) – whether to shuffle data before generating folds.
- random_state (int (default = None)) – random seed if shuffling inputs.
- scorer (object (default = None)) – scoring function. If a function is provided, base estimators will be scored on the training set assembled for fitting the meta estimator. Since those predictions are out-of-sample, the scores represent valid test scores. The scorer should be a function that accepts an array of true values and an array of predictions: score = f(y_true, y_pred).
- raise_on_exception (bool (default = True)) – whether to raise an error on soft exceptions or only issue warnings. Examples include lack of layers, bad inputs, and failed fit of an estimator in a layer. If set to False, warnings are issued instead and estimation continues unless the exception is fatal. Note that this can result in unexpected behavior unless the exception is anticipated.
- array_check (int (default = 2)) – level of strictness in checking input arrays.
  - array_check = 0 will not check X or y.
  - array_check = 1 will check X and y for inconsistencies and warn when the format looks suspicious, but retain the original format.
  - array_check = 2 will impose Scikit-learn array checks, which convert X and y to numpy arrays and raise an error if conversion fails.
- verbose (int or bool (default = False)) – level of verbosity.
  - verbose = 0 silent (same as verbose = False)
  - verbose = 1 messages at start and finish (same as verbose = True)
  - verbose = 2 messages for each layer
  If verbose >= 50, messages are printed to sys.stdout; otherwise they go to sys.stderr. For verbosity in the layers themselves, use fit_params.
- n_jobs (int (default = -1)) – number of CPU cores to use for fitting and prediction.
- backend (str or object (default = 'threading')) – backend infrastructure to use during call to mlens.externals.joblib.Parallel. See Joblib for further documentation. To change global backend, set mlens.config.BACKEND.
scores_
dict – if scorer was passed to the instance, scores_ contains a dictionary with cross-validated scores assembled during the fit call. The fold structure used for scoring is determined by folds.
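To illustrate the scorer contract described in the parameter list above, here is a minimal sketch of a function with the signature score = f(y_true, y_pred). The name rmse_scorer is hypothetical and not part of mlens. It is written in plain Python so it runs without extra dependencies, and it assumes both inputs are one-dimensional sequences of equal length.

```python
import math


def rmse_scorer(y_true, y_pred):
    """Root mean squared error with the signature f(y_true, y_pred).

    Hypothetical helper for illustration; not part of the mlens API.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score empty inputs")
    # Mean of the squared differences, then the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A function of this shape could then be supplied through the scorer argument shown in the class signature, for instance SequentialEnsemble(scorer=rmse_scorer).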
Examples
>>> from mlens.ensemble import SequentialEnsemble
>>> from mlens.metrics.metrics import rmse
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import Lasso
>>> from sklearn.svm import SVR
>>>
>>> X, y = load_boston(True)
>>>
>>> ensemble = SequentialEnsemble()
>>>
>>> # Add a subsemble with 10 partitions as first layer
>>> ensemble.add('subset', [SVR(), Lasso()], n_partitions=10, n_splits=10)
>>>
>>> # Add a super learner as second layer
>>> ensemble.add('stack', [SVR(), Lasso()], n_splits=20)
>>>
>>> # Specify a meta estimator
>>> ensemble.add_meta(SVR())
>>>
>>> ensemble.fit(X, y)
>>> preds = ensemble.predict(X)
>>> rmse(y, preds)
6.5628...
add(cls, estimators, preprocessing=None, **kwargs)
Add layer to ensemble.
For full set of optional arguments, see the ensemble API for the specified type.
Parameters:
- cls (str) – layer class. Accepted types are:
- ‘blend’ : blend ensemble
- ‘subset’ : subsemble
- ‘stack’ : super learner
- estimators (dict of lists or list or instance) – estimators constituting the layer. If preprocessing is None and the layer is meant to be the meta estimator, it is permissible to pass a single instantiated estimator. If preprocessing is None or a list, estimators should be a list. The list can contain estimator instances, named tuples of estimator instances, or a combination of both.
    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]
If different preprocessing pipelines are desired, a dictionary that maps estimators to preprocessing pipelines must be passed. The names of the estimator dictionary must correspond to the names of the preprocessing dictionary.
    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}
The lists for each dictionary entry can be any of option_1, option_2 and option_3.
- preprocessing (dict of lists or list, optional (default = None)) –
preprocessing pipelines for given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, named tuples of transformers, or a combination of both.
    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1), ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]
If different preprocessing pipelines are desired, a dictionary that maps case names to preprocessing pipelines must be passed. The names of the preprocessing dictionary must correspond to the names of the estimator dictionary.
    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}
The lists for each dictionary entry can be any of option_1, option_2 and option_3.
- **kwargs (optional) – optional keyword arguments to instantiate layer with. See respective ensemble for further details.
Returns: self – ensemble instance with layer instantiated.
Return type: instance
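The requirement that the preprocessing and estimator dictionaries share the same names can be checked before calling add. The sketch below is one way to do that under stated assumptions: normalize and check_cases are hypothetical helpers, not mlens functions, the Scaler and Model classes are stand-ins for real transformers and estimators, and the fallback of naming unnamed entries by lowercased class name is an assumption made for this example only.

```python
class Scaler:
    """Stand-in for a transformer (illustration only)."""


class Model:
    """Stand-in for an estimator (illustration only)."""


def normalize(items):
    """Turn a mixed list of instances and (name, instance) tuples into named pairs."""
    named = []
    for item in items:
        if isinstance(item, tuple):
            name, obj = item
        else:
            # Assumption for this sketch: unnamed entries get the lowercased class name.
            name, obj = type(item).__name__.lower(), item
        named.append((name, obj))
    return named


def check_cases(preprocessing, estimators):
    """Raise ValueError unless both dictionaries have exactly the same keys."""
    unmatched = set(preprocessing) ^ set(estimators)
    if unmatched:
        raise ValueError("unmatched case names: %s" % sorted(unmatched))
    # Pair each case's normalized pipeline with its normalized estimators.
    return {case: (normalize(preprocessing[case]), normalize(estimators[case]))
            for case in estimators}


preprocessing_cases = {"case-1": [Scaler()], "case-2": [("sc", Scaler())]}
estimators = {"case-1": [Model(), ("m2", Model())], "case-2": [Model()]}
cases = check_cases(preprocessing_cases, estimators)
```

Running a check like this first surfaces a mismatched case name as a plain ValueError before any layer is constructed.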