API

ML-Ensemble estimators behave identically to Scikit-learn estimators, with one main difference: to properly instantiate an ensemble, at least one layer and, if applicable, a meta estimator must be added to it. Otherwise, there is no ensemble to estimate. The difference can be summarized as follows.

# sklearn API
estimator = Estimator()
estimator.fit(X, y)

# mlens API
ensemble = Ensemble().add(list_of_estimators).add_meta(estimator)
ensemble.fit(X, y)
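To make the add-then-fit pattern concrete, here is a minimal pure-Python sketch. It is not the ML-Ensemble implementation: `ToyEnsemble` and `MeanRegressor` are hypothetical names invented for illustration, and the toy fits every layer on the full training set, which may differ from how the library generates layer predictions. It only shows why `fit` needs at least one layer and how returning `self` from `add` and `add_meta` allows the chained call above.

```python
# Toy illustration only: ToyEnsemble and MeanRegressor are hypothetical
# names, not part of mlens or scikit-learn.


class MeanRegressor:
    """Predict a scaled mean of the training targets for every sample."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def fit(self, X, y):
        self.mean_ = self.scale * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ToyEnsemble:
    """Stack layers of estimators and an optional meta estimator."""

    def __init__(self):
        self.layers = []
        self.meta = None

    def add(self, estimators):
        self.layers.append(list(estimators))
        return self  # returning self enables method chaining

    def add_meta(self, estimator):
        self.meta = estimator
        return self

    def _pass_through(self, X, y=None):
        # Each layer's predictions become the next layer's input features.
        features = X
        for layer in self.layers:
            if y is not None:
                for est in layer:
                    est.fit(features, y)
            columns = [est.predict(features) for est in layer]
            features = [list(row) for row in zip(*columns)]
        return features

    def fit(self, X, y):
        if not self.layers:
            raise ValueError("add at least one layer before calling fit")
        features = self._pass_through(X, y)
        if self.meta is not None:
            self.meta.fit(features, y)
        return self

    def predict(self, X):
        features = self._pass_through(X)
        if self.meta is not None:
            return self.meta.predict(features)
        return features


X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0, 4.0]

ensemble = ToyEnsemble().add([MeanRegressor(0.5), MeanRegressor(2.0)]).add_meta(MeanRegressor())
ensemble.fit(X, y)
predictions = ensemble.predict(X)
```

Calling `ToyEnsemble().fit(X, y)` without any layer raises a `ValueError` in this sketch, mirroring the point that an ensemble with no layers has nothing to estimate.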

Ensemble estimators

SuperLearner([folds, shuffle, random_state, ...]) Super Learner class.
Subsemble([partitions, partition_estimator, ...]) Subsemble class.
BlendEnsemble([test_size, shuffle, ...]) Blend Ensemble class.
SequentialEnsemble([shuffle, random_state, ...]) Sequential Ensemble class.

Model Selection

Evaluator(scorer[, cv, shuffle, ...]) Model selection across several estimators and preprocessing pipelines.

Preprocessing

EnsembleTransformer([shuffle, random_state, ...]) Ensemble Transformer class.
Subset([subset]) Select a subset of features.

Visualization

corrmat(corr[, figsize, annotate, inflate, ...]) Function for generating color-coded correlation triangle.
clustered_corrmap(corr, cls[, ...]) Function for plotting a clustered correlation heatmap.
corr_X_y(X, y[, top, figsize, fontsize, ...]) Function for plotting input feature correlations with output.
pca_plot(X, estimator[, y, cmap, figsize, ...]) Function to plot a PCA analysis of 1, 2, or 3 dims.
pca_comp_plot(X[, y, figsize, title, ...]) Function for comparing PCA analysis.
exp_var_plot(X, estimator[, figsize, ...]) Function to plot the explained variance using PCA.

For developers

The following base classes are good starting points for building new ensembles. You may want to study the source code directly.

Indexers

IdTrain([size]) Container to identify training set.
BlendIndex([test_size, train_size, X, ...]) Indexer that generates two non-overlapping subsets of X.
FoldIndex([n_splits, X, raise_on_exception]) K-fold indexer whose test folds jointly cover the full size of X.
SubsetIndex([n_partitions, n_splits, X, ...]) Subsample index generator.
FullIndex([X]) Vacuous indexer to be used with final layers.
ClusteredSubsetIndex(estimator[, ...]) Clustered Subsample index generator.
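
The difference between a blend-style and a fold-style indexer can be sketched in plain Python. The functions `blend_split` and `fold_splits` below are hypothetical helpers written for this illustration; they are not the `BlendIndex` or `FoldIndex` classes, and details such as which end of the data serves as the test set are assumptions.

```python
# Illustrative helpers only: blend_split and fold_splits are hypothetical
# and do not reproduce the mlens indexer classes.


def blend_split(n_samples, test_size=0.5):
    """Return two non-overlapping index lists: (train, test)."""
    n_test = int(n_samples * test_size)
    n_train = n_samples - n_test
    if n_train < 1 or n_test < 1:
        raise ValueError("both subsets must contain at least one sample")
    # Assumption: the first block is used for training, the rest for testing.
    return list(range(n_train)), list(range(n_train, n_samples))


def fold_splits(n_samples, n_splits=2):
    """Yield (train, test) index lists whose test folds cover all samples."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        yield train, test
        start = stop


blend_train, blend_test = blend_split(10, test_size=0.4)
folds = list(fold_splits(10, n_splits=3))
```

In this sketch the blend split yields predictions only for the held-out block, whereas the union of the fold test sets spans every sample.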

Estimation routines

ParallelProcessing(caller) Parallel processing engine.
ParallelEvaluation(caller) Parallel cross-validation engine.
Stacker(job, layer) Stacked fit sub-process class.
Blender(job, layer) Blended fit sub-process class.
SubStacker(job, layer) Stacked subset fit sub-process class.
SingleRun(job, layer) Single run fit sub-process class.
Evaluation(evaluator) Evaluation engine.
BaseEstimator(layer) Base class for estimating a layer in parallel.