mlens.utils.dummy module

ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017
license: MIT

Collection of dummy estimator classes and mixins for building transparent layers for unit testing.

Also contains pre-made Layers, LayerContainers, and data generation functions for unit testing.

class mlens.utils.dummy.Cache(X, y, data)[source]

Bases: object

Object for controlling caching.

layer_est(layer, attr)[source]

Test the estimation routine for a layer.

store_X_y(X, y, as_csv=False)[source]

Save X and y to file in temporary directory.

terminate()[source]

Remove temporary items in directory during tests.
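
The store-then-clean-up pattern described by store_X_y and terminate can be sketched as below. This is a minimal illustration and not the mlens Cache implementation: the TempStore class, the file names, and the choice to return the two paths are all assumptions made for the example.

```python
import os
import shutil
import tempfile

import numpy as np


class TempStore:
    """Hypothetical helper; not the mlens Cache implementation."""

    def __init__(self):
        # Fresh temporary directory that only this object writes to.
        self.path = tempfile.mkdtemp()

    def store_X_y(self, X, y, as_csv=False):
        """Write X and y to the temporary directory and return both paths."""
        ext = ".csv" if as_csv else ".npy"
        paths = []
        for name, arr in (("X", X), ("y", y)):
            target = os.path.join(self.path, name + ext)
            if as_csv:
                np.savetxt(target, arr, delimiter=",")
            else:
                np.save(target, arr)
            paths.append(target)
        return tuple(paths)

    def terminate(self):
        """Delete the temporary directory and everything in it."""
        shutil.rmtree(self.path, ignore_errors=True)


store = TempStore()
X = np.arange(6, dtype=float).reshape(3, 2)
y = np.array([1.0, 2.0, 3.0])
x_path, y_path = store.store_X_y(X, y)
X_loaded = np.load(x_path)
store.terminate()
```

Calling terminate at the end of each test keeps temporary files from one test from leaking into the next.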

class mlens.utils.dummy.Data(cls, proba, preprocessing, *args, **kwargs)[source]

Bases: object

Class for getting data.

get_data(shape, m)[source]

Generate X and y data.

Parameters:
  • shape (tuple) – shape of data to be generated
  • m (int) – length of step function for y
Returns:

  • train (ndarray) – generated as a sequence of values reshaped to (LEN, WIDTH)
  • labels (ndarray) – generated as a step function with a step every m rows. As such, each prediction fold during cross-validation has a unique level value.
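
A data set of this kind could be generated roughly as follows. The function name make_step_data, the starting value of the sequence, and the size of each step are assumptions chosen for illustration; the actual values used by get_data may differ.

```python
import numpy as np


def make_step_data(shape, m):
    """Hypothetical stand-in for Data.get_data; not the mlens source."""
    n_rows, n_cols = shape
    # Features: consecutive values 1, 2, ... laid out row by row.
    X = np.arange(1, n_rows * n_cols + 1, dtype=float).reshape(n_rows, n_cols)
    # Labels: a step function that moves to a new level every m rows.
    y = (np.arange(n_rows) // m + 1).astype(float)
    return X, y


X, y = make_step_data((6, 2), 2)
```

With six rows and m = 2, rows 0-1, 2-3 and 4-5 each share one label level, so a three-fold split over contiguous rows sees a distinct level in every fold.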

ground_truth(X, y, subsets=1, verbose=False)[source]

Set up an experiment ground truth.

Returns:
  • F (ndarray) – Full prediction array (train errors)
  • P (ndarray) – Folded prediction array (test errors)
Raises:
  • AssertionError – if any weight vectors overlap or any predictions (as measured by columns in F and P) are judged to be equal.
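
The difference between the full array F and the folded array P can be sketched for a single estimator as below. This is an assumption-laden illustration, not the ground_truth routine: the helper names, the use of a least-squares fit without intercept, and the contiguous fold split are all hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    # Least-squares coefficients without an intercept.
    return np.linalg.lstsq(X, y, rcond=None)[0]


def full_and_folded(X, y, n_folds=2):
    """Illustrative only: one estimator, contiguous folds."""
    n_rows = X.shape[0]
    # F: fit on all rows and predict the same rows (train errors).
    F = X @ fit_ols(X, y)
    # P: for each fold, fit on the other rows and predict the held-out rows.
    P = np.empty_like(F)
    for test_idx in np.array_split(np.arange(n_rows), n_folds):
        train_mask = np.ones(n_rows, dtype=bool)
        train_mask[test_idx] = False
        P[test_idx] = X[test_idx] @ fit_ols(X[train_mask], y[train_mask])
    return F, P


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=8)
F, P = full_and_folded(X, y)
```

In a layer with several estimators, each estimator would contribute one column to F and one to P.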
class mlens.utils.dummy.DummyPartition(tri)[source]

Bases: object

Dummy class to generate tri.

partition(as_array=True)[source]

Return the tri index.

class mlens.utils.dummy.InitMixin[source]

Bases: object

Mixin to make an mlens ensemble behave as a Scikit-learn estimator.

Scikit-learn expects an estimator to be fully initialized when instantiated, but an ML-Ensemble estimator requires layers to be initialized before calling fit or predict makes sense.

InitMixin is intended to be used to create temporary test classes of proper mlens ensemble classes that are identical to the parent class except that __init__ will also initialize one layer with one estimator, and if applicable one meta estimator.

The layer estimator and the meta estimator are both the dummy AverageRegressor class, which minimizes complexity and avoids raising errors due to the estimators in the layers.

To create a testing class, modify the __init__ of the test class to call super().__init__ as in the example below.

Examples

Assert the SuperLearner passes the Scikit-learn estimator test

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.ensemble import SuperLearner
>>> from mlens.utils.dummy import InitMixin
>>>
>>> class TestSuperLearner(InitMixin, SuperLearner):
...
...     def __init__(self):
...         super(TestSuperLearner, self).__init__()
>>>
>>> check_estimator(TestSuperLearner)
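
The mechanism behind such a mixin can be illustrated with toy classes. The sketch below does not use mlens at all: ToyEnsemble, ToyInitMixin, and the string placeholder for the estimator are hypothetical and only show how a cooperative __init__ can add a layer right after the parent class has been initialized.

```python
class ToyEnsemble:
    """Hypothetical ensemble that starts without any layers."""

    def __init__(self):
        self.layers = []

    def add(self, estimator):
        self.layers.append(estimator)
        return self


class ToyInitMixin:
    """Hypothetical mixin: run the parent __init__, then add one layer."""

    def __init__(self):
        # The next class in the MRO is the ensemble itself.
        super().__init__()
        self.add("average-regressor")


class TestToyEnsemble(ToyInitMixin, ToyEnsemble):

    def __init__(self):
        super().__init__()


ready = TestToyEnsemble()
```

Because the mixin comes first in the base class list, its __init__ runs first and reaches the ensemble's __init__ through super(), so the instance is usable as soon as it is created.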
class mlens.utils.dummy.LayerGenerator[source]

Bases: object

Class for generating architectures of various types.

get_layer(kls, proba, preprocessing, *args, **kwargs)[source]

Generate a layer instance.

Parameters:
  • kls (str) – class type
  • proba (bool) – whether to set proba to True
  • preprocessing (bool) – layer with preprocessing cases
get_layer_container(kls, proba, preprocessing, *args, **kwargs)[source]

Generate a layer container instance.

Parameters:
  • kls (str) – class type
  • proba (bool) – whether to set proba to True
  • preprocessing (bool) – layer with preprocessing cases
static load_indexer(kls, args, kwargs)[source]

Load indexer and return remaining kwargs.

class mlens.utils.dummy.LogisticRegression(offset=0)[source]

Bases: mlens.utils.dummy.OLS

No-frills logistic regressor with one-vs-rest estimation of P(label).

MWE of a Scikit-learn classifier.

LogisticRegression is a simple classifier estimator designed for transparency in unit testing. It implements logistic regression with a one-vs-rest classification strategy.

The estimator is a wrapper around OLS. The OLS prediction is squashed using the sigmoid function, and classification is done by picking the label with the highest probability.

The offset option allows the user to offset weights in the OLS by a scalar value, so that different instances produce distinguishable predictions.
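
The description above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the mlens source: the function names are hypothetical, each label is fitted by least squares against a 0/1 indicator target, and the per-label scores are not normalized to sum to one.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_one_vs_rest(X, y, offset=0.0):
    """Fit one least-squares model per label (hypothetical helper)."""
    labels = np.unique(y)
    coefs = []
    for label in labels:
        # Indicator target: 1 where the row has this label, else 0.
        target = (y == label).astype(float)
        coefs.append(np.linalg.lstsq(X, target, rcond=None)[0] + offset)
    return labels, np.column_stack(coefs)


def predict_proba(X, coefs):
    # Squash the linear prediction of every per-label model.
    return sigmoid(X @ coefs)


def predict(X, labels, coefs):
    # Pick the label whose model gives the highest score.
    return labels[np.argmax(predict_proba(X, coefs), axis=1)]


X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([0, 0, 1, 1])
labels, coefs = fit_one_vs_rest(X, y)
pred = predict(X, labels, coefs)
```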

Examples

Asserting the LogisticRegression passes the Scikit-learn estimator test

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import LogisticRegression
>>> check_estimator(LogisticRegression)

Comparison with Scikit-learn’s LogisticRegression

>>> from mlens.utils.dummy import LogisticRegression as mlensL
>>> from sklearn.linear_model import LogisticRegression as sklearnL
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>>
>>> slr = sklearnL()
>>> slr.fit(X, y)
>>>
>>> mlr = mlensL()
>>> mlr.fit(X, y)
>>>
>>> (mlr.predict(X) == slr.predict(X)).sum() / y.shape
array([ 0.98])
fit(X, y)[source]

Fit one model per label.

predict(X)[source]

Get label predictions.

predict_proba(X)[source]

Get probability predictions.

class mlens.utils.dummy.OLS(offset=0)[source]

Bases: mlens.externals.sklearn.base.BaseEstimator

No frills vanilla OLS estimator implemented through the normal equation.

MWE of a Scikit-learn estimator.

OLS is a simple estimator designed to allow for total control over predictions in unit testing. It implements OLS through the normal equation; no learning takes place. The offset option allows the user to offset weights by a scalar value, so that different instances produce distinguishable predictions.

Parameters:
  • offset (float, default = 0) – scalar value to add to the coefficient vector after fitting.
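
For reference, a normal-equation fit with an offset can be sketched as below. The function name is hypothetical, and the sketch assumes no intercept term, which is consistent with the comparison against LinearRegression without an intercept but is not taken from the mlens source.

```python
import numpy as np


def fit_normal_equation(X, y, offset=0.0):
    """Solve (X^T X) w = X^T y, then shift every weight by `offset`."""
    # Solving the linear system avoids forming the inverse explicitly.
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w + offset


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([2.0, 3.0])

w = fit_normal_equation(X, y)
w_shifted = fit_normal_equation(X, y, offset=1.0)
```

Two instances fitted on the same data with different offsets give different predictions, which makes it possible to tell their columns apart in a prediction array.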

Examples

Asserting the OLS passes the Scikit-learn estimator test

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import OLS
>>> check_estimator(OLS)

OLS comparison with Scikit-learn’s LinearRegression

>>> from numpy.testing import assert_array_equal
>>> from mlens.utils.dummy import OLS
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.datasets import load_boston
>>> X, y = load_boston(return_X_y=True)
>>>
>>> lr = LinearRegression(fit_intercept=False)
>>> lr.fit(X, y)
>>>
>>> ols = OLS()
>>> ols.fit(X, y)
>>>
>>> assert_array_equal(lr.coef_, ols.coef_)
fit(X, y)[source]

Fit coefficient vector.

predict(X)[source]

Predict with fitted weights.

class mlens.utils.dummy.Scale(copy=True)[source]

Bases: mlens.externals.sklearn.base.BaseEstimator, mlens.externals.sklearn.base.TransformerMixin

Removes a learnt mean from an array in a column-wise manner.

MWE of a Scikit-learn transformer, to be used for unit-tests of ensemble classes.

Parameters:
  • copy (bool, default = True) – Whether to copy X before transforming.
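
A transformer with this behaviour could look roughly like the sketch below. MeanRemover is a hypothetical stand-in written for illustration, not the Scale source; in particular, the handling of copy=False (in-place subtraction that requires a float array) is an assumption.

```python
import numpy as np


class MeanRemover:
    """Hypothetical stand-in for Scale; not the mlens source."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y=None):
        # Learn one mean per column; y is accepted only for pipeline use.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # With copy=True the caller's array is left untouched.
        # With copy=False, X must already be a float array.
        Xt = np.array(X, dtype=float) if self.copy else X
        Xt -= self.mean_
        return Xt


X = np.arange(6).reshape(3, 2)
X[:, 1] *= 2
S = MeanRemover().fit(X).transform(X)
```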

Examples

Asserting Scale passes the Scikit-learn estimator test

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import Scale
>>> check_estimator(Scale)

Scaling elements

>>> from numpy import arange
>>> from mlens.utils.dummy import Scale
>>> X = arange(6).reshape(3, 2)
>>> X[:, 1] *= 2
>>> print('X:')
X:
>>> print('%r' % X)
array([[ 0,  2],
       [ 2,  6],
       [ 4, 10]])
>>> print('Scaled:')
Scaled:
>>> S = Scale().fit_transform(X)
>>> print('%r' % S)
array([[-2., -4.],
       [ 0.,  0.],
       [ 2.,  4.]])
fit(X, y=None)[source]

Estimate mean.

Parameters:
  • X (array-like) – training data to fit transformer on.
  • y (array-like or None) – pass through for pipeline.
transform(X)[source]

Transform array by adjusting all elements with scale.

Parameters:
  • X (ndarray) – matrix to transform.
mlens.utils.dummy.layer_fit(layer, cache, F, wf)[source]

Test the layer’s fit method.

mlens.utils.dummy.layer_predict(layer, cache, P, wp)[source]

Test the layer’s predict method.

mlens.utils.dummy.layer_transform(layer, cache, F)[source]

Test the layer’s transform method.

mlens.utils.dummy.lc_feature_prop(lc, X, y, F)[source]

Test input feature propagation.

mlens.utils.dummy.lc_fit(lc, X, y, F, wf)[source]

Test the layer container’s fit method.

mlens.utils.dummy.lc_from_csv(lc, cache, X, y, F, wf, P, wp)[source]

Fit a layer container from file path to csv.

mlens.utils.dummy.lc_from_file(lc, cache, X, y, F, wf, P, wp)[source]

Fit a layer container from file path to numpy array.

mlens.utils.dummy.lc_predict(lc, X, P, wp)[source]

Test the layer container’s predict method.

mlens.utils.dummy.lc_transform(lc, X, F)[source]

Test the layer container’s transform method.