mlens.utils.dummy module¶

ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017
license: MIT

Collection of dummy estimator classes and mixins for building transparent layers for unit testing. Also contains pre-made Layer and LayerContainer instances and data generation functions for unit testing.
class mlens.utils.dummy.Data(cls, proba, preprocessing, *args, **kwargs)[source]¶

Bases: object

Class for getting data.
get_data(shape, m)[source]¶

Generate X and y data.

Parameters:
- shape (tuple) – shape of the data to be generated
- m (int) – length of the step function for y

Returns:
- train (ndarray) – generated as a sequence reshaped to (LEN, WIDTH)
- labels (ndarray) – generated as a step function with a step every m. As such, each prediction fold during cross-validation has a unique level value.
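The step-function labels described above can be sketched as follows; `make_step_data` and its internals are illustrative stand-ins, not the actual mlens implementation:

```python
import numpy as np

def make_step_data(shape, m):
    """Illustrative sketch: X as a plain sequence, y as a step function."""
    # X: an increasing sequence reshaped to the requested (LEN, WIDTH)
    X = np.arange(np.prod(shape), dtype=float).reshape(shape)
    # y: increments every m rows, so each fold of size m gets a unique level
    y = np.floor(np.arange(shape[0]) / m)
    return X, y
```

With `shape=(6, 2)` and `m=3`, the labels come out as two levels of three rows each, so three-row folds never share a level with another fold.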
ground_truth(X, y, subsets=1, verbose=False)[source]¶

Set up an experiment ground truth.

Returns:
- F (ndarray) – full prediction array (train errors)
- P (ndarray) – folded prediction array (test errors)

Raises:
AssertionError – raised if any weight vectors overlap, or if any predictions (as measured by columns in F and P) are judged to be equal.
class mlens.utils.dummy.InitMixin[source]¶

Bases: object

Mixin to make an mlens ensemble behave as a Scikit-learn estimator.

Scikit-learn expects an estimator to be fully initialized when instantiated, but an ML-Ensemble estimator requires layers to be initialized before calling fit or predict makes sense. InitMixin is intended to be used to create temporary test classes of proper mlens ensemble classes that are identical to the parent class, except that __init__ will also initialize one layer with one estimator and, if applicable, one meta estimator.

The layer estimator and the meta estimator are both the dummy AverageRegressor class, to minimize complexity and avoid errors raised by the estimators in the layers. To create a testing class, modify the __init__ of the test class to call super().__init__ as in the example below.

Examples

Assert that SuperLearner passes the Scikit-learn estimator test:

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.ensemble import SuperLearner
>>> from mlens.utils.dummy import InitMixin
>>>
>>> class TestSuperLearner(InitMixin, SuperLearner):
...
...     def __init__(self):
...         super(TestSuperLearner, self).__init__()
>>>
>>> check_estimator(TestSuperLearner)
class mlens.utils.dummy.LayerGenerator[source]¶

Bases: object

Class for generating layer architectures of various types.

get_layer(kls, proba, preprocessing, *args, **kwargs)[source]¶

Generate a layer instance.

Parameters:
- kls (str) – class type
- proba (bool) – whether to set proba to True
- preprocessing (bool) – whether to create a layer with preprocessing cases
class mlens.utils.dummy.LogisticRegression(offset=0)[source]¶

Bases: mlens.utils.dummy.OLS

No-frills logistic regressor with one-vs-rest estimation of P(label). MWE of a Scikit-learn classifier.

LogisticRegression is a simple classifier estimator designed for transparency in unit testing. It implements logistic regression with a one-vs-rest classification strategy.

The estimator is a wrapper around OLS. The OLS prediction is squashed using the sigmoid function, and classification is done by picking the label with the highest probability. The offset option allows the user to offset the weights in the OLS by a scalar value, if different instances should be differentiated in their predictions.

Examples
Asserting LogisticRegression passes the Scikit-learn estimator test:

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import LogisticRegression
>>> check_estimator(LogisticRegression)
Comparison with Scikit-learn's LogisticRegression:

>>> from mlens.utils.dummy import LogisticRegression as mlensL
>>> from sklearn.linear_model import LogisticRegression as sklearnL
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification()
>>>
>>> slr = sklearnL()
>>> slr.fit(X, y)
>>>
>>> mlr = mlensL()
>>> mlr.fit(X, y)
>>>
>>> (mlr.predict(X) == slr.predict(X)).sum() / y.shape
array([ 0.98])
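The squash-and-argmax step described above can be sketched as follows; the function names and the way per-class coefficients are passed are assumptions for illustration, not the actual mlens internals:

```python
import numpy as np

def sigmoid(z):
    """Squash linear scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def ovr_predict(X, coefs, classes):
    """One-vs-rest: one coefficient vector per class, highest probability wins."""
    scores = np.asarray(X).dot(np.asarray(coefs).T)  # linear (OLS-style) scores
    probs = sigmoid(scores)                          # squashed probabilities
    return np.asarray(classes)[np.argmax(probs, axis=1)]
```

Since the sigmoid is monotone, the argmax over probabilities equals the argmax over the raw scores; the squashing matters only if the probabilities themselves are inspected.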
class mlens.utils.dummy.OLS(offset=0)[source]¶

Bases: mlens.externals.sklearn.base.BaseEstimator

No-frills vanilla OLS estimator implemented through the normal equation. MWE of a Scikit-learn estimator.

OLS is a simple estimator designed to allow total control over predictions in unit testing. It implements OLS through the normal equation; no iterative learning takes place. The offset option allows the user to offset the weights by a scalar value, if different instances should be differentiated in their predictions.

Parameters:
offset (float (default = 0)) – scalar value to add to the coefficient vector after fitting.

Examples
Asserting OLS passes the Scikit-learn estimator test:

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import OLS
>>> check_estimator(OLS)
OLS comparison with Scikit-learn's LinearRegression:

>>> from numpy.testing import assert_array_equal
>>> from mlens.utils.dummy import OLS
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.datasets import load_boston
>>> X, y = load_boston(True)
>>>
>>> lr = LinearRegression(False)
>>> lr.fit(X, y)
>>>
>>> ols = OLS()
>>> ols.fit(X, y)
>>>
>>> assert_array_equal(lr.coef_, ols.coef_)
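The normal-equation fit described above can be sketched as follows; the class name and body are an illustrative reconstruction, not the actual mlens source:

```python
import numpy as np

class NormalEquationOLS:
    """Minimal OLS via the normal equation; no iterative learning."""

    def __init__(self, offset=0):
        self.offset = offset

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Solve X'X coef = X'y; lstsq is numerically safer than an explicit inverse
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.coef_ += self.offset  # scalar shift to tell instances apart
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float).dot(self.coef_)
```

Because the fit is a closed-form solve, two instances with the same offset always produce identical predictions on the same data, which is what makes the estimator useful for deterministic unit tests.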
class mlens.utils.dummy.Scale(copy=True)[source]¶

Bases: mlens.externals.sklearn.base.BaseEstimator, mlens.externals.sklearn.base.TransformerMixin

Removes a learnt mean from an array in a column-wise manner. MWE of a Scikit-learn transformer, to be used for unit tests of ensemble classes.

Parameters:
copy (bool (default = True)) – whether to copy X before transforming.

Examples
Asserting Scale passes the Scikit-learn estimator test:

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from mlens.utils.dummy import Scale
>>> check_estimator(Scale)
Scaling elements:

>>> from numpy import arange
>>> from mlens.utils.dummy import Scale
>>> X = arange(6).reshape(3, 2)
>>> X[:, 1] *= 2
>>> print('X:')
X:
>>> print('%r' % X)
array([[ 0,  2],
       [ 2,  6],
       [ 4, 10]])
>>> print('Scaled:')
Scaled:
>>> S = Scale().fit_transform(X)
>>> print('%r' % S)
array([[-2., -4.],
       [ 0.,  0.],
       [ 2.,  4.]])
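A mean-removing transformer like the one above can be sketched as follows; `MeanCenter` is an illustrative stand-in, not the mlens implementation:

```python
import numpy as np

class MeanCenter:
    """Learn the column-wise mean on fit and subtract it on transform."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.array(X, dtype=float, copy=self.copy)
        X -= self.mean_  # remove the learnt mean column-wise
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

With `copy=False`, the subtraction happens in place on the input array, mirroring the purpose of the `copy` parameter documented above.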
mlens.utils.dummy.lc_from_csv(lc, cache, X, y, F, wf, P, wp)[source]¶

Fit a layer container from a file path to a csv.