mlens.utils.validation module

ML-ENSEMBLE

author:Sebastian Flennerhag
copyright:2017
license:MIT

Input validation module. Builds on Scikit-learn's validation module, but extends it with soft checks that issue warnings without forcing changes to the inputs.

mlens.utils.validation.check_all_finite(X)

Return False if X contains NaN or infinity.
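The return contract can be illustrated with a small standard-library sketch. The helper `all_finite` below is hypothetical and is not the mlens implementation, which works on NumPy arrays and sparse matrices; it only mirrors the documented return value for a nested list of numbers.

```python
import math


def all_finite(rows):
    """Return False if any value in a list of rows is NaN or infinite."""
    # math.isfinite is False for nan, inf and -inf.
    return all(math.isfinite(value) for row in rows for value in row)


clean = [[1.0, 2.0], [3.0, 4.0]]
dirty = [[1.0, float("nan")], [float("inf"), 4.0]]

print(all_finite(clean))  # True
print(all_finite(dirty))  # False
```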

mlens.utils.validation.check_inputs(X, y=None, check_level=0)

Pre-checks on input arrays X and y.

Checks input data according to check_level to ensure format is roughly in line with what a typical estimator expects.

If check_level = 0, the checks are turned off.

Parameters:
  • X (nd-array, list or sparse matrix) – Input data.
  • y (nd-array, list or sparse matrix) – Labels.
  • check_level (int (default = 0)) –

    level of strictness in checking input arrays.

    • check_level = 0 no checks, returns X, y
    • check_level = 1 raises warnings if any non-critical test fails. Returns boolean FAIL flag.
    • check_level = 2 imposes the Scikit-learn array check, which converts X and y to numpy arrays and raises an error if conversion fails.
Returns:

  • FAIL (fail flag, optional) – boolean for whether any test failed. Returned if check_level = 1
  • X_converted (numpy array, optional) – The converted and validated X. Returned if check_level = 2
  • y_converted (numpy array, optional) – The converted and validated y. Returned if check_level = 2.
  • random_state (object, optional) – numpy RandomState object.
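The three check levels return different things, which is easy to miss. The sketch below mirrors that dispatch using only the standard library. `check_inputs_sketch` and `_has_non_finite` are hypothetical names, the finiteness test is an assumed stand-in for the "non-critical tests", and plain lists stand in for numpy arrays; none of this is the mlens implementation.

```python
import math
import warnings


def _has_non_finite(rows):
    """True if any value in a list of rows is NaN or infinite."""
    return any(not math.isfinite(v) for row in rows for v in row)


def check_inputs_sketch(X, y=None, check_level=0):
    """Hypothetical stand-in mirroring the three documented check levels."""
    if check_level == 0:
        # No checks: the inputs are handed back untouched.
        return X, y

    if check_level == 1:
        # Soft check: warn, leave the inputs alone, report a FAIL flag.
        fail = _has_non_finite(X)
        if fail:
            warnings.warn("X contains NaN or infinity.")
        return fail

    # Strict check: convert, and raise if conversion or validation fails.
    X_converted = [[float(v) for v in row] for row in X]
    if _has_non_finite(X_converted):
        raise ValueError("X contains NaN or infinity.")
    y_converted = None if y is None else list(y)
    return X_converted, y_converted


X = [[1, 2], [3, float("nan")]]
y = [0, 1]

# Level 0: the same objects come back.
X_out, y_out = check_inputs_sketch(X, y, check_level=0)
print(X_out is X and y_out is y)

# Level 1: a warning is recorded and a flag is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fail = check_inputs_sketch(X, y, check_level=1)
print(fail, len(caught))

# Level 2: invalid data is rejected with an error.
try:
    check_inputs_sketch(X, y, check_level=2)
except ValueError as err:
    print("rejected:", err)
```

Because the return type depends on check_level, calling code should branch on the level it passed rather than unpack the result unconditionally.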

mlens.utils.validation.soft_check_1d(y, y_numeric, estimator)

Check if y is numeric, finite and one-dimensional.

mlens.utils.validation.soft_check_array(array, accept_sparse=True, dtype=None, ensure_2d=True, force_all_finite=True, allow_nd=True, ensure_min_samples=1, ensure_min_features=1, estimator=None)

Input validation on an array, list, sparse matrix or similar.

Like Scikit-learn’s check_array, but issues warnings on failed tests and does not force array conversion.

Parameters:
  • array (array-like) – Input object, expected to be array-like, to check / convert.
  • accept_sparse (string, list of string or None (default=True)) – String[s] representing allowed sparse matrix formats, such as ‘csc’, ‘csr’, etc. None means that sparse matrix input will raise an error. If the input is sparse but not in the allowed format, it will be converted to the first listed format.
  • dtype (string, type, list of types or None (default=None)) – Data type of result. If None, the dtype of the input is preserved. If “numeric”, a warning is raised if array.dtype is object. If dtype is a list of types, a warning is raised if array.dtype is not a member of the list.
  • force_all_finite (boolean (default=True)) – Whether to raise an error on np.inf and np.nan in X.
  • ensure_2d (boolean (default=True)) – Whether to warn if X is not at least 2d.
  • allow_nd (boolean (default=True)) – Whether to allow X.ndim > 2.
  • ensure_min_samples (int (default=1)) – Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.
  • ensure_min_features (int (default=1)) – Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
  • estimator (str or estimator instance (default=None)) – If passed, include the name of the estimator in warning messages.
Returns:

CHANGE – Whether X should be changed.

Return type:

bool
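The soft-check pattern described above (run each test, warn on failure, convert nothing, return a flag) can be sketched with the standard library. `soft_check_rows` is a hypothetical helper operating on a list of rows, not the mlens function; the message wording and the subset of parameters it honours are assumptions made for illustration.

```python
import math
import warnings


def soft_check_rows(rows, force_all_finite=True, ensure_min_samples=1,
                    ensure_min_features=1, estimator=None):
    """Hypothetical soft check on a list of rows; warns, never converts."""
    # Build an optional prefix naming the estimator in each warning.
    if estimator is None:
        prefix = ""
    elif isinstance(estimator, str):
        prefix = estimator + ": "
    else:
        prefix = type(estimator).__name__ + ": "

    problems = []
    # A threshold of 0 disables the corresponding size check.
    if ensure_min_samples > 0 and len(rows) < ensure_min_samples:
        problems.append("too few samples (%d)." % len(rows))
    n_features = min((len(row) for row in rows), default=0)
    if ensure_min_features > 0 and n_features < ensure_min_features:
        problems.append("too few features (%d)." % n_features)
    if force_all_finite and any(
            not math.isfinite(v) for row in rows for v in row):
        problems.append("input contains NaN or infinity.")

    for problem in problems:
        warnings.warn(prefix + problem)
    # CHANGE flag: True if any test failed, so the caller may want to fix rows.
    return bool(problems)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    change = soft_check_rows([[1.0, float("inf")]], estimator="ridge")

print(change)
print([str(w.message) for w in caught])
```

The input is never modified, so the caller decides what to do when the flag is True.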

mlens.utils.validation.soft_check_x_y(X, y, accept_sparse=True, dtype=None, force_all_finite=True, ensure_2d=True, allow_nd=True, multi_output=False, ensure_min_samples=1, ensure_min_features=1, y_numeric=False, estimator=None)

Input validation before estimation.

Checks that X and y have consistent length, that X is 2d and that y is 1d. Standard input checks are only applied to y, such as checking that y does not have np.nan or np.inf targets. For multi-label y, set multi_output=True to allow 2d and sparse y. Raises warnings if the dtype is object.

Parameters:
  • X (nd-array, list or sparse matrix) – Input data.
  • y (nd-array, list or sparse matrix) – Labels.
  • accept_sparse (string, list of string or None (default=True)) – String[s] representing allowed sparse matrix formats, such as ‘csc’, ‘csr’, etc. None means that sparse matrix input will raise an error. If the input is sparse but not in the allowed format, it will be converted to the first listed format.
  • dtype (string, type, list of types or None (default=None)) – Data type of result. If None, the dtype of the input is preserved. If “numeric”, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion to the first type is only performed if the dtype of the input is not in the list.
  • force_all_finite (boolean (default=True)) – Whether to raise an error on np.inf and np.nan in X. This parameter does not influence whether y can have np.inf or np.nan values.
  • ensure_2d (boolean (default=True)) – Whether to make X at least 2d.
  • allow_nd (boolean (default=True)) – Whether to allow X.ndim > 2.
  • multi_output (boolean (default=False)) – Whether to allow 2-d y (array or sparse matrix). If false, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.
  • ensure_min_samples (int (default=1)) – Make sure that X has a minimum number of samples in its first axis (rows for a 2D array).
  • ensure_min_features (int (default=1)) – Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when X has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
  • y_numeric (boolean (default=False)) – Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.
  • estimator (str or estimator instance (default=None)) – If passed, include the name of the estimator in warning messages.
Returns:

  • X_converted (object) – The converted and validated X.
  • y_converted (object) – The converted and validated y.
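The two joint checks named in the description, consistent length and one-dimensional y unless multi_output is set, can be sketched as follows. `soft_check_lengths` is a hypothetical standard-library helper, not the mlens function, and its boolean return value is an assumption chosen for illustration; it does not reflect the documented return values.

```python
import warnings


def soft_check_lengths(X, y, multi_output=False):
    """Hypothetical soft check: warn on mismatched lengths or non-1d y."""
    passed = True
    if len(X) != len(y):
        warnings.warn(
            "X and y have inconsistent lengths: %d vs %d." % (len(X), len(y)))
        passed = False
    # Treat y as 2d if any target is itself a sequence.
    y_is_2d = any(isinstance(target, (list, tuple)) for target in y)
    if y_is_2d and not multi_output:
        warnings.warn("y is not one-dimensional; set multi_output=True.")
        passed = False
    return passed


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    short_y = soft_check_lengths(X, [0, 1])
    multi_y = soft_check_lengths(X, [[0, 1], [1, 0], [1, 1]])
    allowed = soft_check_lengths(X, [[0, 1], [1, 0], [1, 1]], multi_output=True)

print(short_y, multi_y, allowed)
print(len(caught))
```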