mlens.utils package

Module contents

ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017
licence: MIT
mlens.utils.check_inputs(X, y=None, check_level=0)[source]

Pre-checks on input arrays X and y.

Checks input data according to check_level to ensure format is roughly in line with what a typical estimator expects.

If check_level = 0, the checks are turned off.

Parameters:
  • X (nd-array, list or sparse matrix) – Input data.
  • y (nd-array, list or sparse matrix) – Labels.
  • check_level (int (default = 0)) –

    level of strictness in checking input arrays.

    • check_level = 0 no checks, returns X, y
    • check_level = 1 raises warnings if any non-critical test fails. Returns boolean FAIL flag.
    • check_level = 2 imposes the Scikit-learn array check, which converts X and y to numpy arrays and raises an error if conversion fails.
Returns:

  • FAIL (fail flag, optional) – boolean for whether any test failed. Returned if check_level = 1
  • X_converted (numpy array, optional) – The converted and validated X. Returned if check_level = 2
  • y_converted (numpy array, optional) – The converted and validated y. Returned if check_level = 2.
  • random_state (object, optional) – numpy RandomState object.
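
The three check levels return different things, which is easy to miss when calling the function. The sketch below mirrors that contract with a hypothetical stand-in, check_inputs_sketch, written against the standard library only. It is not the mlens implementation: the length comparison is an assumed example of a "non-critical test", and plain list conversion stands in for the Scikit-learn array check.

```python
import warnings


def check_inputs_sketch(X, y=None, check_level=0):
    """Hypothetical stand-in mirroring the three documented check levels."""
    if check_level == 0:
        # No checks: the inputs are handed back untouched.
        return X, y

    if check_level == 1:
        # Assumed non-critical test: warn and report through a boolean flag.
        fail = y is not None and len(X) != len(y)
        if fail:
            warnings.warn("X and y have different lengths.")
        return fail

    # check_level == 2: convert, and raise if the result is unusable.
    # list() is a placeholder for the array conversion described above.
    X_converted = [list(row) for row in X]
    y_converted = None if y is None else list(y)
    if y_converted is not None and len(X_converted) != len(y_converted):
        raise ValueError("X and y have inconsistent lengths.")
    return X_converted, y_converted
```

Because the return type depends on check_level, callers would typically fix the level once and unpack accordingly, rather than switch levels at runtime.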

mlens.utils.check_instances(instances)[source]

Helper to ensure all instances are named.

Check whether instances is formatted as expected. If not, convert the formatting, or raise an error if the intended formatting cannot be inferred.

Parameters: instances (iterable) – instance iterable to test.
Returns: formatted – formatted instances object. Will be formatted as a dict if preprocessing cases are detected, otherwise as a list. The dict will contain lists identical to those in the single preprocessing case. Each list is of the form [('name', instance)] and no names overlap.
Return type: list or dict
Raises: LayerSpecificationError – if formatting fails, most likely because tuple entries are in the wrong order or an argument is in the wrong position.
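
To illustrate the list case, the following sketch turns a mix of bare instances and ('name', instance) tuples into a list of uniquely named tuples. The helper name_instances_sketch and its naming scheme (lower-cased class name plus a numeric suffix) are assumptions for illustration. The naming rule mlens actually applies is not stated here.

```python
def name_instances_sketch(instances):
    """Hypothetical helper: return [(name, instance), ...] with unique names."""
    named, seen = [], set()
    for item in instances:
        if isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], str):
            # Already in ('name', instance) form.
            name, inst = item
        else:
            # Bare instance: derive a name from its class (assumed scheme).
            name, inst = type(item).__name__.lower(), item

        # Append a numeric suffix until the name no longer overlaps.
        base, k = name, 1
        while name in seen:
            k += 1
            name = "%s-%d" % (base, k)
        seen.add(name)
        named.append((name, inst))
    return named


class Dummy:
    """Placeholder for an estimator or transformer."""


a, b, c = Dummy(), Dummy(), Dummy()
formatted = name_instances_sketch([a, ("custom", b), c])
```

Here formatted holds three tuples whose names are all distinct, which is the property the return description above requires.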
mlens.utils.check_is_fitted(estimator, attr)[source]

Check that ensemble has been fitted.

Parameters:
  • estimator (estimator instance) – ensemble instance to check.
  • attr (str) – attribute to assert existence of.
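
A check of this kind usually amounts to testing for an attribute that only exists after fitting. The sketch below shows that idea with hypothetical names (check_is_fitted_sketch, NotFittedSketchError, ToyEnsemble). The exception mlens raises and the attribute it inspects are not specified in this section, so both are assumptions.

```python
class NotFittedSketchError(ValueError):
    """Hypothetical error type used only in this sketch."""


def check_is_fitted_sketch(estimator, attr):
    """Raise if `estimator` lacks the attribute that fitting would create."""
    if not hasattr(estimator, attr):
        raise NotFittedSketchError(
            "%s is not fitted: missing attribute %r"
            % (type(estimator).__name__, attr)
        )


class ToyEnsemble:
    """Placeholder ensemble; `layers_` is an assumed attribute name."""

    def fit(self):
        # Fitting creates the attribute the check looks for.
        self.layers_ = ["layer-1"]
        return self


check_is_fitted_sketch(ToyEnsemble().fit(), "layers_")  # passes silently
```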
mlens.utils.check_ensemble_build(inst, attr='layers')[source]

Check that layers have been instantiated.

mlens.utils.assert_correct_format(estimators, preprocessing)[source]

Initial check to assert layer can be constructed.

mlens.utils.check_initialized(inst)[source]

Check if a ParallelProcessing instance is initialized properly.

mlens.utils.pickle_save(obj, name)[source]

Utility function for pickling an object.

mlens.utils.pickle_load(name)[source]

Utility function for loading a pickled object.
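
The save and load helpers form a round trip. A minimal sketch of that round trip using only the standard pickle module is shown below. The functions pickle_save_sketch and pickle_load_sketch are stand-ins, and whether the mlens versions modify the file name (for example by appending an extension) is not stated here, so the sketch uses the name exactly as given.

```python
import os
import pickle
import tempfile


def pickle_save_sketch(obj, name):
    """Write `obj` to the file `name` in binary pickle format."""
    with open(name, "wb") as f:
        pickle.dump(obj, f)


def pickle_load_sketch(name):
    """Read back the object stored in the file `name`."""
    with open(name, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
payload = {"weights": [0.1, 0.2], "name": "demo"}
pickle_save_sketch(payload, path)
restored = pickle_load_sketch(path)
```

As with any pickle file, only load data from sources you trust.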

mlens.utils.print_time(t0, message='', **kwargs)[source]

Utility function for printing time.
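
Given the t0 and message parameters, a plausible reading is that the function reports the time elapsed since t0, prefixed by message. The sketch below formats such a string. The helper format_elapsed_sketch, its hh:mm:ss layout and the extra now argument are all assumptions made for illustration, not the mlens output format.

```python
import time


def format_elapsed_sketch(t0, message="", now=None):
    """Return `message` followed by the time elapsed since `t0` as hh:mm:ss."""
    # `now` can be passed explicitly to get a reproducible result.
    now = time.time() if now is None else now
    minutes, seconds = divmod(int(now - t0), 60)
    hours, minutes = divmod(minutes, 60)
    return "%s%02d:%02d:%02d" % (message, hours, minutes, seconds)


t0 = time.time()
# ... work to be timed would go here ...
print(format_elapsed_sketch(t0, "Fit complete | "))
```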

mlens.utils.safe_print(*objects, **kwargs)[source]

Safe print function for backwards compatibility.

class mlens.utils.CMLog(verbose=False)[source]

Bases: object

CPU and Memory logger.

Class for starting a monitor job of CPU and memory utilization in the background in a Python script. The monitor class records the cpu_percent, rss and vms as collected by the psutil library for the parent process’ pid.

CPU usage and memory utilization are stored as attributes in numpy arrays.

Examples

>>> from time import sleep
>>> from mlens.utils.utils import CMLog
>>> cm = CMLog(verbose=True)
>>> cm.monitor(2, 0.5)
>>> _ = [i for i in range(10000000)]
>>>
>>> # Collecting before completion triggers a message but no error
>>> cm.collect()
>>>
>>> sleep(2)
>>> cm.collect()
>>> print('CPU usage:')
>>> cm.cpu
[CMLog] Monitoring for 2 seconds with checks every 0.5 seconds.
[CMLog] Job not finished. Cannot _collect yet.
[CMLog] Collecting... done. Read 4 lines in 0.000 seconds.
CPU usage:
array([ 50. ,  22.4,   6. ,  11.9])
Raises: ImportError – CMLog depends on psutil. If psutil is not installed, instantiation raises ImportError.
Parameters: verbose (bool) – whether to notify of job start.
collect()[source]

Collect monitored data.

Once a monitor job finishes, call collect to read the CPU and memory usage into Python objects in the current process. If called before the job finishes, collect prints a message asking to try again later, but no warning or error is raised.

monitor(stop=None, ival=0.1, kill=True)[source]

Start monitoring CPU and memory usage.

Parameters:
  • stop (float or None (default = None)) – seconds to monitor for. If None, monitors until collect is called.
  • ival (float (default = 0.1)) – interval between checks, in seconds.
  • kill (bool (default = True)) – whether to kill the monitoring job if collect is called before timeout (stop). If set to False, calling collect will cause the instance to wait until the job completes.
mlens.utils.kwarg_parser(func, kwargs)[source]

Utility function for parsing keyword arguments.
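
The return value is not documented here. One common pattern for a helper with this signature is to split kwargs into those that func accepts and those it does not, and the sketch below shows that pattern with inspect.signature. The helper split_kwargs_sketch and the example function report are hypothetical. Treat the split-in-two behaviour as an assumption rather than a description of kwarg_parser.

```python
import inspect


def split_kwargs_sketch(func, kwargs):
    """Split `kwargs` into (accepted by `func`, everything else)."""
    params = inspect.signature(func).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rest = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rest


def report(message, end="\n"):
    """Example target function with one optional keyword argument."""
    return message + end


accepted, rest = split_kwargs_sketch(report, {"end": "", "flush": True})
result = report("done", **accepted)
```

Only end is forwarded to report, and flush is left in rest for another consumer.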