Warning

You are reading the documentation for the development version.

spectrochempy.Baseline

class Baseline(log_level='WARNING', warm_start=False, *, asymmetry=0.05, include_limits=True, lamb=100000.0, lls=False, max_iter=50, model='polynomial', multivariate=False, n_components=5, order=1, ranges=[], snip_width=0, tol=0.001)

Baseline correction processor.

The baseline correction can be applied to a 1D dataset consisting of a single row with n_features features, or to a 2D dataset with shape (n_observations, n_features).

When dealing with 2D datasets, the baseline correction can be applied either sequentially (default) or using a multivariate approach (parameter multivariate set to True).

  • The 'sequential' approach, which can be used for both 1D and 2D datasets, consists of fitting the baseline sequentially to each observation row (spectrum).

  • The 'multivariate' approach can only be applied to 2D datasets (at least 3 observations). The 2D dataset is first reduced to several principal components using a conventional singular value decomposition (SVD) or a non-negative matrix factorization (NMF). A baseline is then fitted to each component, and an inverse transform is performed to recover the baseline-corrected dataset.
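The multivariate approach can be pictured with NumPy alone. This is a minimal sketch under stated assumptions, not SpectroChemPy's implementation: the helper svd_multivariate_baseline is hypothetical, only the SVD variant is shown, and each component is given a plain polynomial baseline fitted on all features (no ranges).

```python
import numpy as np


def svd_multivariate_baseline(X, x, n_components=5, order=1):
    """Hypothetical sketch of an SVD-based multivariate baseline.

    X: array of shape (n_observations, n_features); x: feature coordinates.
    Assumption: each component baseline is a polynomial fitted on all features.
    """
    X = np.asarray(X, dtype=float)
    n_observations = X.shape[0]
    if n_observations < 3 or n_components > n_observations:
        raise ValueError("need at least 3 observations and n_components <= n_observations")
    # 1. Dimensionality reduction: X ~ U_k diag(s_k) Vt_k
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    # 2. Fit a polynomial baseline on each component spectrum (row of Vt)
    Vt_baseline = np.array([np.polyval(np.polyfit(x, comp, order), x) for comp in Vt])
    # 3. Inverse transform back to the observation space
    return (U * s) @ Vt_baseline


# Usage: six rows sharing a linear background with different scale factors
x = np.linspace(0.0, 10.0, 50)
X = np.outer(np.arange(1, 7), 0.5 * x + 1.0)
baseline = svd_multivariate_baseline(X, x, n_components=5)
corrected = X - baseline
```

Because the reconstruction only goes through n_components components, noise carried by the discarded components does not reach the baseline, which is the usual motivation for this approach.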

In both approaches, various models can be used to estimate the baseline.

  • 'detrend' : remove trends from the data. Depending on the order parameter, the detrend can be constant (order=0, mean removal), linear (order=1), quadratic (order=2) or cubic (order=3).

  • 'asls' : Asymmetric Least Squares Smoothing baseline correction, based on the work of Eilers and Boelens ([Eilers and Boelens, 2005]).

  • 'snip' : Simple Non-Iterative Peak (SNIP) detection algorithm ([Ryan et al., 1988]).

  • 'rubberband' : Rubberband baseline correction.

  • 'polynomial' : Fit an nth-degree polynomial to the data belonging to the predefined ranges. The order of the polynomial is defined by the order parameter. The baseline is then obtained by evaluating the polynomial at every feature.

By default, ranges is set to the feature limits (i.e. ranges=[features[0], features[-1]]).
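A sketch of the 'polynomial' model, assuming the ranges have already been normalized to (start, end) pairs. polynomial_baseline is a hypothetical helper, and np.polyfit stands in for whatever fitting routine the library actually uses.

```python
import numpy as np


def polynomial_baseline(x, y, ranges, order=1):
    """Fit a polynomial on the points inside `ranges`, evaluate it on every feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    for start, end in ranges:
        lo, hi = min(start, end), max(start, end)
        mask |= (x >= lo) & (x <= hi)  # points assumed to belong to the baseline
    coefs = np.polyfit(x[mask], y[mask], order)
    return np.polyval(coefs, x)


x = np.linspace(0.0, 100.0, 201)
background = 2.0 + 0.03 * x
y = background + np.exp(-((x - 50.0) ** 2) / (2 * 3.0**2))  # one peak at x = 50

# Baseline regions chosen on both sides of the peak
baseline = polynomial_baseline(x, y, [(0.0, 20.0), (80.0, 100.0)], order=1)

# Default-like behaviour: only the feature limits, i.e. a line through both ends
limits_only = polynomial_baseline(x, y, [(x[0], x[0]), (x[-1], x[-1])], order=1)
```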

Parameters
  • log_level (any of ["INFO", "DEBUG", "WARNING", "ERROR"], optional, default: "WARNING") – The log level at startup. It can be changed later on using the set_log_level method or by changing the log_level attribute.

  • warm_start (bool, optional, default: False) – When fitting repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance), it may be possible to reuse the model learned for the previous parameter value, saving time.

    When warm_start is True, the existing fitted model attributes are used to initialize the new model in a subsequent call to fit.

  • asymmetry (float, optional, default: 0.05) – The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as unconstrained least squares.

  • include_limits (bool, optional, default: True) – Whether to automatically include the feature limits in the specified ranges.

  • lamb (float, optional, default: 100000.0) – The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).

  • lls (bool, optional, default: False) – If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the ‘snip’ model.

  • max_iter (int, optional, default: 50) – Maximum number of AsLS iterations.

  • model (any value of [ 'polynomial' , 'detrend' , 'asls' , 'snip' , 'rubberband' ], optional, default: 'polynomial') – The model used to determine the baseline.

    • ‘polynomial’: the baseline is determined by an nth-degree polynomial fitted to the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.

    • ‘detrend’: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.

    • ‘asls’: the baseline is determined by an asymmetric least squares algorithm.

    • ‘snip’: the baseline is determined by a simple non-iterative peak detection algorithm.

    • ‘rubberband’: the baseline is determined by a rubberband algorithm.

  • multivariate (a boolean or any of [‘nmf’, ‘svd’] (case-insensitive), optional, default: False) – For 2D datasets, if True or if multivariate=‘svd’ or ‘nmf’, a multivariate method is used: a baseline is fitted to the principal components obtained by an SVD decomposition (multivariate=‘svd’ or True) or by an NMF factorization (multivariate=‘nmf’), followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline to each row (observation) of the dataset.

  • n_components (int, optional, default: 5) – Number of components to use for the multivariate method (n_observations >= n_components).

  • order (an int or any of [‘constant’, ‘linear’, ‘quadratic’, ‘cubic’, ‘pchip’] (case-insensitive), optional, default: 1) – Polynomial order to use for polynomial/pchip interpolation or detrending.

    • If an integer is provided, it is the order of the polynomial to fit, e.g. 1 for linear.

    • If a string is provided among ‘constant’, ‘linear’, ‘quadratic’ and ‘cubic’, it is equivalent to order 0 (constant) to 3 (cubic).

    • If the string ‘pchip’ is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).

  • ranges (list, optional, default: []) – A sequence of feature values or feature regions which are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=value). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.

  • snip_width (int, optional, default: 0) – The width of the window used to determine the baseline using the SNIP algorithm.

  • tol (float, optional, default: 0.001) – The tolerance parameter for the AsLS method. Smaller values make the fit better but potentially increase the number of iterations and the running time. Values should be in the range (0, 1).
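To show how lamb, asymmetry, max_iter and tol interact, here is a minimal dense NumPy sketch of the AsLS iteration of Eilers and Boelens. The helper asls_baseline is hypothetical, the stopping rule (relative change of the baseline below tol) is an assumption, and a practical implementation would use sparse matrices rather than the O(n^3) dense solve used here.

```python
import numpy as np


def asls_baseline(y, lamb=1e5, asymmetry=0.05, max_iter=50, tol=1e-3):
    """Asymmetric least squares smoothing (dense sketch, hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator, shape (n-2, n)
    penalty = lamb * (D.T @ D)            # larger lamb -> stiffer baseline
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        z_new = np.linalg.solve(np.diag(w) + penalty, w * y)
        # points above the baseline (peaks) get the small weight `asymmetry`
        w = np.where(y > z_new, asymmetry, 1.0 - asymmetry)
        change = np.linalg.norm(z_new - z) / max(np.linalg.norm(z_new), 1e-12)
        z = z_new
        if change < tol:  # assumed stopping rule
            break
    return z


x = np.linspace(0.0, 100.0, 300)
background = 1.0 + 0.02 * x
y = background + 5.0 * np.exp(-((x - 50.0) ** 2) / (2 * 2.0**2))
baseline = asls_baseline(y, lamb=1e5, asymmetry=0.01)
```

A straight line is left unchanged by the second-difference penalty, which is why a linear background is recovered almost exactly away from the peak.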

See also

get_baseline

Compute a baseline using the Baseline class.

basc

Make a baseline correction using the Baseline class.

asls

Perform an Asymmetric Least Squares Smoothing baseline correction.

snip

Perform a Simple Non-Iterative Peak (SNIP) detection algorithm.

rubberband

Perform a Rubberband baseline correction.

autosub

Perform an automatic subtraction of reference.

detrend

Remove a polynomial trend along a dimension from a dataset.

Attributes Summary

X

Return the X input dataset (possibly modified by the model).

asymmetry

The asymmetry parameter for the AsLS method.

baseline

Computed baseline.

breakpoints

Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints.

config

traitlets.config.Config object.

corrected

Dataset with baseline removed.

include_limits

Whether to automatically include the feature limits in the specified ranges.

lamb

The smoothness parameter for the AsLS method.

lls

If True, the baseline is determined on data transformed using the log-log-square transform.

log

Return log output.

max_iter

Maximum number of AsLS iterations.

model

The model used to determine the baseline.

multivariate

For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition (multivariate='svd' or True) or an NMF factorization (multivariate='nmf'), followed by an inverse transform to retrieve the baseline-corrected dataset.

n_components

Number of components to use for the multivariate method (n_observations >= n_components).

name

Object name

order

Polynomial order to use for polynomial/pchip interpolation or detrending.

ranges

A sequence of feature values or feature regions which are assumed to belong to the baseline.

snip_width

The width of the window used to determine the baseline using the SNIP algorithm.

tol

The tolerance parameter for the AsLS method.

used_ranges

The actual ranges used during fitting.

Methods Summary

fit(X)

Fit a baseline model on an X dataset.

parameters([default])

Alias for params method.

params([default])

Current or default configuration values.

plot(**kwargs)

Plot the original, baseline and corrected dataset.

reset()

Reset configuration parameters to their default values.

to_dict()

Return config values in dict form.

transform()

Return a dataset with baseline removed.

Attributes Documentation

X

Return the X input dataset (possibly modified by the model).

asymmetry

The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as unconstrained least squares.

baseline

Computed baseline.

breakpoints

Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints. Breakpoints are useful when you want to compute separate baselines or trends for different segments of the data.
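The idea behind breakpoints can be illustrated with a piecewise detrend, where a separate polynomial trend is fitted and removed on each segment. This sketch assumes ascending coordinates and breakpoints given as coordinate values (not indices); piecewise_detrend is a hypothetical helper, not a SpectroChemPy function.

```python
import numpy as np


def piecewise_detrend(x, y, breakpoints, order=1):
    """Remove a separate polynomial trend on each segment between breakpoints."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = [x[0], *sorted(breakpoints), x[-1]]
    trend = np.empty_like(y)
    n_segments = len(edges) - 1
    for k in range(n_segments):
        lo, hi = edges[k], edges[k + 1]
        # half-open segments [lo, hi); the last one also includes its right end
        upper = (x <= hi) if k == n_segments - 1 else (x < hi)
        mask = (x >= lo) & upper
        coefs = np.polyfit(x[mask], y[mask], order)
        trend[mask] = np.polyval(coefs, x[mask])
    return y - trend, trend


x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 5.0, 2.0 * x, 20.0 - x)  # slope changes at x = 5
detrended, trend = piecewise_detrend(x, y, breakpoints=[5.0])
```

A single linear detrend could not follow the change of slope at x = 5, whereas one trend per segment removes both pieces.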

config

traitlets.config.Config object.

corrected

Dataset with baseline removed.

include_limits

Whether to automatically include the feature limits in the specified ranges.

lamb

The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).

lls

If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the ‘snip’ model.
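The LLS operator is usually written, in the SNIP literature, as log(log(sqrt(y + 1) + 1) + 1), together with an exact inverse that brings a baseline back to the original scale. The sketch below assumes this form; whether SpectroChemPy uses exactly this expression is not confirmed here, and the helper names are hypothetical. The transform is only defined for y >= -1.

```python
import numpy as np


def lls(y):
    """Log-log-square-root transform: compresses the dynamic range of y (y >= -1)."""
    return np.log(np.log(np.sqrt(y + 1.0) + 1.0) + 1.0)


def inverse_lls(v):
    """Exact inverse of lls(), used to bring a baseline back to the original scale."""
    return (np.exp(np.exp(v) - 1.0) - 1.0) ** 2 - 1.0


y = np.array([0.0, 1.0, 10.0, 1e3, 1e6])
v = lls(y)  # values from 0 to 1e6 end up spanning about 1.5 units
```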

log

Return log output.

max_iter

Maximum number of AsLS iterations.

model

The model used to determine the baseline.

  • ‘polynomial’: the baseline is determined by an nth-degree polynomial fitted to the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.

  • ‘detrend’: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.

  • ‘asls’: the baseline is determined by an asymmetric least squares algorithm.

  • ‘snip’: the baseline is determined by a simple non-iterative peak detection algorithm.

  • ‘rubberband’: the baseline is determined by a rubberband algorithm.
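A rubberband baseline is commonly described as a band stretched under the spectrum, which amounts to taking the lower convex hull of the points and interpolating linearly between its vertices. The sketch below implements that idea with Andrew's monotone chain; it is an illustration rather than the library's code, and rubberband_baseline is a hypothetical name.

```python
import numpy as np


def rubberband_baseline(x, y):
    """Lower convex hull of (x, y), linearly interpolated on every feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    hull = []  # indices (into xs) of the lower-hull vertices
    for i in range(xs.size):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (xs[i1] - xs[i0]) * (ys[i] - ys[i0]) - (ys[i1] - ys[i0]) * (xs[i] - xs[i0])
            if cross <= 0:  # hull[-1] lies on or above the segment i0 -> i: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    baseline = np.empty_like(y)
    baseline[order] = np.interp(xs, xs[hull], ys[hull])
    return baseline


x = np.linspace(0.0, 10.0, 101)
background = 0.5 * x + 1.0
y = background + 3.0 * np.exp(-((x - 5.0) ** 2) / 0.5)
baseline = rubberband_baseline(x, y)
```

By construction the result never lies above the data, so the method works best when the background is convex or straight between the baseline points.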

multivariate

For 2D datasets, if True or if multivariate=‘svd’ or ‘nmf’, a multivariate method is used: a baseline is fitted to the principal components obtained by an SVD decomposition (multivariate=‘svd’ or True) or by an NMF factorization (multivariate=‘nmf’), followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline to each row (observation) of the dataset.

n_components

Number of components to use for the multivariate method (n_observations >= n_components).

name

Object name

order

Polynomial order to use for polynomial/pchip interpolation or detrending.

  • If an integer is provided, it is the order of the polynomial to fit, e.g. 1 for linear.

  • If a string is provided among ‘constant’, ‘linear’, ‘quadratic’ and ‘cubic’, it is equivalent to order 0 (constant) to 3 (cubic).

  • If the string ‘pchip’ is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).
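The mapping from the order parameter to a polynomial degree, and its use in a least-squares detrend, can be sketched as follows. The helper names are hypothetical, ‘pchip’ is deliberately not handled, and np.polyfit stands in for the library's own fitting.

```python
import numpy as np

ORDER_NAMES = {"constant": 0, "linear": 1, "quadratic": 2, "cubic": 3}


def resolve_order(order):
    """Map an int or a case-insensitive name to a polynomial degree."""
    if isinstance(order, str):
        return ORDER_NAMES[order.lower()]  # KeyError for unsupported names such as 'pchip'
    return int(order)


def poly_detrend(x, y, order=1):
    """Subtract the least-squares polynomial trend of the requested order."""
    degree = resolve_order(order)
    coefs = np.polyfit(x, y, degree)
    return y - np.polyval(coefs, x)


x = np.linspace(-1.0, 1.0, 50)
y = 3.0 + 2.0 * x - x**2
residual = poly_detrend(x, y, "Quadratic")  # close to zero: y is exactly quadratic
```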

ranges

A sequence of feature values or feature regions which are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=value). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.

snip_width

The width of the window used to determine the baseline using the SNIP algorithm.
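A basic SNIP clipping loop can be sketched as below, assuming snip_width is counted in points and the window is widened from 1 to snip_width (one common variant; the library may iterate differently). The LLS transform in its assumed form is applied first, in line with lls being always True for the ‘snip’ model; snip_baseline is a hypothetical helper.

```python
import numpy as np


def snip_baseline(y, snip_width, use_lls=True):
    """Basic SNIP clipping with a window widened from 1 to snip_width points."""
    y = np.asarray(y, dtype=float)
    # optional LLS transform (assumed form) to compress the dynamic range
    v = np.log(np.log(np.sqrt(y + 1.0) + 1.0) + 1.0) if use_lls else y.copy()
    n = v.size
    for p in range(1, min(snip_width, (n - 1) // 2) + 1):
        # mean of the two neighbours at distance p, for every interior point
        neighbours = 0.5 * (v[: n - 2 * p] + v[2 * p :])
        # clip: keep the smaller of the point and its neighbour mean
        v[p : n - p] = np.minimum(v[p : n - p], neighbours)
    if use_lls:
        v = (np.exp(np.exp(v) - 1.0) - 1.0) ** 2 - 1.0  # back to the original scale
    return v


x = np.linspace(0.0, 100.0, 401)
background = 10.0 + 0.05 * x
y = background + 50.0 * np.exp(-((x - 50.0) ** 2) / (2 * 1.5**2))
baseline = snip_baseline(y, snip_width=40, use_lls=False)
baseline_lls = snip_baseline(y, snip_width=40)
```

The window should be wider than the peaks to be removed; here the peak spans only a few points, so a width of 40 points clips it down to the linear background.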

tol

The tolerance parameter for the AsLS method. Smaller values make the fit better but potentially increase the number of iterations and the running time. Values should be in the range (0, 1).

used_ranges

The actual ranges used during fitting.

Where applicable, the feature limits are included, and the returned list is trimmed, cleaned and ordered.
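The cleaning of used_ranges can be pictured with a small pure-Python sketch: single values become (value, value) pairs, reversed pairs are reordered, the feature limits are optionally appended, and the ranges are trimmed to the feature domain, sorted, and merged when they overlap. The merge step and the helper normalize_ranges are assumptions; the exact rules applied by SpectroChemPy may differ.

```python
def normalize_ranges(ranges, limits, include_limits=True):
    """Hypothetical cleaning of user ranges into sorted, non-overlapping pairs."""
    lo, hi = min(limits), max(limits)
    pairs = []
    for item in ranges:
        if isinstance(item, (int, float)):
            pairs.append((item, item))  # single value -> (start=value, end=value)
        else:
            start, end = item
            pairs.append((min(start, end), max(start, end)))
    if include_limits:
        pairs += [(lo, lo), (hi, hi)]
    # trim to the feature domain, dropping ranges that lie entirely outside it
    pairs = sorted((max(s, lo), min(e, hi)) for s, e in pairs if e >= lo and s <= hi)
    merged = []
    for s, e in pairs:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))  # overlapping: merge
        else:
            merged.append((s, e))
    return merged


user_ranges = [(1800, 1900), 2500, (4000, 3500), (3900, 4100)]
used = normalize_ranges(user_ranges, limits=(1000, 4000))
# used == [(1000, 1000), (1800, 1900), (2500, 2500), (3500, 4000)]
```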

Methods Documentation

fit(X)

Fit a baseline model on an X dataset.

Parameters

X (NDDataset or array-like of shape (n_observations, n_features)) – Training data.

Returns

self – The fitted instance itself.

parameters(default=False)

Deprecated alias for the params method (replaced by params; to be removed in version 0.7.1).

params(default=False)

Current or default configuration values.

Parameters

default (bool, optional, default: False) – If default is True, the default parameters are returned, else the current values.

Returns

dict – Current or default configuration values.

plot(**kwargs)

Plot the original, baseline and corrected dataset.

Parameters

**kwargs (keyword parameters, optional) – See Other Parameters.

Returns

Axes – Matplotlib subplot axes.

Other Parameters
  • colors (tuple or ndarray of 3 colors, optional) – Colors for the original, baseline and corrected data. In the 2D case, the default colormap is used for the original data. By default, the three colors are NBlue, NGreen and NRed (which are colorblind friendly).

  • offset (float, optional, default: None) – Specify the separation (in percent) between the original and corrected data.

  • nb_traces (int or 'all', optional) – Number of lines to display. Default is 'all'.

  • **others (Other keyword parameters) – Parameters passed to the internal plot method of the datasets.

reset()

Reset configuration parameters to their default values.

to_dict()

Return config values in dict form.

Returns

dict – A regular dictionary.

transform()

Return a dataset with baseline removed.

Examples using spectrochempy.Baseline

Processing RAMAN spectra

NDDataset baseline correction