spectrochempy.Baseline
- class Baseline(log_level='WARNING', warm_start=False, *, asymmetry=0.05, include_limits=True, lamb=100000.0, lls=False, max_iter=50, model='polynomial', multivariate=False, n_components=5, order=1, ranges, snip_width=0, tol=0.001)[source]
Baseline Correction processor.
The baseline correction can be applied to 1D datasets consisting of a single row with n_features or to 2D datasets with shape (n_observations, n_features).
When dealing with 2D datasets, the baseline correction can be applied either sequentially (default) or using a multivariate approach (parameter multivariate set to True).
The 'sequential' approach, which can be used for both 1D and 2D datasets, consists of fitting the baseline sequentially for each observation row (spectrum).
The 'multivariate' approach can only be applied to 2D datasets (at least 3 observations). The 2D dataset is first dimensionally reduced into several principal components using a conventional Singular Value Decomposition (SVD) or a non-negative matrix factorization (NMF). Each component is then fitted before an inverse transform is performed to recover the baseline correction.
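The multivariate idea can be illustrated with a minimal NumPy sketch. It is not SpectroChemPy's implementation: the helper name `multivariate_polynomial_baseline`, the choice of a polynomial model for each component and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def multivariate_polynomial_baseline(X, x, n_components=2, order=1):
    """Hypothetical helper: SVD-reduce X, fit each component, rebuild a baseline."""
    # Reduce the (n_observations, n_features) matrix to a few components.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # (n_obs, n_components)
    loadings = Vt[:n_components]                     # (n_components, n_features)

    # Fit a polynomial to every component at once (one column per component).
    coeffs = np.polyfit(x, loadings.T, deg=order)    # (order + 1, n_components)
    fitted = np.vander(x, order + 1) @ coeffs        # (n_features, n_components)

    # Inverse transform: map the fitted components back to observation space.
    return scores @ fitted.T


# Synthetic data (assumption): four spectra that are pure linear baselines.
x = np.linspace(0.0, 10.0, 50)
slopes = np.array([0.5, 1.0, 1.5, 2.0])
offsets = np.array([1.0, 2.0, 0.5, 3.0])
X = np.outer(slopes, x) + offsets[:, None]

baseline = multivariate_polynomial_baseline(X, x, n_components=2, order=1)
```

Because every row of `X` is a straight line, two components and a first-order fit should reproduce the data, so `X - baseline` is close to zero everywhere.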
In both approaches, various models can be used to estimate the baseline.
- 'detrend': remove trends from data. Depending on the order parameter, the detrend can be constant (mean removal), linear (order=1), quadratic (order=2) or cubic (order=3).
- 'asls': Asymmetric Least Squares Smoothing baseline correction. This method is based on the work of Eilers and Boelens ([Eilers and Boelens, 2005]).
- 'snip': Simple Non-Iterative Peak (SNIP) detection algorithm ([Ryan et al., 1988]).
- 'rubberband': Rubberband baseline correction.
- 'polynomial': Fit an nth-degree polynomial to the data. The order of the polynomial is defined by the order parameter. The baseline is then obtained by evaluating the polynomial at each feature defined in the predefined ranges parameter.
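As a rough illustration of the 'polynomial' model, the sketch below fits a polynomial only on the points that fall inside user-supplied ranges and then evaluates it over the whole axis. The helper `polynomial_baseline` and the synthetic spectrum are hypothetical, and the library may handle ranges differently.

```python
import numpy as np


def polynomial_baseline(x, y, ranges, order=1):
    """Hypothetical helper: fit on points inside `ranges`, evaluate on all of x."""
    mask = np.zeros(x.shape, dtype=bool)
    for start, end in ranges:
        lo, hi = min(start, end), max(start, end)
        mask |= (x >= lo) & (x <= hi)
    # Least-squares polynomial through the baseline-only points.
    coeffs = np.polyfit(x[mask], y[mask], deg=order)
    return np.polyval(coeffs, x)


# Synthetic spectrum (assumption): a linear baseline plus one Gaussian peak.
x = np.linspace(0.0, 100.0, 201)
true_baseline = 0.02 * x + 1.0
peak = 5.0 * np.exp(-0.5 * ((x - 50.0) / 3.0) ** 2)
y = true_baseline + peak

baseline = polynomial_baseline(x, y, ranges=[(0.0, 20.0), (80.0, 100.0)], order=1)
corrected = y - baseline
```

The two ranges are chosen far from the peak, so the fit sees essentially only baseline points and the corrected signal keeps the peak height.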
By default, ranges is set to the feature limits (i.e., ranges=[features[0], features[-1]]) and the baseline is fitted on the full range of the dataset.
- Parameters:
log_level (any of ["INFO", "DEBUG", "WARNING", "ERROR"], optional, default: "WARNING") – The log level at startup. It can be changed later on using the set_log_level method or by changing the log_level attribute.
warm_start (bool, optional, default: False) – When fitting repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance), it may be possible to reuse the model learned from the previous parameter value, saving time. When warm_start is True, the existing fitted model attributes are used to initialize the new model in a subsequent call to fit.
asymmetry (float, optional, default: 0.05) – The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as unconstrained least squares.
include_limits (bool, optional, default: True) – Whether to automatically include the feature limits in the specified ranges.
lamb (float, optional, default: 100000.0) – The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).
lls (bool, optional, default: False) – If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the 'snip' model.
max_iter (int, optional, default: 50) – Maximum number of AsLS iterations.
model (any value of ['polynomial', 'detrend', 'asls', 'snip', 'rubberband'], optional, default: 'polynomial') – The model used to determine the baseline.
- polynomial: the baseline correction is determined by an nth-degree polynomial fitted on the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.
- detrend: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.
- asls: the baseline is determined by an asymmetric least squares algorithm.
- snip: the baseline is determined by a simple non-iterative peak detection algorithm.
- rubberband: the baseline is determined by a rubberband algorithm.
multivariate (a boolean or any of ['nmf', 'svd'] (case-insensitive), optional, default: False) – For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline on each row (observation) of the dataset.
n_components (int, optional, default: 5) – Number of components to use for the multivariate method (n_observations >= n_components).
order (an int or any of ['constant', 'linear', 'quadratic', 'cubic', 'pchip'] (case-insensitive), optional, default: 1) – Polynomial order to use for polynomial/pchip interpolation or detrend.
- If an integer is provided, it is the order of the polynomial to fit, e.g., 1 for linear.
- If a string among constant, linear, quadratic and cubic is provided, it is equivalent to order 0 (constant) to 3 (cubic).
- If a string equal to pchip is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).
ranges (list, optional, default: []) – A sequence of feature values or feature regions that are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=start). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.
snip_width (int, optional, default: 0) – The width of the window used to determine the baseline using the SNIP algorithm.
tol (float, optional, default: 0.001) – The tolerance parameter for the AsLS method. Smaller values give a better fit but may increase the number of iterations and the running time. Values should be in the range (0, 1).
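To make the roles of asymmetry, lamb, max_iter and tol more concrete, here is a small dense-matrix sketch in the spirit of the Eilers and Boelens asymmetric least squares smoother. The function `asls_baseline` is hypothetical, the stopping rule based on the relative change of the weights is an assumption, and a production implementation would normally use sparse matrices.

```python
import numpy as np


def asls_baseline(y, lamb=1e5, asymmetry=0.05, max_iter=50, tol=1e-3):
    """Hypothetical helper: dense asymmetric least squares smoother."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator; lamb * D.T @ D penalises curvature.
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lamb * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        # Weighted, penalised least squares: (W + lamb * D'D) z = W y.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the current baseline get the small weight `asymmetry`.
        w_new = np.where(y > z, asymmetry, 1.0 - asymmetry)
        # Assumed stopping rule: relative change of the weights below `tol`.
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z


# Synthetic data (assumption): a straight line, with and without a peak.
x = np.linspace(0.0, 100.0, 201)
line = 0.02 * x + 1.0
peak = 5.0 * np.exp(-0.5 * ((x - 50.0) / 3.0) ** 2)

baseline_line = asls_baseline(line)
baseline_peak = asls_baseline(line + peak)
```

A straight line has zero second differences, so the penalty leaves it unchanged. With the peak added, the small weight given to points above the baseline keeps the estimate below the peak apex.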
See also
get_baseline: Compute a baseline using the Baseline class.
basc: Make a baseline correction using the Baseline class.
asls: Perform an Asymmetric Least Squares Smoothing baseline correction.
snip: Perform a Simple Non-Iterative Peak (SNIP) detection algorithm.
rubberband: Perform a Rubberband baseline correction.
autosub: Perform an automatic subtraction of reference.
detrend: Remove polynomial trend along a dimension from dataset.
Initialize the BaseConfigurable class.
- Parameters:
log_level (int, optional) – The log level at startup. Default is logging.WARNING.
**kwargs (dict) – Additional keyword arguments for configuration.
Attributes Summary
X: Return the X input dataset (possibly modified by the model).
asymmetry: The asymmetry parameter for the AsLS method.
baseline: Computed baseline.
breakpoints: Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints.
config: traitlets.config.Config object.
corrected: Dataset with baseline removed.
include_limits: Whether to automatically include the feature limits in the specified ranges.
lamb: The smoothness parameter for the AsLS method.
lls: If True, the baseline is determined on data transformed using the log-log-square transform.
log: Return log output.
max_iter: Maximum number of AsLS iterations.
model: The model used to determine the baseline.
multivariate: For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset.
n_components: Number of components to use for the multivariate method (n_observations >= n_components).
name: Object name.
order: Polynomial order to use for polynomial/pchip interpolation or detrend.
ranges: A sequence of feature values or feature regions that are assumed to belong to the baseline.
snip_width: The width of the window used to determine the baseline using the SNIP algorithm.
tol: The tolerance parameter for the AsLS method.
used_ranges: The actual ranges used during fitting.
Methods Summary
fit(X): Fit a baseline model on an X dataset.
parameters([replace, removed, default]): Alias for params method.
params([default]): Return current or default configuration values.
plot(**kwargs): Plot the original, baseline and corrected dataset.
reset(): Reset configuration parameters to their default values.
to_dict(): Return config value in a dict form.
Return a dataset with baseline removed.
Attributes Documentation
- X
Return the X input dataset (possibly modified by the model).
- asymmetry
The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as unconstrained least squares.
- baseline
Computed baseline.
- breakpoints
Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints. Breakpoints are useful when you want to compute separate baseline/trends for different segments of the data.
- config
traitlets.config.Config object.
- corrected
Dataset with baseline removed.
- include_limits
Whether to automatically include the feature limits in the specified ranges.
- lamb
The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).
- lls
If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the 'snip' model.
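The compression effect can be sketched with the form of the log-log-square-root operator commonly used together with SNIP. Whether SpectroChemPy uses exactly this expression is an assumption; the function names `lls` and `inverse_lls` are hypothetical.

```python
import math


def lls(y):
    """Hypothetical helper: forward transform, valid for y >= -1."""
    return math.log(math.log(math.sqrt(y + 1.0) + 1.0) + 1.0)


def inverse_lls(v):
    """Hypothetical helper: undo `lls` to return to the original intensity scale."""
    return (math.exp(math.exp(v) - 1.0) - 1.0) ** 2 - 1.0


# Intensities spanning four orders of magnitude end up in a narrow interval.
intensities = [0.0, 1.0, 100.0, 10000.0]
transformed = [lls(v) for v in intensities]
recovered = [inverse_lls(v) for v in transformed]
```

In such a scheme the baseline is estimated on the transformed values and then mapped back with the inverse transform before being subtracted.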
- log
Return log output.
- model
The model used to determine the baseline.
- polynomial: the baseline correction is determined by an nth-degree polynomial fitted on the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.
- detrend: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.
- asls: the baseline is determined by an asymmetric least squares algorithm.
- snip: the baseline is determined by a simple non-iterative peak detection algorithm.
- rubberband: the baseline is determined by a rubberband algorithm.
- multivariate
For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline on each row (observation) of the dataset.
- n_components
Number of components to use for the multivariate method (n_observations >= n_components).
- name
Object name
- order
Polynomial order to use for polynomial/pchip interpolation or detrend.
- If an integer is provided, it is the order of the polynomial to fit, e.g., 1 for linear.
- If a string among constant, linear, quadratic and cubic is provided, it is equivalent to order 0 (constant) to 3 (cubic).
- If a string equal to pchip is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).
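The equivalence between the string names and integer orders can be sketched as follows, together with a plain polynomial detrend. Both `resolve_order` and `detrend` are hypothetical helpers written for this illustration, and the 'pchip' option is deliberately left out.

```python
import numpy as np

# Assumed mapping, following the description above (case-insensitive names).
ORDER_NAMES = {"constant": 0, "linear": 1, "quadratic": 2, "cubic": 3}


def resolve_order(order):
    """Hypothetical helper: turn an order given as int or name into an int."""
    if isinstance(order, str):
        return ORDER_NAMES[order.lower()]
    return int(order)


def detrend(x, y, order="linear"):
    """Hypothetical helper: subtract the least-squares polynomial trend."""
    degree = resolve_order(order)
    trend = np.polyval(np.polyfit(x, y, degree), x)
    return y - trend


x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x

flat = detrend(x, y, order="Linear")       # removes the whole straight line
centered = detrend(x, y, order="constant")  # order 0: mean removal only
```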
- ranges
A sequence of feature values or feature regions that are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=start). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.
- snip_width
The width of the window used to determine the baseline using the SNIP algorithm.
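The role of the window width can be illustrated with a basic peak-clipping loop: at each step p, a point is replaced by the mean of its two neighbours p samples away whenever that mean is lower. This is only a sketch of the general idea; the helper `snip_baseline` is hypothetical, it skips the log-log-square transform, and the library's implementation may differ.

```python
import numpy as np


def snip_baseline(y, snip_width):
    """Hypothetical helper: clip peaks with windows of half-width 1..snip_width."""
    v = np.asarray(y, dtype=float).copy()
    n = v.size
    for p in range(1, snip_width + 1):
        if 2 * p >= n:
            break  # window no longer fits inside the data
        clipped = v.copy()
        # Mean of the neighbours p samples to the left and to the right.
        neighbour_mean = 0.5 * (v[: n - 2 * p] + v[2 * p:])
        clipped[p : n - p] = np.minimum(v[p : n - p], neighbour_mean)
        v = clipped
    return v


# Synthetic data (assumption): a linear baseline with one narrow peak.
true_baseline = 0.1 * np.arange(100) + 2.0
y = true_baseline.copy()
y[48:53] += np.array([1.0, 3.0, 5.0, 3.0, 1.0])

baseline = snip_baseline(y, snip_width=10)
```

The window has to be at least as wide as the peak for the clipping to reach baseline points on both sides. Here the peak spans five samples, so a width of 10 is enough to recover the straight line.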
- tol
The tolerance parameter for the AsLS method. Smaller values give a better fit but may increase the number of iterations and the running time. Values should be in the range (0, 1).
- used_ranges
The actual ranges used during fitting.
The feature limits are included where applicable, and the returned list is trimmed, cleaned and ordered.
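One plausible reading of this clean-up is sketched below in plain Python: single values become (value, value) pairs, each pair is ordered, ranges are clipped to the feature limits, the limits are optionally appended, and overlapping ranges are merged. The helper `normalize_ranges` and the exact meaning given to "trimmed" and "cleaned" are assumptions, not the library's code.

```python
def normalize_ranges(ranges, limits, include_limits=True):
    """Hypothetical helper: return sorted, merged (start, end) pairs."""
    lo, hi = float(min(limits)), float(max(limits))
    pairs = []
    for item in ranges:
        if isinstance(item, (int, float)):
            start = end = float(item)  # single value -> (value, value)
        else:
            start, end = sorted(float(v) for v in item)
        # Trim to the feature limits and drop ranges that fall outside them.
        start, end = max(start, lo), min(end, hi)
        if start <= end:
            pairs.append((start, end))
    if include_limits:
        pairs.extend([(lo, lo), (hi, hi)])

    # Order the pairs and merge the ones that overlap.
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


used = normalize_ranges([(3500, 3000), 1800, (2900, 3100)], limits=(4000, 1000))
```

In this example the reversed pair (3500, 3000) is reordered and merged with (2900, 3100), and both feature limits are added as single-point ranges.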
Methods Documentation
- fit(X)[source]
Fit a baseline model on an X dataset.
- Parameters:
X (NDDataset or array-like of shape (n_observations, n_features)) – Training data.
- Returns:
self – The fitted instance itself.
- parameters(default=False)[source]
Alias for params method.
Deprecated since version 0.8.0: Use params instead.
- plot(**kwargs)[source]
Plot the original, baseline and corrected dataset.
- Parameters:
**kwargs (keyword parameters, optional) – See Other Parameters.
- Returns:
Axes – Matplotlib subplot axes.
- Other Parameters:
colors (tuple or ndarray of 3 colors, optional) – Colors for the original, baseline and corrected data. In the case of 2D data, the default colormap is used for the original data. By default, the three colors are NBlue, NGreen and NRed (which are colorblind friendly).
offset (float, optional, default: None) – Specify the separation (in percent) between the original and corrected data.
nb_traces (int or 'all', optional) – Number of lines to display. Default is 'all'.
**others (Other keyword parameters) – Parameters passed to the internal plot method of the datasets.
Examples using spectrochempy.Baseline