spectrochempy.Baseline
- class Baseline(log_level='WARNING', warm_start=False, *, asymmetry=0.05, include_limits=True, lamb=100000.0, lls=False, max_iter=50, model='polynomial', multivariate=False, n_components=5, order=1, ranges, snip_width=0, tol=0.001)
Baseline Correction processor.
The baseline correction can be applied to 1D datasets consisting of a single row with n_features or to a 2D dataset with shape (n_observations, n_features).
When dealing with 2D datasets, the baseline correction can be applied either sequentially (default) or using a multivariate approach (parameter multivariate set to True).
The 'sequential' approach, which can be used for both 1D and 2D datasets, consists of fitting the baseline sequentially for each observation row (spectrum).
The 'multivariate' approach can only be applied to 2D datasets (at least 3 observations). The 2D dataset is first reduced to several principal components using a conventional singular value decomposition (SVD) or a non-negative matrix factorization (NMF). Each component is then fitted, and an inverse transform is performed to recover the baseline correction.
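The three steps of the multivariate approach (reduce, fit each component, inverse transform) can be sketched with NumPy alone. This is only an illustration under stated assumptions: the function name multivariate_baseline is hypothetical, the SVD variant is used, and each component is fitted with a polynomial over all features rather than over selected ranges. It is not the library's implementation.

```python
import numpy as np

def multivariate_baseline(X, n_components=2, order=1):
    """Illustrative multivariate polynomial baseline (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    x = np.arange(X.shape[1], dtype=float)
    # 1. Reduce the dataset: X ~ (U * s) @ Vt, keeping n_components components.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    # 2. Fit a baseline (here a polynomial over all features) to each component.
    fitted = np.array([np.polyval(np.polyfit(x, v, order), x) for v in Vt])
    # 3. Inverse transform: recombine the fitted components with the scores.
    return (U * s) @ fitted

# Three observations made of purely linear backgrounds (rank 2 dataset).
x = np.arange(50.0)
X = np.array([1.0 + 0.1 * x, 2.0 - 0.05 * x, 0.5 + 0.2 * x])
B = multivariate_baseline(X, n_components=2, order=1)
```

Because every row here is a straight line, the fitted baseline reproduces the dataset and the corrected data X - B are essentially zero.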
In both approaches, various models can be used to estimate the baseline.
'detrend': remove trends from data. Depending on the order parameter, the detrend can be constant (mean removal, order=0), linear (order=1), quadratic (order=2) or cubic (order=3).
'asls': Asymmetric Least Squares Smoothing baseline correction. This method is based on the work of Eilers and Boelens ([Eilers and Boelens, 2005]).
'snip': Simple Non-Iterative Peak (SNIP) detection algorithm ([Ryan et al., 1988]).
'rubberband': Rubberband baseline correction.
'polynomial': fit an nth-degree polynomial to the data. The order of the polynomial is defined by the order parameter. The polynomial is fitted on the features belonging to the ranges parameter, and the baseline is then obtained by evaluating it at each feature.
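As a rough illustration of the 'polynomial' model, the sketch below fits a polynomial only on the points that fall inside the given ranges and evaluates it over every feature. The helper name polynomial_baseline and the synthetic spectrum are assumptions made for the example; the library's own code may select and weight points differently.

```python
import numpy as np

def polynomial_baseline(x, y, ranges, order=1):
    """Fit a polynomial on the points inside `ranges`, evaluate it everywhere."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    for start, end in ranges:
        lo, hi = min(start, end), max(start, end)
        mask |= (x >= lo) & (x <= hi)
    coeffs = np.polyfit(x[mask], y[mask], order)
    return np.polyval(coeffs, x)

# Synthetic spectrum: linear background plus one Gaussian peak at x = 50.
x = np.linspace(0.0, 100.0, 201)
true_baseline = 2.0 + 0.03 * x
y = true_baseline + 5.0 * np.exp(-0.5 * ((x - 50.0) / 3.0) ** 2)

# Only the peak-free regions are declared as baseline.
baseline = polynomial_baseline(x, y, ranges=[(0, 20), (80, 100)], order=1)
corrected = y - baseline
```

Declaring only peak-free regions keeps the peak from biasing the fit, which is the point of the ranges parameter.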
By default, ranges is set to the feature limits (i.e., ranges=[features[0], features[-1]]) and the baseline is fitted on the full range of the dataset.
- Parameters:
log_level (any of ["INFO", "DEBUG", "WARNING", "ERROR"], optional, default: "WARNING") – The log level at startup. It can be changed later on using the set_log_level method or by changing the log_level attribute.
warm_start (bool, optional, default: False) – When fitting repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance), it may be possible to reuse the model learned from the previous parameter value, saving time. When warm_start is True, the existing fitted model attributes are used to initialize the new model in a subsequent call to fit.
asymmetry (float, optional, default: 0.05) – The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as the unconstrained least squares.
include_limits (bool, optional, default: True) – Whether to automatically include the feature limits in the specified ranges.
lamb (float, optional, default: 100000.0) – The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).
lls (bool, optional, default: False) – If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the 'snip' model.
max_iter (int, optional, default: 50) – Maximum number of AsLS iterations.
model (any value of ['polynomial', 'detrend', 'asls', 'snip', 'rubberband'], optional, default: 'polynomial') – The model used to determine the baseline.
polynomial: the baseline correction is determined by an nth-degree polynomial fitted on the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.
detrend: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.
asls: the baseline is determined by an asymmetric least squares algorithm.
snip: the baseline is determined by a simple non-iterative peak detection algorithm.
rubberband: the baseline is determined by a rubberband algorithm.
multivariate (a boolean or any of ['nmf', 'svd'] (case-insensitive), optional, default: False) – For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline on each row (observation) of the dataset.
n_components (int, optional, default: 5) – Number of components to use for the multivariate method (n_observations >= n_components).
order (an int or any of ['constant', 'linear', 'quadratic', 'cubic', 'pchip'] (case-insensitive), optional, default: 1) – Polynomial order to use for polynomial/pchip interpolation or detrend.
If an integer is provided, it is the order of the polynomial to fit, e.g. 1 for linear.
If a string among constant, linear, quadratic and cubic is provided, it is equivalent to order 0 (constant) to 3 (cubic).
If a string equal to pchip is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).
ranges (list, optional, default: []) – A sequence of feature values or feature regions which are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=start). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.
snip_width (int, optional, default: 0) – The width of the window used to determine the baseline using the SNIP algorithm.
tol (float, optional, default: 0.001) – The tolerance parameter for the AsLS method. Smaller values make the fit better but potentially increase the number of iterations and the running time. Values should be in the range (0, 1).
See also
get_baseline
Compute a baseline using the Baseline class.
basc
Make a baseline correction using the Baseline class.
asls
Perform an Asymmetric Least Squares Smoothing baseline correction.
snip
Perform a Simple Non-Iterative Peak (SNIP) detection algorithm.
rubberband
Perform a Rubberband baseline correction.
autosub
Perform an automatic subtraction of reference.
detrend
Remove polynomial trend along a dimension from dataset.
Initialize the BaseConfigurable class.
- Parameters:
log_level (int, optional) – The log level at startup. Default is logging.WARNING.
**kwargs (dict) – Additional keyword arguments for configuration.
Attributes Summary
X – Return the X input dataset (possibly modified by the model).
asymmetry – The asymmetry parameter for the AsLS method.
baseline – Computed baseline.
breakpoints – Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints.
config – traitlets.config.Config object.
corrected – Dataset with baseline removed.
include_limits – Whether to automatically include the feature limits in the specified ranges.
lamb – The smoothness parameter for the AsLS method.
lls – If True, the baseline is determined on data transformed using the log-log-square transform.
log – Return log output.
max_iter – Maximum number of AsLS iterations.
model – The model used to determine the baseline.
multivariate – For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset.
n_components – Number of components to use for the multivariate method (n_observations >= n_components).
name – Object name.
order – Polynomial order to use for polynomial/pchip interpolation or detrend.
ranges – A sequence of feature values or feature regions which are assumed to belong to the baseline.
snip_width – The width of the window used to determine the baseline using the SNIP algorithm.
tol – The tolerance parameter for the AsLS method.
used_ranges – The actual ranges used during fitting.
Methods Summary
fit(X) – Fit a baseline model on an X dataset.
parameters([replace, removed, default]) – Alias for the params method.
params([default]) – Return current or default configuration values.
plot(**kwargs) – Plot the original, baseline and corrected dataset.
reset() – Reset configuration parameters to their default values.
to_dict() – Return config value in a dict form.
Return a dataset with baseline removed.
Attributes Documentation
- X
Return the X input dataset (possibly modified by the model).
- asymmetry
The asymmetry parameter for the AsLS method. It is typically between 0.001 and 0.1. 0.001 gives almost the same fit as the unconstrained least squares.
- baseline
Computed baseline.
- breakpoints
Breakpoints to define piecewise segments of the data, specified as a vector containing coordinate values or indices indicating the location of the breakpoints. Breakpoints are useful when you want to compute separate baseline/trends for different segments of the data.
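To illustrate what separate trends per segment means, here is a minimal sketch that splits a 1D signal at index breakpoints and removes an independent polynomial trend from each piece. The helper piecewise_detrend is hypothetical, and it assumes breakpoints are given as indices; how the library handles coordinate values or continuity at the breakpoints is not covered here.

```python
import numpy as np

def piecewise_detrend(y, breakpoints, order=1):
    """Remove an independent polynomial trend from each segment of `y`."""
    y = np.asarray(y, dtype=float)
    # Segment edges: start of data, each breakpoint index, end of data.
    edges = [0, *sorted(breakpoints), len(y)]
    out = np.empty_like(y)
    for lo, hi in zip(edges[:-1], edges[1:]):
        x = np.arange(lo, hi, dtype=float)
        trend = np.polyval(np.polyfit(x, y[lo:hi], order), x)
        out[lo:hi] = y[lo:hi] - trend
    return out

# Two linear segments with different slopes, joined at index 10.
y = np.concatenate([0.5 * np.arange(10), 10.0 - 2.0 * np.arange(15)])
residual = piecewise_detrend(y, breakpoints=[10], order=1)
```

A single linear detrend over the whole signal would leave a kink in the residual; with a breakpoint at index 10 each segment is flattened on its own.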
- config
traitlets.config.Config object.
- corrected
Dataset with baseline removed.
- include_limits
Whether to automatically include the features limits to the specified ranges.
- lamb
The smoothness parameter for the AsLS method. Larger values make the baseline stiffer. Values should be in the range (0, 1e9).
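The way lamb, asymmetry, tol and max_iter interact can be seen in a compact sketch of the asymmetric least squares scheme described by Eilers and Boelens. This is an illustrative dense NumPy version, not the library's code: the function name asls_baseline is hypothetical, and the stopping rule (relative change of the weights below tol) is an assumption.

```python
import numpy as np

def asls_baseline(y, lamb=1e5, asymmetry=0.05, tol=1e-3, max_iter=50):
    """Asymmetric least squares baseline (dense, illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator; lamb * D.T @ D penalizes curvature.
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lamb * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        # Weighted, penalized least squares solve for the current weights.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get the small weight `asymmetry`.
        w_new = np.where(y > z, asymmetry, 1.0 - asymmetry)
        # Assumed stopping rule: relative change of the weights below `tol`.
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z

# Synthetic spectrum: linear background plus one Gaussian peak.
x = np.arange(200.0)
true_baseline = 1.0 + 0.01 * x
y = true_baseline + 5.0 * np.exp(-0.5 * ((x - 100.0) / 5.0) ** 2)
baseline = asls_baseline(y, lamb=1e5, asymmetry=0.01)
```

A larger lamb makes the curvature penalty dominate (stiffer baseline), while a smaller asymmetry lets the baseline ignore points lying above it.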
- lls
If True, the baseline is determined on data transformed using the log-log-square transform. This compresses the dynamic range of the signal and thus emphasizes smaller features. This parameter is always True for the 'snip' model.
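The compression effect can be checked numerically. The sketch below assumes the log-log-square-root form of the operator that is often paired with SNIP, v = log(log(sqrt(y + 1) + 1) + 1); whether the library uses exactly this expression is an assumption, and both function names are hypothetical.

```python
import math

def lls(y):
    """Assumed LLS operator: compresses the dynamic range of y (y >= 0)."""
    return math.log(math.log(math.sqrt(y + 1.0) + 1.0) + 1.0)

def inverse_lls(v):
    """Inverse of the assumed LLS operator."""
    return (math.exp(math.exp(v) - 1.0) - 1.0) ** 2 - 1.0

# Two intensities that differ by a factor of 100 before the transform.
small, large = 100.0, 10000.0
ratio_after = lls(large) / lls(small)
```

After the transform the two values differ by much less than a factor of 2, which is why small features carry more relative weight in the transformed data.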
- log
Return log output.
- model
The model used to determine the baseline.
polynomial: the baseline correction is determined by an nth-degree polynomial fitted on the data belonging to the selected ranges. The order parameter determines the degree of the polynomial.
detrend: removes a constant, linear or polynomial trend from the data. The order of the trend is determined by the order parameter.
asls: the baseline is determined by an asymmetric least squares algorithm.
snip: the baseline is determined by a simple non-iterative peak detection algorithm.
rubberband: the baseline is determined by a rubberband algorithm.
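The rubberband model is not detailed here. A common way to implement the idea is to stretch a "rubber band" under the spectrum, i.e. take the lower convex hull of the points and interpolate linearly between its vertices. The sketch below follows that reading; it is an assumption about the technique, with a hypothetical function name, and the library's algorithm may differ.

```python
def rubberband_baseline(x, y):
    """Lower convex hull of (x, y), interpolated at every x (x ascending)."""
    hull = []  # indices of the hull vertices, left to right
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord hull[-2] -> i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    baseline = []
    k = 0
    for i in range(len(x)):
        # Advance to the hull segment that contains point i.
        while hull[k + 1] < i:
            k += 1
        a, b = hull[k], hull[k + 1]
        t = (x[i] - x[a]) / (x[b] - x[a])
        baseline.append(y[a] + t * (y[b] - y[a]))
    return baseline

x = [0, 1, 2, 3, 4, 5, 6]
y = [1.0, 0.5, 3.0, 0.2, 4.0, 0.6, 1.2]
baseline = rubberband_baseline(x, y)
```

In this small example the two peaks at x = 2 and x = 4 are bridged by straight segments between the neighbouring minima, so the baseline never exceeds the data.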
- multivariate
For 2D datasets, if True or if multivariate='svd' or 'nmf', a multivariate method is used to fit a baseline on the principal components determined using an SVD decomposition if multivariate='svd' or True, or an NMF factorization if multivariate='nmf', followed by an inverse transform to retrieve the baseline-corrected dataset. If False, a sequential method is used, which consists of fitting a baseline on each row (observation) of the dataset.
- n_components
Number of components to use for the multivariate method (n_observations >= n_components).
- name
Object name
- order
Polynomial order to use for polynomial/pchip interpolation or detrend.
If an integer is provided, it is the order of the polynomial to fit, e.g. 1 for linear.
If a string among constant, linear, quadratic and cubic is provided, it is equivalent to order 0 (constant) to 3 (cubic).
If a string equal to pchip is provided, the polynomial interpolation is replaced by a piecewise cubic Hermite interpolation (see scipy.interpolate.PchipInterpolator).
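The equivalence between order names and integer degrees can be illustrated with a small detrend sketch. The names ORDER_NAMES, resolve_order and detrend are hypothetical helpers written for this example; the 'pchip' option is deliberately left out, so passing it raises a KeyError.

```python
import numpy as np

# Assumed mapping between order names and polynomial degrees.
ORDER_NAMES = {"constant": 0, "linear": 1, "quadratic": 2, "cubic": 3}

def resolve_order(order):
    """Translate an order given as a name (case-insensitive) or an int."""
    if isinstance(order, str):
        return ORDER_NAMES[order.lower()]
    return int(order)

def detrend(y, order="linear"):
    """Subtract the least-squares polynomial of the requested order."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float)
    degree = resolve_order(order)
    return y - np.polyval(np.polyfit(x, y, degree), x)

x = np.arange(20.0)
y = 3.0 + 0.5 * x + 0.01 * x**2
mean_removed = detrend(y, "constant")   # same as subtracting the mean
flat = detrend(y, "Quadratic")          # same as detrend(y, 2)
```

With order 0 the fitted polynomial is the mean of the data, which is why the constant detrend amounts to mean removal.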
- ranges
A sequence of feature values or feature regions which are assumed to belong to the baseline. Feature ranges are defined as a list of 2 numerical values (start, end). Single values are internally converted to a pair (start=value, end=start). The limits of the spectra are automatically added during the fit process unless the include_limits parameter is False.
- snip_width
The width of the window used to determine the baseline using the SNIP algorithm.
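The role of the window width can be illustrated with the peak-clipping loop usually associated with SNIP: for window half-widths from 1 up to snip_width, each point is replaced by the smaller of its value and the mean of its two neighbours at that distance, working on transformed data. The sketch below is an assumption about the technique (including the form of the transform and its inverse), with a hypothetical function name; it is not the library's code.

```python
import math

def snip_baseline(y, snip_width):
    """Illustrative SNIP-style peak clipping on non-negative data."""
    n = len(y)
    # Assumed LLS transform: compress the dynamic range before clipping.
    v = [math.log(math.log(math.sqrt(val + 1.0) + 1.0) + 1.0) for val in y]
    for p in range(1, snip_width + 1):
        prev = v[:]
        for i in range(p, n - p):
            # Clip each point to the mean of its neighbours at distance p.
            v[i] = min(prev[i], 0.5 * (prev[i - p] + prev[i + p]))
    # Inverse transform back to the original intensity scale.
    return [(math.exp(math.exp(val) - 1.0) - 1.0) ** 2 - 1.0 for val in v]

# Flat background of 10 with a narrow Gaussian peak centred at index 50.
y = [10.0 + 50.0 * math.exp(-0.5 * ((i - 50) / 2.0) ** 2) for i in range(101)]
baseline = snip_baseline(y, snip_width=10)
```

The width should exceed the half-width of the broadest peak to remove; here a width of 10 points clips the peak down to the flat background.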
- tol
The tolerance parameter for the AsLS method. Smaller values make the fit better but potentially increase the number of iterations and the running time. Values should be in the range (0, 1).
- used_ranges
The actual ranges used during fitting.
Where applicable, the feature limits are included, and the returned list is trimmed, cleaned and ordered.
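One plausible reading of this clean-up is sketched below: single values become (value, value) pairs, each pair is ordered, the feature limits are optionally appended, and overlapping ranges are merged and sorted. The helper normalize_ranges is hypothetical and only illustrates the idea; the library's exact rules (for example, trimming to the feature limits) may differ.

```python
def normalize_ranges(ranges, features, include_limits=True):
    """Turn a mixed list of values and (start, end) pairs into sorted, merged pairs."""
    pairs = []
    for item in ranges:
        if isinstance(item, (int, float)):
            # A single value is converted to a degenerate pair.
            pairs.append((float(item), float(item)))
        else:
            start, end = item
            pairs.append((float(min(start, end)), float(max(start, end))))
    if include_limits:
        # Add the first and last feature as single-point ranges.
        pairs.append((float(features[0]), float(features[0])))
        pairs.append((float(features[-1]), float(features[-1])))
    pairs.sort()
    merged = []
    for start, end in pairs:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching ranges are merged.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

used = normalize_ranges([(30, 10), 50, (20, 40)], features=[0, 100])
```

In this example the reversed pair (30, 10) is reordered, the overlapping ranges collapse into (10, 40), and the limits 0 and 100 are added as single-point ranges.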
Methods Documentation
- fit(X)
Fit a baseline model on an X dataset.
- Parameters:
X (NDDataset or array-like of shape (n_observations, n_features)) – Training data.
- Returns:
self – The fitted instance itself.
- parameters(default=False)
Alias for the params method (decorated with replace="params", removed="0.7.1").
- plot(**kwargs)
Plot the original, baseline and corrected dataset.
- Parameters:
**kwargs (keyword parameters, optional) – See Other Parameters.
- Returns:
Axes – Matplotlib subplot axes.
- Other Parameters:
colors (tuple or ndarray of 3 colors, optional) – Colors for the original, baseline and corrected data. In the 2D case, the default colormap is used for the original data. By default, the three colors are NBlue, NGreen and NRed (which are colorblind friendly).
offset (float, optional, default: None) – Specify the separation (in percent) between the original and corrected data.
nb_traces (int or 'all', optional) – Number of lines to display. Default is 'all'.
**others (other keyword parameters) – Parameters passed to the internal plot method of the datasets.