Warning

You are reading the documentation for the development version. For the stable release, refer to the documentation of that release.

spectrochempy.PLSRegression

class PLSRegression(*, log_level='WARNING', warm_start=False, max_iter=500, n_components=2, scale=True, tol=1e-06)[source]

Partial Least Squares regression (PLSRegression).

This class wraps the sklearn.cross_decomposition.PLSRegression model and adds a few methods.

Parameters
  • log_level (any of ["INFO", "DEBUG", "WARNING", "ERROR"], optional, default: "WARNING") – The log level at startup. It can be changed later on using the set_log_level method or by changing the log_level attribute.

  • warm_start (bool, optional, default: False) – When fitting repeatedly on the same dataset for multiple parameter values (for example, to find the value that maximizes performance), it may be possible to reuse the model learned with the previous parameter value, saving time.

    When warm_start is True, the existing fitted model attributes are used to initialize the new model in a subsequent call to fit.

  • max_iter (int, optional, default: 500) – The maximum number of iterations of the power method when algorithm=’nipals’. Ignored otherwise.

  • n_components (int, optional, default: 2) – Number of components to keep. Should be in the range [1, min(n_samples, n_features, n_targets)].

  • scale (bool, optional, default: True) – Whether to scale X and Y.

  • tol (float, optional, default: 1e-06) – The tolerance used as the convergence criterion in the power method: the algorithm stops whenever the squared norm of u_i - u_{i-1} is less than tol, where u corresponds to the left singular vector.
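The roles of max_iter and tol can be illustrated with a minimal pure-Python sketch of a NIPALS-style power iteration for the first component. It assumes an inner loop of the kind used by scikit-learn's NIPALS implementation and already centered X and Y; the function name first_weights_power_method and the helper functions are hypothetical and are not part of spectrochempy.

```python
import math


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def first_weights_power_method(X, Y, max_iter=500, tol=1e-06):
    """Return (x_weights, n_iter) for the first component (hypothetical helper).

    X has shape (n_samples, n_features) and Y has shape (n_samples, n_targets);
    both are assumed to be centered already.
    """
    Xt, Yt = transpose(X), transpose(Y)
    y_score = [row[0] for row in Y]        # start from the first target column
    x_weights_old = [100.0] * len(Xt)      # dummy value, forces a second pass
    for n_iter in range(1, max_iter + 1):
        # Regress the X columns on y_score, then normalize to unit length.
        x_weights = [w / dot(y_score, y_score) for w in matvec(Xt, y_score)]
        norm = math.sqrt(dot(x_weights, x_weights))
        x_weights = [w / norm for w in x_weights]
        x_score = matvec(X, x_weights)
        # Regress the Y columns on x_score and update y_score.
        y_weights = [w / dot(x_score, x_score) for w in matvec(Yt, x_score)]
        y_score = [s / dot(y_weights, y_weights) for s in matvec(Y, y_weights)]
        # Stop when the squared norm of u_i - u_{i-1} falls below tol.
        diff = [a - b for a, b in zip(x_weights, x_weights_old)]
        if dot(diff, diff) < tol:
            break
        x_weights_old = x_weights
    return x_weights, n_iter


# Small centered example: 4 samples, 2 features, 2 targets.
X_demo = [[-1.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [1.5, 0.5]]
Y_demo = [[-1.0, -2.0], [0.0, 1.0], [0.0, -1.0], [1.0, 2.0]]
weights, iterations = first_weights_power_method(X_demo, Y_demo)
```

In this sketch, a smaller tol requires more iterations before the loop exits, and max_iter caps the number of passes when the criterion is never met.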

Attributes Summary

X

Return the X input dataset (possibly modified by the model).

Y

The Y input.

components

NDDataset with components in feature space (n_components, n_features).

config

traitlets.config.Config object.

log

Return log output.

max_iter

The maximum number of iterations of the power method when algorithm='nipals'.

n_components

Number of components to keep.

name

Object name

scale

Whether to scale X and Y.

tol

The tolerance used as the convergence criterion in the power method.

Methods Summary

fit(X, Y)

Fit the PLSRegression model on X and Y.

fit_transform(X, Y[, both])

Fit the model with X and Y and apply the dimensionality reduction on X and optionally on Y.

get_components([n_components])

Return the component's dataset: (selected n_components, n_features).

inverse_transform([X_transform, ...])

Transform data back to its original space.

parameters([default])

Alias for params method.

params([default])

Current or default configuration values.

parityplot([Y, Y_hat, clear])

Plot the predicted (\(\hat{Y}\)) vs measured (\(Y\)) values.

plotmerit([X, X_hat])

Plot the input (X), reconstructed (X_hat) and residuals.

predict([X])

Predict targets of given observations.

reconstruct([X_transform])

Transform data back to its original space.

reduce([X])

Apply dimensionality reduction to X.

reset()

Reset configuration parameters to their default values.

score([X, Y, sample_weight])

Return the coefficient of determination of the prediction.

to_dict()

Return config value in a dict form.

transform([X, Y, both])

Apply dimensionality reduction to X and Y.

Attributes Documentation

X

Return the X input dataset (possibly modified by the model).

Y

The Y input.

components

NDDataset with components in feature space (n_components, n_features).

See also

get_components

Retrieve only the specified number of components.

config

traitlets.config.Config object.

log

Return log output.

max_iter

The maximum number of iterations of the power method when algorithm=’nipals’. Ignored otherwise.

n_components

Number of components to keep. Should be in the range [1, min(n_samples, n_features, n_targets)].

name

Object name

scale

Whether to scale X and Y.
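As a rough illustration of what scaling involves, the sketch below centers each column and, when scale is True, divides it by its sample standard deviation. It assumes the scikit-learn convention (standard deviation with ddof=1, constant columns left unscaled); center_scale is a hypothetical helper, not a spectrochempy function.

```python
import statistics


def center_scale(columns, scale=True):
    """Center each column; if scale is True, also divide by its sample std.

    `columns` is a list of columns (each a list of floats). Hypothetical helper.
    """
    result = []
    for col in columns:
        mean = statistics.fmean(col)
        # statistics.stdev uses n - 1 in the denominator (ddof=1).
        std = statistics.stdev(col) if scale else 1.0
        if std == 0.0:
            std = 1.0  # constant column: avoid division by zero
        result.append([(value - mean) / std for value in col])
    return result


scaled = center_scale([[2.0, 4.0, 6.0]], scale=True)
centered_only = center_scale([[2.0, 4.0, 6.0]], scale=False)
```

With scale=False the columns are only centered, so features with large variance carry more weight in the fit.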

tol

The tolerance used as the convergence criterion in the power method: the algorithm stops whenever the squared norm of u_i - u_{i-1} is less than tol, where u corresponds to the left singular vector.

Methods Documentation

fit(X, Y)[source]

Fit the PLSRegression model on X and Y.

Parameters
Returns

self – The fitted instance itself.

See also

fit_transform

Fit the model with an input dataset X and apply the dimensionality reduction on X.

fit_reduce

Alias of fit_transform (Deprecated).

fit_transform(X, Y, both=False)[source]

Fit the model with X and Y and apply the dimensionality reduction on X and optionally on Y.

Parameters
Returns

NDDataset – Dataset with shape (n_observations, n_components).

get_components(n_components=None)

Return the component’s dataset: (selected n_components, n_features).

Parameters

n_components (int, optional, default: None) – The number of components to keep in the output dataset. If None, all calculated components are returned.

Returns

NDDataset – Dataset with shape (n_components, n_features)

inverse_transform(X_transform=None, Y_transform=None, both=False, **kwargs)

Transform data back to its original space.

In other words, return reconstructed X and Y whose reduce/transform would be X_transform and Y_transform.

Parameters
  • X_transform (array-like of shape (n_observations, n_components), optional) – Reduced X data, where n_observations is the number of observations and n_components is the number of components. If X_transform is not provided, a transform of X provided in fit is performed first.

  • Y_transform (NDDataset or array-like of shape (n_observations, n_components), optional) – New data, where n_targets is the number of variables to predict. If Y_transform is not provided, a transform of Y provided in fit is performed first.

  • **kwargs (keyword parameters, optional) – See Other Parameters.

Returns

NDDataset – Dataset with shape (n_observations, n_features).

Other Parameters

n_components (int, optional) – The number of components to use for the reduction. If not given, the number of components is the one specified or determined in the fit process.

See also

reconstruct

Alias of inverse_transform (Deprecated).
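The reconstruction of X from its scores can be sketched as follows, assuming the scikit-learn convention in which the scores are multiplied by the transposed X loadings and the centering and scaling applied during fit are then undone. The function inverse_transform_sketch and its arguments (loadings, mean, std) are hypothetical names used for illustration only.

```python
def inverse_transform_sketch(scores, loadings, mean, std):
    """Rebuild X_hat with shape (n_observations, n_features).

    scores:   n_observations rows of n_components values
    loadings: n_features rows of n_components values
    mean/std: per-feature centering and scaling values (assumed known from fit)
    """
    X_hat = []
    for t in scores:
        row = []
        for j, p in enumerate(loadings):
            value = sum(tk * pk for tk, pk in zip(t, p))  # entry of T P^T
            row.append(value * std[j] + mean[j])          # undo scale, then center
        X_hat.append(row)
    return X_hat


X_hat_demo = inverse_transform_sketch(
    scores=[[1.0, 0.0], [0.0, 2.0]],
    loadings=[[1.0, 0.5], [2.0, -1.0], [0.0, 1.0]],
    mean=[10.0, 20.0, 30.0],
    std=[1.0, 2.0, 1.0],
)
```

The output has one row per observation and one column per feature, which matches the documented return shape (n_observations, n_features).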

parameters(default=False)[source]

Alias for the params method. Deprecated: use params instead (marked for removal in version 0.7.1).

params(default=False)[source]

Current or default configuration values.

Parameters

default (bool, optional, default: False) – If default is True, the default parameters are returned, else the current values.

Returns

dict – Current or default configuration values.

parityplot(Y=None, Y_hat=None, clear=True, **kwargs)[source]

Plot the predicted (\(\hat{Y}\)) vs measured (\(Y\)) values.

\(Y\) and \(\hat{Y}\) can be passed as arguments. If not, the Y attribute is used for \(Y\) and \(\hat{Y}\) is computed by the inverse_transform method.

Parameters
  • Y (NDDataset, optional) – Measured values. If not provided (default), the Y attribute is used and Y_hat is computed using inverse_transform.

  • Y_hat (NDDataset, optional) – Predicted values. If Y is provided, Y_hat must also be provided, computed externally.

  • clear (bool, optional) – Whether to plot on a new axes. Default is True.

  • **kwargs (keyword parameters, optional) – See Other Parameters.

Returns

Axes – Matplotlib subplot axes.

Other Parameters
  • s (float or array-like, shape (n, ), optional) – The marker size in points**2 (typographic points are 1/72 in.). Default is rcParams[‘lines.markersize’] ** 2.

  • c (array-like or list of colors or color, optional) – The marker colors. Possible values:

    • A scalar or sequence of n numbers to be mapped to colors using cmap and norm.

    • A 2D array in which the rows are RGB or RGBA.

    • A sequence of colors of length n.

    • A single color format string. See scatter for details.

  • marker (MarkerStyle, default: rcParams[“scatter.marker”] (default: ‘o’)) – The marker style. marker can be either an instance of the class or the text shorthand for a particular marker. See markers for more information.

  • cmap (str or Colormap, default: rcParams[“image.cmap”] (default: ‘viridis’)) – The Colormap instance or registered colormap name used to map scalar data to colors. This parameter is ignored if c is RGB(A).

  • norm (str or Normalize, optional) – The normalization method used to scale scalar data to the [0, 1] range before mapping to colors using cmap. By default, a linear scaling is used, mapping the lowest value to 0 and the highest to 1. If given, this can be one of the following:

    • An instance of Normalize or one of its subclasses (see Colormap Normalization).

    • A scale name, i.e. one of “linear”, “log”, “symlog”, “logit”, etc. For a list of available scales, call matplotlib.scale.get_scale_names(). In that case, a suitable Normalize subclass is dynamically generated and instantiated. This parameter is ignored if c is RGB(A).

  • vmin, vmax (float, optional) – When using scalar data and no explicit norm, vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. It is an error to use vmin/vmax when a norm instance is given (but using a str norm name together with vmin/vmax is acceptable). This parameter is ignored if c is RGB(A).

  • alpha (float, default: 0.5) – The alpha blending value, between 0 (transparent) and 1 (opaque).

  • linewidths (float or array-like, default: rcParams[“lines.linewidth”] (default: 1.5)) – The linewidth of the marker edges. Note: The default edgecolors is ‘face’. You may want to change this as well.

  • edgecolors ({‘face’, ‘none’, None} or color or sequence of color, default: rcParams[“scatter.edgecolors”], (default: ‘face’)) – The edge color of the marker. Possible values: ‘face’: The edge color will always be the same as the face color. ‘none’: No patch boundary will be drawn. A color or sequence of colors. For non-filled markers, edgecolors is ignored. Instead, the color is determined like with ‘face’, i.e. from c, colors, or facecolors.

  • plotnonfinite (bool, default: False) – Whether to plot points with nonfinite c (i.e. inf, -inf or nan). If True the points are drawn with the bad colormap color (see Colormap.set_bad).

plotmerit(X=None, X_hat=None, **kwargs)[source]

Plot the input (X), reconstructed (X_hat) and residuals.

\(X\) and \(\hat{X}\) can be passed as arguments. If not, the X attribute is used for \(X\) and \(\hat{X}\) is computed by the inverse_transform method.

Parameters
  • X (NDDataset, optional) – Original dataset. If not provided (default), the X attribute is used and X_hat is computed using inverse_transform.

  • X_hat (NDDataset, optional) – Inverse transformed dataset. If X is provided, X_hat must also be provided, computed externally.

  • **kwargs (keyword parameters, optional) – See Other Parameters.

Returns

Axes – Matplotlib subplot axes.

Other Parameters
  • colors (tuple or ndarray of 3 colors, optional) – Colors for X, X_hat and residuals E. In the 2D case, the default colormap is used for X. By default, the three colors are NBlue, NGreen and NRed (which are colorblind friendly).

  • offset (float, optional, default: None) – Specify the separation (in percent) between \(X\), \(\hat{X}\) and \(E\).

  • nb_traces (int or 'all', optional) – Number of lines to display. Default is 'all'.

  • **others (Other keywords parameters) – Parameters passed to the internal plot method of the X dataset.

predict(X=None)

Predict targets of given observations.

Parameters

X (NDDataset or array-like of shape (n_observations, n_features), optional) – New data, where n_observations is the number of observations and n_features is the number of features. If not provided, the input dataset of the fit method is used.

Returns

NDDataset – Dataset with shape (n_observations,) or (n_observations, n_targets).

reconstruct(X_transform=None, **kwargs)[source]

Transform data back to its original space.

In other words, return an input X_original whose reduce/transform would be X_transform.

Parameters
  • X_transform (array-like of shape (n_observations, n_components), optional) – Reduced X data, where n_observations is the number of observations and n_components is the number of components. If X_transform is not provided, a transform of X provided in fit is performed first.

  • **kwargs (keyword parameters, optional) – See Other Parameters.

Returns

NDDataset – Dataset with shape (n_observations, n_features).

Other Parameters

n_components (int, optional) – The number of components to use for the reduction. If not given, the number of components is the one specified or determined in the fit process.

See also

inverse_transform

The method of which reconstruct is a deprecated alias.

Notes

Deprecated in version 0.6.

reduce(X=None, **kwargs)[source]

Apply dimensionality reduction to X.

Parameters
Returns

NDDataset – Dataset with shape (n_observations, n_components).

Other Parameters

n_components (int, optional) – The number of components to use for the reduction. If not given, the number of components is the one specified or determined in the fit process.

Notes

Deprecated in version 0.6.

reset()[source]

Reset configuration parameters to their default values.

score(X=None, Y=None, sample_weight=None)[source]

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of Y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters
Returns

float – \(R^2\) of predict(X) w.r.t. Y.
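The definition of \(R^2\) given above can be checked with a short pure-Python sketch for a single target. The function name r2 is hypothetical and only mirrors the formula; it is not the implementation used by score.

```python
def r2(y_true, y_pred):
    """Coefficient of determination 1 - u/v for a single target."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - u / v


y = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y, y)                       # predictions equal to the measurements
constant = r2(y, [2.5, 2.5, 2.5, 2.5])   # always predicts the mean of y
reversed_fit = r2(y, [4.0, 3.0, 2.0, 1.0])
```

The three cases correspond to the situations described in the text: a perfect fit, a constant model that predicts the mean, and a model that does worse than the constant one and therefore gets a negative score.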

to_dict()[source]

Return config value in a dict form.

Returns

dict – A regular dictionary.

transform(X=None, Y=None, both=False, **kwargs)

Apply dimensionality reduction to X and Y.

Parameters
Returns

x_score, y_score (NDDataset or tuple of NDDataset) – Datasets with shape (n_observations, n_components).

Examples using spectrochempy.PLSRegression

PLS regression example
