
# PCA analysis example
In this example, we use PCA to reduce the dimensionality of a spectral
dataset.


Import the spectrochempy API package



In [None]:
import spectrochempy as scp

Load a dataset



In [None]:
dataset = scp.read_omnic("irdata/nh4y-activation.spg")[::5]
print(dataset)
_ = dataset.plot()

Create a PCA object and fit the dataset so that the explained variance is greater than
or equal to 99.9%



In [None]:
pca = scp.PCA(n_components=0.999)
pca.fit(dataset)
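
To make the role of a fractional n_components more concrete, here is a minimal pure-Python sketch of the selection rule described above: keep the smallest number of components whose cumulative explained-variance ratio reaches the threshold. The function name components_for_threshold and the variance values are hypothetical and chosen only for illustration; the actual implementation in SpectroChemPy may differ in details such as how the boundary case is handled.

```python
from itertools import accumulate


def components_for_threshold(explained_variance, threshold):
    """Return the smallest k such that the first k components explain
    at least `threshold` (a fraction between 0 and 1) of the total variance.

    `explained_variance` is assumed to be sorted in decreasing order.
    """
    total = sum(explained_variance)
    cumulative = accumulate(v / total for v in explained_variance)
    for k, cum in enumerate(cumulative, start=1):
        if cum >= threshold:
            return k
    # Threshold never reached (e.g. threshold > 1): keep every component.
    return len(explained_variance)


# Made-up variances per component, NOT taken from the dataset above.
variances = [80.0, 15.0, 4.0, 0.9, 0.1]
n_kept = components_for_threshold(variances, 0.97)  # 3 with these values
```

With these toy values, the first two components explain about 95% of the variance, so a third one is needed to pass 97%.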

The number of fitted components is given by the n_components attribute
(here we obtain 23 components)



In [None]:
pca.n_components

Transform the dataset to a lower dimensionality using all the fitted components



In [None]:
scores = pca.transform()
scores
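
Conceptually, transforming a dataset means projecting each mean-centered spectrum onto the loadings. The sketch below illustrates this with plain Python lists, assuming the usual PCA formulation. The helper name project, the toy data and the loadings are all hypothetical and are not part of the SpectroChemPy API.

```python
def project(data, loadings):
    """Project mean-centered rows of `data` onto each row of `loadings`.

    data:     list of observations, each a list of n_features values
    loadings: list of components, each a list of n_features weights
    Returns one list of scores per observation (one score per component).
    """
    n_obs = len(data)
    n_features = len(data[0])
    # Column means, used to center the data before projecting.
    means = [sum(row[j] for row in data) / n_obs for j in range(n_features)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # Each score is the dot product of a centered row with one loading.
    return [
        [sum(c * w for c, w in zip(row, load)) for load in loadings]
        for row in centered
    ]


# Toy data: 3 observations, 2 features (made-up values).
toy_data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical loadings; a real PCA would compute these from the data.
toy_loadings = [[1.0, 0.0], [0.0, 1.0]]
toy_scores = project(toy_data, toy_loadings)
```

The result has one row per observation and one column per component, which is the shape we expect for the scores above.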

Finally, display the results graphically, starting with the scree plot



In [None]:
_ = pca.screeplot()

Score Plot



In [None]:
_ = pca.scoreplot(scores, 1, 2)

Score plot for 3 PCs in 3D



In [None]:
_ = pca.scoreplot(scores, 1, 2, 3)

Display the first 4 loadings



In [None]:
_ = pca.loadings[:4].plot(legend=True)

Here we mask the saturated region between 882 and 1280 cm^-1



In [None]:
# remember: use float numbers for slicing (not integer)
dataset[:, 882.0:1280.0] = scp.MASKED
_ = dataset.plot()
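
The comment in the cell above deserves a short illustration. As far as we understand it, float bounds select by coordinate value (here wavenumbers), whereas integers would be read as positional indices. Below is a minimal pure-Python sketch of that difference. The helper mask_coordinate_range and all values are hypothetical, and None is only a stand-in for a masked entry.

```python
def mask_coordinate_range(coords, values, low, high):
    """Return a copy of `values` where every entry whose coordinate
    lies in [low, high] is replaced by None (our stand-in for 'masked')."""
    return [None if low <= x <= high else v for x, v in zip(coords, values)]


# Made-up axis and intensities, NOT taken from the dataset above.
wavenumbers = [800.0, 900.0, 1000.0, 1300.0, 1400.0]
absorbance = [0.2, 3.5, 3.6, 0.4, 0.3]

# Coordinate-based selection: masks the points at 900 and 1000 cm^-1.
masked = mask_coordinate_range(wavenumbers, absorbance, 882.0, 1280.0)

# Index-based selection with the same numbers: positions 882 to 1279
# do not exist in a 5-point list, so nothing is selected.
by_index = absorbance[882:1280]
```

In this toy case the integer slice is empty, which shows why mixing up the two kinds of bounds can silently select the wrong region.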

Apply the PCA model



In [None]:
pca = scp.PCA(n_components=0.999)
pca.fit(dataset)
pca.n_components

As seen above, only 4 components instead of 23 are now necessary to reach 99.9% of
explained variance.



In [None]:
_ = pca.screeplot()

Display the loadings



In [None]:
_ = pca.loadings.plot(legend=True)

Let's plot the scores



In [None]:
scores = pca.transform()
_ = pca.scoreplot(scores, 1, 2)

Our dataset already has two columns of labels for the spectra, but they are a little
too long to display on plots.



In [None]:
scores.y.labels

So we define short labels for each spectrum, and add them as a third column:



In [None]:
labels = [lab[:6] for lab in dataset.y.labels[:, 1]]
scores.y.labels = labels  # Note this does not replace previous labels,
# but adds a column.
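
The following plain-Python sketch mimics the idea of shortening labels and appending them as an extra column. The label strings are hypothetical, not the real labels of this dataset, and the list-of-lists table is only a stand-in for the label array.

```python
# Hypothetical long labels, one per spectrum (not the dataset's real labels).
long_labels = [
    "run001 heated to 300 K",
    "run002 heated to 350 K",
    "run003 heated to 400 K",
]

# Keep only the first 6 characters of each label, as in the cell above.
short_labels = [lab[:6] for lab in long_labels]

# Start with one column per row, then append the short label as a new column
# instead of overwriting the existing one.
label_table = [[lab] for lab in long_labels]
for row, short in zip(label_table, short_labels):
    row.append(short)
```

Each row of label_table now holds the original label followed by its short version.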

Now display these labels on the score plot



In [None]:
_ = pca.scoreplot(scores, 1, 2, show_labels=True, labels_column=2)

This ends the example! The following line can be uncommented if no plot shows when
running the .py script.



In [None]:
# scp.show()