.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/plot_pca_iris.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gettingstarted_examples_gallery_auto_examples_analysis_a_decomposition_plot_pca_iris.py:

PCA example (iris dataset)
--------------------------

In this example, we perform a PCA dimensionality reduction of the classical
`iris` dataset (Ronald A. Fisher, "The Use of Multiple Measurements in
Taxonomic Problems", Annals of Eugenics, 7, pp. 179-188, 1936).

.. GENERATED FROM PYTHON SOURCE LINES 17-18

First, we load the spectrochempy API package.

.. GENERATED FROM PYTHON SOURCE LINES 18-20

.. code-block:: default

    import spectrochempy as scp

.. GENERATED FROM PYTHON SOURCE LINES 21-22

Load a dataset from scikit-learn.

.. GENERATED FROM PYTHON SOURCE LINES 22-24

.. code-block:: default

    dataset = scp.load_iris()

.. GENERATED FROM PYTHON SOURCE LINES 25-29

Create a PCA object.

Here, the number of components used by the model is determined automatically
using `n_components="mle"`.

Warning: `mle` cannot be used when n_observations < n_features.

.. GENERATED FROM PYTHON SOURCE LINES 29-30

.. code-block:: default

    pca = scp.PCA(n_components="mle")

.. GENERATED FROM PYTHON SOURCE LINES 31-32

Fit the PCA model to the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 32-33

.. code-block:: default

    pca.fit(dataset)

.. GENERATED FROM PYTHON SOURCE LINES 34-35

The number of components found is 3:

.. GENERATED FROM PYTHON SOURCE LINES 35-36

.. code-block:: default

    pca.n_components

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    3

.. GENERATED FROM PYTHON SOURCE LINES 37-38

They explain 99.5 % of the variance:

.. GENERATED FROM PYTHON SOURCE LINES 38-39

.. code-block:: default

    pca.cumulative_explained_variance[-1].value

.. raw:: html
99.47878161267248 %


.. GENERATED FROM PYTHON SOURCE LINES 40-43

We can also specify the fraction of explained variance to retain and let the
model compute how many components are needed (this requires a number between
0 and 1 for n_components). In this case, 4 components are found.

.. GENERATED FROM PYTHON SOURCE LINES 43-46

.. code-block:: default

    pca = scp.PCA(n_components=0.999)
    pca.fit(dataset)
    pca.n_components

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    4

.. GENERATED FROM PYTHON SOURCE LINES 47-49

The 4 components found are available in the `components` attribute of pca.
These components are often called loadings in PCA.

.. GENERATED FROM PYTHON SOURCE LINES 49-51

.. code-block:: default

    loadings = pca.components
    loadings

.. raw:: html
name `IRIS` Dataset_PCA.components
author runner@fv-az1121-436
created 2024-05-13 03:07:31+02:00
history
2024-05-13 03:07:31+02:00> Created using method PCA.components
DATA
title size
values
[[ 0.3614 -0.08452 0.8567 0.3583]
[ 0.6566 0.7302 -0.1734 -0.07548]
[ -0.582 0.5979 0.07624 0.5458]
[ -0.3155 0.3197 0.4798 -0.7537]]
shape (k:4, x:4)
DIMENSION `k`
size 4
title components
labels
[ #0 #1 #2 #3]
DIMENSION `x`
size 4
title features
labels
[ sepal_length sepal width petal_length petal_width]


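As a quick sanity check on the loadings printed above, the following minimal sketch verifies that the rows are (up to rounding) orthonormal, as expected for PCA components. It uses only the rounded values shown on this page and plain Python; it assumes that spectrochempy follows the usual PCA convention of unit-length, mutually orthogonal components, and the helper name `dot` is introduced here for illustration only.

```python
# Loadings copied from the output above
# (rows = components #0..#3, columns = the four iris features).
loadings = [
    [0.3614, -0.08452, 0.8567, 0.3583],
    [0.6566, 0.7302, -0.1734, -0.07548],
    [-0.582, 0.5979, 0.07624, 0.5458],
    [-0.3155, 0.3197, 0.4798, -0.7537],
]


def dot(u, v):
    """Dot product of two equal-length sequences (illustrative helper)."""
    return sum(a * b for a, b in zip(u, v))


# Gram matrix of the rows: close to the identity if the rows are orthonormal.
gram = [[dot(u, v) for v in loadings] for u in loadings]

# Largest deviation from the identity matrix. It is nonzero only because
# the printed loadings are rounded to about four significant digits.
max_deviation = max(
    abs(gram[i][j] - (1.0 if i == j else 0.0))
    for i in range(4)
    for j in range(4)
)
print(f"max deviation from identity: {max_deviation:.1e}")
```

A small deviation here indicates that the components form an orthonormal basis of the feature space, which is why projecting onto all 4 of them loses no variance.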
.. GENERATED FROM PYTHON SOURCE LINES 52-54

Note: the `loadings` attribute of pca can be used equivalently and produces
the same result.

.. GENERATED FROM PYTHON SOURCE LINES 54-55

.. code-block:: default

    pca.loadings

.. raw:: html
name `IRIS` Dataset_PCA.get_components
author runner@fv-az1121-436
created 2024-05-13 03:07:31+02:00
history
2024-05-13 03:07:31+02:00> Created using method PCA.get_components
DATA
title
values
[[ 0.3614 -0.08452 0.8567 0.3583]
[ 0.6566 0.7302 -0.1734 -0.07548]
[ -0.582 0.5979 0.07624 0.5458]
[ -0.3155 0.3197 0.4798 -0.7537]]
shape (k:4, x:4)
DIMENSION `k`
size 4
title components
labels
[ #0 #1 #2 #3]
DIMENSION `x`
size 4
title features
labels
[ sepal_length sepal width petal_length petal_width]


.. GENERATED FROM PYTHON SOURCE LINES 56-58

To project the data onto the fitted components, we use the `transform`
method. Here the model was fitted with 4 components, so the result has shape
(150, 4). The result is often called `scores` in PCA.

.. GENERATED FROM PYTHON SOURCE LINES 58-60

.. code-block:: default

    scores = pca.transform()
    scores

.. raw:: html
name `IRIS` Dataset_PCA.transform
author runner@fv-az1121-436
created 2024-05-13 03:07:31+02:00
history
2024-05-13 03:07:31+02:00> Created using method PCA.transform
DATA
title
values
[[ -2.684 0.3194 -0.02791 -0.002262]
[ -2.714 -0.177 -0.2105 -0.09903]
...
[ 1.901 0.1166 0.7233 -0.0446]
[ 1.39 -0.2827 0.3629 0.155]]
shape (y:150, k:4)
DIMENSION `k`
size 4
title components
labels
[ #0 #1 #2 #3]
DIMENSION `y`
size 150
title samples
labels
[ setosa setosa ... virginica virginica]


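To make the link between loadings and scores concrete, the sketch below reproduces the first row of the scores by hand: the sample is centered on the feature means and then projected onto each loading vector. The loadings are the rounded values printed earlier on this page. The first iris sample and the per-feature means are not shown on this page; they are assumptions taken from the standard iris data, and the result only matches the output above up to rounding.

```python
# Loadings copied from the `components` output (rows = components).
loadings = [
    [0.3614, -0.08452, 0.8567, 0.3583],
    [0.6566, 0.7302, -0.1734, -0.07548],
    [-0.582, 0.5979, 0.07624, 0.5458],
    [-0.3155, 0.3197, 0.4798, -0.7537],
]

# ASSUMPTION: first sample and feature means of the standard iris data
# (sepal length, sepal width, petal length, petal width, in cm).
first_sample = [5.1, 3.5, 1.4, 0.2]
feature_means = [5.843333, 3.057333, 3.758, 1.199333]

# Step 1: center the sample on the feature means.
centered = [x - m for x, m in zip(first_sample, feature_means)]

# Step 2: project the centered sample onto each component.
scores_first = [
    sum(w * c for w, c in zip(component, centered))
    for component in loadings
]
print([round(s, 3) for s in scores_first])
```

The printed values should agree with the first row of the scores above to about three decimal places.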
.. GENERATED FROM PYTHON SOURCE LINES 61-62

Again, the `scores` attribute can be used to get the same result.

.. GENERATED FROM PYTHON SOURCE LINES 62-64

.. code-block:: default

    scores = pca.scores
    scores

.. raw:: html
name `IRIS` Dataset_PCA.transform
author runner@fv-az1121-436
created 2024-05-13 03:07:31+02:00
history
2024-05-13 03:07:31+02:00> Created using method PCA.transform
DATA
title
values
[[ -2.684 0.3194 -0.02791 -0.002262]
[ -2.714 -0.177 -0.2105 -0.09903]
...
[ 1.901 0.1166 0.7233 -0.0446]
[ 1.39 -0.2827 0.3629 0.155]]
shape (y:150, k:4)
DIMENSION `k`
size 4
title components
labels
[ #0 #1 #2 #3]
DIMENSION `y`
size 150
title samples
labels
[ setosa setosa ... virginica virginica]


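The relationship between a variance threshold passed as `n_components` and the number of components retained can be sketched in a few lines. The percentages below are the per-component values that `printev()` reports for this model (see the table that follows). The selection rule, keeping the smallest number of leading components whose cumulative variance reaches the threshold, is an assumption about how the library behaves, and `components_needed` is a hypothetical helper, not part of spectrochempy.

```python
from itertools import accumulate

# Per-component explained variance in percent, as reported by printev().
explained = [92.462, 5.307, 1.710, 0.521]

# Running total: cumulative explained variance after 1, 2, 3, 4 components.
cumulative = list(accumulate(explained))


def components_needed(threshold_percent):
    """Smallest number of leading components reaching the threshold.

    Hypothetical helper illustrating the assumed selection rule.
    """
    for k, total in enumerate(cumulative, start=1):
        if total >= threshold_percent:
            return k
    # Fall back to all components (e.g. if rounding leaves the sum short).
    return len(cumulative)


print(components_needed(99.0))  # three components already exceed 99 %
print(components_needed(99.9))  # the fourth is needed to reach 99.9 %
```

This is consistent with the results above: 3 components explain about 99.5 % of the variance, which falls short of the 0.999 threshold, so 4 components are kept.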
.. GENERATED FROM PYTHON SOURCE LINES 65-68

The figures of merit (explained and cumulative variance) confirm that these
4 PCs explain 100 % of the variance:

.. GENERATED FROM PYTHON SOURCE LINES 68-69

.. code-block:: default

    pca.printev()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    PC      Eigenvalue      %variance       %cumulative
            of cov(X)       per PC          variance
    #1      2.056e+00       92.462          92.462
    #2      4.926e-01       5.307           97.769
    #3      2.797e-01       1.710           99.479
    #4      1.544e-01       0.521           100.000

.. GENERATED FROM PYTHON SOURCE LINES 70-73

These figures of merit can also be displayed graphically.

The scree plot:

.. GENERATED FROM PYTHON SOURCE LINES 73-74

.. code-block:: default

    _ = pca.screeplot()

.. rst-class:: sphx-glr-horizontal

    *

      .. image-sg:: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_001.png
         :alt: Scree plot
         :srcset: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_002.png
         :alt: plot pca iris
         :srcset: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_002.png
         :class: sphx-glr-multi-img

.. GENERATED FROM PYTHON SOURCE LINES 75-79

The score plots can be used for classification purposes. The first one (in
2D, for the first two PCs) shows that the first PC distinguishes Iris-setosa
(score of PC#1 < -1) from the other species (score of PC#1 > -1), while more
PCs are required to distinguish versicolor from virginica.

.. GENERATED FROM PYTHON SOURCE LINES 79-80

.. code-block:: default

    _ = pca.scoreplot(scores, 1, 2, color_mapping="labels")

.. image-sg:: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_003.png
   :alt: Score plot
   :srcset: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 81-83

The second one (in 3D, for the first three PCs) indicates that a third PC
does not help distinguish versicolor from virginica any better.

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: default

    ax = pca.scoreplot(scores, 1, 2, 3, color_mapping="labels")
    ax.view_init(10, 75)

.. image-sg:: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_004.png
   :alt: Score plot
   :srcset: /gettingstarted/examples/gallery/auto_examples_analysis/a_decomposition/images/sphx_glr_plot_pca_iris_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 87-89

This ends the example! The following line can be uncommented if no plot
shows when running the .py script with Python.

.. GENERATED FROM PYTHON SOURCE LINES 89-91

.. code-block:: default

    # scp.show()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.455 seconds)

.. _sphx_glr_download_gettingstarted_examples_gallery_auto_examples_analysis_a_decomposition_plot_pca_iris.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pca_iris.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pca_iris.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_