Slicing NDDatasets

This tutorial shows how to handle NDDatasets using Python slicing. As a prerequisite, the user is expected to have read the Import Tutorials.

[1]:

import numpy as np

import spectrochempy as scp

SpectroChemPy's API - v.0.6.9.dev9
© Copyright 2014-2024 - A.Travert & C.Fernandez @ LCS

What is slicing?

Slicing a list or an array means taking elements from a given index (or set of indexes) to another index (or set of indexes). Slicing is specified using the colon operator : with a from and to index before and after the first colon, and a step after the second colon. Hence, a slice of the object X will be set as:

X[from:to:step]

and will extend from the ‘from’ index, end one item before the ‘to’ index, with an increment of step between consecutive indexes. When not given, the default values are respectively 0 (i.e. starts at the 1st index), the length of the dimension (stops at the last index), and 1.

Let’s first illustrate the concept on a 1D example:

[2]:

X = np.arange(10)  # generates a 1D array of 10 elements from 0 to 9
print(X)
print(X[2:5])  # selects all elements from 2 to 4
print(X[::2])  # selects one out of two elements
print(X[:-3])  # a negative index will be counted from the end of the array
print(
    X[::-2]
)  # a negative step slices backward, by default from the last element toward the first

[       0        1 ...        8        9]
[       2        3        4]
[       0        2        4        6        8]
[       0        1 ...        5        6]
[       9        7        5        3        1]
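The default bounds, including the reversed defaults that apply when the step is negative, can be inspected with Python's built-in `slice` object: its `indices(length)` method returns the concrete `(start, stop, step)` that a slice resolves to for a sequence of the given length. The following is a minimal sketch on a plain list; the variable names are illustrative only.

```python
# Resolve omitted slice bounds for a sequence of length 10.
items = list(range(10))

full = slice(None, None, None)  # same as items[:]
resolved_full = full.indices(len(items))

backward = slice(None, None, -2)  # same as items[::-2]
resolved_backward = backward.indices(len(items))

print(resolved_full)      # (0, 10, 1)
print(resolved_backward)  # (9, -1, -2): starts at the last index when step < 0
print(items[backward])    # [9, 7, 5, 3, 1]
```

With a negative step, an omitted start therefore means the last element rather than the first.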

The same applies to multidimensional arrays by indicating slices separated by commas:

[3]:

X = np.random.rand(10, 10)  # generates a 10x10 array filled with random values
print(X.shape)
print(X[2:5, :].shape)  # slices along the 1st dimension, X[2:5,] is equivalent
print(
    X[2:5, ::2].shape
)  # same slice along the 1st dimension, and one column out of two along the second

(10, 10)
(3, 10)
(3, 5)
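The shapes printed above can be predicted without building the array, because each slice acts independently on its own dimension. The sketch below uses a hypothetical helper, `sliced_shape`, written only for this illustration (it is not part of NumPy or spectrochempy), and counts the indexes each slice selects.

```python
def sliced_shape(shape, slices):
    """Return the shape obtained by applying one slice per dimension."""
    return tuple(
        len(range(*s.indices(n)))  # number of indexes selected along this dimension
        for n, s in zip(shape, slices)
    )


# Equivalent of X[2:5, :] and X[2:5, ::2] on a 10x10 array
shape_rows = sliced_shape((10, 10), (slice(2, 5), slice(None)))
shape_both = sliced_shape((10, 10), (slice(2, 5), slice(None, None, 2)))

print(shape_rows)  # (3, 10)
print(shape_both)  # (3, 5)
```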

Slicing of NDDatasets

Let’s import a group of IR spectra, look at its content and plot it:

[4]:

X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", description="CO adsorption, diff spectra")
X.y = (X.y - X[0].y).to("minute")
X

[4]:

name	CO@Mo_Al2O3
author	runner@fv-az1501-19
created	2024-04-28 03:09:00+02:00
description	CO adsorption, diff spectra
history	2024-04-28 03:09:00+02:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG. 2024-04-28 03:09:00+02:00> Sorted by date
DATA
title	absorbance
values	[[0.0008032 3.788e-05 ... 0.0003027 0.0003745] [-3.608e-05 -0.0001981 ... 0.0003089 0.00117] ... [0.0008357 -0.0001387 ... -0.0005221 -0.001121] [0.0005655 -0.000116 ... -0.00057 -0.0006307]] a.u.
shape	(y:19, x:3112)
DIMENSION `x`
size	3112
title	wavenumbers
coordinates	[ 4000 3999 ... 1001 999.9] cm⁻¹
DIMENSION `y`
size	19
title	acquisition timestamp (GMT)
coordinates	[ 0 4.517 ... 132 137] min
labels	[[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00] [ Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ... Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]]

[5]:

subplot = (
    X.plot()
)  # assignment avoids the display of the object address (<matplotlib.axes._subplots.AxesSubplot ...)

../../_images/userguide_processing_slicing_8_0.png

Slicing with indexes

Classical slicing with integers can be used. For instance, along the 1st dimension:

[6]:

print(X[:4])  # selects the first four spectra
print(X[-3:])  # selects the last three spectra
print(X[::2])  # selects one spectrum out of 2

NDDataset: [float64] a.u. (shape: (y:4, x:3112))
NDDataset: [float64] a.u. (shape: (y:3, x:3112))
NDDataset: [float64] a.u. (shape: (y:10, x:3112))

The same can be done along the second dimension, with or without slicing the first one. For instance:

[7]:

print(
    X[:, ::2]
)  # all spectra, one wavenumber out of 2 (note: X[,::2] is not valid Python syntax and raises a SyntaxError)
print(
    X[0:3, 200:1000:2]
)  # first 3 spectra, one wavenumber out of 2, from index 200 to 1000

NDDataset: [float64] a.u. (shape: (y:19, x:1556))
NDDataset: [float64] a.u. (shape: (y:3, x:400))

Would you easily guess which wavenumber range has actually been selected? Probably not, because the relationship between index and wavenumber is not straightforward: it depends on the value of the first wavenumber, the wavenumber spacing, and whether the wavenumbers are arranged in ascending or descending order. Here is the answer:

[8]:

X[
    :, 200:1000:2
].x  # as the Coord can be sliced, the same is obtained with: X.x[200:1000:2]

[8]:

size	400
title	wavenumbers
coordinates	[ 3807 3805 ... 3039 3037] cm⁻¹
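The index-to-wavenumber relationship mentioned above can be approximated by hand. The sketch below assumes an evenly spaced axis and uses the rounded first value, last value and size shown in the dataset display (4000, 999.9 and 3112), so the results are estimates only; the helper name `index_to_wavenumber` is hypothetical.

```python
# Approximate axis description, read from the rounded values displayed for x.
FIRST = 4000.0  # cm^-1, first wavenumber (rounded display value)
LAST = 999.9    # cm^-1, last wavenumber (rounded display value)
SIZE = 3112     # number of points along x

# Assumption: constant spacing. It is negative because the axis is descending.
SPACING = (LAST - FIRST) / (SIZE - 1)


def index_to_wavenumber(index):
    """Estimate the wavenumber found at a given integer index."""
    return FIRST + index * SPACING


start = index_to_wavenumber(200)  # first point selected by 200:1000:2
stop = index_to_wavenumber(998)   # last point selected (1000 is excluded)
print(round(start, 1), round(stop, 1))
```

Both estimates fall within about 1 cm⁻¹ of the 3807 and 3037 cm⁻¹ reported by the sliced coordinate, the difference coming from the rounded inputs.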

Slicing with coordinates

A spectroscopist is generally interested in a particular region of the spectrum, for instance 2300-1900 cm⁻¹. Can you easily guess the indexes needed to select this region? Probably not without a calculator…

Fortunately, spectrochempy implements a simple mechanism for this purpose: using floats instead of integers slices the NDDataset at the corresponding coordinates. For instance, to select the 2300-1900 cm⁻¹ region:

[9]:

subplot = X[:, 2300.0:1900.0:].plot()

../../_images/userguide_processing_slicing_16_0.png
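To make the idea of coordinate-based slicing concrete, here is a minimal pure-Python sketch that converts two coordinate values into an integer slice by looking up the nearest points on the axis. It is not spectrochempy's implementation: the axis is an evenly spaced approximation built from the rounded values displayed earlier, the helpers `nearest_index` and `coordinate_slice` are hypothetical, and including the end point is a choice made for this sketch.

```python
# Rounded axis description taken from the x coordinates displayed earlier.
FIRST, LAST, SIZE = 4000.0, 999.9, 3112
SPACING = (LAST - FIRST) / (SIZE - 1)  # assumed constant, negative here

# Stand-in for the wavenumber axis (an approximation, not the real data).
coords = [FIRST + i * SPACING for i in range(SIZE)]


def nearest_index(values, target):
    """Index of the element of `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def coordinate_slice(values, start, stop):
    """Integer slice covering the coordinates between `start` and `stop`."""
    i, j = sorted((nearest_index(values, start), nearest_index(values, stop)))
    return slice(i, j + 1)  # this sketch chooses to include the end point


region = coords[coordinate_slice(coords, 2300.0, 1900.0)]
print(len(region), round(region[0], 1), round(region[-1], 1))
```

Sorting the two indexes lets the bounds be given in either order, which is convenient for a descending axis such as wavenumbers.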

The same mechanism can be used along the first dimension (y). For instance, to select and plot the same region and the spectra recorded between 80 and 180 minutes:

[10]:

subplot = X[
    80.0:180.0, 2300.0:1900.0
].plot()  # Note that a decimal point is enough to get a float
# a warning is raised if one or several values are beyond the limits

../../_images/userguide_processing_slicing_18_0.png

Similarly, the spectrum recorded at the time closest to 60 minutes can be selected using a float:

[11]:

X[60.0].y  # X[60.] slices the spectrum,  .y returns the corresponding `y` axis.

[11]:

size	1
title	acquisition timestamp (GMT)
coordinates	[ 58.32] min
labels	[[ 2016-10-18 14:47:54+00:00] [ *Résultat de Soustraction:04_Mo_Al2O3_calc_0.021torr_LT_after sulf_Oct 18 16:45:00 2016 (GMT+02:00)]]
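Selecting the point closest to a single coordinate can be pictured as a nearest-neighbour lookup. The sketch below uses the standard `bisect` module on an ascending list; the helper `nearest_sorted` is hypothetical, the times are made-up values chosen for illustration (not the dataset's acquisition times), and this is not necessarily how spectrochempy performs the lookup.

```python
from bisect import bisect_left


def nearest_sorted(values, target):
    """Index of the value closest to `target` in an ascending list."""
    pos = bisect_left(values, target)
    if pos == 0:
        return 0
    if pos == len(values):
        return len(values) - 1
    before, after = values[pos - 1], values[pos]
    # Ties go to the lower index in this sketch.
    return pos if after - target < target - before else pos - 1


# Made-up acquisition times in minutes (NOT the dataset's values).
times = [0.0, 15.0, 30.0, 58.0, 65.0, 90.0]
idx = nearest_sorted(times, 60.0)
print(idx, times[idx])  # 3 58.0
```

As in the tutorial output, the requested value (60) does not need to exist on the axis: the closest available coordinate is returned.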

— End of Tutorial — (todo: add advanced slicing by array of indexes, array of bool, …)