Import IR Data

This tutorial shows the specifics of infrared data import in Spectrochempy. As a prerequisite, the user is expected to have read the Import Tutorial.

Let’s first import spectrochempy:

[1]:

import spectrochempy as scp

SpectroChemPy's API - v.0.7.1
© Copyright 2014-2025 - A.Travert & C.Fernandez @ LCS

Supported file formats

At the time of writing of this tutorial (Scpy v.0.2), spectrochempy has the following readers which are specific to IR data:

read_omnic() to open OMNIC (.spa and .spg) files
read_opus() to open Opus (.0, …) files
read_jcamp() to open an IR JCAMP-DX datafile
read() which is the generic reader. The type of data is then deduced from the file extension.

General-purpose data exchange formats such as .csv or .mat, which will be treated in another tutorial (yet to come…), can also be read using:

read_csv() to open .csv files
read_matlab() to open .mat files
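To illustrate what "deduced from the file extension" can mean for the generic reader, here is a minimal sketch of an extension-based dispatch. The table READER_BY_SUFFIX and the function guess_reader are hypothetical names introduced for this illustration; they are not part of spectrochempy and do not describe its internal implementation.

```python
from pathlib import Path

# Hypothetical lookup table, for illustration only: it is NOT spectrochempy's
# internal implementation, just the readers named in this tutorial.
READER_BY_SUFFIX = {
    ".spa": "read_omnic",
    ".spg": "read_omnic",
    ".jdx": "read_jcamp",
    ".csv": "read_csv",
    ".mat": "read_matlab",
}


def guess_reader(filename):
    """Return the name of the reader suggested by the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in READER_BY_SUFFIX:
        return READER_BY_SUFFIX[suffix]
    # OPUS files carry a numeric extension (.0, .0000, ...)
    if suffix[1:].isdigit():
        return "read_opus"
    raise ValueError(f"no reader known for extension {suffix!r}")


for name in ("irdata/CO@Mo_Al2O3.SPG", "test.0000", "CO@Mo_Al2O3_0.jdx"):
    print(name, "->", guess_reader(name))
```

The comparison is made on the lower-cased suffix, so that .SPG and .spg are treated alike.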

Import of OMNIC files

Thermo Scientific OMNIC software has two proprietary binary file formats:

.spa files that handle single spectra
.spg files which contain a group of spectra

Both have been reverse engineered, which allows their key data to be extracted. The Omnic reader of Spectrochempy ( read_omnic() ) has been developed based on posts in open forums on the .spa file format and extended to the .spg file format.

a) import spg file

Let’s import an .spg file from the datadir (see the Import Tutorial for details) and display its main attributes:

[2]:

X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG")
X

[2]:

name	CO@Mo_Al2O3
author	runner@fv-az1774-299
created	2025-02-25 08:01:37+00:00
description	Omnic title: Group sust Mo_Al2O3_base line.SPG Omnic filename: /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG
history	2025-02-25 08:01:37+00:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG. 2025-02-25 08:01:37+00:00> Sorted by date
DATA
title	absorbance
values	[[0.0008032 3.788e-05 ... 0.0003027 0.0003745] [-3.608e-05 -0.0001981 ... 0.0003089 0.00117] ... [0.0008357 -0.0001387 ... -0.0005221 -0.001121] [0.0005655 -0.000116 ... -0.00057 -0.0006307]] a.u.
shape	(y:19, x:3112)
DIMENSION `x`
size	3112
title	wavenumbers
coordinates	[ 4000 3999 ... 1001 999.9] cm⁻¹
DIMENSION `y`
size	19
title	acquisition timestamp (GMT)
coordinates	[1.477e+09 1.477e+09 ... 1.477e+09 1.477e+09] s
labels	[[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00] [ Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ... Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]]

The displayed attributes are detailed in the following.

name is the name of the dataset. In the output above it is CO@Mo_Al2O3, taken from the filename, while the title stored by OMNIC in the .spg file (“Group sust Mo_Al2O3_base line.SPG”) is reported in the description. OMNIC sets its title to the .spg filename used at the creation of the group; here the two differ because the file has been renamed from outside OMNIC (directly in the OS).
author is the creator of the NDDataset (not of the .spg file, which, to our knowledge, does not have this type of attribute). The string is composed of the username and of the machine name as given by the OS, e.g., ‘username@machinename’. It can be accessed and changed using X.author .
created is the creation date of the NDDataset (again not that of the .spg file). It can be accessed (or even changed) using X.created .
description gives the OMNIC title and the complete pathname of the .spg file. As the pathname is also given in the history (below), it can be good practice to give a self-explanatory description of the group, for instance:

[3]:

X.description = "CO adsorption on CoMo/Al2O3, difference spectra"
X.description

[3]:

'CO adsorption on CoMo/Al2O3, difference spectra'

or directly at the import:

[4]:

X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", description="CO@CoMo/Al2O3, diff spectra")
X.description

[4]:

'CO@CoMo/Al2O3, diff spectra'

history records changes made to the dataset. Here, right after its creation, it has been sorted by date (see below).

Then come the attributes related to the data themselves:

title (not to be confused with the name of the dataset) describes the nature of data (here absorbance ).
values shows the data as a quantity (with their units when they exist - here a.u. for absorbance units).
The numerical values are accessed through the data attribute and the units through the units attribute.

[5]:

X.values

[5]:

Magnitude

[[0.000803191214799881 3.787875175476074e-05 ... 0.000302683562040329 0.0003744959831237793] [-3.607943654060364e-05 -0.0001980997622013092 ... 0.0003089122474193573 0.0011698119342327118] ... [0.0008356980979442596 -0.0001386702060699463 ... -0.0005221068859100342 -0.001121222972869873] [0.0005654506385326385 -0.00011600926518440247 ... -0.0005699768662452698 -0.000630699098110199]]

Units

a.u.

[6]:

X.data

[6]:

array([[0.0008032, 3.788e-05, ..., 0.0003027, 0.0003745],
       [-3.608e-05, -0.0001981, ..., 0.0003089,  0.00117],
       ...,
       [0.0008357, -0.0001387, ..., -0.0005221, -0.001121],
       [0.0005655, -0.000116, ..., -0.00057, -0.0006307]])

[7]:

X.units  # TODO: correct this display

[7]:

a.u.

shape is the same as the ndarray shape attribute and gives the shape of the data array, here 19 x 3112.

Then come the attributes related to the dimensions of the dataset.

x : this dimension has one coordinate (a Coord object) made of the 3112 wavenumbers.

[8]:

print(X.x)
X.x

Coord: [float64] cm⁻¹ (size: 3112)

[8]:

size	3112
title	wavenumbers
coordinates	[ 4000 3999 ... 1001 999.9] cm⁻¹

y : this dimension contains:
- one coordinate made of the 19 acquisition timestamps
- two labels
  - the acquisition date (UTC) of each spectrum
  - the name of each spectrum.

[9]:

print(X.y)
X.y

Coord: [float64] s (size: 19)

[9]:

size	19
title	acquisition timestamp (GMT)
coordinates	[1.477e+09 1.477e+09 ... 1.477e+09 1.477e+09] s
labels	[[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00] [ Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ... Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00) Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]]

dims : Note that the x and y dimensions are the second and first dimensions, respectively. Hence, X[i,j] will return the absorbance of the ith spectrum at the jth wavenumber. However, this is subject to change, for instance if you perform an operation such as a transposition on your data. At any time, the attribute dims gives the correct names (which can be modified) and order of the dimensions.

[10]:

X.dims

[10]:

['y', 'x']
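The link between the order of the dimensions and the indexing can be sketched with plain Python lists. This is only a stand-in for illustration: the helper transpose below is hypothetical and works on nested lists, not on an NDDataset.

```python
# Plain-Python stand-in for a 2D dataset: rows are spectra (y),
# columns are wavenumbers (x).
dims = ["y", "x"]
data = [
    [0.1, 0.2, 0.3],  # spectrum 0
    [0.4, 0.5, 0.6],  # spectrum 1
]


def transpose(data, dims):
    """Swap the two dimensions of a 2D nested list and of its dims list."""
    transposed = [list(column) for column in zip(*data)]
    return transposed, dims[::-1]


data_t, dims_t = transpose(data, dims)

print(dims, data[1][2])      # spectrum 1, wavenumber 2
print(dims_t, data_t[2][1])  # same element, indices swapped
```

After the transposition, the same value is reached with the indices in the opposite order, which is why checking dims before indexing is a safe habit.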

Acquisition dates and `y` axis

The acquisition timestamps are the Unix times of the acquisition, i.e. the time elapsed in seconds since the reference date of Jan 1st 1970, 00:00:00 UTC.

[11]:

X.y.values

[11]:

Magnitude	[1476798575.0 1476798846.0 ... 1476806493.0 1476806797.0]
Units	s
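As a quick check of what these numbers mean, the first two timestamps displayed above can be converted back to dates with the Python standard library alone (no spectrochempy call is involved in this sketch):

```python
from datetime import datetime, timezone

# First two acquisition timestamps, copied from the output above (in seconds)
timestamps = [1476798575.0, 1476798846.0]

# Interpret each value as a number of seconds since 1970-01-01 00:00:00 UTC
dates = [datetime.fromtimestamp(t, tz=timezone.utc) for t in timestamps]

for t, d in zip(timestamps, dates):
    print(t, "->", d.isoformat())
```

The resulting dates agree with the first two acquisition date labels of the y dimension (13:49:35 and 13:54:06 UTC on 2016-10-18).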

In OMNIC, the acquisition time is that of the start of the acquisition. These timestamps may not be convenient to use directly (they are currently of the order of 1.5 billion seconds). It can therefore be convenient to shift the origin of the time coordinate to that of the first spectrum, which has the index 0 :

[12]:

X.y = X.y - X.y[0]
X.y.values

[12]:

Magnitude	[0.0 271.0 ... 7918.0 8222.0]
Units	s

Note that you can also use the inplace subtract operator to perform the same operation.

[13]:

X.y -= X.y[0]

It is also possible to use the ability of SpectroChemPy to handle unit changes. For this one can use the to or ito (inplace) methods.

val = val.to(some_units)
val.ito(some_units)  # the same inplace

[14]:

X.y.ito("minute")
X.y.values

[14]:

Magnitude	[0.0 4.517 ... 131.967 137.033]
Units	min
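The arithmetic behind the two steps above (shift of the origin, then conversion from seconds to minutes) can be reproduced by hand. The sketch below uses plain Python floats and the four timestamps visible in the outputs; in SpectroChemPy the unit conversion is handled for you by to and ito.

```python
# The four timestamps visible in the output of X.y.values (in seconds)
timestamps_s = [1476798575.0, 1476798846.0, 1476806493.0, 1476806797.0]

# Shift the origin to the first spectrum
elapsed_s = [t - timestamps_s[0] for t in timestamps_s]

# Convert seconds to minutes, rounded to 3 decimals as in the display
elapsed_min = [round(t / 60, 3) for t in elapsed_s]

print(elapsed_s)
print(elapsed_min)
```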

As shown above, the values of the Coord object are accessed through the values attribute. To get the last value, corresponding to the last row of the X dataset, you can use:

[15]:

tf = X.y.values[-1]
tf

[15]:

137.033 min

A negative index in Python indicates a position counted from the end of a sequence, so -1 indicates the last element.

Finally, if for instance you want the y time axis to be shifted by 2 minutes, it is also very easy to do so:

[16]:

X.y = X.y + 2
X.y.values

[16]:

Magnitude	[2.0 6.517 ... 133.967 139.033]
Units	min

or using the inplace add operator:

X.y += 2

The order of spectra

The order of spectra in OMNIC .spg files depends on the order in which the spectra were included in the OMNIC window before the group was saved. By default, spectrochempy reorders the spectra by acquisition date, but the original OMNIC order can be kept by passing sortbydate=False in the function call. For instance:

[17]:

X2 = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", sortbydate=False)

In the present case, this will change nothing because the spectra in the OMNIC file were already ordered by increasing date.
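The effect of such an option can be sketched on a toy list of spectra. The function order_spectra and the spectrum names below are hypothetical and only mimic the behaviour described above; they do not reproduce the code of read_omnic.

```python
# Toy group of spectra, in the order they would be stored in the file:
# (acquisition timestamp in s, hypothetical spectrum name)
spectra = [
    (1476798846.0, "spectrum B"),
    (1476806797.0, "spectrum D"),
    (1476798575.0, "spectrum A"),
    (1476806493.0, "spectrum C"),
]


def order_spectra(spectra, sortbydate=True):
    """Return the spectra sorted by timestamp, or in file order if not sortbydate."""
    if not sortbydate:
        return list(spectra)
    return sorted(spectra, key=lambda item: item[0])


print([name for _, name in order_spectra(spectra)])
print([name for _, name in order_spectra(spectra, sortbydate=False)])
```

Sorting the (timestamp, name) pairs together keeps each label attached to its own spectrum.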

Finally, it is worth mentioning that an NDDataset can generally be manipulated like a numpy ndarray. Hence, for instance, the following will reverse the order of the first dimension:

[18]:

X = X[::-1]  # reorders the NDDataset along the first dimension going backward
X.y.values  # displays the `y` dimension

[18]:

Magnitude	[139.033 133.967 ... 6.517 2.0]
Units	min

Note

Case of groups with different wavenumbers

An OMNIC .spg file can contain spectra having different wavenumber axes (e.g. different spacings or wavenumber ranges). In its current implementation, the spg reader will purposely return an error because such spectra cannot be included in a single NDDataset which, by definition, contains items that share common axes or dimensions! Future releases might include an option to deal with such a case and return a list of NDDatasets. Let us know if you are interested in such a feature, see Bug reports and enhancement requests (https://www.spectrochempy.fr/dev/dev/issues.html).
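Until such an option exists, one possible way to think about the problem is to group the spectra by wavenumber axis, each group being a candidate for its own dataset. The sketch below works on plain tuples; the function group_by_axis and the toy data are hypothetical and are not part of spectrochempy.

```python
from collections import defaultdict


def group_by_axis(spectra):
    """Group (name, wavenumbers, absorbances) tuples that share the same axis."""
    groups = defaultdict(list)
    for name, wavenumbers, absorbances in spectra:
        # The full axis is used as the key: spectra are grouped together
        # only if all their wavenumbers are identical.
        groups[tuple(wavenumbers)].append((name, absorbances))
    return list(groups.values())


# Toy data: s1 and s3 share an axis, s2 has a different spacing
spectra = [
    ("s1", [4000.0, 3999.0, 3998.0], [0.1, 0.2, 0.3]),
    ("s2", [4000.0, 3998.0], [0.1, 0.3]),
    ("s3", [4000.0, 3999.0, 3998.0], [0.4, 0.5, 0.6]),
]

for group in group_by_axis(spectra):
    print([name for name, _ in group])
```

An exact comparison of the axes is used here for simplicity; real data might require a tolerance.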

b) Import of .spa files

The import of a single spectrum follows exactly the same rules as that of the import of a group:

[19]:

Y = scp.read_omnic("irdata/subdir/7_CZ0-100_Pd_101.SPA")
Y

[19]:

name	7_CZ0-100 Pd_101
author	runner@fv-az1774-299
created	2025-02-25 08:01:38+00:00
description	# Omnic name: 7_CZ0-100 Pd_101 # Filename: 7_CZ0-100_Pd_101.SPA
history	2025-02-25 08:01:38+00:00> Imported from spa file(s) 2025-02-25 08:01:38+00:00> Data processing history from Omnic : ------------------------------------ Acquisition échantillon
DATA
title	absorbance
values	[[ 1.544 1.543 ... 2.1 2.091]] a.u.
shape	(y:1, x:5549)
DIMENSION `x`
size	5549
title	wavenumbers
coordinates	[ 6000 5999 ... 650.9 649.9] cm⁻¹
DIMENSION `y`
size	1
title	acquisition timestamp (GMT)
coordinates	[1.544e+09] s
labels	[[ 2018-11-30 07:10:57+00:00] [ /home/runner/.spectrochempy/testdata/irdata/subdir/7_CZ0-100_Pd_101.SPA]]

The omnic reader can also import several .spa files together, provided that they share a common axis for the wavenumbers. This is the case of the following files in the irdata/subdir directory: “7_CZ0-100_Pd_101.SPA”, …, “7_CZ0-100_Pd_104.SPA”. It is possible to import them in a single NDDataset by using the list of filenames in the function call:

[20]:

list_files = [
    "7_CZ0-100_Pd_101.SPA",
    "7_CZ0-100_Pd_102.SPA",
    "7_CZ0-100_Pd_103.SPA",
    "7_CZ0-100_Pd_104.SPA",
]
X = scp.read_omnic(list_files, directory="irdata/subdir")
print(X)

NDDataset: [float64] a.u. (shape: (y:4, x:5549))

When compatible .spa files are alone in a directory, a very convenient way is to call the read_omnic function with only the directory path as argument; it will gather the .spa files together:

[21]:

X = scp.read_omnic("irdata/subdir/1-20")
print(X)

NDDataset: [float64] a.u. (shape: (y:3, x:5549))
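If you want to check beforehand which files such a call is likely to gather, the content of a directory can be listed with the standard library. This is a minimal sketch: the helper list_spa_files is hypothetical and the file names are created in a temporary directory only for the demonstration.

```python
import tempfile
from pathlib import Path


def list_spa_files(directory):
    """Return the .spa files found directly in `directory`, sorted by path."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".spa"  # case-insensitive suffix
    )


# Demonstration on a temporary directory containing empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.SPA", "a.spa", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_spa_files(tmp)]

print(found)
```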

Warning

There is a difference between specifying the directory to read as a positional argument, as above, and as a keyword, as here:

X = scp.read_omnic(directory='irdata/subdir')

In the latter case, a dialog is opened to select files in the given directory, while in the former, the files are read silently and concatenated (if possible).

Import of Bruker OPUS files

Bruker OPUS files also have a proprietary file format. The Opus reader (read_opus() ) of spectrochempy is essentially a wrapper of the python module brukeropusreader developed by QED. It imports absorbance spectra (the AB block), acquisition times and names of spectra.

The use of read_opus() is similar to that of read_omnic() for .spa files. Hence, one can open sample Opus files contained in the datadir using:

[22]:

Z = scp.read_opus(["test.0000", "test.0001", "test.0002"], directory="irdata/OPUS")
print(Z)

NDDataset: [float64] a.u. (shape: (y:3, x:2567))

or:

[23]:

Z2 = scp.read_opus("irdata/OPUS")
print(Z2)

 WARNING | (UserWarning) '/home/runner/.spectrochempy/testdata/irdata/OPUS/background.0 is not an Absorbance spectrum. It cannot be read with the `read_opus` import method'

NDDataset: [float64] a.u. (shape: (y:4, x:2567))

Note above that a warning was issued because the irdata/OPUS directory contains a background file (single beam) which is not read by SpectroChemPy.

Finally, supplementary information can be obtained by the direct use of brukeropusreader .

For instance:

[24]:

from brukeropusreader import read_file  # noqa: E402

opusfile = scp.DATADIR / "irdata" / "OPUS" / "test.0000"  # the full path of the file
Z3 = read_file(opusfile)  # returns a dictionary of the data and metadata extracted
for key in Z3:
    print(key)

Z3["Optik"]  # look at the content of the Optik block

Text Information
Optik
Fourier Transformation
Acquisition
Sample
IgSm
IgSm Data Parameter
PhSm
ScSm
AB
Instrument (Rf)
Optik (Rf)
Acquisition (Rf)
Fourier Transformation (Rf)
IgRf
IgRf Data Parameter
ScRf Data Parameter
ScRf
History
AB Data Parameter
ScSm Data Parameter
PhSm Data Parameter
Instrument

[24]:

{'ACC': 'Compartiment Echantillon #1112355F',
 'APT': '1.5 mm',
 'BMS': 'KBr',
 'CHN': 'Sample Compartment',
 'DTC': 'LN-MCT Photovoltaic 1mm 8H Fast [Internal Pos.2]',
 'HPF': '0',
 'LPF': '40.',
 'OPF': 'Open',
 'PGN': '3',
 'RDX': '0',
 'SRC': 'MIR',
 'VEL': '120',
 'SON': 'Off'}

Import/Export of JCAMP-DX files

JCAMP-DX is an open format initially developed for IR data and extended to other spectroscopies. At present, the JCAMP-DX reader implemented in Spectrochempy is limited to IR data and AFFN encoding (see R. S. McDonald and Paul A. Wilks, JCAMP-DX: A Standard Form for Exchange of Infrared Spectra in Readable Form, Appl. Spectrosc., 1988, 42(1), 151–162. doi:10.1366/0003702884428734 for details).

The JCAMP-DX reader of spectrochempy has essentially been written to re-read the JCAMP-DX files exported by the spectrochempy write_jdx() writer.

Hence, for instance, the first dataset can be saved in the JCAMP-DX format:

[25]:

S0 = X[0]
print(S0)
S0.write_jcamp("CO@Mo_Al2O3_0.jdx", confirm=False)

NDDataset: [float64] a.u. (shape: (y:1, x:5549))

[25]:

PosixPath('/home/runner/work/spectrochempy/spectrochempy/docs/sources/userguide/importexport/CO@Mo_Al2O3_0.jdx')

The file can then be used (and maybe changed) by third-party software, and re-imported into spectrochempy:

[26]:

newS0 = scp.read_jcamp("CO@Mo_Al2O3_0.jdx")
print(newS0)

NDDataset: [float64] a.u. (shape: (y:1, x:5549))

It is important to note here that the conversion to JCAMP-DX can change the last digits of the absorbance and wavenumber values:

[27]:

import numpy as np


def difference(x, y):
    # maximum absolute and relative differences between x and y
    nonzero = y.data != 0  # avoid dividing by zero
    error = np.abs(x.data - y.data)
    max_relative_error = np.max(error[nonzero] / np.abs(y.data[nonzero]))
    return np.max(error), max_relative_error

[28]:

max_error, max_rel_error = difference(S0, newS0)
print(f"Max absolute difference in absorbance: {max_error:.3g}")
print(f"Max relative difference in absorbance: {max_rel_error:.3g}")

Max absolute difference in absorbance: 2.38e-07
Max relative difference in absorbance: 1.19e-07
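Differences of this kind are what one expects whenever numbers are written as text with a finite number of digits and parsed back. The sketch below illustrates this with plain Python. The choice of 7 significant digits is an assumption made for the illustration, not a statement about the format used by the JCAMP-DX writer, and text_round_trip is a hypothetical helper.

```python
def text_round_trip(values, digits=7):
    """Write each value as text with `digits` significant digits, then parse it back."""
    return [float(f"{v:.{digits}g}") for v in values]


def difference(x, y):
    """Return the max absolute and max relative differences (where y != 0)."""
    errors = [abs(a - b) for a, b in zip(x, y)]
    relative = [e / abs(b) for e, b in zip(errors, y) if b != 0]
    return max(errors), max(relative)


# Full-precision absorbances shown earlier in this tutorial (X.values output)
original = [
    0.000803191214799881,
    3.787875175476074e-05,
    0.000302683562040329,
    0.0003744959831237793,
]
restored = text_round_trip(original)

max_error, max_rel_error = difference(original, restored)
print(f"Max absolute difference: {max_error:.3g}")
print(f"Max relative difference: {max_rel_error:.3g}")
```

With 7 significant digits, the relative difference stays below 1e-6, whatever the magnitude of the values.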

[29]:

max_error, max_rel_error = difference(S0.x, newS0.x)
print(f"Max absolute difference in wavenumber: {max_error:.3g}")
print(f"Max relative difference in wavenumber: {max_rel_error:.3g}")

Max absolute difference in wavenumber: 0
Max relative difference in wavenumber: 0

But this is much beyond the experimental accuracy of the data and has