Import IR Data
This tutorial covers the specifics of infrared data import in SpectroChemPy. As a prerequisite, the user is expected to have read the Import Tutorial.
Let’s first import spectrochempy:
[1]:
import spectrochempy as scp
|
SpectroChemPy's API - v.0.7.1 © Copyright 2014-2025 - A.Travert & C.Fernandez @ LCS |
Supported file formats
At the time of writing of this tutorial (Scpy v.0.2), spectrochempy has the following readers which are specific to IR data:
read_omnic()
to open omnic (spa and spg) files
read_opus()
to open Opus (.0, …) files
read_jcamp()
to open an IR JCAMP-DX datafile
read()
which is the generic reader. The type of data is then deduced from the file extension.
General purpose data exchange formats such as .csv or .mat, which will be treated in another tutorial (yet to come…), can also be read using:
read_csv()
to open .csv files
read_matlab()
to open .mat files
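The idea of deducing the reader from the file extension can be sketched in a few lines of plain Python. The mapping and the guess_reader helper below are hypothetical and only illustrate the principle; they are not SpectroChemPy's actual implementation, which may recognize more extensions.

```python
from pathlib import Path

# Hypothetical mapping, for illustration only: the real generic reader may
# recognize more extensions and use a different internal mechanism.
READER_BY_EXTENSION = {
    ".spa": "read_omnic",
    ".spg": "read_omnic",
    ".jdx": "read_jcamp",
    ".csv": "read_csv",
    ".mat": "read_matlab",
}


def guess_reader(filename):
    """Return the name of the reader suggested by the file extension."""
    suffix = Path(filename).suffix.lower()  # ".SPG" and ".spg" are treated alike
    return READER_BY_EXTENSION.get(suffix)  # None when the extension is unknown


print(guess_reader("irdata/CO@Mo_Al2O3.SPG"))
print(guess_reader("CO@Mo_Al2O3_0.jdx"))
```

Lower-casing the suffix is a design choice of this sketch so that upper-case extensions such as .SPG are handled like their lower-case form.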
Import of OMNIC files
Thermo Scientific OMNIC software has two proprietary binary file formats:
.spa files that handle single spectra
.spg files which contain a group of spectra
Both have been reverse engineered, which allows their key data to be extracted. The Omnic reader of Spectrochempy (read_omnic()) has been developed based on posts in open forums on the .spa file format and extended to the .spg file format.
a) Import of .spg files
Let’s import an .spg file from the datadir (see import.ipynb for details) and display its main attributes:
[2]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG")
X
[2]:
name | CO@Mo_Al2O3 |
author | runner@fv-az1774-299 |
created | 2025-02-25 08:01:37+00:00 |
description | Omnic title: Group sust Mo_Al2O3_base line.SPG Omnic filename: /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG |
history | 2025-02-25 08:01:37+00:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG. 2025-02-25 08:01:37+00:00> Sorted by date |
DATA | |
title | absorbance |
values | [[0.0008032 3.788e-05 ... 0.0003027 0.0003745] [-3.608e-05 -0.0001981 ... 0.0003089 0.00117] ... [0.0008357 -0.0001387 ... -0.0005221 -0.001121] [0.0005655 -0.000116 ... -0.00057 -0.0006307]] a.u. |
shape | (y:19, x:3112) |
DIMENSION `x` | |
size | 3112 |
title | wavenumbers |
coordinates | [ 4000 3999 ... 1001 999.9] cm⁻¹ |
DIMENSION `y` | |
size | 19 |
title | acquisition timestamp (GMT) |
coordinates | [1.477e+09 1.477e+09 ... 1.477e+09 1.477e+09] s |
labels | [[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00] [ *Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00) *Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ... *Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00) *Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]] |
The displayed attributes are detailed in the following.
name
is the name of the group of spectra as it appears in the .spg file. OMNIC sets this name to the .spg filename used at the creation of the group. In this example, the name (“Group sust Mo_Al2O3_base line.SPG”) differs from the filename (“CO@Mo_Al2O3.SPG”) because the latter has been changed from outside OMNIC (directly in the OS).
author
is that of the creator of the NDDataset (not of the .spg file, which, to our knowledge, does not have this type of attribute). The string is composed of the username and of the machine name as given by the OS, e.g., ‘username@machinename’. It can be accessed and changed using X.author.
created
is the creation date of the NDDataset (again not that of the .spg file). It can be accessed (or even changed) using X.created.
description
indicates the complete pathname of the .spg file. As the pathname is also given in the history (below), it can be good practice to give a self-explanatory description of the group, for instance:
[3]:
X.description = "CO adsorption on CoMo/Al2O3, difference spectra"
X.description
[3]:
'CO adsorption on CoMo/Al2O3, difference spectra'
or directly at the import:
[4]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", description="CO@CoMo/Al2O3, diff spectra")
X.description
[4]:
'CO@CoMo/Al2O3, diff spectra'
history
records changes made to the dataset. Here, right after its creation, it has been sorted by date (see below).
Then come the attributes related to the data themselves:
title
(not to be confused with the name of the dataset) describes the nature of the data (here absorbance).
values
shows the data as a quantity (with their units when they exist; here a.u. for absorbance units). The numerical values are accessed through the data attribute and the units through the units attribute.
[5]:
X.values
[5]:
Magnitude | [[0.000803191214799881 3.787875175476074e-05 ... 0.000302683562040329 0.0003744959831237793] [-3.607943654060364e-05 -0.0001980997622013092 ... 0.0003089122474193573 0.0011698119342327118] ... [0.0008356980979442596 -0.0001386702060699463 ... -0.0005221068859100342 -0.001121222972869873] [0.0005654506385326385 -0.00011600926518440247 ... -0.0005699768662452698 -0.000630699098110199]] |
---|---|
Units | a.u. |
[6]:
X.data
[6]:
array([[0.0008032, 3.788e-05, ..., 0.0003027, 0.0003745],
[-3.608e-05, -0.0001981, ..., 0.0003089, 0.00117],
...,
[0.0008357, -0.0001387, ..., -0.0005221, -0.001121],
[0.0005655, -0.000116, ..., -0.00057, -0.0006307]])
[7]:
X.units # TODO: correct this display
[7]:
shape
is the same as the ndarray shape attribute and gives the shape of the data array, here 19 x 3112.
Then come the attributes related to the dimensions of the dataset.
x
: this dimension has one coordinate (a Coord object) made of the 3112 wavenumbers.
[8]:
print(X.x)
X.x
Coord: [float64] cm⁻¹ (size: 3112)
[8]:
size | 3112 |
title | wavenumbers |
coordinates | [ 4000 3999 ... 1001 999.9] cm⁻¹ |
y
: this dimension contains:
one coordinate made of the 19 acquisition timestamps
two labels: the acquisition date (UTC) of each spectrum and the name of each spectrum.
[9]:
print(X.y)
X.y
Coord: [float64] s (size: 19)
[9]:
size | 19 |
title | acquisition timestamp (GMT) |
coordinates | [1.477e+09 1.477e+09 ... 1.477e+09 1.477e+09] s |
labels | [[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00] [ *Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00) *Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ... *Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00) *Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]] |
dims
: Note that the x and y dimensions are the second and first dimensions, respectively. Hence, X[i,j] will return the absorbance of the ith spectrum at the jth wavenumber. However, this is subject to change, for instance if you perform an operation on your data such as transposition. At any time, the dims attribute gives the correct names (which can be modified) and order of the dimensions.
[10]:
X.dims
[10]:
['y', 'x']
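The relation between dimension names and array positions can be illustrated with plain Python objects. This is only a sketch of the bookkeeping, using the names and shape displayed above; it does not call SpectroChemPy.

```python
dims = ["y", "x"]    # names of the dimensions, in storage order
shape = (19, 3112)   # 19 spectra, 3112 wavenumbers

# Look up a size by dimension name instead of by position.
size_of = dict(zip(dims, shape))
print(size_of["x"])

# A transposition reverses both the order of the names and the shape.
dims_t = dims[::-1]
shape_t = shape[::-1]
print(dims_t, shape_t)
```

Looking sizes up by name rather than by position is what keeps code valid after a transposition.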
Acquisition dates and y axis
The acquisition timestamps are the Unix times of the acquisition, i.e. the time elapsed in seconds since the reference date of Jan 1st 1970, 00:00:00 UTC.
[11]:
X.y.values
[11]:
Magnitude | [1476798575.0 1476798846.0 ... 1476806493.0 1476806797.0] |
---|---|
Units | s |
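As a quick check of the Unix time convention, the first timestamp displayed above can be converted back to a calendar date with the Python standard library alone. This sketch is independent of SpectroChemPy.

```python
from datetime import datetime, timezone

first_timestamp = 1476798575.0  # first value of X.y shown above, in seconds

# Interpret the value as seconds elapsed since 1970-01-01 00:00:00 UTC.
acquired = datetime.fromtimestamp(first_timestamp, tz=timezone.utc)
print(acquired.isoformat())
```

The result agrees with the first acquisition date label of the dataset (2016-10-18 13:49:35+00:00).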
In OMNIC, the acquisition time is that of the start of the acquisition. As such, these values may not be convenient to use directly (they are currently on the order of 1.5 billion…). In this respect, it can be convenient to shift the origin of the time coordinate to that of the first spectrum, which has the index 0:
[12]:
X.y = X.y - X.y[0]
X.y.values
[12]:
Magnitude | [0.0 271.0 ... 7918.0 8222.0] |
---|---|
Units | s |
Note that you can also use the inplace subtract operator to perform the same operation.
[13]:
X.y -= X.y[0]
It is also possible to use the ability of SpectroChemPy to handle unit changes. For this, one can use the to or ito (inplace) methods:
val = val.to(some_units)
val.ito(some_units)  # the same inplace
[14]:
X.y.ito("minute")
X.y.values
[14]:
Magnitude | [0.0 4.517 ... 131.967 137.033] |
---|---|
Units | min |
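The arithmetic behind the shift of origin and the conversion to minutes can be reproduced with plain floats. The sketch below uses only the four timestamps displayed earlier (the elided ones are not reconstructed) and performs the unit conversion by hand, whereas SpectroChemPy does it through its units machinery.

```python
timestamps = [1476798575.0, 1476798846.0, 1476806493.0, 1476806797.0]  # seconds

origin = timestamps[0]
elapsed_s = [t - origin for t in timestamps]         # seconds since the first spectrum
elapsed_min = [round(s / 60, 3) for s in elapsed_s]  # 1 min = 60 s

print(elapsed_s)
print(elapsed_min)
```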
As shown above, the values of the Coord object are accessed through the values attribute. To get the last value, corresponding to the last row of the X dataset, you can use:
[15]:
tf = X.y.values[-1]
tf
[15]:
A negative index in Python indicates the position in a sequence counted from the end, so -1 indicates the last element.
Finally, if for instance you want the y time axis to be shifted by 2 minutes, it is also very easy to do so:
[16]:
X.y = X.y + 2
X.y.values
[16]:
Magnitude | [2.0 6.517 ... 133.967 139.033] |
---|---|
Units | min |
or using the inplace add operator:
X.y += 2
The order of spectra
The order of spectra in OMNIC .spg files depends on the order in which the spectra were included in the OMNIC window before the group was saved. By default, spectrochempy reorders the spectra by acquisition date, but the original OMNIC order can be kept by passing sortbydate=False at the function call. For instance:
[17]:
X2 = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", sortbydate=False)
In the present case, this will change nothing because the spectra in the OMNIC file were already ordered by increasing date.
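What sorting by date means can be sketched with a plain list of (timestamp, label) pairs. The labels below are hypothetical placeholders, and the code only illustrates the principle; it is not the reader's internal implementation.

```python
# Hypothetical (timestamp, label) pairs standing in for spectra stored out of order.
spectra = [
    (1476806797.0, "spectrum C"),
    (1476798575.0, "spectrum A"),
    (1476798846.0, "spectrum B"),
]

# Sort on the timestamp so that each label stays attached to its spectrum.
by_date = sorted(spectra, key=lambda item: item[0])
labels_sorted = [label for _, label in by_date]
print(labels_sorted)
```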
Finally, it is worth mentioning that an NDDataset can generally be manipulated like a numpy ndarray. Hence, for instance, the following will reverse the order of the first dimension:
[18]:
X = X[::-1] # reorders the NDDataset along the first dimension going backward
X.y.values # displays the `y` dimension
[18]:
Magnitude | [139.033 133.967 ... 6.517 2.0] |
---|---|
Units | min |
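The slicing and indexing rules used above are those of ordinary Python sequences. As a sketch, here they are applied to a plain list holding the four y values displayed before the reversal:

```python
minutes = [2.0, 6.517, 133.967, 139.033]  # the four displayed y values, in minutes

reversed_minutes = minutes[::-1]  # a step of -1 walks the sequence backward
last_value = minutes[-1]          # a negative index counts from the end

print(reversed_minutes)
print(last_value)
```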
Note
Case of groups with different wavenumbers: an OMNIC .spg file can contain spectra having different wavenumber axes (e.g. different spacings or wavenumber ranges). In its current implementation, the spg reader will purposely return an error because such spectra cannot be included in a single NDDataset which, by definition, contains items that share common axes or dimensions! Future releases might include an option to deal with such a case and return a list of NDDatasets. Let us know if you are interested in such a feature, see Bug reports and enhancement requests (https://www.spectrochempy.fr/dev/dev/issues.html).
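The compatibility condition described in this note can be sketched as a simple check on lists of wavenumbers. The share_common_axis helper and its tolerance are hypothetical; the actual reader may apply a different test.

```python
import math


def share_common_axis(axes, rel_tol=1e-9):
    """Return True when every axis has the same length and the same values."""
    reference = axes[0]
    for axis in axes[1:]:
        if len(axis) != len(reference):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(axis, reference)):
            return False
    return True


axis_a = [4000.0, 3999.0, 3998.0]
axis_b = [4000.0, 3999.0, 3998.0]
axis_c = [4000.0, 3998.0, 3996.0]  # different spacing

print(share_common_axis([axis_a, axis_b]))
print(share_common_axis([axis_a, axis_c]))
```

Only spectra passing such a check could be stacked into a single two-dimensional array.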
b) Import of .spa files
The import of a single spectrum follows exactly the same rules as that of the import of a group:
[19]:
Y = scp.read_omnic("irdata/subdir/7_CZ0-100_Pd_101.SPA")
Y
[19]:
name | 7_CZ0-100 Pd_101 |
author | runner@fv-az1774-299 |
created | 2025-02-25 08:01:38+00:00 |
description | # Omnic name: 7_CZ0-100 Pd_101 # Filename: 7_CZ0-100_Pd_101.SPA |
history | 2025-02-25 08:01:38+00:00> Imported from spa file(s) 2025-02-25 08:01:38+00:00> Data processing history from Omnic : ------------------------------------ Acquisition échantillon |
DATA | |
title | absorbance |
values | [[ 1.544 1.543 ... 2.1 2.091]] a.u. |
shape | (y:1, x:5549) |
DIMENSION `x` | |
size | 5549 |
title | wavenumbers |
coordinates | [ 6000 5999 ... 650.9 649.9] cm⁻¹ |
DIMENSION `y` | |
size | 1 |
title | acquisition timestamp (GMT) |
coordinates | [1.544e+09] s |
labels | [[ 2018-11-30 07:10:57+00:00] [ /home/runner/.spectrochempy/testdata/irdata/subdir/7_CZ0-100_Pd_101.SPA]] |
The omnic reader can also import several spa files together, provided that they share a common axis for the wavenumbers. This is the case of the following files in the irdata/subdir directory: “7_CZ0-100_Pd_101.SPA”, …, “7_CZ0-100_Pd_104.SPA”. It is possible to import them in a single NDDataset by using the list of filenames in the function call:
[20]:
list_files = [
"7_CZ0-100_Pd_101.SPA",
"7_CZ0-100_Pd_102.SPA",
"7_CZ0-100_Pd_103.SPA",
"7_CZ0-100_Pd_104.SPA",
]
X = scp.read_omnic(list_files, directory="irdata/subdir")
print(X)
NDDataset: [float64] a.u. (shape: (y:4, x:5549))
When compatible .spa files are alone in a directory, a very convenient way to import them is to call the read_omnic method with only the directory path as argument, which gathers the .spa files together:
[21]:
X = scp.read_omnic("irdata/subdir/1-20")
print(X)
NDDataset: [float64] a.u. (shape: (y:3, x:5549))
Warning
There is a difference between specifying the directory to read as a positional argument, as above, and as a keyword, like here:
X = scp.read_omnic(directory='irdata/subdir')
In the latter case, a dialog is opened to select files in the given directory, while in the former, the files are read silently and concatenated (if possible).
Import of Bruker OPUS files
Bruker OPUS files also have a proprietary file format. The Opus reader (read_opus()) of spectrochempy is essentially a wrapper of the python module brukeropusreader developed by QED. It imports absorbance spectra (the AB block), acquisition times and names of spectra.
The use of read_opus() is similar to that of read_omnic() for .spa files. Hence, one can open sample Opus files contained in the datadir using:
[22]:
Z = scp.read_opus(["test.0000", "test.0001", "test.0002"], directory="irdata/OPUS")
print(Z)
NDDataset: [float64] a.u. (shape: (y:3, x:2567))
or:
[23]:
Z2 = scp.read_opus("irdata/OPUS")
print(Z2)
WARNING | (UserWarning) '/home/runner/.spectrochempy/testdata/irdata/OPUS/background.0 is not an Absorbance spectrum. It cannot be read with the `read_opus` import method'
NDDataset: [float64] a.u. (shape: (y:4, x:2567))
Note above that a warning was issued because the irdata/OPUS directory contains a background file (single beam), which is not read by SpectroChemPy.
Finally, supplementary information can be obtained by the direct use of brukeropusreader. For instance:
[24]:
from brukeropusreader import read_file # noqa: E402
opusfile = scp.DATADIR / "irdata" / "OPUS" / "test.0000" # the full path of the file
Z3 = read_file(opusfile) # returns a dictionary of the data and metadata extracted
for key in Z3:
    print(key)
Z3["Optik"]  # look at the content of the Optik block
Text Information
Optik
Fourier Transformation
Acquisition
Sample
IgSm
IgSm Data Parameter
PhSm
ScSm
AB
Instrument (Rf)
Optik (Rf)
Acquisition (Rf)
Fourier Transformation (Rf)
IgRf
IgRf Data Parameter
ScRf Data Parameter
ScRf
History
AB Data Parameter
ScSm Data Parameter
PhSm Data Parameter
Instrument
[24]:
{'ACC': 'Compartiment Echantillon #1112355F',
'APT': '1.5 mm',
'BMS': 'KBr',
'CHN': 'Sample Compartment',
'DTC': 'LN-MCT Photovoltaic 1mm 8H Fast [Internal Pos.2]',
'HPF': '0',
'LPF': '40.',
'OPF': 'Open',
'PGN': '3',
'RDX': '0',
'SRC': 'MIR',
'VEL': '120',
'SON': 'Off'}
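The values of such a block are returned as strings, so numerical parameters have to be parsed before use. The sketch below works on a hand-copied subset of the dictionary shown above and assumes that the 'APT' entry always has the form "value unit", which may not hold for every file.

```python
optik = {"APT": "1.5 mm", "BMS": "KBr", "VEL": "120"}  # subset of the block shown above

# Assumption: the entry is a number followed by a unit, separated by a space.
value_text, unit = optik["APT"].split()
apt_value = float(value_text)
print(apt_value, unit)
```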
Import/Export of JCAMP-DX files
JCAMP-DX is an open format initially developed for IR data and extended to other spectroscopies. At present, the JCAMP-DX reader implemented in Spectrochempy is limited to IR data and AFFN encoding (see R. S. McDonald and Paul A. Wilks, JCAMP-DX: A Standard Form for Exchange of Infrared Spectra in Readable Form, Appl. Spec., 1988, 1, 151–162. doi:10.1366/0003702884428734 for details).
The JCAMP-DX reader of spectrochempy has essentially been written to re-read the JCAMP-DX files exported by the spectrochempy write_jdx() writer.
Hence, for instance, the first dataset can be saved in the JCAMP-DX format:
[25]:
S0 = X[0]
print(S0)
S0.write_jcamp("CO@Mo_Al2O3_0.jdx", confirm=False)
NDDataset: [float64] a.u. (shape: (y:1, x:5549))
[25]:
PosixPath('/home/runner/work/spectrochempy/spectrochempy/docs/sources/userguide/importexport/CO@Mo_Al2O3_0.jdx')
The file can then be used (and maybe changed) by third-party software and re-imported into spectrochempy:
[26]:
newS0 = scp.read_jcamp("CO@Mo_Al2O3_0.jdx")
print(newS0)
NDDataset: [float64] a.u. (shape: (y:1, x:5549))
It is important to note here that the conversion to JCAMP-DX can change the last digits of absorbance and wavenumbers:
[27]:
def difference(x, y):
    from numpy import abs
    from numpy import max

    nonzero = y.data != 0  # avoid dividing by zero in the relative error
    error = abs(x.data - y.data)
    max_relative_error = max(error[nonzero] / abs(y.data[nonzero]))
    return max(error), max_relative_error
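To make explicit what this function measures, here is a standard-library sketch of the same two metrics applied to short lists. The max_errors name and the sample values are hypothetical and chosen only so that the result is easy to verify by hand.

```python
def max_errors(x, y):
    """Return (max absolute error, max relative error) between two sequences."""
    errors = [abs(a - b) for a, b in zip(x, y)]
    # The relative error is taken with respect to y; zeros in y are skipped.
    relative = [e / abs(b) for e, b in zip(errors, y) if b != 0]
    return max(errors), max(relative)


original = [1.0, 2.0, 0.0, 4.0]
reloaded = [1.0, 2.5, 0.0, 4.0]

abs_err, rel_err = max_errors(original, reloaded)
print(abs_err, rel_err)
```

Here the only differing pair is (2.0, 2.5), giving an absolute error of 0.5 and a relative error of 0.5 / 2.5.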
[28]:
max_error, max_rel_error = difference(S0, newS0)
print(f"Max absolute difference in absorbance: {max_error:.3g}")
print(f"Max relative difference in absorbance: {max_rel_error:.3g}")
Max absolute difference in absorbance: 2.38e-07
Max relative difference in absorbance: 1.19e-07
[29]:
max_error, max_rel_error = difference(S0.x, newS0.x)
print(f"Max absolute difference in wavenumber: {max_error:.3g}")
print(f"Max relative difference in wavenumber: {max_rel_error:.3g}")
Max absolute difference in wavenumber: 0
Max relative difference in wavenumber: 0
But this is much beyond the experimental accuracy of the data and has