Reading datasets

This example shows how to use the generic read method to create a dataset from either local or remote files.

First, we import the spectrochempy API package.

import spectrochempy as scp

Import dataset from local files

Read IR data recorded in Omnic format (.spg extension). We just pass the file name as a parameter.

dataset = scp.read("irdata/nh4y-activation.spg")
dataset
NDDataset: [float64] a.u. (shape: (y:55, x:5549))[nh4y-activation]
Summary
    name: nh4y-activation
    author: runner@runnervmrg6be
    created: 2026-03-29 02:33:38+00:00
    description:
        Omnic title: NH4Y-activation.SPG
        Omnic filename: /home/runner/.spectrochempy/testdata/irdata/nh4y-activation.spg
    history:
        2026-03-29 02:33:38+00:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/nh4y-activation.spg.
        2026-03-29 02:33:38+00:00> Sorted by date
Data
    title: absorbance
    values: ...
        [[ 2.057 2.061 ... 2.013 2.012]
         [ 2.033 2.037 ... 1.913 1.911]
         ...
         [ 1.794 1.791 ... 1.198 1.198]
         [ 1.816 1.815 ... 1.24 1.238]] a.u.
    shape: (y:55, x:5549)
Dimension `x`
    size: 5549
    title: wavenumbers
    coordinates: [ 6000 5999 ... 650.9 649.9] cm⁻¹
Dimension `y`
    size: 55
    title: acquisition timestamp (GMT)
    coordinates: [1.468e+09 1.468e+09 ... 1.468e+09 1.468e+09] s
    labels: ...
        [[ 2016-07-06 19:03:14+00:00 2016-07-06 19:13:14+00:00 ... 2016-07-07 04:03:17+00:00 2016-07-07 04:13:17+00:00]
         [ vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00) vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00) ...
           vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00) vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)]]
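
The `y` coordinates in this summary are given in seconds, while the labels show the same acquisitions as dates. The summary does not say what the seconds are counted from. The sketch below assumes they are POSIX timestamps (seconds since 1970-01-01 UTC) and checks that assumption against the first label, using only the Python standard library. It does not call SpectroChemPy.

```python
from datetime import datetime, timezone

# First acquisition label copied from the dataset summary.
first_label = datetime(2016, 7, 6, 19, 3, 14, tzinfo=timezone.utc)

# Assumption: the y coordinate counts seconds since 1970-01-01 00:00:00 UTC.
seconds = first_label.timestamp()

# Four significant digits mimic the rounded display of the coordinates.
rounded = f"{seconds:.4g}"

# Converting the number back recovers the original date.
round_trip = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds, rounded, round_trip.isoformat())
```

The rounded value is `1.468e+09`, which matches the displayed coordinates and is consistent with the POSIX-timestamp reading.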


_ = dataset.plot(style="paper")

When using read, we can pass the filename as a str or a Path object.

from pathlib import Path

filename = Path("irdata/nh4y-activation.spg")
dataset = scp.read(filename)

Note that if the file is not found in the current working directory, SpectroChemPy will try to find it in the datadir directory defined in the preferences:

PosixPath('/home/runner/.spectrochempy/testdata')
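
The lookup order described above (current working directory first, then a data directory) can be illustrated with a small standard-library sketch. The helper `resolve_data_file` and the file name `example.spg` are hypothetical and only mimic the described behaviour. This is not how SpectroChemPy implements it.

```python
import tempfile
from pathlib import Path


def resolve_data_file(filename, search_dirs):
    """Return the first existing candidate for `filename`, or None.

    Hypothetical helper: `filename` may be a str or a Path, and
    `search_dirs` is tried in order (e.g. working dir, then data dir).
    """
    filename = Path(filename)  # normalises both str and Path inputs
    if filename.is_absolute():
        return filename if filename.exists() else None
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.exists():
            return candidate
    return None


# Two temporary directories stand in for the working and data directories.
with tempfile.TemporaryDirectory() as workdir, tempfile.TemporaryDirectory() as datadir:
    target = Path(datadir) / "irdata" / "example.spg"
    target.parent.mkdir()
    target.write_bytes(b"")  # empty placeholder file

    found = resolve_data_file("irdata/example.spg", [workdir, datadir])
    found_in_datadir = found == target
    missing = resolve_data_file(Path("irdata/absent.spg"), [workdir, datadir])

print(found_in_datadir, missing)
```

The file is absent from the first directory, so the helper falls back to the second one. A name that exists in neither directory yields `None`.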

If the supplied argument is a directory, the whole directory is read at once. By default, the files are merged along the first dimension (y). For this to work, the second dimension (x) must be compatible (same size); otherwise a warning is issued. To avoid the warning and get the individual datasets, set merge to False.

dataset_list = scp.read("irdata", merge=False)
dataset_list
List (len=3, type=NDDataset)
    0: NDDataset: [float64] a.u. (shape: (y:19, x:3112))[CO@Mo_Al2O3]
    Summary
        name: CO@Mo_Al2O3
        author: runner@runnervmrg6be
        created: 2026-03-29 02:33:38+00:00
        description:
            Omnic title: Group sust Mo_Al2O3_base line.SPG
            Omnic filename: /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG
        history:
            2026-03-29 02:33:38+00:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG.
            2026-03-29 02:33:38+00:00> Sorted by date
    Data
        title: absorbance
        values: ...
            [[0.0008032 3.788e-05 ... 0.0003027 0.0003745]
             [-3.608e-05 -0.0001981 ... 0.0003089 0.00117]
             ...
             [0.0008357 -0.0001387 ... -0.0005221 -0.001121]
             [0.0005655 -0.000116 ... -0.00057 -0.0006307]] a.u.
        shape: (y:19, x:3112)
    Dimension `x`
        size: 3112
        title: wavenumbers
        coordinates: [ 4000 3999 ... 1001 999.9] cm⁻¹
    Dimension `y`
        size: 19
        title: acquisition timestamp (GMT)
        coordinates: [1.477e+09 1.477e+09 ... 1.477e+09 1.477e+09] s
        labels: ...
            [[ 2016-10-18 13:49:35+00:00 2016-10-18 13:54:06+00:00 ... 2016-10-18 16:01:33+00:00 2016-10-18 16:06:37+00:00]
             [ *Résultat de Soustraction:04_Mo_Al2O3_calc_0.003torr_LT_after sulf_Oct 18 15:46:42 2016 (GMT+02:00)
               *Résultat de Soustraction:04_Mo_Al2O3_calc_0.004torr_LT_after sulf_Oct 18 15:51:12 2016 (GMT+02:00) ...
               *Résultat de Soustraction:04_Mo_Al2O3_calc_0.905torr_LT_after sulf_Oct 18 17:58:42 2016 (GMT+02:00)
               *Résultat de Soustraction:04_Mo_Al2O3_calc_1.004torr_LT_after sulf_Oct 18 18:03:41 2016 (GMT+02:00)]]
    1: NDDataset: [float64] a.u. (shape: (y:55, x:5549))[nh4y-activation]
    Summary
        name: nh4y-activation
        author: runner@runnervmrg6be
        created: 2026-03-29 02:33:38+00:00
        description:
            Omnic title: NH4Y-activation.SPG
            Omnic filename: /home/runner/.spectrochempy/testdata/irdata/nh4y-activation.spg
        history:
            2026-03-29 02:33:38+00:00> Imported from spg file /home/runner/.spectrochempy/testdata/irdata/nh4y-activation.spg.
            2026-03-29 02:33:38+00:00> Sorted by date
    Data
        title: absorbance
        values: ...
            [[ 2.057 2.061 ... 2.013 2.012]
             [ 2.033 2.037 ... 1.913 1.911]
             ...
             [ 1.794 1.791 ... 1.198 1.198]
             [ 1.816 1.815 ... 1.24 1.238]] a.u.
        shape: (y:55, x:5549)
    Dimension `x`
        size: 5549
        title: wavenumbers
        coordinates: [ 6000 5999 ... 650.9 649.9] cm⁻¹
    Dimension `y`
        size: 55
        title: acquisition timestamp (GMT)
        coordinates: [1.468e+09 1.468e+09 ... 1.468e+09 1.468e+09] s
        labels: ...
            [[ 2016-07-06 19:03:14+00:00 2016-07-06 19:13:14+00:00 ... 2016-07-07 04:03:17+00:00 2016-07-07 04:13:17+00:00]
             [ vz0466.spa, Wed Jul 06 21:00:38 2016 (GMT+02:00) vz0467.spa, Wed Jul 06 21:10:38 2016 (GMT+02:00) ...
               vz0520.spa, Thu Jul 07 06:00:41 2016 (GMT+02:00) vz0521.spa, Thu Jul 07 06:10:41 2016 (GMT+02:00)]]
    2: NDDataset: [float64] unitless (shape: (y:1, x:3736))[IR]
    Summary
        name: IR
        author: runner@runnervmrg6be
        created: 2026-03-29 02:33:38+00:00
        description:
            "name" read from .csv file
        history:
            2026-03-29 02:33:38+00:00> Read from .csv file
    Data
        title:
        values: ...
            [[-0.09079 3.547 ... 4.317 -0.09079]]
        shape: (y:1, x:3736)
    Dimension `x`
        size: 3736
        title:
        coordinates: [ 399.2 400.2 ... 4000 4001]
    Dimension `y`
        size: 1
        title:
        coordinates: [ 0]
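
To see why these three files cannot be merged, compare their `x` sizes. The sketch below applies a plain reading of the rule stated earlier (datasets can be stacked along `y` only if their `x` sizes match) to the shapes copied from the listing above. The function `group_by_x_size` is a hypothetical illustration, not SpectroChemPy's merging code.

```python
from collections import defaultdict

# Shapes (y, x) copied from the three dataset summaries above.
shapes = {
    "CO@Mo_Al2O3": (19, 3112),
    "nh4y-activation": (55, 5549),
    "IR": (1, 3736),
}


def group_by_x_size(shapes):
    """Group dataset names by the size of their second (x) dimension."""
    groups = defaultdict(list)
    for name, (_, nx) in shapes.items():
        groups[nx].append(name)
    return dict(groups)


groups = group_by_x_size(shapes)

# Only groups with more than one member could be stacked along y.
mergeable = {nx: names for nx, names in groups.items() if len(names) > 1}

print(groups)
print("mergeable groups:", mergeable)
```

Each file ends up in its own group, so no pair shares an `x` size and a merge along `y` is not possible for this directory.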


For full details on the parameters that can be used, see the API documentation: spectrochempy.read.

Import dataset from remote files

To download and read a file from a remote server, you can pass a URL.

dataset_list = scp.read("http://www.eigenvector.com/data/Corn/corn.mat")

In this case the MATLAB file contains 7 arrays, which have been automatically converted to NDDataset objects and merged into 4 datasets.

for nd in dataset_list:
    print(f"{nd.name} : {nd.shape}")
mp6spec : (240, 700)
m5nbs : (3, 700)
mp6nbs : (8, 700)
propvals : (80, 4)
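
The loop prints four datasets although the file holds seven arrays. The `description` fields in the summaries shown later for the zipped copy of the same data list which source arrays were concatenated into each dataset. The tally below copies those names and the reported shapes by hand. The per-array row counts are an inference that assumes equal-sized contributions, which the output does not state.

```python
# Source arrays named in each dataset's description field, together with
# the merged shape reported for that dataset (copied from this example).
merged = {
    "mp6spec": (["m5spec", "mp5spec", "mp6spec"], (240, 700)),
    "m5nbs": (["m5nbs"], (3, 700)),
    "mp6nbs": (["mp5nbs", "mp6nbs"], (8, 700)),
    "propvals": (["propvals"], (80, 4)),
}

# Total number of source arrays across the four datasets.
total_arrays = sum(len(sources) for sources, _ in merged.values())

# Assumption: arrays merged into one dataset contributed equal row counts.
rows_per_array = {
    name: shape[0] // len(sources) for name, (sources, shape) in merged.items()
}

for name, (sources, shape) in merged.items():
    print(f"{name}: {len(sources)} array(s) -> {shape}")
print("total arrays:", total_arrays)
```

The total comes to seven, in agreement with the number of arrays in the MATLAB file.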

The eigenvector.com website also provides the same data in a compressed (zipped) format: corn.mat_.zip. The read method accepts this as well.

dataset_list = scp.read(
    "https://eigenvector.com/wp-content/uploads/2019/06/corn.mat_.zip"
)
dataset_list
List (len=4, type=NDDataset)
    0: NDDataset: [float64] unitless (shape: (y:240, x:700))[mp6spec]
    Summary
        name: mp6spec
        author: *unknown*
        created: 2026-03-29 02:33:40+00:00
        description:
            Concatenation of 3 datasets:
            ( m5spec, mp5spec, mp6spec )
        history:
            2026-03-29 02:33:40+00:00> Created by concatenate
            2026-03-29 02:33:40+00:00> Merged from several files
    Data
        title:
        values: ...
            [[ 0.04449 0.04438 ... 0.7309 0.7306]
             [-0.01244 -0.01251 ... 0.6849 0.6844]
             ...
             [-0.003701 -0.003818 ... 0.7128 0.7121]
             [-0.01526 -0.01538 ... 0.7028 0.7021]]
        shape: (y:240, x:700)
    Dimension `x`
        size: 700
        title:
        coordinates: [ 1100 1102 ... 2496 2498]
    Dimension `y`
        size: 240
        title: index
        coordinates: [ 0 0 ... 79 79]
    1: NDDataset: [float64] unitless (shape: (y:3, x:700))[m5nbs]
    Summary
        name: m5nbs
        author: *unknown*
        created: 2026-03-29 02:33:40+00:00
        description:
            Concatenation of 1 datasets:
            ( m5nbs )
        history:
            2026-03-29 02:33:40+00:00> Created by concatenate
            2026-03-29 02:33:40+00:00> Merged from several files
    Data
        title:
        values: ...
            [[ 0.134 0.132 ... 0.09665 0.09771]
             [ 0.1374 0.1354 ... 0.09898 0.09999]
             [ 0.1437 0.1416 ... 0.1037 0.1048]]
        shape: (y:3, x:700)
    Dimension `x`
        size: 700
        title:
        coordinates: [ 1100 1102 ... 2496 2498]
    Dimension `y`
        size: 3
        title: index
        coordinates: [ 0 1 2]
    2: NDDataset: [float64] unitless (shape: (y:8, x:700))[mp6nbs]
    Summary
        name: mp6nbs
        author: *unknown*
        created: 2026-03-29 02:33:40+00:00
        description:
            Concatenation of 2 datasets:
            ( mp5nbs, mp6nbs )
        history:
            2026-03-29 02:33:40+00:00> Created by concatenate
            2026-03-29 02:33:40+00:00> Merged from several files
    Data
        title:
        values: ...
            [[ 0.06679 0.0645 ... 0.03636 0.03724]
             [ 0.05404 0.0524 ... 0.02147 0.02236]
             ...
             [ 0.06115 0.05901 ... 0.03013 0.03099]
             [ 0.05311 0.05144 ... 0.02195 0.02284]]
        shape: (y:8, x:700)
    Dimension `x`
        size: 700
        title:
        coordinates: [ 1100 1102 ... 2496 2498]
    Dimension `y`
        size: 8
        title: index
        coordinates: [ 0 0 ... 3 3]
    3: NDDataset: [float64] unitless (shape: (y:80, x:4))[propvals]
    Summary
        name: propvals
        author: *unknown*
        created: 2026-03-29 02:33:40+00:00
        description:
            Concatenation of 1 datasets:
            ( propvals )
        history:
            2026-03-29 02:33:40+00:00> Created by concatenate
            2026-03-29 02:33:40+00:00> Merged from several files
    Data
        title:
        values: ...
            [[ 10.45 3.687 8.746 64.84]
             [ 10.41 3.72 8.658 64.85]
             ...
             [ 10.59 3.176 8.132 65.21]
             [ 10.98 3.328 8.428 64.85]]
        shape: (y:80, x:4)
    Dimension `x`
        size: 4
        title:
        labels: [ Moisture Oil Protein Starch ]
    Dimension `y`
        size: 80
        title: index
        coordinates: [ 0 1 ... 78 79]
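
For readers curious about what reading a zipped file involves, here is a minimal standard-library sketch of unpacking an archive held in memory. It builds a tiny archive itself instead of downloading anything. The member name `corn.mat` and its placeholder bytes are assumptions made for illustration, and the sketch does not reflect how SpectroChemPy handles zip files internally.

```python
import io
import zipfile

# Build a small zip archive in memory; it stands in for downloaded bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Hypothetical member name and content, not real MATLAB data.
    archive.writestr("corn.mat", b"placeholder bytes")

downloaded = buffer.getvalue()

# Open the bytes as an archive and read the first member without
# writing anything to disk.
with zipfile.ZipFile(io.BytesIO(downloaded)) as archive:
    members = archive.namelist()
    payload = archive.read(members[0])

print(members, len(payload))
```

With `scp.read`, none of this is needed: passing the URL of the zipped file is enough, as the example above shows.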


Plot each of the datasets

dataset_list[-1].plot()
dataset_list[-2].plot()
dataset_list[-3].plot()
dataset_list[-4].plot()
<Axes: xlabel='values $\\mathrm{}$', ylabel='values $\\mathrm{}$'>

This ends the example! The following line can be uncommented if no plot shows when running the .py script with python.

# scp.show()

Total running time of the script: (0 minutes 2.310 seconds)