The NDDataset object

The NDDataset is the main object used by SpectroChemPy.

Like numpy ndarrays, NDDatasets can be sliced, sorted, and subjected to mathematical operations.

In addition, an NDDataset may have units, can be masked, and each of its dimensions can have coordinates, which may also have units. This makes an NDDataset aware of unit compatibility, e.g., for binary operations such as addition or subtraction, or when applying mathematical functions. In addition to, or instead of, numerical data for coordinates, an NDDataset can also have labeled coordinates, where labels can be different kinds of objects (strings, datetime, numpy ndarray or other NDDatasets, etc.).

This offers a lot of flexibility in using NDDatasets that, we hope, will be useful for applications. See the Examples for additional information about such possible applications.

Below (and in the next sections), we try to give an almost complete view of the NDDataset features.

[1]:
import spectrochempy as scp
  SpectroChemPy's API - v.0.6.9.dev9
© Copyright 2014-2024 - A.Travert & C.Fernandez @ LCS

As we will make some reference to the numpy library, we also import it here.

[2]:
import numpy as np

We additionally import the three main SpectroChemPy objects that we will use throughout this tutorial.

[3]:
from spectrochempy import Coord, CoordSet, NDDataset

For convenient use of units, we will also directly import ur, the unit registry (documented in ../units/units.ipynb), which contains all available units.

[4]:
from spectrochempy import ur

Multidimensional arrays are defined in SpectroChemPy using the NDDataset object.

NDDataset objects mostly behave like numpy’s ndarray (see, for instance, the numpy quickstart tutorial).

However, unlike a raw numpy ndarray, the presence of optional properties makes them (hopefully) more appropriate for handling spectroscopic information, one of the major objectives of the SpectroChemPy package:

  • mask: Data can be partially masked at will

  • units: Data can have units, allowing units-aware operations

  • CoordSet: Data can have a set of coordinates, one or several per dimension

Additional metadata can also be added to instances of this class through the meta property.
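
As a loose point of comparison for the mask property, numpy itself offers masked arrays through numpy.ma. The sketch below uses only numpy, not NDDataset, so it illustrates the general idea of excluding elements from a computation and makes no claim about how SpectroChemPy implements or exposes its masks.

```python
import numpy as np

# numpy-only analogy: the second value is masked and ignored by reductions
data = np.ma.masked_array([10.0, 20.0, 30.0], mask=[False, True, False])

total = data.sum()        # only the unmasked values 10 and 30 are summed
n_valid = data.count()    # number of unmasked elements
print(total, n_valid)
```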

1D-Dataset (unidimensional dataset)

In the following example, a minimal 1D dataset is created from a simple list, to which we can add some metadata:

[5]:
d1D = NDDataset(
    [10.0, 20.0, 30.0],
    name="Dataset N1",
    author="Blake and Mortimer",
    description="A dataset from scratch",
    history="creation",
)
d1D
[5]:
name Dataset N1
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
A dataset from scratch
history
2024-04-28 03:10:31+02:00> Creation
DATA
title
values
[ 10 20 30]
size 3
[6]:
print(d1D)
NDDataset: [float64] unitless (size: 3)
[7]:
_ = d1D.plot(figsize=(3, 2))
[Figure: plot of the d1D dataset]

Except for a few additional metadata such as author, created, etc., there is not much difference with respect to a conventional numpy array. For example, one can apply numpy ufuncs directly to an NDDataset or perform basic arithmetic operations with these objects:

[8]:
np.sqrt(d1D)
[8]:
name Dataset N1
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
A dataset from scratch
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Ufunc sqrt applied.
DATA
title sqrt()
values
[ 3.162 4.472 5.477]
size 3
[9]:
d1D += d1D / 2.0
d1D
[9]:
name Dataset N1
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
A dataset from scratch
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
DATA
title
values
[ 15 30 45]
size 3
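
For comparison, the same two operations on a plain numpy array give the same numbers, but without any metadata or history tracking. This is a numpy-only sketch, not SpectroChemPy code; the variable names are illustrative.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0])

roots = np.sqrt(values)   # element-wise square root, as with np.sqrt(d1D)
values += values / 2.0    # in-place addition, as with d1D += d1D / 2.0

print(np.round(roots, 3))
print(values)
```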

As seen above, there are some attributes that are automatically added to the dataset:

  • id : This is a unique identifier for the object.

  • name: A short and unique name for the dataset. It will be equal to the automatic id if it is not provided.

  • author : Author determined from the computer name if not provided.

  • created : Date and time of creation.

  • modified: Date and time of modification.

Of these, name and author can be modified by the user; the id, created and modified attributes are read-only.

Some other attributes are defined to describe the data:

  • title: A long name that will be used in plots or in some other operations.

  • history: History of the operations performed on the object since its creation.

  • description: A comment or a description of the object’s purpose or contents.

  • origin: An optional reference to the source of the data.

Here is an example of the use of the NDDataset attributes:

[10]:
d1D.title = "intensity"
d1D.name = "mydataset"
d1D.history = "created from scratch"
d1D.description = "Some experimental measurements"
d1D
[10]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title intensity
values
[ 15 30 45]
size 3

d1D is a 1D (one-dimensional) dataset.

Some attributes are useful to check this kind of information:

[11]:
d1D.shape  # the shape of a 1D dataset contains a single dimension size
[11]:
(3,)
[12]:
d1D.ndim  # the number of dimensions
[12]:
1
[13]:
d1D.dims  # the name of the dimension (it has been automatically attributed)
[13]:
['x']

Note: The names of the dimensions are set automatically, but they can be changed, provided that each name is a single letter.

[14]:
d1D.dims = ["q"]  # change the list of dim names.
[15]:
d1D.dims
[15]:
['q']

nD-Dataset (multidimensional dataset)

To create an nD NDDataset, we can pass an nD array-like object to the NDDataset constructor:

[16]:
a = np.random.rand(2, 4, 6)
a
[16]:
array([[[  0.6994,   0.5258, ...,   0.4873,   0.9664],
        [  0.7777,  0.07211, ...,   0.3027,   0.2573],
        [  0.5151,   0.6055, ...,   0.7765,  0.07097],
        [  0.5552,   0.7423, ...,   0.6547,  0.05357]],

       [[  0.7435,   0.5264, ...,   0.3829,   0.7836],
        [  0.3068,  0.06962, ...,   0.7794,   0.3481],
        [  0.6328,   0.9421, ...,  0.04902,   0.2135],
        [  0.5029,  0.06343, ...,   0.8926,   0.5402]]])
[17]:
d3D = NDDataset(a)
d3D.title = "energy"
d3D.author = "Someone"
d3D.name = "3D dataset creation"
d3D.history = "created from scratch"
d3D.description = "Some example"
d3D.dims = ["u", "v", "t"]
d3D
[17]:
name 3D dataset creation
author Someone
created 2024-04-28 03:10:32+02:00
description
Some example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (u:2, v:4, t:6)

We can also add all information in a single statement

[18]:
d3D = NDDataset(
    a,
    dims=["u", "v", "t"],
    title="Energy",
    author="Someone",
    name="3D_dataset",
    history="created from scratch",
    description="a single statement creation example",
)
d3D
[18]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (u:2, v:4, t:6)

Three dimension names are assigned at creation. If they are not provided through dims, the names ‘z’, ‘y’, ‘x’ are assigned automatically.

[19]:
d3D.dims
[19]:
['u', 'v', 't']
[20]:
d3D.ndim
[20]:
3
[21]:
d3D.shape
[21]:
(2, 4, 6)

About the dates and times

The dates and times are stored internally as UTC (Coordinated Universal Time). Timezone information is stored in the timezone attribute. If it is not set, the local timezone is used by default, which is probably the most common case.

[22]:
nd = NDDataset()
nd.created
[22]:
'2024-04-28 03:10:32+02:00'

In this case our local timezone has been used by default for the conversion from UTC datetime.

[23]:
nd.local_timezone
[23]:
'Europe/Paris'
[24]:
nd.timezone = "EST"
nd.created
[24]:
'2024-04-27 20:10:32-05:00'
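
The two displayed values are the same instant seen from two timezones. A standard-library sketch of that conversion follows; it is not SpectroChemPy code. The UTC instant is back-computed from the output above, and fixed offsets stand in for Europe/Paris in summer (+02:00) and for EST (-05:00), which is an assumption made to avoid depending on a timezone database.

```python
from datetime import datetime, timedelta, timezone

# Creation time expressed in UTC (derived from the outputs shown above)
created_utc = datetime(2024, 4, 28, 1, 10, 32, tzinfo=timezone.utc)

# Fixed offsets used as stand-ins for named timezones (assumption)
paris_summer = timezone(timedelta(hours=2))
est = timezone(timedelta(hours=-5))

as_paris = created_utc.astimezone(paris_summer).isoformat(sep=" ")
as_est = created_utc.astimezone(est).isoformat(sep=" ")
print(as_paris)
print(as_est)
```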

For a list of timezone codes (TZ), you can have a look at the list of tz database time zones.

About the history attribute

The history is saved internally in a list, but it behaves differently from a usual list. When an NDDataset is first created, the list is empty:

[25]:
nd = NDDataset()
nd.history
[25]:
[]

Assigning a string to the history attribute has two effects: first, the string is automatically appended to the existing history list; second, it is preceded by the time at which it was added.

[26]:
nd.history = "some history"
nd.history = "another history to append"
nd.history = "..."
nd.history
[26]:
['2024-04-28 03:10:32+02:00> Some history',
 '2024-04-28 03:10:32+02:00> Another history to append',
 '2024-04-28 03:10:32+02:00> ...']

If you want to erase the history, assign an empty list

[27]:
nd.history = []
nd.history
[27]:
[]

If you want to replace the full history, put brackets around your history line (i.e., assign a list):

[28]:
nd.history = "Created from scratch"
nd.history = "a second line that will be erased"
nd.history = ["A more interesting message"]
nd.history
[28]:
['2024-04-28 03:10:32+02:00> A more interesting message']
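
The append, erase and replace rules described in this section can be mimicked with a small property setter. The class below is a toy model written for illustration only: HistoryDemo and _stamp are hypothetical names, and the code is not taken from SpectroChemPy. The capitalization of the first letter simply imitates what the outputs above show.

```python
from datetime import datetime, timezone


class HistoryDemo:
    """Toy model (hypothetical, not SpectroChemPy code) of the history setter."""

    def __init__(self):
        self._history = []

    @property
    def history(self):
        return list(self._history)

    @history.setter
    def history(self, value):
        if isinstance(value, str):
            # A string is appended to the existing entries
            self._history.append(self._stamp(value))
        else:
            # A list replaces everything; an empty list erases the history
            self._history = [self._stamp(item) for item in value]

    @staticmethod
    def _stamp(text):
        # Prefix the entry with the current time and capitalize its first letter
        now = datetime.now(timezone.utc).isoformat(sep=" ", timespec="seconds")
        return f"{now}> {text[:1].upper()}{text[1:]}"


demo = HistoryDemo()
demo.history = "some history"
demo.history = "another history to append"
appended = demo.history

demo.history = ["A more interesting message"]
replaced = demo.history

demo.history = []
erased = demo.history
print(appended, replaced, erased, sep="\n")
```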

Units

One interesting possibility for an NDDataset is to define units for the internal data.

[29]:
d1D.units = ur.eV  # ur is a registry containing all available units
[30]:
d1D  # note the eV symbol of the units added to the values field below
[30]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title intensity
values
[ 15 30 45] eV
size 3

This allows units-aware calculations:

[31]:
d1D**2  # note the results in eV^2
[31]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
2024-04-28 03:10:32+02:00> Binary operation pow with `2` has been performed
DATA
title intensity
values
[ 225 900 2025] eV²
size 3
[32]:
np.sqrt(d1D)  # note the result in eV^0.5
[32]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
2024-04-28 03:10:32+02:00> Ufunc sqrt applied.
DATA
title sqrt(intensity)
values
[ 3.873 5.477 6.708] eV⁰⋅⁵
size 3
[33]:
time = 5.0 * ur.second
d1D / time  # here we get results in eV/s
[33]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
2024-04-28 03:10:32+02:00> Binary operation truediv with `5.0 s` has been performed
DATA
title intensity
values
[ 3 6 9] eV.s⁻¹
size 3

Conversion can be done between different units transparently

[34]:
d1D.to("J")
[34]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title intensity
values
[2.403e-18 4.807e-18 7.21e-18] J
size 3
[35]:
d1D.to("K")
[35]:
name mydataset
author Blake and Mortimer
created 2024-04-28 03:10:31+02:00
description
Some experimental measurements
history
2024-04-28 03:10:31+02:00> Creation
2024-04-28 03:10:32+02:00> Inplace binary op: iadd with `Dataset N1`
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title intensity
values
[1.741e+05 3.481e+05 5.222e+05] K
size 3
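
The converted values can be checked by hand. Going from eV to J is a plain rescaling by the elementary charge. Going from eV to K is not a simple change of scale: the numbers shown are consistent with the relation E = k_B T, which is an inference from the output rather than something stated on this page. The sketch below uses plain Python and the exact SI values of the two constants.

```python
# Exact SI defining constants
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per electronvolt
BOLTZMANN = 1.380649e-23             # joules per kelvin

energies_ev = [15.0, 30.0, 45.0]

# eV -> J: multiply by the elementary charge
energies_j = [e * ELEMENTARY_CHARGE for e in energies_ev]

# J -> K through E = k_B * T (assumed equivalence, see the lead-in)
temperatures_k = [e / BOLTZMANN for e in energies_j]

joules_text = [f"{e:.3e}" for e in energies_j]
kelvin_text = [f"{t:.3e}" for t in temperatures_k]
print(joules_text)
print(kelvin_text)
```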

Coordinates

The d3D dataset created above has 3 dimensions, but no coordinates for these dimensions. Here lies a big difference with plain numpy arrays: we can add coordinates to each dimension of an NDDataset.

To get the list of all defined coordinates, we can use the coordset attribute:

[36]:
d3D.coordset  # no coordinates, so it returns nothing (None)
[37]:
d3D.t  # the same for coordinate  t, v, u which are not yet set

To add coordinates, one way is to set them one by one:

[38]:
d3D.t = (
    Coord.arange(6) * 0.1
)  # we need a sequence of 6 values for `t` dimension (see shape above)
d3D.t.title = "time"
d3D.t.units = ur.seconds
d3D.coordset  # now returns the list of coordinates
[38]:
DIMENSION `t`
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s
[39]:
d3D.t
[39]:
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s
[40]:
d3D.coordset("t")  # alternative way to get a given coordinate
[40]:
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s
[41]:
d3D["t"]  # another alternative way to get a given coordinate
[41]:
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s

The two other coordinates u and v are still undefined

[42]:
d3D.u, d3D.v
[42]:
(Coord: empty, Coord: empty)

When the dataset is printed, only the information for the existing coordinates is given.

[43]:
d3D
[43]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (u:2, v:4, t:6)
DIMENSION `t`
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s

Programmatically, we can use the attributes is_empty or has_data to check this:

[44]:
d3D.v.has_data, d3D.v.is_empty
[44]:
(False, True)

An error is raised when a coordinate doesn’t exist

[45]:
try:
    d3D.x
except KeyError as e:
    scp.error_(KeyError, e)
 ERROR | KeyError: 'Could not find `x` in coordinates names or titles'

In some cases it can also be useful to get a coordinate from its title instead of its name. The limitation is that if several coordinates have the same title, only the first one found in the coordinate list is returned, which can be ambiguous.

[46]:
d3D["time"]
[46]:
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s
[47]:
d3D.time
[47]:
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s

Labels

It is possible to use labels instead of numerical coordinates. Labels are sequences of objects. The length of the sequence must be equal to the size of the dimension.

The labels can be simple strings, e.g.,

[48]:
tags = list("ab")
d3D.u.title = "some tags"
d3D.u.labels = tags  # TODO: avoid repetition
d3D
[48]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (u:2, v:4, t:6)
DIMENSION `t`
size 6
title time
coordinates
[ 0 0.1 0.2 0.3 0.4 0.5] s
DIMENSION `u`
size 2
title some tags
labels
[ a b]

or more complex objects.

For instance here we use datetime.timedelta objects:

[49]:
from datetime import timedelta

start = timedelta(0)
times = [start + timedelta(seconds=x * 60) for x in range(6)]
d3D.t = None
d3D.t.labels = times
d3D.t.title = "time"
d3D
[49]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (u:2, v:4, t:6)
DIMENSION `t`
size 6
title time
labels
[ 0:00:00 0:01:00 0:02:00 0:03:00 0:04:00 0:05:00]
DIMENSION `u`
size 2
title some tags
labels
[ a b]

In this case, getting a coordinate that has labels but no numerical data returns the labels:

[50]:
d3D.time
[50]:
size 6
title time
labels
[ 0:00:00 0:01:00 0:02:00 0:03:00 0:04:00 0:05:00]
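
The label values shown above are simply the string form of the datetime.timedelta objects, and there is one label per point of the dimension. A standard-library sketch, independent of SpectroChemPy (dimension_size is an illustrative name for the size of the `t` dimension):

```python
from datetime import timedelta

dimension_size = 6  # size of the `t` dimension of d3D

start = timedelta(0)
times = [start + timedelta(seconds=x * 60) for x in range(dimension_size)]

# str() of a timedelta gives the H:MM:SS form seen in the output
rendered = [str(t) for t in times]
print(rendered)
```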

More insight on coordinates

Sharing coordinates between dimensions

Sometimes it is not necessary to have different coordinates for each axis. Some can be shared between axes.

For example, if we have a square matrix with the same coordinate in the two dimensions, the second dimension can refer to the first. Here we create a square 2D dataset, using the diag method:

[51]:
nd = NDDataset.diag((3, 3, 2.5))
nd
[51]:
name NDDataset_1c5749ba
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created using method : diag
DATA
title
values
[[ 3 0 0]
[ 0 3 0]
[ 0 0 2.5]]
shape (y:3, x:3)

and then we add the same coordinate for both dimensions

[52]:
coordx = Coord.arange(3)
nd.set_coordset(x=coordx, y="x")
nd
[52]:
name NDDataset_1c5749ba
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created using method : diag
DATA
title
values
[[ 3 0 0]
[ 0 3 0]
[ 0 0 2.5]]
shape (y:3, x:3)
DIMENSION `x`=`y`
size 3
title
coordinates
[ 0 1 2]

Setting coordinates using set_coordset

Let’s create 3 Coord objects to be used as coordinates for the 3 dimensions of the previous d3D dataset.

[53]:
d3D.dims = ["t", "v", "u"]
s0, s1, s2 = d3D.shape
coord0 = Coord.linspace(10.0, 100.0, s0, units="m", title="distance")
coord1 = Coord.linspace(20.0, 25.0, s1, units="K", title="temperature")
coord2 = Coord.linspace(0.0, 1000.0, s2, units="hour", title="elapsed time")

Syntax 1

[54]:
d3D.set_coordset(u=coord2, v=coord1, t=coord0)
d3D
[54]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
title temperature
coordinates
[ 20 21.67 23.33 25] K
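
The coordinate values printed above match what numpy’s linspace produces for the same start, stop and number of points. The sketch below assumes that Coord.linspace follows the numpy.linspace convention (end point included), which is consistent with the output but not stated explicitly here; units and titles are left out.

```python
import numpy as np

shape = (2, 4, 6)  # shape of d3D
s0, s1, s2 = shape

distance = np.linspace(10.0, 100.0, s0)      # 2 points, in m
temperature = np.linspace(20.0, 25.0, s1)    # 4 points, in K
elapsed_time = np.linspace(0.0, 1000.0, s2)  # 6 points, in hours

print(distance)
print(np.round(temperature, 2))
print(elapsed_time)
```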

Syntax 2

[55]:
d3D.set_coordset({"u": coord2, "v": coord1, "t": coord0})
d3D
[55]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
title temperature
coordinates
[ 20 21.67 23.33 25] K

Adding several coordinates to a single dimension

We can add several coordinates to the same dimension

[56]:
coord1b = Coord([1, 2, 3, 4], units="millitesla", title="magnetic field")
[57]:
d3D.set_coordset(u=coord2, v=[coord1, coord1b], t=coord0)
d3D
[57]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

We can retrieve the various coordinates for a single dimension easily:

[58]:
d3D.v_1
[58]:
size 4
title magnetic field
coordinates
[ 1 2 3 4] mT

Summary of the coordinate setting syntax

Some additional information about coordinate setting syntax

A. First syntax (probably the safest: the name of each dimension is specified, so it is less prone to errors)

[59]:
d3D.set_coordset(u=coord2, v=[coord1, coord1b], t=coord0)
# or equivalent
d3D.set_coordset(u=coord2, v=CoordSet(coord1, coord1b), t=coord0)
d3D
[59]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

B. Second syntax assuming the coordinates are given in the order of the dimensions.

Remember that we can check this order using the dims attribute of a NDDataset

[60]:
d3D.dims
[60]:
['t', 'v', 'u']
[61]:
d3D.set_coordset((coord0, [coord1, coord1b], coord2))
# or equivalent
d3D.set_coordset(coord0, CoordSet(coord1, coord1b), coord2)
d3D
[61]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

C. Third syntax (from a dictionary)

[62]:
d3D.set_coordset({"t": coord0, "u": coord2, "v": [coord1, coord1b]})
d3D
[62]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

D. It is also possible to use the coordset property directly

[63]:
d3D.coordset = coord0, [coord1, coord1b], coord2
d3D
[63]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title temperature
coordinates
[ 20 21.67 23.33 25] K
(_2)
title magnetic field
coordinates
[ 1 2 3 4] mT
[64]:
d3D.coordset = {"t": coord0, "u": coord2, "v": [coord1, coord1b]}
d3D
[64]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K
[65]:
d3D.coordset = CoordSet(t=coord0, u=coord2, v=[coord1, coord1b])
d3D
[65]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

WARNING

Do not use a list to set the coordinates of several dimensions; use a tuple instead.

A list raises an error, because it has another meaning: it is used to set several coordinates for the same dimension (a “same dim” CoordSet; see examples A and B).

[66]:
try:
    d3D.coordset = [coord0, coord1, coord2]
except ValueError:
    scp.error_(
        ValueError,
        "Coordinates must be of the same size for a dimension with multiple coordinates",
    )
 ERROR | ValueError: Coordinates must be of the same size for a dimension with multiple coordinates

This works: it uses a tuple (), not a list [].

[67]:
d3D.coordset = (
    coord0,
    coord1,
    coord2,
)  # equivalent to d3D.coordset = coord0, coord1, coord2
d3D
[67]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
title temperature
coordinates
[ 20 21.67 23.33 25] K

E. Setting the coordinates individually

Either a single coordinate

[68]:
d3D.u = coord2
d3D
[68]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
title temperature
coordinates
[ 20 21.67 23.33 25] K

or multiple coordinates for a single dimension

[69]:
d3D.v = [coord1, coord1b]
d3D
[69]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title temperature
coordinates
[ 20 21.67 23.33 25] K
(_2)
title magnetic field
coordinates
[ 1 2 3 4] mT

or using a CoordSet object.

[70]:
d3D.v = CoordSet(coord1, coord1b)
d3D
[70]:
name 3D_dataset
author Someone
created 2024-04-28 03:10:32+02:00
description
a single statement creation example
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

Methods to create NDDataset

There are many ways to create NDDataset objects.

Let’s first create 2 coordinate objects, for which we can define labels and units. Note the use of the linspace function to generate the data.

[71]:
c0 = Coord.linspace(
    start=4000.0, stop=1000.0, num=5, labels=None, units="cm^-1", title="wavenumber"
)
[72]:
c1 = Coord.linspace(
    10.0, 40.0, 3, labels=["Cold", "RT", "Hot"], units="K", title="temperature"
)

The full coordset will be the following

[73]:
cs = CoordSet(c0, c1)
cs
[73]:
DIMENSION `x`
size 3
title temperature
coordinates
[ 10 25 40] K
labels
[ Cold RT Hot]
DIMENSION `y`
size 5
title wavenumber
coordinates
[ 4000 3250 2500 1750 1000] cm⁻¹
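
Note that, in the output above, the last coordinate passed (`c1`, temperature) became dimension `x` and the first one (`c0`, wavenumber) became `y`. The following is a minimal sketch of that naming pattern, inferred only from this output; the function `name_dims` and the name sequence `"xyz"` are illustrative assumptions, not the library's implementation.

```python
def name_dims(titles, names="xyz"):
    """Map dimension names to coordinate titles, giving `x` to the last coordinate."""
    if len(titles) > len(names):
        raise ValueError("not enough dimension names")
    used = names[: len(titles)]  # e.g. "xy" for two coordinates
    return dict(zip(reversed(used), titles))


# Titles listed in the order the coordinates were passed: c0, then c1.
dims = name_dims(["wavenumber", "temperature"])
print(dims)  # {'y': 'wavenumber', 'x': 'temperature'}
```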

Now we will generate the full dataset using the fromfunction method. All the needed information is passed as parameters of this NDDataset constructor.

Create a dataset from a function

[74]:
def func(x, y, extra):
    return x * y / extra
[75]:
ds = NDDataset.fromfunction(
    func,
    extra=100 * ur.cm**-1,  # extra arguments passed to the function
    coordset=cs,
    name="mydataset",
    title="absorbance",
    units=None,
)  # when None, units will be determined from the function results

ds.description = """Dataset example created for this tutorial.
It's a 2-D dataset"""

ds.author = "Blake & Mortimer"
ds
[75]:
name mydataset
author Blake & Mortimer
created 2024-04-28 03:10:32+02:00
description
Dataset example created for this tutorial.
It's a 2-D dataset
history
2024-04-28 03:10:32+02:00> Created using method : fromfunction
DATA
title absorbance
values
[[ 400 1000 1600]
[ 325 812.5 1300]
...
[ 175 437.5 700]
[ 100 250 400]] K
shape (y:5, x:3)
DIMENSION `x`
size 3
title temperature
coordinates
[ 10 25 40] K
labels
[ Cold RT Hot]
DIMENSION `y`
size 5
title wavenumber
coordinates
[ 4000 3250 2500 1750 1000] cm⁻¹
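
To see where these values and the resulting unit come from, the evaluation can be mimicked in plain Python. This is only a sketch under the assumption, consistent with the output above, that `func` receives the `x` coordinate (temperature), the `y` coordinate (wavenumber) and `extra`, and is evaluated for every (y, x) pair. The unit bookkeeping with exponent dictionaries (`combine`) is a hypothetical stand-in for the unit registry, not how SpectroChemPy handles units.

```python
def func(x, y, extra):
    return x * y / extra


x_values = [10.0, 25.0, 40.0]                        # temperature, K
y_values = [4000.0, 3250.0, 2500.0, 1750.0, 1000.0]  # wavenumber, cm^-1
extra = 100.0                                        # cm^-1

# One row per y value, one column per x value: shape (y:5, x:3).
values = [[func(x, y, extra) for x in x_values] for y in y_values]
for row in values:
    print(row)


def combine(*terms):
    """Sum unit exponents; each term is (units, sign), with sign -1 for division."""
    total = {}
    for units, sign in terms:
        for unit, power in units.items():
            total[unit] = total.get(unit, 0) + sign * power
    return {unit: power for unit, power in total.items() if power != 0}


# K * cm^-1 / cm^-1 leaves K, matching the unit shown in the output.
result_units = combine(({"K": 1}, 1), ({"cm": -1}, 1), ({"cm": -1}, -1))
print(result_units)  # {'K': 1}
```

Because the wavenumber units cancel against those of `extra`, passing `units=None` leaves the dataset in kelvin, even though its title is "absorbance".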

Using numpy-like constructors of NDDatasets

[76]:
dz = NDDataset.zeros(
    (5, 3), coordset=cs, units="meters", title="Datasets with only zeros"
)
[77]:
do = NDDataset.ones(
    (5, 3), coordset=cs, units="kilograms", title="Datasets with only ones"
)
[78]:
df = NDDataset.full(
    (5, 3), fill_value=1.25, coordset=cs, units="radians", title="with only float=1.25"
)
df
[78]:
name NDDataset_1c888368
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created using method : full
DATA
title with only float=1.25
values
[[ 1.25 1.25 1.25]
[ 1.25 1.25 1.25]
...
[ 1.25 1.25 1.25]
[ 1.25 1.25 1.25]] rad
shape (y:5, x:3)
DIMENSION `x`
size 3
title temperature
coordinates
[ 10 25 40] K
labels
[ Cold RT Hot]
DIMENSION `y`
size 5
title wavenumber
coordinates
[ 4000 3250 2500 1750 1000] cm⁻¹

As with numpy, it is also possible to take another dataset as a template:

[79]:
df = NDDataset.full_like(d3D, dtype="int", fill_value=2)
df
[79]:
name 3D_dataset
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created from scratch
2024-04-28 03:10:32+02:00> Created using method : full_like
DATA
title Energy
values
[[[ 2 2 ... 2 2]
[ 2 2 ... 2 2]
[ 2 2 ... 2 2]
[ 2 2 ... 2 2]]

[[ 2 2 ... 2 2]
[ 2 2 ... 2 2]
[ 2 2 ... 2 2]
[ 2 2 ... 2 2]]]
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K
[80]:
nd = NDDataset.diag((3, 3, 2.5))
nd
[80]:
name NDDataset_1c8d7ef4
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created using method : diag
DATA
title
values
[[ 3 0 0]
[ 0 3 0]
[ 0 0 2.5]]
shape (y:3, x:3)

Copying existing NDDataset

Copying an existing dataset is as simple as:

[81]:
d3D_copy = d3D.copy()

or alternatively:

[82]:
d3D_copy = d3D[:]

Finally, it is also possible to initialize a dataset using an existing one:

[83]:
d3Dduplicate = NDDataset(d3D, name=f"duplicate of {d3D.name}", units="absorbance")
d3Dduplicate
[83]:
name duplicate of 3D_dataset
author runner@fv-az1501-19
created 2024-04-28 03:10:32+02:00
history
2024-04-28 03:10:32+02:00> Created from scratch
DATA
title Energy
values
[[[ 0.6994 0.5258 ... 0.4873 0.9664]
[ 0.7777 0.07211 ... 0.3027 0.2573]
[ 0.5151 0.6055 ... 0.7765 0.07097]
[ 0.5552 0.7423 ... 0.6547 0.05357]]

[[ 0.7435 0.5264 ... 0.3829 0.7836]
[ 0.3068 0.06962 ... 0.7794 0.3481]
[ 0.6328 0.9421 ... 0.04902 0.2135]
[ 0.5029 0.06343 ... 0.8926 0.5402]]] a.u.
shape (t:2, v:4, u:6)
DIMENSION `t`
size 2
title distance
coordinates
[ 10 100] m
DIMENSION `u`
size 6
title elapsed time
coordinates
[ 0 200 400 600 800 1000] h
DIMENSION `v`
size 4
(_1)
title magnetic field
coordinates
[ 1 2 3 4] mT
(_2)
title temperature
coordinates
[ 20 21.67 23.33 25] K

Importing from external dataset

An NDDataset can also be created by importing external data.

A test data folder contains some data for experimenting with some features of datasets.

[84]:
# let's check that this directory exists and display its name:
datadir = scp.preferences.datadir
if datadir.exists():
    print(datadir.name)
testdata

Let’s load grouped IR spectra acquired using OMNIC:

[85]:
nd = NDDataset.read_omnic(datadir / "irdata/nh4y-activation.spg")
nd.preferences.reset()
_ = nd.plot()
../../../_images/userguide_objects_dataset_dataset_162_0.png

Even if we do not specify the datadir, the application looks in this directory first by default.
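
The paths in this section are built with the `/` operator, which suggests that `datadir` is a `pathlib.Path` (it also exposes `exists()` and `name`). Here is a short sketch of how such paths compose. The base directory `/home/user/testdata` is purely illustrative, and `PurePosixPath` is used so that the example does not touch the file system.

```python
from pathlib import PurePosixPath

# Hypothetical location standing in for scp.preferences.datadir.
datadir = PurePosixPath("/home/user/testdata")

# A single string containing separators and a chain of components both work.
ir_file = datadir / "irdata/nh4y-activation.spg"
nmr_dir = datadir / "nmrdata" / "bruker" / "tests" / "nmr" / "topspin_2d"

print(datadir.name)        # testdata
print(ir_file.suffix)      # .spg
print(nmr_dir.parts[-3:])  # ('tests', 'nmr', 'topspin_2d')
```

With a concrete `pathlib.Path`, calling `exists()` on the result before reading gives an early, readable failure if the test data are not installed.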

Now, let's load an NMR dataset (in the Bruker format).

[86]:
path = datadir / "nmrdata" / "bruker" / "tests" / "nmr" / "topspin_2d"

# load the data directly (no need to create the dataset first)
nd2 = NDDataset.read_topspin(path, expno=1, remove_digital_filter=True)

# view it...
nd2.x.to("s")
nd2.y.to("ms")

ax = nd2.plot(method="map")
 WARNING | (UserWarning) (196608,)cannot be shaped into(147, 1024)
../../../_images/userguide_objects_dataset_dataset_165_1.png