IRATE Format Specification

This document describes the specific required structure of the IRATE format for format version 0.

All IRATE files are HDF5 files, and hence usually have either an .h5 or .hdf5 extension.

IRATE File Format

The main data file for an IRATE format is referred to as simply an “IRATE file”. These files may store any number of the actual outputs of a simulation (typically meaning multiple snapshots), associated halo and/or galaxy catalogs and merger trees, and any other data that might be associated with such a simulation (e.g. black hole catalogs).

Note

An IRATE file is intended to hold at most one simulation. Multiple simulations should be stored as multiple separate IRATE files. A single simulation can, however, be spread over multiple files - see the irate.core.scatter_files and irate.core.gather_files functions for examples. In that case, the root file that contains the main hierarchy is the “IRATE file” and the others are ancillary files.

To conform to the IRATE standard, such a file must satisfy the following conditions:

  • The root of the file must have an integer attribute named ‘IRATEVersion’ that specifies the version of the IRATE format that the file obeys. The format version for this documentation is 0. The format version for the currently-installed IRATE tools can always be accessed as an integer via irate.formatversion.

  • At the root of the file, there must be a Group ‘Cosmology’. This Group must have the following HDF5 attributes to specify the cosmology that defines the data:

    • ‘HubbleParam’
    • ‘OmegaMatter’
    • ‘OmegaLambda’
    • ‘OmegaBaryon’
    • ‘PowerSpectrumIndex’
    • ‘sigma_8’

    Furthermore, if the cosmology used has an accepted name (e.g. WMAP-7), it is strongly recommended that the Group have an additional attribute, ‘Name’, for human readability; such an attribute, however, is not required.

    Some cosmologies may include additional parameters, in which case such parameters can be included as attributes of the ‘Cosmology’ group, or as datasets if such information can only be stored in array form. The naming conventions used above are recommended for custom parameters.

  • The root of the file must also contain a Group named ‘SimulationProperties’. Various properties of the simulation, such as the box size and assorted flags, should be provided in this Group. Where possible, they should be given as attributes; however, the format also allows this Group to contain datasets.

  • Also at the root of the file, there may be any number of Groups with names of the form ‘Snapshot#####’, where the # is typically a number identifying the output in the context of the simulation, padded to five digits (e.g. snapshot 35 would be saved under /Snapshot00035). Each Snapshot Group should have an attribute named ‘Redshift’, an attribute named ‘ScaleFactor’, or both; this is not required, however, if the snapshot contains neither particle nor grid data. A Snapshot Group must contain only other Groups, which may be ‘ParticleData’ or ‘GridData’ (whose individual requirements are discussed in Particle Data and Grid Data, respectively), along with any number of halo or galaxy catalogs (described below in Halo Catalogs and Galaxy Catalogs).

    Todo

    Developers, should redshift be required? It’s usually not provided by halo catalogs, so we’d be requiring users to type it in manually.

    Todo

    Developers, is requiring that the simulation groups be called “Snapshot#” too restrictive? Should some other naming convention be required instead? Or should we just say that any groups not explicitly called for here will be treated as snapshots regardless of their names (that would conflict with the second bullet point below)?

  • The root of the file may (but is not required to) contain a ‘MergerTrees’ Group, which holds information about the merger trees in the simulation. If present, this group must obey the format specified in Merger Trees.

  • The root of the file may also contain any other Groups that are desired, but their form is not specified by the format. It is strongly recommended that they follow the same conventions with regard to units and naming that are laid out elsewhere in this documentation.

    Todo

    Developers, do we want to allow this, or should there be nothing else allowed at the root level?

  • Group names must not contain spaces, because some HDF5 tools do not handle spaces well.
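The root-level rules above are mechanical enough to check in a few lines. The following is a minimal sketch, assuming the file has already been read into plain Python dicts: `root_attrs` holds the root attributes and `groups` maps each root-level Group name to its attribute dict. The names `check_root` and `snapshot_group_name` are hypothetical and are not part of the IRATE tools.

```python
import numbers
import re

# Attributes that format version 0 requires on the /Cosmology Group.
REQUIRED_COSMOLOGY_ATTRS = (
    "HubbleParam", "OmegaMatter", "OmegaLambda",
    "OmegaBaryon", "PowerSpectrumIndex", "sigma_8",
)

# Snapshot Groups are 'Snapshot' followed by a (usually 5-digit) number.
SNAPSHOT_NAME = re.compile(r"^Snapshot\d+$")


def snapshot_group_name(number):
    """Conventional Group name for output `number`, zero-padded to 5 digits."""
    return "Snapshot%05d" % number


def check_root(root_attrs, groups):
    """Return a list of problems found at the root of an IRATE file.

    root_attrs: dict of the attributes on the root Group.
    groups: dict mapping each root-level Group name to its attribute dict.
    Only root-level names are inspected; an empty list means no problems.
    """
    problems = []
    version = root_attrs.get("IRATEVersion")
    # numbers.Integral also accepts NumPy integer scalars, as read by h5py.
    if not isinstance(version, numbers.Integral) or isinstance(version, bool):
        problems.append("root attribute 'IRATEVersion' missing or not an integer")
    if "Cosmology" not in groups:
        problems.append("missing Group 'Cosmology'")
    else:
        for name in REQUIRED_COSMOLOGY_ATTRS:
            if name not in groups["Cosmology"]:
                problems.append(f"'Cosmology' lacks attribute {name!r}")
    if "SimulationProperties" not in groups:
        problems.append("missing Group 'SimulationProperties'")
    for name, attrs in groups.items():
        if " " in name:
            problems.append(f"Group name {name!r} contains a space")
        # Redshift/ScaleFactor is a recommendation that is waived for
        # snapshots without particle or grid data, so only warn about it.
        if SNAPSHOT_NAME.match(name) and not ("Redshift" in attrs or "ScaleFactor" in attrs):
            problems.append(f"warning: {name!r} has neither 'Redshift' nor 'ScaleFactor'")
    return problems
```

Because the check works on plain dicts, it can be fed from any HDF5 reader. With h5py, for instance, an open file `f` gives `dict(f.attrs)` for the root attributes and `{name: dict(f[name].attrs) for name in f}` for the root-level members.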

Note

All group and attribute names are case-sensitive.

Unit Information

For all datasets that have units associated with them, those units should be stored as attributes either of the individual datasets or of the Group that contains the datasets. In either case, the units should be given both in human-readable form and as a conversion factor to CGS units. A dataset without units is assumed to be dimensionless.

Todo

Developers, how does this method of including units sound? It’s based on Andrew’s and the yt/GDF format schemes...

If the units are attached directly to the Dataset they relate to, the attributes must be named ‘unitname’ and ‘unitcgs’; if they are instead attached to a Group above it, the names should be prefixed with the exact name of the Dataset they relate to. For example, the units for the Dataset ‘R200b’ would be named ‘R200bunitname’ and ‘R200bunitcgs’ if they are attributes of the Group that contains that Dataset.

The ‘unitname’ attribute should be a string defining the unit, e.g. ‘kpc/h’. The ‘unitcgs’ attribute must be a three-element array whose values are, in order: the numerical conversion factor to CGS, the exponent of the Hubble parameter h, and the exponent of the scale factor a. The conversion factor is to be multiplied by h and a raised to these respective powers.

For example, if ‘unitname’ is ‘comoving Mpc/h’, ‘unitcgs’ should be an array containing [3.0857e24, -1, 1].
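Read this way, a stored value converts to CGS as value × factor × h^(h exponent) × a^(a exponent); for the example above that is value × 3.0857e24 cm / h × a, i.e. a physical length. A minimal sketch of the arithmetic, assuming `hubble_param` is the ‘HubbleParam’ attribute of /Cosmology and `scale_factor` is the snapshot’s scale factor. `to_cgs` is a hypothetical helper, not the signature of irate.core.get_cgs_factor().

```python
def to_cgs(value, unitcgs, hubble_param, scale_factor):
    """Convert `value` to CGS using a three-element 'unitcgs' array
    [conversion factor, exponent of h, exponent of a]."""
    factor, h_exponent, a_exponent = unitcgs
    return value * factor * hubble_param ** h_exponent * scale_factor ** a_exponent


# 10 comoving Mpc/h at scale factor a = 0.5, assuming h = 0.7:
# 10 * 3.0857e24 cm * 0.7**-1 * 0.5**1, a physical length in cm.
length_cm = to_cgs(10.0, [3.0857e24, -1, 1], hubble_param=0.7, scale_factor=0.5)
```

Keeping the h and a exponents separate from the numerical factor lets the same stored array serve any cosmology and any snapshot; only the two parameters passed in change.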

Note that the core library provides utilities for accessing units - see irate.core.get_units(), irate.core.set_units(), and irate.core.get_cgs_factor().

Other Metadata

Other metadata associated with individual datasets should be included in the same fashion as units. That is, they should either be attributes directly attached to the dataset with the metadata field name, or they can be attributes of groups further up the hierarchy, following the simple naming convention datasetnamemetadataname. The core library provides utilities for accessing or setting metadata in irate.core.get_metadata() and irate.core.set_metadata().
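Because units and other metadata follow the same rule, a single lookup routine can serve both. A minimal sketch, assuming attributes have been collected into a dict keyed by HDF5 path (e.g. ‘/Snapshot00144/ParticleData’). The precedence order used here (the Dataset itself first, then enclosing Groups from the innermost outward) is an assumption, since the format does not specify one, and `find_metadata` is hypothetical rather than the signature of irate.core.get_metadata().

```python
import posixpath


def find_metadata(attrs_by_path, dataset_path, name):
    """Return metadata `name` (e.g. 'unitname') for the Dataset at
    `dataset_path`, or None if it is not found.

    attrs_by_path maps HDF5 paths to attribute dicts. Assumed precedence:
    the Dataset's own attribute `name` first, then each enclosing Group,
    innermost first, for the prefixed attribute '<datasetname><name>'.
    """
    own = attrs_by_path.get(dataset_path, {})
    if name in own:
        return own[name]
    prefixed = posixpath.basename(dataset_path) + name  # e.g. 'R200bunitname'
    group = posixpath.dirname(dataset_path)
    while True:
        attrs = attrs_by_path.get(group, {})
        if prefixed in attrs:
            return attrs[prefixed]
        if group == "/":
            return None
        group = posixpath.dirname(group)


# Illustrative attributes only; the values are not taken from a real file.
example_attrs = {
    "/Snapshot00144/HaloCatalog_Rockstar": {"R200bunitname": "kpc/h"},
    "/Snapshot00144/HaloCatalog_Rockstar/Center": {"unitname": "comoving Mpc/h"},
}
```

Here the ‘R200b’ unit name is found on the catalog Group under the prefixed name, while ‘Center’ carries its own ‘unitname’ attribute.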

Particle Data

The ParticleData Group, if it exists, must contain at least one Group; the most common are ‘Dark’, ‘Gas’, and ‘Star’, which contain the data for dark matter, gas, and stars, respectively. Users are free to use other names for particle blocks (e.g. to separate high-resolution from low-resolution particles), but any Group containing dark matter particles must have a (case-sensitive) name that begins with ‘Dark’ (e.g. ‘Dark_HighRes’), any Group containing gas particles must have a name that begins with ‘Gas’, and any Group containing star particles must have a name that begins with ‘Star’. Users are also free to store other particle types in IRATE files; it is strongly recommended that they follow the same convention (e.g. ‘BlackHole’). Tools that read IRATE files, such as halo finders, will infer the type of particle from the Group name.
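Since readers infer the particle type from the Group name, the prefix rule can be sketched as a simple case-sensitive lookup. This is a minimal sketch: `particle_type` and the labels it returns are hypothetical, not part of the IRATE tools.

```python
# Prefixes fixed by the format; matching is case-sensitive.
PARTICLE_PREFIXES = (("Dark", "dark matter"), ("Gas", "gas"), ("Star", "stars"))


def particle_type(group_name):
    """Infer the particle type of a ParticleData sub-Group from its name,
    e.g. 'Dark_HighRes' -> 'dark matter'. Returns None for other types
    (such as 'BlackHole'), whose meaning the format leaves to convention."""
    for prefix, kind in PARTICLE_PREFIXES:
        if group_name.startswith(prefix):
            return kind
    return None
```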

Any Group within /Snapshot#/ParticleData/ may contain only datasets. The following Dataset objects must be present in each such Group, even if it contains 0 particles:

  • ‘Position’ (N x d)
  • ‘Velocity’ (N x d)
  • ‘Mass’ (N)
  • ‘ID’ (N)

where d is the dimensionality (almost always 3) and N is the number of particles in that Group. Additional datasets (e.g. ‘Metallicity’, ‘Entropy’, ‘Density’) may be present, but the four above are the minimum required. Any other datasets are encouraged to have shape N for scalar data or N x d for vector data.
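The minimum-dataset rule can be checked from dataset shapes alone. A minimal sketch, assuming the shapes have been gathered into a dict of name to shape tuple (h5py’s `Dataset.shape`, for example, is such a tuple); `check_particle_group` is a hypothetical name.

```python
# Minimum datasets per ParticleData sub-Group, with their number of dimensions.
REQUIRED_NDIM = {"Position": 2, "Velocity": 2, "Mass": 1, "ID": 1}


def check_particle_group(shapes):
    """Check a ParticleData sub-Group, given a dict mapping each dataset name
    to its shape tuple. Returns a list of problems (empty if compliant)."""
    problems = [f"missing dataset {name!r}" for name in REQUIRED_NDIM if name not in shapes]
    if problems:
        return problems
    for name, ndim in REQUIRED_NDIM.items():
        if len(shapes[name]) != ndim:
            problems.append(f"{name!r} should have {ndim} dimension(s), got shape {shapes[name]}")
    if problems:
        return problems
    n, d = shapes["Position"]  # N particles, d spatial dimensions
    if shapes["Velocity"] != (n, d):
        problems.append(f"'Velocity' has shape {shapes['Velocity']}, expected {(n, d)}")
    for name in ("Mass", "ID"):
        if shapes[name] != (n,):
            problems.append(f"{name!r} has shape {shapes[name]}, expected {(n,)}")
    return problems
```

An empty Group (N = 0) passes as long as all four datasets exist with shapes (0, d) and (0,).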

Grid Data

The grid data specification has not yet been defined.

Halo Catalogs

Halo catalogs are stored as Groups whose names must begin with ‘HaloCatalog’. For example, both ‘HaloCatalog_AHF1’ and ‘HaloCatalog_Rockstar’ are valid names; ‘AHFCatalog’ and ‘Catalog_Rockstar’, however, are not.

Todo

Developers, does this sound ok?

Any halo catalog contained within a Snapshot Group should have, as attributes, any parameters relevant to the halo finder, such as FOF linking lengths, the overdensity criterion, or the code used to produce that catalog (though the latter may be obvious from the name of the Group).

Every halo catalog must contain a Dataset named ‘Center’ with shape N x d, where N is the number of halos in the catalog and d is the dimensionality (typically 3). All other datasets in the catalog should have a matching first dimension and be in the same order; that is, the ith entry in ‘Center’ should correspond to the same halo as the ith entry in each of the other datasets.
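The naming rule and the shared first dimension can be checked together. A minimal sketch, assuming `shapes` maps the names of the datasets sitting directly in the catalog Group to their shape tuples, with sub-Groups such as ‘HaloParticleData’ left out; `check_halo_catalog` is hypothetical.

```python
def check_halo_catalog(group_name, shapes):
    """Check a halo catalog Group's name and dataset shapes.

    shapes: dataset name -> shape tuple, for datasets directly inside the
    catalog Group (sub-Groups are not included). Returns a list of problems.
    """
    problems = []
    if not group_name.startswith("HaloCatalog"):
        problems.append(f"catalog name {group_name!r} must begin with 'HaloCatalog'")
    center = shapes.get("Center")
    if center is None or len(center) != 2:
        problems.append("'Center' dataset missing or not of shape N x d")
        return problems
    n = center[0]  # number of halos
    # Every dataset, 'Center' included, must be indexed by halo along axis 0.
    for name, shape in shapes.items():
        if len(shape) == 0 or shape[0] != n:
            problems.append(f"{name!r} has shape {shape}; first dimension should be {n}")
    return problems
```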

If particle data is included with the halo catalog, it must be saved in a Group named ‘HaloParticleData’ inside the halo catalog. This Group must contain at least two datasets, named ‘HaloParticleIDs’ and ‘ParticlesPerHalo’.

‘HaloParticleIDs’ should contain integer particle IDs ordered so that all particles in the first halo come first, followed by those in the second halo, and so on. Here, halo order is the same as the order of the halos in the ‘Center’ dataset. Note that the number of elements in this dataset is not necessarily the same as the total number of particles, because some particles may be members of multiple halos, in which case they appear in ‘HaloParticleIDs’ more than once.

The ‘ParticlesPerHalo’ Dataset, on the other hand, must have a length matching the first dimension of the ‘Center’ dataset, and should give the (integer) number of particles in each halo. The sum of all the values in this dataset must equal the size of the ‘HaloParticleIDs’ dataset. Together, ‘HaloParticleIDs’ and ‘ParticlesPerHalo’ thus provide all the information needed to determine which particles are in which halos.
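Recovering each halo’s members amounts to turning the per-halo counts into running offsets into the flat ID list. A minimal sketch in plain Python; `particles_by_halo` is a hypothetical name, and the inputs are assumed to be the contents of the two datasets as sequences.

```python
from itertools import accumulate


def particles_by_halo(halo_particle_ids, particles_per_halo):
    """Split the flat 'HaloParticleIDs' sequence into one list per halo,
    in the same order as the 'Center' dataset."""
    if sum(particles_per_halo) != len(halo_particle_ids):
        raise ValueError("sum of ParticlesPerHalo must equal len(HaloParticleIDs)")
    # offsets[i] is where halo i starts; halo i spans offsets[i]:offsets[i + 1].
    offsets = list(accumulate(particles_per_halo, initial=0))
    return [list(halo_particle_ids[lo:hi]) for lo, hi in zip(offsets, offsets[1:])]


# Particle 3 belongs to both halos, so it appears twice in the flat list.
members = particles_by_halo([7, 3, 9, 3, 5], [2, 3])
```

The same offsets also slice an optional ‘HaloParticleTypes’ dataset, since it has the same size as ‘HaloParticleIDs’. With NumPy arrays, `numpy.split(ids, numpy.cumsum(counts)[:-1])` is the vectorized equivalent.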

Many users will find it convenient to store the type of particle as well. This should be saved in a third Dataset named ‘HaloParticleTypes’, but this dataset is not required by the format. If it is present, it should be of the same size as ‘HaloParticleIDs’.

Galaxy Catalogs

The specification for galaxy catalogs has not yet been defined. Galaxy catalogs should, however, follow the halo catalog conventions as closely as possible.

Merger Trees

Merger tree specifications have not yet been defined.

Examples

Here we provide the structure of a sample IRATE file in the form output by the h5dump utility (distributed with the HDF5 library). Note that the ‘Halo’, ‘Bulge’, and ‘Disk’ suffixes of the particle Groups are not part of the specification, but are examples of how one might sub-divide the particle data. Also note that a typical IRATE file will contain many more datasets, particularly in the catalogs; these have been omitted here for brevity:

HDF5 "SampleIRATEfile.hdf5" {
FILE_CONTENTS {
 group      /                       (Contains attribute defining the version of the IRATE format that this file conforms to)
 group      /Cosmology              (Contains attributes defining the cosmology of the simulation)
 group      /SimulationProperties   (Contains attributes defining non-cosmological properties of the simulation)
 group      /Snapshot00144          (Contains attributes defining redshift, scale factor, or both)
 group      /Snapshot00144/HaloCatalog_AHF      (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00144/HaloCatalog_AHF/Center   (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Ekin     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Epot     (Contains attributes with unit information)
 group      /Snapshot00144/HaloCatalog_AHF/HaloParticleData
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00144particles.hdf5 /HaloParticleTypes
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00144particles.hdf5 /HaloParticleIDs
 ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00144particles.hdf5 /ParticlesPerHalo
 dataset    /Snapshot00144/HaloCatalog_AHF/L        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Mvir     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Phi      (Contains attributes with unit information)
 group      /Snapshot00144/HaloCatalog_AHF/RadialProfiles
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/L         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/M_in_r    (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/dens      (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/npart
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/r         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/vcirc     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Rmax         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Rvir         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Velocity     (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/Vmax         (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_AHF/fMhires
 dataset    /Snapshot00144/HaloCatalog_AHF/lambda
 dataset    /Snapshot00144/HaloCatalog_AHF/nbins
 dataset    /Snapshot00144/HaloCatalog_AHF/npart
 group      /Snapshot00144/HaloCatalog_Rockstar     (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Center      (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/M200b       (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/R200b       (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Rmax        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Spin
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Velocity    (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/Vmax        (Contains attributes with unit information)
 dataset    /Snapshot00144/HaloCatalog_Rockstar/npart
 group      /Snapshot00144/ParticleData                     (Contains attributes with unit information for all datasets within it)
 group      /Snapshot00144/ParticleData/Dark_Bulge
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/ID
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Position
 dataset    /Snapshot00144/ParticleData/Dark_Bulge/Velocity
 group      /Snapshot00144/ParticleData/Dark_Disk
 dataset    /Snapshot00144/ParticleData/Dark_Disk/ID
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Position
 dataset    /Snapshot00144/ParticleData/Dark_Disk/Velocity
 group      /Snapshot00144/ParticleData/Dark_Halo
 dataset    /Snapshot00144/ParticleData/Dark_Halo/ID
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Mass
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Position
 dataset    /Snapshot00144/ParticleData/Dark_Halo/Velocity
 group      /Snapshot00153                      (Contains attributes defining redshift, scale factor, or both)
 group      /Snapshot00153/HaloCatalog_AHF      (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00153/HaloCatalog_AHF/Center       (Contains attributes with unit information)
 group      /Snapshot00153/HaloCatalog_AHF/HaloParticleData
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00153particles.hdf5 /HaloParticleTypes
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00153particles.hdf5 /HaloParticleIDs
 ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00153particles.hdf5 /ParticlesPerHalo
 dataset    /Snapshot00153/HaloCatalog_AHF/L            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Mvir         (Contains attributes with unit information)
 group      /Snapshot00153/HaloCatalog_AHF/RadialProfiles
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/M_in_r    (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/r         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/vcirc     (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Rmax         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Rvir         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Velocity     (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/Vmax         (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_AHF/nbins
 dataset    /Snapshot00153/HaloCatalog_AHF/npart
 group      /Snapshot00153/HaloCatalog_Rockstar     (Should contain attributes defining the parameters of the halo finding)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Center          (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/M200b           (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Mbound200b      (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/R200b           (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Rmax            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Velocity        (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/Vmax            (Contains attributes with unit information)
 dataset    /Snapshot00153/HaloCatalog_Rockstar/npart           (Contains attributes with unit information)
 group      /Snapshot00153/ParticleData             (Contains attributes with unit information for all datasets within it)
 group      /Snapshot00153/ParticleData/Dark_Bulge
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/ID
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Position
 dataset    /Snapshot00153/ParticleData/Dark_Bulge/Velocity
 group      /Snapshot00153/ParticleData/Dark_Disk
 dataset    /Snapshot00153/ParticleData/Dark_Disk/ID
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Position
 dataset    /Snapshot00153/ParticleData/Dark_Disk/Velocity
 group      /Snapshot00153/ParticleData/Dark_Halo
 dataset    /Snapshot00153/ParticleData/Dark_Halo/ID
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Mass
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Position
 dataset    /Snapshot00153/ParticleData/Dark_Halo/Velocity
 }
}
...