IRATE Format Specification
==========================

This document describes the specific required structure of the IRATE format
for format version |formatversion|. All IRATE files are HDF5 files, and hence
usually have either an ``.h5`` or ``.hdf5`` extension.

IRATE File Format
-----------------

The main data file in the IRATE format is referred to as simply an "IRATE
file". These files may store any number of the actual outputs of a simulation
(typically meaning multiple snapshots), associated halo and/or galaxy catalogs
and merger trees, and any other data that might be associated with such a
simulation (e.g. black hole catalogs).

.. note::
    An IRATE file is intended to hold at most *one* simulation. Multiple
    simulations should be stored as multiple separate IRATE files. A single
    simulation *can*, however, be spread over multiple files; see the
    `irate.core.scatter_files` and `irate.core.gather_files` functions for
    examples. In that case, the root file that contains the main hierarchy is
    the "IRATE file" and the others are ancillary files.

To conform to the IRATE standard, such a file must satisfy the following
conditions:

* The root of the file must have an integer attribute named 'IRATEVersion'
  that specifies the version of the IRATE format that the file obeys. The
  format version for this documentation is |formatversion|. The format version
  for the currently-installed IRATE tools can always be accessed as an integer
  via :attr:`irate.formatversion`.

* At the root of the file, there must be a :class:`~h5py.Group` 'Cosmology'.
  This :class:`~h5py.Group` must have the following HDF5 attributes to specify
  the cosmology that defines the data:

  * 'HubbleParam'
  * 'OmegaMatter'
  * 'OmegaLambda'
  * 'OmegaBaryon'
  * 'PowerSpectrumIndex'
  * 'sigma_8'

  Furthermore, if the cosmology used has an accepted name (e.g. WMAP-7), it is
  strongly recommended that the :class:`~h5py.Group` have an additional
  attribute, 'Name', for human readability; such an attribute, however, is not
  required. Some cosmologies may include additional parameters, in which case
  such parameters can be included as attributes of the 'Cosmology' group, or
  as datasets if such information can only be stored in array form. The naming
  conventions used above are recommended for custom parameters.

* The root of the file must also contain a :class:`~h5py.Group` named
  'SimulationProperties'. Various properties of the simulation, such as the
  box size and assorted flags, should be provided in this
  :class:`~h5py.Group`. Where possible, they should be given as attributes;
  however, the format also allows this group to contain datasets.

* Also at the root of the file, there may be any number of Groups with names
  of the form 'Snapshot#####', where ##### is typically a number identifying
  the output in the context of the simulation, zero-padded to five digits
  (e.g. snapshot 35 would be saved under /Snapshot00035). Each Snapshot
  :class:`~h5py.Group` should have either an attribute named 'Redshift' or an
  attribute named 'ScaleFactor' (or both); this is not required if the
  snapshot contains neither particle nor grid data. A Snapshot group must
  contain only other Groups, which may be 'ParticleData' or 'GridData' (whose
  individual requirements are discussed in :ref:`particle-data-descr` and
  :ref:`grid-data-descr`, respectively), along with any number of halo or
  galaxy catalogs (described below in :ref:`halo-catalogs-descr` and
  :ref:`galaxy-catalogs-descr`).

  .. todo:: Developers, should redshift be required? It's not provided by
     halo catalogs usually, so we'd be requiring users to manually type it in.

  .. todo:: Developers, is requiring that the simulation groups be called
     "Snapshot#" too restrictive? Should some other naming convention be
     required, instead? Or just say any groups not explicitly called for here
     will be treated as snapshots regardless of their names (that's in
     conflict with the second bullet point below)?

* The root of the file may (but is not required to) contain a 'MergerTrees'
  :class:`~h5py.Group`, which holds information about the merger trees in the
  simulation. If present, this group must obey the format specified in
  :ref:`merger-trees-descr`.

* The root of the file may also contain any other Groups that are desired,
  but their form is not specified in the format. It is strongly recommended
  that they follow the same conventions with regard to units and naming
  structure that are laid out elsewhere in this documentation.

  .. todo:: Developers, do we want to allow this, or should there be nothing
     else allowed at the root level?

* There must not be spaces in any group names, so as not to confuse HDF5
  tools that don't play well with spaces.

.. note::
    All group and attribute names are *case-sensitive*.

.. _units-descr:

Unit Information
~~~~~~~~~~~~~~~~

For all datasets that have units associated with them, those units should be
stored either in the individual datasets as attributes, or as attributes of
the :class:`~h5py.Group` that contains the datasets. In either case, they
should be given both in human-readable form and as a conversion factor to CGS
units. If a dataset does not have units, it will be assumed to be
dimensionless.

.. todo:: Developers, how does this method of including units sound? It's
   based on Andrew's and the yt/GDF format scheme...

If the units are attached directly to the :class:`~h5py.Dataset` that they
relate to, they must be named 'unitname' and 'unitcgs'; if they are instead
attached to a :class:`~h5py.Group` above them, the names should be prepended
with the exact name of the :class:`~h5py.Dataset` that they relate to; e.g.
the units for the :class:`~h5py.Dataset` 'R200b' would be named
'R200bunitname' and 'R200bunitcgs' if they are attributes of the group that
contains that :class:`~h5py.Dataset`.

The 'unitname' attribute should be a string defining the unit, e.g. 'kpc/h'.
The 'unitcgs' attribute must be a three-element array, where the stored values
are, in order: the numerical conversion factor to CGS, the exponent of the
Hubble parameter by which the conversion factor should be multiplied, and the
exponent of the scale factor by which the conversion factor should be
multiplied. For example, if 'unitname' is 'comoving Mpc/h', 'unitcgs' should
be an array containing [3.0857e24, -1, 1].

Note that the core library provides utilities for accessing units; see
:func:`irate.core.get_units`, :func:`irate.core.set_units`, and
:func:`irate.core.get_cgs_factor`.

.. _metadata-descr:

Other Metadata
~~~~~~~~~~~~~~

Other metadata associated with individual datasets should be included in the
same fashion as units. That is, they should either be attributes directly
attached to the dataset with the metadata field name, or they can be
attributes of groups further up the hierarchy, following the simple naming
convention `datasetnamemetadataname`. The core library provides utilities for
accessing or setting metadata in :func:`irate.core.get_metadata` and
:func:`irate.core.set_metadata`.

.. _particle-data-descr:

Particle Data
~~~~~~~~~~~~~

The ParticleData :class:`~h5py.Group`, if it exists, must contain at least one
group, of which the most common are 'Dark', 'Gas', and 'Star'; these contain
the data for dark matter, gas, and stars, respectively. Users are free to use
other names for particle blocks, e.g. if the users want to separate high
resolution from low resolution particles, but any :class:`~h5py.Group`
containing dark matter particles must have a (case-sensitive) name that begins
with 'Dark' (e.g. 'Dark_HighRes'), any :class:`~h5py.Group` containing gas
particles must have a name that begins with 'Gas', and any
:class:`~h5py.Group` containing star particles must have a name beginning with
'Star'. Users are free to store other particle types in IRATE files; it is
strongly recommended that they follow the same convention laid out here (e.g.
'BlackHole'). Tools that read in IRATE files, such as halo finders, will
assume the type of particle based on the group name. Any groups within
/Snapshot#/ParticleData/ may contain only datasets.

For particle data, the following :class:`~h5py.Dataset` objects must be
present in each group that exists, even if the group has 0 particles:

* 'Position' (`N` x `d`)
* 'Velocity' (`N` x `d`)
* 'Mass' (`N`)
* 'ID' (`N`)

where `d` is the dimensionality (almost always 3) and `N` is the number of
particles in that group. Additional datasets (e.g. 'Metallicity', 'Entropy',
'Density', etc.) may be present, but the above four are the minimum required.
Any other datasets are encouraged to have shape `N` for scalar data, or
`N` x `d` for vector data.

.. _grid-data-descr:

Grid Data
~~~~~~~~~

The grid data specification has not yet been defined.

.. _halo-catalogs-descr:

Halo Catalogs
~~~~~~~~~~~~~

Each halo catalog is stored as a :class:`~h5py.Group` whose name must begin
with 'HaloCatalog'. For example, both 'HaloCatalog_AHF1' and
'HaloCatalog_Rockstar' are valid names; 'AHFCatalog' and 'Catalog_Rockstar',
however, are not.

.. todo:: Developers, does this sound ok?

Any halo catalogs that are contained within a Snapshot :class:`~h5py.Group`
should have, as attributes, any parameters that are relevant to the halo
finder, such as FOF linking lengths, overdensity criterion, or the code used
to produce that catalog (though the last of these may be obvious from the name
of the group).
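
The naming rules given above for particle groups and halo catalogs amount to
case-sensitive prefix checks. The following is a minimal sketch in plain
Python, not part of the IRATE library: the helper names
``is_halo_catalog_name`` and ``particle_type_from_group`` are hypothetical,
and the prefix table covers only the three particle types that the format
names explicitly.

.. code-block:: python

    # Hypothetical helpers illustrating the naming rules; not part of irate.
    PARTICLE_PREFIXES = ("Dark", "Gas", "Star")


    def is_halo_catalog_name(name):
        """Return True if ``name`` is a valid halo catalog group name."""
        # Names are case-sensitive, must begin with 'HaloCatalog', and
        # (like all group names) must not contain spaces.
        return name.startswith("HaloCatalog") and " " not in name


    def particle_type_from_group(name):
        """Return 'Dark', 'Gas' or 'Star' from a particle group name.

        Returns None for groups that hold other particle types
        (e.g. 'BlackHole').
        """
        for prefix in PARTICLE_PREFIXES:
            if name.startswith(prefix):
                return prefix
        return None


    for candidate in ("HaloCatalog_AHF1", "AHFCatalog", "Catalog_Rockstar"):
        print(candidate, is_halo_catalog_name(candidate))
    print(particle_type_from_group("Dark_HighRes"))

A reader that follows this approach treats 'Dark_HighRes' as dark matter and
leaves any unrecognised group, such as 'BlackHole', for the caller to handle.
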

Any halo catalog must contain a :class:`~h5py.Dataset` with the name 'Center'
that has shape N x d, where N is the number of halos in the catalog, and d is
the dimensionality (typically 3). All other datasets in the catalog should
have a matching first dimension, and should be in the same order. That is, the
ith entry in 'Center' should correspond to the same halo as the ith entry in
any of the other datasets.

If particle data is included with the halo catalog, it must be saved in a
:class:`~h5py.Group` inside the halo catalog with the name 'HaloParticleData'.
This group must contain at least two datasets. The first of these should be
named 'HaloParticleIDs', while the second should be named 'ParticlesPerHalo'.

'HaloParticleIDs' should contain integer particle IDs in order such that all
particles in the first halo come first, followed by those in the second halo,
and so on. Here, halo order is the same as the order of the halos in the
'Center' dataset. Note that the number of elements of this dataset is not
necessarily the same as the total number of particles, because some particles
may be members of multiple halos, in which case they appear in
'HaloParticleIDs' more than once.

The 'ParticlesPerHalo' :class:`~h5py.Dataset`, on the other hand, must have a
length matching the first dimension of the 'Center' dataset, and should give
the (integer) number of particles in each halo. The sum of all of the values
in this dataset must match the size of the 'HaloParticleIDs' dataset. This
allows 'HaloParticleIDs' and 'ParticlesPerHalo' to provide all the information
needed to determine which particles are in which halos.

Many users will find it convenient to store the type of particle as well. This
should be saved in a third :class:`~h5py.Dataset` named 'HaloParticleTypes',
but this dataset is not required by the format. If it is present, it should be
of the same size as 'HaloParticleIDs'.

.. _galaxy-catalogs-descr:

Galaxy Catalogs
~~~~~~~~~~~~~~~

The specifications for galaxy catalogs have not yet been defined, but they
should follow conventions matched as closely as possible to those of the halo
catalogs.

.. _merger-trees-descr:

Merger Trees
~~~~~~~~~~~~

Merger tree specifications have not yet been defined.

Examples
--------

Here we provide the structure of a sample IRATE file in the form output by the
``h5dump`` utility (included in the libhdf5 library). Note that the
'Dark_Halo', 'Dark_Bulge', and 'Dark_Disk' groups are not actually a part of
the specification, but are examples of possible ways one might wish to
sub-divide the particle data. Also note that a typical IRATE file will contain
many more datasets, particularly in the catalogs, which have been removed from
here for the sake of brevity::

    HDF5 "SampleIRATEfile.hdf5" {
    FILE_CONTENTS {
     group      /  (Contains attribute defining the version of the IRATE format that this file conforms to)
     group      /Cosmology  (Contains attributes defining the cosmology of the simulation)
     group      /SimulationProperties  (Contains attributes defining non-cosmological properties of the simulation)
     group      /Snapshot00144  (Contains attributes defining redshift, scale factor, or both)
     group      /Snapshot00144/HaloCatalog_AHF  (Should contain attributes defining the parameters of the halo finding)
     dataset    /Snapshot00144/HaloCatalog_AHF/Center  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Ekin  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Epot  (Contains attributes with unit information)
     group      /Snapshot00144/HaloCatalog_AHF/HaloParticleData
     ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00144particles.hdf5 /HaloParticleTypes
     ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00144particles.hdf5 /HaloParticleIDs
     ext link   /Snapshot00144/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00144particles.hdf5 /ParticlesPerHalo
     dataset    /Snapshot00144/HaloCatalog_AHF/L  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Mvir  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Phi  (Contains attributes with unit information)
     group      /Snapshot00144/HaloCatalog_AHF/RadialProfiles
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/L  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/M_in_r  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/dens  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/npart
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/r  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/RadialProfiles/vcirc  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Rmax  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Rvir  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Velocity  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/Vmax  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_AHF/fMhires
     dataset    /Snapshot00144/HaloCatalog_AHF/lambda
     dataset    /Snapshot00144/HaloCatalog_AHF/nbins
     dataset    /Snapshot00144/HaloCatalog_AHF/npart
     group      /Snapshot00144/HaloCatalog_Rockstar  (Should contain attributes defining the parameters of the halo finding)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/Center  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/M200b  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/R200b  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/Rmax  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/Spin
     dataset    /Snapshot00144/HaloCatalog_Rockstar/Velocity  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/Vmax  (Contains attributes with unit information)
     dataset    /Snapshot00144/HaloCatalog_Rockstar/npart
     group      /Snapshot00144/ParticleData  (Contains attributes with unit information for all datasets within it)
     group      /Snapshot00144/ParticleData/Dark_Bulge
     dataset    /Snapshot00144/ParticleData/Dark_Bulge/ID
     dataset    /Snapshot00144/ParticleData/Dark_Bulge/Mass
     dataset    /Snapshot00144/ParticleData/Dark_Bulge/Position
     dataset    /Snapshot00144/ParticleData/Dark_Bulge/Velocity
     group      /Snapshot00144/ParticleData/Dark_Disk
     dataset    /Snapshot00144/ParticleData/Dark_Disk/ID
     dataset    /Snapshot00144/ParticleData/Dark_Disk/Mass
     dataset    /Snapshot00144/ParticleData/Dark_Disk/Position
     dataset    /Snapshot00144/ParticleData/Dark_Disk/Velocity
     group      /Snapshot00144/ParticleData/Dark_Halo
     dataset    /Snapshot00144/ParticleData/Dark_Halo/ID
     dataset    /Snapshot00144/ParticleData/Dark_Halo/Mass
     dataset    /Snapshot00144/ParticleData/Dark_Halo/Position
     dataset    /Snapshot00144/ParticleData/Dark_Halo/Velocity
     group      /Snapshot00153  (Contains attributes defining redshift, scale factor, or both)
     group      /Snapshot00153/HaloCatalog_AHF  (Should contain attributes defining the parameters of the halo finding)
     dataset    /Snapshot00153/HaloCatalog_AHF/Center  (Contains attributes with unit information)
     group      /Snapshot00153/HaloCatalog_AHF/HaloParticleData
     ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleTypes -> SampleIRATEfile-00153particles.hdf5 /HaloParticleTypes
     ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/HaloParticleIDs -> SampleIRATEfile-00153particles.hdf5 /HaloParticleIDs
     ext link   /Snapshot00153/HaloCatalog_AHF/HaloParticleData/ParticlesPerHalo -> SampleIRATEfile-00153particles.hdf5 /ParticlesPerHalo
     dataset    /Snapshot00153/HaloCatalog_AHF/L  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/Mvir  (Contains attributes with unit information)
     group      /Snapshot00153/HaloCatalog_AHF/RadialProfiles
     dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/M_in_r  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/r  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/RadialProfiles/vcirc  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/Rmax  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/Rvir  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/Velocity  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/Vmax  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_AHF/nbins
     dataset    /Snapshot00153/HaloCatalog_AHF/npart
     group      /Snapshot00153/HaloCatalog_Rockstar  (Should contain attributes defining the parameters of the halo finding)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/Center  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/M200b  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/Mbound200b  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/R200b  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/Rmax  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/Velocity  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/Vmax  (Contains attributes with unit information)
     dataset    /Snapshot00153/HaloCatalog_Rockstar/npart  (Contains attributes with unit information)
     group      /Snapshot00153/ParticleData  (Contains attributes with unit information for all datasets within it)
     group      /Snapshot00153/ParticleData/Dark_Bulge
     dataset    /Snapshot00153/ParticleData/Dark_Bulge/ID
     dataset    /Snapshot00153/ParticleData/Dark_Bulge/Mass
     dataset    /Snapshot00153/ParticleData/Dark_Bulge/Position
     dataset    /Snapshot00153/ParticleData/Dark_Bulge/Velocity
     group      /Snapshot00153/ParticleData/Dark_Disk
     dataset    /Snapshot00153/ParticleData/Dark_Disk/ID
     dataset    /Snapshot00153/ParticleData/Dark_Disk/Mass
     dataset    /Snapshot00153/ParticleData/Dark_Disk/Position
     dataset    /Snapshot00153/ParticleData/Dark_Disk/Velocity
     group      /Snapshot00153/ParticleData/Dark_Halo
     dataset    /Snapshot00153/ParticleData/Dark_Halo/ID
     dataset    /Snapshot00153/ParticleData/Dark_Halo/Mass
     dataset    /Snapshot00153/ParticleData/Dark_Halo/Position
     dataset    /Snapshot00153/ParticleData/Dark_Halo/Velocity
     }
    }
    ...
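
As a further illustration, two of the conventions in this specification can be
sketched in a few lines of plain Python: turning a three-element 'unitcgs'
array into a single CGS conversion factor, and using 'HaloParticleIDs'
together with 'ParticlesPerHalo' to recover the particle list of each halo.
This is only a sketch under stated assumptions. The function names
``cgs_factor`` and ``split_particles_by_halo`` are hypothetical and are not
the library's API (the library's own utility is
:func:`irate.core.get_cgs_factor`, whose interface is not described here), the
exponents are assumed to apply as ``factor * h**h_exp * a**a_exp``, and the
values of the Hubble parameter, scale factor and particle IDs are made up for
the example.

.. code-block:: python

    from itertools import accumulate


    def cgs_factor(unitcgs, hubble_param, scale_factor):
        """Combine a 'unitcgs' array into one multiplicative CGS factor.

        ``unitcgs`` is [factor, h_exponent, a_exponent], as described in
        the Unit Information section.
        """
        factor, h_exp, a_exp = unitcgs
        return factor * hubble_param ** h_exp * scale_factor ** a_exp


    def split_particles_by_halo(halo_particle_ids, particles_per_halo):
        """Return one list of particle IDs per halo, in 'Center' order."""
        # The format requires the counts to sum to the length of the ID list.
        if sum(particles_per_halo) != len(halo_particle_ids):
            raise ValueError("ParticlesPerHalo does not sum to the number "
                             "of entries in HaloParticleIDs")
        # Running totals give the end offset of each halo's slice.
        ends = list(accumulate(particles_per_halo))
        starts = [0] + ends[:-1]
        return [list(halo_particle_ids[s:e]) for s, e in zip(starts, ends)]


    # 'comoving Mpc/h' with illustrative values h = 0.7 and a = 0.5.
    print(cgs_factor([3.0857e24, -1, 1], hubble_param=0.7, scale_factor=0.5))

    # Three halos; particle 11 belongs to two of them, the second is empty.
    print(split_particles_by_halo([10, 11, 12, 11, 13], [2, 0, 3]))

The same slicing applies to 'HaloParticleTypes' when it is present, since it
has the same size and ordering as 'HaloParticleIDs'.
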