Oak Ridge National Laboratory
Tuesday, May 22, 2018
What is scientific Exascale data? The term arises in Data Intensive Science, an acknowledgement that as simulations and experiments generate ever larger amounts of data, we must turn our attention to how to move, store, manage, analyze, and visualize this data in a timely fashion. Petascale simulations already produce close to 100 PB per simulation, and Exascale simulations are expected to approach 100 EB of data per week. Clearly the cost of “write once, read never” is becoming too high, and we must create software ecosystems that help us cope with this flood of data from scientific instruments and calculations. We have built the idea of I/O staging into the Adaptable I/O System (ADIOS) to ingest, reduce, and move data on HPC systems and over the WAN to other computational resources. My talk focuses on creating a software ecosystem that employs these techniques to cope with the extreme amounts of data being produced. Furthermore, Exascale data must be re-purposed over time in order to validate results against physics experiments, such as the ITER fusion tokamak. This creates new challenges that must be explored and developed into an overarching infrastructure for scientific data. Our goal is to create an I/O framework that addresses most of the use cases arising from both the Exascale challenges and the new scientific instruments coming on-line in the next ten years.
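The staging pattern described above can be illustrated with a toy producer/consumer sketch. This is a minimal, hypothetical illustration of the general idea (a simulation pushes steps into a bounded staging buffer, and a staging service reduces each step before moving it downstream); all names here are invented for illustration and are not the ADIOS API:

```python
import queue
import threading

def produce(stage, n_steps, step_size):
    """Simulation side: push raw per-step arrays into the staging area."""
    for step in range(n_steps):
        data = [float(step)] * step_size  # stand-in for simulation output
        stage.put(data)                   # blocks if the staging buffer is full
    stage.put(None)                       # sentinel: no more steps

def reduce_and_forward(stage, factor):
    """Staging side: reduce each step before it is moved downstream."""
    reduced_steps = []
    while True:
        data = stage.get()
        if data is None:
            break
        # Toy reduction: average every `factor` values (a stand-in for the
        # compression or subsampling a real staging service would perform).
        reduced = [sum(data[i:i + factor]) / factor
                   for i in range(0, len(data), factor)]
        reduced_steps.append(reduced)
    return reduced_steps

# A bounded buffer decouples simulation output from reduction and movement,
# so the simulation never waits on slow storage or the WAN.
stage = queue.Queue(maxsize=4)
producer = threading.Thread(target=produce, args=(stage, 10, 8))
producer.start()
result = reduce_and_forward(stage, factor=4)
producer.join()
print(len(result), len(result[0]))  # 10 steps, each reduced from 8 to 2 values
```

The bounded queue is the essential design point: it lets the simulation run at its own pace while the staging side absorbs bursts and reduces data in flight, which is the same decoupling that I/O staging provides on HPC systems.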
Dr. Klasky is a Distinguished Scientist and the group leader for Scientific Data in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. He also holds appointments at the University of Tennessee and the Georgia Institute of Technology. Dr. Klasky is a world expert in scientific computing and scientific data management, co-authoring over 200 papers and serving as PI on more than eight multi-million-dollar projects in the Department of Energy.