Contact

Position:
Leibniz-Institut for Astrophysics Potsdam (AIP), Germany
Address
Germany

Miscellaneous Information

Miscellaneous Information

Abstract Reference: 30769
Identifier: O7.2
Presentation: Oral communication
Key Theme: 2 Management of Scientific and Data Analysis Projects 

Hosting astronomical data in sharded SQL databases

Authors:
Galkin Anastasia, Klar Jochen, Riebe Kristin, Enke Harry, 

At Leibniz-­Institute for Astrophysics Potsdam (AIP) we host and publish terabytes of cosmological simulations and observational data using sharded MariaDB (spin-off of MySQL) nodes. A dedicated web application allows the scientists all around the world to run SQL queries which include specific astrophysical functions and get their desired data in reasonable time.

The setup is based on sharded MariaDB database nodes, orchestrated by a head node which runs the MariaDB Spider engine. ParallelQuery (PaQu) reformulates the SQL queries for the use in distributed environment. On toThe web framework Daiquiri offers a dedicated web interface for each of the hosted scientific database. All software we develop is open source.

To name a few scientific databases in use since 2013:

- AIP is one of the four data centers to store the Gaia data starting from the data release in September 2016. A mock dataset is already published for Gaia internal test purposes. https://gaia.aip.de/

- MultiDark and Bolshoi simulation results can be accessed on https://cosmosim.org

- Digitalized archive for astronomical photographic plates: https://www.plate­archive.org

Outline

This contribution will answer following questions:

- Why would it make sense to store terabytes of astronomical data in a SQL database?

- How to work with billions of lines in MySQL/MariaDB and be fast?

A short overview over the running applications and planned future development will be given.

These themes will be explained in detail:

- Spider and Federated Storage Engines in MySQL/MariaDB

- Reormulating SQL queries using PaQu

- Reusable web applications with the Daiquiri framework

 

Links

- Supercomputing and E-Science AIP: http://escience.aip.de

- Daiquiri documentation: http://escience.aip.de/daiquiri

- PaQu parallel queries for the techie:

https://www.cosmosim.org/cms/documentation/data-access/paqu-parallel-queries-for-the-techie/