
Fukuoka University, Fukuoka
Miscellaneous Information:

Abstract Reference: 31029
Identifier: P3.4
Presentation: Poster presentation
Key Theme: 3 New Trends in HPC and Distributed Computing 

Pre-feasibility Study of Astronomical Data Archive Systems Powered by Public Cloud Computing and Hadoop Hive

Eguchi Satoshi

The volume of astronomical observational data grows every year. For example, the Atacama Large Millimeter/submillimeter Array (ALMA) is expected to generate 200 TB of raw data per year, while the Large Synoptic Survey Telescope (LSST) is estimated to produce 15 TB of raw data every night. Because computing capacity is growing much more slowly than astronomical data volumes, providing high-performance computing (HPC) resources together with scientific data is likely to become common in the next decade. However, the installation and maintenance costs of an HPC system can be a heavy burden for the data provider. I consider public cloud computing as an alternative way to obtain sufficient computing resources inexpensively. I build Hadoop and Hive clusters using a virtual private server (VPS) service and Amazon Elastic MapReduce (EMR), and measure their performance. The performance of the VPS cluster varies from day to day, whereas the EMR clusters are relatively stable. Because partitioning is essential to Hive performance, I evaluate several partitioning algorithms. In this poster, I report the benchmark results and the performance optimization carried out in a cloud computing environment.
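Hive stores a partitioned table as one directory per partition-key value, so a query that filters on that key only has to read the matching directories (partition pruning). The abstract does not say which partitioning algorithms were benchmarked, so the sketch below is a hypothetical illustration only. It assumes catalogue rows given as (right ascension, declination) in degrees, split into fixed 10° declination zones. The names `dec_zone`, `partition_rows`, `pruned_paths`, `ZONE_HEIGHT_DEG`, and the `catalog/dec_zone=N` layout are invented for this example.

```python
"""Minimal sketch of Hive-style partitioning and partition pruning.

Hypothetical scheme: rows are grouped into fixed-width declination
zones, one key=value directory per zone, mimicking how Hive lays out
a partitioned table on HDFS.
"""
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 10.0  # hypothetical zone width in degrees (18 zones total)


def dec_zone(dec_deg: float) -> int:
    """Return the index (0..17) of the declination zone containing dec_deg."""
    # Clamp +90 into the top zone so every valid declination maps to a zone.
    dec = min(dec_deg, 90.0 - 1e-9)
    return math.floor((dec + 90.0) / ZONE_HEIGHT_DEG)


def partition_path(table: str, zone: int) -> str:
    """Hive stores each partition as a key=value subdirectory of the table."""
    return f"{table}/dec_zone={zone}"


def partition_rows(table, rows):
    """Group (ra, dec) rows by the partition directory they belong to."""
    parts = defaultdict(list)
    for ra, dec in rows:
        parts[partition_path(table, dec_zone(dec))].append((ra, dec))
    return dict(parts)


def pruned_paths(table, parts, dec_min, dec_max):
    """Keep only the existing partitions that can hold rows with dec_min <= dec <= dec_max."""
    wanted = {
        partition_path(table, z)
        for z in range(dec_zone(dec_min), dec_zone(dec_max) + 1)
    }
    return sorted(p for p in parts if p in wanted)


if __name__ == "__main__":
    rows = [(10.0, -85.0), (150.2, -5.5), (200.1, 3.0), (310.7, 45.0), (12.3, 89.0)]
    parts = partition_rows("catalog", rows)
    # A query on -10 <= dec <= 10 touches 2 of the 5 populated partitions.
    print(pruned_paths("catalog", parts, -10.0, 10.0))
    # prints ['catalog/dec_zone=8', 'catalog/dec_zone=9']
```

In an actual Hive table the same effect comes from declaring the zone column in `PARTITIONED BY` and filtering on it in the `WHERE` clause. The trade-off such benchmarks probe is partition granularity: finer zones prune more data per query but create more, smaller files for Hadoop to manage.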