Contact

Position:
INAF - Osservatorio Astronomico di Teramo, Teramo
Address
Italy

Miscellaneous Information

Abstract Reference: 31277
Identifier: P8.7
Presentation: Poster presentation
Key Theme: 8 Other

Case study: the use of Nagios to monitor the complex facility of the Telescope Manager (TM) of the Square Kilometer Array (SKA)

Authors:
Canzari Matteo, Di Carlo Matteo, Dolci Mauro, Smareglia Riccardo

SKA (Square Kilometer Array) is a project to design and build a large radio telescope composed of thousands of antennae and related support systems (timing generation, real-time signal processing and so on). This large and complex facility is orchestrated by the Telescope Manager (TM), a suite of software applications that manages observations (preparation and execution), signal processing and scientific data delivery, and gathers all status and performance data from the facility. To ensure the proper and uninterrupted operation of TM, a local monitoring and control system (TM.LMC) is being developed. Among its responsibilities, monitoring, lifecycle control and fault management are of the utmost importance. For monitoring, the central activity of the LMC, Nagios has been proposed as a suitable solution for monitoring TM resources, services and the status of processes, both at a generic level (achieved directly by Nagios) and at a performance level. For this specific purpose, a configurable custom agent is under development. The agent is handled by the lifecycle manager (which controls a software application through the phases of its lifetime: configuration, start, stop, update, upgrade or downgrade). It sends monitoring data and generates alarms based on the logic of the specific TM applications, which can be based on the Tango Controls framework (a CORBA-based interface for controlling each element of the telescope, on which the overall SKA architecture is based), as well as on standard web-based applications or generic scripts. By integrating monitoring data with custom plugins, it has been possible to handle fault management, in order to prevent abnormal situations or to restore normal behavior after an unmanaged failure. In addition, thanks to the high flexibility of Nagios, an integration with the Logging System has been developed in order to track the operation of the Telescope Manager through its logging data.
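
As an illustration of the kind of check a custom agent or plugin could report to Nagios, the following is a minimal sketch and not the TM.LMC implementation. It assumes only the standard Nagios plugin conventions: exit codes 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN, and an output line with optional performance data after a `|` character. The function name `check_metric`, the metric label and the thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_metric(label, value, warn, crit):
    """Return (exit_code, output_line) for one numeric metric.

    Hypothetical helper: `value` stands for a reading obtained from a
    TM application; how it is obtained is outside this sketch.
    """
    if value is None:
        # The reading could not be obtained, so the state is unknown.
        return UNKNOWN, f"UNKNOWN - {label} could not be read"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is human-readable; text after it is performance data
    # in the form label=value;warn;crit.
    line = f"{STATUS_NAMES[code]} - {label} is {value} | {label}={value};{warn};{crit}"
    return code, line
```

A plugin script built on this helper would print the returned line and terminate with `sys.exit(code)`, which is how Nagios learns the state of the check.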