The assessment process is even more difficult when Commercial-Off-The-Shelf (COTS) components – whose internals are partially or totally unknown – come into play. COTS components are increasingly used by industry to reduce costs and to shorten development (and possibly deployment) time. However, since COTS are general-purpose components that have not been designed and developed for robust operation, obtaining a predictable operational profile (for individual components and, even more so, for the resulting system) is a challenging endeavour. The research community has proposed techniques, mostly guidelines, to help reliability engineers carry out the assessment of safety properties. However, these studies are mainly qualitative. It is also worth emphasizing that the existing literature, with a few exceptions, does not refer to real industry applications, which limits the applicability of the proposed techniques to commercial setups. In contrast, we contribute a methodological framework that can be used in practice as a reference for certifying a wide class of emerging critical systems: virtually any system for which (i) the general architecture has already been designed, (ii) business constraints require that (radical) changes to the architecture be avoided, and (iii) the main COTS components that must be integrated have already been chosen. We demonstrate, on a real industrial system, that SIL2 can be achieved through proper configuration of system parameters and the set-up of rejuvenation procedures. Specifically, the paper addresses the SIL2 assessment of a real COTS-based SRS, namely the two-node cluster server hosting the Train Management System (TMS) application of Hitachi Ansaldo STS (ASTS).
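As a reading aid, the SIL2 target can be understood as a band on the tolerable hazard rate (THR) of a function. The sketch below is a minimal illustration, assuming the per-hour, per-function THR bands commonly tabulated for EN 50129; the names `SIL_BANDS`, `sil_for_thr` and `meets_sil` are hypothetical helpers introduced here, not part of the assessed system or of any tool used in the study, and the band limits should be checked against the edition of the standard in use.

```python
# Illustrative only: THR bands (per hour, per function) as commonly tabulated
# for EN 50129. Each entry is (SIL, lower bound inclusive, upper bound exclusive).
# Assumption: verify these limits against the edition of the standard in use.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_thr(thr_per_hour):
    """Return the SIL whose band contains the THR, or None if outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= thr_per_hour < high:
            return sil
    return None


def meets_sil(thr_per_hour, target_sil):
    """Return True if the THR lies below the upper bound of the target SIL band."""
    for sil, _low, high in SIL_BANDS:
        if sil == target_sil:
            return thr_per_hour < high
    raise ValueError(f"unknown SIL: {target_sil}")


if __name__ == "__main__":
    # Hypothetical THR values, not results from the study.
    for thr in (5e-7, 5e-8, 2e-6):
        print(thr, sil_for_thr(thr), meets_sil(thr, 2))
```

Under this reading, a computed THR satisfies the SIL2 target whenever it falls below the upper limit of the SIL2 band, including values that would also qualify for a stricter level.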
The study enabled ASTS to identify the conditions under which the architecture of the Active/Standby cluster, incorporating COTS software and hardware, can be certified as SIL2 compliant, as specified in EN50129. To achieve this goal, the system is assessed by means of a hybrid approach, i.e., a combination of analytical modeling and experimental evaluation. Analytical modeling is done using PRISM (a well-known and widely used tool for formal model checking), while experimental evaluation relies on direct measurements on the real system. The hybrid approach consists of several phases, which can be grouped into five main iterative stages: (1) identifying the THR and the safety bottlenecks of the system in its current configuration through formal models; (2) defining possible corrective actions; (3) weighing their effectiveness, i.e., evaluating their potential impact on the THR in the cluster model; (4) validating the results of the previous phases through an experimental campaign on the real system; and (5) using the experimental estimates within the models to calculate the final THR. In the specific case of the ASTS TMS system, the safety bottlenecks turned out to be the COTS Operating System (OS) and the Cluster Resource Manager (CRM). Extended versions of the cluster model were built to evaluate the effectiveness of mitigation actions. The least costly action (in terms of resources and effort) that also yielded a THR within the SIL2 range was software rejuvenation. A Quantitative Accelerated Life Test (QALT) and an Accelerated Degradation Test (ADT) were used to estimate experimentally the Mean Time To Failure (MTTF) and the Time To Aging-Related Failure (TTARF) of a single server node, in the short term and under stressed execution, respectively. These measurements were used as input to the rejuvenation-extended cluster model, from which the final value of the THR was obtained.
This paper makes three important contributions:
The remainder of this work is organized as follows. Section 2 gives an insight into previous work. Section 3 describes the ASTS use case. Section 4 presents the approach adopted. Sections 5 and 6 describe the cluster modeling and the proposed mitigation strategies, respectively. These are followed by Section 7, which describes the experimental validation phase. Finally, Section 8 concludes the document with some final remarks.