1897 gave us Edison’s Kinetoscope, a precursor to the videos that occupy everyone’s time, and the discovery of the electron by English physicist J.J. Thomson. That same year, Swedish engineer S.A. Andrée innovated with then state-of-the-art transportation technologies.
Andrée, Nils Strindberg, and Knut Fraenkel set out on a real mission of scientific discovery by attempting to float over the North Pole in a 97-foot-tall, varnished silk balloon filled with hydrogen. Unfortunately for the crew, their beloved Örnen could not achieve the buoyancy needed to attain cruising altitude or to maintain any speed. After bumping along on the ice in a suspended basket, the Swedes landed the balloon halfway between their launch point and the North Pole. All died of exposure.
Design Reliability in an Era of Predictability
Despite all the inventions of the era, the crew of the Örnen could not rely on sensors connected to the IoT or predictive maintenance. Nor could they consider reliability predictions that calculated and forecast the design feasibility, potential failure areas, system design factors, and reliability improvement issues for hydrogen balloon components functioning in very cold conditions.
Different measures help evaluate the possibility of success and assist with determining if a system requires redundancy in the form of backup systems, subsystems, assemblies, and components. Reliability, or R(t), is the probability that a component or system remains operable through time t. In this context, reliability is expressed as a probability from zero to one.
The concept of reliability varies slightly between repairable items such as an aerospace guidance system and nonrepairable items such as semiconductors that we happily throw away after the first failure. We define a system as repairable if we can restore the system to its normal operating point through component replacement or through repairs when a failure happens. For nonrepairable items, reliability is the probability that the item will perform its desired function without failure for a stated period of time under specific conditions. For repairable items, we see reliability as the probability that the component or system will not fail during the time interval zero to t1.
Feet Don’t Fail Me Now!
Every reliability prediction has a basis in failure rates. A conditional failure rate tells us about the anticipated number of times that a component or system will fail within a specific time period. Calculations based on complex models measure the reliability of the item. A reliability prediction model may include temperature, environmental, mechanical stress, and other types of data.
Let’s consider how these predictors work for repairable and nonrepairable items. Although the definitions differ, both types of items have decreasing, constant, and increasing failure rates.
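One common way to picture these three failure-rate patterns is the Weibull hazard function, whose shape parameter decides whether the rate falls, stays flat, or rises over time. The sketch below is only an illustration: the Weibull model, the function name, and the parameter values are assumptions chosen for the example, not something the measures in this article require.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1).

    shape < 1 gives a decreasing rate, shape == 1 a constant rate,
    and shape > 1 an increasing rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


SCALE_HOURS = 1000.0  # hypothetical characteristic life

for shape, label in [(0.5, "decreasing"), (1.0, "constant"), (3.0, "increasing")]:
    early = weibull_hazard(100.0, shape, SCALE_HOURS)
    late = weibull_hazard(900.0, shape, SCALE_HOURS)
    print(f"{label:>10}: h(100 h) = {early:.6f}, h(900 h) = {late:.6f}")
```

Comparing the early and late values in each row shows the trend for that shape value.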
Determining the patterns becomes more problematic when we consider complex systems that consist of both repairable and nonrepairable items. To track reliability in those systems, we arrange the assembled components in a logical series structure, in which the failure of any one component fails the assembly. That is, the failure rate of an assembly equals the sum of the individual component failure rates.
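For a series structure with constant failure rates, the component failure rates add, and the assembly reliability is the product of the component reliabilities. Here is a minimal sketch of that arithmetic. It assumes the exponential model R(t) = exp(-λt), and the three failure rates and the 10,000-hour mission time are hypothetical values picked for illustration.

```python
import math


def series_failure_rate(component_rates):
    """Failure rate of a series assembly: the sum of its component failure rates."""
    return sum(component_rates)


def reliability(failure_rate, hours):
    """R(t) = exp(-failure_rate * hours), assuming a constant failure rate."""
    return math.exp(-failure_rate * hours)


# Hypothetical component failure rates, in failures per hour
rates = [2e-6, 5e-6, 1e-6]
mission_hours = 10_000

lam = series_failure_rate(rates)
r_system = reliability(lam, mission_hours)

# Cross-check: multiplying the component reliabilities gives the same answer
r_product = math.prod(reliability(r, mission_hours) for r in rates)

print(f"Assembly failure rate: {lam:.1e} failures/hour")
print(f"Assembly reliability over {mission_hours} h: {r_system:.4f}")
```

With these made-up numbers the assembly reliability comes out at roughly 0.92, lower than that of any single component, which is why long series chains often prompt a look at redundancy.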
However, the complexity of the system also tells us that two subsystems probably will not enter a failed state simultaneously. As a result, engineering teams build stochastic life models of a complex repairable system based on the accelerated life models of components. Stochastic models describe random events occurring within a continuum. With all this in mind, the system life model includes algorithms and software tools that determine stress on the components and the average availability and availability distribution for any number of failure modes regardless of complexity or size.
A Mean, Mean World
Rather than assume that a balloon will transport us across the Arctic, we use a range of measures for checking the reliability (that is, the functional versus the nonfunctional state) of repairable products, hardware modules, nonrepairable systems, and devices. Those measures include:

Mean Time Between Failure (MTBF)

Failure in Time (FIT)

Mean Time to Repair (MTTR)

Mean Time to Failure (MTTF)
Below you’ll find a synopsis of each of these measures.
Mean Time Between Failure (MTBF)
Mean Time Between Failure (MTBF) measures the average amount of operating time that passes between failures of a repairable or nonrepairable component, assembly, or system. Why do we care? In brief, MTBF can tell us when conditional or preventive maintenance should occur. With the amount of time usually given in hours, MTBF analyzes actual failures in a large group of repairable products. “Mean time” represents the statistical mean over a long period of time and a large number of units. Rather than showing the typical life of a single product, MTBF represents a statistical measure over a large family of products.
We can view this in terms of the expected amount of time between two consecutive failures.
MTBF = Number of hours of operational time / Total number of failures.
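As a quick illustration of the formula, the sketch below computes MTBF for a hypothetical fleet. The function name and the fleet numbers are assumptions made up for the example.

```python
def mtbf(operating_hours, failures):
    """MTBF = total operating hours / total number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined until at least one failure is observed")
    return operating_hours / failures


# Hypothetical fleet: 50 units, each run for 2,000 hours, with 4 failures in total
total_hours = 50 * 2_000
fleet_mtbf = mtbf(total_hours, 4)
print(f"MTBF: {fleet_mtbf} hours")  # MTBF: 25000.0 hours
```

Note that the result far exceeds the 2,000 hours any single unit actually ran, which is why MTBF reads as a population statistic rather than as the expected life of one product.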
Mean Time to Repair (MTTR)
Mean Time to Repair (MTTR) applies only to repairable items and equals the total amount of time used to perform all corrective or preventive maintenance repairs divided by the total number of repairs. In effect, MTTR represents the expected span of time from a failure to the completed repair, or:
MTTR = Total maintenance time / Total number of repairs.
Given this calculation, MTTR measures the efficiency of repair programs and the ability of organizations to respond to a repair issue. MTTR can work as a factor for determining the repair or replacement of assets, for establishing repair inventories, and for rental/purchase decisions.
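The MTTR formula can be sketched in a few lines. The repair log and the 500-hour MTBF below are hypothetical values. The availability function uses the standard steady-state relationship MTBF / (MTBF + MTTR), which this article does not state and which is included here as an assumption to show how the two measures combine.

```python
def mttr(repair_durations_hours):
    """MTTR = total maintenance time / total number of repairs."""
    if not repair_durations_hours:
        raise ValueError("MTTR needs at least one recorded repair")
    return sum(repair_durations_hours) / len(repair_durations_hours)


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time the item is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical repair log: hours spent on each of four repairs
repair_log_hours = [1.5, 4.0, 2.5, 2.0]
mean_repair = mttr(repair_log_hours)
print(f"MTTR: {mean_repair} hours")  # MTTR: 2.5 hours

# Hypothetical MTBF of 500 hours for the same asset
availability = steady_state_availability(500.0, mean_repair)
print(f"Availability: {availability:.4f}")
```

Shortening repairs raises availability even when the failure rate stays the same, which is one reason MTTR feeds into spares and staffing decisions.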
Mean Time to Failure (MTTF)
Mean Time to Failure (MTTF) evaluates the reliability of nonrepairable items and equals the mean time expected until the first failure of a component, assembly, or system. For repairable items, MTTF equals the expected span of time from repair to the first or next failure.
MTTF = Total hours of operation / Total number of units.
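For nonrepairable items, the formula amounts to averaging the hours each unit ran before it failed. The sketch below assumes every unit in the sample was run to failure, and the lifetimes are hypothetical values.

```python
def mttf(hours_to_failure):
    """MTTF = total hours of operation / total number of units."""
    if not hours_to_failure:
        raise ValueError("MTTF needs at least one unit")
    return sum(hours_to_failure) / len(hours_to_failure)


# Hypothetical test: four nonrepairable units, each run until it failed
lifetimes_hours = [900, 1100, 1000, 1200]
sample_mttf = mttf(lifetimes_hours)
print(f"MTTF: {sample_mttf} hours")  # MTTF: 1050.0 hours
```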
Failure in Time (FIT)
The Failure in Time (FIT) measure aligns with how organizations report MTBF information. A FIT analysis shows the number of expected failures per one billion hours of operation for a semiconductor device. Both measures provide information about performance as well as the availability and reliability of components.
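Because FIT counts failures per one billion device-hours, it converts directly to and from an MTBF in hours when the failure rate is constant. Here is a minimal sketch of both directions. The function names and the sample numbers are hypothetical, and the constant failure rate is an assumption.

```python
BILLION_HOURS = 1e9


def fit_from_test(failures, device_hours):
    """FIT = failures observed per one billion device-hours of operation."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return failures * BILLION_HOURS / device_hours


def mtbf_hours_from_fit(fit):
    """MTBF in hours for a constant failure rate: 1e9 / FIT."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return BILLION_HOURS / fit


# Hypothetical life test: 3 failures across 200 million device-hours
test_fit = fit_from_test(3, 2e8)
print(f"FIT: {test_fit}")  # FIT: 15.0

# Hypothetical datasheet value of 100 FIT
print(f"MTBF: {mtbf_hours_from_fit(100):.0f} hours")  # MTBF: 10000000 hours
```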
Systems, despite how carefully they are assembled, can still surprise us with their fallibility.
Someone Has to Pay
We can also consider reliability in terms of populations and lifecycle costs. In terms of population, the operation and maintenance of systems may differ by system or component type. For example, robotic systems include different types of repairable components that have different operations and maintenance needs than the systems used for aerospace vehicles. The size of the system population, the time needed to replace components, and the number of maintenance channels and routines all impact the lifecycle costs for the systems. Establishing a designed MTBF and an MTTR that meet design parameters assists with predicting and lessening lifecycle costs.
With any system design working towards improved reliability and more intentional lifetime management, Cadence's suite of design and analysis tools will help you achieve your goals. Furthermore, using the industry-standard, customizable layout tool in OrCAD is a great first step toward getting your design out the door.
If you’re looking to learn more about how Cadence has the solution for you, talk to us and our team of experts.