Introduction to Latch-X
Latch-X is a modeling framework for availability and reliability analysis of complex systems using YAML-defined components, Bayesian Networks (BN), and Monte Carlo (MC) simulation.
Availability vs Reliability (what we mean in these docs)
- Availability (A or A∞): repairable systems. The probability that the service is up [1].
  By default we mean steady-state availability (A or A∞). When we discuss time-varying behavior, we write A(t) for point availability or Ā[t₁,t₂] for interval (mission) availability.
- Reliability (R(t)): non-repairable (mission) view. The probability that a component or system operates without failure over mission time t.
  In general, reliability is “one minus the failure distribution” [2].
  The exponential distribution is the most common choice; normal and lognormal distributions are also supported in Latch-X [3][4].
In short:
Availability → “How often is it up?” (A steady-state; A(t) if time-varying)
Reliability → “Chance it survives the next t hours?” (R(t))
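The definitions above can be illustrated for a single component. The sketch below evaluates A∞, point availability A(t), and exponential R(t). It assumes exponential failure and repair times (rates 1/MTTF and 1/MTTR) and a component that is up at t = 0. The function names are hypothetical and not part of Latch-X.

```python
import math

def steady_state_availability(mttf: float, mttr: float) -> float:
    """A∞ = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def point_availability(t: float, mttf: float, mttr: float) -> float:
    """A(t) for a two-state exponential model that starts in the up state at t = 0."""
    lam, mu = 1.0 / mttf, 1.0 / mttr  # failure rate and repair rate
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def reliability_exp(t: float, mttf: float) -> float:
    """R(t) = exp(-t / MTTF): probability of no failure in [0, t]."""
    return math.exp(-t / mttf)

mttf, mttr = 1000.0, 10.0  # hours (illustrative values)
print(f"A_inf = {steady_state_availability(mttf, mttr):.6f}")   # A_inf = 0.990099
print(f"A(5h) = {point_availability(5.0, mttf, mttr):.6f}")
print(f"R(24h) = {reliability_exp(24.0, mttf):.6f}")            # R(24h) = 0.976286
```

A(t) starts at 1 and decays toward A∞, because failed periods are followed by repairs. R(t) keeps falling toward 0, because it only counts the first failure.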
How Latch-X supports both
Repairable (Availability mode)
- Model components with `mttf` and `mttr`.
- Use BN for steady-state availability and contribution analysis.
- Use MC for temporal uptime/downtime distributions and RTO metrics.
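The MC side of availability mode can be made concrete with a minimal Monte Carlo sketch. This is not Latch-X's engine. It simulates alternating up and down periods for one repairable component over a one-year horizon, then reports the mean fraction of time up and the individual outage durations. Those durations are the raw material for recovery-time (RTO) style metrics. Exponential failure and repair times are an assumption made for brevity, and the function name is hypothetical.

```python
import random

def simulate_uptime(mttf: float, mttr: float, horizon: float, rng: random.Random):
    """Simulate one history; return (fraction of time up, list of outage durations)."""
    t, up_time, outages = 0.0, 0.0, []
    while t < horizon:
        ttf = rng.expovariate(1.0 / mttf)       # time until the next failure
        if t + ttf >= horizon:                  # survives to the end of the horizon
            up_time += horizon - t
            break
        up_time += ttf
        t += ttf
        ttr = rng.expovariate(1.0 / mttr)       # repair duration
        outages.append(min(ttr, horizon - t))   # truncate an outage that runs past the horizon
        t += ttr
    return up_time / horizon, outages

rng = random.Random(42)
runs = [simulate_uptime(1000.0, 10.0, 8760.0, rng) for _ in range(2000)]
mean_avail = sum(a for a, _ in runs) / len(runs)
all_outages = [d for _, outs in runs for d in outs]
print(f"mean availability ~ {mean_avail:.4f} (analytic A_inf = {1000 / 1010:.4f})")
print(f"outages simulated: {len(all_outages)}, longest: {max(all_outages):.1f} h")
```

The mean should land close to the steady-state value. The spread of per-run availability and the outage-length distribution are what a BN steady-state number cannot show.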
Non-repairable (Reliability mode)
- Model components with `mttf` only. Repair is disabled, so any `mttr` value is ignored.
- Use BN for mission reliability calculations.
- Use MC for failure-time distributions and mission success probability.
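The MC idea in reliability mode works as follows: sample one time to failure per trial, then count the trials that survive the whole mission. The sketch below shows this. With exponential TTF, the estimate should approach R(t) = exp(−t / MTTF). The function name and parameters are hypothetical, chosen only for illustration.

```python
import math
import random

def mission_success_mc(mttf: float, mission: float, trials: int, seed: int = 0) -> float:
    """Estimate P(no failure during the mission) by sampling exponential TTFs."""
    rng = random.Random(seed)
    survived = sum(1 for _ in range(trials) if rng.expovariate(1.0 / mttf) > mission)
    return survived / trials

mttf, mission = 1000.0, 200.0
estimate = mission_success_mc(mttf, mission, trials=100_000)
exact = math.exp(-mission / mttf)  # about 0.8187
print(f"MC estimate: {estimate:.4f}   exact R(t): {exact:.4f}")
```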
What Latch-X Analysis Answers
Both availability and reliability analyses help answer critical system questions:
- What's our expected system availability over time?
- What's the reliability (survival probability) for a mission duration?
- Which components are most critical to system uptime?
- How would losing a specific component affect the overall system?
- What's the probability of meeting our SLA commitments?
- Where should we invest in redundancy or improvements?
Core concepts
Components
The building blocks of your system. Each component can fail independently, with its own mean time to failure (MTTF) and mean time to repair (MTTR).
Examples
Web servers, databases, load balancers, network switches, power supplies
Dependencies
Relationships between components that determine how failures propagate through your system.
Examples
Web servers depend on databases, load balancers depend on web servers, everything depends on power
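If we assume components fail independently, dependencies like these reduce to series and parallel compositions. A chain of required components is up only if every component in it is up. A redundant group is up if at least one member is up. The sketch below uses illustrative availability values and hypothetical helper names.

```python
from math import prod

def series(*avail: float) -> float:
    """All components required: A = A1 * A2 * ... (independent failures)."""
    return prod(avail)

def parallel(*avail: float) -> float:
    """Any one component suffices: A = 1 - (1 - A1)(1 - A2)... (independent failures)."""
    return 1.0 - prod(1.0 - a for a in avail)

power, lb, web, db = 0.9999, 0.999, 0.99, 0.995  # illustrative steady-state values
single_web = series(power, lb, web, db)
two_webs = series(power, lb, parallel(web, web), db)
print(f"one web server:  {single_web:.5f}")
print(f"two web servers: {two_webs:.5f}")
```

Comparing the two results is a simple way to see where redundancy pays off. Here the second web server helps, but the database and load balancer still cap the total.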
Availability vs Reliability Modeling
Different modeling approaches for different analysis goals:
Availability modeling (repairable systems)
- Components have both MTTF and MTTR parameters
- Formula shown in note [1]
- Answers: "What fraction of time is the system operational?"
Reliability modeling (mission systems)
- Set `repair_enabled: false` (repairs are ignored).
- For normal nodes, the schema still requires `mttf` and `mttr` when modeling with times; the engine ignores `mttr` in this mode. (Alternatively, use `prob` instead of time-based parameters.)
- Latch nodes: `prob` or `mttf` + `max_delay` (no `mttr`).
- Formula examples in notes [2] and [3]
- Answers: "What's the probability of no failures during mission time t?"
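The reliability formulas referenced above can be evaluated with only the standard library. The standard normal CDF is computed as Phi(x) = (1 + erf(x / √2)) / 2. This is a sketch with hypothetical function names. In particular, how Latch-X maps `mttf` and `sigma` onto the lognormal parameters is not specified here, so the lognormal function takes the log-scale `mu` and `sigma` directly.

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def r_exponential(t: float, mttf: float) -> float:
    return math.exp(-t / mttf)

def r_normal(t: float, mu: float, sigma: float) -> float:
    return 1.0 - phi((t - mu) / sigma)

def r_lognormal(t: float, mu: float, sigma: float) -> float:
    """mu and sigma are the mean and std of ln(TTF)."""
    if t <= 0:
        return 1.0  # a lognormal TTF is always positive, so nothing has failed yet
    return 1.0 - phi((math.log(t) - mu) / sigma)

for t in (500.0, 1000.0, 1500.0):
    print(t, r_exponential(t, 1000.0), r_normal(t, 1000.0, 200.0),
          r_lognormal(t, math.log(1000.0), 0.5))
```

At t = 1000 h, the normal and lognormal curves with these parameters both give 0.5, since 1000 h is their median. The exponential gives exp(−1), about 0.368.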
Failure propagation
How the failure of one component affects other components and the overall system through dependency relationships.
Example
Database failure → Web server failure → Load balancer failure → System unavailable
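One simple way to picture propagation is as a traversal of the dependency graph. Start from the failed component, then mark every component that depends on it, directly or transitively, as down. The sketch below assumes each component needs all of its dependencies (no redundancy). It uses a hypothetical dictionary topology, not the Latch-X YAML schema.

```python
from collections import deque

# component -> components it depends on (hypothetical example topology)
DEPENDS_ON = {
    "power": [],
    "database": ["power"],
    "web_server": ["database", "power"],
    "load_balancer": ["web_server", "power"],
}

def propagate(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that is down when `failed` fails (all dependencies required)."""
    # Invert the edges: dependency -> components that rely on it.
    dependents: dict[str, list[str]] = {c: [] for c in depends_on}
    for comp, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(comp)
    # Breadth-first walk over the dependents.
    down, queue = {failed}, deque([failed])
    while queue:
        for comp in dependents[queue.popleft()]:
            if comp not in down:
                down.add(comp)
                queue.append(comp)
    return down

print(sorted(propagate("database", DEPENDS_ON)))
# ['database', 'load_balancer', 'web_server']
```

Treat the load balancer as the system entry point. The system is then unavailable whenever the load balancer appears in the returned set. Modeling a redundant group would need an "any of" rule in place of the "all of" rule used here.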
Analysis approaches
BN Engine (Bayesian Network)
Exact, analytical computation that returns results in seconds. Supports both availability (steady-state) and reliability (mission-time) calculations.
Key benefits
- Fast: Results in seconds
- Exact: Mathematically precise probabilities
- Dual mode: Steady-state availability OR mission reliability
- Versatile: Impact analysis, root cause analysis
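The sketch below illustrates what "exact" means here on a toy system. It enumerates every up/down combination of a few independent components to get system availability. It then applies Bayes' rule to get a root-cause style ranking, P(component down | system down). Brute-force enumeration is used only because it is easy to read. It is not necessarily how the Latch-X BN engine works; practical BN inference relies on structure-exploiting algorithms such as variable elimination. The topology, numbers, and names are assumptions.

```python
from itertools import product

# Illustrative steady-state availabilities of independent components.
AVAIL = {"power": 0.9999, "db": 0.995, "web_a": 0.99, "web_b": 0.99}

def system_up(state: dict[str, bool]) -> bool:
    """Hypothetical structure: power AND db AND (web_a OR web_b)."""
    return state["power"] and state["db"] and (state["web_a"] or state["web_b"])

def exact_analysis(avail: dict[str, float]):
    """Return (P(system up), {component: P(component down | system down)})."""
    names = list(avail)
    p_up = 0.0
    joint_down = {n: 0.0 for n in names}  # P(component down AND system down)
    for combo in product([True, False], repeat=len(names)):
        state = dict(zip(names, combo))
        # Probability of this exact combination (independence assumed).
        p = 1.0
        for n, is_up in state.items():
            p *= avail[n] if is_up else 1.0 - avail[n]
        if system_up(state):
            p_up += p
        else:
            for n, is_up in state.items():
                if not is_up:
                    joint_down[n] += p
    p_down = 1.0 - p_up
    return p_up, {n: v / p_down for n, v in joint_down.items()}

p_up, posterior = exact_analysis(AVAIL)
print(f"system availability: {p_up:.6f}")
for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({name} down | system down) = {p:.3f}")
```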
MC Engine (Monte Carlo)
Statistical, simulation-based analysis providing detailed temporal insights for both availability and reliability scenarios.
Key benefits
- Detailed: Statistical confidence intervals
- Temporal: Timing patterns and distributions
- Dual mode: Availability (repair cycles) OR reliability (failure times)
- Realistic: Models actual event sequences
- Validation: Cross-checks BN results
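To show how an MC run can cross-check an exact BN result, the sketch below estimates a toy system's availability by random sampling. It samples component states at random, counts how often the system is up, and computes a normal-approximation 95% confidence interval. That interval shows whether the closed-form value is consistent with the simulation. The topology, numbers, and function names are assumptions for illustration. This static sampling also ignores the temporal behavior (repair cycles, failure times) that a full MC engine would capture.

```python
import math
import random

AVAIL = {"power": 0.9999, "db": 0.995, "web_a": 0.99, "web_b": 0.99}

def system_up(state: dict[str, bool]) -> bool:
    """Hypothetical structure: power AND db AND (web_a OR web_b)."""
    return state["power"] and state["db"] and (state["web_a"] or state["web_b"])

def mc_availability(avail: dict[str, float], trials: int, seed: int = 0):
    """Return (estimate, 95% CI half-width) using independent state sampling."""
    rng = random.Random(seed)
    ups = 0
    for _ in range(trials):
        state = {n: rng.random() < a for n, a in avail.items()}
        ups += system_up(state)  # True counts as 1
    p = ups / trials
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / trials)
    return p, half_width

est, hw = mc_availability(AVAIL, trials=200_000)
exact = 0.9999 * 0.995 * (1 - (1 - 0.99) ** 2)  # closed form for this structure
print(f"MC: {est:.5f} ± {hw:.5f}   exact: {exact:.5f}")
print("consistent" if abs(est - exact) <= hw else "outside the 95% interval")
```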
Getting started
- Access the Dashboard to see available tools
- Create your first model with the AI Prompt or manually
- Define Components with realistic MTTF/MTTR values
- Configure Analysis settings
- Run BN Engine analysis for quick results
Next steps
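Notes
[1] A∞ = MTTF / (MTTF + MTTR). This is exact for the steady state of a single two-state (up/down) component with independent up and down periods. By renewal theory, it holds for any failure and repair distributions, not only exponential ones. When applied to aggregated system-level MTTF/MTTR estimates, it is a practical shortcut.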
[2] `R(t) = 1 - F(t)`, where `F(t)` is the cumulative distribution function (CDF) of time-to-failure (TTF).
[3] Reliability by TTF distribution (Phi is the standard normal CDF):
  - Exponential TTF: `R(t) = exp(−t / MTTF)`
  - Normal TTF (mean `mu`, std `sigma`): `R(t) = 1 − Phi((t − mu)/sigma)`
  - Lognormal TTF (log-mean `mu`, log-std `sigma`): `R(t) = 1 − Phi((ln t − mu)/sigma)` (valid for `t > 0`)
[4] In YAML, set `mttf_dist: exp|norm|lognorm` (and include `sigma` for `norm|lognorm`). `mttr_dist` supports `delta|exp|norm|lognorm`.