This framework operationalises the RISL Governing Document — translating principles into repeatable field execution. It is built from lived industrial leadership experience and is designed to perform under real operating pressure, not conference-room conditions.
- Define recurring operational failures
- Establish execution controls
- Provide internal alignment reference
- Support proposal and scope definition
Living Document · Rev 1 · 2026 · 2506854 Ontario Inc.
RISL operates where the gap between a controlled operation and a catastrophic one is measured in procedure adherence, leadership presence, and the integrity of a single work order. The Forensic Execution Intelligence framework was built for industries where unplanned downtime costs $25,000–$150,000 CAD per hour and where the same deficiency domains — maintenance drift, workforce capability failure, communication breakdown, documentation decay — appear regardless of sector. Nuclear-grade standards are applied as the benchmark of procedural rigour across all engagements, not as a target market.
Primary Sectors
Oil & Gas — Upstream, Midstream, Downstream. Rotating equipment, pressure systems, process safety, turnaround execution.
Power Generation — Thermal, Hydro, Renewable. Grid reliability, outage management, fatigue risk in shift operations.
Mining & Minerals Processing — Continuous process plants, mobile equipment, contractor workforce management.
Chemical & Petrochemical — Process safety management, HAZOP follow-through, MOC discipline.
Heavy Manufacturing — Production reliability, OEE improvement, workforce competency verification.
Utilities & Infrastructure — Asset life extension, compliance-driven maintenance, workforce knowledge transfer.
Geographic Reach
Canada — Ontario (HQ), Alberta, Saskatchewan. Federal and provincial regulatory frameworks: CAN/CSA, CNSC, TSSA, OHS Act jurisdictions.
Trinidad & Tobago — Strategic base. Petroleum Act, OSHA 2004 (T&T), OSH Agency compliance.
Caribbean Basin — Guyana (rapidly expanding O&G sector), Barbados, Jamaica, regional energy utilities.
Standards Applied — ASME, ISO, API, OSHA, CNSC REGDOC-2.1.2, SMRP, NFPA — jurisdiction-matched to every engagement.
The Common Thread Across All Sectors
Every sector served by RISL shares the same root cause profile: work that was authorised but not governed, leaders who were present but not engaged, and systems that captured data but did not enforce consequence. The RISL Forensic Baseline (Section 3) makes this profile measurable on Day 1 of any engagement — before any recommendation is made.
Reinforces Sec 3 · Sec 8 · Sec 10 · Sec 17
Industrial operations do not typically fail because of technology. They fail because of systemic breakdowns in how work is governed, how risk is communicated, and how capability is built and sustained. These five domains form the diagnostic entry point — the Forensic Baseline — that RISL establishes before any recommendation is made. You cannot fix what you have not measured. You cannot measure what you have not defined. The Baseline defines it.
3.1 — Maintenance & Reliability Failure
Emergency work above 15% of total maintenance volume is a signal of systemic PM breakdown, not bad luck. The SMRP Best Practices benchmark for world-class operations sets emergency and urgent labour hours at ≤2% of total labour hours. The ≤10% level is the upper limit of acceptable operation, not a world-class result. According to the U.S. Department of Energy, reactive maintenance costs 3–4× more per repair event than planned work, and broader industry studies put the multiple as high as 5×. Every reactive event also resets the equipment degradation curve. RISL measures: PM compliance rate, emergency work ratio, wrench time, work order backlog age, and repeat failure rate. Each metric has a world-class target. The gap between your current state and that target is the financial opportunity RISL will quantify on Day 1.
Reference: SMRP Best Practices · ISO 55001 Asset Management · INPO AP-913 Equipment Reliability · Section 10 (Execution Discipline)
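As a minimal sketch, two of the Day 1 metrics can be computed from a work-order export and banded against the thresholds stated above. The `WorkOrder` fields are hypothetical (real CMMS schemas differ), the ratio is measured in labour hours to match the SMRP definition quoted above, and the label for the 10–15% band is this sketch's own wording.

```python
from dataclasses import dataclass

@dataclass
class WorkOrder:
    kind: str                  # hypothetical field: "PM", "planned" or "emergency"
    labour_hours: float
    pm_on_time: bool = False   # only meaningful when kind == "PM"

def emergency_work_ratio(orders: list[WorkOrder]) -> float:
    """Emergency labour hours as a share of total labour hours."""
    total = sum(o.labour_hours for o in orders)
    emergency = sum(o.labour_hours for o in orders if o.kind == "emergency")
    return emergency / total if total else 0.0

def pm_compliance(orders: list[WorkOrder]) -> float:
    """Share of PM work orders completed on time."""
    pms = [o for o in orders if o.kind == "PM"]
    return sum(o.pm_on_time for o in pms) / len(pms) if pms else 0.0

def emergency_band(ratio: float) -> str:
    # Bands follow the Section 3.1 thresholds; the 10-15% label is an assumption.
    if ratio <= 0.02:
        return "world-class"
    if ratio <= 0.10:
        return "acceptable"
    if ratio <= 0.15:
        return "above acceptable limit"
    return "systemic PM breakdown signal"

orders = [
    WorkOrder("PM", 10, pm_on_time=True),
    WorkOrder("PM", 10, pm_on_time=False),
    WorkOrder("emergency", 20),
    WorkOrder("planned", 60),
]
ratio = emergency_work_ratio(orders)
print(f"emergency {ratio:.0%} ({emergency_band(ratio)}), PM compliance {pm_compliance(orders):.0%}")
```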
3.2 — Safety Degradation
Safety does not degrade in a single event. It degrades through the accumulation of tolerated deviations: procedures bypassed without consequence, stop-work authority not exercised, concerns raised but not acted on. The U.S. Chemical Safety Board (CSB) has documented this pattern repeatedly across its major process safety investigations. RISL verifies whether your safety systems are functional, not whether the paperwork exists. A LOTO procedure that lives in a binder and not on the equipment is not a control. A stop-work authority that workers fear to exercise is not protection. RISL tests both at the workface, not from the office.
Reference: ISO 45001:2018 · OSHA 1910.119 PSM · REGDOC-2.1.2 Safety Culture · Section 8 (Stop-Work Authority) · Section 9 (Leadership Presence)
3.3 — Workforce & Leadership Failure
RISL defines two categories of workforce failure. The first is competency gap — the technician cannot perform the task to the required standard. The second, and more dangerous, is the tribal knowledge trap — the technician can perform the task, but only because they memorised a workaround that exists nowhere in the procedure. When that technician leaves, the knowledge leaves. RISL maps both categories, identifies the single points of failure, and enforces documented knowledge transfer before the gap becomes an incident. The INPO AP-913 Equipment Reliability process and the IAEA Safety Culture framework both identify organisational knowledge management as a Tier 1 reliability driver.
Reference: INPO AP-913 · IAEA GS-G-3.5 · ANSI/ASSE Z490.1 · Section 14 (Workforce Capability) · Section 21 (Communication Integration)
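Single-point-of-failure mapping can be sketched over a skills matrix that records, for each safety-critical task, the technicians verified competent at the workface (not merely trained). This is a hypothetical illustration: the matrix contents are invented, and the two-person minimum depth is an assumed threshold rather than a published RISL value.

```python
def single_points_of_failure(skills: dict[str, set[str]], min_depth: int = 2) -> dict[str, set[str]]:
    """Tasks whose pool of workface-verified technicians is smaller than min_depth."""
    return {task: people for task, people in skills.items() if len(people) < min_depth}

def sole_holders(skills: dict[str, set[str]]) -> set[str]:
    """Technicians who are the only verified person for at least one task."""
    return {next(iter(people)) for people in skills.values() if len(people) == 1}

# Hypothetical skills matrix: task -> technicians verified competent at the workface.
matrix = {
    "Compressor K-101 seal change": {"Ana"},
    "Boiler feed pump alignment": {"Ana", "Raj", "Lee"},
    "Turbine trip test": set(),
}
# Report the thinnest coverage first.
for task, people in sorted(single_points_of_failure(matrix).items(), key=lambda kv: len(kv[1])):
    print(f"{task}: {sorted(people) or 'NO verified technician'}")
print("Knowledge leaves with:", sorted(sole_holders(matrix)))
```

The sole-holder list is the practical output: it names whose departure takes undocumented knowledge with them, which is where documented knowledge transfer is enforced first.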
3.4 — Systems & Data Failure
The CMMS is only as reliable as the data entered into it. Work orders closed with generic descriptions, PM tasks marked complete without physical verification, and equipment histories built on "done" rather than "done and measured" are not records — they are liability. The Davis-Besse nuclear event (2002) is the definitive case study: years of boric acid corrosion were documented in the CMMS as acceptable and closed. The data existed. The discipline to act on it did not. RISL's Three-Way Match Audit (Plan vs. Execution vs. CMMS record) directly addresses this gap — making the invisible visible before it becomes catastrophic.
Reference: ISO 55001 Cl.9.1 · SMRP Best Practice 4.1 · NRC Davis-Besse (2002) · Section 10 (Three-Way Match) · Section 20 (Documentation Integrity)
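The Three-Way Match Audit can be sketched as a field-by-field comparison of three records for the same work order, with a missing field treated as a mismatch. The record shape and field names (`task`, `torque_nm`, `isolation`) are assumptions for illustration; RISL's audit protocol is not published in code form.

```python
from typing import Any

def three_way_match(plan: dict[str, Any], execution: dict[str, Any],
                    cmms: dict[str, Any], fields: list[str]) -> list[str]:
    """One finding per field where Plan, Execution and CMMS record disagree (missing counts as None)."""
    findings = []
    for f in fields:
        p, e, c = plan.get(f), execution.get(f), cmms.get(f)
        if not (p == e == c):
            findings.append(f"{f}: plan={p!r} execution={e!r} cmms={c!r}")
    return findings

def match_rate(audits: list[list[str]]) -> float:
    """Three-Way Match Rate: share of audited work orders with no findings."""
    return sum(1 for findings in audits if not findings) / len(audits) if audits else 0.0

# Hypothetical work order: torqued to 110 instead of 120, and the CMMS never captured the value.
plan = {"task": "Replace mechanical seal", "torque_nm": 120, "isolation": "LOTO-17"}
execution = {"task": "Replace mechanical seal", "torque_nm": 110, "isolation": "LOTO-17"}
cmms = {"task": "Replace mechanical seal", "isolation": "LOTO-17"}
findings = three_way_match(plan, execution, cmms, ["task", "torque_nm", "isolation"])
print(findings)                     # ['torque_nm: plan=120 execution=110 cmms=None']
print(match_rate([findings, []]))   # 0.5
```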
3.5 — Cost & Capital Inefficiency
The financial cost of maintenance failure is not abstract. RISL uses the $1.8M Threshold: a conservatively modelled 12-month value of preventing a single chronic equipment failure in a Tier 1 facility, factoring in production loss, rework, emergency parts premiums, contractor overtime, and regulatory exposure. The daily loss from reactive maintenance drift is calculated in Section 8 of every engagement as the Forensic Baseline financial anchor. Capital inefficiency (deferred maintenance compounding into capital replacement, misallocated labour, and price premiums on parts bought as needed rather than planned) is quantified and presented to the Finance Lead at Sovereign Handover.
Reference: SMRP RAV Metric · ISO 55001 Financial Risk · Section 4 (Measurable Outcomes) · Section 17 (Financial Efficiency)
Battle Pack 01–04 · Reinforces Sec 8 · Sec 10 · Sec 14 · Sec 17 · Sec 20 · Sec 21
Most catastrophic outcomes are not caused by lack of information — they result from delayed or compromised decisions. When pressure increases, decisions must tighten, not slow down. The Risk-Ownership Decision Model mandates that decisions are made at the lowest competent level, with risk explicitly named and safety as a hard constraint.
- Decision rights pre-defined
- Risk explicitly named — What / Consequence / Controls
- Time is a variable, not an excuse
- All decisions documented and closed
Reinforces Sec 8 · Sec 9 · Sec 10 · Sec 13
World-class organisations treat stop-work authority as absolute: work stops when critical controls are degraded, with no justification required beyond the identified risk. RISL verifies whether your frontline actually exercises this authority without fear, or whether a culture of retaliation has rendered it theoretical. An unexercised stop-work authority is not a cultural achievement; it is a warning sign.
- Anyone can escalate — based on risk, not rank
- Restart only after controls verified
- Leadership responds with inquiry, not defensiveness
- Type A risk → Named Owner within 60 minutes
60-Minute Rule · Type A / B / C Risk
World-class maintenance organisations do not allow work to begin without a verified Ready for Execution status: parts, permits, and tools confirmed at the workface, not assumed from the office. RISL measures your operation against this standard using the Three-Way Match Audit: do the Plan (what the Planner said), the Execution (what the Technician did), and the CMMS record (what was captured) align? Where they do not, the gap is a documented finding, not a conversation.
The CMMS Gap — RISL Core Principle
The CMMS issues the instruction. The CMMS receives the result. Everything in between — execution, behaviour, deviation, and decision — is invisible to the system unless enforced at the workface. This gap is where every incident, every cost overrun, and every equipment failure originates. RISL operates in that gap. We verify what the CMMS cannot see.
Procedure Use & Adherence — Step by Step as Written
Every safety-critical procedure is executed step by step as written — no step skipped, no step assumed complete, no improvisation regardless of experience level. A technician who bypasses a step is creating an uncontrolled deviation.
Place Keeping
Each completed step is physically marked in the procedure at the workface. The physical mark is the evidence. A procedure with no markings is a procedure that was not used — regardless of what the technician says.
- Ready-for-Execution gate enforced
- Three-Way Match audit: Plan / Execute / CMMS
- Backlog prioritised by risk — not by noise
- Deviations logged and root-caused immediately
- Procedure Use & Adherence — step by step as written
- Place Keeping — physical step marking at workface
- CMMS close-out accuracy verified by supervisor
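Procedure use and place keeping can be modelled as an execution record that accepts step marks only in sequence and treats an unmarked procedure as unused. This is a hypothetical sketch (`ProcedureExecution` and its methods are invented for illustration); at the workface the marked paper procedure remains the evidence, and a rejected mark here corresponds to a deviation that is logged and root-caused.

```python
from dataclasses import dataclass, field

@dataclass
class ProcedureExecution:
    steps: list[str]
    marked: list[str] = field(default_factory=list)  # evidence: steps physically marked, in order

    def mark(self, step: str) -> None:
        """Mark the next step complete; refuse skipped or out-of-order steps."""
        if len(self.marked) == len(self.steps):
            raise ValueError("procedure already complete")
        expected = self.steps[len(self.marked)]
        if step != expected:
            raise ValueError(f"deviation: expected {expected!r}, got {step!r}")
        self.marked.append(step)

    def status(self) -> str:
        if not self.marked:
            return "not used"  # no marks means the procedure was not used
        if len(self.marked) < len(self.steps):
            return "in progress"
        return "complete"

proc = ProcedureExecution(["Isolate and lock out", "Verify zero energy", "Remove coupling guard"])
proc.mark("Isolate and lock out")
try:
    proc.mark("Remove coupling guard")  # attempts to skip the zero-energy check
except ValueError as err:
    print(err)  # deviation: expected 'Verify zero energy', got 'Remove coupling guard'
print(proc.status())  # in progress
```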
Reinforces 3.1–3.5 · Sec 9 · Sec 11 · Sec 24 (HuP)
ISO 31000:2018 defines risk as "the effect of uncertainty on objectives." RISL applies this definition operationally — risk is not an abstract category on a matrix. It is a specific condition, on a specific piece of equipment or in a specific process, with a specific consequence range, assigned to a specific human being who has the authority and obligation to resolve it within a defined timeframe. Anonymous risk ownership — "the team will monitor it," "maintenance is aware," "it's in the CMMS" — is not risk management. It is risk diffusion. And diffused risk is uncontrolled risk.
The RISL Risk Taxonomy — Type A / B / C
Type A: Named Owner assigned within 60 minutes. Work stops. Type A covers any condition that presents an immediate threat to personnel safety, structural integrity, or regulatory compliance. The 60-Minute Rule is non-negotiable: it is a response-time standard with documented accountability. If a Named Owner cannot be identified within 60 minutes, the escalation moves to the next authority level automatically. No exceptions for shift change, holidays, or competing priorities. Type A risk examples: active LOTO failure, pressure boundary breach, confined space atmospheric hazard, structural integrity compromise in a critical asset.
Type B: Named Owner assigned, resolution plan documented within 24 hours. Type B covers conditions that degrade operational reliability or present an escalating risk if not addressed, but do not require immediate work stoppage. Type B risk examples: vibration signature trending toward the ISO 10816 Alert threshold, PM task overdue beyond 10% of its interval, permit-to-work documentation gap on a non-critical job. Type B risks that are not resolved within their window escalate automatically to Type A status; the clock does not pause because the shift changed.
Type C: Named Owner, documented in CMMS, monitored on a defined frequency. Type C covers conditions identified through predictive analysis (oil analysis trending, vibration baseline shift, thermal anomaly) that indicate a future failure if left unaddressed. Type C is the category where proactive maintenance captures its full value. A Type C finding that is documented, monitored, and addressed before it becomes a failure event is the PM system functioning as designed. A Type C finding that is documented and ignored is a Davis-Besse in progress.
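The response windows in the taxonomy lend themselves to a simple deadline check. The sketch below assumes a hypothetical risk record with a raised-at timestamp, an optional Named Owner, and an optional committed resolution date; it encodes only the rules stated above (Type A owner within 60 minutes, Type B owner and plan within 24 hours, Type B escalating to Type A if unresolved). Type C's "defined frequency" is site-specific, so it is not checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

OWNER_WINDOW_A = timedelta(minutes=60)
PLAN_WINDOW_B = timedelta(hours=24)

@dataclass
class Risk:
    risk_type: str                          # "A", "B" or "C"
    raised_at: datetime
    named_owner: Optional[str] = None
    plan_documented: bool = False
    resolve_by: Optional[datetime] = None   # date committed in the Type B resolution plan
    resolved: bool = False

def review(risk: Risk, now: datetime) -> str:
    """Return the action the stated Type A / B rules require at time `now`."""
    age = now - risk.raised_at
    if risk.risk_type == "A":
        if risk.named_owner is None and age > OWNER_WINDOW_A:
            return "escalate to next authority level"
    elif risk.risk_type == "B" and not risk.resolved:
        missed_plan = age > PLAN_WINDOW_B and not (risk.named_owner and risk.plan_documented)
        missed_fix = risk.resolve_by is not None and now > risk.resolve_by
        if missed_plan or missed_fix:
            risk.risk_type = "A"  # automatic escalation; a shift change does not pause the clock
            return "reclassified as Type A"
    return "no action"

t0 = datetime(2026, 1, 5, 6, 0)
print(review(Risk("A", t0), t0 + timedelta(minutes=61)))   # escalate to next authority level
b = Risk("B", t0, named_owner="J. Singh")                   # hypothetical owner, no plan yet
print(review(b, t0 + timedelta(hours=25)), b.risk_type)     # reclassified as Type A A
```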
PM Deferral Engine — No Anonymous Deferrals
Every PM deferral requires a Named Approver — a specific human being, identified by name and role, who has reviewed the risk of deferral and accepted accountability for the consequence if it produces a failure. Anonymous deferrals — approved by "maintenance planning" or "the system" — do not exist in the RISL framework. The PM Deferral Engine records: the task deferred, the reason, the Named Approver, the risk classification of the deferral, and the new scheduled date. This record cannot be altered without a second Named Approver. The Davis-Besse case (NRC, 2002) is the definitive lesson in what happens when PM deferrals are processed without named accountability.
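A minimal sketch of a deferral record enforcing the two stated rules: no deferral without a Named Approver, and no alteration without a second, different Named Approver, with the prior version preserved. All names are hypothetical, and the `ANONYMOUS` set is only a crude stand-in; a real CMMS implementation would validate approvers against a register of named people.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# Crude stand-ins for anonymous approvers (assumption for illustration).
ANONYMOUS = {"", "maintenance planning", "planning", "the system", "maintenance"}

@dataclass(frozen=True)
class Deferral:
    task: str
    reason: str
    approver: str                 # named person and role
    risk_class: str               # "A", "B" or "C"
    new_date: date
    amended_by: Optional[str] = None
    history: tuple = ()           # prior versions, never overwritten

def _is_named(person: str) -> bool:
    return person.strip().lower() not in ANONYMOUS

def defer(task: str, reason: str, approver: str, risk_class: str, new_date: date) -> Deferral:
    if not _is_named(approver):
        raise ValueError("anonymous deferral rejected: a Named Approver is required")
    return Deferral(task, reason, approver, risk_class, new_date)

def amend(d: Deferral, second_approver: str, **changes) -> Deferral:
    """Alter a deferral only with a second, different Named Approver; the old version is kept."""
    if not _is_named(second_approver) or second_approver == d.approver:
        raise ValueError("amendment requires a second Named Approver")
    return replace(d, amended_by=second_approver, history=d.history + (d,), **changes)

d1 = defer("PM-4471 vibration route, P-201", "Route technician reassigned to outage",
           "A. Patel (Maintenance Superintendent)", "B", date(2026, 3, 1))
d2 = amend(d1, "R. Chen (Operations Manager)", new_date=date(2026, 3, 15))
print(d2.new_date, d2.amended_by, len(d2.history))
```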
Residual Risk Assessment — Post-Fix Verification
Closing a risk finding is not equivalent to eliminating the risk. RISL requires a Residual Risk Assessment after every Type A and Type B closure: has the corrective action actually addressed the root cause, or has it temporarily controlled the symptom? The Three-Way Match Audit (Section 10) is applied to risk closure in the same way it is applied to work order execution: what the risk register said was done, what was physically done, and what the CMMS records must align. A closure that does not pass this three-way check is not a closure; it is a deferral with a completed checkbox. ISO 31000:2018 Clause 6.6 (Risk Treatment) and Clause 6.7 (Monitoring and Review) frame this verification as a continuous part of the risk management process, not a one-time sign-off.
No-Fault Escalation — Silence in the Presence of Risk is a Finding
The No-Fault Escalation Model means that any worker who identifies a Type A or Type B risk condition and escalates it, regardless of seniority and regardless of the business impact, is protected from retaliation and is acting in full compliance with the RISL governance standard. The inverse is also enforced: any worker who identifies a risk condition and does not escalate it has committed a governance failure, not a cultural one. Silence in the presence of known risk is not deference. It is a breach. The Challenger Presidential Commission, the Piper Alpha Cullen Inquiry, and INPO SOER 10-2 all document the organisational conditions that turn a worker's silence into a fatality. RISL's Ghost Audit (Section 12) specifically measures whether escalated concerns are acted upon, because an organisation that does not act on raised concerns soon becomes one in which concerns are no longer raised.
- Type A — Named Owner within 60 minutes, work stops, no exceptions
- Type B — Named Owner, resolution plan within 24 hours, automatic escalation if unresolved
- Type C — Documented in CMMS, named owner, monitored on defined frequency
- PM Deferral requires Named Approver — no anonymous deferrals permitted
- Residual Risk Assessment mandatory after every Type A and Type B closure
- Risk documented in CMMS for cross-shift visibility — not stored in a supervisor's notebook
- No-Fault Escalation — silence in the presence of known risk is a documented breach
ISO 31000:2018 · Type A 60-min Rule · NRC Davis-Besse · INPO SOER 10-2 · Reinforces Sec 8 · Sec 10 · Sec 12 · Sec 20
Shutdowns are the highest-risk periods for both budget and safety. Scope creep is the #1 budget killer. The Scope Deep Freeze locks the worklist 4 weeks before Day 1. No new work is added after freeze without the Principal's signature and a verified Risk/Benefit analysis. Hourly visibility — not daily meetings — governs execution.
Scope & Planning Controls
Scope Deep Freeze — worklist locked 4 weeks prior. No additions without Principal signature and Risk/Benefit analysis.
3-Week Rolling Look-Ahead — updated daily. Barrier identification is the primary output. Every job visible 21 days in advance.
Float Job Register — non-critical path jobs pre-approved, kitted, and staged. When a Critical Path (CP) job slips, the Float Register is the first resource — a pre-kitted float job fills the crew gap without hunting for parts or permits.
Execution & Resource Control
Manpower Allocation — crew matched by skill and certification. Overmanning is a safety risk. Undermanning is a schedule risk. Both are documented before Day 1.
Critical Path Method (CPM) — every CP job flagged with hourly production value. 30-minute slippage triggers automatic reallocation. Delay cost accumulates in real time.
Kit Verification — all parts in Battle Box 48 hours before job start. Zero hunting at the workface.
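The CPM logic behind flagging CP jobs can be sketched with the textbook forward and backward pass: jobs with zero total float are on the critical path, and a slip delays completion only by the amount it exceeds the job's float. The job network, durations, and the $40,000/h production value are invented for illustration; the 30-minute trigger is the rule stated above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def cpm(jobs: dict[str, tuple[float, list[str]]]) -> tuple[float, dict[str, float]]:
    """jobs: name -> (duration_h, predecessors). Returns (project length in h, total float per job)."""
    order = list(TopologicalSorter({j: preds for j, (_, preds) in jobs.items()}).static_order())
    ef: dict[str, float] = {}
    for j in order:  # forward pass: earliest finish
        dur, preds = jobs[j]
        ef[j] = max((ef[p] for p in preds), default=0.0) + dur
    length = max(ef.values())
    succ: dict[str, list[str]] = {j: [] for j in jobs}
    for j, (_, preds) in jobs.items():
        for p in preds:
            succ[p].append(j)
    lf: dict[str, float] = {}
    for j in reversed(order):  # backward pass: latest finish
        lf[j] = min((lf[s] - jobs[s][0] for s in succ[j]), default=length)
    return length, {j: lf[j] - ef[j] for j in jobs}

def on_slip(job: str, slip_h: float, total_float: dict[str, float], value_per_h: float) -> tuple[str, float]:
    """Apply the 30-minute rule; returns (action, delay cost to date)."""
    completion_delay = max(0.0, slip_h - total_float[job])
    action = "reallocate" if completion_delay >= 0.5 else "monitor"
    return action, completion_delay * value_per_h

jobs = {
    "isolate": (4, []),
    "open_vessel": (6, ["isolate"]),
    "replace_valve": (3, ["isolate"]),
    "inspect": (8, ["open_vessel"]),
    "box_up": (5, ["inspect", "replace_valve"]),
}
length, total_float = cpm(jobs)
print(length, sorted(j for j, f in total_float.items() if f == 0))
print(on_slip("inspect", 0.75, total_float, 40_000))        # ('reallocate', 30000.0)
print(on_slip("replace_valve", 0.75, total_float, 40_000))  # ('monitor', 0.0)
```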
24/48-Hour Look-Ahead — Critical Path Execution Control
48-Hour Window — Readiness Lock
All jobs scheduled for execution within 48 hours are confirmed ready — parts staged, permits identified, crew allocated, isolation requirements verified. Any job that cannot confirm readiness at the 48-hour mark is pulled from the schedule. A float job replaces it immediately. No crew arrives at the workface to discover missing materials.
24-Hour Window — Sequence Lock & Float Pull
The next day's execution sequence is locked and communicated 24 hours before start. If any critical path job is behind schedule, the Float Job Register is reviewed and resources are reallocated to support CP completion. Float jobs absorb displaced crew — no crew is ever idle, and the critical path stays protected. This decision is documented — not verbal.
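The 48-hour Readiness Lock can be sketched as a filter over the scheduled jobs: any job that cannot confirm all four readiness items is pulled and logged, and the next ready float job of the same craft takes its slot. The four check names mirror the text; the `Job` structure and the same-craft substitution rule are assumptions for illustration.

```python
from dataclasses import dataclass

CHECKS = ("parts_staged", "permits_identified", "crew_allocated", "isolation_verified")

@dataclass
class Job:
    name: str
    craft: str
    ready: dict  # readiness check -> confirmed?

def is_ready(job: Job) -> bool:
    return all(job.ready.get(c, False) for c in CHECKS)

def readiness_lock(scheduled: list[Job], float_register: list[Job]) -> tuple[list[Job], list[str]]:
    """Pull any job that cannot confirm readiness; replace it with a ready float job of the same craft."""
    locked, log = [], []
    pool = [f for f in float_register if is_ready(f)]
    for job in scheduled:
        if is_ready(job):
            locked.append(job)
            continue
        missing = [c for c in CHECKS if not job.ready.get(c, False)]
        log.append(f"PULLED {job.name}: missing {', '.join(missing)}")
        sub = next((f for f in pool if f.craft == job.craft), None)
        if sub is None:
            log.append(f"NO FLOAT JOB for {job.craft} crew")
        else:
            pool.remove(sub)
            locked.append(sub)
            log.append(f"FLOAT {sub.name} replaces {job.name}")
    return locked, log

ALL_OK = dict.fromkeys(CHECKS, True)
scheduled = [Job("P-101 reseal", "mech", ALL_OK),
             Job("HX-3 tube clean", "mech", {**ALL_OK, "parts_staged": False})]
floats = [Job("V-12 repack", "mech", ALL_OK)]
locked, log = readiness_lock(scheduled, floats)
print([j.name for j in locked])  # ['P-101 reseal', 'V-12 repack']
print(log[0])                    # PULLED HX-3 tube clean: missing parts_staged
```

Returning the log alongside the schedule keeps the pull-and-replace decision documented rather than verbal, as the 24-hour rule requires.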
- Scope Deep Freeze — 4 weeks prior, Principal signature required
- 3-Week Rolling Look-Ahead — daily update, barrier identification
- 48-Hour Readiness Lock — missing readiness = float job replacement
- 24-Hour Sequence Lock — CP slippage triggers float resource pull
- Float Job Register — pre-kitted, ready to deploy on CP slip
- Manpower Allocation — skill and certification matched before Day 1
- CPM — 30-min slippage triggers reallocation and delay logging
- Pre-Startup Safety Review (PSSR) — Go/No-Go before restart: LOTO, Tool Count, Bump Test
Scope Freeze · 48-Hr Readiness · 24-Hr Sequence Lock · Float Jobs · CPM · Look-Ahead · Manpower · PSSR
The DACI model governs all high-impact decisions: Driver (RISL Principal/Consultant), Approver (single client-side authority), Contributors (forensic subject-matter experts), and Informed (stakeholders). Segregation of Duties ensures no single role controls the entire asset lifecycle. Every high-consequence decision requires a Digital Evidence Pack.
- Field: decisions affecting < 4 hours production
- Management: > 4 hours or > $10,000
- Principal: legal, environmental, multi-site consequences
- Single 'A' (Approver) for every critical decision
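The three authority levels and the single-Approver rule can be encoded as a routing function, with each decision carrying its risk named as What / Consequence / Controls. The thresholds come from the list above; the boundary handling (a value exactly at 4 hours or $10,000 escalates to Management, because the list leaves the boundary open) and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    what: str                 # risk named explicitly: What / Consequence / Controls
    consequence: str
    controls: str
    production_hours: float   # production affected by the decision
    cost: float               # dollars, as in the thresholds above
    legal_env_or_multisite: bool = False

def decision_level(d: Decision) -> str:
    """Route a decision to Field, Management or Principal authority."""
    if d.legal_env_or_multisite:
        return "Principal"
    # Boundary values escalate (assumption: the list leaves exactly 4 h / $10,000 open).
    if d.production_hours >= 4 or d.cost >= 10_000:
        return "Management"
    return "Field"

def single_approver(approvers: list[str]) -> str:
    """DACI allows exactly one 'A' per critical decision."""
    if len(approvers) != 1:
        raise ValueError(f"expected exactly one Approver, got {len(approvers)}")
    return approvers[0]

d = Decision("Defer seal replacement on P-301", "Possible leak within the week",
             "Daily leak check, spare staged", production_hours=6, cost=3_000)
print(decision_level(d), single_approver(["Plant Manager"]))  # Management Plant Manager
```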
DACI · RACI · SoD · Decision Logs
The Value Realization Delta measures the gap between the Forensic Baseline (Section 3) and the Current State. Reliability Value = (Avoided Downtime Hours) × (Hourly Production Revenue). The Sovereign Handover Protocol ensures the client can maintain the system without external support — transferring accountability, evidence, and RACI ownership formally.
- Value Delta calculated and agreed with Finance Lead
- Sustainability Scorecard identifies regression risks
- Named Successor for framework — internal to client
- ROI Briefing delivered to Owner/CEO
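The Reliability Value formula above can be stated directly as a function. Avoided downtime is taken here as Forensic Baseline downtime minus current-state downtime over the same period, which is one reasonable reading of the Value Realization Delta; the inputs are hypothetical placeholders, not engagement data.

```python
def reliability_value(baseline_downtime_h: float, current_downtime_h: float,
                      hourly_revenue: float) -> float:
    """Reliability Value = (Avoided Downtime Hours) x (Hourly Production Revenue).

    A negative result means downtime has regressed above the Forensic Baseline.
    """
    avoided = baseline_downtime_h - current_downtime_h
    return avoided * hourly_revenue

# Hypothetical 12-month figures, using the low end of the $25,000-$150,000/h range cited earlier.
value = reliability_value(baseline_downtime_h=120, current_downtime_h=60, hourly_revenue=25_000)
print(f"${value:,.0f}")  # $1,500,000
```

Leaving negative values unclamped is deliberate in this sketch: a regression shows up as a loss, which is the signal the Sustainability Scorecard is meant to catch.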
Value Delta · Sovereign Handover · Sec 22.1
A document older than 12 months is a liability. Annual Forensic Review against latest ISO 55001 and ISO 45001. Trigger-Based Updates within 30 days of any Black Swan event. Only the Principal holds Master Edit rights — all field copies are Read-Only. 5% of completed checklists pulled quarterly for deep-dive integrity audit. A "Say/Do" ratio below 90% triggers mandatory Section 19 intervention.
- Annual review against ISO 55001 + 45001
- 5% quarterly deep-dive audit of checklists
- Say/Do ratio < 90% → Sec 19 escalation
- Version control — Principal holds master edit rights
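The quarterly 5% pull and the Say/Do trigger can be sketched as below. The text does not define how the Say/Do ratio is computed; this sketch assumes it is the share of committed actions (the "say") that the audit verifies were done as stated (the "do"). Rounding the sample up to at least one checklist, and seeding the generator so the pull is reproducible for the audit record, are also assumptions.

```python
import random

def quarterly_sample(checklist_ids: list[str], percent: int = 5, seed: int = 0) -> list[str]:
    """Pull `percent`% of completed checklists (rounded up, at least one) for deep-dive audit."""
    if not checklist_ids:
        return []
    k = max(1, (len(checklist_ids) * percent + 99) // 100)  # integer ceiling division
    return random.Random(seed).sample(checklist_ids, k)

def say_do_ratio(committed: int, verified_done: int) -> float:
    """Assumed definition: committed actions verified as done, over committed actions."""
    return verified_done / committed if committed else 1.0

def section_19_triggered(ratio: float, threshold: float = 0.90) -> bool:
    return ratio < threshold

ids = [f"CHK-{n:04d}" for n in range(1, 201)]
pulled = quarterly_sample(ids, seed=2026)
print(len(pulled))                                 # 10
print(section_19_triggered(say_do_ratio(50, 44)))  # True (88%)
```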
ISO 19011:2018 · Version Control · NDA Protected
Outcomes defined across Safety, Reliability, Financial, and Workforce dimensions. Leading indicators prioritised over lagging post-mortems. Baseline performance established and validated before intervention begins. Client leadership trained to own outcome tracking — dependency on RISL deliberately reduced over time.
- Safety: reduced high-risk exposures
- Reliability: reduced unplanned downtime
- Financial: maintenance cost stabilisation
- Workforce: improved leadership effectiveness
RISL engages through four models: Diagnostic & Stabilisation (30–90 days), Embedded Execution Support (3–12 months), Turnaround / Critical Event Support (event-based), and Capability Transfer & Sustainment (phased exit). Scope, authority, and decision rights are defined upfront. No parallel governance structures.
- Diagnostic & Stabilisation
- Embedded Leadership Support
- Critical Event Execution
- Capability Transfer & Exit
These are not values on a poster. They are operating principles that govern every decision, every recommendation, and every finding that RISL produces. Each one has an enforcement implication — a specific behaviour that it demands and a specific failure mode it refuses to tolerate. The difference between an organisation with good values and an organisation with good outcomes is whether the values are enforced when they are inconvenient.
01
Safety Before Schedule — Always
Schedule pressure is the single most documented precursor to catastrophic industrial failure. Texas City, Challenger, Deepwater Horizon — all share a common finding: production targets overrode safety gates. RISL treats schedule compression of safety controls as a Type A risk event, not a management decision. The stop-work authority exists precisely for this condition. It is exercised without negotiation.
02
Execution Over Theory
A recommendation that cannot survive contact with the workface is not a recommendation — it is a report. RISL validates every standard and every finding at the point of work, not from the boardroom. The procedure that looks complete on paper but has never been tested at the workface is not a procedure. It is a risk waiting to be revealed by the next incident investigation.
03
Accountability Without Blame
Blame cultures hide failures. Accountability cultures surface them. The IAEA Safety Culture framework (INSAG-15) identifies the ability to question and report without fear of reprisal as a Tier 1 safety culture indicator. RISL's No-Fault Escalation Model enforces this: when a concern is raised, the system responds with investigation — not retaliation. Named ownership of risk is the mechanism. It ensures accountability lands on a human being with authority to act, not on a process that distributes responsibility until it vanishes.
04
Discipline Beats Heroics
An organisation that depends on individual heroes to prevent failures has already failed at system design. The hero who catches the problem before it becomes an incident is concealing a systemic deficiency that will eventually produce an incident on a shift when the hero is not there. RISL replaces hero-dependency with enforced procedure — step by step as written, place-keeping confirmed, Three-Way Match verified. The procedure is the hero. The discipline that enforces it is the culture.
05
Facts Over Opinions — Measurements Over Assumptions
ASME PCC-1 is not a loose torque suggestion. It is the recognised guideline for assembling pressure-boundary bolted flange joints, including how target bolt load is set and applied. ISO 10816 does not offer a rough vibration threshold either. It defines the vibration severity evaluation zones against which machine condition is judged. Every physical constant in the RISL framework is sourced from a published standard. When a measurement is taken, it is compared to that standard, and the result is a finding, not a discussion.
06
Capability Must Be Built — Not Borrowed
A client who requires RISL to return every quarter has not been served — they have been made dependent. The Sovereign Handover is built into every engagement: the point at which the client's own people own the governance system, can audit it without assistance, and can sustain it under pressure. Building that capability — through structured workforce enablement, documented knowledge transfer, and leadership coaching — is the primary deliverable. The intervention that does not produce an organisation that no longer needs the intervention is not a success.
IAEA INSAG-15 · ASME PCC-1 · ISO 10816 · Reinforces Sec 8 · Sec 9 · Sec 14 · Sec 22
Leadership presence at the workface is not a cultural nicety. It is the primary mechanism by which Normalised Deviance — the gradual drift of acceptable practice away from the written standard — is detected and stopped before it reaches incident level. The Piper Alpha inquiry (Cullen, 1990), the Davis-Besse NRC inspection (2002), and the Texas City CSB investigation (2005) each identified absence of effective supervisory presence at the point of work as a contributing factor. Leaders in offices do not see what is happening to equipment, to procedures, and to safety controls in real time. RISL measures whether your leaders are physically where the risk is.
The 60/40 Rule — Enforced Standard
Frontline leaders spend a minimum of 60% of every shift at the workface, not in pre-job meetings, not completing paperwork, and not in the supervisor's office. The remaining 40% covers administrative functions, safety review, and shift handover. This is not a target. It is a minimum compliance threshold. RISL verifies this through direct observation and time-on-site mapping. A leader who cannot account for 60% workface time has a finding, not a discussion point.
The INPO leadership standard for nuclear operations, which sets the benchmark for all RISL field leadership requirements, specifies that supervisory observation frequency is a direct predictor of both safety event rate and maintenance quality outcomes.
Reference: INPO 12-012 · IAEA GS-G-3.5 · REGDOC-2.1.2 Sec 5.3
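Time-on-site mapping reduces to a ratio check per leader per shift. A minimal sketch, assuming a hypothetical observation log of (location, minutes) entries: only minutes logged at the workface count toward the 60% minimum, so pre-job meetings, office time, and handover all fall in the 40%.

```python
from typing import Optional

def workface_share(log: list[tuple[str, float]]) -> float:
    """Share of logged shift minutes spent at the workface."""
    total = sum(minutes for _, minutes in log)
    at_face = sum(minutes for place, minutes in log if place == "workface")
    return at_face / total if total else 0.0

def sixty_forty_finding(leader: str, log: list[tuple[str, float]],
                        minimum: float = 0.60) -> Optional[str]:
    """Return a Section 9 finding if workface time falls below the minimum, else None."""
    share = workface_share(log)
    if share < minimum:
        return f"{leader}: {share:.0%} workface time, below the {minimum:.0%} minimum"
    return None

shift = [("workface", 400), ("office", 200), ("pre-job meeting", 60), ("handover", 60)]
print(sixty_forty_finding("Shift Supervisor B", shift))
# Shift Supervisor B: 56% workface time, below the 60% minimum
```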
Three-Tier Site Walk — Structure
The RISL Three-Tier Site Walk is not a tour. It is a structured observation with defined outputs at each level:
Tier 1 — Safety & Hygiene: Physical conditions. LOTO status. PPE compliance. Housekeeping as a predictor of procedural discipline.
Tier 2 — Technical Integrity: Procedure use at the workface. Place-keeping marks. Torque verification. Equipment condition vs. work order description.
Tier 3 — Engagement & Barrier Removal: Crew understanding of the task, the hazards, and the stop-work trigger. Active coaching. Barrier identification and escalation.
Visible Felt Leadership (VFL) — Measured as a KPI
Visible Felt Leadership is not about being seen. It is about workers observing that their leaders understand the technical work, engage with safety findings without defensiveness, and respond to concerns with action rather than acknowledgement. VFL frequency — number of structured site walks per supervisor per week — is tracked as a leading KPI in every RISL engagement. A declining VFL rate is an early warning signal for safety culture degradation, captured weeks before a lagging indicator (an incident) would reveal the same information. RISL uses the VFL metric as the primary leading indicator for Section 12 (Sustaining Discipline) assessments.
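A declining VFL rate needs an explicit test to act as an early warning. One simple option, sketched here under assumed parameters (a 4-week window and a 20% drop, neither of which is specified by RISL), compares the mean weekly site walks in the latest window against the window before it.

```python
from statistics import mean

def vfl_declining(weekly_walks: list[float], window: int = 4, drop: float = 0.20) -> bool:
    """Flag when the latest `window` weeks average at least `drop` below the previous `window`."""
    if len(weekly_walks) < 2 * window:
        return False  # not enough history to call a trend
    recent = mean(weekly_walks[-window:])
    prior = mean(weekly_walks[-2 * window:-window])
    return prior > 0 and recent <= prior * (1 - drop)

print(vfl_declining([5, 5, 4, 5, 4, 3, 3, 3]))  # True: 4.75 walks/week falls to 3.25
print(vfl_declining([5] * 8))                   # False: steady rate
```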
Normalisation of Deviance — The Invisible Drift
Sociologist Diane Vaughan, in her forensic analysis of the Challenger disaster, named the process by which unacceptable risk becomes accepted through repeated exposure without consequence: the normalisation of deviance. Every bypass tolerated, every shortcut observed and not corrected, every near-miss that does not produce a finding contributes to a cultural baseline that drifts further from the written standard. Leadership presence is the only real-time mechanism to detect and interrupt this drift. A leader at the workface who observes a bypass and says nothing has just moved the acceptable line. A leader who corrects it visibly, immediately, and without blame has reinforced it. This is why 60% is the minimum, not the goal.
- 60% workface time — minimum compliance threshold, not cultural aspiration
- Three-Tier Site Walk — structured observation with documented outputs
- VFL frequency tracked as a leading KPI — declining rate is an early warning
- Coaching is immediate and at the workface — not deferred to the end-of-shift review
- Leadership absence from the workface is documented as a Section 9 finding
- Normalisation of Deviance risk escalated under Section 12 (Habitual Regression)
INPO 12-012 · IAEA GS-G-3.5 · REGDOC-2.1.2 · Reinforces Sec 8 · Sec 12 · Sec 21 FM2
Most industrial organisations measure their maintenance and safety performance using lagging indicators — injury rates, unplanned downtime events, equipment failures, cost overruns. These are accurate records of what has already happened. They cannot prevent the next event. World-class operations, as defined by the SMRP Best Practices framework and the INPO AP-913 Equipment Reliability process, use a balanced scorecard of leading and lagging indicators — where the leading indicators are monitored daily and trigger interventions before the failure they predict becomes a documented incident.
Leading Indicators — Predict & Prevent
PM Compliance Rate — World-class target: ≥95%. Below 90% signals systematic deferral.
Emergency Work Ratio — World-class: ≤2% of total labour hours (SMRP); ≤10% is the acceptable upper limit. Above 25% is systemic breakdown.
Wrench Time — Direct productive time at workface. World-class: 55–65%. Industry average: 25–35%.
VFL Frequency — Site walks per supervisor per week. Declining rate = early warning.
Three-Way Match Rate — % of work orders where Plan, Execution, and CMMS record align.
Stop-Work Activation Rate — An organisation where stop-work is never used has not eliminated risk. It has suppressed the signal.
Reference: SMRP Best Practices · INPO AP-913 · ISO 55001 Cl.9.1
Lagging Indicators — Confirm & Quantify
MTBF (Mean Time Between Failures) — Reliability trajectory. Declining MTBF is a maintenance failure signal.
MTTR (Mean Time to Repair) — Execution effectiveness and parts/permit readiness proxy.
OEE (Overall Equipment Effectiveness) — Availability × Performance × Quality. World-class: ≥85%.
Repeat Failure Rate — A repair that fails within 30 days of completion is a Root Cause Analysis finding — not a maintenance request.
Total Maintenance Cost as % of RAV — World-class: 2–3% of Replacement Asset Value per year. Typical industry range: 4–8%.
Reference: SMRP RAV Metric · ISO 55001 Cl.6.2 · INPO AP-913
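The lagging indicators above are standard ratios. This sketch uses the conventional definitions (OEE = Availability × Performance × Quality; MTBF = operating hours / failures; MTTR = mean repair duration) plus the 30-day repeat-failure rule stated above; the sample figures are invented for illustration.

```python
from datetime import date

def oee(availability: float, performance: float, quality: float) -> float:
    """OEE = Availability x Performance x Quality, each as a fraction."""
    return availability * performance * quality

def mtbf(operating_hours: float, failures: int) -> float:
    """Mean Time Between Failures over an observation period."""
    return operating_hours / failures if failures else float("inf")

def mttr(repair_hours: list[float]) -> float:
    """Mean Time to Repair across completed repairs."""
    return sum(repair_hours) / len(repair_hours) if repair_hours else 0.0

def is_repeat_failure(repair_completed: date, next_failure: date, window_days: int = 30) -> bool:
    """A failure within 30 days of repair completion is a mandatory RCA finding."""
    return 0 <= (next_failure - repair_completed).days <= window_days

print(f"OEE {oee(0.90, 0.95, 0.98):.1%}")   # OEE 83.8%, short of the 85% world-class mark
print(mtbf(8_000, 4), mttr([6, 10, 8]))      # 2000.0 8.0
print(is_repeat_failure(date(2026, 1, 1), date(2026, 1, 25)))  # True
```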
Root Cause — Systemic, Not Individual
RISL applies a three-layer Root Cause Analysis to every Type A finding: Direct Cause (what physically failed), Contributing Cause (why the failure was not prevented), and Root Cause (what in the management system allowed the contributing cause to persist). Blaming a technician for a failure produced by a broken procedure, inadequate training, or absent supervision is not accountability; it is the suppression of a systemic finding. The CCPS Process Safety Management framework and the IAEA's post-incident investigation methodology both require this three-layer analysis as a minimum standard. Individual accountability for deliberate violations is separate from and does not replace systemic root cause investigation.
Lessons Learned — Embedded Into Standard Work, Not Filed in a Drawer
A lesson learned that is filed as a report and reviewed at the next annual safety day is not a learning system; it is an archive. World-class learning integration, as required by INPO AP-913 and the IAEA Safety Culture framework, demands that findings from post-task reviews, near-miss investigations, and audits are incorporated into standard operating procedures within a defined timeframe: typically 30 days for Type A findings and 90 days for Type B. RISL verifies this closure rate as a Section 15 (Continuous Improvement) metric and cross-references it with Section 12 (Ghost Audit) to confirm that lessons survive beyond the session that produced them.
- Leading indicators monitored daily — not monthly in a management review
- Emergency work ratio tracked against SMRP world-class threshold (≤2% of total labour hours — SMRP Best Practices; ≤10% is the maximum acceptable)
- Repeat failure within 30 days = mandatory Root Cause Analysis, not re-work
- Root cause analysis — three layers minimum: Direct / Contributing / Systemic
- Lessons embedded into SOPs within 30 days of Tier A finding closure
- Ghost Audit (Section 12) verifies that improvements survive 60 days after RISL departs
SMRP Best Practices · INPO AP-913 · ISO 55001 · CCPS · Reinforces Sec 4 · Sec 12 · Sec 15
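Two rules in the list above are mechanical enough to check automatically against CMMS exports: the 30-day repeat-failure trigger and the emergency work ratio. The sketch assumes repair completion and failure dates are available per asset and that labour hours are already split by work type. The function names and band labels are hypothetical. The 30-day, 2% and 10% figures are the ones stated in the list.

```python
from datetime import date

# Thresholds quoted in the list above.
REPEAT_WINDOW_DAYS = 30
WORLD_CLASS_EMERGENCY_PCT = 2.0
MAX_ACCEPTABLE_EMERGENCY_PCT = 10.0


def requires_rca(repair_completed: date, next_failure: date) -> bool:
    """A failure within 30 days of repair completion triggers RCA, not re-work."""
    days = (next_failure - repair_completed).days
    return 0 <= days <= REPEAT_WINDOW_DAYS


def emergency_work_band(emergency_hours: float, total_hours: float) -> str:
    """Classify emergency labour hours as a share of total labour hours."""
    pct = 100.0 * emergency_hours / total_hours
    if pct <= WORLD_CLASS_EMERGENCY_PCT:
        return "world-class"
    if pct <= MAX_ACCEPTABLE_EMERGENCY_PCT:
        return "acceptable"
    return "above acceptable limit"


if __name__ == "__main__":
    print(requires_rca(date(2026, 4, 1), date(2026, 4, 20)))  # failure 19 days after repair
    print(emergency_work_band(150, 10_000))                   # 1.5% of labour hours
    print(emergency_work_band(800, 10_000))                   # 8% of labour hours
```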
New processes and safety controls erode over time without deliberate reinforcement. The "Ghost Audit" (Battle Pack 05) is the primary test: does the improvement survive 60 days after RISL leaves? Habitual Regression is the primary threat to any intervention's longevity.
- Ghost Audit — 60-day regression check
- Stability metrics embedded in reviews
- Leadership coaching cadence maintained
- PDCA cycle governs all improvement work
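The Ghost Audit reduces to two questions: when is the check due, and has the measured practice held? This sketch assumes compliance with the new standard is measured as a fraction at RISL's departure and again at the audit. The 60-day interval comes from the text. The five-point regression tolerance and all names are hypothetical placeholders that would be set per engagement.

```python
from datetime import date, timedelta

GHOST_AUDIT_DELAY = timedelta(days=60)  # from the text
REGRESSION_TOLERANCE = 0.05             # hypothetical: allowed drop in compliance fraction


def ghost_audit_due(departure: date) -> date:
    """Date on which the 60-day regression check falls due."""
    return departure + GHOST_AUDIT_DELAY


def has_regressed(compliance_at_departure: float, compliance_at_audit: float) -> bool:
    """True if compliance fell by more than the tolerance after RISL departed."""
    return (compliance_at_departure - compliance_at_audit) > REGRESSION_TOLERANCE


if __name__ == "__main__":
    print(ghost_audit_due(date(2026, 3, 1)))
    print(has_regressed(0.95, 0.85))
```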
Is your workforce competency validated at the workface — or assumed from a training record?
"I have been doing this for 20 years" is often a mask for
"I have been doing it wrong for 20 years."
A training record proves attendance. It does not prove competency. The gap between
those two statements is where every repeat failure, every procedure violation, and
every normalised deviance event begins.
The Cost of the Competency Gap
Research published by the SMRP (Society for Maintenance and Reliability Professionals)
consistently shows that organisations operating below world-class training standards
carry a skills-gap premium of 15–25% of their total maintenance labour spend
in rework, repeat failures, and extended job durations. On a site spending $5M annually
on maintenance labour, that is $750,000 to $1.25M per year in preventable waste —
invisible in the payroll system, visible only in the failure data.
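The dollar range follows directly from applying the quoted premium to the labour budget. This minimal check uses only the 15–25% band and the $5M example from the paragraph above. The function name is illustrative.

```python
def skills_gap_premium(annual_labour_spend: float,
                       low_pct: float = 0.15,
                       high_pct: float = 0.25) -> tuple[float, float]:
    """Return the (low, high) annual cost of the skills-gap premium."""
    return annual_labour_spend * low_pct, annual_labour_spend * high_pct


low, high = skills_gap_premium(5_000_000)
print(f"${low:,.0f} to ${high:,.0f} per year")
```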
The INPO (Institute of Nuclear Power Operations) AP-913 Equipment
Reliability process and the IAEA Nuclear Safety Culture framework
both identify inadequate training and competency verification as primary contributing
factors in 60–70% of significant industrial events. The training record said the
person was qualified. The workface showed otherwise.
The RISL Proficiency Scale
Competency in a RISL engagement is verified at the workface against three defined levels —
not assumed from a certificate, a training record, or years of service.
LEVEL 1
Aware
Understands the standard. Can identify compliance or deviation when observed.
Cannot execute independently. Requires direct supervision on safety-critical tasks.
LEVEL 2
Enabled
Can execute the standard independently under normal conditions.
Requires support under adverse or novel conditions. Verified by direct observation.
LEVEL 3
Sovereign
Executes under adverse conditions. Coaches others to the standard.
Identifies systemic improvements. The internal guardian of the standard.
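The three levels map naturally onto an ordered enumeration, which makes assignment rules explicit and comparable. This sketch encodes the supervision and independence conditions described above. The function names are hypothetical. The level is assumed to come from direct workface observation, not from a training record.

```python
from enum import IntEnum


class Proficiency(IntEnum):
    """RISL proficiency levels, ordered so comparisons read naturally."""
    AWARE = 1      # identifies compliance or deviation; cannot execute independently
    ENABLED = 2    # executes independently under normal conditions
    SOVEREIGN = 3  # executes under adverse conditions; coaches others


def can_execute_independently(level: Proficiency, adverse_or_novel: bool) -> bool:
    """Adverse or novel conditions need Level 3; normal conditions need Level 2."""
    required = Proficiency.SOVEREIGN if adverse_or_novel else Proficiency.ENABLED
    return level >= required


def requires_direct_supervision(level: Proficiency, safety_critical: bool) -> bool:
    """Level 1 personnel need direct supervision on safety-critical tasks."""
    return level == Proficiency.AWARE and safety_critical
```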
World Class Training Standards — What They Require
ANSI/ASSE Z490.1 — Criteria for Accepted Practices in Safety, Health and Environmental Training
The American National Standard for safety training quality. Requires that training programmes
include a documented needs analysis, measurable learning objectives, verified competency
assessment (not just attendance), and a periodic evaluation of training effectiveness.
RISL aligns every training programme to Z490.1 — which means training is designed
to produce verified behaviour change, not compliance checkboxes.
Reference: ANSI/ASSE Z490.1-2016 · American Society of Safety Professionals · assp.org
INPO AP-913 — Equipment Reliability Process
The nuclear industry's gold standard for equipment reliability — used by every nuclear
operating company in North America. AP-913 requires that maintenance personnel are
trained to task-specific competency levels verified by direct observation, not just
classroom attendance. It establishes the linkage between maintenance training quality
and equipment reliability performance — a linkage that most non-nuclear industries
have never formalised. RISL applies the AP-913 competency verification standard
across all sectors.
Reference: INPO AP-913 REV 3 · Institute of Nuclear Power Operations · inpo.info
API RP 755 — Fatigue Risk Management Systems for Personnel in the Refining and Petrochemical Industries
API RP 755 establishes that fatigue is a competency impairment — and that organisations
are required to manage it as a systematic risk, not an individual responsibility.
A technician who is fatigued is not operating at the competency level their training
record implies. RISL incorporates fatigue risk assessment into the workforce readiness
check for all high-consequence tasks, consistent with RP 755 requirements.
Reference: API RP 755 · American Petroleum Institute · api.org
IAEA Safety Culture Framework — INSAG-15 & GS-G-3.5
The International Atomic Energy Agency's safety culture framework identifies
continuous learning and competency development as non-negotiable organisational
characteristics. GS-G-3.5 (The Management System for Nuclear Installations) requires
that competency is systematically managed — planned, developed, assessed, and maintained
throughout the lifecycle of every role. This is the standard RISL applies when assessing
whether a client organisation's training system is genuinely world class or merely
documented compliance.
Reference: IAEA GS-G-3.5 · INSAG-15 · International Atomic Energy Agency · iaea.org
19 Mandatory Safety Training Domains — Verified at the Workface
Every technician must be verified competent across all 19 domains before assignment to any high-consequence task.
Training record alone is not sufficient — competency is confirmed by direct observation at the workface.
- ▸ Blood Borne Pathogens
- ▸ Confined Space Entry
- ▸ Electrical Safety
- ▸ Emergency Response & Evacuation
- ▸ Environmental Compliance
- ▸ Ergonomics
- ▸ Eye Protection
- ▸ Fall Protection
- ▸ Fire Safety
- ▸ Hazard Communication (HAZCOM / SDS)
- ▸ Hearing Conservation
- ▸ Ladder Safety
- ▸ Lockout / Tagout (LOTO)
- ▸ Personal Protective Equipment (PPE)
- ▸ Process Safety Management (PSM)
- ▸ Respiratory Protection
- ▸ Rigging
- ▸ Safety Systems & Devices
- ▸ Scaffolding
- ■ ISO 45001 training matrix alignment — all 19 domains mapped to role and frequency
- ■ 70% of coaching at the workface — not the classroom
- ■ CMMS digital literacy verified — data entry accuracy is a competency requirement
- ■ Knowledge transfer plan mandatory for retiring staff — tribal knowledge is a single point of failure
- ■ Fatigue risk managed per API RP 755 on all high-consequence tasks
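The pre-assignment gate in this section can be sketched as a set comparison. A technician is cleared for a high-consequence task only if all 19 domains have been verified by observation and the fatigue check has passed. The domain strings are copied from the list above. The observed-domain set and the boolean fatigue flag are hypothetical inputs; API RP 755 does not prescribe this data model.

```python
MANDATORY_DOMAINS = frozenset({
    "Blood Borne Pathogens", "Confined Space Entry", "Electrical Safety",
    "Emergency Response & Evacuation", "Environmental Compliance", "Ergonomics",
    "Eye Protection", "Fall Protection", "Fire Safety",
    "Hazard Communication (HAZCOM / SDS)", "Hearing Conservation", "Ladder Safety",
    "Lockout / Tagout (LOTO)", "Personal Protective Equipment (PPE)",
    "Process Safety Management (PSM)", "Respiratory Protection", "Rigging",
    "Safety Systems & Devices", "Scaffolding",
})


def readiness_gaps(observed_domains: set[str], fatigue_check_passed: bool) -> list[str]:
    """Every reason a technician is not yet cleared for a high-consequence task."""
    gaps = sorted(MANDATORY_DOMAINS - observed_domains)  # domains not verified by observation
    if not fatigue_check_passed:
        gaps.append("Fatigue risk assessment (API RP 755)")
    return gaps


def cleared_for_high_consequence_task(observed_domains: set[str], fatigue_check_passed: bool) -> bool:
    return not readiness_gaps(observed_domains, fatigue_check_passed)
```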
Continuous Improvement (CI) at RISL targets the 3Ms: Muda (waste — excessive inventory, over-maintained assets), Mura (unevenness — inconsistent technician performance), and Muri (overburden — pushing assets beyond OEM specifications, the leading cause of Infant Mortality on restart). Every Type A failure must produce a Lessons Learned session resulting in a physical change to the Master Schedule.
- Lessons Learned → Standard Operating Procedure (SOP) update within 72h
- Predictive Maintenance (PdM) data used to extend PM intervals
- Technicians rewarded for systemic gap identification
- Horizontal proliferation to identical assets
RISL bridges the gap between the maintenance shop and the CFO's office. The Total Cost of Ownership model analyses Opex vs Capex optimisation, quantifies the hidden cost of deferred maintenance (energy waste, emergency repair premiums, reduced asset life), and links every dollar spent to its impact on Asset Operational Availability and OEE.
- Maintenance spend linked to measurable ROI target
- Under-utilised assets reviewed for decommissioning
- Deferred Maintenance Backlog quantified as board-level risk
- Telemetry-driven budgeting — not flat annual allocation
Leadership in the RISL model is a verb, not a noun. The 2026 metrics move beyond Safety Hours to measure the Health of the System: Visible Felt Leadership (VFL) Frequency (documented coaching interactions), Barriers Closed (speed of leadership resolution), and Psychological Safety Score (team member willingness to surface Red Flags without fear of blame).
- VFL Frequency tracked vs. target
- Barriers Closed — speed metric
- Psychological Safety Score via DACI model
- "Honey" inquiry: "What system failure caused this?"
We audit the process, not just the result. If a job was finished on time but the Work Order History is blank — that is a Section 20 Failure. Ontario Reg. 851 (Industrial Establishments) and 2026 OSHA updates require Pre-Start Health & Safety Reviews (PHSR) for any new or modified apparatus before execution.
- Three-Way Match Audit — 5 WOs sampled randomly
- LOTO verification — physically tagged per DACI
- Backlog prioritised by risk (Sec 13) — not by noise
- PPE traceability — fit and quality verified, not just present
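The random five-work-order sample and the blank-history test from the paragraph above can be sketched as follows. The `WorkOrder` fields are hypothetical. The check covers only the blank Work Order History condition named in the text; the other elements of the Three-Way Match are not defined in this section and are left out. The optional seed exists only to make an audit sample reproducible.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrder:
    wo_number: str
    history_text: str  # the recorded Work Order History


def sample_work_orders(work_orders: list[WorkOrder], k: int = 5,
                       seed: Optional[int] = None) -> list[WorkOrder]:
    """Draw k distinct work orders at random (fewer if the pool is smaller)."""
    rng = random.Random(seed)
    return rng.sample(work_orders, min(k, len(work_orders)))


def section_20_failures(sample: list[WorkOrder]) -> list[str]:
    """Work orders with a blank history; finishing on time does not excuse it."""
    return [wo.wo_number for wo in sample if not wo.history_text.strip()]
```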
The most expensive thing on any industrial site is not broken equipment.
It is the silence between people who already know what is wrong.
Communication failure is not a soft skill problem. It is a financial event —
one that accumulates daily in coordination losses, rework, missed handovers,
and decisions made without the information that someone in the room already had.
What Communication Failure Costs — The Evidence
The Joint Commission (USA) analysed over 4,000 sentinel events
and found that communication failure was the root cause in 70% of cases.
While this data originates in healthcare, the Joint Commission's findings have been
replicated across high-consequence industries including nuclear, oil and gas, and aviation —
industries where the consequence of communication failure is equipment damage, production loss,
or fatality.
The US Chemical Safety Board (CSB) found communication breakdowns
as a contributing factor in the majority of major process safety incidents investigated
since 2000 — including the Texas City refinery explosion (2005) and the Macondo/Deepwater
Horizon blowout (2010). In both cases, critical information existed within the organisation.
It was not communicated to the people who needed to act on it.
IAEA INSAG-15 (Key Practical Issues in Strengthening Safety Culture) identifies
organisational communication as one of five core safety culture indicators —
alongside leadership, accountability, learning, and employee involvement.
An organisation that scores poorly on communication scores poorly on safety culture
by definition, regardless of what its safety statistics show.
The Four Communication Failure Modes RISL Measures
1 — Downward Communication Failure
Leadership decisions and standards that are stated but not verified at the workface.
The plan says one thing. The crew heard something different. Nobody checked.
Cost: rework, scope deviation, repeat instruction, extended job duration.
RISL measure: Three-Way Communication compliance rate on safety-critical instructions.
2 — Upward Communication Failure
Workface reality that never reaches leadership. The technician knows the part is wrong.
The supervisor knows the scope is not ready. Nobody says anything because the culture
makes silence safer than speaking. Cost: decisions made on false information,
board reports that do not reflect site reality.
RISL measure: Near-miss to incident ratio and Governance Uplink submission frequency.
3 — Lateral Communication Failure
Breakdowns between work groups — maintenance and operations, planning and execution,
day shift and night shift. Work brought forward without consulting other groups.
Jobs started without confirming equipment access with operations.
Cost: re-isolation, permit cancellation, crew standby, schedule compression.
RISL measure: Coordination Gate sign-off compliance and cross-shift barrier events.
4 — Handover Communication Failure
The incoming shift begins work without complete information about the current state
of the equipment, the isolation, or the in-progress scope. The most common cause
of cross-shift incidents and the most consistently underestimated risk in
shift-based industrial operations.
RISL measure: Handover completion rate logged in Governance Uplink — incomplete
handover is a Type B risk event.
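Most of the four RISL measures above are simple rates. This minimal sketch assumes each observation (a safety-critical instruction, a Coordination Gate sign-off, a shift handover) is logged as a pass/fail boolean. The function names are hypothetical. Only the rule that an incomplete handover counts as a Type B risk event comes from the text.

```python
from typing import Optional


def compliance_rate(observations: list[bool]) -> Optional[float]:
    """Fraction of observed events that met the standard (None if nothing observed)."""
    if not observations:
        return None
    return sum(observations) / len(observations)


def near_miss_to_incident_ratio(near_misses: int, incidents: int) -> Optional[float]:
    """Upward-communication measure: near-misses reported per incident."""
    if incidents == 0:
        return float("inf") if near_misses else None  # undefined with no data at all
    return near_misses / incidents


def type_b_events_from_handovers(handovers_complete: list[bool]) -> int:
    """Each incomplete handover is logged as one Type B risk event."""
    return sum(1 for complete in handovers_complete if not complete)


if __name__ == "__main__":
    print("Three-Way compliance:", compliance_rate([True, True, False, True]))
    print("Coordination Gate compliance:", compliance_rate([True, True, True]))
    print("Near-miss to incident ratio:", near_miss_to_incident_ratio(40, 2))
    print("Type B events:", type_b_events_from_handovers([True, False, True, True, False]))
```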
World Class Communication Standards — What They Require
IAEA-TECDOC-1329 — Safety Culture in Nuclear Installations
Defines communication as a measurable organisational characteristic — not a personality trait.
Requires that organisations establish formal communication channels at every level,
verify that information travels accurately from source to receiver, and measure the
effectiveness of communication through observable outcomes rather than self-report.
The standard explicitly identifies "normalised silence" — the cultural condition where
people stop reporting because they believe nothing will change — as a leading indicator
of safety culture degradation.
Reference: IAEA-TECDOC-1329 · International Atomic Energy Agency · iaea.org/publications
CCPS — Guidelines for Risk Based Process Safety (Communication Chapter)
The Center for Chemical Process Safety (AIChE/CCPS) identifies communication across
work shifts and between departments as a critical process safety element. The guidelines
require formal shift handover protocols, documented communication of safety-critical
information, and verification that critical information has been received and understood —
not merely transmitted. A shift handover that is verbal only, undocumented, and unverified
does not meet the CCPS standard — regardless of how long the site has been operating that way.
Reference: CCPS Guidelines for Risk Based Process Safety · AIChE · aiche.org/ccps
INPO SOER 10-2 — Fatigue and Communication in Nuclear Operations
INPO Significant Operating Experience Report 10-2 specifically addresses the
compounding failure of fatigue and communication breakdown at shift boundaries —
the highest-risk period in any continuous industrial operation.
The SOER established that verbal-only handovers, even between experienced operators,
produce measurably higher error rates than structured written-plus-verbal handovers
with Three-Way Communication verification. This finding has been validated across
refining, petrochemical, and utilities operations outside nuclear.
Reference: INPO SOER 10-2 · Institute of Nuclear Power Operations · inpo.info
REGDOC-2.1.2 — Safety Culture (CNSC) — Communication as a Measured Characteristic
The Canadian Nuclear Safety Commission's REGDOC-2.1.2 explicitly identifies open
communication as one of five measurable safety culture characteristics. It requires that
organisations demonstrate — through observable evidence, not self-assessment — that
safety-significant information flows freely across all levels and that individuals feel
safe raising concerns without fear of retaliation. The CNSC assesses communication
effectiveness during safety culture assessments by measuring near-miss reporting rates,
the frequency of unsolicited safety concerns raised by workers, and the response time
from concern raised to concern resolved.
Reference: REGDOC-2.1.2 · Canadian Nuclear Safety Commission · nuclearsafety.gc.ca
Closed-Loop Learning Architecture — The Communication of Lessons
Learning at RISL is a mechanical process, not a philosophical one. A lesson that is not
formally communicated to everyone it applies to has not been learned — it has been noted.
The distinction costs lives and money. Three stages govern the communication of every lesson:
- Stage 1 — Forensic Capture. Systemic root cause identified — not human error. The question is always what process allowed this to happen, not who made the mistake.
- Stage 2 — Standard Work Update. Lessons Learned becomes a Standard Operating Procedure (SOP) revision within 72 hours — not a one-time fix. Every affected procedure is updated before the next shift touches the affected equipment.
- Stage 3 — Horizontal Proliferation. If Asset A fails, every asset with identical specifications receives the same update simultaneously. A lesson that travels to one asset and not its identical twins is a lesson waiting to repeat itself on a different shift.
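Horizontal Proliferation amounts to a lookup: given the failed asset, find every other asset with the same specification and queue the same update for all of them at once. This sketch assumes an asset register keyed by tag, with the specification held as a tuple (for example maker, model, revision). The register layout, tags, specification values and function name are all hypothetical.

```python
def proliferation_targets(register: dict[str, tuple[str, ...]], failed_tag: str) -> list[str]:
    """Every other asset whose specification matches the failed asset exactly."""
    spec = register[failed_tag]
    return sorted(tag for tag, s in register.items() if s == spec and tag != failed_tag)


if __name__ == "__main__":
    register = {
        "P-101A": ("PumpCo", "X200", "rev3"),
        "P-101B": ("PumpCo", "X200", "rev3"),
        "P-205": ("PumpCo", "X200", "rev2"),  # different revision: not an identical twin
        "P-310": ("PumpCo", "X200", "rev3"),
    }
    print(proliferation_targets(register, "P-101A"))
```

Matching on an exact specification tuple keeps the rule literal ("identical specifications"). A site that wants to include near-identical assets would need an explicit, documented matching rule instead.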
- ■ World-class communication standard: best-in-class practices updated within 30 days of field observation
- ■ Supervisor performance reviews tied to learning integration — SOPs updated as a KPI
- ■ Ghost Audit verifies new knowledge is actually practised — not just documented
- ■ 72-hour response window to frontline improvement suggestions — silence is a culture signal
- ■ Near-miss to incident ratio tracked — a high ratio means the workforce is seeing and reporting
The CMMS tells the technician what to do. The CMMS records what was reported. Everything in between is human performance. These seven tools govern individual and team behaviour at the workface — where every incident originates and where every incident can be prevented. They are not training topics. They are enforced standards.
STAR
Self-Check with Verbalization — Stop · Think · Act · Review
Stop — Pause before beginning any task. Eliminate distractions. Think — Verbalize the task aloud: what am I doing, what can go wrong, what are my controls? Act — Execute the task as planned. Review — Verify the result matches the expectation. The verbalization is not optional — spoken words engage a different cognitive pathway than silent thought. RISL enforces STAR before every high-consequence task. A technician who cannot verbalize the task is not ready to execute it.
3-WAY
Effective Communication — Three-Way Communication
No instruction is complete without three exchanges. Sender states the instruction clearly. Receiver repeats back in their own words. Sender confirms or corrects. A communication that stops after the first exchange is an assumption — not a verified instruction. This applies to radio, face-to-face, and written handovers. Three-way communication is mandatory for all safety-critical instructions and all work handovers.
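The three exchanges can be modelled as a check over a logged conversation. This is a sketch under stated assumptions. The `Step` roles and the logged `confirmed` flag are hypothetical. Judging whether a read-back in the receiver's own words matches the intent remains a human call, so the code checks only structure: an instruction, then read-back and sender-response pairs, with the last response confirming. A correction (an unconfirmed response) restarts the read-back loop.

```python
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    INSTRUCTION = "sender states instruction"
    READ_BACK = "receiver repeats back"
    RESPONSE = "sender confirms or corrects"


@dataclass
class Exchange:
    step: Step
    text: str
    confirmed: bool = False  # True only when the sender confirms rather than corrects


def is_verified(exchanges: list[Exchange]) -> bool:
    """True if the log is instruction, then read-back/response pairs, ending in a confirmation."""
    if len(exchanges) < 3 or exchanges[0].step is not Step.INSTRUCTION:
        return False
    rest = exchanges[1:]
    if len(rest) % 2:
        return False  # a read-back with no sender response: the loop is open
    for read_back, response in zip(rest[0::2], rest[1::2]):
        if read_back.step is not Step.READ_BACK or response.step is not Step.RESPONSE:
            return False
    return rest[-1].confirmed
```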
STOP
Stop When Unsure — Contact Your Supervisor
Uncertainty is a stop condition — not a proceed condition. When a technician is unsure about any aspect of a task — the procedure, the equipment state, the permit, the expected outcome — work stops and the supervisor is contacted before proceeding. The cost of stopping is always less than the cost of proceeding incorrectly on a high-consequence task. There is no production pressure that justifies proceeding under uncertainty. Supervisors are required to respond without blame — a technician who calls is performing correctly.
Questioning Attitude
Every worker has the authority and the obligation to question any condition, instruction, or assumption that does not look right — regardless of the source. Silence in the presence of doubt is a failure. A questioning attitude is not insubordination — it is the primary defence against normalised deviance, the mechanism by which most major industrial incidents develop. RISL measures questioning attitude by the ratio of near-miss reports to incidents — a high ratio indicates a workforce that sees and reports. A low ratio indicates suppression.
P/A
Procedure Use and Adherence — Step by Step as Written
Safety-critical procedures are not guidelines — they are the written record of every engineering decision, regulatory requirement, and lessons-learned event that preceded them. They are executed step by step as written. No step is skipped. No step is assumed complete. No improvisation regardless of experience level or time pressure. The statement "I know this job" is not an authorisation to deviate from the written procedure. Deviations are formal events, documented and approved before execution — not improvised at the workface.
Place Keeping
Each completed step in a procedure is physically marked at the workface — initialled, checked, or stamped. No step is assumed done. The physical mark is the evidence of completion. A procedure returned with no marks is a procedure that was not followed — regardless of what the technician reports to the CMMS. Place keeping is the physical enforcement of procedure adherence. It is auditable, visible, and tamper-evident. RISL verifies place-kept procedures during workface inspections.
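The place-keeping rule is easy to audit once marks are captured step by step. The sketch assumes a returned procedure is recorded as a mapping from step number to the mark found on paper (initials, check or stamp), or `None` where nothing was marked. The data layout and function names are hypothetical.

```python
from typing import Optional


def unmarked_steps(marks: dict[int, Optional[str]]) -> list[int]:
    """Step numbers with no physical mark recorded (blank or missing)."""
    return sorted(step for step, mark in marks.items() if not (mark and mark.strip()))


def procedure_followed(marks: dict[int, Optional[str]]) -> bool:
    """Any unmarked step, or no marks at all, means the procedure was not followed."""
    return bool(marks) and not unmarked_steps(marks)
```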
HO
Effective Shift Handover
The shift handover is a safety-critical event — not an administrative formality. Verbal plus written. The incoming supervisor receives and confirms understanding before the outgoing supervisor leaves the site. Incomplete handover is a Type B risk event logged in the Governance Uplink. The three-way communication standard applies to all handover exchanges. An incoming shift that begins work without a complete handover is operating in a known information gap — the primary cause of repeat incidents and cross-shift coordination failures.
STAR · 3-Way Comms · Stop When Unsure · Questioning Attitude · Procedure Adherence · Place Keeping · REGDOC-2.1.2 · HuP Framework
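The shift handover standard sets three conditions: verbal plus written, and confirmation by the incoming supervisor before the outgoing supervisor leaves. Any unmet condition becomes a logged event. This minimal sketch assumes the handover record captures those facts with timestamps. The field names are hypothetical, and the Governance Uplink itself is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Handover:
    verbal_given: bool
    written_record: bool
    confirmed_at: Optional[datetime]  # incoming supervisor confirms understanding
    outgoing_left_at: datetime


def handover_deficiencies(h: Handover) -> list[str]:
    """List each way the handover fell short of the standard."""
    issues = []
    if not h.verbal_given:
        issues.append("no verbal handover")
    if not h.written_record:
        issues.append("no written handover")
    if h.confirmed_at is None:
        issues.append("incoming supervisor did not confirm understanding")
    elif h.confirmed_at > h.outgoing_left_at:
        issues.append("outgoing supervisor left before confirmation")
    return issues


def is_type_b_event(h: Handover) -> bool:
    """An incomplete handover is logged as a Type B risk event."""
    return bool(handover_deficiencies(h))
```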