Full disclosure, I have been writing this since 2021.
I have had some time to think lately, trying to figure out what went wrong with what I was working on. I made the mistake of going down a rabbit hole on decisions and how they are made, and more importantly on how we think and reason. That really ended up messing me up. It made me think about how our brains work, and how our biases coupled with first impressions of a problem can color what we are working on and our perceptions. Man, did that really do a number on me.
So check this: the bias that we have in analyzing an incident is not contained to our investigation. Our biases as humans probably started the whole thing. They sway the investigation. They shape the responses we get. When we look at the system and analyze it to determine what is wrong, the more we handle the situation, the more likely we are to introduce a Heisenberg Uncertainty-like principle into what we are trying to understand. For those that forgot or never cared, the Heisenberg Uncertainty principle says that the act of studying a particle inadvertently influences it, reducing the certainty of what you are observing. It seems easy to apply that same principle to what we are trying to solve or understand when investigating a deviation or malfunction of a system. This is absolutely nothing new, but it is new to me in applying it to a systemic process.
What I am going to attempt to do here is break out each stage of what is happening. I will attempt to identify where bias comes in, and where and how I think our bias-based Heisenberg principle comes into play. It may not be complete, but it should hopefully give you a flavor of what to be aware of.
**Incident**
Like I said previously, the bias already happened before you were made aware of the incident. It will continue throughout the incident unless personnel are removed from the investigation process and only data is used. The fear of removing people from incident review (or the investigation process) is that it may not give you the complete picture. I think it starts with establishing a culture at the site where removing potential errors is more important than blame. This is not to say that accountability would not be an issue here. But based on the severity and the history of the person involved, I could only see a few extreme circumstances where HR would transition a person out to another position or out of the company, given due time to find other employment. Removing them from the work product as a liability, and resolving trust in the person in question, would have to happen, but that is not out of the realm of possibility. Especially if you put forth that the company's credibility with the public is far more expensive to lose.
_Common Perception_
Immediately when the incident is observed/discovered, we are dealing with competing perceptions. In my thinking I'm using the term _common perception_ for what the average/median observer would portray as fact. This does not necessarily mean that bias has been minimized or that the incident is completely characterized, but rather that the main facts describing the incident can be agreed upon. In capturing the incident you have to think about who has the potential (higher likelihood) to hold the common perception. The outside-world parallel would be a bystander witnessing a crime or historical event. Depending on the person's frame of mind and attention, as well as the duration of the event, what you will receive as a description are broad strokes and general outcomes. So how do we deal with this? What is the parallel in a regulated industry doing cGMP work? Everyone is supposed to be trained. Everybody has experience in the field. Is it the supervisor? Is it a person on the line doing other work? Is it the security cameras on the line?
There is a need to look at your data and grade it for its resolution and reliability. What is the foundation of the data? Is it memory, which is flawed? Is it an electronic system that cannot be tampered with? Humans and their integrity are fallible; the fact that someone has 20+ years of experience and service cannot keep you from evaluating their perceptions or their influence on the issue.
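To make that grading concrete, here is a minimal sketch in Python. The source categories, weights, and scale are assumptions I'm making purely for illustration, not any established standard; the point is simply that every piece of evidence should declare its foundation before it gets to shape the investigation.

```python
from dataclasses import dataclass

# Illustrative reliability weights by evidence foundation. The categories and
# numbers are assumptions for this sketch, not an industry standard.
RELIABILITY = {
    "audit_trailed_system": 0.9,  # e.g., a validated historian or MES record
    "camera_footage": 0.8,
    "contemporaneous_note": 0.6,  # written down at the time of the event
    "recollection": 0.3,          # unaided memory, interviewed after the fact
}

@dataclass
class Evidence:
    description: str
    foundation: str  # one of the RELIABILITY keys

    def grade(self) -> float:
        """Return a 0-1 reliability grade based on the evidence's foundation."""
        return RELIABILITY[self.foundation]

# Grade each input before letting it shape the narrative of the incident.
observations = [
    Evidence("Operator recalls the alarm sounding around 14:00", "recollection"),
    Evidence("Historian log shows the alarm at 14:07:31", "audit_trailed_system"),
]
for obs in sorted(observations, key=lambda e: e.grade(), reverse=True):
    print(f"{obs.grade():.1f}  {obs.description}")
```

However you score it, the output is the same discipline: the 20-year veteran's recollection still gets a grade, just like everything else.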
**Bias throughout RCA**
Each of the questions above regarding the framework of this concept shows that bias needs to be taken into account in every aspect of the root cause analysis, and also used to determine the substantive ways it can impact understanding of the incident. To take a step backwards, it may not be bias directly but a compendium of biases, stressors, and frames of mind that provides the general context for each perspective of everyone involved, including the investigators, approvers, management, and the department heads that are affected.
What I mean by that is there are those incidents that are true mistakes, waiting to be discovered. And there are those systemic issues that are due to the culture of operations. This does not mean that our issues as humans did not come into play. But imagine an employee that has made several of those mistakes in a short period of time. The incident itself will certainly be a stressor, both in the identification of the incident and in evaluating the information in its appropriate context. We are looking for trends too late in the investigation.
We need to be honest about the culture from a quality and operations perspective. Many firms will accept mistakes, and even systemic issues, as "known" for the sake of the timeline. This is antithetical to quality and cGMP, but it is of course present in the industry. Further, there are those "acceptable" issues that are unspoken.
But this is already known by anyone who has investigated a "sensitive" issue that is currently on management's radar or that can impact the timeline of a critical project. So how do we set that culture? It has to be from the top down. It has to start from the inception of the project. The inherent risk needs to be understood not just from the technical perspective but also from the perspective of execution by a trained operator during the establishment of the process. We should apply known stressors or common themes of the culture as failure modes.
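One way to picture that is a classic FMEA, where each failure mode gets a risk priority number (RPN = severity × occurrence × detection, each scored 1 to 10). The rows and scores below are invented for illustration; the technique is just standard FMEA, with culture-driven stressors given rows of their own next to the technical failures.

```python
# Minimal FMEA-style sketch: severity (S), occurrence (O), and detection (D)
# each scored 1-10, with risk priority number RPN = S * O * D.
# Every row and score below is invented for illustration.
failure_modes = [
    # (failure mode,                                                         S, O, D)
    ("Pump seal degrades and leaks product",                                 7, 3, 4),
    ("Operator skips a verification step under shift-change time pressure", 8, 5, 7),
    ("End-of-quarter push causes batch record reviews to be rushed",        6, 6, 8),
]

# Rank culture-driven stressors right alongside the technical failures.
for mode, s, o, d in sorted(failure_modes, key=lambda r: r[1] * r[2] * r[3], reverse=True):
    print(f"RPN {s * o * d:3d}  {mode}")
```

In this made-up example the two cultural rows outrank the mechanical one, which is exactly the kind of visibility you lose when only equipment gets a failure mode.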
The culture needs to state that the status quo is unacceptable and that our measure for ourselves is continuous improvement. But the improvement needs to be systemic and meaningful, to where simple mistakes are the norm and systemic issues are the exception. In the early stages that is an issue, but it does not need to be. We should set an example of the ascendancy of conformance as the process is being developed.
From an incident capture perspective, we need to have a plan as well as a technique to gather this information.
So regarding the incident: I think as investigators we need to establish a baseline for our investigative process, understanding that just because you are informed of a bias does not mean the bias is removed. So what can we do?
We need other eyes on critical or major investigations of repeat issues or where we failed to determine root cause.
Hold post-mortems/case studies of the investigations with a board or committee that understands the biases above and has received training in human error as well as investigation. These people have to have executed investigations themselves and have a high level of integrity, to drive continuous improvement and be fair to those involved. If not, maybe we need to anonymize items for review.
Establish procedural requirements for a hierarchy of controls, where approvals and budget escalate to ensure that critical items and their failure modes are removed or detection systems are increased. Do not allow, by procedure, doing nothing for a repeat deviation, or similarly, only retraining and updating a record. Consider Kaizen events, scrums, tiger teams, etc.
Have cost metrics reported to operations based on the total cost of quality to drive the business case for quality. Don't just look at downtime; weigh it against the cost avoidance of compliance remediations.
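As a rough illustration of what a total-cost-of-quality rollup might look like, here is a sketch using the classic prevention/appraisal/failure buckets. Every dollar figure is invented; the shape of the report is the point.

```python
# Classic prevention/appraisal/failure cost-of-quality buckets.
# All dollar figures are invented for illustration only.
cost_of_quality = {
    "prevention":       {"training": 40_000, "process design reviews": 25_000},
    "appraisal":        {"in-process testing": 60_000, "batch record review": 35_000},
    "internal_failure": {"deviations and rework": 180_000, "scrapped batches": 250_000},
    "external_failure": {"complaints and recalls": 90_000},
}

total = sum(sum(bucket.values()) for bucket in cost_of_quality.values())
failure = sum(sum(cost_of_quality[k].values())
              for k in ("internal_failure", "external_failure"))

print(f"Total cost of quality: ${total:,}")
print(f"Failure share: {failure / total:.0%}")  # the slice prevention spend attacks
```

When the failure buckets dwarf the prevention spend, as they do in this made-up example, that gap is the business case the metric is meant to surface.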
Have outward-facing metrics that lay bare the success, or lack thereof, of any corrections. Have a report card on your effectiveness checks. These should be managed objectively by a group of people that live with the system, and rated on a neutral scale.
Use those big bright words about vision, mission, and guiding principles on the wall against management. Be ruthless about it. It is not insubordination if it is based in data. Have them put their names on the line as to why we can't fix what is obvious.
Report repeat issues that require capital as compliance issues by establishing both the compliance risk AND the impact to operations. Would you rather spend $25K on a piece of equipment, or be unable to test and release a batch worth $1M to the bottom line?
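To spell out that arithmetic as an expected-value comparison (the fix cost and batch value come from the example above; the failure probability is an assumption I picked for the sketch):

```python
# Hypothetical numbers: the $25K and $1M come from the example above; the
# probability that the defect blocks a batch release is purely assumed.
equipment_cost = 25_000    # one-time capital fix
batch_value = 1_000_000    # bottom-line value of a releasable batch
p_lost_batch = 0.10        # assumed chance per batch that the issue blocks release

expected_loss = p_lost_batch * batch_value  # $100,000 per batch at risk
print(f"Expected loss per batch without the fix: ${expected_loss:,.0f}")
print(f"Fix cost is {equipment_cost / expected_loss:.0%} of one batch's expected loss")
```

Even at a modest assumed failure rate, the capital fix costs a fraction of what a single lost batch is expected to cost. That is the elevator pitch.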
Hold management accountable for resources through town halls, employee surveys and use of the ombudsman.
Tell your C-suite. When they ask for reasons or what is needed, tell them. You may be fighting against middle management, but I guarantee you that more often than not the C-suite is trying to mine ideas from all levels. If they don't want to do something about it, then it is their responsibility. Make the issue easy to understand and the solution even easier. Think elevator pitch.
Advocate for all solutions, even if they are not yours and they only address part of the problem. Find allies in all places at all times. It may open you up to solutions you have not thought of.
You are not stuck unless you want to be. We jail our minds in prisons of our own creation. There is always a door if you look hard enough. Even if that means leaving and letting them know why. Tell them that unfunded mandates and lack of support mean the management responsibility requirements are not being fulfilled, and that you do not expect the culture to change. Be prepared to show them receipts.
We are here for the patients, we are here for the operators. Be their advocates. It is your job.