Can intermittent faults in operation be cured at the design stage?


Intermittent faults can be irritating. Attempts to repair them often end in NFF – No Fault Found – because the product fails to reproduce its failures when sent for service or repair. Such problems cease to be irritating and become very expensive, and potentially dangerous, when they appear in safety-critical systems like aircraft or trains.

The EPSRC is tackling such problems by setting up the Through-life Engineering Services Centre at Cranfield University. The Centre was set up last year and has £5.5m of EPSRC funding, matched by industry sponsors (BAE Systems, Bombardier, Ministry of Defence and Rolls-Royce), which secures its future until 2017, after which the goal is for it to continue on a self-funded basis.

Manager of the Centre is Andy Shaw, who summed up the proposition: "Our definition of this Through-Life Engineering Services area is trying to make complex engineering things more reliable, require less maintenance, and last longer. We are looking at all aspects of doing this and the No Fault Found project is one of them. The aim is to fundamentally change the way that we design or manufacture things to achieve those ends."

More specifically to the NFF project, Paul Phillips, project manager, added: "What we are looking at is what we can learn from the sort of faults that can occur in service, how they result in No Fault Found, and how this information can be captured, stored and then translated back into the design and manufacturing stage so that the electronics in the systems is increasingly immune to the effects of No Fault Found."

Samir Khan is a research fellow at the Centre, and one of the projects he is working on is developing a statistical model based on the results of injecting artificial intermittent fault signals into the electronics and monitoring how they perform under varying intermittent faults. "It allows you to understand the behaviour of the system with particular faults," said Khan. "Depending on particular fault distributions you can develop degradation characteristics for a system. What we are trying to do is basically measure the amplitude and the time instance when we inject these faults. We internally create a fault model as a statistical computer model of a number of faults being generated intermittently – they have different amplitude size, they occur around different instances, the duration that they last for is different, more or less trying to replicate what happens in real life when degradation occurs in the system. Using this we can connect this to a relay switch, for example, which opens and closes at different times depending on the model, and the effects that has on the system is studied."

The idea is that this experimental data can be combined with actual maintenance and repair data taken in the field, and the result used to find links between faults and design. Ultimately, the theory goes, these sorts of faults can be designed out.

The initial project was scheduled to last for three years and has now been extended to five. However, it is still the team's intention to deliver the three main objectives at the end of the initial three-year period, which will be in September 2014. The first is to create a set of design rules, probably hosted online, informing designers of NFF problems and how they can be avoided at the outset. The second will be a more substantial publication that outlines best practice for No Fault Found. These guidelines will cover all aspects of design, including the organisational and cultural issues, how to classify the problems and faults, and the actual design of the hardware, focussing on such things as built-in-test techniques, service monitoring, and interactions between systems that may not be easily captured in the design world. The third output is going to be a draft towards a system design evaluation standard, which will probably be tied up with work the Centre is doing with the British Standards Institution. "The following two years will build on the problems we find emerging from the first three years," said Shaw.
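
As a rough illustration of the kind of fault model Khan describes, the following Python sketch draws a handful of intermittent faults with random onset, duration and amplitude, then turns them into open/close commands for a relay. It is not the Centre's model: the exponential and uniform distributions, the names `FaultEvent`, `generate_faults` and `relay_schedule`, and all parameter values are assumptions chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class FaultEvent:
    onset_s: float     # time at which the fault begins (seconds)
    duration_s: float  # how long the fault lasts (seconds)
    amplitude: float   # size of the injected disturbance (arbitrary units)


def generate_faults(n_events, mean_gap_s, mean_duration_s, max_amplitude, seed=0):
    """Draw n_events non-overlapping intermittent faults.

    Assumption: gaps and durations are exponentially distributed and
    amplitudes are uniform. A real study would fit these to field data.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    for _ in range(n_events):
        t += rng.expovariate(1.0 / mean_gap_s)  # healthy interval before the fault
        duration = rng.expovariate(1.0 / mean_duration_s)
        amplitude = rng.uniform(0.0, max_amplitude)
        events.append(FaultEvent(t, duration, amplitude))
        t += duration  # the next healthy interval starts when this fault ends
    return events


def relay_schedule(events):
    """Turn fault events into (time, state) switch commands for a relay.

    The relay is opened for the duration of each fault and closed otherwise.
    A relay is on/off, so only the timing is used here; the amplitude would
    matter for an analogue injection instead.
    """
    schedule = []
    for ev in events:
        schedule.append((ev.onset_s, "open"))
        schedule.append((ev.onset_s + ev.duration_s, "close"))
    return schedule


faults = generate_faults(n_events=5, mean_gap_s=10.0,
                         mean_duration_s=0.5, max_amplitude=1.0)
schedule = relay_schedule(faults)
for time_s, state in schedule:
    print(f"{time_s:8.3f} s  relay {state}")
```

Reusing the same seed reproduces the same fault sequence, which is one way an otherwise intermittent fault could be made repeatable on a test bench.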

No Fault Found problems can be crippling in instances where faults are critical. It is no surprise that the four key sponsors of the Centre all develop systems that must be reliable. Safety concerns apart, it is an increasingly popular model that the supplier of a major system, like fighter jets or trains, is responsible for its availability. The days could be over when suppliers produce unreliable equipment that is owned by the operator, who then has to pay to maintain it and buy expensive spares from the original supplier. Now that the original supplier (or contractor) owns the equipment, there is a sharper incentive to design systems that don't fail.

However, identifying these faults is not easy. Phillips continued: "What we are trying to do is make the link between the component at board level, up to the sub-system, then the system, and then up to the platform, because often a flashing red light in an aircraft will highlight a problem at platform level, but that fault can be traced right down to individual components and we are trying to understand that link. For example with the intermittent faults, you can drill down through a platform, system, board and component until you find an intermittent fault, but you have to be sure when you find a fault that it is the one that caused the red light to flash. We are trying to make sure the right links are being made."

It is too early in the process to identify repeat offenders, but there are indications that problems frequently exist in connectors and in wire harnesses, which can be difficult to test for intermittent faults. In cases of NFF, such items are often stacked in a workshop because they cannot be repaired until the fault is found, and more capital is then required to replace them.

Another potential problem emerging is tin whiskers forming in lead-free solder joints, but Phillips conceded that until more work was done it was not clear if this was an actual problem or a perceived one as the military market starts to look apprehensively at 'a life after lead'. Non-compatibility between new and legacy software in long-life systems, like trains, is another issue that is emerging.

"We have certain examples of problems that we have uncovered," commented Khan. "Without mentioning specific aircraft, there are certain de-icing systems on certain aircraft with electrodes that have been designed for use in minus 50 degrees. They have solenoids in the electronics which are designed to heat the electronics up to a stable temperature so that they can work, but what has been identified is that the electronics are not working in the de-icing conditions. If your aircraft cannot de-ice itself you have a problem. But what we have found out is that these heating elements are positioned just an inch or two too far away from what they are meant to be heating. So a simple design modification means that the electronics don't freeze up and are continuing to work. I think most of the design electronics issues that we have come across have been down to environmental factors."

So where does Shaw anticipate that the initial programme will lead? "I would hope that we could develop a set of useful guides," he concluded, "that would enable anybody who was designing anything, to make it so that it does not break down, be more reliable, last longer, and function properly throughout all of its operating environments. If that means making a TV remote that still functions after it has been dropped in a glass of water at the expense of overdesigning all the TV remotes that are not dropped in water, well maybe that is what we need to think about. Resources are finite: we can't carry on making and throwing away. Whereas the issues now are with the high value, complex systems, these problems will roll down through the value chain to cheaper and more disposable equipment."