Test Economics 101


The Economics of Diagnostics and Repair in Development, Production and Support

Duane Lowenstein
Agilent Technologies
Test Process Analyst
Andover, MA 01810
duane_lowenstein@agilent.com

Abstract: Intrinsically, everyone knows that the diagnostic and repair processes in any operation are a large source of inefficiency in labor, material, time and other resources, ultimately leading to higher costs in some form, whether higher-cost products, delayed deliveries, increased spares or other effects. Yet despite this understanding, programs continue to shortchange investments in diagnostic and repair development. This paper explores different economic models for the diagnostics and repair processes in Development, Production and Support. It includes a look at the tradeoffs with elements like prognostics, built-in self-test and automated diagnostics.

 

I. INTRODUCTION

Perfect parts, perfect design, perfect assembly, perfect test, perfect diagnosis and perfect repair equal perfect products. In a perfect world this would be, well, perfect. In fact, it could be even more perfect, because with perfect parts, perfect design and perfect assembly there is no need for test, diagnosis or repair. That would mean less expensive products, predictable manufacturing times and no customer returns. Wow, life sure would be easy.

Time to wake up: we don't have perfect parts, designs or assembly, so we are stuck with test, and thus with diagnostics and repair. Given that reality, what can we do to minimize the effects of the repair loop and maximize its value? To take advantage of this necessary evil, as most people describe test, you must understand the process. Although it may be obvious, products get tested; some fail, some pass. Passing products go on to the next process; failing products go into a rework loop, hopefully with some sort of data (paper or electronic) explaining which tests they failed and passed. In the rework loop, the failed product is diagnosed with the help of that data and the diagnostician's expertise, then repaired and retested. If it passes, it goes forward; if it fails, it goes through the rework loop again. This process continues until all the products pass or, in some instances, until the rework loop costs more than the product is worth and the product is scrapped. To complicate the process further, because failed product needs to be retested, it is not unusual for additional test capacity to be needed on the production line. Figure 1 is a pictorial of the process described above.

Figure 1. Simplified Test and Rework Flow

Understanding the basics of this whole process allows for the building of a simple model and thus the exploration of different alternatives to reduce the overall cost of diagnostics and repair.


II. ECONOMICS OF A TEST REWORK LOOP

To understand the economics of the test and repair process, or rework loop, you must first define all the cost elements that make up the total cost. The following list gives the most common financial drivers. Although there may be more, these elements make up more than 95% of total cost:

Even with knowledge of all the financial drivers, their individual impact on the total cost depends on three rework loop metrics: product yield, diagnostic effectiveness and repair efficiency.

Product yield determines what percentage of the products enter the rework loop during any given time. This affects virtually all of the elements listed above. Just as important, yield establishes the amount of diagnostic, repair and test resources needed to ensure a consistent flow. A low yield product can contribute to doubling or tripling the cost of test for a product.

There are three main contributors to product yield: design, workmanship and material. Any or all of these can contribute to poor yield, but design has the largest impact, because poor design can lead to poor material selection and poor assembly properties. Although design is not the sole contributor to

Once a product fails and goes into the rework loop, diagnostic effectiveness is the largest variable in the process, in terms of both the effort and the time needed for a successful repair. Diagnostic effectiveness is all about how quickly and accurately the root cause of the failure can be identified. Three major diagnostic processes are most commonly used. The first we will call the expert. This is usually a person in the production group who has great knowledge of both the product and the processes of the production facility. Their intimate knowledge of the product allows them to combine the information from the failing tests with their knowledge of previous failures and determine a root cause. The expert is extremely flexible and can absorb new information very easily. The problem is that replicating the expert's knowledge is extremely hard; when the expert is not around, the diagnostic process becomes highly variable, both in its effectiveness at reaching a root cause and in the time it takes to get there.

The second diagnostic process is some sort of probabilistic or deterministic program or decision tree. These methods tend to use information from the design to build a model and predict the cause of a failure from the passing and/or failing tests in production. The two major groups of these diagnostics in electronic manufacturing are Failure Modes and Effects Analysis (FMEA) and Bayesian analysis. Although the two methods use fundamentally different strategies to predict a root cause, both take effort in the R&D/development stage to develop. If implemented properly, they can not only reduce the time to diagnose a failure to milliseconds but also increase accuracy over the expert. The downside of these methods is that many of the models, which are fundamental to the accuracy of the diagnosis, can be flawed because engineering lacks the detail needed to capture all the intricacies that lead to a root cause.
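As a minimal sketch of the Bayesian style of diagnosis, the following Python ranks candidate faults from a set of test outcomes. It assumes a single fault is present and that test outcomes are independent given that fault. The fault names, prior probabilities and per-test failure probabilities are hypothetical illustration values, not data from any product or from the study cited in this paper.

```python
def posterior_fault_probabilities(priors, fail_prob, observed):
    """Rank candidate faults from observed test outcomes using Bayes' rule.

    priors:    {fault: prior probability of that fault}
    fail_prob: {fault: {test: P(test fails | fault)}}
    observed:  {test: True if the test failed, False if it passed}
    """
    unnormalized = {}
    for fault, prior in priors.items():
        likelihood = 1.0
        for test, failed in observed.items():
            p_fail = fail_prob[fault][test]
            likelihood *= p_fail if failed else (1.0 - p_fail)
        unnormalized[fault] = prior * likelihood
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed outcomes are impossible under the model")
    return {fault: value / total for fault, value in unnormalized.items()}


# Hypothetical model: three candidate faults and two production tests.
PRIORS = {"U1_regulator": 0.5, "C7_capacitor": 0.3, "J2_connector": 0.2}
FAIL_PROB = {
    "U1_regulator": {"power_rail": 0.9, "rf_output": 0.5},
    "C7_capacitor": {"power_rail": 0.2, "rf_output": 0.8},
    "J2_connector": {"power_rail": 0.1, "rf_output": 0.9},
}

if __name__ == "__main__":
    # The power rail test passed and the RF output test failed.
    result = posterior_fault_probabilities(
        PRIORS, FAIL_PROB, {"power_rail": False, "rf_output": True}
    )
    for fault, prob in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{fault}: {prob:.2f}")
```

Note that the passing test carries information too: in this made-up model the regulator has the highest prior, yet it drops to the bottom of the ranking because a regulator fault would rarely leave the power rail test passing. The quality of the answer depends entirely on the quality of the model, which is the weakness described above.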

The last major diagnostic process is historical data collection. This process uses information from past and current failures and their root causes to build a database. The database pairs the data from the failing tests with the diagnosis that led to the root cause. It becomes more accurate as more data is added. Eventually, with enough inputs, and provided the recorded diagnoses reflect the true root causes, every permutation of failure can be logged and then used to diagnose any failure. Done correctly, this type of diagnostic system can, in the long run, be the most accurate and effective. Of course, the downside is that a significant number of failures with accurate root cause analysis is needed to populate the database before it becomes useful.
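One simple way to picture such a database is a lookup from the set of failing tests to the root causes that were confirmed for that same set in the past. The sketch below is only an illustration of the idea: the class name, method names and test and component names are hypothetical, and a real system would also need to handle measurement values, product revisions and near-matches.

```python
from collections import Counter, defaultdict


class FailureHistory:
    """Maps a failing-test signature to the root causes confirmed for it."""

    def __init__(self):
        self._records = defaultdict(Counter)

    def record(self, failed_tests, confirmed_root_cause):
        # Log a root cause only after the repair has been verified by retest,
        # otherwise misdiagnoses would pollute the history.
        self._records[frozenset(failed_tests)][confirmed_root_cause] += 1

    def suggest(self, failed_tests):
        """Return (root_cause, share_of_past_cases) pairs, most frequent first."""
        counts = self._records.get(frozenset(failed_tests))
        if not counts:
            return []  # no history yet for this signature
        total = sum(counts.values())
        return [(cause, n / total) for cause, n in counts.most_common()]


if __name__ == "__main__":
    history = FailureHistory()
    history.record({"rf_output", "gain"}, "C7_capacitor")
    history.record({"rf_output", "gain"}, "C7_capacitor")
    history.record({"rf_output", "gain"}, "J2_connector")
    print(history.suggest({"rf_output", "gain"}))
    print(history.suggest({"power_rail"}))  # empty list: signature never seen
```

The empty result for an unseen signature is the cold-start problem described above: the lookup is only as useful as the volume and accuracy of the failures already logged.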

The last financial driver of the repair loop is repair efficiency. It is the least complicated, but it depends on both product yield and diagnostic effectiveness: yield and diagnostics determine the number of times a product will be repaired. If perfect diagnostics could be achieved, repair efficiency could be 100% and the repair effort would be driven by product yield alone. Since diagnostic accuracy is far from perfect, repair efficiency is hampered in three areas by misdiagnosis. The most evident is repairing the wrong root cause, that is, replacing a component, fixing a solder joint or repairing something else that was not really the root cause. This effort adds no value because the product will be retested, diagnosed again and then repaired a second time.

When a repair is done and the diagnosis points to the wrong component, a good component ends up being replaced. This means money, resources and material are wasted. There are many estimates of how many components are replaced unnecessarily in repairs, ranging up to 50% on lower volume lines. In some instances companies have put in place economic rules telling repair technicians that, when deciding which of two or more parts to replace, they should always start with the least expensive.

The last impact is the repair process itself. Given the density and size of electronics, it comes as no surprise that every time a person repairs a product there is a significant chance of causing a secondary failure during the repair. This can be as simple as mishandling the product or damaging an adjacent solder joint. Whatever the reason for the damage, every repair increases the likelihood of another failure.

With all the drivers and variables that affect the repair process, it becomes very hard to understand which has the greatest impact on its economics. Simply put, they all do. That is, improving any of them (yield, diagnostics or repair) can drastically reduce the cost of this process. But one of them, diagnostic effectiveness, can ultimately drive the other two for the maximum cost avoidance.

 

III. DIAGNOSTIC EFFICIENCY'S IMPACT ON YIELD AND REPAIR

A study conducted in 2002 by Agilent Technologies looked at a commercial manufacturer of electronics. The study examined many of the drivers and metrics listed above and broke down the repair and rework process. Specifically, it looked at a production line with a volume of 25,000 products a year and a material and labor cost of approximately $1,500 each. Each time a product failed, it took about 30 minutes to diagnose and repair. About 50% of the time an additional 30 minutes of a "super" technician's time was needed for additional diagnostics, and about 40% of the time an engineer had to spend 90 minutes solving or approving a repair. Figure 2 shows the results of the study [1] and the associated cost avoidance from increased diagnostic efficiency.
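The labor figures quoted for this study can be combined into an expected labor time per failure. The short calculation below is a sketch under two assumptions that the text does not state: that the escalation percentages apply to each failure event, and that the times simply add. It covers labor minutes only, since no labor rates or failure counts are given here.

```python
# Labor per failure event, using the study figures quoted in the text.
BASE_MINUTES = 30         # every failure: first diagnosis and repair
SUPER_TECH_MINUTES = 30   # extra diagnostics by a "super" technician
SUPER_TECH_SHARE = 0.50   # share of failures needing the super technician
ENGINEER_MINUTES = 90     # engineer solving or approving a repair
ENGINEER_SHARE = 0.40     # share of failures needing an engineer


def expected_labor_minutes():
    """Expected labor minutes per failure, assuming the times are additive."""
    return (
        BASE_MINUTES
        + SUPER_TECH_SHARE * SUPER_TECH_MINUTES
        + ENGINEER_SHARE * ENGINEER_MINUTES
    )


if __name__ == "__main__":
    print(f"Expected labor per failure: {expected_labor_minutes():.0f} minutes")
```

Under these assumptions the expected labor comes to about 81 minutes per failure, of which only 30 minutes is the base diagnose and repair effort. The escalations, which are largely diagnostic work, account for the rest.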


Figure 2. Impact of Diagnostic Efficiency on Rework Costs

The above example highlights the two major cost drivers of diagnostics: time to diagnosis and effectiveness of the diagnosis. Time to diagnosis is so important because one of the largest costs of rework is labor, and the largest component of labor in rework is the person diagnosing the failure. If diagnostics were instantaneous, approximately 25-30% of the cost would be removed from the rework process.

In a similar vein, as shown in the example above, if diagnostic effectiveness increases, cost drops. But there is a secondary effect that takes place. If your diagnostic effectiveness is only 50%, then 50% of the time the product goes through the rework loop again, and because effectiveness is still only 50%, 25% of the failed products go through the rework process a third time, and so on. Therefore, if a production line produced 100 products per week with a yield of 80% and a diagnostic efficiency of 50%, the rework loop would diagnose, repair and retest approximately 39 units, or the equivalent of building 139 products with 100% yield. In practice, diagnostic efficiency usually does go up the second, third and subsequent times a product goes through the process.
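The compounding effect described above is a geometric series. As a sketch, assuming the diagnostic effectiveness stays constant on every loop and that a correctly diagnosed unit always passes retest, the expected number of rework passes can be computed as follows. The function and parameter names are illustrative.

```python
def expected_rework_passes(units_built, first_pass_yield,
                           diagnostic_effectiveness, max_loops=None):
    """Expected number of diagnose/repair/retest passes through the rework loop.

    Assumes constant diagnostic effectiveness on every loop and that a
    correctly diagnosed unit passes retest. With max_loops=None the loop is
    allowed to repeat indefinitely and the closed form is used.
    """
    if not 0.0 < diagnostic_effectiveness <= 1.0:
        raise ValueError("diagnostic_effectiveness must be in (0, 1]")
    entering = units_built * (1.0 - first_pass_yield)
    if max_loops is None:
        # Sum of entering * (1 - e)^k for k = 0, 1, 2, ... equals entering / e.
        return entering / diagnostic_effectiveness
    total = 0.0
    for _ in range(max_loops):
        total += entering
        entering *= 1.0 - diagnostic_effectiveness
    return total


if __name__ == "__main__":
    print(expected_rework_passes(100, 0.80, 0.50))               # unlimited loops
    print(expected_rework_passes(100, 0.80, 0.50, max_loops=5))  # first five loops
```

For the example in the text (100 units, 80% yield, 50% effectiveness) the series converges to 40 passes, and the first five loops alone account for 38.75 of them, which is consistent with the figure of approximately 39 quoted above. Doubling the effectiveness to 100% would cut the total to the 20 units that failed in the first place.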

Combining these two drivers, time to diagnosis and effectiveness of the diagnosis, what could the economic impact be in a rework process? Figure 3 shows a typical rework process for a low volume, high complexity product: a production line with a monthly capacity of 100 units, a test time of 30 minutes, a yield of 80%, a diagnostic time of 2 hours, a repair time of 30 minutes and a diagnostic effectiveness that varies with the rework loop. Also included in this example is a process rule stating that a product that has been repaired more than 3 times is scrapped. Many companies adhere to a similar rule to prevent financial losses from excess repairs, which can damage the product beyond repair or drive the cost of rework above the value of the product. In this example the total cost of the repair process is 150 hours plus the cost of a scrapped board.
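A rework loop of this kind, with an effectiveness that changes from loop to loop and a scrap rule after three repairs, can be modeled in a few lines. The sketch below uses the 2 hour diagnostic, 30 minute repair and 30 minute retest times from the text, but the per-loop effectiveness values are hypothetical placeholders because the values behind Figure 3 are not given in the text. It is therefore not intended to reproduce the 150 hour total.

```python
def rework_hours(failed_units, effectiveness_by_loop, hours_per_pass,
                 max_repairs=3):
    """Return (labor_hours, expected_scrapped_units) for a rework loop.

    effectiveness_by_loop[i] is the share of units correctly fixed on
    loop i + 1. Units still failing after max_repairs repairs are scrapped.
    """
    if len(effectiveness_by_loop) < max_repairs:
        raise ValueError("need one effectiveness value per allowed repair")
    remaining = float(failed_units)
    hours = 0.0
    for loop in range(max_repairs):
        hours += remaining * hours_per_pass          # diagnose, repair, retest
        remaining *= 1.0 - effectiveness_by_loop[loop]
    return hours, remaining


if __name__ == "__main__":
    failed = 100 * (1 - 0.80)            # 100 units per month at 80% yield
    per_pass = 2.0 + 0.5 + 0.5           # diagnose + repair + retest, in hours
    assumed_effectiveness = [0.5, 0.7, 0.8]   # HYPOTHETICAL per-loop values
    hours, scrapped = rework_hours(failed, assumed_effectiveness, per_pass)
    print(f"rework labor: {hours:.0f} h, expected scrap: {scrapped:.1f} units")
```

Even with placeholder numbers the structure of the cost is visible: the 2 hour diagnostic step dominates every pass, and each misdiagnosis both repeats that cost and moves the unit one step closer to the scrap rule.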

Figure 3. Typical Rework Process

Figure 4 shows that by enhancing the rework process with an automated diagnostic capability and increasing the diagnostic effectiveness, the total cost of the repair process drops to 76 hours with no scrap. That is almost a 50% cost reduction in labor alone for the rework process, not including the cost of material, assets, material handling and several of the other costs listed above.

Figure 4. Enhanced Rework Process

The example shown above is not unusual. These types of rework dynamics are typical of many high mix, low volume manufacturers. Understanding these dynamics shows the importance of implementing and investing in better diagnostic processes for the rework loop.


IV. INVESTMENT IN DIAGNOSTICS

In the GAO report [2] on optimizing ship crew size and reducing total ownership cost, one of the findings was that decisions made early in the development process drive the majority of the costs in the procurement, operations and support life cycle phases. Figure 5 shows the breakdown, in percentages of cost, for each of these phases. Recommendations from the report included establishing a process to facilitate the adoption of labor-saving technologies and best practices across the whole Navy system early in the development process, with the goal of reducing overall operation and support cost.

Figure 5. Total Ownership Cost Breakdown

Given this study, there is a strong argument for developing diagnostic programs in the development stage that save both time and money in the operational and support stages of a product's life cycle. Using the example in Figures 3 and 4 above, if it took a development engineer a year to develop a set of algorithms or build a decision tree able to diagnose the product in a minute or two in the production rework loop, the breakeven time for this effort would be about 2 years of production. Not only would this save money in production, it would also have a large payback in support for the product.
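The breakeven estimate can be checked with simple arithmetic from the 150 hour and 76 hour monthly rework totals quoted earlier. The sketch below assumes that one engineer-year is roughly 2,000 working hours and that an engineering hour costs the same as a rework labor hour. Both are assumptions made for illustration and are not stated in the text.

```python
BASELINE_HOURS_PER_MONTH = 150   # rework labor with traditional diagnostics
ENHANCED_HOURS_PER_MONTH = 76    # rework labor with automated diagnostics
DEVELOPMENT_HOURS = 2000         # ASSUMPTION: one engineer-year of effort


def breakeven_years(development_hours, baseline_monthly, enhanced_monthly):
    """Years of production needed for labor savings to repay development.

    Assumes development and rework hours cost the same per hour.
    """
    monthly_saving = baseline_monthly - enhanced_monthly
    if monthly_saving <= 0:
        raise ValueError("enhanced process must save labor to break even")
    return development_hours / (monthly_saving * 12)


if __name__ == "__main__":
    years = breakeven_years(
        DEVELOPMENT_HOURS, BASELINE_HOURS_PER_MONTH, ENHANCED_HOURS_PER_MONTH
    )
    print(f"breakeven after about {years:.1f} years of production")
```

With these inputs the saving is 74 hours a month, or 888 hours a year, so the development effort is repaid after a little over two years of production labor savings alone. Counting the avoided scrap, material and support costs would shorten that further.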

This idea is not foreign to most companies, and many have started to invest in these efforts. Many techniques have been used, including third party programs that help designers build their diagnostic routines while designing the products. Others have used built-in self-test (BIST) to develop algorithms that quickly isolate failures down to the card level in large, complicated products. In many cases designers are learning from these new techniques to design for diagnostics and repair while designing the product. All of these efforts have shown great results and return on investment for the associated repair and rework loops.

One lesson that has come from this work is that test coverage is directly related to the accuracy of any diagnostic. Very simply put, if something is not tested, then when it fails the failure cannot be detected and thus cannot be diagnosed. Therefore, design for test and design for diagnostics and repair all need to be done in sync and early in the development cycle.

But what about products that are already designed and are in production or support? Does the investment in better diagnostics still make sense? Agilent Technologies put that question to its support organization: what would the savings be from developing a more robust diagnostic program for products already in production that come back from customers for repair? Using commercially available diagnostic software, engineers developed programs to automate the diagnostics for the test strategy originally put in place for the products. Figure 6 shows the results [3].

Figure 6. Comparison of traditional and automated diagnostics in support

In a short amount of time the results were evident: by investing in automated diagnostics, the average time to fix a product went down by over 30%. Just as important, the accuracy of getting to the root cause went up to 91% on the first diagnosis, and the need for replacing components went down by 26%. These results show not only that older products can be retrofitted with better diagnostic processes, but also that doing so brings both financial and customer satisfaction benefits.

The next step in this evolution of diagnostics will be the continued work in the study and implementation of prognostics. The ability to predict failures will fundamentally change the way diagnostics and repair will be done in the future. While prognostics will be able to predict failures, the need to have effective diagnostics will continue to be key to the ability to drastically reduce the cost of diagnostics and repair.

 

V. SUMMARY

As electronics continue to become more intricate and the products that use them gain greater features and capabilities, it is inevitable that fixing them will become increasingly complex. Without a proactive stance, either developing diagnostics earlier in the design or developing prognostic programs to predict failures, the cost of failed product will skyrocket. As shown in this paper, although a slew of elements drive the cost of repairing a product, the main drivers are yield, diagnostics and repair, with diagnostics being the greatest controllable cost.

Although it is evident that the next frontier in this effort will be the development of prognostics, there are still huge economic and financial gains that can be achieved with today's tools by developing better diagnostics. The largest barrier standing in the way is a lack of understanding that investing the time to build these diagnostic tools while developing the product will have a payback many times greater than either delaying them or using traditional methods. As the saying goes, an ounce of prevention is worth a pound of cure.

 

REFERENCES

[1] David Menzer, Agilent Technologies, "An Application of Bayesian Reasoning to Improve Functional Test Diagnostic Effectiveness", IEEE AUTOTESTCON, 2002
[2] GAO, "Navy Actions Needed to Optimize Ship Crew Size and Reduce Total Ownership Costs", GAO-03-520, June 2003
[3] Agilent Technologies, WWCSS Fault Detective study, January 2010