Operating Around the Margins
When to soldier, and solder, on.
I live in the San Francisco Bay Area. Recently, on December 2, 2016, a flash fire in Oakland, near where I live, killed 36 people. It was national news. The setting of the tragedy was a warehouse that apparently had been “converted” to some form of low-rent live/work space, ostensibly serving as affordable housing for local artists. It further evolved into a venue for a Friday night concert, attracting over 100 people on the evening of December 2. A somewhat regular happening, judging from what I’ve read. Except something unexpected happened that night. A bad mixture of a short circuit, lots of flammable materials and too many people in close proximity. Quick conflagration. Sudden confusion and limited, unfamiliar exits, especially in dark and smoke. The rest of the world now knows. And the hand-wringing began.
Much has been written since then about the building not having been permitted under the prevailing fire codes. Recriminations have gone back and forth between the obvious need to enforce public safety in residential live/work spaces and the equally compelling plea (in the opinion of some) to maintain an adequate supply of affordable housing for penurious artists. We have heard calls to shut them all down, as well as appeals that housing authorities retain some degree of compassion for the residents’ unavoidable circumstances. How does one strike a balance between the need for safety and the desire for affordability, if one accepts it must be struck at all?
This got me thinking. How much of our EMS and testing business owes its existence to operating within the well-defined lines, and how much of it operates outside those margins? Who determines when and where to operate (inside or outside), and what qualifies them to make that distinction in the first place, and at the appropriate time? Just what is “the appropriate time”? When should one proceed “by the book,” and when is it advisable to simply “get it done”? When should the whip be cracked and enforcement brought down, and when is it wiser to simply let things go? Finally, who cares? Anecdotal evidence suggests we have an unacknowledged, but tacitly assumed, hierarchy of providers, starting with Tier 1 EMS companies and OEMs, especially in the military/aerospace sector, who cannot deviate one jot from written specifications, for obvious reasons (human lives are often at stake). As you descend that metaphorical ladder, from Tier 1 to Tier ?, you depart, in a certain sense, the world of specifications and enter the underworld of getting things done. Just don’t peer too far behind the curtain….
How does that translate from the abstract to the practical in the world of board test and failure analysis? Well, for one thing, aerospace folks love reports. Until they have to pay for them. Then they just want pictures of what’s wrong, so they can take whatever corrective or preventive action is needed. So we get two tiers of customers: one that needs the 250-pager with glossies to satisfy the higher-ups (your tax dollars at work), and another that simply needs a clear picture of either the offending defect or the overriding symptom, so they can fix the problem, continue shipping, and make their quarterly number (“Just give me the damn picture of what’s wrong; I’ll know what to do with it, so spare me the report”).
We are blessed with two tiers of acceptance in the ICT world as well. There are those who publish elaborate statements of work (SOWs), expecting them to be adhered to as the final criteria for acceptance. Lab coat guys go line-by-line through the SOW, adding fault injection steps to make predetermined failures into confirmed failures. If it fails, it works (note ironic twist of eyebrow). The non-SOW guys just want the fixture to work (read: pass boards), and not be an impediment to Q2’s numbers. As long as the light on the screen is green and not red, everybody’s happy. Never mind that pile of “red” boards in the corner of the MRB cage.
Same story in the flying probe world. Some want everything parametrically debugged; others haven’t a clue what that means. Still others, scarily, don’t care. Merely being tested, however imperfectly, carries with it a built-in Good Housekeeping Seal of Approval. As long as that “passed” stamp is on the board, it doesn’t matter how it passed or why. Plus this dirty little secret: every flying probe test system of note contains a feature whereby the test system can quickly “learn” a board, and from that learning create a raw program. Emphasis on “raw.” As in, un-debugged. No matter for some: a program is a program. Start testing, which means, in reality, start shipping. As long as the tester doesn’t beep, doesn’t squawk, doesn’t spit out one of those disagreeable pieces of paper, equanimity, and revenues, prevail. Soldier on.
The class system continues in the functional test realm. The simplest organisms use just red lights and green lights (Go/No-Go). No diagnostics: just set the bad ones aside to be looked at, someday, out of sight. As long as the green pile far exceeds the red pile, someday more likely means “never.” Hopefully the red pile is small enough to be hidden somewhere, out of sight and out of mind of management. That’s go/no-go. At the other end of the spectrum, at Tier 1, massively tested units emerge at the end of a battery of steps, at the rate of one. Per week. Doing this requires insider knowledge, and acceptance of the reality that such costs can be borne by only a few companies or other entities. These entities also typically have the luxury of time; they are not beholden to quarterly results. Unknowing taxpayer support is crucial.
Back to the questions posed in Paragraph 3. Is this two-tier split a scandal? Does it matter that not every electronic product is tested to the “letter of the law”? Yes, it matters, but that’s why companies pay smart people lots of money to assess risks. Risk assessments, properly conducted, point to testing some avionics and flight hardware products to a level and a precision matching every conceivable scenario in the life of the product. The question is, what constitutes a risk assessment, “properly conducted”? That’s what keeps the smart people awake at night. You doubt me? Think about how life is these days at Samsung, in the risk assessment and quality assurance departments responsible for the Note 7. Not a lot of Korean gemuetlichkeit there.
Am I making an equivalence between a tragic fire and electronic manufacturing? No, I’m not. I’m simply pointing out the reality of varying grades of scrutiny, depending upon the product in question and its application. This is nothing new. It was ever thus. Tragic events sometimes illuminate other realities. Rightly considered, they help us reflect on what we do, why we do it, and whether what and how we test is enough to ensure the stuff we test works and works reliably. ISO 9001 and AS9100 certifications merely mean your quality system exists, and is consistent in the eyes of a third party. Auditors do their best, yet you can still be consistently bad. And, like the Oakland building inspectors, our safety watchdogs and ISO auditors have neither unlimited funds nor unlimited personnel to make Underwriters Laboratories a rival to the UN for scope and spread of bureaucracy. (Now there’s an interesting science fiction possibility!) The truth is we police ourselves. Common-sense principles are employed every day, in all manner of products. Many of them aren’t written down. Tragedy or not, that is not about to change.
Also, never forget: It remains in the best interests of some that the differences between a thorough test regime and a lax test regime remain obscure.
Trust is essential.
Kind of like calling your own fouls in a pickup basketball game.
Caveat Emptor.