3.1 Technical Issues
TYPES OF FLAWS TO BE PRECIPITATED
Product-unique ESS profiles can be developed based on the types of failures expected in the product, the product's responses to environmental stimuli, and its sensitivity to those responses. Table gives examples of typical defects that are sensitive to thermal cycling, vibration, or both. This table can be used as a guide in developing tailored ESS profiles. Care must be taken that, in tailoring a screen to one type of failure, other classes of failures are still detected.
LEVELS OF ASSEMBLY AT WHICH ESS SHOULD BE PERFORMED
The term piece part as used herein is defined as a monolithic integrated circuit, resistor, switch, etc., which is the lowest level of assembly. The next level of assembly is a multi-part assembly that has a defined identity, e.g., one that is given a drawing number and, usually, a name. A typical item at this level is a printed wiring assembly (PWA) or an equivalent shop replaceable unit (SRU). The top level is a system, but one person's system is another's subsystem (engine, propulsion system, air vehicle, weapon system). In reality, there is always some aggregate that is the largest entity reasonably possible to subject to ESS. In any event, there usually are several levels of assembly at which ESS can be contemplated.
It is generally more cost-effective to perform ESS at the lowest practical level of assembly, and at more than one level. Deciding how many levels, and which ones, requires an engineering evaluation.
A failure can usually be traced to a single part or interconnection, but the cost of finding and correcting it increases dramatically with the level of assembly at which it is found. The actual cost at each level will vary depending upon the manufacturer, the complexity of the item, and how much control the manufacturer has over its processes.
These cost considerations tend to lead management to decide to perform ESS at lower levels of assembly. However, each step in assembly and integration provides additional opportunities for the introduction of flaws, and ESS at a particular level obviously cannot uncover flaws that are not introduced until the next level. This dilemma is usually controlled by performing ESS at each major functioning level in the manufacturing process, consistent with an assessment of the potential defect population at each level of assembly. Resolution of these conflicting considerations usually involves screening at multiple (usually 2 or 3) levels of assembly. ESS at lower levels should focus on surfacing and correcting flaws in piece parts and PWA processing. Thus, most ESS failures at higher levels will reflect flaws introduced later in the manufacturing sequence, which are usually correctable without tear-down to the PWA level. Table provides a summary of the risks and results of doing ESS at various levels and functional conditions.
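The trade-off between screening early and screening late can be explored with a small expected-cost model. The sketch below is an illustration only: the per-level repair costs, the number of flaws introduced at each level and the screen efficiency are all hypothetical values chosen to show a roughly tenfold cost escalation per level, and the names `Level`, `REPAIR_COST` and `expected_cost` are invented for this example. It also assumes that every flaw escaping the last screen surfaces in the field.

```python
from dataclasses import dataclass

# Hypothetical cost (arbitrary units) to find and fix one flaw at each level.
# Illustrative only: real figures depend on the manufacturer, item complexity
# and process control.
REPAIR_COST = {"PWA": 1.0, "unit": 10.0, "system": 100.0, "field": 1000.0}


@dataclass
class Level:
    name: str                 # level of assembly, a key of REPAIR_COST
    new_flaws: float          # latent flaws introduced at this level, per item
    screen_efficiency: float  # fraction of latent flaws a screen here precipitates


def expected_cost(levels: list[Level], screened: set[str]) -> float:
    """Expected repair cost per item when ESS runs only at the `screened` levels."""
    latent = 0.0
    cost = 0.0
    for level in levels:
        latent += level.new_flaws  # flaws carried forward plus newly introduced ones
        if level.name in screened:
            caught = latent * level.screen_efficiency
            cost += caught * REPAIR_COST[level.name]
            latent -= caught
    # Assumption: every flaw that escapes the last screen surfaces in the field.
    return cost + latent * REPAIR_COST["field"]


levels = [Level("PWA", 2.0, 0.9), Level("unit", 0.5, 0.9), Level("system", 0.2, 0.9)]
print(expected_cost(levels, {"system"}))                 # top-level screen only
print(expected_cost(levels, {"PWA", "unit", "system"}))  # screen at every level
```

With these made-up numbers, a single system-level screen costs about 513 units per item against about 59 for screening at every level; the point is the shape of the comparison, not the figures. The model also shows the other side of the dilemma: a flaw introduced at the system level is never seen by a PWA screen.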
TYPES AND SEVERITIES OF STRESSES
A variety of environmental stresses have been candidates for use in ESS over the years. Of these, random vibration and thermal cycling are currently considered the most cost-effective. Table identifies some common types of failures and indicates whether random vibration or thermal cycling is the more likely stress to precipitate each one. A flaw may also be precipitated under one stress but detected only under the other. The references in Appendix B present other screening techniques that may be appropriate for some products.
Traditional ESS, consisting of temperature cycling and random vibration, may not be the most effective environment to use for certain hardware. For example, power cycling is effective in precipitating certain types of latent defects; pressure cycling may be desirable for sealed equipment; and acoustic noise may excite microelectronics structures better than structure-borne vibration. Ultimately, ESS environments must be chosen based on the types of flaws that are known or expected to exist.
In the past, fixed-frequency or swept-sine vibration were sometimes used. These practices were attributable in part to costs and physical limitations of available test equipment at the time. However, the major reason is believed to be the lack of recognition of the shortfalls of fixed frequency and swept-sine vibration in comparison with broadband random vibration.
Today, true random and quasi-random vibration are used almost exclusively for ESS. True random vibration, which is well known in the ESS community, excites all frequencies within a specified bandwidth (usually 20 to 2000 Hz) simultaneously and is neither cyclic nor repetitive. Quasi-random vibration, on the other hand, is a relatively new technology that uses pneumatically driven vibrators to generate repetitive pulses. For screening applications, several (usually 4 to 6) of these vibrators are attached to a specially designed shaker table that is allowed to vibrate in multiple axes simultaneously. This complex motion (six-degree-of-freedom vibration: three linear and three rotational axes) is very effective in finding a wide range of flaws.
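The overall severity of a broadband random spectrum is usually summarized as its gRMS value, the square root of the area under the acceleration power spectral density (PSD). The sketch below computes it for a PSD defined by breakpoints joined by straight lines on log-log axes, which is how dB/octave slopes are specified. The breakpoints are an assumption for illustration (the widely cited NAVMAT P-9492 shape: +3 dB/octave from 20 to 80 Hz, 0.04 g^2/Hz from 80 to 350 Hz, -3 dB/octave to 2000 Hz); they are not claimed to be the values of Table 3-3, and the function name `grms` is hypothetical.

```python
import math


def grms(breakpoints: list[tuple[float, float]]) -> float:
    """Overall gRMS of an acceleration PSD given as (frequency Hz, g^2/Hz) breakpoints.

    Between breakpoints the PSD is taken as a straight line on log-log axes,
    P(f) = P1 * (f / f1) ** n, and each segment is integrated in closed form.
    """
    area = 0.0
    for (f1, p1), (f2, p2) in zip(breakpoints, breakpoints[1:]):
        n = math.log(p2 / p1) / math.log(f2 / f1)  # log-log slope exponent
        if abs(n + 1.0) < 1e-9:
            # n = -1 (about -3 dB/octave): the integral of 1/f is a logarithm.
            area += p1 * f1 * math.log(f2 / f1)
        else:
            area += p1 * f1 / (n + 1.0) * ((f2 / f1) ** (n + 1.0) - 1.0)
    return math.sqrt(area)


# Illustrative spectrum (assumed, NAVMAT P-9492 style), not Table 3-3.
PROFILE = [(20.0, 0.01), (80.0, 0.04), (350.0, 0.04), (2000.0, 0.007)]
print(f"{grms(PROFILE):.2f} gRMS")  # 6.06 gRMS
```

Because the result is a square root of an area, halving the PSD everywhere (a -3 dB change) lowers gRMS by a factor of about 0.707, not 0.5.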
It is not difficult to visualize that the complex interactions possible under random vibration can induce a wider variety of relative motions in an assembly than fixed-frequency or swept-sine excitation. As indicated by Table, vibration is the area of stressing that normally precipitates latent assembly flaws caused by the undesired relative motion of parts, wires, structural elements, etc., as well as mechanical flaws that lead to propagating cracks.
Burn-in has been defined many ways by different agencies and companies; however, for this document it is the exposure of powered equipment to either ambient or steady elevated temperature. This technique has been used in the past with some success and needs to be considered as a possible supplement to the ESS requirement. It is of particular value where components are of high power and where heat buildup occurs over a long period. Burn-in is not a substitute for ESS.
Effective screening usually requires large, rapid temperature changes and broadband random vibration. Such thermal cycling is used to detect assembly flaws that involve installation errors or inadequate chemical or mechanical isolation or bonding. Under rapid thermal cycling, differential thermal expansion (e.g., in solder joints) takes place without sufficient time for stress relief, and this is a major mechanism for precipitating latent defects into detectable failures. As indicated in Table, some types of flaws may be precipitated to failures by either thermal cycling or random vibration. However, it is important to note that thermal cycling and random vibration are synergistic. For example, thermal cycling following random vibration sometimes leads to detection of vibration-induced failures that were not apparent immediately. There have been reported cases where a very small flaw did not propagate to the point of detectability during random vibration, but advanced to the point of detectability during subsequent thermal cycling.
The combined effects (synergism) of vibration and thermal cycling suggest that concurrent application of the two stress types may be desirable. This combined environment is in fact sometimes used in ESS, but it is more often avoided because it requires more elaborate facilities. Also, concurrent application of random vibration and thermal cycling can make it difficult to determine which stress precipitated a failure, and therefore what corrective action should be taken.
If random vibration and thermal cycling are to be conducted sequentially, random vibration would usually be done first. A more effective sequence would be five minutes of random vibration prior to thermal cycling, and another five minutes of random vibration following.
Measurements During Thermal Cycling
There are two approaches to monitoring equipment during thermal cycling. The first approach uses periodic measurement. In this approach, limited performance measurements are necessary prior to and at the end of ESS; these may be made on the first and last cycles. Additional measurements may be taken at other cycles, if desired. Each measurement should be made at the hot and cold operating extremes.
The second approach calls for continuous monitoring of equipment operation during the "cold-to-hot" transition and the "hot" dwell portion of each cycle.
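The two monitoring approaches differ mainly in when measurements are taken. The sketch below, with hypothetical function names, lists the measurement points each approach implies for a screen of a given number of cycles; cycle numbering starts at 1, and the optional extra cycles in the periodic approach are an input rather than a recommendation.

```python
def periodic_plan(n_cycles: int, extra_cycles: tuple[int, ...] = ()) -> list[tuple[int, str]]:
    """Measurement points for the periodic approach: the first and last cycles,
    plus any optional extra cycles, each measured at the cold and hot extremes."""
    cycles = sorted({1, n_cycles, *extra_cycles})
    if cycles[0] < 1 or cycles[-1] > n_cycles:
        raise ValueError("extra cycles must lie within the screen")
    return [(cycle, extreme) for cycle in cycles for extreme in ("cold", "hot")]


def continuous_plan(n_cycles: int) -> list[tuple[int, str]]:
    """Segments monitored continuously in the second approach, cycle by cycle."""
    return [(cycle, segment)
            for cycle in range(1, n_cycles + 1)
            for segment in ("cold-to-hot transition", "hot dwell")]


print(periodic_plan(10, extra_cycles=(5,)))
print(continuous_plan(2))
```

The periodic plan grows with the number of measured cycles only, while the continuous plan grows with every cycle, which is where its extra cost in test equipment and setup comes from.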
Measurements During Random Vibration
A strong argument for monitoring equipment during vibration screens is that the resulting movement of a marginal component may show up as an equipment failure only during the stress application. Otherwise, the incipient failure will escape detection, only to show up in an operational environment. Some of the initial work in random vibration screening indicated a 2:1 difference in screen efficiency between equipment that was powered and monitored and equipment that was not powered. The technical risks and costs of random vibration screening are summarized at each level of assembly.
BASELINE ESS PROFILES
The baseline profiles (Tables 3-3 and 3-4) represent the combined agreement of the three Services on minimum levels to ensure effectiveness. They are derived from experimental and analytical stress screening studies plus surveys of screens used in industry. The random vibration baseline profile given in Table 3-3 shows the values for response levels, frequencies, axes, duration and monitoring. The thermal cycling baseline profile given in Table 3-4 shows a range of values for the temperature extremes, the temperature rate of change and the number of cycles.
These baseline profiles for random vibration and temperature cycling are not recommended stress levels; they should be used only as starting points to develop unique optimum profiles for a particular configuration. If the response levels in Tables 3-3 and 3-4 exceed the design capability of the unit and/or system, the contractor should submit appropriate rationale with supporting data to the Government for a waiver or deviation.
The most significant conclusion of ten years of random vibration screening is that the excitation must be tailored to the response experienced by the components of the unit under test. Because of the potential for highly resonant members and the presence of vibration-sensitive electro-optical and electromechanical devices, the selection of stress levels must be based on available data and on the structural design. To avoid potential fatigue or peak-level damage due to resonances, the input spectrum may be reduced at severe resonant frequencies that amplify the applied stress level by 6 dB or more. These resonances can be identified from data accumulated during development tests or by conducting a low-level sine sweep.
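A low-level sine sweep yields, at each frequency, the input acceleration at the fixture and the response at a point on the unit. The sketch below flags candidate notch frequencies where the amplification reaches 6 dB or more. It assumes the 6 dB refers to an acceleration amplitude ratio, so dB = 20*log10(response/input) and 6 dB is roughly a factor of 2 in g (about a factor of 4 in g^2/Hz). The sweep data and the function names are hypothetical.

```python
import math


def amplification_db(input_g: float, response_g: float) -> float:
    """Response relative to input in dB, treating g as an amplitude quantity."""
    return 20.0 * math.log10(response_g / input_g)


def notch_candidates(sweep: list[tuple[float, float, float]],
                     threshold_db: float = 6.0) -> list[float]:
    """Frequencies (Hz) where the response amplifies the input by >= threshold_db.

    Each sweep entry is (frequency_hz, input_g, response_g) from a low-level sine sweep.
    """
    return [freq for freq, g_in, g_out in sweep
            if amplification_db(g_in, g_out) >= threshold_db]


# Hypothetical sweep data: input held at 0.5 g, response read at one accelerometer.
SWEEP = [(100.0, 0.5, 0.6), (250.0, 0.5, 1.2), (400.0, 0.5, 0.9), (800.0, 0.5, 2.5)]
print(notch_candidates(SWEEP))  # [250.0, 800.0]
```

Only frequencies where the response was actually measured are checked, so a real sweep needs enough resolution to catch narrow resonances. Whether a flagged resonance is notched, or the unit stiffened instead, remains the engineering and approval decision described in the text.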
Notching (but not notching out) may be permitted with government approval, but should be the exception, not the general rule. Where warranted, temporary stiffening of the unit should also be considered to prevent overstressing during the stress screen. The design agency may find that the most economic approach is a minor design change to provide permanent stiffening. Whether temporary or permanent, the stiffening should be done in a manner which achieves the desired flat response throughout the unit being screened.
Temperature cycling screens must likewise be tailored to each specific equipment item; they are equipment-unique. Differences in components, materials and heat dissipation lead to variations in the thermal stresses throughout the item.
OPTIMIZING/TAILORING OF ESS
The Environmental Stress Screening Plan should allow the manufacturer to optimize a particular profile as needed, with government approval. The flexibility to change the screens as new parts, vendors, assemblies and new or alternate materials arise is also essential for a good ESS program.
For any given part or production process, there exists a level of ESS stress that is optimal, i.e., maximizes the likelihood of flaw detection without significant degradation of the unit undergoing ESS. Determining this optimal level is normally referred to as the optimization of a profile for an individual piece of equipment.
ESS tailoring (the modification of ESS parameters to fit specific hardware), if not planned and done properly, could be a major consumer of resources. Experience with similar hardware can be helpful in setting initial tailoring levels leading to a rough approximation of optimal parameters. However, a true optimization is likely to require an extensive, carefully planned effort.
Recommended tailoring techniques are given in Sections 4 and 5 for vibration screens and thermal cycling screens, respectively. These are not the only techniques available but are recognized throughout the industry as viable approaches for developing an acceptable profile. The selection and use of one or more of these techniques is usually predicated on such things as availability of screening equipment or cost of procurement, architecture of equipment to be screened and type of manufacturing defects expected, and maturity of design and manufacturing processes. Trade-offs are needed because the payoff between "reasonably good" and "optimal" ESS parameters may not be commensurate with the costs of finding the optimal profile.
Some specific engineering considerations in determining optimal ESS stress levels and making a sound engineering decision that tends to be on the conservative side (i.e., no overstressing) are as follows:
- Differences in physical characteristics such as thermal inertia, thermal conductivity, mechanical coupling, and mechanical resonant frequencies assure that differently configured assemblies will respond differently to identical thermal and vibrational inputs.
- Stress profiles should be defined in terms of responses rather than inputs. A uniform level of stress may not be achieved throughout the unit, because units are generally not internally homogeneous. Because the response can be specified and measured at only a few points, it will still differ locally within differently configured assemblies.
RELATIONSHIPS OF ESS TO OTHER ACTIVITIES IN PRODUCT DEVELOPMENT AND PRODUCTION
Since the primary purpose of ESS is to precipitate latent problems associated with the manufacturing processes, its effective use is predicated on good design with quality parts. Historically, ESS results show that failures due to workmanship are approximately two-thirds of the total, with the remaining third due to bad parts and poor design.
The ESS effort is expensive initially, particularly considering the associated costs of the capital investment. Additional recurring cost factors that will add to the overall cost include the utilities, failure analysis and corrective actions that go along with the associated FRACAS program and all the labor necessary to control the ESS program.
The ESS effort will be much more cost effective if it is not loaded down with failures due to an immature design and inferior parts. On the other hand, ESS is a major cost avoidance factor in manufacturing because the production process can be optimized, resulting in:
- Less teardown
- Less troubleshooting time
- Less failure reporting and corrective action
- Less repair time
- Less inspection time
- Less reassembly time
- Improved production personnel efficiency and proficiency
- More efficient utilization of production facilities
Parts Rescreening and Quality
Poor-quality piece parts play havoc with printed wiring assembly (PWA) yields, resulting in increased assembly rework, cost and scrap. Current guidelines being implemented by some Services call for 100% rescreening of microcircuits and semiconductors by Original Equipment Manufacturers (OEM) at receiving inspection. This is normally continued until a quality level of less than 100 defective parts per 1,000,000 parts shipped can be demonstrated. The emphasis is on vendor process control to improve the quality of parts to an acceptable level, rather than on OEM rescreening. For information on parts rescreening and quality, see References B.24 and B.1-19.
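The text does not say how the 100 parts-per-million level is demonstrated. One common approach, assumed here, is a zero-failure binomial demonstration: if n parts are tested and none is defective, the defect rate is below p with confidence C when (1 - p)^n <= 1 - C, so n = ceil(ln(1 - C) / ln(1 - p)). The function name and the confidence levels used below are hypothetical choices for illustration.

```python
import math


def zero_failure_sample_size(max_ppm: float, confidence: float) -> int:
    """Defect-free parts needed to show, at the given confidence, that the defect
    rate is below max_ppm (binomial model, zero defects allowed)."""
    p = max_ppm / 1_000_000
    # Smallest n with (1 - p) ** n <= 1 - confidence; log1p keeps ln(1 - p) accurate.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


print(zero_failure_sample_size(100, 0.90))  # 23025
print(zero_failure_sample_size(100, 0.60))  # 9163
```

Under this plan a single defective part means the demonstration has not been met; plans that allow one or more defects need larger samples.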
Test, Analyze and Fix Programs
TAAF reliability growth testing programs are used extensively by the Services to identify and correct design deficiencies in new systems while they are still in the engineering and manufacturing development phase. As mentioned in Reference B.1-20, ESS should precede final TAAF testing. This helps to minimize the occurrence of failures unrelated to design inadequacies, which tend to retard the TAAF process, lengthen its duration, and increase its total cost.
Reliability Demonstration and Production Reliability Acceptance Testing
All reliability predictions, demonstrations or tests are related to the system design and the quality of the parts used, and do not consider workmanship or process deficiencies. Therefore, ESS is a necessary prerequisite for success in any reliability quantification based on failures and operating time. The failures that occur during ESS are not counted in subsequent reliability demonstrations but are input to a FRACAS program to prevent recurrence. See References B.1-9 and B.1-10.
Failure Reporting and Corrective Action System
One of the best practices in successful system development efforts is the proper implementation of a FRACAS. As defined in MIL-STD-1629, FRACAS is a "closed-loop system for initiating reports, analyzing failures, and feeding back corrective actions into the design, manufacturing and test processes." Thus, ESS is an essential tie to the design and manufacturing processes during development and to statistical process control (SPC) of the manufacturing processes during production and depot repair.
SAMPLING VS 100% SCREENING
When an item has been in production for some time, manufacturing processes and purchased parts may have reached a steady state and be well controlled. Under these conditions, ESS will no longer be precipitating a significant number of failures. At this point, it can be argued that ESS is no longer productive and that resources could be conserved by discontinuing ESS. If it can be demonstrated that the decline in ESS failures is indeed due to improvements, and not to manufacturing changes that make the ESS conditions ineffective, suspension of 100 percent ESS may be considered. However, monitoring should be instituted to make sure that the improvements remain effective. The best way to accomplish this is to develop a sampling plan, with reversion to 100 percent ESS on evidence of loss of process control. One hundred percent ESS also should resume when processes, parts or sources are changed and after production breaks or new product introduction.
In most military contracts the production quantities are not sufficient to justify the effort necessary to go from 100% screening to a sampling procedure. See Reference B.1-12.
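Where a switch to sampling is justified, the sampling plan with reversion to 100 percent ESS amounts to a small state machine. The sketch below is hypothetical throughout: the class name, the every-Nth-unit sampling rule and the zero-failure reversion threshold are assumptions for illustration, not a prescribed plan. A real plan would be agreed with the Government and might use a standard attribute sampling scheme instead.

```python
from dataclasses import dataclass, field

# Events after which the text calls for a return to 100 percent ESS.
REVERSION_EVENTS = {"process change", "part change", "source change",
                    "production break", "new product"}


@dataclass
class EssSamplingPlan:
    sample_interval: int = 10     # assumed: screen every Nth unit while sampling
    max_sample_failures: int = 0  # assumed: any failure in a sample means reversion
    mode: str = "100%"
    _units_seen: int = field(default=0, init=False, repr=False)

    def approve_sampling(self) -> None:
        """Call only after showing the drop in ESS failures reflects real improvement."""
        self.mode = "sampling"
        self._units_seen = 0

    def must_screen(self) -> bool:
        """Decide whether the next unit off the line gets ESS."""
        self._units_seen += 1
        return self.mode == "100%" or self._units_seen % self.sample_interval == 0

    def record_failures(self, failures: int) -> None:
        """Evidence of loss of process control reverts the plan to 100 percent ESS."""
        if self.mode == "sampling" and failures > self.max_sample_failures:
            self.mode = "100%"

    def record_event(self, event: str) -> None:
        """Process, part or source changes, production breaks and new products revert."""
        if event not in REVERSION_EVENTS:
            raise ValueError(f"unknown event: {event!r}")
        self.mode = "100%"
```

Keeping the reversion triggers explicit makes it easy to audit why a line returned to 100 percent ESS.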
If there are many small and different modules in the equipment, the cost of vibration fixtures for these modules may be prohibitive, especially if each is powered and monitored. A compromise, in this case, may be to do power-off thermal cycling only at the module level and do both thermal cycling and random vibration at the next higher level.
Conversely, if some equipment items or cabinets are odd-shaped or have heavy cantilevered components, for example, then it may be more cost-effective to do only thermal cycling at this level and do both stress screens at a lower level of assembly. It is essential that these analyses result in a cost-effective program to precipitate manufacturing defects.
Even with relatively simple configurations and small module sizes, poorly designed mounting fixtures can severely distort the applied vibration spectrum and even cause unwanted failures due to structural resonances. Each vibration screen setup should ensure that the stress excitation is evenly applied to the product throughout the spectrum. Enough problems are encountered within the product without confounding the issue with resonances in the fixture. For example, fixture resonances and cost were countered in one program by suspending the product on "bungee" cords and using tri-axial excitation applied at the corner of the product.
Many temporary schemes can be used to damp excessive resonances within the product. These schemes include clamping, strapping or supporting the resonating area only for the duration of the vibration screen. Usually the amount of damping can be adjusted to obtain the desired responses.
PERFORMANCE MONITORING AND POWERED VERSUS UNPOWERED CONSIDERATIONS
In developing a screening program, an important consideration is whether the product should be powered or unpowered, monitored or unmonitored. Unless they are the end items, PWAs are usually unpowered because they aren't used as stand-alone items in the operational environment. In addition, appropriate screen equipment is usually not available to functionally monitor PWAs during the screening process. On the other hand, units and systems should be powered and monitored because they usually function as standalone items and appropriate test equipment is usually available to functionally monitor them.
During Thermal Screening
During thermal stress screening, whether performance monitoring should be required and/or when power should be applied are primarily determined by two factors:
- Without performance monitoring, intermittent failures may go undetected (this is an argument for performance monitoring with power applied).
- With power applied, the equipment may not be able to be cycled over a large temperature range without overstressing some parts (this is an argument for unpowered equipment).
The availability of electrical test equipment is traditionally limited, and conflicts arise between screening and bench test operations. In addition, schedules may be affected by the need to move and set up test equipment at each different location. If all of the failures that occurred were "hard failures" (i.e., failures that stay failed once they occur), performance monitoring might not be necessary. Unfortunately, many failures that occur in electronic hardware are intermittent and occur only while thermal stress is being applied.
Performance monitoring should be done at the lower and upper temperature limits of each thermal cycle. Monitoring at these limits will detect intermittent defects that would not show up at room temperature. Power need not be applied during the entire thermal screen; it can be turned off during the cooling portion of the cycle until the temperature has stabilized at the low limit. It is desirable to monitor performance, with power applied, during the cold-to-hot ramp. The degree of monitoring needs careful study for cost-effectiveness: any attempt to detect intermittent shutdowns as short as 2 to 3 milliseconds may be very expensive.
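The power and monitoring guidance above can be summarized as a per-phase schedule for one thermal cycle. The sketch below is a hypothetical encoding (the phase names and the `cold_stabilized` flag are assumptions); it keeps power off only during the cooling ramp, as the text permits, and leaves open how finely intermittents are sampled, which the text flags as a cost question.

```python
from enum import Enum


class Phase(Enum):
    HOT_TO_COLD_RAMP = "hot-to-cold ramp"
    COLD_DWELL = "cold dwell"
    COLD_TO_HOT_RAMP = "cold-to-hot ramp"
    HOT_DWELL = "hot dwell"


def power_and_monitor(phase: Phase, cold_stabilized: bool = True) -> tuple[bool, bool]:
    """Return (power_on, monitoring) for one phase of a thermal cycle."""
    if phase is Phase.HOT_TO_COLD_RAMP:
        return (False, False)  # power may be removed while cooling
    if phase is Phase.COLD_DWELL:
        # Re-apply power only once the item has stabilized at the low limit,
        # then check performance at the cold extreme.
        return (cold_stabilized, cold_stabilized)
    # Powered and monitored on the way up and at the hot extreme.
    return (True, True)


for phase in Phase:
    print(phase.value, power_and_monitor(phase))
```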
During Random Vibration
Industry has developed the following information about power on/power off random vibration screening:
- POWER OFF is of some value. When power is not applied, approximately 50% of the defects are not precipitated to failure, and none of the intermittent failures are identified.
- POWER ON, OUTPUTS INACTIVE is of greater value. When power is on, but the hardware is not operating, about 70% - 80% of the defects are stimulated to failure.
- POWER ON, OUTPUTS FULLY ON is of most value, in that all latent and intermittent defects are stimulated if the random vibration screen is effectively designed. However, not every defect susceptible to random vibration will be precipitated, since the screen is of limited duration.
CHAMBER AIR FLOW CHARACTERISTICS
When any item is subjected to thermal cycling, the temperature of the item lags that of the chamber air because of thermal inertia and imperfect heat transfer. The thermal lag, i.e., the difference between the chamber air and hardware temperatures, increases with increasing equipment thermal inertia and with decreasing air speed. The thermal lag is greatest for heavy assemblies and for low speed air cycled at high rates of temperature change.
If the chamber air temperature rate of change is too high, the dwell time too short, and/or the chamber air too slow, the part temperatures will not attain the chamber air temperature extremes, resulting in a less effective screen.
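The effect of thermal inertia can be illustrated with a lumped-capacitance model, assumed here for simplicity, in which a part's temperature T follows the chamber air with a single time constant tau (tau = m*c/(h*A), mass times specific heat over heat-transfer coefficient times surface area; faster air raises h and so shortens tau). The sketch integrates dT/dt = (T_air - T)/tau over a trapezoidal chamber profile. The temperatures, the 10 deg C/min ramp rate, the dwell and both time constants are hypothetical, and the function names are invented.

```python
def chamber_air(t: float, t_low: float, t_high: float, rate: float, dwell: float) -> float:
    """Chamber air temperature (deg C) at time t (min) for a trapezoidal cycle that
    starts at the beginning of a cold-to-hot ramp."""
    ramp = (t_high - t_low) / rate
    t %= 2.0 * (ramp + dwell)
    if t < ramp:
        return t_low + rate * t                    # cold-to-hot ramp
    if t < ramp + dwell:
        return t_high                              # hot dwell
    if t < 2.0 * ramp + dwell:
        return t_high - rate * (t - ramp - dwell)  # hot-to-cold ramp
    return t_low                                   # cold dwell


def part_extremes(tau: float, t_low: float = -40.0, t_high: float = 70.0,
                  rate: float = 10.0, dwell: float = 10.0,
                  cycles: int = 5, dt: float = 0.01) -> tuple[float, float]:
    """Coldest and hottest part temperature during the last simulated cycle,
    using forward-Euler integration of dT/dt = (T_air - T) / tau."""
    period = 2.0 * ((t_high - t_low) / rate + dwell)
    steps = round(period / dt)
    temp = t_low  # part starts soaked at the cold extreme
    lo, hi = float("inf"), float("-inf")
    for i in range(cycles * steps):
        temp += dt * (chamber_air(i * dt, t_low, t_high, rate, dwell) - temp) / tau
        if i >= (cycles - 1) * steps:  # only record the final cycle
            lo, hi = min(lo, temp), max(hi, temp)
    return lo, hi


print(part_extremes(tau=0.5))   # light part: reaches essentially -40 and +70 deg C
print(part_extremes(tau=30.0))  # heavy part: falls well short of both extremes
```

In this model a part lags a steady ramp by about rate x tau, so a fast chamber ramp paired with a short dwell leaves heavy parts far from the intended extremes, which is the situation described above.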
In thermal stress screening, the rate of change of temperature is as important as the temperature extremes. The faster the rate of change, the more effective the temperature stress screen. But it is the individual components that must experience a particular rate of change of temperature and temperature extremes. To attain the appropriate temperature rate of change and temperature extremes of the item being screened, there are several things that the screen designer may be able to do:
- Allow the ESS chamber to "overshoot" the temperature parameters. Overshooting is a method of achieving an increased temperature rate of change and higher/lower temperature extremes when the chamber air temperature exceeds the upper and lower screening temperature limits for a controlled period of time. Controlled overshooting is permissible and encouraged as an excellent method of achieving higher temperature rates of change, thereby increasing screen effectiveness. To avoid overstress at temperature extremes, the temperature of (or immediately adjacent to) the part with the smallest thermal mass should be monitored with thermocouples, if practical.
- If practical, remove the protective covers of the equipment, thus allowing the chamber air flow to more easily reach the individual components.
- Install an air circulating system. In many units, the electronic parts are densely packaged, thus increasing the thermal mass of the unit. As the thermal mass increases, the air flow becomes more restricted. To compensate for this, an air circulating system (e.g., a fan) can be installed to direct the air to the areas of the unit with the highest thermal mass, thus causing the components to experience a much greater temperature rate of change.
- PWAs and subassemblies which are not conformally coated may suffer damage or intermittent operation due to condensation in the chamber. Consideration should be given to using an air drying system or some other means of minimizing this condensation.
Repeated application of screens after correction of ESS flaws can very easily begin to use up significant useful life and to initiate rather than precipitate flaws. To avoid such counter-productive screening, the following guidelines are recommended:
- After repair of a failure during the first operating vibration screen, complete the remaining duration of the screen or five minutes, whichever is greater.
- After repair of a failure during the first non-operating vibration screen, repeat the screen as a confidence check at full level and 50% duration.
- Do not exceed five vibration exposures.
- After subsequent repairs and/or modifications, repeat the original screen at the -3 dB level (70% gRMS) for 50% duration.
- If a failure is detected and repaired during the initial thermal cycling screen, run the balance of the scheduled cycles or a minimum of three, whichever is greater.
- After subsequent repairs and/or modifications, run one complete thermal cycling screen.
The guidelines above should be used in conjunction with alerting Government and contractor program managers, and with an assessment of the appropriate amount of rescreening that takes into account the nature of the repair or modification, the amount of teardown, rework and reassembly involved, and the chance of introducing workmanship flaws. Such assessments are appropriately made through Corrective Action Board/Failure Review Board actions.
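The rescreening rules listed earlier are mechanical enough to encode, which can help a screening station apply them consistently before the board assessment. The sketch below is a hypothetical encoding (all function and class names are invented); it interprets "the balance of the cycles scheduled, or a minimum of three" as whichever is greater, and computes the -3 dB gRMS factor rather than using the rounded 70%.

```python
from dataclasses import dataclass

MAX_VIBRATION_EXPOSURES = 5  # "Do not exceed five vibration exposures."


@dataclass
class VibrationRescreen:
    grms_factor: float  # multiplier on the original screen's gRMS
    minutes: float      # duration of the rescreen


def vibration_rescreen(original_minutes: float, remaining_minutes: float,
                       first_screen: bool, operating: bool,
                       exposures_so_far: int) -> VibrationRescreen:
    """Vibration rescreen after a repair (hypothetical encoding of the guidelines)."""
    if exposures_so_far >= MAX_VIBRATION_EXPOSURES:
        raise ValueError("a further exposure would exceed five; refer to the review board")
    if first_screen and operating:
        # Finish the interrupted screen, but run at least five minutes.
        return VibrationRescreen(1.0, max(remaining_minutes, 5.0))
    if first_screen:
        # Non-operating first screen: confidence check at full level, half duration.
        return VibrationRescreen(1.0, 0.5 * original_minutes)
    # Later repairs/modifications: -3 dB on the PSD roughly halves g^2/Hz, so gRMS
    # scales by 10 ** (-3 / 20), about 0.708 (the "70% gRMS" in the text).
    return VibrationRescreen(10 ** (-3 / 20), 0.5 * original_minutes)


def thermal_rescreen_cycles(scheduled_cycles: int, remaining_cycles: int,
                            first_screen: bool) -> int:
    """Thermal cycles to run after a repair (hypothetical encoding)."""
    if first_screen:
        return max(remaining_cycles, 3)
    return scheduled_cycles  # one complete thermal cycling screen


print(vibration_rescreen(10.0, 2.0, first_screen=True, operating=True, exposures_so_far=1))
print(thermal_rescreen_cycles(10, 1, first_screen=True))
```

The encoding deliberately refuses a sixth exposure instead of silently capping it, so the decision returns to the Corrective Action Board/Failure Review Board.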