# Design for Reliability

Samiha Mourad\* and Hideo Fujiwara\*\*

\* Santa Clara University, Santa Clara, California, USA, smourad@scu.edu

\*\* Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, JAPAN, fujiwara@is.naist.jp

Abstract – This paper explores an important aspect of VLSI design: how the design process can assure that a product is reliable. While reliability is closely related to yield, we will show that improving yield is not sufficient to assure high reliability. This is a first stage in exploring the topic, and further study is needed to reach a practical approach to design for reliability. In this paper we explore a methodology to guide the engineer's design choices toward an optimal implementation of reliable VLSI design.

Keywords - Reliability, Testing, IC Design, Yield, Co-synthesis, nanoelectronics

#### I. INTRODUCTION

Traditionally, the word "reliability" in electronic design alludes to IC fabrication and the special steps taken during the physical design process to minimize process defects and to increase the yield. Many studies have been conducted to improve yield, but without evidence of improved reliability. How to improve reliability is a fundamental question that needs to be answered, particularly in the context of present nano-size technology features. In addition, this question should be addressed not only at the back end of the design but also at the front end. As the technology feature size approaches the order of a few nanometers, more numerous and newer defects are encountered. To increase the reliability of the product, it is important to consider the different steps taken during the whole design process, from concept to fabrication, including the management. In this presentation, an attempt is made to provide a framework to assess and improve reliability from the early stage of the design to the completion of the product.
In the rest of this paper, we first give the motivation for raising the issue in Section II; then, in Section III, we contrast yield and reliability. In Section IV we define reliability defects, and in Section V we relate reliability to the different phases of design. Section VI explores the scope of Design for Reliability.

# II. MOTIVATION

Electronic products are globally ubiquitous. The electronics market is driven by consumers who are obsessed with newer gadgets and higher-speed products at lower cost. Most of these products include many functions that are not necessary and are not often used by consumers. For example, a picture and web phone has actually become an entertainment center. To meet this growth, the industry is struggling with, on the one hand, the challenge of designing nanoelectronic products whose lifetime is shorter than the time to market and, on the other, the continual rise in awareness of reliability and sustainability. To optimize cost and profitability, many efforts concentrate on improving the yield.

# III. YIELD AND RELIABILITY

Yield is the ratio of the number of known good chips to the total number of chips at the beginning of production. It is the most important index of IC manufacturing, and increasing the yield makes good business sense. Although many yield models are used, the value of this statistical parameter is a function of the different stages through which the product passes, as illustrated by Fig. 1. The yield may be improved with higher-precision fabrication and packaging and with increased fault coverage. Usually it is also improved by a burn-in process before shipping.

Figure 1. Stages of yield calculation (Kuo 1999)

Reliability, on the other hand, is the ability of a system or component to perform its required functions under stated conditions for a specified period of time [IEEE'90].
It is given by:

$$R = e^{-t/m} = e^{-\lambda t} \tag{1}$$

where $R$ is the reliability (probability of success), $t$ is the mission time, $m$ is the Mean Time Between Failures (MTBF), and $\lambda$ is the failure rate (1/MTBF). Equation 1 holds for a constant failure rate. In practice, the failure rate is a function of time and varies through the lifetime of the product, as illustrated by the well-known bathtub curve. At the beginning of the product life, the reliability can be expressed as a function of the defect level, $D_L$, the fraction of shipped chips that are defective, so that the fraction of chips that are not defective is $R = 1 - D_L$. There are different ways of expressing $D_L$ in terms of the yield, $Y$, and the fault coverage, $T$, among these:

$$R = 1 - D_L = Y^{1-T} \qquad \text{[Kim 1998]} \tag{2}$$

Reliability is a function of quality and time, while yield is a function of quality at a certain time. A high yield does not guarantee reliable ICs; conversely, yield may be low while reliability is high, as indicated in Fig. 2. Scattered evidence showing correlation between yield and reliability is available [Kim 1998], [Jensen 1999] and [Huston 1992]. Nevertheless, such evidence does not constitute a proof. Correlation simply indicates possibility. Some evidence from IBM is given in [Melan 1982], [Jensen 1988] and [Stapper 1982].

If the testing process is thorough, both the yield and the reliability are affected by the fault coverage, as indicated by the last expression, Eq. 2. This explains the emphasis on testing and design for testability, particularly in the 1980s: full scan design and BIST for RAMs and logic circuits. Testing uncovers mostly manufacturing defects and incorrect design. However, good testability is not a guarantee of product endurance unless the fault coverage includes reliability defects.

#### IV. RELIABILITY DEFECTS

While industry is placing its emphasis on catastrophic defects, which are usually of large size or in large clusters, defects undermining reliability are relatively small.
The industry is faced with a fab process that is constantly changing while the design process is assumed to remain stable. This is a wrong assumption, since reliability is affected by non-catastrophic defects. A very common distribution of defect sizes is shown in Fig. 3. For example, gate oxide defects will result in yield loss or reliability failure depending on the size of the defects, as illustrated in Fig. 4. The relation between yield and reliability defects has been expressed as

$$R = Y^{k} \tag{6}$$

$$R = (Y/M)^{\alpha} \tag{7}$$

where $k = A_r/A_y$ [Huston 1995], $\alpha = D_r/D_y$, and $M$ is a clustering effect [Kuper 1996]. $A$ and $D$ are the areas and densities associated with reliability defects (subscript $r$) and yield defects (subscript $y$).

Reliability defects are small in size. They are activated by the use of the product (utilization). Examples are thin oxide (including leakage), large current densities, overheating, and tight pitch of interconnects. The design process is unable to cope with the constant scaling down of the technology, and only weak interaction exists between logic and physical synthesis.

Figure 3. Distribution of Defects

All these demands create a problematic situation because of the characteristics of present ICs, such as the newest communication and networking ICs, which are DSP based and use mixed signals (digital and analog), or the system on a chip (SOC) paradigm, where embedded software, control, data path, DRAMs, etc. make the system larger and more complex.

Fig. 4. Type of failure according to defect size.

### V. RELATING RELIABILITY TO DESIGN

The failure rate is given by

$$\lambda = \lambda_0 \pi_T \pi_E \pi_Q \pi_F \pi_M \tag{3}$$

where each multiplier $\pi$ is a factor that reflects the technology, the environment $E$, the quality of the process $Q$, and the type of circuit $F$.
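The multiplicative structure of Eq. 3 and its link to Eqs. 1 and 6 can be illustrated with a minimal numerical sketch. The function names and all numeric values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the cited models. Eq. 1 is applied under its constant-failure-rate assumption.

```python
import math
from typing import Mapping


def failure_rate(base_rate: float, multipliers: Mapping[str, float]) -> float:
    """Eq. 3: base failure rate scaled by the product of all pi multipliers."""
    return base_rate * math.prod(multipliers.values())


def reliability(mission_time: float, lam: float) -> float:
    """Eq. 1: R = exp(-lambda * t), assuming a constant failure rate."""
    return math.exp(-lam * mission_time)


def reliability_from_yield(yield_fraction: float, k: float) -> float:
    """Eq. 6: R = Y**k, with k = A_r / A_y."""
    return yield_fraction ** k


# Hypothetical multiplier values (not from any handbook or from the paper).
pi = {"T": 1.0, "E": 2.0, "Q": 0.5, "F": 1.5, "M": 1.0}
lam = failure_rate(base_rate=1e-6, multipliers=pi)  # failures per hour
r_mission = reliability(mission_time=10_000.0, lam=lam)
r_yield = reliability_from_yield(yield_fraction=0.81, k=0.5)

print(f"lambda = {lam:.3e} per hour")
print(f"R(10,000 h) = {r_mission:.4f}")
print(f"R from yield = {r_yield:.2f}")
```

Because the multipliers enter as a product, any single factor greater than one raises the failure rate in proportion, which is why the choice of what the circuit multiplier $\pi_F$ represents matters for the discussion that follows.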
Originally, the Circuit Type Multiplier was held constant, but subsequent findings redefine it in terms of a more specific type of circuit, such as a PLA or a RAM:

$$\pi_F = \pi_L \pi_{RAM} \cdots \pi_{PLD} \qquad \text{[Stapper 1995]} \tag{4}$$

These multipliers underestimate the elaborate design cycle encountered in modern products. The product's reliability also depends on the design cycle stages, the CAD tools used, and the management of the project. We propose a multiplier that includes the different design processes: logic synthesis, physical design, and testing:

$$\pi_F = \pi_{PD} \pi_{LD} \cdots \pi_T \tag{5}$$

# VI. DESIGN FOR RELIABILITY

The main issue, therefore, is how to minimize potential problems that may be caused by reliability defects. Moreover, how can the effects of the design stages be reflected in the design multiplier? In a future paper, we will respond to these questions by investigating the following issues and making recommendations to improve reliability.

1. Co-synthesis Paradigm: Combining logic and physical synthesis in vertical integration [Salek 1999] and concurrent integration [Pedram 1998]. Also the need for specifying the boundary protocol for hardware/software co-design.
2. Verification: Simulation [MacMillen], emulation, and formal verification.
3. Testing: Which fault model to emphasize? Representation, if possible, of design utilization failures. Noise margin faults: faults caused by violating the noise margin, e.g. due to crosstalk. Effect of analog circuits and their testing. Incorporating the effects of current density in an existing fault model.
4. General Recommendations such as:
   - More emphasis on system level; divide and conquer.
   - Reuse, don't build.
   - Fault tolerant design.
   - Design for tolerance of defects.
5. CAD Tools: We can design with CAD tools but we cannot rely solely on them. Lack of accurate parasitic calculations [Breuer 2002].
6. Emphasis on theoretical research.
7. Reliability and Management.

#### VII. SUMMARY AND FUTURE WORK

This paper explored the space for design for reliability and indicated the need to consider *Reliability* as a VLSI attribute that is distinguishable from *Yield*. Seven different areas to improve reliability have been identified and will be explored more fully in subsequent work.

# **ACKNOWLEDGMENT**

The authors wish to thank Dr. Santanu Dutta of Philips, Sunnyvale, CA for his comments on management. The work was pursued while on sabbatical at the Center for Design and Test of Nara Institute of Science and Technology (NAIST), Nara, Japan.

#### REFERENCES

- D. MacMillen, M. Butts, R. Camposano, D. Hill, T.W. Williams, "An Industrial View of Electronic Design Automation," IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, Vol. 19, No. 12, Dec 2000.
- W. Kuo, T. Kim, "An Overview of Manufacturing Yield and Reliability Modeling for Semiconductor Products," Proceedings of the IEEE, vol. 87, no. 8, pp. 1329-1344, Aug. 1999.
- W. Willing, A. Helland, "Establishing ASIC Fault-Coverage Guidelines For High-Reliability Systems," 1998 Proceedings Annual Reliability and Maintainability Symposium.
- J. Humprey, G. Luettgenau, "Reliability Considerations in Design and Use of RF Integrated Circuits," Motorola Semiconductor Applications Notes- AN1025A.
- C. H. Stapper and R. J. Rosner, "Integrated circuit yield management and yield analysis: Development and implementation," IEEE Trans. Semiconduct. Manufact., vol. 8, pp. 95-102, May 1995.
- D. F. Frost and K. F. Poole, "A method for predicting VLSI device reliability using series models for failure mechanisms," IEEE Trans. Reliability, vol. R-36, pp. 234-242, 1987.
- J. L. Stevenson and J. A. Nachlas, "Microelectronics reliability predictions derived from components defect densities," in Proc. Annu. Reliability and Maintainability Symp., 1990, pp. 366-371.
- J. G. Prendergast, "Reliability and quality correlation for a particular failure mechanism," in Proc. Int. Reliability Physics Symp., 1993, pp. 87-93.
- B. El-Kareh, A. Ghatalia, and A. V. S. Satya, "Yield management in microelectronic manufacturing," in Proc. 45th Electronic Components Conf., 1995, pp. 58-63.
- T. W. Williams and N. C. Brown, "Defect level as a function of fault coverage," IEEE Trans. Comput., vol. C-30, pp. 987–988, Dec. 1981.
- H. H. Huston and C. P. Clarke, "Reliability defect detection and screening during processing—Theory and implementation," in Proc. Int. Reliability Physics Symp., 1992, pp. 268–275.
- F. Kuper, J. van der Pol, E. Ooms, T. Johnson, R. Wijburg, W. Koster, and D. Johnston, "Relation between yield and reliability of integrated circuits: Experimental results and application to continuous early failure rate reduction programs," in Proc. Int. Reliability Physics Symp., 1996, pp. 17-21.
- T. Kim, W. Kuo, and W. T. Chien, "A relation model of yield and reliability for gate oxide failures," in Proc. 1998 Annu. Reliability and Maintainability Symp., Anaheim, CA, Jan. 19–22, 1998, pp. 428–433.
- G.V. der Plas, J. Vandenbussche, G.G.E. Gielen, W. Sansen, "A Layout Synthesis Methodology for Array-Type Analog Blocks," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 21, No. 6, Jun. 2002.
- S.-M. Tang, "New burn-in methodology based on IC attributes, family IC burn-in data, and failure mechanism analysis," Proc. Annual Reliability and Maintainability Symposium, pp. 185-190.
- F. Jensen, "Yield, quality and reliability: a natural correlation,"
- S. DasGupta, "SOC: What will it take," Proc. ISCAS 2001, pp.
- M. Kakumu and M. Kinugawa, "Power-supply voltage impact on circuit performance for half and lower submicrometer CMOS LSI," IEEE Trans. on Electron Devices, Vol. 17, No. 8, Aug. 1990.