CISUC

Practical Issues in the Use of ABFT and a New Failure Model

Authors

Abstract

In this paper we study the behavior of Algorithm Based Fault Tolerance (ABFT) techniques under faults injected according to a quite general fault model. Besides the problem of roundoff error in floating-point arithmetic, we identify two further weak points: the lack of protection of data during input and output, and the incorrect execution of the correctness checks. We propose the Robust ABFT technique to handle these weak points. We then generalize it to programs that use assertions, where similar problems arise, leading to the technique of Robust Assertions, whose effectiveness is shown by fault injection experiments on a realistic control application. With this technique a system follows a new failure model, which we call Fail-Bounded: with high probability, every result produced is either correct or, if wrong, within a certain bound of the correct value, where the exact bound depends on the output assertions used. We claim that this failure model is very useful for describing the behavior of many low-redundancy systems.
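To make the three weak points named above concrete, the following is a minimal, hypothetical sketch of a checksum-based ABFT matrix-vector product. It is not the authors' implementation: the names `protect`, `verify` and `checked_matvec` are invented for illustration, and the fixed absolute tolerance `tol` is an assumption (a practical check would scale the tolerance with the magnitude of the data). It shows a roundoff-tolerant checksum comparison, a checksum that travels with the data from input to output, and a correctness check that is evaluated twice so that a single faulty evaluation cannot accept a wrong result on its own.

```python
import math


def protect(values):
    """Pair a data vector with its checksum so later corruption can be detected."""
    return list(values), math.fsum(values)


def verify(values, stored_sum, tol):
    """Recompute the checksum and compare it with the stored one, allowing for roundoff."""
    return abs(math.fsum(values) - stored_sum) <= tol


def checked_matvec(A, x_protected, tol=1e-9):
    """Hypothetical ABFT-style y = A x with protected input/output and a duplicated check."""
    x, x_sum = x_protected
    # Input protection: reject x if it changed after its checksum was computed.
    if not (verify(x, x_sum, tol) and verify(x, x_sum, tol)):
        raise ValueError("input checksum mismatch")
    # Checksum row of A (its column sums): in exact arithmetic sum(A x) == col_sums . x.
    col_sums = [math.fsum(col) for col in zip(*A)]
    y = [math.fsum(a * b for a, b in zip(row, x)) for row in A]
    expected = math.fsum(c * b for c, b in zip(col_sums, x))
    # Correctness check evaluated twice; both evaluations must accept the result.
    if not (verify(y, expected, tol) and verify(y, expected, tol)):
        raise ArithmeticError("ABFT check failed")
    # The output leaves protected by its own checksum.
    return protect(y)
```

For example, `checked_matvec([[1.0, 2.0], [3.0, 4.0]], protect([1.0, 1.0]))` returns `([3.0, 7.0], 10.0)`, while altering an element of the input list after `protect` has been called makes the input check raise `ValueError`.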

Keywords

Failure models, assertions, ABFTs, error detection methods, fault injection.

Subject

Failure Models

Conference

FTCS-28, June 1998


Cited by

Year 2009: 1 citation

Robert Granat, Kiri L. Wagstaff, Benjamin Bornstein, Benyang Tang, and Michael Turmon, "Simulating and Detecting Radiation-Induced Errors for Onboard Machine Learning", Proceedings of the Third IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT), Pasadena, CA, USA, 19-23 July 2009.

Year 2008: 1 citation

Daniel Skarin, Johan Karlsson, "Software Implemented Detection and Recovery of Soft Errors in a Brake-by-Wire System", Seventh European Dependable Computing Conference, EDCC 2008, Kaunas, Lithuania, 7-9 May 2008.

Year 2005: 3 citations

Sumant Kowshik, Girish Baliga, Scott Graham, Lui Sha, "Co-design Based Approach to Improve Robustness in Networked Control Systems", IEEE/IFIP International Conference on Dependable Systems and Networks, Performance and Dependability Symposium, DSN-PDS 2005, Yokohama, Japan, 28 June - 1 July 2005.

Jonny Vinter, Olaf Hannius, Torbjorn Norlander, Peter Folkesson, Johan Karlsson, "Experimental Dependability Evaluation of a Fail-Bounded Jet Engine Control System for Unmanned Aerial Vehicles", IEEE/IFIP International Conference on Dependable Systems and Networks, Performance and Dependability Symposium, DSN-PDS 2005, Yokohama, Japan, 28 June - 1 July 2005.

Jonny Vinter, "On the Effects of Soft Errors in Embedded Control Systems", PhD Thesis, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden, 2005, ISBN 91-7291-630-3.

Year 2004: 1 citation

Christoforos N. Hadjicostis, "Coding Techniques for Fault-Tolerant Parallel Prefix Computations in Abelian Groups", The Computer Journal, 47(3):329-341, © The British Computer Society, 2004.

Year 2003: 4 citations

J. Z. Lou, "Tests and Tolerances for High-Performance Software-Implemented Fault Detection", IEEE Transactions on Computers, Vol. 52, No. 5, May 2003, pp. 579-591. (Science Citation Index Expanded)

M. Turmon, R. Granat, D. S. Katz, J. Z. Lou, "Tests and Tolerances for High-Performance Software-Implemented Fault Detection", IEEE Transactions on Computers, Vol. 52, No. 5, May 2003, pp. 579-591. (Science Citation Index Expanded)

Örjan Askerdal, "On Impact and Tolerance of Data Errors with Varied Duration in Microprocessors", PhD Thesis, Chalmers University of Technology, Göteborg, Sweden, 2003, ISBN 91-7291-285-5.

Jonny Vinter, A. Johansson, P. Folkesson, and J. Karlsson, "On the Design of Robust Integrators for Fail-Bounded Control Systems", IEEE/IFIP International Conference on Dependable Systems and Networks, Dependable Computing and Communications, DSN-DCC 2003, San Francisco, CA, USA, pp. 415-424, 22-25 June 2003.

Year 2002: 1 citation

J. Norberg, T. Ersson, J. Vinter, M. Torngren, P. Folkesson, J. Karlsson, "A Co-design Approach for Error Handling in Computer Control Systems", supplemental volume of the IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-2002, Bethesda, Maryland, USA, pp. B-22-B23, June 23-26, 2002.