CISUC

WinFT: Software Implemented Fault Tolerance for Win32 Applications.

Authors

Abstract

The growing number of business-critical applications targeted at the low-end PC market and at general-purpose operating systems such as Microsoft Windows 95/NT raises the problem of guaranteeing continuous availability and data integrity for those applications. This class of programs includes online transaction processing, electronic commerce, Internet/World Wide Web, data warehousing, decision support, online analytical processing (OLAP), control systems, and other business-critical solutions for the finance, retail, and healthcare markets. These are mainly business server applications that run 24 hours a day, i.e., that provide a service to their clients continuously and therefore demand high availability, and/or applications that manage business-critical data and therefore require high levels of data integrity. Traditionally, this software runs on highly specialized machines with special operating system support in order to provide the required levels of dependability through fault tolerance [6] [7]. Using cheap off-the-shelf PCs and operating systems without any fault tolerance support for this purpose, as is common nowadays, can be risky, unreliable and dangerous.
This article presents a library, named WinFT, that provides fault tolerance support for Win32 applications (Windows 95/NT) that need high levels of dependability. In this approach, fault tolerance is implemented entirely in software and enables the detection of, and recovery from, software faults (bugs) that cause processes to crash or hang, as well as faults in the underlying operating system and hardware that are not handled by the O.S. and hardware themselves. Basically, WinFT performs automatic detection and restart of failed processes, diagnosis and reboot of a malfunctioning or stalled operating system, checkpointing and recovery of critical volatile data, and preventive actions such as software rejuvenation. WinFT was employed successfully in real-time industrial control applications based on personal computers running Windows 95 in a workgroup environment.
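
As an illustration only, the detect-and-restart idea mentioned in the abstract can be sketched in a few lines of Python. This is not WinFT's API, which this record does not describe: the function name `supervise` and its parameters `max_restarts` and `poll_interval` are hypothetical, and the sketch covers only crashed processes (a non-zero exit status), not hung ones, checkpointing, or operating system reboot.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=0.05):
    """Run cmd and restart it each time it exits abnormally.

    Hypothetical helper for illustration. Returns the number of
    restarts performed before a clean exit (status 0), or raises
    RuntimeError once max_restarts restarts have been used up.
    """
    restarts = 0
    while True:
        child = subprocess.Popen(cmd)
        # Poll instead of blocking: a fuller watchdog could also check
        # heartbeats in this loop to detect a hung process.
        while child.poll() is None:
            time.sleep(poll_interval)
        if child.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError("giving up after %d restarts" % restarts)
        restarts += 1
```

A call such as `supervise([sys.executable, "server.py"])`, where `server.py` is a placeholder script name, would keep relaunching the script until it exits cleanly or the restart budget is exhausted. Bounding the number of restarts is a design choice of this sketch, so that a process that fails immediately on every start does not loop forever.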

Subject

Fault-Tolerance in Control Systems

Journal

BYTE, pp. 51-52, February 1997

Cited by

Year 1999: 1 citation

 1. Peter M. Chen and David E. Lowell, "Reliability Hierarchies," 1999 Workshop on Hot Topics in Operating Systems (HotOS), March 29-30, 1999, Rio Rico, Arizona, USA.