CISUC

Design of Multi-Threaded Fault-Tolerant Connection-Oriented Communication

Authors

Abstract

Fault tolerance is vital for dependable distributed applications that can deliver service even in the presence of faults. Over the last few decades, among all the protocols proposed to offer reliability and fault tolerance, TCP has grown to become one of the cornerstones of the Internet. However, despite emulating reliable communication in distributed environments, TCP does not handle connection failures when connectivity is lost for some time, even if both endpoints are still running. When this occurs, developers must roll back the peers to some coherent state, often with error-prone, ad hoc, or custom application-level solutions.
In this paper, we refine the Acceptor-Connector design pattern to tackle the TCP unreliability problem. The pattern decouples failure-related processing from connection and service processing and efficiently handles different connections and their possible crashes concurrently, thereby yielding more reusable, extensible, and efficient distributed communication. The solution we propose incorporates proven multi-threaded solutions and a buffering scheme that removes the need for an application-layer acknowledgment scheme. This simplifies the development of reliable connection-oriented applications using the ubiquitous TCP protocol.

Keywords

TCP, Connection Failure, Fault Tolerance, Design Pattern, Multi-Threading

Subject

Fault Tolerance

Related Project

iCIS - Intelligent Computing in the Internet of Services

Conference

The 20th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2014), November 2014

DOI

