Consistent Online Backup in Transactional File Systems

Date
2012
Abstract
A consistent backup, preserving data integrity across files in a file system, is of utmost importance for correctness and for minimizing system downtime during data recovery. With the present day demand for continuous access to data, a backup has to be taken of an active file system, putting the consistency of the backup copy at risk. We propose a scheme referred to as mutual serializability to take a consistent backup of an active file system, assuming that the file system supports transactions. The scheme extends the set of conflicting operations to include read-read conflicts, and it is shown that if the backup transaction is mutually serializable with every other transaction individually, a consistent backup copy is obtained. The user transactions continue to serialize among themselves using some standard concurrency control protocol such as Strict 2PL. Starting by considering only reads and writes, we extend the scheme to include file operations such as directory operations, file descriptor operations and operations such as append, truncate, rename, etc., as well as operations that insert and delete files. We put our scheme into a formal framework to prove its correctness, and the formalization as well as the correctness proof is independent of the concurrency control protocol used to serialize the user transactions. The formally proven results are then realized by a practical implementation and evaluation of the proposed scheme. In the practical implementation, applications run as a sequence of transactions, and under normal circumstances, when the backup program is not active, they simply use any standard concurrency control technique such as locking or timestamp based protocols (Strict 2PL in the current implementation) to ensure consistent operations.
Once the backup program is activated, all other transactions are made aware of it by some triggering mechanism, and they then need to serialize themselves with respect to the backup transaction as well. If at any moment a conflict arises while establishing the pairwise mutually serializable relationship, the conflicting user transaction is either aborted or paused to resolve the conflict. We ensure that the backup transaction completes without ever having to roll back by ensuring that it reads only from committed transactions and by never choosing it as the victim when resolving a conflict. To simulate the proposed technique, we designed and implemented a user space transactional file system prototype that exposes ACID semantics to all applications. We simulated the algorithms devised to realize the proposed technique and ran experiments to help tune the algorithms. The system was simulated through workloads exhibiting a wide range of access patterns, and experiments were conducted on each workload in two scenarios: one with the mutual serializability protocol enabled (thus capturing a consistent online backup) and one without (thus capturing an inconsistent online backup). The results obtained from the two scenarios were compared to calculate the overhead incurred while capturing a consistent backup. The performance evaluation shows that, for workloads resembling most present day real workloads, which exhibit low inter-transactional sharing and actively access only a small percentage of the entire file system space, the scheme has very little overhead (2.5% in terms of transactions conflicting wit.
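The pairwise check described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation; it assumes one plausible reading of mutual serializability, namely that because read-read conflicts count, every access a user transaction makes must fall on the same side of the backup: either all on files the backup has not yet copied (the transaction serializes before the backup) or all on files it has already copied (the transaction serializes after it). The names `BackupTracker`, `backup_read`, `user_access`, `Side` and `Decision` are hypothetical, and the choice of when to pause rather than abort is an assumption of this sketch.

```python
from enum import Enum


class Side(Enum):
    UNDECIDED = "undecided"  # transaction has not touched any file yet
    BEFORE = "before"        # all accesses so far were to files not yet copied
    AFTER = "after"          # all accesses so far were to files already copied


class Decision(Enum):
    PROCEED = "proceed"
    PAUSE = "pause"
    ABORT = "abort"


class BackupTracker:
    """Hypothetical bookkeeping for one active backup transaction."""

    def __init__(self):
        self.copied = set()  # files the backup transaction has already read
        self.side = {}       # user transaction id -> Side

    def backup_read(self, path):
        # The backup is never the victim; it only records what it has copied.
        self.copied.add(path)

    def user_access(self, txn, path):
        # Reads and writes are treated alike, since read-read conflicts count.
        needed = Side.AFTER if path in self.copied else Side.BEFORE
        current = self.side.get(txn, Side.UNDECIDED)
        if current in (Side.UNDECIDED, needed):
            self.side[txn] = needed
            return Decision.PROCEED
        if current is Side.AFTER:
            # Assumed policy: wait until the backup has copied this file too,
            # then retry; the transaction stays on the "after" side.
            return Decision.PAUSE
        # A "before" transaction touching an already copied file cannot be
        # ordered consistently with the backup, so this sketch aborts it.
        return Decision.ABORT
```

A short usage example: transaction `T1` touches file `a` before the backup copies it and is later aborted when it returns to `a`, while `T2` starts on the copied file and is paused until `b` is copied as well.

```python
tracker = BackupTracker()
print(tracker.user_access("T1", "a"))  # Decision.PROCEED
tracker.backup_read("a")
print(tracker.user_access("T1", "a"))  # Decision.ABORT
print(tracker.user_access("T2", "a"))  # Decision.PROCEED
print(tracker.user_access("T2", "b"))  # Decision.PAUSE
```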
Description
Supervisor: Gautam Barua
Keywords
COMPUTER SCIENCE AND ENGINEERING