Topic: 19.3 Distributed Transactions

Core Concept:

  • A distributed transaction involves accessing and updating data at multiple sites (databases) in a distributed system.
  • The goal is to maintain the ACID properties (Atomicity, Consistency, Isolation, Durability) even when data is spread across different locations.

Key Challenges:

  • Atomicity: Ensuring that either all parts of the transaction (across all sites) commit or all parts abort, despite failures.
  • Consistency: Maintaining data consistency across replicated or fragmented data.
  • Isolation: Ensuring that concurrent distributed transactions do not interfere with each other.
  • Durability: Ensuring that committed changes are permanent, even if sites fail.
  • Handling Failures: Dealing with site failures, network partitions, and communication failures gracefully.

19.3.1 System Structure:

  • Transaction Manager (TM): Each site has a local TM responsible for:
    • Maintaining a log for recovery.
    • Participating in concurrency control.
  • Transaction Coordinator (TC): A process (usually at the site where the transaction originates) that coordinates the execution of a distributed transaction. It is responsible for:
    • Starting the transaction.
    • Breaking it into subtransactions for each site.
    • Coordinating the commit/abort process (e.g., using 2PC or 3PC).

19.3.2 System Failure Modes:

  • Site Failure: A site (database server) crashes.
  • Link Failure: Communication link between sites goes down.
  • Message Loss: Messages exchanged between sites are lost.
  • Network Partition: The network splits into two or more parts, preventing communication between sites in different partitions.

Key Considerations:

  • Local Transactions: Transactions that access data only at a single site.
  • Global Transactions: Transactions that access data at multiple sites (distributed transactions).
  • Commit Protocols (e.g., 2PC, 3PC): Essential for ensuring atomicity in distributed transactions.
  • Concurrency Control: Mechanisms (e.g., locking, timestamping) to ensure isolation and serializability in a distributed environment.
  • Recovery: Mechanisms to handle failures and restore data consistency.