19.5.1 Locking Protocols in Distributed Databases

  • Context: Adapts centralized locking schemes (Chapter 15) for use in a distributed environment. The primary goal is still to ensure serializability of transactions. The key difference is handling data that might be replicated across multiple sites.
  • Assumption: Each site participates in a commit protocol (like 2PC) to guarantee global atomicity.
  • Basic Lock Modes: We assume the existence of shared (read) and exclusive (write) locks, as in centralized systems.

Key Challenge: Lock Management

The main issue is how and where to manage locks, especially when data is replicated. Different approaches have different trade-offs in terms of:

  • Overhead: Number of messages required for lock/unlock operations.
  • Bottlenecks: Risk of a single site becoming overloaded.
  • Vulnerability: Impact of site failures on the locking mechanism.
  • Deadlock handling complexity.
  • Availability: whether a data item can still be locked when some of the sites/replicas containing it are inaccessible.

Different Locking Approaches

19.5.1.1 Single Lock-Manager Approach

  • Mechanism:

    • A single designated site, say Si, acts as the lock manager for all data items (regardless of where the data resides).
    • All lock and unlock requests are sent to Si.
    • If a lock can be granted, Si sends a grant message to the requesting site.
    • If a lock cannot be granted (conflict), the request is queued.
    • Reads can access any replica.
    • Writes must update all replicas.
  • Advantages:

    • Simple implementation.
    • Easy deadlock handling (all lock information is at one site, so centralized deadlock detection algorithms can be used).
  • Disadvantages:

    • Si becomes a bottleneck, since all lock and unlock requests go through it.
    • Vulnerable: If Si fails, the entire locking mechanism is lost (unless a backup/recovery scheme is used).
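The single lock-manager scheme can be sketched in a few lines. This is an illustrative model only: `SingleLockManager`, `acquire`, and `release` are assumed names, and inter-site messages are reduced to method calls and return values.

```python
from collections import defaultdict, deque

class SingleLockManager:
    """Sketch of the single lock-manager approach: one site holds the
    lock table for every data item, regardless of where data resides."""

    def __init__(self):
        self.holders = defaultdict(dict)   # item -> {txn: mode}
        self.waiting = defaultdict(deque)  # item -> FIFO queue of (txn, mode)

    def acquire(self, txn, item, mode):
        """Grant 'S' (shared) or 'X' (exclusive) if compatible, else queue."""
        held = self.holders[item]
        compatible = (not held) or (mode == "S" and all(m == "S" for m in held.values()))
        if compatible:
            held[txn] = mode
            return True          # models the grant message to the requesting site
        self.waiting[item].append((txn, mode))
        return False             # conflict: request queued, no grant yet

    def release(self, txn, item):
        self.holders[item].pop(txn, None)
        # wake compatible waiters in FIFO order
        while self.waiting[item]:
            w_txn, w_mode = self.waiting[item][0]
            held = self.holders[item]
            if (not held) or (w_mode == "S" and all(m == "S" for m in held.values())):
                self.waiting[item].popleft()
                held[w_txn] = w_mode
                if w_mode == "X":
                    break        # an exclusive holder blocks everyone behind it
            else:
                break
```

Because all lock state lives in one table, a centralized wait-for-graph deadlock detector can run directly over `holders` and `waiting`, which is the "easy deadlock handling" advantage above.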

19.5.1.2 Distributed Lock Manager

  • Mechanism:

    • The lock manager functionality is distributed across multiple sites.
    • Each site manages locks for data items stored at that site.
    • For non-replicated data at site Si, lock requests are sent to Si's lock manager.
    • Handling of replicated data depends on the specific protocol used: primary copy, majority, biased, or quorum consensus.
  • Advantages:

    • Reduces the bottleneck problem compared to the single lock-manager approach.
    • Lower overhead than centralized approach (for non-replicated data).
  • Disadvantages:

    • Deadlock handling is more complex. Deadlocks can span multiple sites, requiring distributed deadlock detection algorithms.
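For non-replicated data, the routing rule is simple: send the request to the lock manager at the site that stores the item. A minimal sketch (exclusive locks only; `route_lock_request`, `site_of`, and `lock_tables` are illustrative names):

```python
def route_lock_request(item, site_of, lock_tables, txn):
    """Sketch of the distributed lock-manager approach for non-replicated
    data: each site keeps its own lock table, and a request for an item
    goes to the lock manager of the site storing it."""
    site = site_of[item]          # which site stores this item
    table = lock_tables[site]     # that site's local lock table
    if item in table:             # conflict: already held by another txn
        return None
    table[item] = txn             # granted locally at the owning site
    return site
```

Note that no single table sees all locks: a wait-for cycle spanning `lock_tables["s1"]` and `lock_tables["s2"]` is invisible to either site alone, which is why distributed deadlock detection is needed.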

19.5.1.3 Primary Copy

  • Mechanism:

    • For each data item Q, one replica is designated as the primary copy.
    • All lock requests for Q are sent to the primary site of Q.
    • This effectively reduces replicated data locking to non-replicated data locking (from the lock manager’s perspective).
  • Advantages:

    • Simple implementation (similar to non-replicated data).
    • Concurrency control similar to that of unreplicated data.
  • Disadvantages:

    • Vulnerability: If the primary site of Q fails, Q becomes inaccessible, even if other replicas exist.
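The routing rule, and its weakness, can be shown in a short sketch (assumed names: `primary_copy_lock`, `primaries` mapping item to primary site, `up_sites` for currently reachable sites):

```python
def primary_copy_lock(item, primaries, up_sites):
    """Sketch of primary-copy routing: every lock request for an item
    goes to its designated primary site, so replicated locking reduces
    to non-replicated locking."""
    site = primaries[item]
    if site not in up_sites:
        # the weakness of this scheme: the item is inaccessible
        # even though other replicas may still be alive
        raise RuntimeError(f"primary site {site} for {item} is down")
    return site  # send the lock request here
```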

19.5.1.4 Majority Protocol

  • Mechanism:

    • If data item Q is replicated at n sites, a lock request must be sent to more than half of the n sites (i.e., a majority).
    • Each lock manager independently decides whether to grant the lock (based on local lock information).
    • A transaction can proceed only after obtaining a lock on a majority of the replicas.
    • Writes are performed on all replicas.
  • Advantages:

    • Decentralized: Avoids the single point of failure of the primary copy approach.
    • Can tolerate some site failures (as long as a majority of replicas are accessible).
  • Disadvantages:

    • Higher overhead: lock and unlock messages must go to a majority (more than n/2) of the n replica sites, versus a single site under primary copy.
    • Complex deadlock handling:
      • Distributed deadlock detection is needed.
      • Deadlocks can occur even when locking a single data item: with Q replicated at four sites, two transactions can each obtain locks on two replicas and then wait forever for a third. This can be avoided by having all sites request locks on replicas in the same predetermined order.
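Acquiring a majority in a fixed global order can be sketched as follows. `majority_lock` and the `try_lock(site, item)` callback are assumed names; a real implementation would also release the partial locks when no majority is reachable.

```python
def majority_lock(item, replica_sites, try_lock):
    """Sketch of the majority protocol: request the lock at replicas in a
    fixed (sorted) order, succeeding once more than half grant it.
    Using the same predetermined order at every site avoids the
    single-item deadlock the protocol is otherwise prone to."""
    needed = len(replica_sites) // 2 + 1    # strict majority of n replicas
    granted = []
    for site in sorted(replica_sites):      # predetermined global order
        if try_lock(site, item):
            granted.append(site)
            if len(granted) >= needed:
                return granted              # majority reached; proceed
    return None    # majority unreachable; caller should release `granted`
```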

19.5.1.5 Biased Protocol

  • Mechanism:

    • Treats shared (read) and exclusive (write) locks differently:
      • Shared Locks: A transaction needs to lock only one replica of Q.
      • Exclusive Locks: A transaction needs to lock all replicas of Q.
  • Advantages:

    • Lower overhead for read operations (compared to majority protocol), which are often more frequent.
  • Disadvantages:

    • Higher overhead for write operations.
    • Complex deadlock handling (like majority protocol).
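The read/write asymmetry of the biased protocol fits in a few lines. `biased_lock` and the `try_lock(site)` callback are illustrative names:

```python
def biased_lock(mode, replica_sites, try_lock):
    """Sketch of the biased protocol: a shared lock needs any one
    replica, an exclusive lock needs every replica."""
    if mode == "S":
        # reads are cheap: the first replica that grants suffices
        return any(try_lock(site) for site in replica_sites)
    # writes are expensive: every replica must grant, requested
    # in a fixed order to reduce deadlock risk
    return all(try_lock(site) for site in sorted(replica_sites))
```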

19.5.1.6 Quorum Consensus Protocol

  • Mechanism:

    • A generalization of the majority protocol.
    • Each site is assigned a non-negative weight.
    • For data item x, let S be the sum of weights of all sites where x is replicated.
    • Two integers are defined:
      • Read quorum Qr
      • Write quorum Qw
    • Constraints that must be satisfied: Qr + Qw > S and 2 · Qw > S (so every read overlaps every write, and any two writes overlap).
    • To read, a transaction must lock enough replicas so that the sum of their weights is at least Qr.
    • To write, a transaction must lock enough replicas so that the sum of their weights is at least Qw.
    • Writes are performed on all locked replicas.
  • Advantages:

    • Flexibility: Can simulate majority and biased protocols by appropriate choice of weights and quorums.
    • Can tune performance for read/write ratios.
    • Can tolerate some site failures.
  • Disadvantages:

    • Complexity in choosing optimal weights.
    • Complex deadlock handling (like majority protocol).
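The quorum arithmetic is easy to check mechanically. In this sketch (`check_quorums` and `has_quorum` are assumed names), equal weights of 1 let the same code simulate both the majority protocol (Qr = Qw = majority) and the biased protocol (Qr = 1, Qw = n):

```python
def check_quorums(weights, q_r, q_w):
    """Validate read/write quorums for quorum consensus. `weights` maps
    site -> non-negative weight; S is their sum. The constraints
    Qr + Qw > S and 2*Qw > S guarantee a read overlaps every write
    and any two writes overlap."""
    s = sum(weights.values())
    return q_r + q_w > s and 2 * q_w > s

def has_quorum(locked_sites, weights, quorum):
    """A transaction may proceed once the weights of its locked
    replicas reach the required quorum."""
    return sum(weights[site] for site in locked_sites) >= quorum
```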

Summary Table

| Protocol | Read Lock Requirement | Write Lock Requirement | Advantages | Disadvantages |
|---|---|---|---|---|
| Single Lock Manager | Any one replica | All replicas | Simple; easy deadlock handling | Bottleneck at lock manager; vulnerable to lock-manager failure |
| Distributed Lock Manager | Depends on protocol | Depends on protocol | Reduced bottleneck | More complex deadlock handling |
| Primary Copy | Primary copy | Primary copy | Simple, like non-replicated data | Vulnerable to primary-site failure |
| Majority Protocol | Majority of replicas | Majority of replicas | Decentralized; fault-tolerant | Higher overhead; complex deadlock handling (even for a single item) |
| Biased Protocol | Any one replica | All replicas | Low read overhead | High write overhead; complex deadlock handling |
| Quorum Consensus | Replicas with total weight ≥ Qr | Replicas with total weight ≥ Qw | Flexible; can simulate other protocols; fault-tolerant | Complex parameter tuning; complex deadlock handling; requires careful weight assignment |

Topic: 19.5.3 Replication with Weak Degrees of Consistency

  • Core Idea: Replicate data across multiple sites, but don't enforce strict consistency (one-copy serializability) across all replicas at all times. This improves availability and performance, but sacrifices immediate consistency.

  • Trade-off: Availability and performance vs. consistency.

  • Two Main Approaches:

    • Master-Slave Replication:
      • One site is designated as the “master” for each data item.
      • All updates go to the master.
      • The master propagates updates to the “slave” replicas.
      • Reads can be done from any replica (master or slave).
      • Advantages:
        • Simple.
        • Reads can be very fast (local to a replica).
      • Disadvantages:
        • The master can be a bottleneck for updates.
        • Reads from slaves might return slightly outdated data (eventual consistency).
        • If the master fails, a new master must be elected.
      • Transaction-Consistent Snapshot: Each slave reflects a consistent state of the master as of some specific transaction: all updates of transactions up to that point are applied, and no updates of later transactions are.
    • Multimaster Replication (Update-Anywhere Replication):
      • Any site can accept updates for any data item.
      • Updates are propagated to all other replicas.
      • Advantages:
        • Higher availability for updates (no single master bottleneck).
        • Updates can be done locally (faster).
      • Disadvantages:
        • Much more complex.
        • Conflicts are possible (two sites updating the same data item concurrently).
        • Conflict resolution is required (can be complex, may require manual intervention).
        • Consistency guarantees are weaker. Often, only eventual consistency is provided.
      • Lazy Propagation: Updates are not propagated as part of the original transaction. They are propagated later, asynchronously. This improves performance and availability, but further weakens consistency.
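Master-slave replication with lazy propagation can be sketched as below. All names (`LazyMaster`, `update`, `propagate`) are illustrative, and real asynchronous messaging is reduced to an explicit `propagate()` step; applying queued updates in commit order keeps each slave at a transaction-consistent snapshot of the master.

```python
class LazyMaster:
    """Sketch of master-slave replication with lazy propagation:
    updates commit at the master immediately and are queued for the
    slaves, which apply them later in commit order."""

    def __init__(self, n_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self.pending = []          # committed updates not yet propagated

    def update(self, key, value):
        """The transaction commits at the master only; propagation
        to slaves is not part of the transaction."""
        self.master[key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Asynchronous step: push queued updates to every slave,
        in the order they committed at the master."""
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()
```

The gap between `update()` and `propagate()` is exactly the window in which slave reads are stale, which is the consistency cost of lazy propagation described above.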