19.5.4 Deadlock Handling in Distributed Databases

Context: Extends deadlock management concepts from centralized systems (Chapter 15) to distributed environments. The core challenge is that deadlocks can span multiple sites.

Challenges in Distributed Deadlock Handling

Maintaining the Wait-For Graph: The primary difficulty is deciding how and where to maintain information about which transaction is waiting for which other transaction (the wait-for graph). This is complicated because transactions can access data at different sites.
Global vs. Local Deadlocks: A deadlock might exist globally (involving multiple sites) even if no local deadlock exists at any single site.
Communication Delays: There are inherent delays in communication between sites. This means any constructed wait-for graph is an approximation of the real, but unknown, state of the system.

We can use either deadlock prevention or deadlock detection and recovery. Prevention is often less desirable due to:

Therefore, deadlock detection is more commonly used.

Mechanism:
- Each site maintains a local wait-for graph. Nodes represent transactions (local or global) that hold or request resources at that site.
- A single site is designated as the deadlock-detection coordinator.
- Sites periodically send their local wait-for graphs to the coordinator.
- The coordinator constructs a global wait-for graph by combining the local graphs.
- The coordinator searches the global graph for cycles. If a cycle is found, a victim transaction is selected for rollback.
- The coordinator notifies the sites about the victim. The sites must roll back this victim transaction.
Example:

In the above diagram, Site S1 and S2 maintain their local wait-for-graphs. The coordinator constructs the global graph, where the cycle ( $T_{1} \to T_{2} \to T_{1}$ ) indicates a global deadlock.
Local Wait-for Graph Construction:
- If $T_{i}$ at site $S_{1}$ requests a resource held by $T_{j}$ at site $S_{2}$ :
  1. Send message to site $S_{2}$ .
  2. Insert edge $T_{i} \to T_{j}$ into the local graph of site $S_{2}$ .
- If a transaction requests a resource at the same site, add edge to the local wait-for graph normally.
Global Graph Construction Triggers:
- Whenever a new edge is added to or removed from a local graph.
- Periodically, after a certain number of changes.
- Whenever the coordinator needs to check for deadlocks.
Issues and Potential Problems:
- False Cycles: Due to communication delays, the global graph might show a cycle that doesn’t actually exist in the real system. This can lead to unnecessary rollbacks. The PDF provides an example of this (Figure 19.6).
  - For example, a resource release might not be reflected in the global graph immediately, while a new request might already be added, causing a false cycle. false cycle
Unnecessary Rollbacks: Even if a deadlock did exist, a transaction involved might have been aborted for an unrelated reason. This can lead to more rollbacks than strictly necessary.

The deadlock detection can also happen in a distributed way.

Local Wait-For Graph: A graph at each site showing which transactions are waiting for resources held by other transactions at that site.
Global Wait-For Graph: An approximation of the overall wait-for relationship across all sites. Constructed by the deadlock detection coordinator.
False Cycle: A cycle that appears in the constructed global wait-for graph but does not exist in the real, instantaneous state of the distributed system. Caused by communication delays.
Real graph: the real but unknown wait-for state.
Constructed graph: the approximated wait-for graph, generated by the coordinator.
Victim: A transaction selected to be rolled back to break a deadlock cycle.
Deadlock-detection coordinator: a single site, responsible for constructing the global wait-for graph and detecting deadlocks.

Advantages:
- Relatively simple to implement.
- Centralized control makes cycle detection easier.
Disadvantages:
- The coordinator can become a bottleneck.
- Vulnerability - If the coordinator site fails, deadlock detection stops (unless a recovery/election mechanism is in place).
- Potential for false cycles and unnecessary rollbacks.