The following is partially taken from section 7.9 of your book.
Performing atomic transactions is very much a realm of databases, but it has some bearing to Operating Systems. Many operating systems utilize database-like file systems (or run databases internally), so the idea of performing atomic transactions is far from irrelevant. Also, if you ever work with databases, this lesson will give you a good idea of how they perform transactions.
The whole idea is very much related to critical sections. Except now we don't just care about the atomicity of it, but about the fact that it completes as a single unit. For example, imagine you're dealing with two bank accounts, and you're transferring money from one account to the other. The act involves two steps, withdrawing money from one account, and depositing it in the send.
Obviously, you'd like to have the withdrawal and deposit to either both happen or for none of them to happen. If only one of them happens (only withdrawal) then you've lost the money, and if only the deposit happens, you've created money out of nothing.
The big trick to all of the transaction details is that we need to ensure that they work even under extreme circumstances. Like a power failure, or disk crash, or something that's outside of the control of our program.
Well, transactions, are a series of read and write system calls. Transactions naturally have a beginning and an end. When they complete successfully, they are committed, and when they fail, they're aborted. When they're aborted they also need to be rolled back.
Things that go wrong
Whenever a transaction is aborted, we do a roll back that restores the system to original state before the transaction. No other processes are aware that any failure happened.
There are several things that we have to worry about. The transaction could abort for logical reasons, or because of some failure. If the transaction fails for logical reasons, there are usually no major issues to restoring the state of the system. If the transaction fails because of some failure (hardware, power, etc.), then the system must somehow recover its state that it had right before the failure.
In order to recover after a failure, transactions often utilize different types of media with different reliabilities.
Volatile Storage: This generally refers to memory and cache. This is where most of the calculations happen in the computer, and is usually the thing to disappear in case of power failure (or major problems).
Nonvolatile Storage: This includes things like hard-drives, tape backups, etc. The data on these devices is generally not lost in case of power failure or other types of problems.
Stable Storage: This is storage that is never lost (well, relatively `never'). These might include things like RAID devices (or networked storage, etc.) Something really major would have to happen in order for one of these things to loose data.
One of the ways to recover is to write down everything that you do, before you do it. This is the idea behind write-ahead logging. The system maintains a log (on nonvolatile storage) of every single update operation.
The log contains information like: Transaction Name, Data Item Name, Old Value, and New Value. (you also include tags for transaction start and transaction commit).
The log is written to before the actual data in the database is updated.
The system (well, the transaction manager) also include two operations: undo, and redo, which either `undo' (roll back) or `redo' the transaction. It is important for these idempotent (that is, multiple executions of an operation have the same result as does one execution). This ensures that if the system crashes again during the recovery, you can still recover.
Now, for the actual recovery: For every transaction in the log that has a `start' and a `commit', you perform a `redo', and for every transaction that has a `start' but no `commit', you do an `undo'.
There are several issues with this method though; it uses lots of space (which can periodically be erased - more on that later; checkpoints), and it requires at least two disk accesses for every single update (one to access the log, the other to actually perform the update).
The idea of a checkpoint is not to worry about transactions that have already completed, and are known to be `ok'. Every once in a while, we write out `checkpoint' to the log, to indicate that during recovery, everything upto this point is `ok'. That's really all there is to it.
Some implementors take `checkpoints' to mean that they can perform transactions (updates) and logging in volatile memory, and once in a while write out the updates and log to nonvolatile memory. This could work, but is overall less reliable (you're risking recovery of much coarser grained pieces of data).