Thursday 4 December 2008

STM with control: Communication for retrying transactions

While working on the join pattern with guards and propagation implementation, we (my PhD student Edmund and me) came across an interesting problem which points out the current limited control/communication capabilities of retrying STM transactions (in Haskell). More on join patterns later. Here's the gist of the problem and a solution in form of a new Haskell library.

The task



We're given a list of switches, they can either be on or off


> data Switch = On | Off


If all switches are on, we switch them all off and report True If one switch is off, we do nothing and report False.

In a sequential setting, this task is straightforward. But what if the switches are shared among concurrent threads performing the above task?

In a concurrent setting, the task must be formulated as follows.

Atomically (
If all switches are on, we switch them all off and report True
If one switch is off, we do nothing and report False. )

Sounds like an application of STM. Indeed, here's a solution.


> verifyAndSwitchOff :: [TVar Switch] -> IO Bool
> verifyAndSwitchOff ts =
> atomically $ do
> bs <- mapM (\s -> do v <- readTVar s
> case v of
> On -> return True
> Off -> return False)
> ts
>
> if (and bs)
> then do mapM (\s -> do writeTVar s Off) ts
> return True
> else return False



We first check that all switches are on. Only then we switch all of them off. Isolation and atomicity of STM are important here.

Isolation guarantees that no intermediate step is visible to any other thread. After checking that all switches are on, we have the guarantee that they are still on, once we switch them off. Atomicity guarantees that all operations are executed all at once or not all which implies that some of the initially read switch values has changed. In this case, we abort the transaction and discard all changes so far and start over again.


It's tempting to immediately turn a switch off once we've read that the switch is on. But what if we read that a switch is still on? We need some additionally functionality to explicitly abort the transactions. That's what the retry and orElse combinators are for.


> verifyAndSwitchOff2 :: [TVar Switch] -> IO Bool
> verifyAndSwitchOff2 ts =
> let loop [] = return True
> loop (s:xs) = do v <- readTVar s
> case v of
> On -> do writeTVar s Off
> loop xs
> Off -> return False
> in atomically $
> (do b <- loop ts
> if b then return True
> else retry)
>
> `orElse`
>
> return False


The loop reads the status of each switch and turns the switch off if the switch is on. Otherwise, we 'abort' by returning False. For example, in case of the initial list [On, On, Off, On] we reach the third switch and the intermediate result [Off, Off, Off, On]. We can't just naively exit at this stage because the tasks demands that all changes are done atomically, ie to its full extent or not at all.

Here comes the rescue. The retry combinator allows the programmer to explicitly abort a transaction. That is, we discard all changes (which is important) here. The orElse combinator allows the programmer to compose two transaction. The second transaction will be executed if the first transaction retries. Exactly what we require.

Making the task more interesting



Suppose each switch is attached to an action, sort of a continuation. Let's say an IO action.


> type Cnt = IO ()


We want to return the continuation of the switch which first 'failed'. Here's the updated task description.

Atomically (
If all switches are on, we switch them all off and report True
Otherwise, return the continuation of the first turned on switch.)

The signature of the function we need to implement


verifyAndSwitchOffCnt :: [(TVar Switch, Cnt)] -> IO Cnt


What are our options how to implement the above?

We could adapt our first solution verifyAndSwitchOff. This is pretty straightforward. What about verifyAndSwitchOff2? Previously, we simply return False in the second orElse branch. For the more interesting task, we need to iterate over the list of pairs of switches and continuation and select the continuation of the first switch which is still on. This means that in both case we possibly need to iterate over the list twice. This is inefficient and leads to clumsy code.

Below we make use of an extension of STM where a retrying transaction can pass some information to the second transaction composed with orElse.

The extension is implemented as the STMControl.STMC library. The new primitives are:


newTVarC :: b -> STMC a (TVar b)
readTVarC :: TVar b -> STMC a b
writeTVarC :: TVar b -> b -> STMC a ()
atomicallyC :: STMC a b -> IO b
retryC :: a -> STMC a b
orElseC :: STMC a b -> (a -> STMC a b) -> STMC a b



STMC stands for STM with more control. The first parameter refers to the type of the value transmited in case of a retryC. The parameter in the orElseC combinator is now a function which accepts the result transmitted of the retrying transaction.

The implementation is fairly simple and uses an (abused) reader monad on top of STM. Here's finally an application of the new functionality.


> verifyAndSwitchOffCnt :: [(TVar Switch, Cnt)] -> IO Cnt
> verifyAndSwitchOffCnt tcs =
> let loop [] = return Nothing
> loop ((s,c):xs) = do v <- readTVarC s
> case v of
> On -> do writeTVarC s Off
> loop xs
> Off -> return (Just c)
> in atomicallyC $
> (do r <- loop tcs
> case r of
> Nothing -> return (return ()) -- default (donothing) continuation
> Just c -> retryC c)
>
> `orElseC`
>
> (\c -> return c)


The loop checks and resets all switches. In case a switch is turned off we return the attached continuation. We use the Maybe type to indicate failure (Just c) or success (Nothing). In case of success, we return the donothing continuation. Otherwise, we retryC to discard all changes made so far and pass the continuation to the second transaction which then returns this continuation.

I'd say compared to the other options, the STMC solution looks pretty clean and easy to understand.

Making the task even more interesting



That's the final extension, I promise. Suppose that to each switch there's also an attached STM Bool transaction. In addition to testing that the switch is off, we also test that the STM Bool transaction evaluates to True. Otherwise, we discard our changes and return the attached continuation.


> verifyAndSwitchOffCnt2 :: [(TVar Switch, STM Bool, Cnt)] -> IO Cnt
> verifyAndSwitchOffCnt2 tcs =
> let loop [] = return Nothing
> loop ((s,stm,c):xs) =
> do v <- readTVarC s
> case v of
> On -> do writeTVarC s Off
> b <- lift $ stm

STMC are (reader monad) lifted STM operations

> if b then loop xs
> else return (Just c)
> Off -> return (Just c)
> in atomicallyC $
> (do r <- loop tcs
> case r of
> Nothing -> return (return ()) -- default (donothing) continuation
> Just c -> retryC c)
>
> `orElseC`
>
> (\c -> return c)


The required changes to our previous solution are minor. In fact, I can't see how to implement the even more interesting task with STM alone. I believe that this task strictly requires STMC or something similar.


The complete source code of this example is available as part of the stmcontrol package on the Haskell platform hackage.

Implementation details




type STMC a b = ReaderT (IORef a) STM b

retryC :: a -> STMC a b
retryC x = do msgChan <- ask
lift $ unsafeIOToSTM (writeIORef msgChan x) -- L1
lift $ retry


What if the transaction is aborted just after execution of L1? Is our use of unsafeIOToSTM (un)safe? It's safe. It's okay to discard the IO action in case the transaction is rolled back. The important point is that if the transaction is explicitly retried, the IO action has been executed (which is the case as far as I understand the interplay of STM and unsafeIOToSTM).

4 comments:

Unknown said...

You could also do this with an exception, instead of the retry/orElse pair, since an exception aborts the transaction in the scope of the handler. Using a custom exception type would avoid the need for the reader monad too, I think.

Unknown said...

Interesting thought. You're saying that an exception raised within a transaction will cancel all updates and the control then flows to the handler which catches the exception. Indeed, no retry/orElse and no reader monad required.

In fact, it almost seems that retry/orElse are redundant given we can raise exceptions in transactions?

Martin Sulzmann said...

One more point. The types of retryC/orElseC are pretty informative. This information is lost in the exception encoding.

College Term Papers said...

I'm going to work on it...I'll see if I can come up with something... This is really fun.