Friday, July 30, 2010

A state-based exposition of the Non-Blocking Linked List by Timothy Harris

Continuing my efforts to use a state-based approach to some simple lock-free data structures, this time I take a shot at "A Pragmatic Implementation of Non-Blocking Linked-Lists" by Timothy Harris. To follow this post, besides the obvious need for knowledge of the (singly) linked list data structure, you must know what lock-free algorithms try to achieve, their fundamental limitations, the restrictions modern hardware imposes upon us when it comes to multithreading and the operations provided to overcome these, namely what Compare-And-Swap (CAS) is. There are no bindings to a specific programming language, but if you are familiar with one and its supporting memory model you should be able to follow more easily.

Delving a little deeper into the state-based approach

In my previous post, I glossed over the details of state transitioning and its underlying theory, a transgression allowed by the simplicity of the examined algorithm. However, in this case, some (very little) groundwork has to be laid first so that the following exposition will make some kind of sense.
The basic idea is that we define a global state (corresponding to the memory shared by all threads) and operations on it (the actual code) in such a way that every reachable value of this global state is legal and is the result of one of the defined operations. Each thread can examine the current value of the state and perform an operation based on it, so that even in the face of a race condition there is no possibility of ending up with an invalid structure. The result is an algorithm that requires no locking/mutex mechanisms, since the concept of atomicity is embedded in the state.
In the state diagrams I will present later on, there are two kinds of transitions between values of the state. One is the result of an operation by a thread on the shared memory region, usually a CAS operation, that either succeeds or fails (both outcomes are always taken into consideration). These transitions are between values that are fundamentally different, essentially producing new values for the shared memory region. The other kind of transition is an equivalence between two state values that, although notationally different, represent the same instance of the shared memory region.
(I open a side note here to digress a bit. This is a notion that is foreign to the imperative paradigm but comes naturally in functional programming. That is, of course, to be expected: state transitioning is, basically, a functional construct. So whereas in imperative programming, the paradigm Harris uses, as those of you who read his paper will note, we have loops and conditionals to check our current state, in my diagrams we will have something that loosely resembles pattern matching in functional programming. My next post will expose more about these parallels and also formalize my work a bit more.)

As always, we show the problems before we solve them

Before we introduce notation (a boring task I endeavor to postpone), let us see what the problem with traditional linked lists is, or rather, under which circumstances there is no problem. If you think about it a bit, when search and insert are the only supported operations (i.e. no deletion of nodes), a single successful CAS can swing any pointer to point to the newly inserted node and the structure remains consistent (search presents no problems since, if the state-altering operation, the insert, leaves the structure intact, there are only valid pointers to follow). The CAS ensures that even in the face of concurrent inserts after the same node, there are no dangling pointers. I hope the schematic below clarifies things enough.

Of course, things go downhill when we add support for deletions. In essence, what breaks the algorithm is that a successful CAS on a node is no longer indicative of a valid transition, since the next field of a node can be CAS'ed with a pointer to a new node even if that node has already been deleted! The result is an insertion that is unreachable from a traversal of the list, also known as a crash and burn. Sorry, no figure for this scenario; the paper presents this case very eloquently. The proposed solution is fairly apparent: deletion becomes a two-phase process, consisting of first marking the node to be deleted and then performing the excision from the structure. Of course there is more to it, ending up with a cooperation among the threads that leaves the structure consistent. But let's take things more slowly.
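To make the failure mode concrete, here is a minimal Python sketch that replays the race sequentially (the Node class and the cas helper are my own stand-ins for a hardware compare-and-swap): one thread snapshots a node's next field intending to insert after it, another deletes that node with a single naive CAS, and the insertion succeeds yet the new node is unreachable from the head.

```python
class Node:
    def __init__(self, next=None):
        self.next = next

def cas(node, expected, new):
    # Simulated compare-and-swap on a node's next field; on real
    # hardware this would be a single atomic instruction.
    if node.next is expected:
        node.next = new
        return True
    return False

# List: head -> a -> tail
tail = Node()
a = Node(next=tail)
head = Node(next=a)

# Thread 1 reads a.next, intending to insert after a ...
snapshot = a.next
# ... but thread 2 deletes a first, with a single (naive) CAS:
assert cas(head, a, tail)          # head now points straight at tail
# Thread 1's CAS on a.next still succeeds, because a.next is unchanged:
new = Node(next=snapshot)
assert cas(a, snapshot, new)       # success!
# ... yet the new node is unreachable from head:
n, reachable = head.next, []
while n is not None:
    reachable.append(n)
    n = n.next
assert new not in reachable        # the insertion was silently lost
```

The point of the sketch is exactly the sentence above: the successful CAS on the deleted node is not a valid transition.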

Notation, a necessary evil

We will introduce notation in two steps. First we will represent the state of the list allowing only for inserts, show its inadequacy once deletions are supported and then expand the state space to allow for them. But let's start at the beginning.
We assume that the list has two sentinel nodes, named head and tail, that hold no value and are globally accessible. The next field of the head node is never null, pointing to either the first actual node or the tail node, whereas the tail node always has a next field value of null. No next field of a list node points to the head node, and always exactly one list node (or the head node) points to the tail node. Basic stuff, really. Obviously we disregard the existence of actual useful data, since the presentation does not depend upon them. Also, Harris in his paper assumes that there is a total ordering on the key values held by the list nodes, and his search() operation has a loop depending upon it. We can ignore this restriction using one of two rationales: either we can assume that the "key" is the position of the node in the list (hence a natural number) or we can simply ignore the manner in which we arrive at a node. The latter is obviously more permissive and actually allows for more generality (which, however, is not necessary, since the former assumption will cover all real-life implementations).
So, let's define the state simply as any two adjacent nodes and see what we can come up with. Of course, there are four values to take into consideration in this case: the left node, its next pointer, the right node and its next pointer. So, the state is <L, L.next, R, R.next>. The list's starting value is of course <head, tail, tail, null>: the left node is the head, its next pointer points to tail, the right node is the tail and its next field is null. A more general value is <A,B,B,C>: the left node is a node A, which points to B; the right node is B, which points to C. That's it.
A word on equivalence. The value <A,B,B,C> is obviously semantically equivalent to <D,E,E,F> or <A,B,B,D>. However, none of them is equivalent to <A,B,C,D>, although the latter subsumes all three of them, as they are special cases of it. I hope this is apparent. If not, wait for my next post and come back here afterwards, sorry.
Insertion from the instance <A,B,B,C> means inserting a node between nodes A and B. For that purpose, we hold in a local variable the address of A.next and in another its value (&B), construct a new node D with D.next = &B and do a CAS of A.next from &B to &D. If we succeed then the new node is in its place and we are done. If it fails, then during our thread-local operations another thread inserted a different node in the same position (remember, so far we examine the insert-only scenario), so we must try again. The state transitions are in the following diagram, along with the imperative, linked-list figures we are used to.
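The procedure just described can be sketched as a retry loop (a sequential Python simulation; the cas helper and all names are mine, standing in for the hardware compare-and-swap):

```python
class Node:
    def __init__(self, next=None):
        self.next = next

def cas(node, expected, new):
    # Simulated CAS on the next field; sequential here, atomic on
    # real hardware.
    if node.next is expected:
        node.next = new
        return True
    return False

def insert_after(a):
    # From state <A,B,B,C>: snapshot &B, build D with D.next = &B,
    # then CAS A.next from &B to &D. On failure another insert won
    # (insert-only scenario), so re-read and retry.
    while True:
        b = a.next
        d = Node(next=b)
        if cas(a, b, d):
            return d

# head -> tail, then two inserts after head.
tail = Node()
head = Node(next=tail)
d1 = insert_after(head)
d2 = insert_after(head)
assert head.next is d2 and d2.next is d1 and d1.next is tail
```

In a real multithreaded run the retry is what absorbs the race: only one of two concurrent CAS attempts on the same next field can succeed.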

Deletion messes things up

The problem with deletions arises naturally from the above assumption that the only pointer that could have changed after our CAS failed is A.next. The reality is that, regardless of whether our CAS succeeds or not, there is the possibility that the next pointer of the node that points to A could have changed to point further down the list past A, a situation not accounted for and impossible to represent in our previous attempt. Obviously, we must expand our state domain to be able to represent such a case. Enter the primed pointer or, as Harris names it, the markedReference. The marked reference is an expansion of the next field of the list node structure that allows a reference to a node to be expressed as either primed or unprimed. Both values resolve to the same memory location but can be distinguished and switched between. An implementation suggestion is the reservation of the LSB of the value of the next field, assuming proper alignment of addresses (in Java, a wrapper class will work as well). We will use the terms marked and primed interchangeably and, when allowed by context, a node will be referred to as marked when its next field is marked.
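A quick sketch of the LSB idea, with integer "addresses" standing in for properly aligned pointers (the helper names are mine):

```python
# markedReference via the LSB trick: a pointer and its marked twin
# resolve to the same location but compare as different values.
# Requires addresses aligned to at least 2 bytes so the LSB is free.

def mark(ref):
    return ref | 1          # set the mark bit

def unmark(ref):
    return ref & ~1         # clear the mark bit, recover the address

def is_marked(ref):
    return ref & 1 == 1

addr = 0x7F3C10             # hypothetical aligned address (LSB is 0)
m = mark(addr)
assert is_marked(m) and not is_marked(addr)
assert unmark(m) == addr    # both resolve to the same memory location
```

In Java the same effect is available ready-made as java.util.concurrent.atomic.AtomicMarkableReference, which pairs a reference with a boolean mark and CASes them as one unit.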
Semantically, marking the next field of a node is the first of the two steps in the process of deleting it. Marking its next field is a heads-up that this node will be deleted and should not be used as a previous node during an insertion, although it can be used in traversals. In our state model, priming is denoted by a single quote ('), and there is no equivalence between a primed and an unprimed value, so the values <A,B,B,C> and <A,B,B,C'> are not equivalent. Note that we can have more than one marked node in sequence, something that requires us to introduce a sequence in our state notation. My choice is the square bracket notation: [] denotes the empty sequence, [A,B] denotes one marked node A whose next field points to B (everything in the square brackets is primed by definition, so there is no need to be explicit about it) and finally [A..B] denotes a sequence that has as its first node a primed node A and as its last a node (unknown, could be A) with a marked next field pointing to node B.
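As a sketch, we can model the markedReference as a (successor, mark) pair that is CAS'ed as one unit; marking a node is then a single CAS from the unmarked to the marked value of its next field. This is a sequential Python simulation with names of my own choosing:

```python
class Node:
    def __init__(self, succ=None, marked=False):
        # markedReference: the successor pointer and the mark bit,
        # stored as one value so a CAS covers both.
        self.next = (succ, marked)

def cas_next(node, expected, new):
    # Simulated CAS on the whole (successor, mark) pair.
    if node.next == expected:
        node.next = new
        return True
    return False

# head -> a -> tail; step one of deleting a: mark its next field,
# i.e. CAS a.next from &tail to &tail' in the notation of the text.
tail = Node()
a = Node(succ=tail)
head = Node(succ=a)

succ, _ = a.next
assert cas_next(a, (succ, False), (succ, True))    # a is now marked
assert a.next == (tail, True)
# A second marking attempt fails: the expected unmarked value is gone.
assert not cas_next(a, (succ, False), (succ, True))
```

Note how the failed second CAS is harmless: the state it finds is already the one it was trying to reach.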

Bringing it all together

Obviously, the idea of restricting the state to a slice of the complete list stems from the observation that threads generally operate on different parts of the list and problems arise only when two threads try to concurrently operate on overlapping slices. So, first, let's see what all the possible values for a state are.
The simplest case is <A,B,[],B,C>, where there are no marked nodes. Insertion can happen here after node A with a simple successful CAS. If the CAS fails, however, there are two possibilities. One is that node A has been marked, so the new state is <AL,A,[A..D],D,E>; the other is that a new node has been inserted before we managed to complete our insertion, so the new state is <A,D,[],D,E>. The latter case is equivalent to our starting point, so we simply retry. The former is a more general value and has to be treated specially. We will get there in a minute.
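Under the marked model, an insertion attempt from <A,B,[],B,C> looks roughly like this (a sequential Python sketch with my own names; a real implementation would restart its search for a fresh left node instead of returning None):

```python
class Node:
    def __init__(self, succ=None, marked=False):
        self.next = (succ, marked)    # markedReference pair

def cas_next(node, expected, new):
    # Simulated CAS on the (successor, mark) pair as one unit.
    if node.next == expected:
        node.next = new
        return True
    return False

def try_insert_after(left):
    # Attempt an insert from state <A,B,[],B,C>. Returns the new
    # node, or None if left turned out to be marked (the state is
    # really <AL,A,[A..D],D,E>), in which case the caller must find
    # a fresh unmarked left node and retry.
    while True:
        succ, marked = left.next
        if marked:
            return None
        new = Node(succ=succ)
        if cas_next(left, (succ, False), (new, False)):
            return new
        # CAS failed but left is still unmarked: a concurrent insert
        # won, state <A,D,[],D,E>; loop and retry.

tail = Node()
a = Node(succ=tail)
head = Node(succ=a)
n = try_insert_after(head)
assert n is not None and head.next == (n, False) and n.next == (a, False)
a.next = (tail, True)                 # a gets marked by a "deleter"
assert try_insert_after(a) is None    # insertion refuses a marked left
```

The two branches of the loop correspond exactly to the two failure outcomes listed above.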
<A,B,[],B,C> is also ripe for deleting, or rather for marking node A for deletion. A CAS of A.next from &B to &B' leads, if successful, to <AL,A,[A,B],B,D> and, if failed, to <W,X,[X..Y],Y,Z>.
This leaves only one set of state values unaccounted for, namely what happens in the case <A,B,[B..C],C,D>, which is a possible result of a failed insert() CAS or a certain outcome of a delete() CAS. In either case, the operation is simple: do a CAS of A.next from &B to &C to excise the marked node sequence, leading, if successful, to <A,C,[],C,D> and to the same situation if it fails.
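The excision step can be sketched the same way: one CAS on the unmarked left node swings its next field past the whole marked run (sequential Python simulation, names mine):

```python
class Node:
    def __init__(self, succ=None, marked=False):
        self.next = (succ, marked)    # markedReference pair

def cas_next(node, expected, new):
    # Simulated CAS on the (successor, mark) pair as one unit.
    if node.next == expected:
        node.next = new
        return True
    return False

# head -> a -> b -> tail with a and b marked: state <head,a,[a..b],b,tail>.
tail = Node()
b = Node(succ=tail, marked=True)
a = Node(succ=b, marked=True)
head = Node(succ=a)

# Excise the whole marked run with one CAS on head.next, swinging it
# from &a to &tail (the successor of the last marked node).
assert cas_next(head, (a, False), (tail, False))
assert head.next == (tail, False)
# Retrying after someone else already excised: the CAS fails, but the
# state it finds is already the desired <head,tail,[],tail,null>.
assert not cas_next(head, (a, False), (tail, False))
```

Note that the marked nodes a and b are never touched by the excision; they simply become unreachable.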
Finally, let's address the AL node introduced above. It is the left node of A, the one whose next field is &A. It may seem arbitrary to assume that it is unmarked (its existence is certain, however; remember how we defined head), but there is no need to panic. There is an equivalence between this value and the most general one, namely <A,B,[B..C],C,D>, so all we have to clarify is that AL is the rightmost unmarked node that is a predecessor of A. There, happy now?
Let's bring this all into a figure:

Note that figure 3 contains ALL possible values for our state as we defined it and for every one of them there exists a well defined set of operations that can be applied to it. Therefore, there is no way for our structure to happen to be in an inconsistent state and every thread can operate on the structure without requiring any locks. Mission accomplished!

Reality is somewhat more complicated

In our discussion above we defined states that represent the part of the world that a specific thread focuses on at a specific point in time. How it got there is something intentionally left unmentioned since, regardless of how it got there, it has been (loosely) proved that it will end up in a recognizable state value from which it can decide what action to perform. In an actual implementation, however, we do not deal with states. We deal with pointers, and we also have to traverse the list always starting from the head node, since there is no indexing mechanism and, more importantly, no way to quickly find sequences of marked nodes. This requires nested loops for the traversal of the list that also perform the bookkeeping of the nodes: ensuring that the previous unmarked node is known, ensuring that the rightmost node is not marked even after our CAS succeeds, and of course the more mundane considerations of argument passing, return value conventions and the like. However, these are "mere" engineering concerns, whereas here we tried to show the existence of a state model that correctly predicts and defines the behavior of a concurrent linked list, as modern hardware allows it.
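For flavor, here is a loose sketch of such a traversal: a search that returns an adjacent (left, right) pair with an unmarked left node, excising any marked run it meets along the way. This is a simplified, sequential Python rendition of the shape of Harris's search loop (position-free, all names mine), not his actual code:

```python
class Node:
    def __init__(self, succ=None, marked=False):
        self.next = (succ, marked)    # markedReference pair

def cas_next(node, expected, new):
    # Simulated CAS on the (successor, mark) pair as one unit.
    if node.next == expected:
        node.next = new
        return True
    return False

def search(head, tail):
    # Return an adjacent (left, right) pair where left is unmarked,
    # excising any marked run found between them. Real code would
    # also walk further down the list looking for a key.
    while True:
        left = head
        left_succ, _ = left.next
        node = left_succ
        # Skip over a (possibly empty) run of marked nodes.
        while node is not tail and node.next[1]:
            node = node.next[0]
        right = node
        if left_succ is right:
            return left, right        # nothing to excise
        # Excise the marked run in one CAS; on failure start over.
        if cas_next(left, (left_succ, False), (right, False)):
            return left, right

tail = Node()
b = Node(succ=tail, marked=True)
a = Node(succ=b, marked=True)
head = Node(succ=a)
l, r = search(head, tail)
assert (l, r) == (head, tail)
assert head.next == (tail, False)     # marked run excised in passing
```

The outer while loop is the retry on a failed excision CAS; the inner one is the bookkeeping that locates the end of a marked run.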

Wednesday, July 28, 2010

A kind of roadmap

What is coming

I am currently trying to explain the workings of the non-blocking linked list as described by Timothy Harris using a state based approach. I have understood the algorithm and of course the problem is defining the correct state, but I am nearly there. From there on, I will move to doubly-linked lists with the complete set of operations (search, delete, insert).

Departing from the usual operations

From there I will try to trim down the doubly-linked structure to support only the following non-blocking operations: find, move from anywhere to the end and prepend. This is the main ingredient for constructing a non-blocking LRU structure that will make it easier to implement non-blocking variations of some popular cache-management algorithms. I am not totally convinced that such a concept is viable but even if I fail, at least I will have gained some wisdom out of it.

The end of the trail

I am really excited about lock-free data structures, but there has to be an application domain in sight to give one the focus needed for implementing something useful. In my case, graph databases are the target and I plan, if my work comes to fruition, to offer any interesting solutions that crop up, as a library, in the public domain. I only hope that the guys over at neo4j or infogrid are interested in taking a look at my work.