Notes on synchronization in the Linux Kernel
23 Jul 2015

Just a small collection of notes on synchronization mechanisms in the kernel. The underlying reference is the Linux Kernel Development book by Robert Love.
Atomic Operations
- Indivisible operations - executed without interruption
- Keep state consistent across threads of execution
- Some architectural support
- Atomic Integer Operations
- Use the special type atomic_t instead of a plain int
- prevents the compiler from optimizing accesses to the value
- hides architecture-specific implementation differences
- defined in linux/types.h
typedef struct {
        volatile int counter;
} atomic_t;
* Developers historically needed to limit `atomic_t` usage to `24 bits` to avoid breaking on SPARC
* on SPARC a lock was embedded in the lower 8 bits (this may no longer be relevant)
* atomic operations are declared in `asm/atomic.h`
* defining atomic variables:
/* define v */
atomic_t v;
/* define u and initialize it to zero */
atomic_t u = ATOMIC_INIT(0);
- Some atomic operations :
/* v = 4 (atomically) */
atomic_set(&v, 4);
/* v = v + 2 = 6 (atomically) */
atomic_add(2, &v);
/* v = v + 1 = 7 (atomically) */
atomic_inc(&v);
/* will print "7"; atomic_read() converts atomic_t to int */
printk("%d\n", atomic_read(&v));
- Using atomic_add(), atomic_inc() and atomic_dec() is lighter weight than complex locking
- int atomic_dec_and_test(atomic_t *v)
- decrements v
- returns true if v is now zero, else returns false (see the reference-count sketch below)
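- As a hedged illustration (the object type, field, and function names here are hypothetical, not from the book), atomic_dec_and_test() is the usual building block for reference counting:
struct my_object {
        atomic_t refcount;
        /* ... payload ... */
};

static void my_object_put(struct my_object *obj)
{
        /* drop one reference; free the object only when the count hits zero */
        if (atomic_dec_and_test(&obj->refcount))
                kfree(obj);     /* kfree() from linux/slab.h */
}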
- List of Atomic Operations
// At declaration, initialize to i.
ATOMIC_INIT(int i)
// Atomically read the integer value of v.
int atomic_read(atomic_t *v)
// Atomically set v equal to i.
void atomic_set(atomic_t *v, int i)
// Atomically add i to v.
void atomic_add(int i, atomic_t *v)
// Atomically subtract i from v.
void atomic_sub(int i, atomic_t *v)
// Atomically add one to v.
void atomic_inc(atomic_t *v)
// Atomically subtract one from v.
void atomic_dec(atomic_t *v)
// Atomically subtract i from v and
// return true if the result is zero;
// otherwise false.
int atomic_sub_and_test(int i, atomic_t *v)
//Atomically add i to v and return
//true if the result is negative;
//otherwise false.
int atomic_add_negative(int i, atomic_t *v)
//Atomically add i to v and return
//the result.
int atomic_add_return(int i, atomic_t *v)
//Atomically subtract i from v and
//return the result.
int atomic_sub_return(int i, atomic_t *v)
//Atomically increment v by one and
//return the result.
int atomic_inc_return(atomic_t *v)
// Atomically decrement v by one and
// return the result.
int atomic_dec_return(atomic_t *v)
//Atomically decrement v by one and
//return true if zero; false otherwise.
int atomic_dec_and_test(atomic_t *v)
//Atomically increment v by one and
//return true if the result is zero;
//false otherwise.
int atomic_inc_and_test(atomic_t *v)
- Implemented as inline functions with inline assembly
- word-sized reads are always atomic
- on most architectures, a read or write of a single byte is atomic (no two simultaneous operations)
- a read is consistent (ordered either entirely before or entirely after a concurrent write)
- Thus on most architectures atomic_read() reduces to:
/**
* atomic_read - read atomic variable
* @v: pointer of type atomic_t
*
* Atomically reads the value of @v.
*/
static inline int atomic_read(const atomic_t *v)
{
return v->counter;
}
- Atomicity does not guarantee consistent ordering (use memory barriers to enforce ordering)
- 64-bit architectures
- atomic_t cannot have a different size on different architectures
- atomic_t is consistently 32-bit across architectures
- atomic64_t is used for 64-bit atomic operations on 64-bit machines
- nearly all of the atomic operations listed above have a 64-bit version
- all corresponding functions are shown below (with a usage sketch after the listing)
typedef struct {
volatile long counter;
} atomic64_t;
ATOMIC64_INIT(long i)
long atomic64_read(atomic64_t *v)
void atomic64_set(atomic64_t *v, long i)
void atomic64_add(long i, atomic64_t *v)
void atomic64_sub(long i, atomic64_t *v)
void atomic64_inc(atomic64_t *v)
void atomic64_dec(atomic64_t *v)
int atomic64_sub_and_test(long i, atomic64_t *v)
int atomic64_add_negative(long i, atomic64_t *v)
long atomic64_add_return(long i, atomic64_t *v)
long atomic64_sub_return(long i, atomic64_t *v)
long atomic64_inc_return(atomic64_t *v)
long atomic64_dec_return(atomic64_t *v)
int atomic64_dec_and_test(atomic64_t *v)
int atomic64_inc_and_test(atomic64_t *v)
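- A small sketch of the 64-bit variants in use; the counter name is hypothetical and mirrors the atomic_t example above:
/* define and zero-initialize a 64-bit atomic counter */
atomic64_t bytes_transferred = ATOMIC64_INIT(0);

/* bytes_transferred += 512 (atomically) */
atomic64_add(512, &bytes_transferred);

/* read the current value back as a long */
printk("%ld\n", atomic64_read(&bytes_transferred));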
- Atomic Bit Operations
- architecture-specific code; see asm/bitops.h for details
- operate on generic pointers
- on 32-bit machines
- bit 31 is the most significant bit
- bit 0 is the least significant bit
- Non-atomic versions of the bit operations are also provided, with their names prefixed by a double underscore
- test_bit() is the atomic version, __test_bit() the non-atomic version
- when you don't need locking, use the non-atomic versions for performance
/**
 * Atomically set the nr-th bit starting from addr.
 */
void set_bit(int nr, void *addr)
/**
 * Atomically clear the nr-th bit starting from addr.
 */
void clear_bit(int nr, void *addr)
/**
 * Atomically flip the value of the nr-th bit starting from addr.
 */
void change_bit(int nr, void *addr)
/**
 * Atomically set the nr-th bit starting from addr and return the
 * previous value.
 */
int test_and_set_bit(int nr, void *addr)
/**
 * Atomically clear the nr-th bit starting from addr and return the
 * previous value.
 */
int test_and_clear_bit(int nr, void *addr)
/**
 * Atomically flip the nr-th bit starting from addr and return the
 * previous value.
 */
int test_and_change_bit(int nr, void *addr)
/**
 * Atomically return the value of the nr-th bit starting from addr.
 */
int test_bit(int nr, void *addr)
- Example usage: setting and clearing bits atomically
unsigned long word = 0;
/* bit zero is now set (atomically) */
set_bit(0, &word);
/* bit one is now set (atomically) */
set_bit(1, &word);
/* will print "3" */
printk("%lu\n", word);
/* bit one is now unset (atomically) */
clear_bit(1, &word);
/*bit zero is flipped; now it is unset (atomically) */
change_bit(0, &word);
/* atomically sets bit zero and returns the previous value (zero) */
if (test_and_set_bit(0, &word)) {
/* never true ... */
}
/* the following is legal; you can mix atomic bit instructions with normal C */
word = 7;
Spin Locks
- Allow for creation of critical regions in code
- spinlock can be held by at most one thread at a time
- contending threads busy loop waiting for the lock to unlock
- if lock is uncontended acquire lock and proceed instantly
- the busy loop will "waste" processor time, thus the length of time a spinlock is held must be minimized
- advantage: does not result in context switches
- can be used in contexts which do not support blocking, e.g. interrupt context (or with preemption disabled?)
- Spin Lock Methods
- Defined in linux/spinlock.h
DEFINE_SPINLOCK(my_lock);
// acquire the spin lock
spin_lock(&my_lock);
/* ... place critical region here ... */
//release the spin lock
spin_unlock(&my_lock);
- On uniprocessor systems spinlocks compile away
- "some people" say they still have side effects and that spinlocks are not no-ops
- need to verify
- they act as markers to disable and enable kernel preemption
- Spinlocks are not recursive
- trying to acquire the same lock again will hang the thread of execution
- Using a spinlock in interrupt context requires disabling local interrupts on this processor
- otherwise an interrupt handler could interrupt kernel code that already holds the lock and spin forever on the lock this processor already holds, deadlocking
- Thus need to use the methods which also disable interrupts
- spin_lock_irqsave
- saves the current state of interrupts
- disables interrupts locally
- obtains the spinlock
- spin_unlock_irqrestore
- unlocks the given lock
- returns interrupts to their previous state
- this is key: if interrupts were disabled on entry, then they should stay disabled on exit
- allows nesting of different spinlocks and interrupt-disabling code
- Locks should be associated with data structures which they lock
- make association of lock with data clearer by naming conventions
DEFINE_SPINLOCK(my_lock);
unsigned long flags;
spin_lock_irqsave(&my_lock, flags);
/* critical region ... */
spin_unlock_irqrestore(&my_lock, flags);
- Kernel Configs to help Spin Lock Debugging
CONFIG_DEBUG_SPINLOCK
- detects use of an uninitialized spinlock
- detects unlocking a lock that was never locked
CONFIG_DEBUG_LOCK_ALLOC
- debugging lock lifecycles?
- Dynamic Allocation of Spinlock
- spin_lock_init() initializes a dynamically created spinlock (i.e. one you only have a pointer to)
- spin_trylock() tries to obtain the lock: returns 0 on failure, non-zero on success (see the sketch after the method listing)
/* Acquires given lock */
spin_lock()
/* Disables local interrupts and acquires given lock */
spin_lock_irq()
/* Saves current state of local interrupts, disables local
   interrupts, and acquires given lock */
spin_lock_irqsave()
/* Releases given lock */
spin_unlock()
/* Releases given lock and enables local interrupts */
spin_unlock_irq()
/* Releases given lock and restores local interrupts to given
   previous state */
spin_unlock_irqrestore()
/* Dynamically initializes given spinlock_t */
spin_lock_init()
/* Tries to acquire given lock; returns nonzero on success,
   zero if the lock is unavailable */
spin_trylock()
/* Returns nonzero if the given lock is currently acquired,
   otherwise returns zero */
spin_is_locked()
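- A minimal sketch of the non-blocking path with spin_trylock(), assuming a lock defined with DEFINE_SPINLOCK(my_lock) as above:
if (spin_trylock(&my_lock)) {
        /* lock acquired: do the (short) critical work */
        spin_unlock(&my_lock);
} else {
        /* lock was contended: skip or defer the work instead of spinning */
}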
- Spin Locks and Bottom Halves
- Bottom halves can preempt process context
- two tasklets of the same type never run concurrently
- a tasklet never preempts another tasklet on the same processor
- interrupt context can preempt both bottom halves and process context
- process-context code sharing data with a bottom half must disable bottom-half processing while holding the lock (see the sketch below)
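- A sketch of that rule using the bottom-half variants of the spin lock API; the lock name is hypothetical:
DEFINE_SPINLOCK(my_bh_lock);

/* process context: disable bottom halves while holding the lock, so a
 * tasklet on this CPU cannot preempt us and spin on the same lock */
spin_lock_bh(&my_bh_lock);
/* ... touch the data shared with the bottom half ... */
spin_unlock_bh(&my_bh_lock);

/* inside the tasklet itself a plain spin_lock()/spin_unlock() is enough,
 * since a tasklet never preempts another tasklet on the same processor */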
- Reader-Writer Spin Locks
- Dealing with the asymmetrical nature of reading and writing.
- Writing demands mutual exclusion
- no reading permitted
- no other writing permitted
- Reading requires write exclusion
- no writers permitted
- multiple readers are ok
- Reader-writer spinlocks increase throughput by allowing multiple readers
- Provide separate read and write variants of the lock
- one or more readers can hold the read lock
- only one writer, with no concurrent readers, can hold the write lock
- alternative terminology: (reader/writer), (shared/exclusive), (concurrent/exclusive)
DEFINE_RWLOCK(mr_rwlock);
// Attain read lock
read_lock(&mr_rwlock);
/* critical section (read only) ... */
read_unlock(&mr_rwlock);
// Attain write lock
write_lock(&mr_rwlock);
/* critical section (read and write) ... */
write_unlock(&mr_rwlock);
- read locks cannot later be upgraded to write locks without first releasing the read lock and then acquiring the write lock
read_lock(&mr_rwlock);
// Deadlock! The write lock will never be acquired, since the read
// lock is never released
write_lock(&mr_rwlock);
- reader locks are recursive: the same thread can re-acquire the same read lock without deadlocking
- thus if interrupt handlers only ever take the read lock, there is no need to disable interrupts for readers
- use read_lock() instead of read_lock_irqsave()
- must still use write_lock_irqsave() for writers when the lock is used from interrupt-related code
// Acquires given lock for reading
read_lock()
// Disables local interrupts and acquires given lock for reading
read_lock_irq()
// Saves the current state of local interrupts, disables local
// interrupts, and acquires the given lock for reading
read_lock_irqsave()
// Releases given lock for reading
read_unlock()
// Releases given lock and enables local interrupts
read_unlock_irq()
// Releases given lock and restores local interrupts to the
// given previous state
read_unlock_irqrestore()
// Acquires given lock for writing
write_lock()
// Disables local interrupts and acquires the given lock for writing
write_lock_irq()
// Saves current state of local interrupts, disables local
// interrupts, and acquires the given lock for writing
write_lock_irqsave()
// Releases given lock
write_unlock()
// Releases given lock and enables local interrupts
write_unlock_irq()
// Releases given lock and restores local interrupts to given
// previous state
write_unlock_irqrestore()
// Tries to acquire given lock for writing; returns nonzero on
// success, zero if the lock is unavailable
write_trylock()
// Initializes given rwlock_t
rwlock_init()
- Need to think about an appropriate priority for writers, else many readers may starve a writer.
- Spinlocks are for small timescales (nanoseconds?). For longer hold times, blocking semaphores are advisable instead.
Semaphores
- For longer timescales (millisecond scale), where putting the process context to sleep is advisable
- Sets the processor free to execute other code.
- Should only be obtained in process context; cannot be used in interrupt context, since interrupt context is not schedulable.
- Cannot hold a spinlock while acquiring a semaphore, since you may sleep and cause a deadlock on the spinlock (how is this enforced?)
- Semaphores don't disable kernel preemption, so they don't hurt scheduling latency as much as spinlocks
- Counting vs Binary Semaphores
- Unlike spinlocks, semaphores allow an arbitrary number of simultaneous lock holders
- the usage count or count is the number of permissible simultaneous lock holders
- Binary semaphore/mutex - enforces mutual exclusion - usage count is one
- Counting semaphore - usage count greater than one (a sketch follows the mutex example below)
- While the count is greater than zero, acquiring the semaphore succeeds.
- asm/semaphore.h
- struct semaphore is the key structure
struct semaphore name;
sema_init(&name, count);
// or alternatively, for declaring and initializing a static mutex (binary semaphore)
static DECLARE_MUTEX(name);
* `init_MUTEX(sem)` - initialize a dynamically created semaphore
* To try to acquire, use the `down_*()` methods.
* To release, use `up()`.
* `down_interruptible()`
  * on failure to acquire the semaphore, puts the calling process into `TASK_INTERRUPTIBLE` and sleeps
  * if the process receives a `signal`, it wakes up and fails by returning `-EINTR`
* `down()`
  * places the task in `TASK_UNINTERRUPTIBLE`, then sleeps
  * the process will not respond to signals
  * prefer `down_interruptible()` over `down()`
* `down_trylock()`
  * tries to acquire the semaphore without blocking
  * if the semaphore is already held, returns non-zero immediately
/* define and declare a semaphore, named mr_sem, with a count of one */
static DECLARE_MUTEX(mr_sem);
/* attempt to acquire the semaphore ... */
if (down_interruptible(&mr_sem)) {
/* signal received, semaphore not acquired ... */
}
/* critical region ... */
/* release the given semaphore */
up(&mr_sem);
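- A sketch of a counting semaphore; the name and the count of four are hypothetical:
/* allow at most four tasks into the region at once */
struct semaphore worker_sem;
sema_init(&worker_sem, 4);

if (down_interruptible(&worker_sem)) {
        /* interrupted by a signal before a slot became free ... */
}
/* up to four tasks may execute here concurrently ... */
up(&worker_sem);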
- Semaphore Methods
/**
* Initializes the dynamically created semaphore
* to the given count
*/
sema_init(struct semaphore *, int)
/**
* Initializes the dynamically created semaphore
* with a count of one
*/
init_MUTEX(struct semaphore *)
/**
* Initializes the dynamically created semaphore
* with a count of zero (so it is initially locked)
*/
init_MUTEX_LOCKED(struct semaphore *)
/**
* Tries to acquire the given semaphore and
* enter interruptible sleep if it is contended
*/
down_interruptible(struct semaphore *)
/**
* Tries to acquire the given semaphore and
* enter uninterruptible sleep if it is contended
*/
down(struct semaphore *)
/**
* Tries to acquire the given semaphore and
* immediately return nonzero if it is contended
*/
down_trylock(struct semaphore *)
/**
* Releases the given semaphore and wakes a
* waiting task, if any
*/
up(struct semaphore *)
Reader-Writer Semaphores
- Similar to reader-writer spinlocks; declared in linux/rwsem.h
- Static Creation
static DECLARE_RWSEM(name);
- Dynamic Creation
init_rwsem(struct rw_semaphore *sem)
- Enforce mutual exclusion among writers with a usage count of 1
- Allow simultaneous readers
- All readers and writers use uninterruptible sleep
// statically declare a reader-writer semaphore
static DECLARE_RWSEM(mr_rwsem);
/* attempt to acquire the semaphore for reading ... */
down_read(&mr_rwsem);
/* critical region (read only) ... */
/* release the semaphore */
up_read(&mr_rwsem);
/* attempt to acquire the semaphore for writing ... */
down_write(&mr_rwsem);
/* critical region (read and write) ... */
/* release the semaphore */
up_write(&mr_rwsem);
- For non-blocking lock acquisition use down_read_trylock() and down_write_trylock()
- return non-zero if the lock is acquired
- return zero if the lock cannot be acquired
- downgrade_write() converts a held write lock to a read lock (see the sketch below)
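- A sketch of downgrade_write(), reusing the mr_rwsem declared above:
down_write(&mr_rwsem);
/* ... modify the shared data exclusively ... */
downgrade_write(&mr_rwsem);     /* now held for reading only */
/* ... keep reading alongside any other readers ... */
up_read(&mr_rwsem);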
Mutexes
- struct mutex is used to simplify the common (mutual exclusion) use case of semaphores
- Static Definition
DEFINE_MUTEX(name)
- Dynamic Definition
mutex_init(&mutex)
- Locking and Unlocking Mutex
mutex_lock(&mutex);
/* critical region ... */
mutex_unlock(&mutex);
- Does not require usage counts
/**
* Locks the given mutex; sleeps if the lock is
* unavailable
*/
mutex_lock(struct mutex *)
/**
* Unlocks the given mutex
*/
mutex_unlock(struct mutex *)
/**
 * Tries to acquire the given mutex; returns one if
 * successful and the lock is acquired, zero otherwise
 */
mutex_trylock(struct mutex *)
/**
 * Returns one if the lock is locked and zero otherwise
 */
mutex_is_locked(struct mutex *)
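- A sketch combining dynamic initialization with the non-blocking path; the device structure and function are hypothetical:
struct my_device {
        struct mutex lock;      /* initialized at setup with mutex_init(&dev->lock) */
};

static int my_device_poke(struct my_device *dev)
{
        if (!mutex_trylock(&dev->lock))
                return -EBUSY;  /* already held; caller can retry later */
        /* ... critical region ... */
        mutex_unlock(&dev->lock);
        return 0;
}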
- Constraints imposed on mutex usage
- usage count is always one: only a single task can hold the mutex at a time
- only the original locker is allowed to unlock the mutex
- recursive locks and unlocks are not allowed
- a process cannot exit while holding a mutex
- a mutex cannot be acquired by an interrupt handler or bottom half
- constraints are checked via CONFIG_DEBUG_MUTEXES
- Usage guidelines
- start with a mutex; use a semaphore only if the constraints cannot be satisfied
- When to use Spin Locks vs Semaphores/Mutexes?

| Requirement                         | Recommended lock       |
|-------------------------------------|------------------------|
| Low overhead locking                | Spin lock              |
| Short lock hold time                | Spin lock              |
| Long lock hold time                 | Mutex is preferred     |
| Need to lock from interrupt context | Spin lock is required  |
| Need to sleep while holding lock    | Mutex is required      |
Completion Variables
- Event signalling between tasks
- One task waits for signal, other task signals on completion
- On completion wake up all sleeping tasks
- Ex. a completion variable is used to wake up the parent when the child exits
- See kernel/sched.c and kernel/fork.c
- Static creation:
DECLARE_COMPLETION(mr_comp);
- Dynamic creation: init_completion()
- To wait for completion, call wait_for_completion()
- To signal completion, call complete() (see the sketch after the method listing below)
/**
* Initializes the given dynamically created
* completion variable
*/
init_completion(struct completion *)
/**
* Waits for the given completion variable
* to be signaled
*/
wait_for_completion(struct completion *)
/**
* Signals any waiting tasks to wake up
*/
complete(struct completion *)
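- A sketch of the signalling pattern; the completion name and the two code paths are hypothetical:
static DECLARE_COMPLETION(work_done);

/* waiting side (process context): block until the event is signalled */
wait_for_completion(&work_done);
/* ... the work has finished, continue ... */

/* signalling side, e.g. at the end of the worker path: */
complete(&work_done);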
BKL: The big kernel lock
- Spinlock used to ease transition to SMP
- global spinlock
- Lock dropped on sleep
- Is a recursive lock: the same process may acquire it multiple times
- New users are forbidden
- The kernel is transitioning away from it
- lock_kernel() acquires the lock
- unlock_kernel() releases the lock (once per acquisition, since the lock is recursive)
- kernel_locked() returns nonzero if the lock is held, zero if it is not
- declared in linux/smp_lock.h
- problem with a single global lock: it is difficult to determine which two parties actually need to synchronize with each other
Sequential Locking
- Simple mechanism for reading/writing shared data
- maintains a sequence counter
- when the sequence counter is odd, a write is taking place
- readers read the sequence counter before and after the read; if it is unchanged and even, no write was underway
// define a seq lock
seqlock_t mr_seq_lock = DEFINE_SEQLOCK(mr_seq_lock);
// Create a write lock region
write_seqlock(&mr_seq_lock);
/* write lock is obtained... */
write_sequnlock(&mr_seq_lock);
// Read path
unsigned long seq;
do {
seq = read_seqbegin(&mr_seq_lock);
/* read data here ... */
} while (read_seqretry(&mr_seq_lock, seq));
- The write lock always succeeds as long as there are no other writers
- Seq locks favor writers over readers
- Ideal use cases
- many readers
- few writers
- readers never starve writers
- simple data
- Example: the seqlock protecting jiffies
u64 get_jiffies_64(void)
{
unsigned long seq;
u64 ret;
do {
seq = read_seqbegin(&xtime_lock);
ret = jiffies_64;
} while (read_seqretry(&xtime_lock, seq)); // if the read was inconsistent, try again
return ret;
}
// update jiffies in timer interrupt
write_seqlock(&xtime_lock);
jiffies_64 += 1;
write_sequnlock(&xtime_lock);
Preemption Disabling
- kernel is preemptive
- spin locks disable preemption for the duration they are held
- when modifying per-processor data, disable preemption to prevent corrupting the in-flight data
- preempt_disable() and preempt_enable() can be nested
- the methods maintain a count of the number of times preemption has been disabled
- if the count is 0 the kernel is preemptive
/**
 * Disables kernel preemption by incrementing the
 * preemption counter
 */
preempt_disable()
/**
 * Decrements the preemption counter and checks and
 * services any pending reschedules if the count is now zero
 */
preempt_enable()
/**
 * Enables kernel preemption but does not check for any
 * pending reschedules
 */
preempt_enable_no_resched()
/* Returns the preemption count */
preempt_count()
- Alternatively, calling get_cpu() to obtain an index into per-processor data also disables kernel preemption (see the sketch below).
- put_cpu() re-enables kernel preemption.
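- A sketch of the get_cpu()/put_cpu() pattern; the per-processor array is hypothetical:
int cpu;

/* disable kernel preemption and get the current processor number */
cpu = get_cpu();

/* safe: we cannot be migrated off this CPU while preemption is disabled */
my_percpu_counter[cpu]++;

/* reenable kernel preemption */
put_cpu();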
Ordering and Barriers
- Ensuring ordering of memory loads and stores
- Compiler and Processors like to reorder loads and stores
- Barriers prevent compiler and processor from reordering instructions
- x86 does not do out-of-order stores
- other processors might do out-of-order stores
- reorderings are constrained by data dependencies:
a = 1;
b = a;
- there is a data dependency between b and a, so these cannot be reordered
- static reordering: done by the compiler, reflected in the object code
- dynamic reordering: done by the processor at run time
- rmb() - a read memory barrier
- loads before the rmb() will never be reordered with loads after the call
- read_barrier_depends()
- a read barrier only for loads where the subsequent load depends on the previous load
- much quicker on some architectures
- on some architectures it becomes a noop
- wmb() - a write memory barrier
- stores before the wmb() will never be reordered with stores after the call
- x86 does not reorder stores
- mb() - a read/write memory barrier
- loads and stores before the mb() will never be reordered with loads and stores after the call
- smp_rmb(), smp_wmb() and smp_read_barrier_depends()
- on SMP kernels these are the usual memory barriers; on uniprocessor kernels they reduce to compiler barriers
- compiler barriers are nearly free, since they only prevent static (compile-time) rearrangement
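- A sketch of paired barriers between two threads of execution, along the lines of the book's discussion; a, b, c and d are shared variables:
/* Thread 1 */
a = 3;
wmb();  /* ensure the store to a is visible before the store to b */
b = 4;

/* Thread 2 (possibly on another processor) */
c = b;
rmb();  /* ensure b is loaded before a */
d = a;

/* with the barriers paired this way, if thread 2 sees the new value of
 * b (c == 4) it is guaranteed to also see the new value of a (d == 3) */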