Lock Striping - A Practical Concurrency Pattern (with Go examples)

I was reading the code of this Redis implementation in Go using threads and trying to understand the various design principles applied and data structures used. One of the data structures used is ConcurrentDict, which is a thread-safe concurrent hash map implementation using lock striping/sharding for high-performance concurrent access in Go.

So I became curious about lock striping design. It is certainly something that the creators of Redis didn’t have to think about, given that Redis is a single-threaded system.

Under the hood, Godis uses 65,536 concurrent maps. I’m not sure how the creator came up with that number, but ChatGPT says

65,536 shards (1 map = 1 shard) = very fine-grained locking
2^16 = power of 2 → fast bit-mask spread function
High parallelism: 65K concurrent operations possible
Memory overhead: ~65K mutex structs (~1-2 MB)

Sounds reasonable. So why do we need this data structure and how does lock sharding help us?

The Problem: Lock Contention

Without Lock Striping (Naive Approach)

// BAD: Single lock for entire data structure
type NaiveDict struct {
    data  map[string]interface{}
    mutex sync.RWMutex  // ONE LOCK FOR EVERYTHING
}

func (d *NaiveDict) Get(key string) interface{} {
    d.mutex.RLock()         // ← Bottleneck!
    defer d.mutex.RUnlock()
    return d.data[key]
}

Problem:

Thread 1: Get("user:1")  ─┐
Thread 2: Get("user:2")  ─┼─► ALL wait for the SAME lock
Thread 3: Get("user:3")  ─┤
Thread 4: Get("user:4")  ─┘

Even though they're accessing DIFFERENT keys!

Result: Serial execution, little real concurrency.

The Solution: Lock Striping

Core Idea

Instead of one lock for the entire structure, use many locks, each protecting a subset (stripe/shard) of the data.

// ConcurrentDict is thread safe map using sharding lock
type ConcurrentDict struct {
    table      []*shard    // Array of shards (stripes)
    count      int32
    shardCount int
}

type shard struct {
    m     map[string]interface{}  // Each stripe has its own data
    mutex sync.RWMutex            // Each stripe has its own lock
}

Visual Explanation

Without striping:

┌─────────────────────────────────────────┐
│           ONE BIG MAP                   │
│  {"user:1": ..., "user:2": ..., ...}   │
│                                         │
│         Protected by ONE LOCK 🔒        │
└─────────────────────────────────────────┘
       ↑
   Bottleneck: Everyone waits here!

With striping:

┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│   Shard 0         │ │   Shard 1         │ │   Shard 2         │
│ map: {...}        │ │ map: {...}        │ │ map: {...}        │
│ lock: 🔒          │ │ lock: 🔒          │ │ lock: 🔒          │
└───────────────────┘ └───────────────────┘ └───────────────────┘
       ↑                      ↑                      ↑
   Independent locks - less contention between shards!

How It Works in Code

Step 1: Hash Key to Stripe

const prime32 = uint32(16777619)

func fnv32(key string) uint32 {
    hash := uint32(2166136261)
    for i := 0; i < len(key); i++ {
        hash *= prime32
        hash ^= uint32(key[i])
    }
    return hash
}

func (dict *ConcurrentDict) spread(key string) uint32 {
    if len(dict.table) == 1 {
        return 0
    }
    hashCode := fnv32(key)
    tableSize := uint32(len(dict.table))
    return (tableSize - 1) & hashCode  // Map to stripe index
}

Example:

"user:123" → hash → 42173 → stripe[42173]
"user:456" → hash → 8956  → stripe[8956]
"user:789" → hash → 15234 → stripe[15234]

Step 2: Lock Only the Relevant Stripe

// Get returns the value and existence
func (dict *ConcurrentDict) Get(key string) (val interface{}, exists bool) {
    index := dict.spread(key)   // Find which stripe
    s := dict.table[index]
    s.mutex.Lock()              // Lock ONLY that stripe
    defer s.mutex.Unlock()
    val, exists = s.m[key]
    return
}

Concurrency Scenarios

Different keys landing on different stripes execute in parallel; keys hashing to the same stripe serialize on that stripe’s lock.

Final Thoughts

Quick Note on architecture difference

Redis: Single-threaded, no locks, super fast C code
Godis: 65,536 shards with independent locks, parallel access

While browsing the code, I also noticed a linked list implementation, which powers Redis LIST commands. That’s next on my reading list.

Reference:

godis repository
ConcurrentDict code

package dict

import (
    "github.com/hdt3213/godis/lib/wildcard"
    "math"
    "math/rand"
    "sort"
    "sync"
    "sync/atomic"
    "time"
)

// ConcurrentDict is thread safe map using sharding lock
type ConcurrentDict struct {
    table      []*shard
    count      int32
    shardCount int
}

type shard struct {
    m     map[string]interface{}
    mutex sync.RWMutex
}

func computeCapacity(param int) (size int) {
    if param <= 16 {
        return 16
    }
    n := param - 1
    n |= n >> 1
    n |= n >> 2
    n |= n >> 4
    n |= n >> 8
    n |= n >> 16
    if n < 0 {
        return math.MaxInt32
    }
    return n + 1
}

// MakeConcurrent creates ConcurrentDict with the given shard count
func MakeConcurrent(shardCount int) *ConcurrentDict {
    if shardCount == 1 {
        table := []*shard{{m: make(map[string]interface{})}}
        return &ConcurrentDict{count: 0, table: table, shardCount: shardCount}
    }
    shardCount = computeCapacity(shardCount)
    table := make([]*shard, shardCount)
    for i := 0; i < shardCount; i++ {
        table[i] = &shard{m: make(map[string]interface{})}
    }
    return &ConcurrentDict{count: 0, table: table, shardCount: shardCount}
}

const prime32 = uint32(16777619)

func fnv32(key string) uint32 {
    hash := uint32(2166136261)
    for i := 0; i < len(key); i++ {
        hash *= prime32
        hash ^= uint32(key[i])
    }
    return hash
}

func (dict *ConcurrentDict) spread(key string) uint32 {
    if dict == nil { panic("dict is nil") }
    if len(dict.table) == 1 { return 0 }
    hashCode := fnv32(key)
    tableSize := uint32(len(dict.table))
    return (tableSize - 1) & hashCode
}

func (dict *ConcurrentDict) getShard(index uint32) *shard {
    if dict == nil { panic("dict is nil") }
    return dict.table[index]
}

func (dict *ConcurrentDict) Get(key string) (val interface{}, exists bool) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    s.mutex.Lock()
    defer s.mutex.Unlock()
    val, exists = s.m[key]
    return
}

func (dict *ConcurrentDict) GetWithLock(key string) (val interface{}, exists bool) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    val, exists = s.m[key]
    return
}

func (dict *ConcurrentDict) Len() int {
    if dict == nil { panic("dict is nil") }
    return int(atomic.LoadInt32(&dict.count))
}

func (dict *ConcurrentDict) Put(key string, val interface{}) (result int) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    s.mutex.Lock()
    defer s.mutex.Unlock()
    if _, ok := s.m[key]; ok { s.m[key] = val; return 0 }
    dict.addCount()
    s.m[key] = val
    return 1
}

func (dict *ConcurrentDict) PutIfAbsent(key string, val interface{}) (result int) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    s.mutex.Lock(); defer s.mutex.Unlock()
    if _, ok := s.m[key]; ok { return 0 }
    s.m[key] = val
    dict.addCount()
    return 1
}

func (dict *ConcurrentDict) PutIfExists(key string, val interface{}) (result int) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    s.mutex.Lock(); defer s.mutex.Unlock()
    if _, ok := s.m[key]; ok { s.m[key] = val; return 1 }
    return 0
}

func (dict *ConcurrentDict) Remove(key string) (val interface{}, result int) {
    if dict == nil { panic("dict is nil") }
    index := dict.spread(key)
    s := dict.getShard(index)
    s.mutex.Lock(); defer s.mutex.Unlock()
    if val, ok := s.m[key]; ok { delete(s.m, key); dict.decreaseCount(); return val, 1 }
    return nil, 0
}

func (dict *ConcurrentDict) addCount() int32 { return atomic.AddInt32(&dict.count, 1) }
func (dict *ConcurrentDict) decreaseCount() int32 { return atomic.AddInt32(&dict.count, -1) }

// ForEach traversal
func (dict *ConcurrentDict) ForEach(consumer func(string, interface{}) bool) {
    for _, s := range dict.table {
        s.mutex.RLock()
        func() bool {
            defer s.mutex.RUnlock()
            for key, value := range s.m {
                if !consumer(key, value) { return false }
            }
            return true
        }()
    }
}

func (dict *ConcurrentDict) Keys() []string {
    keys := make([]string, dict.Len())
    i := 0
    dict.ForEach(func(key string, val interface{}) bool {
        if i < len(keys) { keys[i] = key; i++ } else { keys = append(keys, key) }
        return true
    })
    return keys
}

func (shard *shard) RandomKey() string {
    shard.mutex.RLock(); defer shard.mutex.RUnlock()
    for key := range shard.m { return key }
    return ""
}

func (dict *ConcurrentDict) RandomKeys(limit int) []string {
    size := dict.Len()
    if limit >= size { return dict.Keys() }
    shardCount := len(dict.table)
    result := make([]string, limit)
    nR := rand.New(rand.NewSource(time.Now().UnixNano()))
    for i := 0; i < limit; {
        s := dict.getShard(uint32(nR.Intn(shardCount)))
        if s == nil { continue }
        key := s.RandomKey()
        if key != "" { result[i] = key; i++ }
    }
    return result
}

func (dict *ConcurrentDict) RandomDistinctKeys(limit int) []string {
    size := dict.Len(); if limit >= size { return dict.Keys() }
    shardCount := len(dict.table)
    result := make(map[string]struct{})
    nR := rand.New(rand.NewSource(time.Now().UnixNano()))
    for len(result) < limit {
        shardIndex := uint32(nR.Intn(shardCount))
        s := dict.getShard(shardIndex)
        if s == nil { continue }
        key := s.RandomKey()
        if key != "" { if _, exists := result[key]; !exists { result[key] = struct{}{} } }
    }
    arr := make([]string, limit); i := 0
    for k := range result { arr[i] = k; i++ }
    return arr
}

func (dict *ConcurrentDict) Clear() { *dict = *MakeConcurrent(dict.shardCount) }

func (dict *ConcurrentDict) toLockIndices(keys []string, reverse bool) []uint32 {
    indexMap := make(map[uint32]struct{})
    for _, key := range keys { indexMap[dict.spread(key)] = struct{}{} }
    indices := make([]uint32, 0, len(indexMap))
    for index := range indexMap { indices = append(indices, index) }
    sort.Slice(indices, func(i, j int) bool { if !reverse { return indices[i] < indices[j] }; return indices[i] > indices[j] })
    return indices
}

func (dict *ConcurrentDict) RWLocks(writeKeys []string, readKeys []string) {
    keys := append(writeKeys, readKeys...)
    indices := dict.toLockIndices(keys, false)
    writeIndexSet := make(map[uint32]struct{})
    for _, wKey := range writeKeys { writeIndexSet[dict.spread(wKey)] = struct{}{} }
    for _, index := range indices {
        _, w := writeIndexSet[index]
        mu := &dict.table[index].mutex
        if w { mu.Lock() } else { mu.RLock() }
    }
}

func (dict *ConcurrentDict) RWUnLocks(writeKeys []string, readKeys []string) {
    keys := append(writeKeys, readKeys...)
    indices := dict.toLockIndices(keys, true)
    writeIndexSet := make(map[uint32]struct{})
    for _, wKey := range writeKeys { writeIndexSet[dict.spread(wKey)] = struct{}{} }
    for _, index := range indices {
        _, w := writeIndexSet[index]
        mu := &dict.table[index].mutex
        if w { mu.Unlock() } else { mu.RUnlock() }
    }
}

func stringsToBytes(strSlice []string) [][]byte {
    byteSlice := make([][]byte, len(strSlice))
    for i, str := range strSlice { byteSlice[i] = []byte(str) }
    return byteSlice
}

func (dict *ConcurrentDict) DictScan(cursor int, count int, pattern string) ([][]byte, int) {
    size := dict.Len()
    result := make([][]byte, 0)
    if pattern == "*" && count >= size { return stringsToBytes(dict.Keys()), 0 }
    matchKey, err := wildcard.CompilePattern(pattern)
    if err != nil { return result, -1 }
    shardCount := len(dict.table)
    shardIndex := cursor
    for shardIndex < shardCount {
        shard := dict.table[shardIndex]
        shard.mutex.RLock()
        if len(result)+len(shard.m) > count && shardIndex > cursor {
            shard.mutex.RUnlock()
            return result, shardIndex
        }
        for key := range shard.m {
            if pattern == "*" || matchKey.IsMatch(key) { result = append(result, []byte(key)) }
        }
        shard.mutex.RUnlock()
        shardIndex++
    }
    return result, 0
}