Why does sync.Mutex exist?

Why does sync.Mutex exist when we already have sync.RWMutex? I can lock and unlock an RWMutex just the same. What is the main difference between them?

It's true that you could use a sync.RWMutex whenever you need a sync.Mutex.

I think both exist because there are many cases where a sync.Mutex is enough (you don't need separate read and write locking), and the implementation of sync.Mutex is simpler: it requires much less memory and is most likely faster.

sync.Mutex is just 8 bytes:

type Mutex struct {
    state int32
    sema  uint32
}

A sync.RWMutex, on the other hand, is 8 + 16 = 24 bytes (it contains a sync.Mutex plus four 4-byte fields):

type RWMutex struct {
    w           Mutex  // held if there are pending writers
    writerSem   uint32 // semaphore for writers to wait for completing readers
    readerSem   uint32 // semaphore for readers to wait for completing writers
    readerCount int32  // number of pending readers
    readerWait  int32  // number of departing readers
}

You could argue that the difference between 8 and 24 bytes should not matter, and it doesn't, as long as you only have a few mutexes.

But it's not uncommon to put the mutex into the struct it is meant to protect (either embedded or as a regular, named field). If you then have a slice of these struct values, maybe even thousands of them, the difference becomes noticeable.

Also, if you just need a mutex, sync.Mutex gives you less chance of misusing it (you can't accidentally call RLock() because it doesn't have that method).

Apart from taking more space, as mentioned by @icza, RWMutex is also less efficient in terms of execution time.

If we look at the source code of RWMutex.Lock:

// Lock locks rw for writing.
// If the lock is already locked for reading or writing,
// Lock blocks until the lock is available.
func (rw *RWMutex) Lock() {
    if race.Enabled {
        _ = rw.w.state
        race.Disable()
    }
    // First, resolve competition with other writers.
    rw.w.Lock()
    // Announce to readers there is a pending writer.
    r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
    // Wait for active readers.
    if r != 0 && atomic.AddInt32(&rw.readerWait, r) != 0 {
        runtime_SemacquireMutex(&rw.writerSem, false)
    }
    if race.Enabled {
        race.Enable()
        race.Acquire(unsafe.Pointer(&rw.readerSem))
        race.Acquire(unsafe.Pointer(&rw.writerSem))
    }
}

We can see that it calls Mutex.Lock, so it takes at least as long as Mutex.Lock, plus whatever else it does.

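Beyond that, it calls atomic.AddInt32 and, if there are active readers to wait for, runtime_SemacquireMutex. The race.* calls only run when the program is built with the race detector (-race). All of this adds overhead compared with a plain Mutex.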
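If you would rather measure the difference than infer it from the source, the standard testing package can run a benchmark from a regular program via testing.Benchmark. This sketch times uncontended Lock/Unlock pairs only (no readers, no contention), so it says nothing about read-heavy workloads, which is where RWMutex is meant to pay off. Results vary by machine and Go version, so no numbers are claimed here.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// benchMutex times an uncontended Lock/Unlock pair on a sync.Mutex.
func benchMutex(b *testing.B) {
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}

// benchRWMutex times an uncontended write Lock/Unlock pair on a sync.RWMutex.
func benchRWMutex(b *testing.B) {
	var rw sync.RWMutex
	for i := 0; i < b.N; i++ {
		rw.Lock()
		rw.Unlock()
	}
}

// runBenchmarks runs both benchmarks outside of "go test".
func runBenchmarks() (mu, rw testing.BenchmarkResult) {
	// The testing docs say Init is needed when calling Benchmark without
	// "go test"; calling it more than once is harmless.
	testing.Init()
	return testing.Benchmark(benchMutex), testing.Benchmark(benchRWMutex)
}

func main() {
	mu, rw := runBenchmarks()
	fmt.Printf("sync.Mutex   Lock/Unlock: %d ns/op\n", mu.NsPerOp())
	fmt.Printf("sync.RWMutex Lock/Unlock: %d ns/op\n", rw.NsPerOp())
}
```

For a fuller picture, the same pattern can be extended with an RLock/RUnlock benchmark run under contention, for example with b.RunParallel.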