使用WMI的CoInitializeEx（COINIT_MULTITHREADED）和Goroutines

We have a monitoring agent written in Go that uses a number of goroutines to gather system metrics from WMI. We recently discovered the program was leaking memory when the go binary is run on Server 2016 or Windows 10 (also possibly on other OS using WMF 5.1). After creating a minimal test case to reproduce the issue it seems that the leak only occurs if you make a large number of calls to the ole.CoInitializeEx method (possibly something changed in WMF 5.1 but we could not reproduce the issue using the python comtypes package on the same system).

We are using COINIT_MULTITHREADED for multithread apartment (MTA) in our application, and my question is this: Because we are issuing OLE/WbemScripting calls from various goroutines, do we need to call ole.CoInitializeEx just once on startup or once in each goroutine? Our query code already uses runtime.LockOSThread to prevent the scheduler from running the method on different OS threads, but the MSDN remarks on CoInitializeEx seem to indicate it must be called at least once on each thread. I am not aware of any way to make sure new goroutines run on an already initialized OS thread, so multiple calls to CoInitializeEx seemed like the correct approach (and worked fine for the last few years).

We have already refactored the code to do all the WMI calls on a dedicated background worker, but I am curious to know if our original code should work using only one CoInitializeEx at startup instead of once for every goroutine.

AFAIK, since Win32 API is defined only in terms of native OS threads, a call to CoInitialize[Ex]() only ever affects the thread it completed on.

Since the Go runtime uses free M×N scheduling of the goroutines to OS threads, and these threads are created / deleted as needed at runtime in a manner completely transparent to the goroutines, the only way to make sure the CoInitialize[Ex]() call has any lasting effect on the goroutine it was performed on is to first bind that goroutine to its current OS thread by calling runtime.LockOSThread() and doing this for every goroutine intended to do COM calls.

Please note that this basically creates an 1×1 mapping between goroutines and OS threads which defeats much of the purpose of goroutines to begin with. So supposedly you might want to consider having just a single goroutine calling into COM and listening for requests on a channel, or having a pool of such worker goroutines hidden behing another one which would dispatch the clients' requests onto the workers.

Update regarding COINIT_MULTITHREADED.

To cite the docs:

Multi-threading (also called free-threading) allows calls to methods of objects created by this thread to be run on any thread. There is no serialization of calls — many calls may occur to the same method or to the same object or simultaneously. Multi-threaded object concurrency offers the highest performance and takes the best advantage of multiprocessor hardware for cross-thread, cross-process, and cross-machine calling, since calls to objects are not serialized in any way. This means, however, that the code for objects must enforce its own concurrency model, typically through the use of synchronization primitives, such as critical sections, semaphores, or mutexes. In addition, because the object doesn't control the lifetime of the threads that are accessing it, no thread-specific state may be stored in the object (in Thread Local Storage).

So basically the COM threading model has nothing to do with initialization of the threads theirselves—but rather with how the COM subsystem is allowed to call the methods of the COM objects you create on the COM-initialized threads.

IIUC, if you will COM-initialize a thread as COINIT_MULTITHREADED and create some COM object on it, and then pass its reference to some outside client of that object so that it is able to call that object's methods, those methods can be called by the OS on any thread in your process.

I really have no idea how this is supposed to interact with Go runtime, so I'd start small and would use a single thread with STA model and then maybe try to make it more complicated if needed.

On the other hand, if you only instantiate external COM objects and not transfer their descriptors outside (and it appears that's the case), the threading model should not be relevant. That is, only unless some code in the WUA API would call some "event-like" method on a COM object you have instantiated.