I tried to create about 5 thousand concurrent clients (non-stop sending of data from server to client) connected to my GlassFish WebSocket server. (CPU: dual-core, 8 GB RAM)
After about 2,500 clients were connected, the connection time rose to about 67(!) seconds, and I was not able to connect more clients because of a TimeoutException.
Some facts:
Then I wrote two simple WebSocket proxy servers, one in Node.js and one in Go, to handle the WebSocket connections. The data exchange between each proxy server and the GlassFish server happens over a plain WebSocket connection.
Now I was able to create over 5 thousand concurrent clients without problems. I had the same issue with WildFly 8 and Tomcat 8. Here is the evaluation:
Now my question: Tyrus is the implementation of the WebSocket protocol in GlassFish, and it uses the java.nio library under the hood for non-blocking I/O. So why does it scale so poorly anyway? Or why is the scalability so different? I see no advantage from java.nio here.
P.S. Is it just scheduling overhead?
The client software that creates and connects the clients and the WebSocket server run on different PCs.
Let all your 2,000 clients start communicating with the WebSocket server simultaneously. You will notice the thread count is now about 2,019 and the response time is about 6.5 seconds.

There are a number of possible answers to this, including:
Maybe Tyrus really doesn't scale well.
Maybe you are not using it the right way; i.e. your code is doing something that causes Tyrus to perform poorly.
Maybe you are "comparing apples and oranges"; i.e. your tests are comparing different things.
Maybe it is a memory issue. What is your evidence that it isn't?
Maybe it is due to multiple causes.
Unfortunately, you have not provided any concrete information that would allow us to separate the likely causes from the unlikely ones.
Based on what you've reported in your updates / comments, it would appear that Tyrus is using a thread for each WebSocket connection, while the others use a more scalable approach.
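A rough back-of-envelope calculation shows why a thread-per-connection model hurts on an 8 GB machine. This sketch assumes a default thread stack size of about 1 MiB (the JVM default varies by platform and is tunable with -Xss), so the numbers are illustrative, not measured:

```java
public class ThreadCostEstimate {
    public static void main(String[] args) {
        // Assumption: ~1 MiB of stack reserved per thread (JVM default varies; see -Xss).
        long stackBytesPerThread = 1L << 20;
        int connections = 5_000;               // the target number of concurrent clients
        long totalBytes = stackBytesPerThread * connections;
        // ~5,000 MiB of stack reservation alone, before heap, buffers, or the OS.
        System.out.println("Estimated stack memory for thread-per-connection: "
                + (totalBytes >> 20) + " MiB");
    }
}
```

On top of the memory cost, thousands of runnable threads also mean heavy context-switching, which fits the "scheduling overhead" suspicion in the question.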
Use of NIO does not necessarily imply non-blocking I/O.
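This point is easy to verify: a java.nio SocketChannel starts out in blocking mode, and non-blocking behavior has to be requested explicitly. A minimal demonstration:

```java
import java.io.IOException;
import java.nio.channels.SocketChannel;

public class NioBlockingDemo {
    public static void main(String[] args) throws IOException {
        // A freshly opened NIO channel is in *blocking* mode by default.
        SocketChannel channel = SocketChannel.open();
        System.out.println("blocking by default: " + channel.isBlocking());   // true

        // Non-blocking I/O must be switched on explicitly:
        channel.configureBlocking(false);
        System.out.println("after configureBlocking(false): " + channel.isBlocking()); // false

        channel.close();
    }
}
```

So "uses java.nio" on its own says nothing about whether a framework actually does non-blocking I/O with a small number of threads.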
In the documentation for the Tomcat 7 implementation of the WebSocket APIs, it says this:
The Java WebSocket 1.0 specification requires that callbacks for asynchronous writes are performed on a different thread to the thread that initiated the write. Since the container thread pool is not exposed via the Servlet API, the WebSocket implementation has to provide its own thread pool. This thread pool is controlled by the following servlet context initialization parameters:
org.apache.tomcat.websocket.executorCoreSize: The core size of the executor thread pool. If not set, the default of 0 (zero) is used. Note that the maximum permitted size of the executor thread pool is hard coded to Integer.MAX_VALUE which effectively means it is unlimited.

org.apache.tomcat.websocket.executorKeepAliveTimeSeconds: The maximum time an idle thread will remain in the executor thread pool until it is terminated. If not specified, the default of 60 seconds is used.
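For reference, these are servlet context initialization parameters, so they would go in the web application's web.xml. The values below are purely illustrative, not recommendations:

```xml
<!-- web.xml: tuning Tomcat's WebSocket write-callback executor (example values) -->
<context-param>
    <param-name>org.apache.tomcat.websocket.executorCoreSize</param-name>
    <param-value>10</param-value>
</context-param>
<context-param>
    <param-name>org.apache.tomcat.websocket.executorKeepAliveTimeSeconds</param-name>
    <param-value>60</param-value>
</context-param>
```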
This hints at why threads are being created in Tyrus, and implies that Tomcat might be more scalable than Tyrus on GlassFish. (And I'd try Tyrus on Grizzly too.)