When I say RPC request and response, I mean the request and response are serialized and deserialized with MessagePack. It's just message passing in the end.
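To make "it's just message passing" concrete, here's a tiny sketch that hand-rolls two of the simplest MessagePack encodings from the spec (a real project would reach for a crate like rmp-serde; the function names here are mine, purely illustrative):

```rust
// Hand-rolled MessagePack encoding of two trivial values, to show that an
// "RPC request" is nothing more than a few tagged bytes on the wire.
fn encode_fixstr(s: &str) -> Vec<u8> {
    assert!(s.len() < 32, "fixstr holds at most 31 bytes");
    let mut out = vec![0xa0 | s.len() as u8]; // fixstr tag byte: 0b101xxxxx
    out.extend_from_slice(s.as_bytes());
    out
}

fn encode_pos_fixint(n: u8) -> Vec<u8> {
    assert!(n < 128, "positive fixint holds 0..=127");
    vec![n] // a positive fixint is just the byte itself: 0b0xxxxxxx
}

fn main() {
    println!("{:02x?}", encode_fixstr("ping"));  // [a4, 70, 69, 6e, 67]
    println!("{:02x?}", encode_pos_fixint(42));  // [2a]
}
```

Five bytes for a "ping" string, one byte for a small integer. That framing overhead is part of why the per-message numbers below stay so small.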
ZeroMQ is fast, I mean really, really fast. ZeroMQ is a networking library that should be part of any toolkit where you need extremely high performance and reliability between applications.
Let's do some math
140k requests per second + 140k responses per second = 280k messages per second total. That's 2.8 million messages over a 10 second span. This measly server can handle much more than that.
There are 1e+9 nanoseconds in a second. 1e+9 ns / 280k messages is 3.57 microseconds per message on average, or about 7.14 microseconds per full request/response cycle. Single-digit microseconds per cycle on average.
Over that 10 second period, that works out to around 4.8 MB/s of send throughput.
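The divisions above can be double-checked in a few lines. Note the ~34 bytes/message figure at the end is my own back-of-the-envelope division of throughput by send rate, not something measured directly:

```rust
fn main() {
    let msgs_per_sec: f64 = 280_000.0; // 140k requests + 140k responses
    let ns_per_sec: f64 = 1e9;

    let ns_per_msg = ns_per_sec / msgs_per_sec;
    println!("{:.2} us per message", ns_per_msg / 1000.0);              // ~3.57 us
    println!("{:.2} us per req/resp cycle", 2.0 * ns_per_msg / 1000.0); // ~7.14 us

    // 4.8 MB/s of send throughput across 140k sends/s implies tiny messages:
    println!("~{:.0} bytes per message", 4.8e6 / 140_000.0);            // ~34 bytes
}
```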
I may be able to eke out more on this setup.
Details
The client is a Fedora 36 Dell laptop with an Intel processor, connected over Wi-Fi on the same router to a very slim home server box. Might as well call it The Box 3 by Gavin B. I kid.
You might ask: how many threads? There's only one ZMQ I/O thread in the ZMQ context. The client has 2 sender threads and 2 receiver worker threads that move data from the proxy to you, plus 1 thread that moves data to and from the frontend socket. On the server side, there are 4 receiver threads which also act as senders (a.k.a. reply), plus 1 thread that moves data to and from the frontend socket. Not many threads are needed to move data from you to the server and back. The thread count matters more on the server side since it has to handle many clients.
The program is written in Rust (using libzmq), takes around 40 MB of RAM, and uses my custom async RPC library: a multi-threaded queue sits in front of the senders, connected to an in-process ZMQ socket that is proxied to a frontend (outgoing/incoming) ZMQ socket, and vice versa on the server side. That's a slight simplification of the architecture, which I described in my past ZeroMQ post. The architecture is unchanged; I just decided to benchmark it in Rust. I did make some improvements code-wise (moved to async) and added a "burst mode", which helped double the number of req/resps the Linux server could pump out.
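The queue-in-front-of-the-senders shape can be sketched with nothing but the standard library. In this toy version an mpsc channel stands in for the inproc ZMQ socket and a single "frontend" thread stands in for the proxied frontend socket; every name here is illustrative, not the actual library's API:

```rust
use std::sync::mpsc;
use std::thread;

// Toy client-side pipeline: callers push serialized requests onto a
// multi-threaded queue, a sender worker drains it, and one frontend thread
// owns the outgoing "socket" (a second channel standing in for the
// inproc -> frontend ZMQ proxy). The real setup has more workers.
fn pipeline(n: u32) -> usize {
    let (queue_tx, queue_rx) = mpsc::channel::<Vec<u8>>();
    let (frontend_tx, frontend_rx) = mpsc::channel::<Vec<u8>>();

    // Sender worker: drains the queue toward the frontend thread.
    let worker = thread::spawn(move || {
        for msg in queue_rx {
            frontend_tx.send(msg).unwrap();
        }
        // frontend_tx drops here, which ends the frontend loop below.
    });

    // Frontend thread: sole owner of the outgoing socket. In the real
    // client this is where the actual zmq send would happen.
    let frontend = thread::spawn(move || frontend_rx.iter().count());

    for i in 0..n {
        queue_tx.send(i.to_le_bytes().to_vec()).unwrap();
    }
    drop(queue_tx); // close the queue so the worker exits

    worker.join().unwrap();
    frontend.join().unwrap()
}

fn main() {
    println!("frontend sent {} messages", pipeline(1000));
}
```

The point of the design is that callers never touch the socket directly: they only ever pay the cost of a channel send, while exactly one thread talks to each ZMQ socket, which is what ZMQ's threading rules want anyway.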
Conclusion
I did try on a MacBook M1, but I guess the networking stack suuuucks. My guess is that ZMQ uses epoll on Linux, which is why it's so fast, while on the Mac it uses whatever macOS has. kqueue? kpoll? iPoll? Who cares, it suuuuucks. Never use Macs over Linux machines as performance-testing clients, especially if your application is Linux server to Linux server.
My implementations are in JS, Java, and Rust. Rust is the only one where I implemented a little more, but I'll get them back to parity. The previous post claimed 60 microseconds on the JVM (probably version 12), so I expected Rust to be quicker (but not this quick!) since it uses libzmq directly. I'll need to test Java on these newer laptops to see how it performs.
Who says ZMQ is dead? It's just stable. The cargo culting of GRPC is beyond words. The first thing engineers think of now when it comes to RPC is GRPC. It's disgusting. It's weird to me how people complain about Google retiring products, but are two-faced when it comes to Google infrastructure projects like GRPC.
ZMQ just works and the server purrs along. Nah, you won't achieve this with GRPC, regular HTTP, or, hell, your own TCP or UDP library. ZMQ has been around for a long time and many brains have contributed to it, so trying to beat it naively with raw TCP or UDP? Ha!
Use ZMQ for large-scale service-to-service infrastructure where you need low latency and high performance, like a race car.