A Niggle or Two About Asynchronous Sockets And Thread Safety

Ian Griffiths finds a niggle about my post on sockets.

This may surprise a few friends of mine who regard me as a “human dictionary”, but I had to look up the word “niggle”. Apparently only the “human” part of the appellation applies. I’ve fooled them by reading a lot of sci-fi fantasy and choosing to learn and use “impressive” words such as Bacchanalian in everyday conversation (“I wrote this code in a drunken stupor from a bacchanalian display of excessive beer drinking.”). It’s really all smoke and mirrors. But I digress…

His comment is quite insightful and well worth repeating here in full.

One minor niggle with this code…

Although the example is correct as it stands, it doesn’t mention an important issue: the Socket class is not thread-safe. This means that if you do use the async operations (and by the way, I’m completely with you here - I’m a big fan of the async operations) you need to take steps to synchronize access to the socket.

As it stands there’s nothing wrong with this example as far as I can see. But what if you also have an asynchronous read operation outstanding? Can you guarantee that a read and a send won’t complete simultaneously, and that you won’t be trying to access the socket from both completion handlers simultaneously?

So in practice, you tend to want to use some kind of locking to guarantee that your socket is only being used from one thread at a time, once you start using async socket IO.

(Also, you left out one of the clever parts of IO completion ports - the scheduler tracks which threads are associated with work from an IO port, and tries to make sure that you have exactly as many running as you have CPUs. If one of the threads handling work from an IO completion port blocks, the OS will release another work item from the completion port. Conversely, if loads of IO operations complete simultaneously, it only lets them out of the completion port as fast as your system can handle them, and no faster - this avoids swamping the scheduler under high load.)

I have to say, Ian’s depth of knowledge on such topics (or nearly any geek topic) never ceases to impress me. Fortunately for my app, the client socket only receives data every three seconds and never sends data back to the remotely connected socket (how boring, I know). In any case, I will double check that I am synchronizing access to the socket just in case. Perhaps I’ll use the TimedLock to do that. ;)
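Here is a minimal sketch of what that synchronization might look like. It is written in Python rather than .NET purely to keep it short, and `SynchronizedSocket` and `run_demo` are names I made up for illustration; they are not part of any library. The timeout on the lock only loosely mirrors the TimedLock idea, and the "completion handlers" here are just plain threads.

```python
import socket
import threading


class SynchronizedSocket:
    """Hypothetical wrapper that serializes all use of one socket behind a lock."""

    def __init__(self, sock, timeout=5.0):
        self._sock = sock
        self._lock = threading.Lock()
        self._timeout = timeout  # give up instead of waiting forever

    def send(self, payload):
        # Acquire with a timeout so a stuck handler surfaces as an error
        # rather than a silent deadlock.
        if not self._lock.acquire(timeout=self._timeout):
            raise TimeoutError("could not get the socket lock in time")
        try:
            self._sock.sendall(payload)
        finally:
            self._lock.release()


def run_demo(handler_count=4, messages_per_handler=25):
    left, right = socket.socketpair()
    shared = SynchronizedSocket(left)
    message = b"0123456789"
    received = bytearray()

    def reader():
        # Drain the other end until the sending side is closed.
        while True:
            chunk = right.recv(4096)
            if not chunk:
                break
            received.extend(chunk)

    def handler():
        # Stand-in for a completion handler that touches the shared socket.
        for _ in range(messages_per_handler):
            shared.send(message)

    reader_thread = threading.Thread(target=reader)
    reader_thread.start()
    handlers = [threading.Thread(target=handler) for _ in range(handler_count)]
    for t in handlers:
        t.start()
    for t in handlers:
        t.join()
    left.close()  # signals end-of-stream to the reader
    reader_thread.join()
    right.close()
    return bytes(received)
```

Because every send goes through the same lock, only one handler touches the socket at a time, and each message arrives whole.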

While we’re in the business of finding niggles (Ian, you’ve hooked me on this word. For some strange reason, I can’t stop saying it) I should also point out that IO Completion ports awaken threads from the ThreadPool in order to perform an asynchronous action. The entire asynchronous invocation model of .NET is built on the ThreadPool. Remember that the next time you call a method that starts with “Begin” such as “BeginInvoke”. Chances are, it’s using a thread from the ThreadPool (especially if it’s a framework method. I’ll make no guarantees for methods written by your coworkers.)

By default, the max thread count for the ThreadPool is 25 per processor. In my application, the remote socket sends short packets of data at a regular interval, so the threads that handle the received data are very short lived. Sounds like an ideal use of the ThreadPool, doesn’t it? However, if I were expecting a huge number of simultaneous connections, I might look into changing the machine.config file to support more than 25 ThreadPool threads per processor. Before making any such change, measure measure measure.
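To make the idea of a capped pool concrete, here is a rough sketch in Python (not .NET). Python's `ThreadPoolExecutor` is only a stand-in for the .NET ThreadPool, `run_burst` is a hypothetical name, and the `max_workers` value of 4 is an arbitrary assumption rather than anything resembling the real default. It pushes a burst of short requests through a small pool and records how many ran at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(request_count=100, max_workers=4):
    running = 0
    peak = 0
    guard = threading.Lock()

    def handle(request_id):
        nonlocal running, peak
        with guard:
            running += 1
            peak = max(peak, running)
        time.sleep(0.001)  # stand-in for handling one short packet
        with guard:
            running -= 1
        return request_id

    # Requests beyond max_workers wait in the pool's queue until a
    # worker thread frees up.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(handle, range(request_count)))
    return results, peak
```

Every request gets handled, but the peak number in flight never exceeds the cap; the rest simply wait their turn in the queue.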

If you have a situation where the operations on the data are long-lived, you might consider spawning a full-fledged thread to handle the remote client communications and operations. Long-running operations aren’t necessarily the best place to use a thread from the .NET built-in ThreadPool.
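As a rough illustration of that split, again sketched in Python rather than .NET and with hypothetical names throughout, the long-lived client loop below gets its own thread while short tasks go through a small pool. The loop body is a placeholder; a real one would read from and respond to the remote client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_demo():
    stop = threading.Event()

    def long_running_client_loop():
        # Stand-in for a long-lived conversation with one remote client.
        # Each pass would read and process the next chunk of data.
        while not stop.wait(timeout=0.01):
            pass

    # The long-lived work gets its own thread, so it never ties up a
    # pool worker.
    worker = threading.Thread(target=long_running_client_loop, daemon=True)
    worker.start()

    # Short tasks still go through the pool.
    with ThreadPoolExecutor(max_workers=2) as pool:
        short_results = list(pool.map(lambda n: n * n, range(10)))

    was_running = worker.is_alive()
    stop.set()
    worker.join(timeout=2.0)
    return short_results, was_running, worker.is_alive()
```

The pool finishes all of its short tasks while the dedicated thread is still going, which is the point: neither kind of work starves the other.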

Comments

One response

Ian Griffiths • August 15th, 2004
"However, if I were expecting a huge number of simultaneous connections, I might look into changing the machine.config file to support more than 25 ThreadPool threads per processor"

Actually, even in these cases the threadpool is very often still the way to go.

Suppose you have a glut of requests all arriving in very quick succession from 100 clients. (They won't arrive simultaneously of course, unless you have managed to get 100 network adapters in your machine... But if the requests are small, and your network card is fast, then they might arrive faster than you can process them, in which case they're as near to arriving at once as makes no difference.)

Given such a wodge of requests, which do you suppose will be the faster of these two approaches:

(1) Have a small number of threads processing the requests, and let the majority of the requests sit in the thread pool's queue until you're ready to process them.

(2) Attempt to handle all of them simultaneously on 100 threads.

Of course it depends on how much work you need to do to handle the request, but you said you only need to do a very short amount of processing. My guess is that (1) will work better in that particular case. And in general, Windows doesn't really like having hundreds of runnable threads - it usually performs better if you have a small number. Indeed, server products like SQL Server and ASP.NET go out of their way to try and make sure that the number of runnable threads (i.e. threads with CPU work to do right now, and which aren't blocked waiting for something to happen) is equal to the number of CPUs whenever possible.

If the work being done consists mostly of waiting for other stuff to happen, then it may be different. So if you have to send a request to a DB and wait for the response, then you'll spend most of your time blocking. In this case, bumping up the thread count can help things. (Although if you go too far, you may simply end up asking too much of the DB...)
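(A rough way to see the blocking case, sketched in Python rather than .NET: the helper below is hypothetical, and `time.sleep` stands in for waiting on a database response. It times the same batch of blocking requests for whatever pool size you give it.)

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(max_workers, request_count=20, wait_seconds=0.05):
    def handle(_):
        time.sleep(wait_seconds)  # stand-in for waiting on a DB response

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(handle, range(request_count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for size in (2, 20):
        print(size, "workers:", round(timed_run(size), 3), "seconds")
```

With work that only waits, the larger pool should finish the batch sooner because the waits overlap. This says nothing about CPU-bound work, where extra threads just compete for the same processors.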

But for anything where most of the work is CPU-bound, you tend to want to minimize the number of threads.

So the fact that the thread pool queues up work until there's a CPU available to do that work often improves performance.

Of course your advice - "measure measure measure" - is the most important thing. I'm just pointing out that in my experience, increasing the number of concurrent threads more often makes things worse, not better. Indeed, I've known of some web sites that *reduced* the thread pool maximum from 25 down to 7 or 8, and found that their system throughput increased as a result. Mainly because the system simply wasn't able to process 25 requests simultaneously, and got better throughput by trying to process just 7 or 8 simultaneously.