2009-06-30

Application Protocol Specifications

(This post is part of the TCP/IP .NET Sockets FAQ)

When designing an application protocol, one should publish an application protocol specification document. Having a clearly-defined specification helps prevent errors on both sides.

Versioning

The application protocol specification document should include the protocol version number to which it applies. Protocols change over time as additional requirements are added.

There should also be a way for the protocol to perform some form of version negotiation. Usually, it is enough to have one side send a list of supported versions, and have the other side respond with the chosen version.

This is a bit of up-front work, but allows partial upgrades in the future without breaking backwards compatibility. When two separate vendors or teams are producing applications on different sides of the protocol, or if the protocol is an open specification, then version negotiation becomes much more important.
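
As a rough illustration of the "list of supported versions" approach, the choosing side might look something like the sketch below. The VersionNegotiation class and ChooseVersion method are names invented for this example, and representing versions as plain integers is an assumption. A real protocol would also have to define how the list and the reply are encoded on the wire.

```csharp
using System;
using System.Collections.Generic;

public static class VersionNegotiation
{
    // Returns the highest version that appears in both lists,
    // or null if the two sides have no version in common.
    public static int? ChooseVersion(IEnumerable<int> offeredByPeer, ICollection<int> supportedLocally)
    {
        int? best = null;
        foreach (int version in offeredByPeer)
        {
            if (supportedLocally.Contains(version) && (best == null || version > best.Value))
            {
                best = version;
            }
        }

        return best;
    }
}
```

If the result is null, the two sides share no version; closing the connection at that point is probably the simplest response, though the specification should state what actually happens.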

Terminology

The most important words in a specification are "must" and "may". When used consistently, these terms convey specific meanings. "Must" is used when an implementation absolutely must obey the specification. "May" is used when an implementation optionally may obey the specification.

When possible, use long-established terminology. The key reference for this is RFC 2119, which unambiguously defines MUST, MAY, SHOULD, etc. However, other standards often come into play; e.g., the Unicode standard has unambiguous definitions for "character", "code point", and "encoding", which are important to distinguish when writing an unambiguous protocol specification. Any special terms should be identified in the document, along with a reference to the defining standard.

Server and Client: First Contact

The first question that is often answered when writing a TCP/IP protocol is: who contacts whom? More specifically, one side must be chosen as the server and the other side as the client. In some cases, the choice of client and server sides is obvious. For other applications, it really doesn't matter which side is chosen for which role. Very loosely coupled applications (following more of a peer-to-peer model) may even act as a client, server, or both (for the same protocol).

Note that client and server only have meaning when the connection is being established. Once the TCP/IP connection is established, it will allow either side to send data to the other side at any time.

Usually, it is the responsibility of the server side to accept any incoming connections at any time; and it is the responsibility of the client side to retry dropped connections after a timeout. This timeout may be specified in the application protocol document, or it may be left as an implementation detail.

Choosing the Port

The application protocol document should include the port number used for that protocol. Choosing a port number should be done with care; one must consider reserved port ranges as well as ephemeral port ranges. Ephemeral port ranges must be considered because any random client socket may be given a port in that range, and a server would be unable to bind on its port if that port was already being used by a client socket.

The Internet Assigned Numbers Authority has reserved ports 0-1023 for specific, well-known protocols. A port in this range should never be used unless it is registered with IANA.

IANA has also reserved ports 1024-49151 in a similar manner (requiring registration). However, most people ignore this, and treat the 1024-49151 port range as available except for their ephemeral port ranges.

Ephemeral port ranges are trickier, since different operating systems use different ranges. Windows XP and Server 2003 use 1025 to 5000 by default, but the upper value may be changed via the registry. Windows Vista and Server 2008 use 49152 to 65535 by default.

In short, private Windows protocols (used only within a certain network) may pick a port from the range 5001-49151, with preference given to higher port numbers (so that older machines may increase their MaxUserPort registry setting). If Linux compatibility is necessary, the range becomes 5001-32767, again preferring higher port numbers.

Public (published) protocols should be registered with IANA and use the assigned port in the 1024-49151 range. As of this writing, both Windows' and Linux's ephemeral port ranges overlap with this reserved range, so some extra action may need to be taken to prevent any possibility of conflicts (e.g., Windows' ReservedPorts registry key; see KB812873 or The Cable Guy, Dec 2005).

Note: It is highly recommended that the port be configurable by the end user or administrator. Currently, there are not many "well-behaved" programs when it comes to choosing ports, so it is greatly beneficial to give the network admin the ability to change the port.
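
One possible way to honor a configurable port is sketched below. PortConfig, ResolvePort, and the default of 30123 are placeholders made up for this example (not a recommendation or a registered port), and rejecting ports below 1024 is a policy choice based on the IANA well-known range described above.

```csharp
using System;

public static class PortConfig
{
    // Hypothetical default; a real application would use its documented port.
    public const int DefaultPort = 30123;

    // Returns the administrator's configured port, or the default if none was given.
    // Throws if the configured value is not a usable, non-well-known port number.
    public static int ResolvePort(string configuredValue)
    {
        if (string.IsNullOrEmpty(configuredValue))
        {
            return DefaultPort;
        }

        int port;
        if (!int.TryParse(configuredValue, out port) || port < 1024 || port > 65535)
        {
            throw new ArgumentException("Port must be a number from 1024 to 65535.", "configuredValue");
        }

        return port;
    }
}
```

Where the configured value comes from (a config file, the registry, a command-line argument) is left open here; the point is only that the value is validated in one place before the server binds to it.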


2009-06-19

Threadsafe Events

Disclaimer: This blog entry only deals with the common case of instance events; static events are ignored. Furthermore, the contents of this blog entry are 100% my own opinion. However, it is the opinion of someone who has specialized in multithreading for 13 years.

When writing components in a multithreaded world, one question that commonly crops up is, "how do I make my events threadsafe?" The asker is usually concerned with threadsafe subscription and unsubscription, but threadsafe raising must also be taken into consideration.

The Wrong Solution #1, from the C# Language Specification

The C# language authors attempted to make event subscription and unsubscription threadsafe by default. To do so, they allow (but do not require) locking on this, which is generally considered bad. This code:

public event MyEventHandler MyEvent;

Logically becomes this code:

private MyEventHandler __MyEvent;
public event MyEventHandler MyEvent
{
    add
    {
        lock (this)
        {
            this.__MyEvent += value;
        }
    }

    remove
    {
        lock (this)
        {
            this.__MyEvent -= value;
        }
    }
}

Chris Burrows, a Microsoft developer on the C# compiler team, explains why this is bad in his blog post Field-like Events Considered Harmful. His blog post covers the reasoning thoroughly, so it won't be repeated here.

Minor rant: The Java language fell into the same trap; see Practical API Design's Java Monitor page. Why is it that some language designers believe they can declaratively solve multithreading problems? If the solution was really as simple as that, then why haven't other people already figured it out? Multithreaded programming has consumed some of the brightest minds for decades, and it's hard. Language designers can't make multithreading complexities go away by sprinkling some magical fairy powder, even if they name the powder "MethodImplOptions.Synchronized". In fact, most of the time they're just making it worse.

Pretend for a minute that locking on this is OK. It actually would work, after all; it just raises the likelihood of unexpected deadlocks. It's also possible that a future C# compiler may lock on a super-secret private field instead of this. However, even if the implementation is OK, the design is still flawed. The problem becomes clear when one ponders how to raise the event in a threadsafe manner.

This is the standard, simple, and logical event-raising code:

if (this.MyEvent != null)
{
    this.MyEvent(this, args);
}

If there are multiple threads subscribing to and unsubscribing from an event, then the built-in field-like event locking works only for subscribing and unsubscribing. The event raising code exposes a problem: if another thread unsubscribes from the event after the if statement but before the event is raised, then this code may result in a NullReferenceException!

So, it turns out that "thread-safe" events weren't really thread-safe. Moving on...

The Wrong Solution #2, from the Framework Design Guidelines and MSDN

One solution to the problem described above is to make a copy of the event delegate before testing it. The event raising code becomes:

MyEventHandler myEvent = this.MyEvent;
if (myEvent != null)
{
    myEvent(this, args);
}

This is the solution used by MSDN examples and recommended by the semi-standard Framework Design Guidelines (my 2nd edition has it on page 157, but the relevant section of the book is available online here).

This solution is simple, obvious, and wrong. [By the way, I'm not dissing Framework Design Guidelines. They have lots of good advice, and I don't mean to be critical of the book in general. They're just mistaken in this particular recommendation.]

Programmers without a strong background in multithreaded programming may not immediately detect why this solution is wrong. Delegates are immutable reference types, so the local variable copy is atomic; no problem there. The problem exists in the memory model: it is possible that an out-of-date value for the delegate field is held in one processor's cache. Without going into a painful level of detail, in order to ensure that one is reading the current value of a non-volatile field, one must either issue a memory barrier or wrap the copy operation within a lock (and it must be the same lock acquired by the event add/remove methods).

In short, this solution does prevent the NullReferenceException race condition; but it introduces another race condition in its place (raising an unsubscribed event handler).

The Wrong Solution #3, from Jon Skeet

[OK, let me say this first: Jon Skeet is an awesome programmer. I highly recommend his book C# in Depth to anyone and everyone using C# (I own the first edition and will buy the 2nd as soon as it comes out; he's writing it now and I'm so excited!). I follow his blog. I highly respect the man, and I can't believe my first mention of him on my blog is in a negative light... However, he did come up with a wrong solution for thread-safe events. To give him credit, though, he ended his paper recommending the right solution!]

Jon Skeet has a great treatment of this subject in his paper Delegates and Events (you may wish to skip to the section titled "Thread-safe events"). He covers everything that I've described above, but then proceeds on to propose another wrong solution. He dislikes the memory barrier solution (as do I), and attempts to solve it by wrapping the copy operation within the lock. As Jon points out, the event add/remove methods may lock this or they could lock something else (remember, a future C# compiler may choose to lock on a super-secret private field instead). So, the default add/remove methods have to be replaced with ones that perform an explicit lock, as such:

private object myEventLock = new object();
private MyEventHandler myEvent;
public event MyEventHandler MyEvent
{
    add
    {
        lock (this.myEventLock)
        {
            this.myEvent += value;
        }
    }

    remove
    {
        lock (this.myEventLock)
        {
            this.myEvent -= value;
        }
    }
}

protected virtual void OnMyEvent(MyEventArgs args)
{
    MyEventHandler localMyEvent;
    lock (this.myEventLock)
    {
        localMyEvent = this.myEvent;
    }

    if (localMyEvent != null)
    {
        localMyEvent(this, args);
    }
}

That's a fair amount of code for a single event! Some people have even written helper objects to reduce the amount of code. Before jumping on that bandwagon, though, remember that this solution is also wrong.

There is still a race condition.

Specifically, it is possible that the value of myEvent is modified after it has been read into localMyEvent but before it is raised. This can result in an unsubscribed handler being invoked, which could be problematic. So, this solution does solve the last solution's problem (with the memory model and processor cache), but it turns out there was an underlying race condition anyway (this same problem does affect the other two solutions above, too).

The Wrong Solution #4, from Nobody (but just in case you were thinking about it!)

A natural response is to extend the lock statement in Jon's code to include raising the event. That does prevent the race condition problem from all the solutions described above, but it introduces a more serious problem.

If this solution is used, then an event handler cannot wait on another thread that is attempting to subscribe or unsubscribe a handler to the same event. In other words, it's the original "unexpected deadlock" story (the same reason why locking on this is bad). Jon does make a note of this in Delegates and Events.

To my knowledge, no one has proposed this as a solution. In general, the community seems to favor solutions that fail "loudly" (with exceptions) instead of failing "silently" (with a deadlock).

Why All Solutions are Wrong, by Stephen Cleary (that's me!)

"Callbacks" (usually events in C#) have always been problematic for multithreaded programming. This is because a good rule of thumb for component design is: Do your best to allow the event handler to do anything. Since "communicate with another thread that is attempting to take any lock" is one example of "anything", a natural corollary of this rule is: Never hold locks during callbacks.

This is the reasoning behind why locking on exposed objects (such as this) is considered bad practice (see MSDN: lock Statement). Holding that lock while raising an event (such as solution 4 does) makes the bad practice even worse.

To review, all the solutions above fail in one of two situations.

Solutions 1-3 above all fail the same use case:

  • Thread A will raise the event.
  • Thread B subscribes a handler to the event. The handler code depends on a resource.
  • Thread A begins to raise the event. Immediately before the delegate is invoked, Thread A is preempted by Thread B.
  • Thread B no longer needs the event notification, so it unsubscribes the handler from the event and disposes of the resource.
  • Thread A proceeds to raise the event (that has been unsubscribed). The handler code depends on a resource that has now been disposed.

Solution 4 fails this use case:

  • Thread A will raise the event.
  • Thread B subscribes a handler to the event. The handler code communicates with Thread C.
  • Thread A begins to raise the event. Immediately before the delegate is invoked, Thread A is preempted by Thread C.
  • Thread C subscribes a handler to the event. Thread C blocks.
  • Thread A proceeds to raise the event. The handler code cannot communicate with Thread C because it is blocked.

A general-purpose "thread-safe event" solution does not exist - at least, not using the synchronization primitives we currently have at our disposal. The implementation must either have a race condition or deadlock possibility. A lock can prevent contention (solving the race condition), but only if it is held during the raising of the event (possibly causing deadlock). Alternatively, an unadorned raised event does not have the possibility of a deadlock, but loses the guarantees of the lock (causing a race condition).

A general-purpose solution does not exist, but it is possible to solve the problem for a specific event by imposing special requirements on the user. Some of the solutions above may work if one places restrictions on the event handlers.

Solution 3 (and solution 2 in Microsoft's current implementations) works if the event handler is coded to handle the situation where it is invoked after it has unsubscribed from the event. It is not difficult to write a handler this way; asynchronous callback contexts would help with the implementation. The drawback is that each event handler must include multithread-awareness code, which complicates the method.
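
A handler written to tolerate being invoked after it has unsubscribed might look roughly like the following sketch. The Subscriber class, its members, and the counter standing in for "the resource" are all invented for illustration. Note that the handler takes only the subscriber's own private lock, never a lock shared with the object raising the event.

```csharp
using System;

public sealed class Subscriber : IDisposable
{
    private readonly object gate = new object();
    private bool unsubscribed;
    private int handledCount;

    // Number of invocations that actually used the (stand-in) resource.
    public int HandledCount
    {
        get { lock (this.gate) { return this.handledCount; } }
    }

    // The event handler. It may be invoked after Dispose has run,
    // in which case it does nothing.
    public void OnSomething(object sender, EventArgs args)
    {
        lock (this.gate)
        {
            if (this.unsubscribed)
            {
                return; // late invocation: the resource is gone, so ignore it
            }

            ++this.handledCount; // stand-in for using the resource
        }
    }

    // Called by the owner right after unsubscribing from the event.
    public void Dispose()
    {
        lock (this.gate)
        {
            this.unsubscribed = true; // the real resource would be released here
        }
    }
}
```

Because Dispose waits for the same private lock, it cannot release the resource while a handler invocation is in the middle of using it. This is the "multithread-awareness code" that every such handler ends up carrying.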

Solution 4 may also be made to work if the event handler does not block on a thread that subscribes to or unsubscribes from that same event. For simplicity, APIs that take this route often just state that event handlers may not block. The drawback is that this can be difficult to guarantee, since many objects hide their locking logic from their callers.

Conclusion

A general-purpose solution does not exist, and all other solutions have serious drawbacks (placing severe restrictions on the actions available to the event handler).

For this reason, I recommend the same approach that Jon Skeet ends up recommending at the end of Delegates and Events: "don't do that", i.e., don't use events in a multithreaded fashion. If an event exists on an object, then only one thread should be able to subscribe to or unsubscribe from that event, and it's the same thread that will raise the event.

One nice side effect of this approach is that the code becomes much simpler:

public event MyEventHandler MyEvent;

protected virtual void OnMyEvent(MyEventArgs args)
{
    if (this.MyEvent != null)
    {
        this.MyEvent(this, args);
    }
}

Efficiency freaks can go one step further and explicitly implement the backing field, add handler, and remove handler. By removing the default locking (which is useless), the code is more explicit but also more efficient:

private MyEventHandler myEvent;
public event MyEventHandler MyEvent
{
    add
    {
        this.myEvent += value;
    }

    remove
    {
        this.myEvent -= value;
    }
}

protected virtual void OnMyEvent(MyEventArgs args)
{
    if (this.myEvent != null)
    {
        this.myEvent(this, args);
    }
}

Another side effect is that this type of event handling forces one towards Event-Based Asynchronous Programming (or something very similar to it). EBAP is a logical conclusion for asynchronous object design, yielding maximal reusability. EBAP is also more consistent with regards to normal object concurrency restrictions: "Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe." Events that can only be accessed by one thread follow this common pattern; the event, as an instance member, is not guaranteed to be thread safe.

A third side effect takes longer to realize: more correct communication among threads. Instead of various threads directly subscribing to events (which would be run on another thread anyway), one must implement some form of thread communication. This forces the programmer to more clearly state the requirements from each thread's perspective, and this in turn results in less buggy multithreading code. Usually, more appropriate ways for thread communication are found. The event subscription model is naturally discarded as a thread communication method (due to its inherent unsuitability) in favor of much more proven design patterns. This will eventually result in more correct multithreading code, though the process requires a minor redesign.

A Final Note

As of this writing, it is still popular to promote solution 2 (copy the delegate before raising the event). However, I strongly discourage this practice; it makes the code more obscure, and provides a false sense of security because it does not solve the problem! It is far better to simply not have "thread-safe events".

2009-06-13

Using Socket as a Connected Socket

(This post is part of the TCP/IP .NET Sockets FAQ)

A connected socket is one which has a connection to the remote side. When a client socket connects to a listening server socket, the result is two connected sockets: the client socket becomes connected, and the listening server creates a new socket that is connected. For more details about establishing or listening for socket connections, see Using Socket as a Client Socket and Using Socket as a Server (Listening) Socket.

Important note: A socket only believes it is currently connected; it can never know for sure. It is possible for one side of a connection to realize it is no longer connected, while the other side continues believing it is connected. This is called the "half-open problem", and is covered in detail in Detection of Half-Open (Dropped) Connections.

There are two primary operations performed on connected sockets: Read and Write. Connected sockets may also Disconnect or Close the connection; these operations will be covered in more detail in a future FAQ entry.

Writing

A socket may be written to at any time. A Write operation places bytes into the outgoing stream. If using asynchronous Write operations, multiple Write operations may be started, and the bytes will be placed into the outgoing stream in the correct order.

Important note: The completion of a Write operation does not mean that the remote side has received the data.

The Write operation completes when the local OS has copied the entire write buffer, even though those bytes may not have been sent out on the network yet. Beginning TCP programmers often balk at this, because they think that they must know if data has been received by the remote side. This reaction is called "send anxiety", and will be covered in a future FAQ entry.

Write operations may not complete immediately. TCP allows one side to inform the other side of how much buffer space it has; therefore, if the remote application is reading the bytes slowly, then the socket's send buffer may fill up, and the socket may not send the outgoing bytes immediately. In fact, it is possible to end up in a deadlock situation if both sides send lots of data but read only a little. This is one reason why seasoned socket programmers almost always use asynchronous Write operations instead of synchronous.

A Write operation may (immediately) fail; this is the most common way to detect dropped connections. When a Write operation fails, the application should assume that the connection is no longer viable; see Error Handling for details.

Error Detection

It is possible that the Write operation may fail after it completes. TCP has a built-in retry mechanism, so the Write will only fail if it is quite sure the connection is no longer viable. In this situation, there is not a way for the OS to signal the application, so it places the socket into an error state. This causes future socket operations to fail.

Most TCP protocols include a notion of a "keepalive message" which is written to the socket periodically (at least if there has been no other socket activity for some time). This enables the application to detect socket errors from "successful" Write operations that later failed. It also enables the application to detect lost connections, preventing the "half-open problem". Keepalive messages are discussed in more detail in Detection of Half-Open (Dropped) Connections.

Reading

As long as the socket is connected, the OS is constantly reading on behalf of the application (unless the socket's receive buffer has been disabled). The incoming bytes are stored in the socket's receive buffer and held there until the application starts a Read operation. It is possible to start more than one asynchronous Read operation at a time, but this is strongly discouraged because the operations may complete out of order.

When an application performs a Read operation, it is requesting to read N bytes from a socket. The OS will not wait until all N bytes arrive; rather, it may complete the Read operation when it has at least one byte to return to the application. When an application requests to Read N bytes, it actually receives at least one byte and at most N bytes. This clears out the OS receive buffers faster and gets the data to the application sooner, but this also means that the application must deal with "partial receives". Common ways of handling this are covered in Message Framing.
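
To make "partial receives" concrete, here is a minimal sketch of a helper that keeps reading until it has exactly the number of bytes requested. StreamReading and TryReadExactly are names invented for this example. It is written against Stream (so it would work with a NetworkStream wrapping a connected socket) and uses synchronous reads purely for brevity.

```csharp
using System;
using System.IO;

public static class StreamReading
{
    // Reads exactly 'count' bytes into 'buffer', looping over partial reads.
    // Returns false if the stream ended before 'count' bytes arrived.
    public static bool TryReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            // Read may return anywhere from 1 to (count - offset) bytes.
            int bytesRead = stream.Read(buffer, offset, count - offset);
            if (bytesRead == 0)
            {
                return false; // end of stream before the full amount arrived
            }

            offset += bytesRead;
        }

        return true;
    }
}
```

A helper like this only covers the case where the message length is already known; how the length is communicated is a message framing question.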

It is important for an application to Read from the connection on a regular basis, to prevent the deadlock situation described above under "Writing". For this reason, experienced socket programmers usually have a single asynchronous Read operation always running on a connected socket. Whenever the Read operation completes, another asynchronous Read operation is started.

Another advantage of reading constantly is that misbehaving applications are immediately detected. Most protocols have certain times when it would be an error for the remote side to send data. If the application does not constantly Read, then any data arriving at that time would be treated as data arriving at a later time. It is easier to debug misbehaving applications if the incoming data is read and logged at the time it arrives at the socket.

Reading Zero Bytes

Many stream-oriented objects (including sockets) will signal the end of the stream by returning 0 bytes in response to a Read operation. This means that the remote side of the connection has gracefully closed the connection, and the socket should be closed.

The zero-length read must be treated as a special case; if it is not, the receiving code usually enters an infinite loop attempting to read more data. A zero-length read is not an error condition; it merely means that the socket has been disconnected.

Important note: Most of the MSDN .NET socket examples do not handle this correctly! They will enter an infinite loop if the socket is closed by the remote side.
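
A read loop that treats the zero-length read as its exit condition can be sketched as follows. ReceiveLoop, ReadUntilClosed, and the onData callback are hypothetical names for this example, the 4096-byte buffer size is arbitrary, and synchronous reads on a Stream are used only to keep the sketch short.

```csharp
using System;
using System.IO;

public static class ReceiveLoop
{
    // Reads until the remote side closes the connection.
    // Passes each chunk to 'onData' and returns the total number of bytes read.
    public static long ReadUntilClosed(Stream stream, Action<byte[], int> onData)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        while (true)
        {
            int bytesRead = stream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
            {
                // Zero bytes: the remote side closed gracefully.
                // This is not an error, but reading must stop here.
                return total;
            }

            onData(buffer, bytesRead);
            total += bytesRead;
        }
    }
}
```

Without the check for zero, the loop would keep calling Read on a closed connection and getting zero bytes back forever.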

Disconnecting

Either side of a socket connection may initiate a Disconnect operation or Close the socket. Once one side of the connection starts disconnecting, the socket is no longer fully connected. It is possible for it to be partially connected for some time; this state is called "half-closed". Disconnecting socket connections (including the half-closed state) will be covered in a future FAQ entry.


2009-06-12

MSBuild.ExtensionPack Accepts DynamicExecute Task

Sorry for the lack of blog postings lately. I have been busy the last week polishing, refactoring, and documenting my first (and hopefully only) MSBuild custom task: DynamicExecute. DynamicExecute makes it possible to define and execute .NET methods using MSBuild 3.5.

Mike Fourie has accepted DynamicExecute for the next release of the MSBuild Extension Pack. It's available as Beta in the source code download until the next official release, when it will be included in the regular binaries.

DynamicExecute is similar to the inline tasks that are planned for MSBuild 4.0. Both DynamicExecute and inline tasks allow a build master to write C# code within the build script that is compiled and then executed as part of the build. There are a few differences, though:

  • DynamicExecute does not support referencing an assembly in the GAC by a partial name. For example, the CTP of MSBuild 4.0 allows an assembly reference of "System.Windows.Forms", whereas DynamicExecute requires the full name "System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089". Local assemblies may be referenced by short name, however. (This is not actually considered a limitation of DynamicExecute; loading GAC assemblies by partial names is not a good idea.)
  • Once an inline task is created using the task factory, it may be referenced directly just like any other task. DynamicExecute methods must be called using the DynamicExecute task.

DynamicExecute does have one big advantage, though: it can be used now. :) Documentation (temporarily) is available online here.

2009-06-02

MSBuild: A Real-World Recursive Application

I recently posted on this blog a "toy application" of MSBuild that calculates factorials. Well, this weekend I was working on the new build script for the Nito.Async library, and surprised myself by finding an actual real-world application for this code!

It turns out that this is useful when autogenerating publisher policies. Nito.Async follows a simple major.minor version numbering scheme, where changes in minor are always fully backwards-compatible and changes in major never are. Publisher policies are a way of declaring backwards compatibility for strongly-named assemblies in the GAC (more info on MSDN and in KB891030).

To autogenerate publisher policies for a version maj.min, the build script must build a separate dll for each version in the range [maj.0, maj.min). It turns out that the recursive behavior in my "factorial.proj" toy was exactly what I needed; I just changed the return value to concatenate a list of numbers instead of multiplying them together.

There was one other small hurdle to overcome; I had to perform a cross product of two different item groups (the list of "previous minor versions" and the list of library dlls). This is not exactly straightforward in MSBuild, and is a common question (just Google for "MSBuild cross product").

The updated build script for Nito.Async has been checked into CodePlex, so if you want to see the details on how this works, you can view it online here. I'm not going to post it on the blog here, for the sake of space.