2009-11-28

Rx Release Brings Nice Surprises!

Just in case you haven't heard, the Rx Framework was (pre)released last week!

This is an exciting event! And I don't get excited often... :)

Up until the release, what had come out of Microsoft amounted to this: the .NET 4.0 BCL would include the basic supporting types for Rx, but some of the useful operators (e.g., conversions back and forth between IEnumerable and IObservable) would not be included; those additional operators would be released as the Rx framework after .NET 4.0 comes out.
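To make the "conversion" idea concrete, here is a minimal sketch of pushing an IEnumerable<T> through the basic IObservable<T>/IObserver<T> types. The class names (EnumerableObservable, CollectingObserver) are made up for illustration; this is not the Rx implementation, and it skips error handling and unsubscription entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical adapter: exposes a pull-based sequence as a push-based one.
public sealed class EnumerableObservable<T> : IObservable<T>
{
    private readonly IEnumerable<T> source;

    public EnumerableObservable(IEnumerable<T> source)
    {
        this.source = source;
    }

    // Pushes every element to the observer synchronously, then signals completion.
    public IDisposable Subscribe(IObserver<T> observer)
    {
        foreach (T item in source)
            observer.OnNext(item);
        observer.OnCompleted();
        return new NoopDisposable();
    }

    // Nothing to cancel: the whole sequence has been delivered before Subscribe returns.
    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }
}

// Hypothetical observer that goes the other way: it gathers pushed values into a list.
public sealed class CollectingObserver<T> : IObserver<T>
{
    public readonly List<T> Items = new List<T>();
    public bool Completed;

    public void OnNext(T value) { Items.Add(value); }
    public void OnError(Exception error) { throw error; }
    public void OnCompleted() { Completed = true; }
}

public static class Program
{
    public static void Main()
    {
        var observer = new CollectingObserver<int>();
        new EnumerableObservable<int>(new[] { 1, 2, 3 }).Subscribe(observer);
        Console.WriteLine(string.Join(", ", observer.Items)); // 1, 2, 3
    }
}
```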

However, last week's Rx prerelease had a very nice surprise: Microsoft is (currently) planning to backport Rx to .NET 3.5 SP1! This includes not just all the Rx operators, but also the gaps that the Rx team has been filling in LINQ.

Even better: the Rx backport also has backports of Tasks and PLINQ!

I think this is awesome! It enables software companies to take advantage of the tremendous Task/PLINQ/Rx enhancements without having to upgrade everything to .NET 4.0 / VS2010. There is still motivation to eventually move to .NET 4, of course: DLR, better distribution story, etc. But backporting Task/PLINQ/Rx is a great help to those of us who don't have time to upgrade everything just yet.

I've downloaded the (prerelease) Rx for .NET 3.5 SP1 backport. It appears to include:

  • System.CoreEx.dll: General-purpose supporting types.
  • System.Threading.dll: Tasks, PLINQ, and other .NET 4 concurrency stuff (Concurrent collections, Lazy initialization, new synchronization objects, etc).
  • System.Reactive.dll: Rx
  • System.Interactive.dll: Additions to LINQ, several of which were inspired by Rx operators.

Tasks, PLINQ, and other System.Threading items like the concurrent collections have been discussed elsewhere; for pre-release software, there's a surprising amount of documentation already available.

The Rx framework is a relative newcomer, and the team is currently on fast-forward to get videos and blog posts out, so there's at least some documentation on Rx. Rx can be a bit confusing for many programmers because it's rooted in functional programming concepts, and the majority of programmers have traditionally only used imperative languages. There's not much documentation yet, but keep an eye on the new Rx Team blog; they're currently doing a video per day, with blog posts (on individual team member blogs) as well.

One piece of the Rx framework isn't getting the love that the others are getting, though: the LINQ extensions. I've decided to document at least a little bit on some of the operators that the Rx team has added to IEnumerable. I've been working on some of my own additional LINQ operators, and Jon Skeet has a MoreLINQ project in the same vein.

One place to bookmark for Rx is the Rx Wiki, a community-run site about all things Rx.

2009-11-18

Generic Generics and Method Overloads

I was happily coding along this week, adding more IList<T> extension methods to my general utility library, when I came across an annoying problem. The following code works fine:

int test1<T>(IList<T> x) { return 0; }
int test1<T>(IEnumerable<T> x) { return 1; }

[TestMethod]
public void TestMethod1()
{
    var list = new[] { 13 };
    IEnumerable<int> seq = list;

    Assert.AreEqual(0, test1(list));
    Assert.AreEqual(1, test1(seq));
}

The behavior is just as you'd expect; the correct overloaded method is chosen based on the better conversion of the static types of the arguments.

So far, so good. The problem that I came across is when generic generics are used:

int test2<T>(IList<IList<T>> x) { return 0; }
int test2<T>(IEnumerable<IEnumerable<T>> x) { return 1; }

[TestMethod]
public void TestMethod2()
{
    var list = new[] { 13 };
    IList<IList<int>> list2 = new[] { list };
    var list3 = new[] { list };

    Assert.AreEqual(0, test2(list2));
    // The following line does not compile:
    //  "The call is ambiguous between the following methods or properties..."
    //Assert.AreEqual(0, test2(list3));
}

The compiler can choose the correct overload when the argument matches the specific expected type (e.g., "list2"), but fails to deduce that one overload is better than another when the argument is not as specific (e.g., "list3").

The reasoning behind this is a bit obscure, but understandable. The compiler determines that it is able to convert the argument to either type:

// These implicit conversions are why both methods are considered.
IList<IList<int>> tmp1 = list3;
IEnumerable<IEnumerable<int>> tmp2 = list3;

However, when determining which overload is "better", the compiler cannot convert from IList<IList<int>> to IEnumerable<IEnumerable<int>>, so it decides that neither overload is better, and therefore they are ambiguous. The first example worked because there is a conversion from IList<T> to IEnumerable<T>, so the IList<T> overload was chosen.

// The lack of this implicit conversion is why the methods are ambiguous.
//tmp2 = tmp1;

Note also that this situation may change when .NET 4 comes out. .NET 4 introduces covariance and contravariance for generics. The concepts don't apply to APIs that are both readable and writeable (e.g., IList<T>), but they do apply to APIs that are one or the other (e.g., IEnumerable<T>). It's expected that .NET 4 will have an implicit conversion from IList<IList<int>> to IEnumerable<IEnumerable<int>> (because IList<IList<int>> implements IEnumerable<IList<int>>), but it's unclear exactly how "smart" the compiler will be while performing overload resolution.
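A minimal sketch of the conversion in question, assuming a compiler with generic variance support (C# 4 or later). The CovarianceDemo and AsNestedSequence names are made up for illustration, and the sketch only shows the conversion itself; it makes no claim about which overload the compiler picks.

```csharp
using System;
using System.Collections.Generic;

public static class CovarianceDemo
{
    public static IEnumerable<IEnumerable<int>> AsNestedSequence(IList<IList<int>> lists)
    {
        // With variance support this compiles: IList<IList<int>> implements
        // IEnumerable<IList<int>>, and IEnumerable<out T> is covariant, so the
        // element type IList<int> may be widened to IEnumerable<int>.
        return lists;
    }

    public static void Main()
    {
        IList<IList<int>> tmp1 = new IList<int>[] { new[] { 13 } };
        IEnumerable<IEnumerable<int>> tmp2 = AsNestedSequence(tmp1);

        // A reference conversion, not a copy: both variables refer to the same array.
        Console.WriteLine(object.ReferenceEquals(tmp1, tmp2)); // True
    }
}
```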

We live in interesting times.

Unit Testing Without Design Suicide

One of the big problems when doing unit testing is that it's easy enough to test simple classes (without many dependencies), but testing more complex classes requires changes to the actual design of the code.

Mocks and stubs are common approaches to substitute other types on which the class under test depends. A number of frameworks have sprung up to make mocking and stubbing easier (I like Moq). However, every mock or stub has another problem: how does one force the class under test to use the mock/stub instead of the real implementation?

There are a few common solutions:

  1. Define an interface for each dependency, and pass references to the interfaces into the constructor for the class.
  2. Define an interface for each dependency, and add a property to the class for each interface with a public setter.
  3. Make every class unsealed and virtual, moving the dependency code to one of many protected virtual methods, and then create a new derived type that is used for testing, overriding the virtual methods representing dependent code.

None of these approaches are suitable for all situations. They become particularly problematic when the type under test depends on static properties or methods.

I had a choice two weeks ago when writing unit tests for a rollover logger. It depended on DateTime.Now as well as a few static methods from the File and Directory classes. Should I create an interface for getting the current date and time (which is unlikely to change)? An interface for the file system (also unlikely to change)? Should I make the class unsealed and all methods virtual (opening up a second API - the protected API - that would have required much more work in terms of API definition and documentation)?
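For a sense of scale, here is a minimal sketch of what the first option might have looked like for the clock dependency alone. Every name here (IClock, SystemClock, FixedClock, RolloverLogger, CurrentFileName) is hypothetical; the real logger is not shown in this post, and the file system dependency would need the same treatment again.

```csharp
using System;
using System.Globalization;

// Hypothetical abstraction over DateTime.Now.
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation: defers to the real system clock.
public sealed class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

// Test double that always reports the same instant.
public sealed class FixedClock : IClock
{
    private readonly DateTime now;
    public FixedClock(DateTime now) { this.now = now; }
    public DateTime Now { get { return now; } }
}

// Hypothetical stand-in for the rollover logger; only the file naming is shown.
public sealed class RolloverLogger
{
    private readonly IClock clock;

    public RolloverLogger(IClock clock) { this.clock = clock; }

    // The log file name rolls over once per day, based on the injected clock.
    public string CurrentFileName
    {
        get
        {
            return "log-" + clock.Now.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + ".txt";
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var logger = new RolloverLogger(new FixedClock(new DateTime(2009, 11, 18)));
        Console.WriteLine(logger.CurrentFileName); // log-2009-11-18.txt
    }
}
```

That is one interface and two extra classes just to read a single static property.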

Some unit testing advocates say those are good ideas. I say it's design suicide.

I ended up just writing integration tests; I didn't want to overcomplicate my design just for the sake of some unit tests.

A Better Solution

Just this morning I was reading a PDC-related blog post (man, I wish I could go some year...), and Sasha mentioned the existence of Moles/Stubs.

The whole idea behind the Moles/Stubs framework is to inject replacement implementation code for any public property or method of any type. This includes static properties and methods. This also includes methods and properties of sealed types.

Now that's sweet.

I haven't had a chance to play with it much, but it apparently uses profiling hooks to forward any types defined in an XML file. So, you could stub out mscorlib.dll by adding mscorlib.stubx. The Moles framework then creates substitute types for mscorlib.dll, which have delegate properties that you can set to override the properties/methods of the original class.

If we wanted to override the getter for System.DateTime.Now, then we would set a property on System.Stubs.MDateTime. Here's the DateTime.Now example code from the Moles/Stubs site:

// let's detour DateTime.Now
MDateTime.NowGet = () => new DateTime(2000,1,1);

if (DateTime.Now == new DateTime(2000, 1, 1))
    throw new Y2KBugException(); // take cover!

By setting the MDateTime.NowGet property, you're able to specify the behavior of DateTime.Now.

I don't often get excited, but this is one of the exceptions. There are some limitations to the Moles framework: it's not an official/production level release, and the replaced properties/methods "must match one of the predefined set of code signatures" that they support. However, even with these limitations, I think it's something I'll be using quite a lot!

Because it allows me to do unit testing without design suicide.

2009-11-17

ICollection.IsReadOnly (and Arrays)

Today I had a simple question that ended up having a bit of a complex answer: how does one implement ICollection<T>.IsReadOnly?

The fundamental problem is that there's more than one definition of "read-only". Various collection types permit different types of updates. Generally, updates fall into one of two categories:

  1. An update that changes the value of an element already in the collection, and does not change the number of elements in the collection. e.g., the index setter.
  2. An update that changes the number of values in the collection, but does not change any of the values of the elements in the collection. e.g., Add(), Clear(), etc.

I Googled for the proper semantics to use, and was able to find three decent sources of information: a StackOverflow question on the Contract of ICollection<T>.IsReadOnly, a blog post by Peter Golde titled "IList, ICollection, and IsReadOnly", and a blog post by Krzysztof Cwalina on "Generic interfaces, IsReadOnly, IsFixedSize, and array". From the (older) blog posts and some quick tests on array behavior, I've reached the conclusions below regarding the history and current state of the IsReadOnly property.

The Traditional Interpretation

The value of IsReadOnly is false if either type of update is allowed. It is only set to true if both types of updates are not allowed.

The built-in array type (which only allows one type of update) honors this interpretation by returning false for IsReadOnly:

[TestMethod]
public void Array_IsNotReadOnly()
{
    int[] array = new[] { 1, 2, 3, 4 };
    Assert.AreEqual("Int32[]", array.GetType().Name);
    bool arrayIsReadOnly = array.IsReadOnly;
    Assert.IsFalse(arrayIsReadOnly);

    System.Collections.IList arrayAsIList = array;
    Assert.AreEqual("Int32[]", arrayAsIList.GetType().Name);
    bool arrayAsIListIsReadOnly = arrayAsIList.IsReadOnly;
    Assert.IsFalse(arrayAsIListIsReadOnly);
}

The Modern Interpretation

It appears that with .NET 2.0, the meaning of IsReadOnly has changed. It should now be true if either type of update is not allowed. It is only set to false if both types of updates are allowed.

Interestingly, the built-in array type honors this interpretation as well. It returns true for IsReadOnly (but only if accessed through a generic interface):

[TestMethod]
public void Array_IsReadOnly()
{
    int[] array = new[] { 1, 2, 3, 4 };

    IList<int> arrayAsIListOfT = array;
    Assert.AreEqual("Int32[]", arrayAsIListOfT.GetType().Name);
    bool arrayAsIListOfTIsReadOnly = arrayAsIListOfT.IsReadOnly;
    Assert.IsTrue(arrayAsIListOfTIsReadOnly);
}

Presumably, any new list types that implement IList as well as IList<T> may need to return different values for IList.IsReadOnly and IList<T>.IsReadOnly. This is confusing, to say the least.

Identical Documentation; Confusing Behavior

As of the time of this blog post, the Microsoft documentation for IList.IsReadOnly and ICollection<T>.IsReadOnly is nearly identical, ignoring the fact that the semantics are quite different:

  1. IList.IsReadOnly: "A collection that is read-only does not allow the addition, removal, or modification of elements after the collection is created."
  2. ICollection<T>.IsReadOnly: "A collection that is read-only does not allow the addition, removal, or modification of elements after the collection is created."

Furthermore, the behavior of the common array class is confusing, especially in light of the current Microsoft documentation for Array.IsReadOnly: "This property is always false for all arrays."

Conclusion: Does Anyone Care?

This blog post has been an attempt to sort out the proper way of implementing IsReadOnly. However, due to the complexity of the semantics, it seems unlikely that any client code is actually using it correctly.

For future code, I recommend only implementing IList<T> with the modern interpretation, and not implementing IList. If one does need IList, however (e.g., for binding purposes), then they must implement both interpretations.
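As a rough sketch of that recommendation, here is a fixed-size list that implements only IList<T> and follows the modern interpretation. The FixedSizeList name is made up for illustration; the behavior is modeled on how arrays act when accessed through IList<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical fixed-size list: elements may be replaced, but not added or removed.
public sealed class FixedSizeList<T> : IList<T>
{
    private readonly T[] items;

    public FixedSizeList(int count) { items = new T[count]; }

    // Modern interpretation: true because one kind of update (resizing) is not allowed.
    public bool IsReadOnly { get { return true; } }

    public int Count { get { return items.Length; } }

    // Replacing an existing element is the one kind of update that is allowed.
    public T this[int index]
    {
        get { return items[index]; }
        set { items[index] = value; }
    }

    public int IndexOf(T item) { return Array.IndexOf(items, item); }
    public bool Contains(T item) { return IndexOf(item) >= 0; }
    public void CopyTo(T[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }
    public IEnumerator<T> GetEnumerator() { return ((IEnumerable<T>)items).GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return items.GetEnumerator(); }

    // Members that would change the number of elements are not supported.
    public void Add(T item) { throw new NotSupportedException(); }
    public void Insert(int index, T item) { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }
    public void RemoveAt(int index) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
}

public static class Program
{
    public static void Main()
    {
        var list = new FixedSizeList<int>(2);
        list[0] = 13;                       // allowed: does not change Count
        Console.WriteLine(list.IsReadOnly); // True
        Console.WriteLine(list[0]);         // 13
    }
}
```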

2009-11-16

Reverse Compiling Windows Forms

Today I had a fun task: the source code for an existing executable had been lost, and I got the job of getting it back. The good news is that Red Gate's Reflector (formerly Lutz Roeder's Reflector) is a standard tool for any serious .NET programmer, and it does quite a decent job of decompiling (nonobfuscated) .NET code. The bad news is that I had to also reverse-engineer the GUI.

After finding nothing on Google, and a bit of trial and error, I discovered the following procedure worked adequately, at least for my (simple) executable on Visual Studio 2008:

  1. First, export the source from Reflector, create a solution, and ensure it builds.
  2. Convert the ".resources" files into ".resx" files. Reflector just dumps out the binary .NET resources, but VS prefers them as XML. Fire up your VS command prompt and run this command: "resgen My.Long.Resource.Name.resources Name.resx".
  3. Move the resulting ".resx" files into their appropriate directories (e.g., "My\Long\Resource"). The rest of these steps must be done for each ".resx" file.
  4. Add the ".resx" files to your solution (they should be inserted under the matching ".cs" file), remove the old ".resources" file from the solution, and rebuild.
  5. Add a new empty C# code file named "Name.Designer.cs" in the same directory, and paste in the following code:
    namespace My.Long.Resource
    {
        partial class Name
        {
            /// <summary>
            /// Required designer variable.
            /// </summary>
            private System.ComponentModel.IContainer components = null;
    
            /// <summary>
            /// Clean up any resources being used.
            /// </summary>
            protected override void Dispose(bool disposing)
            {
                if (disposing && (components != null))
                {
                    components.Dispose();
                }
                base.Dispose(disposing);
            }
    
            #region Windows Form Designer generated code
            /// <summary>
            /// Required method for Designer support - do not modify
            /// the contents of this method with the code editor.
            /// </summary>
    
            #endregion
    
        }
    }
    
  6. Open up the parent "Name.cs" file (right-click -> View Code) and add the "partial" modifier to its class declaration.
  7. Delete the member variable "private IContainer components;".
  8. Move all GUI member variables from "Name.cs" to the end of "Name.Designer.cs" (placing them after the "#endregion"). GUI member variables are anything that is added from the Toolbox, so that would include System.Windows.Forms.Timer components, etc.
  9. Delete the "Name.Dispose" method from the "Name.cs" file.
  10. Move "Name.InitializeComponent" from the "Name.cs" file into the "Name.Designer.cs" file, placing it before the "#endregion".
  11. For each unrecognized type in the member variables and InitializeComponent, either fully qualify it or add a using directive. Fully qualifying each type is more time consuming, but matches exactly what the designer expects. After this step, the solution should build.
  12. If InitializeComponent contains a line assigning the member variable "this.components = new Container();", then it must be changed to be "this.components = new System.ComponentModel.Container();" and moved to the top of the method.
  13. If InitializeComponent contains a line creating a resource manager, e.g., "ComponentResourceManager manager = new ComponentResourceManager(typeof(Name));", the local variable "manager" must be renamed to "resources" (and all references to it updated to match).
  14. Repeat attempting to load it in the designer, fully qualifying any types that it complains about (this step is necessary because the designer's code parser is not as smart as the C# compiler):
    • "The designer cannot process the code..." - Any enum-typed members that have the same name as their type need to have their value fully qualified (e.g., "base.AutoScaleMode = AutoScaleMode.Font;" needs to be "base.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;").
    • "The variable ... is either undeclared or was never assigned" - Many types seem to require fully qualified type names when declared (e.g., "private OpenFileDialog openFileDialog;" needs to be "private System.Windows.Forms.OpenFileDialog openFileDialog;").

After following the (rather tedious) procedure above, you should have a form that can be opened in the VS designer. If I had more time, I'd wrap it up as a Reflector add-in, but time seems to be a fleeting resource these days.

2009-10-22

Windows Services and the Network

Let's make this very clear: a service should not use or change drive mappings at all. See KB180362 (INFO: Services and Redirected Drives) and Services and Redirected Drives (MSDN) for more information. If a service needs to use network resources, it should use UNC paths.

Network drive mappings are handled differently on different Windows versions. In addition, network drive mappings are one type of "MS-DOS Device Name", so they fall under the additional complications described in Local and Global MS-DOS Device Names.

Note that a service running as LocalService uses anonymous credentials to access network resources, and services running as NetworkService or LocalSystem use machine account credentials.

2009-10-21

Managed Windows Services - The Basics

Managed (.NET) Windows Services suffer from a lack of sufficient information in the .NET MSDN documentation. Earlier this year, the BCL team put a post on their blog that fills in the gaps: How .NET Managed Services Interact with the Service Control Manager. The Service Control Manager (SCM) is the part of Windows that controls starting and stopping Windows Services.

Services and the .NET ServiceBase Class

In a nutshell, the static ServiceBase.Run method provides a main loop for services, giving the service's main thread to the SCM. Once control has been passed off, ServiceBase will invoke the service entry points such as ServiceBase.OnStart and ServiceBase.OnStop as a response to SCM requests.

Properly Implementing ServiceBase.OnStart and ServiceBase.OnStop

The service enters the "starting" state before ServiceBase.OnStart is called, and only enters the "started" state when OnStart returns. So, a service that is always "starting" and never "started" is a pretty good indication that OnStart isn't returning.

OnStart cannot be a "main loop" for a service. Many services work just fine without a main loop, but if one is required, then OnStart should start a thread and then return, letting the thread run the actual main loop. If OnStart will take more than 30 seconds to return, then it should call ServiceBase.RequestAdditionalTime.

Similarly, the service enters the "stopping" state before ServiceBase.OnStop is called, and enters the "stopped" state when OnStop returns. If OnStop will take more than 20 seconds, then it should call ServiceBase.RequestAdditionalTime.
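The "start a thread and return" pattern can be sketched without any service plumbing. MainLoopWorker is a hypothetical helper, not part of ServiceBase; the assumption is that the service's OnStart override would call Start() and its OnStop override would call Stop().

```csharp
using System;
using System.Threading;

// Hypothetical helper owned by the service class.
public sealed class MainLoopWorker
{
    private readonly ManualResetEvent stopRequested = new ManualResetEvent(false);
    private Thread thread;
    private int iterations;

    // Number of loop iterations completed so far (read atomically).
    public int Iterations
    {
        get { return Interlocked.CompareExchange(ref iterations, 0, 0); }
    }

    // Returns as soon as the thread is started; the loop itself runs on that thread.
    public void Start()
    {
        thread = new Thread(Run);
        thread.Start();
    }

    // Signals the loop to exit, then waits for the thread to finish.
    public void Stop()
    {
        stopRequested.Set();
        thread.Join();
    }

    private void Run()
    {
        // WaitOne doubles as the loop delay and the stop check:
        // it returns true (ending the loop) once Stop has signaled the event.
        while (!stopRequested.WaitOne(100))
        {
            Interlocked.Increment(ref iterations); // placeholder for real work
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var worker = new MainLoopWorker();
        worker.Start();    // what OnStart would do
        Thread.Sleep(500); // the service is "started" and running
        worker.Stop();     // what OnStop would do
        Console.WriteLine("Iterations: " + worker.Iterations);
    }
}
```

If the loop body could take a long time per iteration, Stop would block for that long too, which is where RequestAdditionalTime comes in.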

The Current Directory

Services do not start with their current directory set to where their executable is. They usually end up running with their current directory set to the Windows or Windows System folder. It's not unusual for Windows Services to set their current directory near the beginning of their Main method, before calling ServiceBase.Run:

Environment.CurrentDirectory = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);

Services and Threading

Deep within the bowels of the OS, Windows Services are treated as a special sort of Console application. A Console application has a single thread by default and exits when that thread returns from Main; a Windows Service starts as a Console application and then passes ownership of its thread to the SCM by calling ServiceBase.Run. When the SCM decides to exit the service process (after all its services have been stopped), it will return control back to Main, which is expected to immediately exit.

The ServiceBase events (such as OnStart and OnStop) execute within the context of a worker thread. Therefore, the default synchronization context for .NET services is unsynchronized (i.e., SynchronizationContext.Current is null). Windows Services usually employ one of two threading models:

  1. Create a "main loop" thread within OnStart, and have this thread respond to events (including the OnStop event).
  2. Start at least one asynchronous operation (such as a Timer, listening socket, or FileSystemWatcher), and have the completion handlers take the appropriate actions.

Note that both of these models return from OnStart after a short period of time (either starting the main thread or starting an asynchronous operation).

A reminder about garbage collection is in order: if the only reference to an object is in a completion routine, then that object is eligible for garbage collection. This is true for any type of .NET process, but most often causes problems with services that choose to use the second threading model described above.
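One way to stay clear of that problem under the second model is to hold the asynchronous component in a field of an object that is itself kept alive for the lifetime of the service. A minimal sketch using System.Threading.Timer follows; PollingWorker is a hypothetical name, and the assumption is that the service class keeps the worker in a field of its own.

```csharp
using System;
using System.Threading;

// Hypothetical helper for a timer-driven service.
public sealed class PollingWorker : IDisposable
{
    // Held in a field so the timer stays reachable for as long as this object is.
    private Timer timer;
    private int ticks;

    // Number of timer callbacks that have run so far (read atomically).
    public int Ticks
    {
        get { return Interlocked.CompareExchange(ref ticks, 0, 0); }
    }

    // Returns immediately; callbacks then run on thread pool threads.
    public void Start()
    {
        timer = new Timer(OnTick, null, 0, 100);
    }

    private void OnTick(object state)
    {
        Interlocked.Increment(ref ticks); // placeholder for real work
    }

    // Stops the timer; a callback already in progress may still finish.
    public void Dispose()
    {
        if (timer != null)
            timer.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        using (var worker = new PollingWorker())
        {
            worker.Start();
            Thread.Sleep(500);
            Console.WriteLine("Ticks: " + worker.Ticks);
        }
    }
}
```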

Even if a service uses a "main loop" thread, the default SynchronizationContext is still in effect, resulting in free-threaded completion routines even for EBAP components (EBAP: Event-Based Asynchronous Pattern). This means that EBAP components such as BackgroundWorker may not perform as expected. The Nito.Async library contains an ActionThread that is ideal for the "main loop" thread of a Windows Service; see the Nito.Async documentation for details and examples.