Friday, June 13, 2008

.NET Generics

Generics in .NET 2.0

Generics in .NET 2.0 is an exciting feature. But what are generics? Are they for you? Should you use them in your applications? In this article, we'll answer these questions and take a closer look at generics usage, and their capabilities and limitations.

Type Safety
Many of the languages in .NET, like C#, C++, and VB.NET (with option strict on), are strongly typed languages. As a programmer using these languages, you expect the compiler to perform type-safety checks. For instance, if you try to treat or cast a reference of the type Book as a reference of the type Vehicle, the compiler will tell you that such a cast is invalid.

However, when it comes to collections in .NET 1.0 and 1.1, there is no help with type safety. Consider an ArrayList, for example. It holds a collection of objects. This allows you to place an object of just about any type into an ArrayList. Let's take a look at the code in Example 1.

Example 1. Lack of type safety in ArrayList

using System;
using System.Collections;

namespace TestApp
{
    class Test
    {
        [STAThread]
        static void Main(string[] args)
        {
            ArrayList list = new ArrayList();

            list.Add(3);
            list.Add(4);
            //list.Add(5.0);

            int total = 0;
            foreach(int val in list)
            {
                total = total + val;
            }

            Console.WriteLine(
                "Total is {0}", total);
        }
    }
}

I am creating an instance of ArrayList and adding 3 and 4 to it. Then I loop through the ArrayList, fetching the int values from it and adding them. This program will produce the result "Total is 7." Now, if I uncomment the statement:

list.Add(5.0);

the program will produce a runtime exception:

Unhandled Exception: System.InvalidCastException: Specified cast is not valid.
at TestApp.Test.Main(String[] args) in c:\workarea\testapp\class1.cs:line 18

What went wrong? Remember that ArrayList holds a collection of objects. When you add a 3 to the ArrayList, you are boxing the value 3. When you loop through the list, you are unboxing the elements as int. However, when you add the value 5.0, you are boxing a double. On line 18, that double value is being unboxed as an int, and that is the cause of the failure.

(The above example would not fail, however, if it were written in VB.NET. The reason is that VB.NET, instead of unboxing, invokes a method that converts the values into Integers. The VB.NET code will still fail if a value in the ArrayList is not convertible to Integer. See Gotcha #9, "Typeless ArrayList Isn't Type-Safe," in my book .NET Gotchas for further details.)

As a programmer who is used to the type safety provided by the language, you would rather have the problems pop up during compile time instead of runtime. This is where generics come in.

What Are Generics?
Generics allow you to realize type safety at compile time. They allow you to create a data structure without committing to a specific data type. When the data structure is used, however, the compiler makes sure that the types used with it are consistent for type safety. Generics provide type safety, but without any loss of performance or code bloat. While they are similar to templates in C++ in this regard, they are very different in their implementation.

Using Generic Collections
The System.Collections.Generic namespace contains the generic collections in .NET 2.0. Various collection/container classes have been "parameterized." To use them, simply specify the type for the parameterized type and off you go. See Example 2:

Example 2. Type-safe generic List

List<int> aList = new List<int>();
aList.Add(3);
aList.Add(4);
// aList.Add(5.0);
int total = 0;
foreach(int val in aList)
{
    total = total + val;
}
Console.WriteLine("Total is {0}", total);

In Example 2, I am creating an instance of the generic List with the type int, given within the angle brackets (<>), as the parameterized type. This code, when executed, will produce the result "Total is 7." Now, if I uncomment the statement aList.Add(5.0);, I will get a compilation error. The compiler determines that it can't send the value 5.0 to the method Add(), as it only accepts an int. Unlike Example 1, this code has type safety built into it.

CLR Support for Generics
Generics are not a mere language-level feature. The .NET CLR recognizes generics, which makes them a first-class feature in .NET. A separate class is not rolled out in the Microsoft Intermediate Language (MSIL) for each type argument used with a generic. In other words, your assembly contains only one definition of your parameterized data structure or class, irrespective of how many different types are used for that parameterized type. For instance, if you define a generic type MyList<T>, only one definition of that type is present in MSIL. When the program executes, different classes are dynamically created, one for each type used for the parameterized type. If you use MyList<int> and MyList<double>, then two classes are created on the fly when your program executes. Let's examine this further in Example 3.

Example 3. Writing a generic class

//MyList.cs
#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace CLRSupportExample
{
    public class MyList<T>
    {
        private static int objCount = 0;

        public MyList()
        {
            objCount++;
        }

        public int Count
        {
            get
            {
                return objCount;
            }
        }
    }
}

//Program.cs
#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace CLRSupportExample
{
    class SampleClass {}

    class Program
    {
        static void Main(string[] args)
        {
            MyList<int> myIntList = new MyList<int>();
            MyList<int> myIntList2 = new MyList<int>();

            MyList<double> myDoubleList
                = new MyList<double>();

            MyList<SampleClass> mySampleList
                = new MyList<SampleClass>();

            Console.WriteLine(myIntList.Count);
            Console.WriteLine(myIntList2.Count);
            Console.WriteLine(myDoubleList.Count);
            Console.WriteLine(mySampleList.Count);
            Console.WriteLine(
                new MyList<SampleClass>().Count);

            Console.ReadLine();
        }
    }
}

I have created a generic class named MyList. To parameterize it, I simply added <T> after the class name. The T within <> represents the actual type that will be specified when the class is used. Within the MyList class, I have a static field, objCount. I am incrementing this within the constructor so I can find out how many objects of that type are created by the user of my class. The Count property returns the number of instances of the same type as the instance on which it is called.

In the Main() method, I am creating two instances of MyList<int>, one instance of MyList<double>, and two instances of MyList<SampleClass>, where SampleClass is a class I have defined. The question is: what will be the value of Count? That is, what is the output from the above program? Go ahead and think on this and try to answer this question before you read further.

Have you worked the above question? Did you get the following answer?

2
2
1
1
2

The first two values of 2 are for MyList<int>. The first value of 1 is for MyList<double>. The second value of 1 is for MyList<SampleClass>; only one instance of this type had been created at that point in the control flow. The last value of 2 is also for MyList<SampleClass>, since another instance of this type has been created at this point in the code. The above example illustrates that MyList<int> is a different class from MyList<double>, which in turn is a different class from MyList<SampleClass>. So, in this example, we have four classes of MyList: MyList<T>, MyList<int>, MyList<double>, and MyList<SampleClass>. Again, while there are four classes of MyList, only one is stored in MSIL. How can we prove this? Figure 1 shows the MSIL using the ildasm.exe tool.



Figure 1. A look at MSIL for Example 3

Generic Methods
In addition to having generic classes, you may also have generic methods. Generic methods may be part of any class. Let's look at Example 4:

Example 4. A generic method

public class Program
{
    public static void Copy<T>(List<T> source, List<T> destination)
    {
        foreach (T obj in source)
        {
            destination.Add(obj);
        }
    }

    static void Main(string[] args)
    {
        List<int> lst1 = new List<int>();
        lst1.Add(2);
        lst1.Add(4);

        List<int> lst2 = new List<int>();
        Copy(lst1, lst2);
        Console.WriteLine(lst2.Count);
    }
}

The Copy() method is a generic method that works with the parameterized type T. When Copy() is invoked in Main(), the compiler figures out the specific type to use, based on the arguments presented to the Copy() method.
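
As a small sketch of that inference (the class name CopyDemo and the variable names are made up for illustration), the same Copy() method can be called with the type argument left to the compiler or spelled out explicitly:

```csharp
using System;
using System.Collections.Generic;

public static class CopyDemo
{
    // Same shape as the Copy() method in Example 4.
    public static void Copy<T>(List<T> source, List<T> destination)
    {
        foreach (T item in source)
        {
            destination.Add(item);
        }
    }

    public static void Main()
    {
        List<int> numbers = new List<int>();
        numbers.Add(2);
        numbers.Add(4);

        List<int> inferred = new List<int>();
        Copy(numbers, inferred);           // T is inferred as int

        List<int> spelledOut = new List<int>();
        Copy<int>(numbers, spelledOut);    // T is stated explicitly

        Console.WriteLine(inferred.Count);    // 2
        Console.WriteLine(spelledOut.Count);  // 2
    }
}
```

Both calls bind to the same method; the explicit form is only needed when the compiler cannot work out T from the arguments.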

Unbounded Type Parameters
If you create generic data structures or classes, like MyList in Example 3, there are no restrictions on what type may be used for the parameterized type. This leads to some limitations, however. For example, you are not allowed to use ==, !=, or < on instances of the parameterized type:

if (obj1 == obj2) …

The implementation of operators such as == and != is different for value types and reference types. The behavior of the code would not be easy to understand if these were allowed arbitrarily. Another restriction is the use of the default constructor. For instance, if you write new T(), you will get a compilation error, because not all classes have a no-parameter constructor. What if you do want to create an object using new T(), or you want to use operators such as == and !=? You can, but first you have to constrain the type that can be used for the parameterized type. Let's look at how to do that.
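
For what it is worth, here is a minimal sketch of what still works when the type parameter is left unbounded. It goes slightly beyond the article: EqualityComparer<T>.Default and default(T) are standard .NET 2.0 facilities, while the names UnboundedDemo, AreEqual, and DefaultOf are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class UnboundedDemo
{
    // With no constraint on T, "a == b" does not compile.
    // EqualityComparer<T>.Default defers to the type's own notion of equality.
    public static bool AreEqual<T>(T a, T b)
    {
        return EqualityComparer<T>.Default.Equals(a, b);
    }

    // "new T()" does not compile without a new() constraint, but default(T)
    // is always available: 0 for int, null for a reference type.
    public static T DefaultOf<T>()
    {
        return default(T);
    }

    public static void Main()
    {
        Console.WriteLine(AreEqual(3, 3));               // True
        Console.WriteLine(AreEqual("a", "b"));           // False
        Console.WriteLine(DefaultOf<int>());             // 0
        Console.WriteLine(DefaultOf<string>() == null);  // True
    }
}
```

This avoids the operators rather than enabling them; constraints, covered next, are what let the compiler accept member calls on T.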

Constraints and Their Benefits
A generic class allows you to write your class without committing to any type, yet allows the user of your class, later on, to indicate the specific type to be used. While this gives greater flexibility, by placing some constraints on the types that may be used for the parameterized type you gain some control in writing your class. Let's look at an example:

Example 5. The need for constraints: code that will not compile

public static T Max<T>(T op1, T op2)
{
    // return the larger of the two operands
    if (op1.CompareTo(op2) > 0)
        return op1;
    return op2;
}

The code in Example 5 will produce a compilation error:

Error 1 'T' does not contain a definition for 'CompareTo'

Assume I need the type to support the CompareTo() method. I can specify this by using the constraint that the type specified for the parameterized type must implement the IComparable interface. Example 6 has the code:

Example 6. Specifying a constraint

public static T Max<T>(T op1, T op2) where T : IComparable
{
    // return the larger of the two operands
    if (op1.CompareTo(op2) > 0)
        return op1;
    return op2;
}

In Example 6, I have specified the constraint that the type used for parameterized type must inherit from (implement) IComparable. The following constraints may be used:

where T : struct          type must be a value type (a struct)
where T : class           type must be a reference type (a class)
where T : new()           type must have a no-parameter constructor
where T : class_name      type must be class_name or one of its subclasses (anything below class_name in the inheritance hierarchy)
where T : interface_name  type must implement the specified interface

You may specify a combination of constraints, as in: where T : IComparable, new(). This says that the type for the parameterized type must implement the IComparable interface and must have a no-parameter constructor.
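
A minimal sketch of such a combined constraint follows. The Score class and the MaxOrNew() method are hypothetical, made up only to show both halves of the constraint at work: IComparable permits the CompareTo() call, and new() permits new T().

```csharp
using System;

// Hypothetical type used only for this sketch.
public class Score : IComparable
{
    public int Points;   // 0 for a freshly constructed Score

    public int CompareTo(object other)
    {
        return Points.CompareTo(((Score)other).Points);
    }
}

public static class ConstraintDemo
{
    // Returns the candidate if it compares greater than a
    // default-constructed T; otherwise returns the default-constructed T.
    public static T MaxOrNew<T>(T candidate) where T : IComparable, new()
    {
        T baseline = new T();
        if (candidate.CompareTo(baseline) > 0)
            return candidate;
        return baseline;
    }

    public static void Main()
    {
        Score high = new Score();
        high.Points = 10;
        Console.WriteLine(MaxOrNew(high).Points);  // 10
        Console.WriteLine(MaxOrNew(-5));           // 0
    }
}
```

Note that new() has to come last in the constraint list, and that value types such as int satisfy it automatically.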

Inheritance and Generics
A generic class that uses parameterized types, like MyClass1<T>, is called an open-constructed generic. A generic class that uses no parameterized types, like MyClass1<int>, is called a closed-constructed generic.

You may derive from a closed-constructed generic; that is, you may inherit a class named MyClass2 from another class named MyClass1, as in:

public class MyClass2<T> : MyClass1<int>

You may derive from an open-constructed generic, provided the derived class declares the type parameter itself.

For example:

public class MyClass2<T> : MyClass1<T>

is valid, but

public class MyClass2<T> : MyClass1<Y>

is not valid, where Y is a parameterized type that MyClass2 does not declare. Non-generic classes may derive from closed-constructed generic classes, but not from open-constructed generic classes. That is,

public class MyClass : MyClass1<int>

is valid, but

public class MyClass : MyClass1<T>

is not.
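
These rules can be checked with a short sketch. The derived class names (FromClosed, FromOpen, PlainFromClosed) are invented for illustration, and the invalid declarations are left as comments because they would stop the program from compiling.

```csharp
using System;

public class MyClass1<T> { }

// Generic class deriving from a closed-constructed generic.
public class FromClosed<T> : MyClass1<int> { }

// Generic class deriving from an open-constructed generic:
// the base's type argument is the derived class's own parameter.
public class FromOpen<T> : MyClass1<T> { }

// Non-generic class deriving from a closed-constructed generic.
public class PlainFromClosed : MyClass1<double> { }

// These would not compile, because Y and T are not declared there:
// public class Bad1<T> : MyClass1<Y> { }
// public class Bad2 : MyClass1<T> { }

public static class InheritanceDemo
{
    public static void Main()
    {
        Console.WriteLine(typeof(FromClosed<string>).BaseType == typeof(MyClass1<int>));   // True
        Console.WriteLine(typeof(FromOpen<string>).BaseType == typeof(MyClass1<string>));  // True
        Console.WriteLine(typeof(PlainFromClosed).BaseType == typeof(MyClass1<double>));   // True
    }
}
```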

Generics and Substitutability
When we deal with inheritance, we need to be careful about substitutability. If B inherits from A, then anywhere an object of A is used, an object of B may also be used. Let's assume we have a Basket of Fruits (Basket<Fruit>). We have Apple and Banana (kinds of Fruits) inherit from Fruit. Should Basket of Apples (Basket<Apple>) inherit from Basket of Fruits (Basket<Fruit>)? The answer is no, if we think about substitutability. Why? Consider a method that works with a Basket of Fruits:

public void Package(Basket<Fruit> aBasket)
{
    aBasket.Add(new Apple());
    aBasket.Add(new Banana());
}

If an instance of Basket<Fruit> is sent to this method, the method would add an Apple and a Banana. However, what would the effect be of sending an instance of a Basket<Apple> to this method? You see, this gets tricky: a Banana would end up in a basket that is meant to hold only Apples. That is why if you write:

Basket<Apple> anAppleBasket = new Basket<Apple>();
Package(anAppleBasket);

you will get an error:

Error 2 Argument '1':
cannot convert from 'TestApp.Basket<TestApp.Apple>'
to 'TestApp.Basket<TestApp.Fruit>'

The compiler protects us from shooting ourselves in the foot by making sure we don't arbitrarily pass a collection of derived where a collection of base is expected. That is pretty good, isn't it?

Wait a minute, though! That was great in the above example, but there are times when I do want to pass a collection of derived where a collection of base is needed. For instance, consider an Animal (such as Monkey), which has a method named Eat that takes a Basket<Fruit>, as shown below:

public void Eat(Basket<Fruit> fruits)
{
    foreach (Fruit aFruit in fruits)
    {
        // code to eat fruit
    }
}

Now, you may call:

Basket<Fruit> fruitsBasket = new Basket<Fruit>();
… // Fruits added to Basket
anAnimal.Eat(fruitsBasket);

What if you have a Basket<Banana> with you? Would it make sense to send a Basket<Banana> to the Eat method? In this case, it would, no? But the compiler will give us an error if we try:

Basket<Banana> bananaBasket = new Basket<Banana>();
//…
anAnimal.Eat(bananaBasket);

The compiler is protecting us here. How can we ask the compiler to let us through in this particular case? Again, constraints come in handy for this:

public void Eat<T>(Basket<T> fruits) where T : Fruit
{
    foreach (Fruit aFruit in fruits)
    {
        // code to eat fruit
    }
}

In writing the Eat() method, I am asking the compiler to allow a Basket of any type T, where T is of the type Fruit or any class that inherits from Fruit.
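
Here is a runnable sketch of the constrained Eat() method. The article does not show how Basket is implemented, so this sketch assumes it is a thin subclass of List<T>; the Name property and the Eaten counter are likewise invented so that the effect is observable.

```csharp
using System;
using System.Collections.Generic;

public class Fruit
{
    public virtual string Name { get { return "Fruit"; } }
}

public class Banana : Fruit
{
    public override string Name { get { return "Banana"; } }
}

// Assumption: Basket<T> is modelled as a thin wrapper over List<T>.
public class Basket<T> : List<T> { }

public class Animal
{
    public int Eaten;   // hypothetical counter, for demonstration only

    public void Eat<T>(Basket<T> fruits) where T : Fruit
    {
        foreach (Fruit aFruit in fruits)
        {
            Console.WriteLine("Eating " + aFruit.Name);
            Eaten++;
        }
    }
}

public static class EatDemo
{
    public static void Main()
    {
        Basket<Banana> bananas = new Basket<Banana>();
        bananas.Add(new Banana());
        bananas.Add(new Banana());

        Basket<Fruit> mixed = new Basket<Fruit>();
        mixed.Add(new Banana());

        Animal monkey = new Animal();
        monkey.Eat(bananas);   // T is inferred as Banana
        monkey.Eat(mixed);     // T is inferred as Fruit
        Console.WriteLine(monkey.Eaten);  // 3
    }
}
```

Because Eat() only reads from the basket, accepting a Basket<Banana> is safe here; the Package() method shown earlier adds to the basket, which is why the same relaxation would not be appropriate there.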

Generics and Delegates
Delegates can be generics as well. This provides quite a bit of flexibility.

Assume we are interested in writing a framework. We need to provide a mechanism for an event source to talk to an object that is interested in the event. Our framework may not be able to control what the events are. You may be dealing with a stock price change (double price). I may be dealing with a temperature change in a boiler (Temperature value), where Temperature may be an object that has some information such as value, units, threshold, and so on. How can I define an interface for these events?

Let's take a look at how we can realize this by using pre-generic delegates:

public delegate void NotifyDelegate(Object info);

public interface ISource
{
    event NotifyDelegate NotifyActivity;
}

We have the NotifyDelegate accepting an Object. This is the best we could do in the past, as Object can be used to represent different types such as double, Temperature, and so on, though it involves boxing overhead for value types. ISource is an interface that different sources will support. The framework exposes the NotifyDelegate delegate and the ISource interface.

Let's look at two different sources:

public class StockPriceSource : ISource
{
    public event NotifyDelegate NotifyActivity;
    //…
}

public class BoilerSource : ISource
{
    public event NotifyDelegate NotifyActivity;
    //…
}

If we have an object of each of the above classes, we would register a handler for events, as shown below:

StockPriceSource stockSource = new StockPriceSource();
stockSource.NotifyActivity
    += new NotifyDelegate(
        stockSource_NotifyActivity);

// Not necessarily in the same program… we may have
BoilerSource boilerSource = new BoilerSource();
boilerSource.NotifyActivity
    += new NotifyDelegate(
        boilerSource_NotifyActivity);

In the handler methods, we have to cast the argument before we can use it. The handler for the stock event would be:

void stockSource_NotifyActivity(object info)
{
    double price = (double)info;
    // downcast required before use
}

The handler for the temperature event may look like this:

void boilerSource_NotifyActivity(object info)
{
    Temperature value = info as Temperature;
    // downcast required before use
}

The above code is not intuitive, and is messy with the downcasts. With generics, the code is more readable and easier to work with. Let's take a look at the code with generics at work:

Here is the delegate and the interface:

public delegate void NotifyDelegate<T>(T info);

public interface ISource<T>
{
    event NotifyDelegate<T> NotifyActivity;
}

We have parameterized the delegate and the interface. The implementor of the interface can now say what the type should be.

The Stock source would look like this:

public class StockPriceSource : ISource<double>
{
    public event NotifyDelegate<double> NotifyActivity;
    //…
}

and the Boiler source would look like this:

public class BoilerSource : ISource<Temperature>
{
    public event NotifyDelegate<Temperature> NotifyActivity;
    //…
}

If we have an object of each of the above classes, we would register a handler for events, as shown below:

StockPriceSource stockSource = new StockPriceSource();
stockSource.NotifyActivity
    += new NotifyDelegate<double>(
        stockSource_NotifyActivity);

// Not necessarily in the same program… we may have
BoilerSource boilerSource = new BoilerSource();
boilerSource.NotifyActivity
    += new NotifyDelegate<Temperature>(
        boilerSource_NotifyActivity);

Now, the event handler for stock price would be:

void stockSource_NotifyActivity(double info)
{
    //…
}

and the event handler for the temperature is:

void boilerSource_NotifyActivity(Temperature info)
{
    //…
}

This code has no downcast and the types involved are very clear.
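
To see the generic delegate working end to end, here is a compact sketch for the stock source alone. The Publish() method, the OnPrice handler, and the LastPrice field are hypothetical additions; the article elides how the sources raise their events.

```csharp
using System;

public delegate void NotifyDelegate<T>(T info);

public interface ISource<T>
{
    event NotifyDelegate<T> NotifyActivity;
}

public class StockPriceSource : ISource<double>
{
    public event NotifyDelegate<double> NotifyActivity;

    // Hypothetical helper that raises the event for all subscribers.
    public void Publish(double price)
    {
        NotifyDelegate<double> handler = NotifyActivity;
        if (handler != null)
            handler(price);
    }
}

public static class DelegateDemo
{
    public static double LastPrice;   // records what the handler received

    public static void OnPrice(double info)
    {
        LastPrice = info;             // already a double; no cast needed
        Console.WriteLine("Price changed: " + info);
    }

    public static void Main()
    {
        StockPriceSource source = new StockPriceSource();
        source.NotifyActivity += new NotifyDelegate<double>(OnPrice);
        source.Publish(42.5);
    }
}
```

Trying to subscribe a handler that takes a Temperature to this source would be rejected at compile time, which is the benefit over the Object-based delegate.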

Generics and Reflection
Since generics are supported at the CLR level, you may use the reflection API to get information about generics. One thing may be a bit confusing when you are new to generics: you have to keep in mind that there is the generic class you write, and then there are the types created from it at runtime. So, when using the reflection API, you have to make an extra effort to keep in mind which type you are dealing with. I illustrate this in Example 7:

Example 7. Reflection on generics

public class MyClass<T> { }

class Program
{
    static void Main(string[] args)
    {
        MyClass<int> obj1 = new MyClass<int>();
        MyClass<double> obj2 = new MyClass<double>();

        Type type1 = obj1.GetType();
        Type type2 = obj2.GetType();

        Console.WriteLine("obj1's Type");
        Console.WriteLine(type1.FullName);
        Console.WriteLine(
            type1.GetGenericTypeDefinition().FullName);

        Console.WriteLine("obj2's Type");
        Console.WriteLine(type2.FullName);
        Console.WriteLine(
            type2.GetGenericTypeDefinition().FullName);
    }
}

I have an instance of MyClass<int>. I ask for the class name of this instance. Then I call GetGenericTypeDefinition() on this type. GetGenericTypeDefinition() will return the type metadata for MyClass<T> in this example. You may check the IsGenericTypeDefinition property to ask if this is a generic type definition (like MyClass<T>) or if its type parameters have been specified (like MyClass<int>). Similarly, I query an instance of MyClass<double> for its metadata. The output from the above program is shown below:

obj1's Type
TestApp.MyClass`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]
TestApp.MyClass`1
obj2's Type
TestApp.MyClass`1[[System.Double, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]
TestApp.MyClass`1

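We can see that the full names of MyClass<int> and MyClass<double> embed the assembly-qualified names of their type arguments (System.Int32 and System.Double, both of which live in the mscorlib assembly), while the generic type definition, MyClass<T>, is reported simply as TestApp.MyClass`1. The constructed types are created at runtime from that single definition, which belongs to my assembly.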
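
To check which type and which assembly each of these names refers to, a small sketch like the following can help. It reuses MyClass<T> from Example 7; the class name ReflectionDemo is made up, and only standard System.Type members are used.

```csharp
using System;

public class MyClass<T> { }

public static class ReflectionDemo
{
    public static void Main()
    {
        Type closed = new MyClass<int>().GetType();
        Type open = closed.GetGenericTypeDefinition();

        Console.WriteLine(closed.IsGenericType);            // True
        Console.WriteLine(closed.IsGenericTypeDefinition);  // False
        Console.WriteLine(open.IsGenericTypeDefinition);    // True
        Console.WriteLine(open == typeof(MyClass<>));       // True

        // The bracketed part of FullName describes the type argument.
        Console.WriteLine(closed.GetGenericArguments()[0] == typeof(int));  // True

        // The constructed type reports the assembly that defines MyClass<T>.
        Console.WriteLine(closed.Assembly == open.Assembly);  // True
    }
}
```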

Generics' Limitations
We have seen the power of generics so far in this article. Are there any limitations? There is one significant limitation, which I hope Microsoft addresses. In expressing constraints, we can specify that the parameter type must inherit from a class. How about specifying that the parameter must be a base class of some class? Why do we need that?

In Example 4, I showed you a Copy() method that copied contents of a source List to a destination list. I can use it as follows:

List<Apple> appleList1 = new List<Apple>();
List<Apple> appleList2 = new List<Apple>();

Copy(appleList1, appleList2);

However, what if I want to copy apples from one list into a list of Fruits (where Apple inherits from Fruit)? Most certainly, a list of Fruits can hold Apples. So I want to write:

List<Apple> appleList1 = new List<Apple>();
List<Fruit> fruitsList2 = new List<Fruit>();

Copy(appleList1, fruitsList2);

This will not compile. You will get an error:

Error 1 The type arguments for method
'TestApp.Program.Copy<T>(System.Collections.Generic.List<T>,
System.Collections.Generic.List<T>)' cannot be inferred from the usage.

The compiler, based on the call arguments, is not able to decide what T should be. What I really want to say is that the Copy should accept a List of some type as the first parameter, and a List of the same type or a List of its base type as the second parameter.

Even though there is no way to say that a type must be a base type of another, you can get around this limitation by still using the constraints. Here is how:

public static void Copy<T, E>(List<T> source,
    List<E> destination) where T : E

Here I have specified that the type T must be the same type as, or a sub-type of, E. We got lucky with this. Why? Both T and E are being defined here. We were able to specify the constraint (though the C# specification discourages using E to define the constraint of T when E is being defined as well).
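
A complete version of this two-parameter Copy() might look like the sketch below. The method body is my guess at the obvious implementation (the article shows only the signature), and the class name CopyToBaseDemo is invented.

```csharp
using System;
using System.Collections.Generic;

public class Fruit { }
public class Apple : Fruit { }

public static class CopyToBaseDemo
{
    public static void Copy<T, E>(List<T> source, List<E> destination)
        where T : E
    {
        foreach (T item in source)
        {
            // The constraint T : E makes this conversion implicit.
            destination.Add(item);
        }
    }

    public static void Main()
    {
        List<Apple> apples = new List<Apple>();
        apples.Add(new Apple());
        apples.Add(new Apple());

        List<Fruit> fruits = new List<Fruit>();
        Copy(apples, fruits);   // T = Apple, E = Fruit

        Console.WriteLine(fruits.Count);  // 2
    }
}
```

Copying in the other direction, from a List<Fruit> into a List<Apple>, is rejected by the compiler because Fruit does not satisfy the constraint T : E when E is Apple.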

Consider the following example, however:

public class MyList<T>
{
    public void CopyTo(MyList<T> destination)
    {
        //…
    }
}

I should be able to call CopyTo:

MyList<Apple> appleList = new MyList<Apple>();
MyList<Apple> appleList2 = new MyList<Apple>();
//…
appleList.CopyTo(appleList2);

I must also be able to do this:

MyList<Apple> appleList = new MyList<Apple>();
MyList<Fruit> fruitList2 = new MyList<Fruit>();
//…
appleList.CopyTo(fruitList2);

This, of course, will not work. How can we fix this? We need to say that the argument to CopyTo() can be either MyList of some type or MyList of the base type of that type. However, the constraints do not allow us to specify the base type. How about the following?

public void CopyTo<E>(MyList<E> destination) where T : E

Sorry, this does not work. It gives a compilation error:

Error 1 'TestApp.MyList<T>.CopyTo<E>()' does not define type
parameter 'T'

Of course, you may write the code to accept a MyList of any arbitrary type and then, within your code, verify that the type is an acceptable one. However, this pushes the checking to runtime, losing the benefit of compile-time type safety.
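
A sketch of that runtime fallback is shown below, under stated assumptions: MyList<T> is backed by a List<T>, and the Add() method, the Count property, and the choice of ArgumentException are all invented for the illustration. Type.IsAssignableFrom() is the standard reflection call doing the check.

```csharp
using System;
using System.Collections.Generic;

public class Fruit { }
public class Apple : Fruit { }

public class MyList<T>
{
    private List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    public void Add(T item) { items.Add(item); }

    // E is unconstrained, so compatibility is verified at runtime.
    public void CopyTo<E>(MyList<E> destination)
    {
        if (!typeof(E).IsAssignableFrom(typeof(T)))
            throw new ArgumentException(
                "Destination cannot hold items of the source type.");

        foreach (T item in items)
        {
            // Cast through object; safe after the check above.
            destination.Add((E)(object)item);
        }
    }
}

public static class RuntimeCheckDemo
{
    public static void Main()
    {
        MyList<Apple> apples = new MyList<Apple>();
        apples.Add(new Apple());

        MyList<Fruit> fruits = new MyList<Fruit>();
        apples.CopyTo(fruits);            // fine: Fruit can hold Apple
        Console.WriteLine(fruits.Count);  // 1

        try
        {
            apples.CopyTo(new MyList<string>());  // compiles, fails at runtime
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Rejected at runtime");
        }
    }
}
```

The second call shows the cost: a mistake that the constraint-based Copy() would catch at compile time only surfaces here when the program runs.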

Conclusion
Generics in .NET 2.0 are very powerful. They allow you to write code without committing to a particular type, yet your code can enjoy type safety. Generics are implemented in such a way as to provide good performance and avoid code bloat. While there is the drawback of constraints' inability to specify that a type must be a base type of another type, the constraints mechanism gives you the flexibility to write code with a greater degree of freedom than sticking with the least-common-denominator capability of all types.
