Jon Jagger
jon@jaggersoft.com

A Programmer's Overview of C#

Introduction

The Common Language Infrastructure (CLI) is a specification submitted jointly by Microsoft, Intel, and Hewlett-Packard as an ECMA standard. The Common Language Runtime (CLR) is Microsoft's implementation of the CLI.

The CLI is designed for strongly typed languages and has five partitions. Partition 1 specifies the CLI foundation: the Common Type System (CTS), the Virtual Execution System (VES), and the Common Language Specification (CLS).

Compiling a C# program does not create a regular executable file. Instead it creates a program in Common Intermediate Language (CIL, specified in partition 3 of the ECMA standard). A compiled C# program also contains a block of metadata (data about the program itself) called a manifest (specified in partition 2). This metadata facilitates powerful reflection capabilities.

The VES translates the CIL into native executable code (which can be done just-in-time or at installation).

The Common Type System (CTS) is a set of types designed to allow language interoperability. All CTS types are either value types or reference types.

The Common Language Specification is a set of rules designed to allow language interoperability. For example, unsigned integer types are not in the CLS so your C# programs must not expose unsigned integers if you want them to be fully interoperable.

Hello World

The obligatory console Hello World in C# looks like this.

class HelloWorld
{
    static void Main()
    {
        System.Console.WriteLine("Hello, world!");
    }
}

C# has a sensibly limited preprocessor. There are no macro functions. What you see is what you get. A C# source file is not required to have the same name as the class it contains.

Identifiers should follow the camelCasing or PascalCasing notation depending on whether they are private or non-private respectively. Hungarian notation is officially not recommended. C# is a case sensitive language so Main must be spelled with a capital M. A C# program exposing two identifiers differing only in case is not CLS compliant.

C# offers exception handling features using the try/catch/finally keywords. Exceptions are used extensively in the Base Class Library (BCL). C# also supports C++-like namespaces as a purely logical scoping/naming mechanism. You can write using directives to bring the typenames in a namespace into scope.

using System; // System.Exception

class HelloWorld
{
    static void Main()
    {
        try 
        {
            NotMain();
        }
        catch (Exception caught) 
        {
            ...
        }
    }
    ...
}

C# Fundamentals

Numeric Types

C# supports 8 integer types (not all of which are CLS compliant) and three floating point types. The floating point literal suffixes for these three types are F/f, D/d, and M/m (think m for money).

C# expressions follow the standard C/C++/Java rules of precedence and associativity. As in Java, the order of operand evaluation is left to right (in C/C++ it's unspecified), an expression statement must have a side effect (in C/C++ it needn't) and a variable can only be used once it has definitely been assigned (not true in C/C++).
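The left-to-right operand rule can be observed directly. The following is only a sketch: EvaluationOrder, Note, and Log are hypothetical names invented for this illustration.

```csharp
using System;

class EvaluationOrder
{
    // Records the order in which Note is called (illustration only).
    public static string Log = "";

    public static int Note(string label, int value)
    {
        Log += label;
        return value;
    }

    static void Main()
    {
        // Precedence decides how operands are grouped;
        // evaluation order decides when each operand is computed.
        int sum = Note("a", 1) + Note("b", 2) * Note("c", 3);
        Console.WriteLine(sum);   // 7: * binds tighter than +
        Console.WriteLine(Log);   // abc: operands evaluated left to right
    }
}
```

Precedence and evaluation order are separate questions: the multiplication is grouped first, yet its operands are still evaluated after the leftmost operand of the addition.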

Checked Arithmetic

The CIL allows expressions or statements that contain integer arithmetic to be checked to detect integer overflow. C# uses the checked and unchecked keywords to access this feature. An integer overflow throws an OverflowException when checked. (Integer division by zero always throws a DivideByZeroException.) Floating point expressions never throw exceptions (except when being cast to integers). For example:

using System;

class Overflow
{
    static void Main()
    {
        try 
        {
            int max = int.MaxValue;            // a variable, so not a compile-time constant
            int x = unchecked(max + 1);        // wraps to int.MinValue
            int y = checked(max + 1);          // throws
        }
        catch (OverflowException caught) 
        {
            Console.WriteLine(caught);
        }
    }
}

Control Flow

C# supports the if/while/for/do statements familiar to C/C++/Java programmers. As in Java, a C# boolean expression must be a genuine boolean expression. There are never any conversions from a built in type to true/false. A variable introduced in a for statement initialization is scoped to that for statement. C# supports a foreach statement which you can use to effortlessly iterate through an array (or any type that supports the correct interface/pattern).

using System;

class Foreach
{
    static void Main(string[] args)
    {
        foreach (string arg in args) 
        {
            Console.WriteLine(arg);
        }
    }
}

The C# switch statement does not allow fall-through behavior. Every case section (including the optional default section) must end in a break statement, a return statement, a throw statement, or a goto statement. You are only allowed to switch on integral types, bools, chars, strings and enums (these types all have a literal syntax).
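As a minimal sketch of these rules, here is a switch on a string. SwitchDemo and Describe are hypothetical names used only for this illustration. Every section ends in a return, and the only sharing permitted is stacking empty case labels on one section.

```csharp
using System;

class SwitchDemo
{
    // Hypothetical helper, named here only for illustration.
    public static string Describe(string suit)
    {
        switch (suit)
        {
            case "Hearts":
            case "Diamonds":     // empty labels may share one section
                return "red";
            case "Clubs":
            case "Spades":
                return "black";
            default:             // the default section must end explicitly too
                return "unknown";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe("Hearts"));   // red
        Console.WriteLine(Describe("Spades"));   // black
        Console.WriteLine(Describe("Joker"));    // unknown
    }
}
```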

Methods and Parameters

C# does not allow global methods; all methods must be declared in a struct or a class. C# does not have a C/C++ header/source file separation; all methods must be declared inline. Arguments can be passed to methods in three different ways: by copy (the default, no keyword), by reference (the ref keyword), or as an output (the out keyword).

The ref/out keywords must appear on the method declaration and the method call. For example:

class Calling
{
    static void Copies  (    int param) { ... }
    static void Accesses(ref int param) { ... }
    static void Modifies(out int param) { ... }

    static void Main()
    {
        int arg = 42;
        Copies  (    arg); // arg won't change
        Accesses(ref arg); // arg might change
        Modifies(out arg); // arg will change
    }
}
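The listing above elides the method bodies. A runnable sketch with bodies filled in might look like this; the bodies, and the class name CallingSketch, are assumptions chosen only to make the three behaviours visible.

```csharp
using System;

class CallingSketch
{
    // Assigns to its own copy; the caller's variable is untouched.
    public static void Copies(int param) { param = 0; }

    // May read and write the caller's variable.
    public static void Accesses(ref int param) { param += 1; }

    // Must assign the caller's variable before returning.
    public static void Modifies(out int param) { param = 99; }

    static void Main()
    {
        int arg = 42;
        Copies(arg);
        Console.WriteLine(arg);   // 42
        Accesses(ref arg);
        Console.WriteLine(arg);   // 43
        Modifies(out arg);
        Console.WriteLine(arg);   // 99
    }
}
```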

C# supports method overloading but not return type covariance. Unlike Java, C# does not support method throw specifications (all exceptions are effectively unchecked).

Value Types

C# makes a clear distinction between value types and reference types. Value type instances (values) typically live on the stack (or inline inside whatever contains them) and are used directly whereas reference type instances (objects) live on the heap and are used indirectly. C# has excellent language support for declaring user-defined value types (unlike Java).

Enums and Structs

You can declare enum types in C#. For example:

enum Suit { Hearts, Clubs, Diamonds, Spades }

You can also declare a user-defined value type using the struct keyword. For example:

struct CoOrdinate
{
    int x, y;
}

Unlike C++, the default accessibility of struct fields is private. You control the initialization of struct values using constructors. You use the static keyword to declare shared methods and shared fields. The readonly keyword is used for fields that can't be modified and are initialised at runtime. The const keyword is used for fields (and local variables) that can't be modified and are initialised at compile time (and is therefore restricted to enums and built in types). As in Java, each declaration must repeat its access specifier.

struct CoOrdinate
{
    public CoOrdinate(int initialX, int initialY)
    {
        x = rangeCheckedX(initialX);
        y = rangeCheckedY(initialY);
    }
    public const int MaxX = 600;
    public static readonly CoOrdinate Empty = new CoOrdinate(0, 0);
    ...
    private int x, y;
}

The built in value type keywords are in fact just a notational convenience. The keyword int (for example) is an alias for System.Int32, a struct called Int32 that lives in the System namespace. Whether you use int or System.Int32 in a C# program makes no difference.
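A small sketch makes the aliasing concrete. AliasDemo and SameType are hypothetical names for this illustration.

```csharp
using System;

class AliasDemo
{
    public static bool SameType()
    {
        // int and System.Int32 name one and the same type.
        return typeof(int) == typeof(System.Int32);
    }

    static void Main()
    {
        int a = 42;
        System.Int32 b = a;                                  // no conversion involved
        Console.WriteLine(SameType());                       // True
        Console.WriteLine(b.GetType());                      // System.Int32
        Console.WriteLine(int.MaxValue == Int32.MaxValue);   // True
    }
}
```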

Operator Overloading

C# supports operator overloading. Enum types automatically support most operators but struct types do not. For example, to allow struct values to be compared for equality/inequality you must write == and != operators:

struct CoOrdinate
{
    public static bool operator==(CoOrdinate lhs, CoOrdinate rhs)
    {
        return lhs.x == rhs.x && lhs.y == rhs.y;
    }
    public static bool operator!=(CoOrdinate lhs, CoOrdinate rhs)
    {
        return !(lhs == rhs);
    }
    ...
    private int x, y;
}

Operators must be public static methods. Operator parameters can only be passed by copy (no ref or out parameters). One or more of the operator parameter types must be of the containing type so you can't change the meaning of the built in operators. The increment (and decrement) operators can be overloaded and work correctly when used in either prefix or postfix form. C# also supports conversion operators which must be declared using the implicit or explicit keyword. Some operators (such as simple assignment) cannot be overloaded.
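The increment and conversion rules can be sketched with a small struct. Metres is a hypothetical type invented for this illustration; which of its conversions is implicit and which explicit is a design choice made here, not a rule of the language.

```csharp
using System;

struct Metres
{
    public Metres(double amount) { this.amount = amount; }

    // One overload serves both the prefix and the postfix form.
    public static Metres operator ++(Metres m)
    {
        return new Metres(m.amount + 1);
    }

    // Declared implicit: callers need no cast.
    public static implicit operator double(Metres m)
    {
        return m.amount;
    }

    // Declared explicit: callers must write a cast.
    public static explicit operator Metres(double amount)
    {
        return new Metres(amount);
    }

    private double amount;
}

class OperatorDemo
{
    static void Main()
    {
        Metres m = (Metres)2.0;   // explicit conversion
        Metres before = m++;      // postfix yields the old value
        Metres after = ++m;       // prefix yields the new value
        double a = before;        // implicit conversion
        double b = after;
        Console.WriteLine(a);     // 2
        Console.WriteLine(b);     // 4
    }
}
```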

Properties

Rather than using a Java Bean-like naming convention, C# uses properties to declare read/write access to a logical field without breaking encapsulation. Properties contain only get and set accessors. The get accessor is automatically called in a read context and the set accessor is automatically called in a write context. For example (note the x and X case difference):

struct CoOrdinate
{   ...
    public int X
    {
        get 
        { 
            return x; 
        }
        set  
        { 
            x = rangeCheckedX(value); 
        }
    }
    ...
    private static int rangeCheckedX(int argument) 
    {
        if (argument < 0 || argument > MaxX) 
        {
            throw new ArgumentOutOfRangeException("x"); 
        }
        return argument;
    }
    ...
    private int x, y;
}

Indexers

An indexer allows a user-defined type to be used as an array. An indexer, like a property, can contain only get/set accessors. For example:

struct StringSection
{   ...
    public char this[int at]
    {
        get 
        { 
            return adapted[start_at + at];
        }
    }
    ...
    private readonly string adapted;
    private readonly int start_at; 
}

Reference Types

Classes

Classes allow you to create user-defined reference types. One or more reference type variables can easily refer to the same object. A variable whose declared type is a class can be assigned null to signify that the reference does not refer to an object (struct variables cannot be assigned null). Assignment of null counts as a Definite Assignment. Classes can declare constructors, destructors, fields, properties, indexers, and operators. Despite identical syntax, classes and structs have subtly different rules and semantics. For example, you can declare a parameterless constructor in a class but not in a struct. You can initialise fields declared in a class at their point of declaration, but struct fields can only be initialised inside a constructor.

Here is a class called MyForm that implements the GUI equivalent of Hello World in C#.NET.

using System.Windows.Forms;

class Launch
{
    static void Main()
    {
        Application.Run(new MyForm());
    }
}

class MyForm : Form
{
    public MyForm()
    {
         Text = captionText;
    }
    private string captionText = "Hello, world!";
}

Variables whose declared type is a class can be passed by copy, ref, and out exactly as before.

class WrappedInt
{
    public WrappedInt(int initialValue)
    {
        value = initialValue;
    }
    ...
    private int value;
}
class Calling
{
    static void Copies  (    WrappedInt param) { ... }
    static void Accesses(ref WrappedInt param) { ... }
    static void Modifies(out WrappedInt param) { ... }

    static void Main()
    {
        WrappedInt arg = new WrappedInt(42);
        Copies  (    arg); // arg won't change
        Accesses(ref arg); // arg might change
        Modifies(out arg); // arg will change 
    }  
}

Strings

C# string literals are double quote delimited (char literals are single quote delimited). Strings are reference types so it is easy for two or more string variables to refer to the same string object. The keyword string is an alias for the System.String class in exactly the same way that int is an alias for the System.Int32 struct.

namespace System
{
    public sealed class String : ...
    {   ...
        public static bool operator==(String lhs, String rhs) { ... }
        public static bool operator!=(String lhs, String rhs) { ... }
        ...
        public int Length { get { ... } }
        public char this[int index] { get { ... } }
        ...
        public CharEnumerator GetEnumerator() { ... }
        ...
    }
}

The string class supports a readonly indexer (it contains a get accessor but no set accessor). The C# string type is an immutable type (just like in Java). The string equality and inequality operators are overloaded but the relational operators (< <= > >=) are not. The StringBuilder class is the mutable companion to string and lives in the System.Text namespace. You can iterate through a string expression using a foreach statement.
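These string properties can be seen in a short sketch. StringDemo and Reverse are hypothetical names for this illustration; CompareOrdinal is used here as one way to order strings, given that the relational operators are unavailable.

```csharp
using System;
using System.Text;

class StringDemo
{
    public static string Reverse(string text)
    {
        StringBuilder builder = new StringBuilder();
        foreach (char c in text)       // strings support foreach
        {
            builder.Insert(0, c);      // StringBuilder is mutable
        }
        return builder.ToString();
    }

    static void Main()
    {
        string a = "hello";
        string b = new string(new char[] { 'h', 'e', 'l', 'l', 'o' });
        Console.WriteLine(a == b);                                // True: == compares contents
        Console.WriteLine(object.ReferenceEquals(a, b));          // False: two distinct objects
        Console.WriteLine(a[0]);                                  // h (readonly indexer)
        Console.WriteLine(Reverse(a));                            // olleh
        Console.WriteLine(string.CompareOrdinal(a, "help") < 0);  // True
    }
}
```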

Arrays

C# arrays are reference types. The size of the array is not part of the array type. You can declare rectangular arrays of any rank (Java supports only one dimensional rectangular arrays).

    int[]  row;
    int[,] grid;

Array instances are created using the new keyword. Array elements are default initialised to zero (enums and numeric types), false (bool), or null (reference types).

    row  = new int[42];
    grid = new int[9,6];

Array instances can be initialised. A useful initialisation shorthand does not work for assignment.

    int[] row = new int[4]{ 1, 2, 3, 4 }; // longhand
    int[] row =           { 1, 2, 3, 4 }; // shorthand
          row = new int[4]{ 1, 2, 3, 4 }; // okay
          row =           { 1, 2, 3, 4 }; // compile-time error

Array indexes start at zero and all array accesses are bounds checked (IndexOutOfRangeException). All arrays implicitly inherit from the System.Array class. This class brings array types into the CTS and provides some handy properties and methods:

namespace System
{
    public abstract class Array : ...
    {   ...
        public int Length { get { ... } }
        public int Rank { get { ... } }
        public int GetLength(int rank) { ... }
        public virtual IEnumerator GetEnumerator() { ... }
        ...
    }
}
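A sketch of these members in use, together with the bounds check. ArrayInfo and CountCells are hypothetical names for this illustration.

```csharp
using System;

class ArrayInfo
{
    // Multiplies the length of every dimension; works for any rank.
    public static int CountCells(Array array)
    {
        int cells = 1;
        for (int dimension = 0; dimension < array.Rank; dimension++)
        {
            cells *= array.GetLength(dimension);
        }
        return cells;
    }

    static void Main()
    {
        int[,] grid = new int[9, 6];
        Console.WriteLine(grid.Rank);           // 2
        Console.WriteLine(grid.GetLength(0));   // 9
        Console.WriteLine(grid.GetLength(1));   // 6
        Console.WriteLine(grid.Length);         // 54
        Console.WriteLine(CountCells(grid));    // 54

        try
        {
            int[] row = new int[4];
            row[4] = 1;                         // one past the last element
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("out of range");
        }
    }
}
```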

The element type of an array can itself be an array creating a so called "ragged" array. Ragged arrays are not CLS compliant. You can use a foreach statement to iterate through a ragged array or through a rectangular array of any rank:

class ArrayIteration
{
    static void Main()
    {
        int[] row = { 1, 2, 3, 4 };
        foreach (int number in row) 
        {
            ... 
        }

        int[,] grid = { { 1, 2 }, { 3, 4 } };
        foreach (int number in grid) 
        {
            ... 
        }
 
        int[][] ragged = 
        { 
            new int[2]{ 1, 2 }, 
            new int[4]{ 3, 4, 5, 6 } 
        };
        foreach (int[] array in ragged) 
        { 
            foreach (int number in array) 
            {
                ...
            }
        }
    }
}

Boxing

An object reference can be initialised with a value. This does not create a reference referring into the stack (which is just as well!). Instead the VES makes a copy of the value on the heap and the reference refers to this copy. The copy is created using a plain bitwise copy (the copying is guaranteed to never throw an exception). This is called boxing. Extracting a boxed value back into a local value is called unboxing and requires an explicit cast. When unboxing the VES checks if the boxed value has the exact type specified in the cast (conversions are not considered). If it doesn't the CLI runtime throws an InvalidCastException. C# uses boxing as part of the params mechanism to create typesafe variadic methods (methods that can accept a variable number of arguments of any type).

struct CoOrdinate
{   ...
    private int x, y;
}

class Boxing
{
    static void Main()
    {
        CoOrdinate pos = new CoOrdinate(1, 2);
        object o = pos;                  // boxes
        ...
        CoOrdinate copy = (CoOrdinate)o; // cast to unbox
    }
}
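The exact-type rule for unboxing and the params mechanism can both be sketched with built in types. BoxingDemo, Count, and UnboxesAsLong are hypothetical names for this illustration.

```csharp
using System;

class BoxingDemo
{
    // params gathers the arguments into an object[]; values are boxed.
    public static int Count(params object[] items)
    {
        return items.Length;
    }

    public static bool UnboxesAsLong(object boxed)
    {
        try
        {
            long unused = (long)boxed;   // succeeds only if boxed holds exactly a long
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        int value = 42;
        object boxed = value;    // boxes a copy of value
        value = 43;              // the boxed copy is unaffected
        Console.WriteLine((int)boxed);             // 42
        Console.WriteLine(UnboxesAsLong(boxed));   // False: an int is not a long
        Console.WriteLine((long)(int)boxed);       // 42: unbox first, then convert
        Console.WriteLine(Count(1, "two", 3.0));   // 3
    }
}
```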

Type Relationships

Inheritance

C# supports the same inheritance model as Java; a class can extend at most one other class (in fact a class always extends exactly one class since all classes implicitly extend System.Object). A struct cannot act as a base type or be derived from. A derived class can access non-private members of its immediate base class using the base keyword. Unlike Java (and like C++) by default C# methods, indexers, properties, and events are not virtual.

class Symbol
{   ...   
    public virtual string Name
    {  
        get { ... }
    }
}

class TerminalSymbol : Symbol
{   ...  
    public override string Name
    { 
        get 
        { 
            return "terminal";
        }
    }
}

class NonTerminalSymbol : Symbol
{   ...
    public NonTerminalSymbol(string name) 
    { 
        this.name = name;
    }
    
    public sealed override string Name
    { 
        get 
        {
            return name;
        }
    } 

    private readonly string name;
}

Interfaces

C# interfaces contain only the signatures of methods. Method bodies are not allowed. Access modifiers are not allowed (all methods are implicitly public). Fields are not allowed (not even static ones). Static methods are not allowed (so no operators). Nested types are not allowed. Properties, indexers, and events (again with no bodies) are allowed though. An interface, struct, or class can have as many base interfaces as it likes.

interface Symbol
{   ...
    string Name { get; } 
}

A struct or class must implement all its inherited interface methods. Interface methods can be implemented implicitly (in which case they must be public and can be virtual to allow overriding) or explicitly (in which case they are not public, can only be called via the interface, are not virtual, and cannot be overridden).

class TerminalSymbol : Symbol
{   ...
    public string Name // implicit implementation
    { 
        get 
        { 
            return "terminal"; 
        }
    } 
}
class TerminalSymbol : Symbol
{   ...
    string Symbol.Name // explicit implementation
    {
        get 
        { 
            return "terminal"; 
        }
    }
}

You use the abstract keyword to declare an abstract class or an abstract method (only abstract classes can declare abstract methods). You use the sealed keyword to declare a class that cannot be derived from. The inheritance notation is positional; base class first, followed by base interfaces.

interface Symbol
{   ... 
    string Name { get; }
}

abstract class NamedSymbol
{   ...
    protected NamedSymbol(string name)
    {
        this.name = name;
    }

    public string Name
    { 
        get 
        { 
            return name; 
        }
    } 

    private readonly string name;
}

class TerminalSymbol : NamedSymbol, Symbol
{   
    public TerminalSymbol()
        : base("terminal")
    {
    }
    ...
}

sealed class NonTerminalSymbol : NamedSymbol, Symbol
{
    public NonTerminalSymbol(string name)
        : base(name)
    {
    }
    ...
}

Runtime type information is available via the is, as, and typeof keywords as well as the object.GetType() method.
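A brief sketch of the four mechanisms side by side. Shape, Circle, TypeDemo, and Classify are hypothetical names for this illustration.

```csharp
using System;

class Shape { }
class Circle : Shape { }

class TypeDemo
{
    public static string Classify(object o)
    {
        Circle circle = o as Circle;   // null when o is not a Circle
        if (circle != null) return "circle";
        if (o is Shape) return "shape";
        return "other";
    }

    static void Main()
    {
        Shape s = new Circle();
        Console.WriteLine(Classify(s));                     // circle
        Console.WriteLine(Classify(new Shape()));           // shape
        Console.WriteLine(Classify("text"));                // other
        Console.WriteLine(s.GetType() == typeof(Circle));   // True: the runtime type
        Console.WriteLine(s.GetType() == typeof(Shape));    // False
    }
}
```

Note that is and as respect inheritance, whereas comparing GetType() against typeof tests for one exact type.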

Resource Management

You can declare a destructor in a class. A C# destructor has the same name as its class, prefixed with a tilde (~). A destructor is not allowed an access modifier or any parameters. The compiler converts your destructor into an override of the object.Finalize method. For example, this:

public class StreamWriter : TextWriter
{   ...
    ~StreamWriter()
    {
        Close();
    }

    public override void Close() 
    {
        ... 
    }
}

is converted into this: (You can use the ILDASM tool to see this transformation in the CIL.)

public class StreamWriter : TextWriter
{   ...
    protected override void Finalize()
    {
        try 
        {
            Close();
        }
        finally 
        {
            base.Finalize();
        }
    }
    public override void Close() 
    {  
        ... 
    }
}

You are not allowed to call a destructor or the Finalize method in code (which is why they never have an access modifier or parameters). Instead, the generational garbage collector (which is part of the VES) calls Finalize on objects sometime after they become unreachable but definitely before the program ends. You can force a garbage collection using the System.GC.Collect() method. C# does not support struct destructors (although CIL does). However, C# does have a using statement which you can use to scope a resource to a local block in an exception safe way. For example, this:

class Example
{
    void Method(string path)
    {
        using (AutoStreamWriter local = new StreamWriter(path))
        {
            StreamWriter writer = local.StreamWriter;
            ...
        }
    }
}

is automatically translated into this:

class Example
{
    void Method(string path)
    {
        {
            AutoStreamWriter local = new StreamWriter(path);
            try 
            {
                StreamWriter writer = local.StreamWriter;
                ...
            }
            finally 
            {
                ((IDisposable)local).Dispose();
            }
        }
    }
}

which relies on AutoStreamWriter implementing the System.IDisposable interface:

public struct AutoStreamWriter : IDisposable
{
    public AutoStreamWriter(StreamWriter decorated)
    {
        local = decorated;
    }

    public static implicit operator AutoStreamWriter(StreamWriter decorated)
    {
        return new AutoStreamWriter(decorated);
    }

    public StreamWriter StreamWriter
    {
        get 
        { 
            return local; 
        }
    }

    void IDisposable.Dispose()
    {
        local.Close();
    }

    private readonly StreamWriter local;
}
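The exception safety of the using statement can be checked without touching the file system. In this sketch Resource is a hypothetical class that merely counts its Dispose calls; it stands in for a real resource wrapper such as AutoStreamWriter.

```csharp
using System;

class Resource : IDisposable
{
    public static int Disposed = 0;   // counts Dispose calls, for the demo only

    public void Dispose()
    {
        Disposed++;
    }
}

class UsingDemo
{
    public static void Work(bool fail)
    {
        using (Resource r = new Resource())
        {
            if (fail) throw new InvalidOperationException("failed");
        }   // r is disposed here whether or not an exception escaped
    }

    static void Main()
    {
        Work(false);
        try
        {
            Work(true);
        }
        catch (InvalidOperationException)
        {
            // Dispose has already run by the time control arrives here.
        }
        Console.WriteLine(Resource.Disposed);   // 2
    }
}
```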

Program Relationships

Delegates and Events

The delegate is the last C# type. A delegate is a named method signature (similar to a function pointer in C/C++). For example, the System namespace declares a delegate called EventHandler that's used extensively in the Windows.Forms classes:

namespace System
{
    public delegate void EventHandler(object sender, EventArgs sent);
    ...
}

EventHandler is now a reference type; EventHandler is the name of a type. Calling a delegate variable calls all the delegate instances attached to it.

namespace Not.System.Windows.Forms
{
    public class Button
    {   ...
        public EventHandler Click;
        ...
        protected void OnClick(EventArgs sent)
        {
            if (Click != null) 
            {
                Click(this, sent); // call here
            }
        }
    }
}

All delegate types implicitly derive from the System.Delegate class (again you can use the ILDASM tool to see this transformation in the CIL). You use the event keyword to modify the declaration of a delegate variable. Event delegates can be used only in restricted, safe ways (for example, you can't call an event delegate from outside the struct/class it is declared in):

namespace System.Windows.Forms
{
    public class Button
    {   ...
        public event EventHandler Click;
    }
}

You create an instance of a delegate type by naming a method whose signature matches the signature of the delegate type. You attach a delegate instance to a matching delegate variable using the += operator and remove it using the -= operator.

using System; // EventHandler, EventArgs
using System.Windows.Forms;

class MyForm : Form
{   ...
    private void initializeComponent()
    {   ...
        okButton = new Button();
        okButton.Text = "OK";
        okButton.Click += new EventHandler(this.okClick); // create + attach
    }    

    private void okClick(object sender, EventArgs sent)  
    {   ...
    }
    ...
    private Button okButton;
}
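The same attach, call, and remove cycle can be sketched as a console program, with no dependency on Windows Forms. Publisher, Subscriber, and EventDemo are hypothetical names for this illustration; only EventHandler and EventArgs come from the System namespace.

```csharp
using System;

class Publisher
{
    public event EventHandler Changed;

    public void Raise()
    {
        if (Changed != null)                  // null when nothing is attached
        {
            Changed(this, EventArgs.Empty);   // calls every attached instance
        }
    }
}

class Subscriber
{
    public int Calls = 0;

    public void OnChanged(object sender, EventArgs sent)
    {
        Calls++;
    }
}

class EventDemo
{
    static void Main()
    {
        Publisher publisher = new Publisher();
        Subscriber subscriber = new Subscriber();
        EventHandler handler = new EventHandler(subscriber.OnChanged);

        publisher.Changed += handler;
        publisher.Changed += handler;          // attached twice, so called twice
        publisher.Raise();
        Console.WriteLine(subscriber.Calls);   // 2

        publisher.Changed -= handler;          // removes one attachment
        publisher.Raise();
        Console.WriteLine(subscriber.Calls);   // 3
    }
}
```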

Assemblies

You can compile a working set of source files (all written in the same supported language) into a CLI module. For example, using the Microsoft C# command line compiler:

csc /target:module /out:grammar.netmodule *.cs

The default file extension for a module is .netmodule. A module contains (directly) types and CIL instructions and forms the smallest unit of dynamic download. However, a module cannot be run. The only thing you can do with a module is add it to an assembly. An assembly contains a manifest (a module does not). The manifest is metadata that describes the contents of the assembly and makes the assembly self describing. An assembly knows:

The Microsoft C# compiler option to create an assembly is /target:library. For example (there are various other options for adding modules and referencing other assemblies):

csc /target:library /out:parser.dll *.cs

The Microsoft C# compiler option to create an executable assembly is /target:exe (one of the structs/classes must contain a Main method).

csc /target:exe /out:application.exe *.cs

Assemblies come in two forms. A private assembly is not versioned, and is used only by a single application. A shared assembly is versioned, and lives in a special shared directory called the Global Assembly Cache (GAC). Shared assembly version numbers are created using an IP-like numbering scheme:

<major> . <minor> . <build> . <revision>

Shared assemblies that differ only by version number can co-exist in the GAC (this is called side-by-side execution). This provides a solution to the DLL-Hell problem. The particular version of an assembly that an individual application uses when running can be controlled from an XML application configuration file. For example:

...
<BindingPolicy>
    <BindingRedir Name="application" ... 
                  Version="*" 
                  VersionNew="6.1.1212.14"
                  UseLatestBuildRevision="no"/>
</BindingPolicy>
...

You can edit this config file to choose your binding policy. For example:

Attributes

You use attributes to tag code elements with declarative information. This information is added to the metadata, and can be queried and acted upon at translation/run time using reflection. For example, you use the [Conditional] attribute to tag methods you want removed from the release build (calls to conditional methods are also removed):

using System.Diagnostics;

class Trace
{
    [Conditional("DEBUG")]
    public static void Write(string message) 
    {   ...
    }
}

You use the [CLSCompliant] attribute to declare (or check) that a source file conforms to the Common Language Specification:

[assembly:System.CLSCompliant(true)]
...

You can use the [MethodImpl] attribute to synchronize a method:

using System.Runtime.CompilerServices;

class Example 
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    void SynchronizedMethod() 
    {   ...
    }
}

The attribute mechanism is extensible; you can easily create and use your own attribute types:

public sealed class DeveloperAttribute : Attribute
{
    public DeveloperAttribute(string name) 
    {   ...
    }
}
[Developer("Patrick Jagger")]
public struct AutoStreamWriter : IDisposable
{   ...
}
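To show how such an attribute might be queried at run time, here is a self-contained sketch. The Name property, the Tagged struct, and the DeveloperOf helper are assumptions added for this illustration; the lookup uses the reflection method Type.GetCustomAttributes.

```csharp
using System;

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct)]
public sealed class DeveloperAttribute : Attribute
{
    public DeveloperAttribute(string name)
    {
        this.name = name;
    }

    public string Name
    {
        get { return name; }
    }

    private readonly string name;
}

[Developer("Patrick Jagger")]
public struct Tagged
{
}

public class AttributeDemo
{
    // Returns the developer recorded in the metadata of type, or null.
    public static string DeveloperOf(Type type)
    {
        object[] found = type.GetCustomAttributes(typeof(DeveloperAttribute), false);
        if (found.Length == 0) return null;
        return ((DeveloperAttribute)found[0]).Name;
    }

    static void Main()
    {
        Console.WriteLine(DeveloperOf(typeof(Tagged)));        // Patrick Jagger
        Console.WriteLine(DeveloperOf(typeof(int)) == null);   // True
    }
}
```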

Summary

C# programs compile into Common Intermediate Language (CIL). CIL types that conform to the Common Language Specification (CLS) can be used by any .NET language. For example, in Microsoft's .NET, the types in the System namespace are implemented in the mscorlib.dll assembly. Programs written in C#, in VB.NET, in managed C++, or any supported language can all use this assembly. There isn't one version of the assembly for each language.

CIL programs are translated into executable programs either at installation time or just-in-time as they are executed by the Virtual Execution System (VES). The Common Language Infrastructure (the CTS, the VES, the CLS, and the metadata specification) is an ECMA standard and efforts are already underway to implement the CLI on non-Windows platforms (e.g. http://www.go-mono.com).

C# is a modern general purpose programming language. It has clear similarities to Java (reference types, inheritance model, garbage collection) and to C++ (value types, operator overloading, logical namespaces, explicit interface implementation, by default methods are not virtual). It has no backward compatibility constraints (as C++ does to C) and avoids/resolves some known problems in Java. The Common Type System (CTS) makes a clear distinction between value types and reference types. The more I use C# the more I like it and the more I appreciate the careful and consistent decisions taken during its design.

In keeping this overview to a reasonable length I have necessarily omitted numerous important aspects of C#. Nevertheless I hope this article has given you a flavour of C# and its relationship to .NET.
