From what I understand by now, I can say that a reference in C# is a kind of pointer to an object
If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.
which has reference count
Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.
and knows about the type compatibility.
I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.
My question is not about how a value type is different than a reference type, but more about how a reference is implemented.
How references are implemented is, not surprisingly, an implementation detail.
Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#
References are things that act as references are specified to act by the C# language specification. That is:
- objects (of reference type) have identity independent from the values of their fields
- any object may have a reference to it
- such a reference is a value which may be passed around like any other value
- equality comparison is implemented for those values
- two references are equal if and only if they refer to the same object; that is, references reify object identity
- there is a unique null reference which refers to no object and is unequal to any valid reference to an object
- A static type is always known for any reference value, including the null reference
- If the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)
There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.
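The rules in the list above can be observed directly. Here is a minimal sketch; the Point class and the ReferenceRulesDemo name are made up for illustration, and only object.ReferenceEquals and GetType are real framework members.

```csharp
using System;
using System.Collections;

// Hypothetical class, used only to have a reference type with a field.
class Point
{
    public int X;
    public Point(int x) { X = x; }
}

static class ReferenceRulesDemo
{
    static void Main()
    {
        Point a = new Point(1);
        Point b = new Point(1);   // same field values, but a different object
        Point c = a;              // copies the reference, not the object

        // Identity is independent of field values.
        Console.WriteLine(object.ReferenceEquals(a, b)); // False
        Console.WriteLine(object.ReferenceEquals(a, c)); // True

        // a and c refer to the same object, so a write through c is seen through a.
        c.X = 2;
        Console.WriteLine(a.X); // 2

        // The null reference is unequal to any reference to an object.
        Point n = null;
        Console.WriteLine(object.ReferenceEquals(n, a)); // False

        // One object, three references with different static types.
        string s = "hello";
        object o = s;
        IEnumerable e = s;
        Console.WriteLine(object.ReferenceEquals(o, e)); // True
        Console.WriteLine(o.GetType());                  // System.String
    }
}
```

Note that none of this says anything about pointers or memory layout; it only exercises the behavior the specification promises.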
and a bit about how they are implemented?
In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.
As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?
it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure
This is tricky. It is important to understand that a reference to a variable (a ref parameter in C#) and a reference to an object of reference type are conceptually similar but actually different things.
In C# you can think of a reference to a variable as an alias. That is, when you say
void M()
{
    int x = 123;
    N(ref x);
}
void N(ref int y)
{
    y = 456;
}
Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail (that behind the scenes, y is a special "reference to variable" type) and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
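To see the aliasing in action, here is a small runnable version of the M and N methods from above. The AliasDemo wrapper, the Main method, and the int return value on M are additions for illustration so that the effect is observable.

```csharp
using System;

static class AliasDemo
{
    // Returns the final value of the local so the effect of N is visible.
    public static int M()
    {
        int x = 123;
        N(ref x);   // inside N, y is another name for this x
        return x;   // 456, because N assigned through the alias
    }

    static void N(ref int y)
    {
        y = 456;    // writes to M's variable x, not to a copy
    }

    static void Main()
    {
        Console.WriteLine(M()); // prints 456
    }
}
```

If the ref modifiers are removed from both the call and the declaration, y becomes a separate variable initialized with a copy of x, and M returns 123.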
References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:
static bool EqualAliases(ref int y, ref int z)
{
return true iff y and z are both aliases for the same variable
}
the way you can with references to objects:
static bool EqualReferences(object x, object y)
{
return x == y;
}
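One detail worth spelling out about EqualReferences: the == there compares identity because both operands have static type object. A short sketch, assuming nothing beyond the standard string constructor; the EqualReferencesDemo wrapper and Main are illustrative additions.

```csharp
using System;

static class EqualReferencesDemo
{
    public static bool EqualReferences(object x, object y)
    {
        // Both operands are statically typed as object, so == compares
        // references, not contents.
        return x == y;
    }

    static void Main()
    {
        string s1 = new string('a', 3);
        string s2 = new string('a', 3); // equal contents, separate object

        Console.WriteLine(s1 == s2);                // True: string's overloaded ==
        Console.WriteLine(EqualReferences(s1, s2)); // False: two different objects
        Console.WriteLine(EqualReferences(s1, s1)); // True: same object
    }
}
```

The two strings hold the same characters, but they are distinct objects, and the reference comparison reports exactly that.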
Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
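Here is what that restriction looks like when you try to capture a ref parameter in a lambda, along with one common workaround. This is only a sketch: MakeCounter and RefCaptureDemo are hypothetical names, and the workaround deliberately gives up the aliasing by copying the value.

```csharp
using System;

static class RefCaptureDemo
{
    // Hypothetical helper: returns a delegate that outlives this call.
    public static Func<int> MakeCounter(ref int start)
    {
        // Capturing the alias directly is rejected by the compiler (error CS1628):
        // Func<int> bad = () => ++start;

        int copy = start;       // copy the current value into an ordinary local
        return () => ++copy;    // the closure captures the local, not the alias
    }

    static void Main()
    {
        int x = 10;
        Func<int> next = MakeCounter(ref x);

        Console.WriteLine(next()); // 11
        Console.WriteLine(next()); // 12
        Console.WriteLine(x);      // 10: the caller's variable is untouched
    }
}
```

The delegate can be called long after MakeCounter has returned, at which point the variable that start aliased may no longer exist; the copy is safe precisely because it is no longer an alias.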
Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.