pointers - What exactly is a reference in C#

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

From what I understand by now, I can say that a reference in C# is a kind of pointer to an object which has reference count and knows about the type compatibility. My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

I have read this post about the differences between references and pointers, but it does not say much about what a reference is; it mostly describes its properties compared with a pointer in C++. I also understand the difference between passing by reference and passing by value (in C#, objects are passed by value by default, even references), but it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure as in the Eric Lippert blog entry about the stack as an implementation detail.

Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C# and a bit about how they are implemented?

Edit: this is not a duplicate. The "Reference type in C#" question explains how a reference works and how it differs from a value, but what I am asking is how a reference is defined at a low level.

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:27+0000

> From what I understand by now, I can say that a reference in C# is a kind of pointer to an object

If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.

> which has reference count

Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.

> and knows about the type compatibility.

I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.
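As a small illustration of the difference between the static type of a reference and the actual type of the object, here is a sketch written as top-level statements (assuming C# 9 or later). The variable names are mine, and the failing cast to Uri is just an arbitrary incompatible type chosen for the example:

```csharp
using System;

// The variable's static type is object; the object it refers to is a string.
object o = "hello";

// The object knows its own actual (run-time) type.
Console.WriteLine(o.GetType() == typeof(string));    // True

// Converting to a compatible static type yields a reference to the same object.
string s = (string)o;
Console.WriteLine(object.ReferenceEquals(s, o));      // True

// Converting to an incompatible type is rejected at run time.
bool castFailed = false;
try
{
    _ = (Uri)o;
}
catch (InvalidCastException)
{
    castFailed = true;
}
Console.WriteLine(castFailed);                        // True
```

Note that the cast never changes the object; it only produces a reference with a different static type, or fails.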

> My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

How references are implemented is, not surprisingly, an implementation detail.

> Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#

References are things that act as references are specified to act by the C# language specification. That is:

- objects (of reference type) have identity independent from the values of their fields
- any object may have a reference to it
- such a reference is a value which may be passed around like any other value
- equality comparison is implemented for those values
- two references are equal if and only if they refer to the same object; that is, references reify object identity
- there is a unique null reference which refers to no object and is unequal to any valid reference to an object
- a static type is always known for any reference value, including the null reference
- if the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)
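Several of the rules above can be observed directly. The following is a minimal sketch, assuming C# 9 or later for top-level statements; StringBuilder is used only because it is a convenient mutable reference type, and the variable names are illustrative:

```csharp
using System;
using System.Text;

// Two distinct objects whose contents happen to be identical.
var a = new StringBuilder("giraffe");
var b = new StringBuilder("giraffe");

// Assignment copies the reference value, not the object.
var c = a;

Console.WriteLine(object.ReferenceEquals(a, b));     // False: different objects
Console.WriteLine(object.ReferenceEquals(a, c));     // True: same object

// Identity is independent of field values: mutate through c, observe through a.
c.Append("!");
Console.WriteLine(a.ToString());                      // giraffe!
Console.WriteLine(object.ReferenceEquals(a, c));      // True: still the same object

// The null reference is unequal to any valid reference to an object.
Console.WriteLine(object.ReferenceEquals(null, a));   // False
```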

There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.

> and a bit about how they are implemented?

In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.

As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?

> it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure

This is tricky. It is important to understand that a reference to a variable -- a ref parameter in C# -- and a reference to an object of reference type are conceptually similar but actually different things.

In C# you can think of a reference to a variable as an alias. That is, when you say

void M()
{
    int x = 123;
    N(ref x);
}
void N(ref int y)
{
    y = 456;
}
Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail -- that behind the scenes, y is a special "reference to variable" type -- and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
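To watch the aliasing happen, here is a runnable variant of the snippet above, written as top-level statements (assuming C# 9 or later). The ref local z is an addition of mine that is not in the original example; it shows that a ref local is likewise just another name for x:

```csharp
using System;

int x = 123;

// Inside N, y is another name for x.
N(ref x);
Console.WriteLine(x);     // 456

// A ref local is also an alias: z and x name the same variable.
ref int z = ref x;
z = 789;
Console.WriteLine(x);     // 789

static void N(ref int y)
{
    y = 456;              // writes through the alias into the caller's x
}
```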

References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:

static bool EqualAliases(ref int y, ref int z)
{
  return true iff y and z are both aliases for the same variable
}

the way you can with references to objects:

static bool EqualReferences(object x, object y)
{
  return x == y;
}
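The language itself still has no operator for comparing aliases, but more recent versions of .NET ship a library helper, System.Runtime.CompilerServices.Unsafe.AreSame, that answers the question. The following is only a sketch, assuming a runtime where that type is available (it is part of .NET Core 3.0 and later, as far as I know) and C# 9 or later for top-level statements; the variables p and q are illustrative:

```csharp
using System;
using System.Runtime.CompilerServices;

int p = 1, q = 1;

Console.WriteLine(EqualAliases(ref p, ref p));   // True: both alias p
Console.WriteLine(EqualAliases(ref p, ref q));   // False: equal values, different variables

static bool EqualAliases(ref int y, ref int z)
{
    // Compares the locations the two aliases refer to, not the values stored there.
    return Unsafe.AreSame(ref y, ref z);
}
```

As the name of the class suggests, this is a low-level escape hatch rather than an ordinary language feature, which is consistent with the point that aliases and object references have different semantics.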

Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
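Here is a sketch of what that restriction looks like for closures, assuming C# 9 or later for top-level statements. MakeCounter is a hypothetical helper invented for this example. The commented-out line is the one the compiler rejects (error CS1628); the usual workaround is to copy the value into an ordinary local, which the closure can then capture safely:

```csharp
using System;

int total = 10;
Func<int> next = MakeCounter(ref total);

Console.WriteLine(next());   // 10
Console.WriteLine(next());   // 11
Console.WriteLine(total);    // 10: the closure works on its own copy

static Func<int> MakeCounter(ref int start)
{
    // Func<int> bad = () => start++;   // rejected by the compiler: error CS1628

    // Copy the current value out of the alias. The local can be hoisted into
    // long-term storage because it does not depend on the caller's variable.
    int copy = start;
    return () => copy++;
}
```

The price of the workaround is that the delegate no longer aliases the caller's variable: changes made through it are not visible in total.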

Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.
