Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
118 views
in Technique[技术] by (71.8m points)

c# - Get a reference to a struct inside array

I want to modify a field of a struct which is inside an array without having to set entire struct. In the example below, I want to set one field of element 543 in the array. I don't want to have to copy entire element (because copying MassiveStruct would hurt performance).

class P
{
    struct S
    {
      public int a;
      public MassiveStruct b;
    }

    void f(ref S s)
    {
      s.a = 3;
    }

    public static void Main()
    {
      S[] s = new S[1000];
      f(ref s[543]);  // Error: An object reference is required for the non-static field, method, or property
    }
}

Is there a way to do it in C#? Or do I always have to copy entire struct out of array, modify the copy, and then put the modified copy back into array.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

[edit 2017: see important comments regarding C# 7 at the end of this post]

After many years of wrestling with this exact problem, I'll summarize the few techniques and solu-tions I have found. Stylistic tastes aside, arrays of structs are really the o?n?l?y in-memory bulk storage method available in C#. If your app truly processes millions of medium-sized objects under high throughput conditions, there no other managed alternative.

I agree with @kaalus that object headers and GC pressure can quickly mount; nevertheless my NLP grammar pro-cess-ing system can manipulate 8-10 gigabytes (or more) of structural analyses in less than a min-ute when parsing and/or generating lengthy natural language sentences. Cue the chorus: “C# isn't meant for such problems...,” “Switch to assembly language...,” “Wire-wrap up an FPGA...,” etc.

Well, instead let's run some tests. First of all, it is critical to have total understanding of the full spectrum of value-type (struct) man-agement issues and the class vs. struct tradeoff sweet-spots. Also of course boxing, pinning/unsafe code, fixed buffers, GCHandle, IntPtr, and more, but most importantly of all in my opinion, wise use of managed pointers (a.k.a. "interior pointers").

Your mastery of these topics will also include knowledge of the fact that, should you happen to include in your struct one or more references to managed types (as opposed to just blittable primitives), then your options for accessing the struct with unsafe pointers are greatly reduced. This is not a problem for the managed pointer method I'll mention below. So generally, including object referen-ces is fine and doesn't change much regarding this discussion.

Oh, and if you do really need to preserve your unsafe access, you can use a GCHandle in 'Normal' mode to store object reference(s) in your struct indefinitely. Fortunately, putting the GCHandle into your struct does not trigger the unsafe-access prohibition. (Note that GCHandle is itself a value-type, and you can even define and go to town with

var gch = GCHandle.Alloc("spookee",GCHandleType.Normal);
GCHandle* p = &gch;
String s = (String)p->Target;

...and so forth. As a value type, the GCHandle is imaged directly into your struct, but obviously any reference types it stores are not. They are out in the heap, not included in the physical layout of your array. Finally on GCHandle, beware of its copy-semantics, though, because you'll have a memory leak if you don't eventually Free each GCHandle you allocate.

@Ani reminds us that some people consider mutable struct instances "evil," but it's really the fact that they are accident prone that's the problem. Indeed, the OP's example...

s[543].a = 3;

...illustrates exactly what we're trying to achieve: access our data records in-situ. (Beware: the syntax for an array of reference-type 'class' instances has identical appearance, but in this article we're specifically discussing only non-jagged arrays of user-defined value-types here.) For my own programs, I generally consider it a severe bug if I encounter an oversized blittable struct that has (accidentally) been wholly imaged out of its array storage row:

rec no_no = s[543];   // don't do
no_no.a = 3           // it like this

As far as how big (wide) your struct can or should be, it won't matter, because you are going to be careful never to let the struct do what was just shown in the previous example, that is, migrate in-toto out of its embedding array. In fact, this points to a fundamental premise of this entire article:

rule:
For arrays of struct, always access individual fields in-situ; never "mention" (in C#) the struct instance itself in-toto.

Unfortunately, the C# language offers no way to systematically flag or forbid code that violates this rule, so success here generally depends on careful programming discipline.

Since our "jumbo-structs" are never imaged out of their array, they're really just templates over memory. In other words, the right thinking is to conceive of the struct as overlaying the array elements. We always think of each as a vacuous "memory temp-late," as opposed to a transferrable or portable encapsulator or data container. For array-bound "jumbo" value-types, we never want to invoke that most existential characteristic of a "struct", namely, pass-by-value.

Example:

public struct rec
{
    public int a, b, c, d, e, f;
}

Here we overlay 6 ints for a total of 24 bytes per "record." You'll want to consider and be aware of packing options to obtain an alignment-friendly size. But excessive padding can cut into your memory budget: because a more important consideration is the 85,000 byte limit on non-LOH objects. Make sure your record size multiplied by the expected number of rows does not exceed this limit.

So for the example given here, you would be best advised to keep your array of recs to no more 3,000 rows each. Hopefully your application can be designed around this sweet-spot. This is not so limiting when you remember that--alternatively--each row would be a separate garbage-collected object, instead of just the one array. You've cut your object proliferation by a three orders of magnitude, which is good for a day's work. Thus the .NET environment here is strongly steering us with a pretty specific constraint: it seems that if you target your app's memory design towards monolithic alloc-ations in the 30-70 KB range, then you really can get away with lots and lots of them, and in fact you'll instead become limited by a thornier set of performance bottlenecks (namely, bandwidth on the hardware memory bus).

So now you have a single .NET reference type (array) with 3,000 6-tuples in physically contiguous tabular storage. First and foremost, we must be super-careful to never "pick up" one of the structs. As Jon Skeet notes above, "Massive structs will often perform worse than classes," and this is absolutely correct. There's no better way to paralyze your memory bus than to start throwing plump value types around willy-nilly.

So let's capitalize on an infrequently-mentioned aspect of the array of structs: All objects (and fields of those objects or structs) of all rows of the entire array are always initialized to their default values. You can start plugging values in, one at a time, in any row or column (field), anywhere in the array. You can leave some fields at their default values, or replace neighbor fields without dis-turbing one in the middle. Gone is that annoying manual initialization required with stack-resident (local variable) structs before use.

Sometimes it's hard to maintain the field-by-field approach because .NET is always trying to get us to blast in an entire new'd-up struct--but to me, this so-called "initialization" is just a violation of our taboo (against plucking the whole struct out of the array), in a different guise.

Now we get to the crux of the matter. Clearly, accessing your tabular data in-situ minimizes data-shuffling busywork. But often this is an inconvenient hassle. Array accesses can be slow in .NET, due to bounds-checking. So how do you maintain a "working" pointer into the interior of an array, so as to avoid having the system constantly recomputing the indexing offsets.

Evaluation

Let's evaluate the performance of five different methods for the manipulation of individual fields within value-type array storage rows. The test below is designed to measure the efficiency of intensively accessing the data fields of a struct positioned at some array index, in situ--that is, "where they lie," without extracting or rewriting the entire struct (array element). Five different access methods are compared, with all other factors held the same.

The five methods are as follows:

  1. Normal, direct array access via square-brackets and the field specifier dot. Note that, in .NET, arrays are a special and unique primitive of the Common Type System. As @Ani mentions above, this syntax cannot be used to change an individual field of a reference instance, such as a list, even when it is parameterized with a value-type.
  2. Using the undocumented __makeref C# language keyword.
  3. Managed pointer via a delegate which uses the ref keyword
  4. "Unsafe" pointers
  5. Same as #3, but using a C# function instead of a delegate.

Before I give the C# test results, here's the test harness implementation. These tests were run on .NET 4.5, an AnyCPU release build running on x64, Workstation gc. (Note that, because the test isn't interested the efficiency of allocating and de-allocating the array itself, the LOH consideration mentioned above does not apply.)

const int num_test = 100000;
static rec[] s1, s2, s3, s4, s5;
static long t_n, t_r, t_m, t_u, t_f;
static Stopwatch sw = Stopwatch.StartNew();
static Random rnd = new Random();

static void test2()
{
    s1 = new rec[num_test];
    s2 = new rec[num_test];
    s3 = new rec[num_test];
    s4 = new rec[num_test];
    s5 = new rec[num_test];

    for (int x, i = 0; i < 5000000; i++)
    {
        x = rnd.Next(num_test);
        test_m(x); test_n(x); test_r(x); test_u(x); test_f(x);
        x = rnd.Next(num_test);
        test_n(x); test_r(x); test_u(x); test_f(x); test_m(x);
        x = rnd.Next(num_test);
        test_r(x); test_u(x); test_f(x); test_m(x); test_n(x);
        x = rnd.Next(num_test);
        test_u(x); test_f(x); test_m(x); test_n(x); test_r(x);
        x = rnd.Next(num_test);
        test_f(x); test_m(x); test_n(x); test_r(x); test_u(x);
        x = rnd.Next(num_test);
    }
    Debug.Print("Normal (subscript+field):          {0,18}", t_n);
    Debug.Print("Typed-reference:                   {0,18}", t_r);
    Debug.Print("C# Managed pointer: (ref delegate) {0,18}", t_m);
    Debug.Print("C# Unsafe pointer:                 {0,18}", t_u);
    Debug.Print("C# Managed pointer: (ref func):    {0,18}", t_f);
}

Because the code fragments which implement the test for each specific method are long-ish, I'll give the results first. Time is 'ticks;' lower means better.

Normal (subscript+field):             20,804,691
Typed-reference:                      30,920,655
Managed pointer: (ref delegate)       18,777,666   // <- a close 2nd
Unsafe pointer:                       22,395,806
Managed pointer: (ref func):          18,767,179   // <- winner

I was surprised that these results were s


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...