Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
148 views
in Technique[技术] by (71.8m points)

c# - How Can a Stack Trace Point to the Wrong Line (the "return" Statement) - 40 Lines Off

I have twice now seen a NullReferenceException logged from a Production ASP.NET MVC 4 web application - and logged on the wrong line. Not wrong by a line or two (like you would get with a PDB mismatch), but wrong by the length of the entire controller action. Example:

public ActionResult Index()
{
    var someObject = GetObjectFromService();
    if (someObject.SomeProperty == "X") { // NullReferenceException here if someObject == null
        // do something
    }

    // about 40 more lines of code

    return View();    // Stack trace shows NullReferenceException here
}

This has happened twice for actions on the same controller. The second case was logged on

// someObject is known non-null because of earlier dereferences
return someObject.OtherProperty
    ? RedirecToAction("ViewName", "ControllerName")
    : RedirectToAction("OtherView", "OtherController");

This is very disturbing. NullReferenceException is very easy to fix once you know what line it occurs on. It's not quite so easy if the exception could have happened anywhere within the controller action!

Has anyone ever seen anything like this at all, either in ASP.NET MVC or elsewhere? I'm willing to believe it's the difference between a Release build and a Debug build, but still, to be off by 40 lines?


EDIT:

To be clear: I'm the original author of "What is a NullReferenceException and how do I fix it?". I know what a NullReferenceException is. This question is about why could the stack trace be so far off. I have seen cases where a stack trace is off by a line or two due to PDB mismatch. I have seen cases where there is no PDB, so you don't get line numbers. But I have never seen a case where the stack trace is off by 32 lines.

EDIT 2:

Note that this has happened with two separate controller actions within the same controller. Their code is quite different from each other. In fact, in the first case, the NullReferenceException didn't even occur in a conditional - it was more like this:

SomeMethod(someObject.SomeProperty);

There was some chance that the code had been reorganized during optimization so that the actual NullReferenceException occurred closer to the return, and the PDB was in fact only off by a few lines. But I don't see an opportunity to rearrange a method call in a way that would cause the code to move by 32 lines. In fact, I just looked at the decompiled source, and it does not appear to have been rearranged.

What these two cases have in common are:

  1. They occur in the same controller (so far)
  2. In both cases, the stack trace points to the return statement, and in both cases, the NullReferenceException occurred 30 or more lines away from the return statement.

EDIT 3:

I just did an experiment - I just rebuilt the solution using the "Production" build configuration that we have deployed to our Production servers. I ran the solution on my local IIS, without changing the IIS configuration at all.

The stack trace showed the correct line number.

EDIT 4:

I don't know if this is relevant, but the circumstance causing the NullReferenceException is as unusual as this "wrong line number" issue itself. We appear to be losing session state for no good reason (no restarts or anything). That's not too strange. The strange part is that our Session_Start should be redirecting to the login page when that happens. Any attempt to reproduce the session loss causes the redirect to the login page. Subsequently using the browser "Back" button or manually entering the previous URL goes right back to the login page without hitting the controller in question.

So maybe two weird problems is really one very weird problem.

EDIT 5:

I was able to obtain the .PDB file, and to look at it with dia2dump. I thought it was possible that the PDB was messed up, and only had line 72 for the method. That's not the case. All the line numbers are present in the PDB.

EDIT 6:

For the record, this just happened again, in a third controller. The stack trace points directly to the return statement of a method. This return statement is simply return model;. I don't think there's any way for that to cause a NullReferenceException.

Edit 6a:

In fact, I just looked more closely at the log and found several exceptions which are not NullReferenceException, and which still have the stack trace point at the return statement. Both of these cases are in methods called from the controller action, not directly in the action method itself. One of these was an explicitly thrown InvalidOperationException, and one was a simple FormatException.


Here are a couple of facts I haven't thought relevant until now:

  1. The Application_Error in the global.asax is what causes these exceptions to be logged. It picks up the exceptions by using Server.GetLastError().
  2. The logging mechanism logs the message and stack trace separately (instead of logging ex.ToString(), which would have been my recommendation). In particular, the stack trace that I've been asking about comes from ex.StackTrace.
  3. The FormatException was raised in System.DateTime.Parse, called from System.Convert.ToDate, called from our code. The stack trace line that points to our code is the line that points to "return model;".
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Can PDBs be off more than 2 or 3 lines?

You give the statement that you have never seen PDBs off more than a few lines. 40 lines seem to be too much, especially when the decompiled code doesn't look much of a difference.

However, that's not true and can be proven by a 2 liner: create a String object, set it to nulland call ToString(). Compile and run. Next, insert a 30 line comment, save the file, but do not recompile. Run the application again. The application still crashes, but gives a 30 lines difference in what it reports (line 14 vs. 44 in the screenshot).

It is not related at all to the code that gets compiled. Such things can easily happen:

  • code reformat, which e.g. sorts the methods by visibility, so the method moved up 40 lines
  • code reformat, which e.g. breaks long lines at 80 characters, usually this moves things down
  • optimize usings (R#) which removes 30 lines of unneeded imports, so the method moved up
  • insertion of comments or newlines
  • having switched to a branch while the deployed version (matching the PDB) is from trunk (or similar)

PDBs off by 30 lines

How can this happen in your case?

If it's really as you say and you seriously reviewed your code, there are two potential issues:

  • EXE or DLL does not match to PDBs, which can easily be checked
  • PDBs do not match to source code, which is harder to identify

Multithreading can set objects to null when you least expect it, even if it has been initialized before. In such a case, NullReferenceExceptions can not only be 40 lines away, it can even be in a totally different class and therefore file.

How to continue

Capture a dump

I'd first try to get a dump of the situation. This allows you to capture the state and look at everything in detail without the need of reproducing it on your developer machine.

For ASP.NET, see the MSDN blog Steps to Trigger a User Dump of a Process with DebugDiag when a Specific .net Exception is Thrown or Tess' blog.

In any case, always capture a dump including full memory. Also remember to collect all necessary files (SOS.dll and mscordacwks.dll) from the machine where the crash occured. You can use MscordacwksCollector (Disclaimer: I'm the author of it).

Check the symbols

See if the EXE/DLL really matches to your PDBs. In WinDbg, the following commands are helpful

!sym noisy
.reload /f
lm
!lmi <module>

Outside WinDbg, but still using debugging tools for Windows:

symchk /if <exe> /s <pdbdir> /av /od /pf

3rd party tool, ChkMatch:

chkmatch -c <exe> <pdb>

Check the source code

If PDBs match to DLLs, the next step is to check whether the source code belongs to the PDBs. This is best possible if you commit PDBs to version control together with the source code. If you did that, you can search the matching PDBs in source control and then get the same revision of source code and PDBs.

If you didn't do that, you're unlucky and you should probably not use source code but work with PDBs only. In case of .NET, this works out pretty well. I'm debugging a lot in 3rd party code with WinDbg without receiving the source code and I can get pretty far.

If you go with WinDbg, the following commands are useful (in this order)

.symfix c:symbols
.loadby sos clr
!threads    
~#s
!clrstack
!pe

Why code is so important on StackOverflow

Also, I looked at the code of the View() method, and there is no way for it to throw a NullReferenceException

Well, other people have made similar statements before. It's easy to overlook something.

The following is a real world example, just minimized and in pseudo code. In the first version, the lock statement didn't exist yet and DoWork() could be called from several threads. Quite soon, the lock statement was introduced and everything went well. When leaving the lock, someobj will always be a valid object, right?

var someobj = new SomeObj(); 
private void OnButtonClick(...)
{
    DoWork();
}

var a = new object();   
private void DoWork()
{
    lock(a) {
        try {
            someobj.DoSomething();
            someobj = null;
            DoEvents();             
        }
        finally
        {
            someobj = new SomeObj();
        }
    }   
}

Until one user reported the same bug again. We were sure that the bug was fixed and this was impossible to happen. However, this was a "double-click user", i.e. someone who does a double click on anything that can be clicked.

The DoEvents() call, which was of course not in such a prominent place, caused the lock to be entered again by the same thread (which is legal). This time, someobj was null, causing a NullReferenceException in a place where it seemed impossible to be null.

That second time, it was return boolValue ? RedirectToAction("A1","C1") : RedirectToAction("A2", "C2"). The boolValue was an expression which could not have thrown the NullReferenceException

Why not? What is boolValue? A property with a getter and setter? Also consider the following (maybe a bit off) case, where RedirectToAction takes only constant parameters, looks like a method, throws an exception but is still not on the callstack. This is why it is so important to see code on StackOverflow...

Screenshot: method with constant parameters not on callstack


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...