Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
284 views
in Technique[技术] by (71.8m points)

.net - Know when to retry or fail when calling SQL Server from C#?

I have a C# application that fetches data from SQL Server hosted in a somewhat flaky environment. There is nothing I can do to address the environmental issues, so I need to handle them as gracefully as possible.

To do so, I want to retry operations that are the result of infrastructure failures such as network glitches, SQL servers going off-line because they're being rebooted, query time-outs etc. At the same time, I don't want to retry queries if they've failed for logical errors. I just want those to bubble the exception up to the client.

My question is this: what is the best way to distinguish between environmental problems (lost connections, time-outs) and other kinds of exceptions (things like logical errors that would have happened even if the environment was stable).

Is there a commonly used pattern in C# for dealing with things like this? For example, is there a property I can check on the SqlConnection object to detect failed connections? If not, what is the best way to approach this problem?

For what it is worth, my code isn't anything special:

using (SqlConnection connection = new SqlConnection(myConnectionString))
using (SqlCommand command = connection.CreateCommand())
{
  command.CommandText = mySelectCommand;
  connection.Open();

  using (SqlDataReader reader = command.ExecuteReader())
  {
    while (reader.Read())
    {
      // Do something with the returned data.
    }
  }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

One single SqlException (may) wraps multiple SQL Server errors. You can iterate through them with Errors property. Each error is SqlError:

foreach (SqlError error in exception.Errors)

Each SqlError has a Class property you can use to roughly determine if you can retry or not (and in case you retry if you have to recreate connection too). From MSDN:

  • Class < 10 is for errors in information you passed then (probably) you can't retry if first you don't correct inputs.
  • Class from 11 to 16 are "generated by user" then probably again you can't do anything if user first doesn't correct his inputs. Please note that class 16 includes many temporary errors and class 13 is for deadlocks (thanks to EvZ) so you may exclude these classes if you handle them one by one.
  • Class from 17 to 24 are generic hardware/software errors and you may retry. When Class is 20 or higher you have to recreate connection too. 22 and 23 may be serious hardware/software errors, 24 indicates a media error (something user should be warned but you may retry in case it was just a "temporary" error).

You can find a more detailed description of each class here.

In general if you handle errors with their class you won't need to know exactly each error (using error.Number property or exception.Number which is just a shortcut for first SqlError in that list). This has the drawback that you may retry when it's not useful (or error can't be recovered). I'd suggest a two steps approach:

  • Check for known error codes (list error codes with SELECT * FROM master.sys.messages) to see what you want to handle (knowing how). That view contains messages in all supported languages so you may need to filter them by msglangid column (for example 1033 for English).
  • For everything else rely on error class, retrying when Class is 13 or higher than 16 (and reconnecting if 20 or higher).
  • Errors with severity higher than 21 (22, 23 and 24) are serious errors and little waiting won't fix that problems (database itself may also be damaged).

One word about higher classes. How to handle these errors isn't simple and it depends on many factors (including risk management for your application). As a simple first step I wouldn't retry for 22, 23, and 24 when attempting a write operation: if database, file system or media are seriously damaged then writing new data may deteriorate data integrity even more (SQL Server is extremely careful to do not compromise DB for a query even in critical circumstances). A damaged server, it depends on your DB network architecture, might even be hot-swapped (automatically, after a specified amount of time, or when a specified trigger is fired). Always consult and work close to your DBA.

Strategy for retrying depends on error you're handling: free resources, wait for a pending operation to complete, take an alternative action, etc. In general you should retry only if all errors are "retry-able":

bool rebuildConnection = true; // First try connection must be open

for (int i=0; i < MaximumNumberOfRetries; ++i) {
    try {
        // (Re)Create connection to SQL Server
        if (rebuildConnection) {
            if (connection != null)
                connection.Dispose();

            // Create connection and open it...
        }

        // Perform your task

        // No exceptions, task has been completed
        break;
    }
    catch (SqlException e) {
        if (e.Errors.Cast<SqlError>().All(x => CanRetry(x))) {
            // What to do? Handle that here, also checking Number property.
            // For Class < 20 you may simply Thread.Sleep(DelayOnError);

            rebuildConnection = e.Errors
                .Cast<SqlError>()
                .Any(x => x.Class >= 20);

            continue; 
        }

        throw;
    }
}

Wrap everything in try/finally to properly dispose connection. With this simple-fake-naive CanRetry() function:

private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 };

private static bool CanRetry(SqlError error) {
    // Use this switch if you want to handle only well-known errors,
    // remove it if you want to always retry. A "blacklist" approach may
    // also work: return false when you're sure you can't recover from one
    // error and rely on Class for anything else.
    switch (error.Number) {
        // Handle well-known error codes, 
    }

    // Handle unknown errors with severity 21 or less. 22 or more
    // indicates a serious error that need to be manually fixed.
    // 24 indicates media errors. They're serious errors (that should
    // be also notified) but we may retry...
    return RetriableClasses.Contains(error.Class); // LINQ...
}

Some pretty tricky ways to find list of non critical errors here.

Usually I embed all this (boilerplate) code in one method (where I can hide all the dirty things done to create/dispose/recreate connection) with this signature:

public static void Try(
    Func<SqlConnection> connectionFactory,
    Action<SqlCommand> performer);

To be used like this:

Try(
    () => new SqlConnection(connectionString),
    cmd => {
             cmd.CommandText = "SELECT * FROM master.sys.messages";
             using (var reader = cmd.ExecuteReader()) {
                 // Do stuff
         }
    });

Please note that skeleton (retry on error) can be used also when you're not working with SQL Server (actually it can be used for many other operations like I/O and network related stuff so I'd suggest to write a general function and to reuse it extensively).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...