Knowing when to retry or fail when calling SQL Server from C#?

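I have a C# application that fetches data from a SQL Server hosted in a somewhat flaky environment. There is nothing I can do to fix the environmental issues, so I need to handle them as gracefully as possible.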

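To do so, I want to retry operations that fail because of infrastructure problems, such as network glitches, SQL servers going offline because they are being rebooted, query timeouts, etc. At the same time, I don't want to retry queries that failed because of logical errors. I just want those to bubble the exception up to the client.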

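My question is: what is the best way to distinguish between environmental problems (lost connections, timeouts) and other kinds of exceptions (things like logical errors, which would happen even if the environment were stable)?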

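Is there a commonly used pattern in C# for dealing with things like this? For example, is there a property I can check on the SqlConnection object to detect failed connections? If not, what is the best way to approach this problem?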

For what it's worth, my code isn't anything special:

```csharp
using (SqlConnection connection = new SqlConnection(myConnectionString))
using (SqlCommand command = connection.CreateCommand())
{
    command.CommandText = mySelectCommand;
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Do something with the returned data.
        }
    }
}
```

A single SqlException may wrap multiple SQL Server errors. You can iterate through them with the Errors property. Each error is a SqlError:

```csharp
foreach (SqlError error in exception.Errors)
```

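Each SqlError has a Class property you can use to roughly determine whether you can retry (and, if you retry, whether you also have to recreate the connection). From MSDN:

  • Class < 10 is for errors in the information you passed; (probably) you can't retry unless you first correct the input.
  • Class from 11 to 16 is "generated by the user"; again, you probably can't do anything unless the user first corrects his input. Please note that class 16 includes many temporary errors and class 13 is for deadlocks (thanks to EvZ), so you may exclude these classes if you handle them one by one.
  • Class from 17 to 24 covers generic hardware/software errors, and you may retry. When Class is 20 or higher you have to recreate the connection too. 22 and 23 may be serious hardware/software errors; 24 indicates a media error (the user should be warned, but you may retry in case it was just a "temporary" error).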

You can find a more detailed description of each class here.
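The class ranges above can be turned into a small lookup. This is a minimal sketch, not code from the answer: `SeverityAdvice` and `SeverityClassifier` are hypothetical names, and the helper looks only at the severity (the `byte` returned by `SqlError.Class`), never at individual error numbers, which is why classes 13 and 16 are flagged for per-error inspection rather than blindly retried.

```csharp
using System;

// Hypothetical buckets mirroring the severity ranges described above.
public enum SeverityAdvice
{
    FixInputFirst,          // 10 or less, and 11-16 except 13/16: retrying unchanged input won't help
    InspectErrorNumber,     // 13 (deadlocks) and 16 (many temporary errors): decide per error number
    Retry,                  // 17-19: generic hardware/software errors
    RetryWithNewConnection, // 20-21: retry, but recreate the connection first
    SeriousFailure          // 22-24: serious hardware/software or media errors; warn someone
}

// Hypothetical helper: classifies by severity only.
public static class SeverityClassifier
{
    public static SeverityAdvice Classify(byte severityClass)
    {
        if (severityClass >= 22) return SeverityAdvice.SeriousFailure;
        if (severityClass >= 20) return SeverityAdvice.RetryWithNewConnection;
        if (severityClass >= 17) return SeverityAdvice.Retry;
        if (severityClass == 13 || severityClass == 16) return SeverityAdvice.InspectErrorNumber;
        return SeverityAdvice.FixInputFirst;
    }
}
```

With SqlClient you would call it once per error, e.g. `SeverityClassifier.Classify(error.Class)` inside the `foreach (SqlError error in exception.Errors)` loop, and combine the results (for example, retry only if no error is `FixInputFirst` or `SeriousFailure`).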

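In general, if you handle errors by their class, you don't need to know each error exactly (via the error.Number property, or exception.Number, which is just a shortcut for the first SqlError in that list). The drawback is that you may retry when it's useless (or when the error can't be recovered from). I'd suggest a two-step approach:

  • Check for known error codes (list them with SELECT * FROM master.sys.messages) to see what you want to handle (and know how to handle). That view contains messages in all supported languages, so you may need to filter them by the msglangid column (for example, 1033 for English).
  • For everything else, rely on the error class: retry when Class is 13 or higher than 16 (and reconnect when it is 20 or higher).
  • Errors with severity higher than 21 (22, 23 and 24) are serious errors, and a little waiting won't fix them (the database itself may also be damaged).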

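A word about the higher classes. How to handle these errors isn't simple, and it depends on many factors (including your application's risk management). As a simple first step, I wouldn't retry on 22, 23 and 24 when attempting a write operation: if the database, file system or media is seriously damaged, then writing new data may deteriorate data integrity even more (SQL Server is extremely careful not to compromise the DB for a query, even in critical circumstances). A damaged server, depending on your DB network architecture, might even be hot-swapped (automatically, after a specified amount of time, or when a specified trigger is fired). Always consult and work closely with your DBA.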
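The two-step approach above (known error numbers first, severity class as a fallback, nothing above 21) might look roughly like this. It is a sketch with assumptions: `TwoStepRetryPolicy` is a hypothetical name, errors are passed as plain `(Number, Class)` pairs so the logic can be tested without a server, and the table of known numbers is only a starting point (1205 and -2 come from the list in another answer further down; 2812 and 8152 from the `SqlErrorCode` enum at the end). Fill it in from `master.sys.messages` for your own application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoStepRetryPolicy
{
    // Step 1: error numbers you have looked up and know how to handle (true = retry).
    private static readonly Dictionary<int, bool> KnownErrors = new Dictionary<int, bool>
    {
        [1205] = true,   // deadlock victim
        [-2] = true,     // client-side timeout
        [2812] = false,  // stored procedure not found: a logic error
        [8152] = false,  // string or binary data would be truncated: a logic error
    };

    public static (bool Retry, bool Reconnect) Decide(int number, byte severityClass)
    {
        bool reconnect = severityClass >= 20;
        if (KnownErrors.TryGetValue(number, out bool known))
            return (known, known && reconnect);

        // Step 2: fall back on the class; retry on 13 or 17-21, never above 21.
        bool retry = severityClass == 13 || (severityClass > 16 && severityClass <= 21);
        return (retry, retry && reconnect);
    }

    // One SqlException carries several errors: retry only if every one of them is retryable.
    public static (bool Retry, bool Reconnect) Decide(IEnumerable<(int Number, byte Class)> errors)
    {
        var decisions = errors.Select(e => Decide(e.Number, e.Class)).ToList();
        bool retry = decisions.Count > 0 && decisions.All(d => d.Retry);
        return (retry, retry && decisions.Any(d => d.Reconnect));
    }
}
```

With SqlClient, the pairs would come from `exception.Errors.Cast<SqlError>().Select(e => (e.Number, e.Class))`.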

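The retry strategy depends on the error you're handling: free resources, wait for a pending operation to complete, take an alternative action, etc. In general, you should retry only if all errors are "retriable":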

```csharp
bool rebuildConnection = true; // First attempt: the connection must be created and opened

for (int i = 0; i < MaximumNumberOfRetries; ++i)
{
    try
    {
        // (Re)Create connection to SQL Server
        if (rebuildConnection)
        {
            if (connection != null)
                connection.Dispose();

            // Create connection and open it...
        }

        // Perform your task

        // No exceptions, task has been completed
        break;
    }
    catch (SqlException e)
    {
        // Retry only if attempts remain and every error is retriable; otherwise
        // rethrow instead of silently leaving the loop after the last attempt.
        bool lastAttempt = i == MaximumNumberOfRetries - 1;
        if (!lastAttempt && e.Errors.Cast<SqlError>().All(x => CanRetry(x)))
        {
            // What to do? Handle that here, also checking the Number property.
            // For Class < 20 you may simply Thread.Sleep(DelayOnError);

            rebuildConnection = e.Errors
                .Cast<SqlError>()
                .Any(x => x.Class >= 20);

            continue;
        }

        throw;
    }
}
```

Wrap everything in try/finally to dispose of the connection properly. With this simple, pseudo-naive CanRetry() function:

```csharp
private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 };

private static bool CanRetry(SqlError error)
{
    // Use this switch if you want to handle only well-known errors,
    // remove it if you want to always retry. A "blacklist" approach may
    // also work: return false when you're sure you can't recover from one
    // error and rely on Class for anything else.
    switch (error.Number)
    {
        // Handle well-known error codes,
    }

    // Handle unknown errors with severity 21 or less. 22 or more
    // indicates a serious error that needs to be manually fixed.
    // 24 indicates media errors. They're serious errors (that should
    // be also notified) but we may retry...
    return RetriableClasses.Contains(error.Class); // LINQ...
}
```

Here you can find some pretty tricky ways to find a list of non-critical errors.

Usually I embed all this (boilerplate) code in one method (where I can hide all the dirty stuff done to create/dispose/recreate the connection) with this signature:

```csharp
public static void Try(
    Func<SqlConnection> connectionFactory,
    Action<SqlCommand> performer);
```

To be used like this:

```csharp
Try(
    () => new SqlConnection(connectionString),
    cmd =>
    {
        cmd.CommandText = "SELECT * FROM master.sys.messages";
        using (var reader = cmd.ExecuteReader())
        {
            // Do stuff
        }
    });
```

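Please note that the skeleton (retry on error) can also be used when you're not working with SQL Server (it can actually be used for many other operations, like I/O and network-related stuff), so I'd suggest writing a general function and reusing it extensively.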
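A general function of that kind could be sketched as follows. `Retry.Execute` is a hypothetical name, and this version uses a fixed delay and a caller-supplied `isTransient` predicate instead of any SQL-specific logic. The exception filter (`when`) means that non-transient errors, and the failure of the final attempt, propagate untouched with their original stack trace.

```csharp
using System;
using System.Threading;

// Hypothetical general-purpose "retry on error" helper.
public static class Retry
{
    // Runs `operation` up to `maxAttempts` times. An exception is retried only while
    // attempts remain and `isTransient` approves it; anything else propagates unchanged.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts, TimeSpan delayBetweenAttempts)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure and attempts remain: wait, then try again.
                Thread.Sleep(delayBetweenAttempts);
            }
        }
    }
}
```

For SQL Server, the operation can open its own SqlConnection inside the lambda, so every attempt starts from a fresh connection, and `isTransient` could be something like `ex => ex is SqlException sql && sql.Errors.Cast<SqlError>().All(CanRetry)`, reusing the CanRetry() function above.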

You can simply use SqlConnectionStringBuilder properties to make the SQL connection retry.

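```csharp
var conBuilder = new SqlConnectionStringBuilder(Configuration["Database:Connection"]);
conBuilder.ConnectTimeout = 90;      // seconds to wait while establishing a connection
conBuilder.ConnectRetryInterval = 15; // seconds between reconnection attempts
conBuilder.ConnectRetryCount = 6;     // number of reconnection attempts
// Then use conBuilder.ConnectionString when creating the SqlConnection.
```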

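Note: requires .NET Framework 4.5.1 or later. These settings govern re-establishing the connection itself (idle connection resiliency); they don't retry a command that fails for other reasons, such as a deadlock or a command timeout.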

I don't know of any standard, but here's a list of SQL Server exceptions that I've generally regarded as retryable, with a DTC flavour as well:

```csharp
catch (SqlException sqlEx)
{
    canRetry = ((sqlEx.Number == 1205)    // 1205 = Deadlock
        || (sqlEx.Number == -2)           // -2 = TimeOut
        || (sqlEx.Number == 3989)         // 3989 = New request is not allowed to start because it should come with valid transaction descriptor
        || (sqlEx.Number == 3965)         // 3965 = The PROMOTE TRANSACTION request failed because there is no local transaction active.
        || (sqlEx.Number == 3919)         // 3919 = Cannot enlist in the transaction because the transaction has already been committed or rolled back
        || (sqlEx.Number == 3903));       // 3903 = The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
}
```

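Regarding retries, it's advisable to add a random delay between them, to reduce the chance of, for example, the same two transactions deadlocking again.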
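One common way to randomize the delay is exponential backoff with "full jitter": pick a random wait between zero and a ceiling that doubles on each attempt, capped at a maximum. This is a sketch under assumptions: `RetryDelay.Next` is a hypothetical helper, and the base and maximum delays are for you to tune.

```csharp
using System;

// Hypothetical helper: exponential backoff with full jitter.
public static class RetryDelay
{
    // attempt 1 waits up to baseDelay, attempt 2 up to 2 * baseDelay, attempt 3 up to
    // 4 * baseDelay, and so on, never more than maxDelay. The random component keeps two
    // transactions that just deadlocked each other from retrying in lockstep.
    public static TimeSpan Next(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        if (random == null) throw new ArgumentNullException(nameof(random));

        double ceilingMs = Math.Min(maxDelay.TotalMilliseconds,
                                    baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }
}
```

A caller would typically do `Thread.Sleep(RetryDelay.Next(attempt, TimeSpan.FromMilliseconds(200), TimeSpan.FromSeconds(5), random))` before retrying. `Random` is not thread-safe, so share one instance per thread (or use `Random.Shared` on .NET 6 and later) rather than creating a new one per call.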

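With some of the DTC-related errors, removing the connection may be necessary (or, at worst, calling SqlClient.SqlConnection.ClearAllPools()); otherwise a dud connection is returned to the pool.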

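In the spirit of separation of concerns, I picture three logical layers in this case...

  1. The application layer, which calls a "flaky dependency handler" layer
  2. The "flaky dependency handler" layer, which calls the data access layer
  3. The data access layer, which knows nothing about the flakiness

All of the retry logic lives in that handler layer, so that the data access layer isn't polluted with logic other than communicating with the database. (So your data access code doesn't need to change, and if it does need to change for new features, it doesn't have to worry about "flakiness".)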

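A retry pattern could be based on catching specific exceptions in a counter loop. (The counter is just there to prevent infinite retries.) Something like this: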

```csharp
public SomeReturnValue GetSomeData(SomeIdentifier someIdentifier)
{
    var tries = 0;
    SqlException lastError = null;
    while (tries < someConfiguredMaximum)
    {
        try
        {
            tries++;
            return someDataAccessObject.GetSomeData(someIdentifier);
        }
        catch (SqlException e)
        {
            someLogger.LogError(e);
            lastError = e; // keep it so the final exception can carry it as InnerException
            // maybe wait for some number of milliseconds? make the method async if possible
        }
    }
    throw new CustomException("Maximum number of tries has been reached.", lastError);
}
```

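This loops a configured number of times, retrying until it works or the maximum is reached. After that maximum, a custom exception is thrown for the application to handle. You can further fine-tune the exception handling by inspecting the specific SqlException that was caught: based on its error number or message, you may want to continue the loop or throw the CustomException right away.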

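You can refine this logic further by catching other exception types and inspecting those as well. The main point here is that this responsibility is isolated in a specific logical layer of the application, as transparent as possible to the other layers. Ideally, the handler layer and the data access layer implement the same interfaces. That way, if you ever move the code to a more stable environment and no longer need the handler layer, removing it is trivial and requires no changes to the application layer.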
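Here is a minimal sketch of that idea as a decorator: the handler layer implements the same interface as the data access layer and wraps it. Every name here (`ICustomerRepository`, `RetryingCustomerRepository`, `RetriesExhaustedException`, `FlakyCustomerRepository`) is hypothetical. The transient-error test is passed in as a predicate, and `FlakyCustomerRepository` is only a test double standing in for the real data access layer.

```csharp
using System;

// Hypothetical contract implemented by both the data access layer and the handler layer.
public interface ICustomerRepository
{
    string GetCustomerName(int id);
}

// Thrown once the handler gives up; the last transient error is kept as InnerException.
public sealed class RetriesExhaustedException : Exception
{
    public RetriesExhaustedException(string message, Exception inner) : base(message, inner) { }
}

// The "flaky dependency handler" layer: same interface, retry logic only.
public sealed class RetryingCustomerRepository : ICustomerRepository
{
    private readonly ICustomerRepository _inner;
    private readonly Func<Exception, bool> _isTransient;
    private readonly int _maxAttempts;

    public RetryingCustomerRepository(ICustomerRepository inner, Func<Exception, bool> isTransient, int maxAttempts)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _isTransient = isTransient ?? throw new ArgumentNullException(nameof(isTransient));
        _maxAttempts = maxAttempts;
    }

    public string GetCustomerName(int id)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return _inner.GetCustomerName(id);
            }
            catch (Exception ex) when (_isTransient(ex))
            {
                // Non-transient exceptions are not caught and bubble up unchanged.
                if (attempt >= _maxAttempts)
                    throw new RetriesExhaustedException($"Gave up after {attempt} attempts.", ex);
                // Log ex and, ideally, wait a randomized delay before the next attempt.
            }
        }
    }
}

// Test double: fails with a TimeoutException a fixed number of times, then succeeds.
public sealed class FlakyCustomerRepository : ICustomerRepository
{
    private readonly int _failuresBeforeSuccess;
    public int Calls { get; private set; }

    public FlakyCustomerRepository(int failuresBeforeSuccess) => _failuresBeforeSuccess = failuresBeforeSuccess;

    public string GetCustomerName(int id)
    {
        Calls++;
        if (Calls <= _failuresBeforeSuccess) throw new TimeoutException("simulated network glitch");
        return "customer " + id;
    }
}
```

Because the application layer only sees `ICustomerRepository`, dropping the handler later means wiring in the real repository directly, with no other changes.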

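I don't know of any real standard. You might try looking at the Transient Fault Handling Application Block. It's pretty robust, but may be a bit too "enterprisey" for some users. Another approach might be to use an aspect-oriented framework to trap errors. Or good old try/catch will work.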

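As for determining what to retry, you'll generally want to look at the exception. SqlException provides quite a lot of information about the source of your problem, but parsing it can be a pain. I put together some code to pick them apart and try to determine which errors are retryable and which aren't. It hasn't been maintained in a while, so you should take it as a starting point rather than a finished product. Also, it was aimed at SQL Azure, so it may not fully apply to your situation (e.g., resource throttling is an Azure-specific feature, IIRC).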

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

/// <summary>
/// Helps to extract useful information from SQLExceptions, particularly in SQL Azure
/// </summary>
/// <remarks>
/// The [Flags] enums ResourcesThrottled and OperationsThrottled used below are not part of this listing.
/// </remarks>
public class SqlExceptionDetails
{
    public ResourcesThrottled SeriouslyExceededResources { get; private set; }
    public ResourcesThrottled SlightlyExceededResources { get; private set; }
    public OperationsThrottled OperationsThrottled { get; private set; }
    public IList<SqlErrorCode> Errors { get; private set; }
    public string ThrottlingMessage { get; private set; }
    public bool ShouldRetry { get; private set; }
    public bool ShouldRetryImmediately { get; private set; }

    private SqlExceptionDetails()
    {
        this.ShouldRetryImmediately = false;
        this.ShouldRetry = true;
        this.SeriouslyExceededResources = ResourcesThrottled.None;
        this.SlightlyExceededResources = ResourcesThrottled.None;
        this.OperationsThrottled = OperationsThrottled.None;
        Errors = new List<SqlErrorCode>();
    }

    public SqlExceptionDetails(SqlException exception)
        : this(exception.Errors.Cast<SqlError>())
    {
    }

    public SqlExceptionDetails(IEnumerable<SqlError> errors)
        : this()
    {
        List<ISqlError> errorWrappers = (from err in errors
                                         select new SqlErrorWrapper(err)).Cast<ISqlError>().ToList();
        this.ParseErrors(errorWrappers);
    }

    public SqlExceptionDetails(IEnumerable<ISqlError> errors)
        : this()
    {
        ParseErrors(errors);
    }

    private void ParseErrors(IEnumerable<ISqlError> errors)
    {
        foreach (ISqlError error in errors)
        {
            SqlErrorCode code = GetSqlErrorCodeFromInt(error.Number);
            this.Errors.Add(code);

            switch (code)
            {
                case SqlErrorCode.ServerBusy:
                    ParseServerBusyError(error);
                    break;
                case SqlErrorCode.ConnectionFailed:
                    //This is a very non-specific error, can happen for almost any reason
                    //so we can't make any conclusions from it
                    break;
                case SqlErrorCode.DatabaseUnavailable:
                    ShouldRetryImmediately = false;
                    break;
                case SqlErrorCode.EncryptionNotSupported:
                    //this error code is sometimes sent by the client when it shouldn't be
                    //Therefore we need to retry it, even though it seems this problem wouldn't fix itself
                    ShouldRetry = true;
                    ShouldRetryImmediately = true;
                    break;
                case SqlErrorCode.DatabaseWorkerThreadThrottling:
                case SqlErrorCode.ServerWorkerThreadThrottling:
                    ShouldRetry = true;
                    ShouldRetryImmediately = false;
                    break;

                //The following errors are probably not going to be resolved in 10 seconds
                //They're mostly related to poor query design, broken DB configuration, or too much data
                case SqlErrorCode.ExceededDatabaseSizeQuota:
                case SqlErrorCode.TransactionRanTooLong:
                case SqlErrorCode.TooManyLocks:
                case SqlErrorCode.ExcessiveTempDBUsage:
                case SqlErrorCode.ExcessiveMemoryUsage:
                case SqlErrorCode.ExcessiveTransactionLogUsage:
                case SqlErrorCode.BlockedByFirewall:
                case SqlErrorCode.TooManyFirewallRules:
                case SqlErrorCode.CannotOpenServer:
                case SqlErrorCode.LoginFailed:
                case SqlErrorCode.FeatureNotSupported:
                case SqlErrorCode.StoredProcedureNotFound:
                case SqlErrorCode.StringOrBinaryDataWouldBeTruncated:
                    this.ShouldRetry = false;
                    break;
            }
        }

        if (this.ShouldRetry && Errors.Count == 1)
        {
            SqlErrorCode code = this.Errors[0];
            if (code == SqlErrorCode.TransientServerError)
            {
                this.ShouldRetryImmediately = true;
            }
        }

        if (IsResourceThrottled(ResourcesThrottled.Quota) ||
            IsResourceThrottled(ResourcesThrottled.Disabled))
        {
            this.ShouldRetry = false;
        }

        if (!this.ShouldRetry)
        {
            this.ShouldRetryImmediately = false;
        }

        SetThrottlingMessage();
    }

    private void SetThrottlingMessage()
    {
        if (OperationsThrottled == OperationsThrottled.None)
        {
            ThrottlingMessage = "No throttling";
        }
        else
        {
            string opsThrottled = OperationsThrottled.ToString();
            string seriousExceeded = SeriouslyExceededResources.ToString();
            string slightlyExceeded = SlightlyExceededResources.ToString();

            ThrottlingMessage = "SQL Server throttling encountered. Operations throttled: " + opsThrottled
                + ", Resources Seriously Exceeded: " + seriousExceeded
                + ", Resources Slightly Exceeded: " + slightlyExceeded;
        }
    }

    private bool IsResourceThrottled(ResourcesThrottled resource)
    {
        return ((this.SeriouslyExceededResources & resource) > 0 ||
                (this.SlightlyExceededResources & resource) > 0);
    }

    private SqlErrorCode GetSqlErrorCodeFromInt(int p)
    {
        switch (p)
        {
            case 40014: case 40054: case 40133: case 40506: case 40507:
            case 40508: case 40512: case 40516: case 40520: case 40521:
            case 40522: case 40523: case 40524: case 40525: case 40526:
            case 40527: case 40528: case 40606: case 40607: case 40636:
                return SqlErrorCode.FeatureNotSupported;
        }

        // Casting an int to an enum never throws, so check that the value is actually defined.
        return Enum.IsDefined(typeof(SqlErrorCode), p) ? (SqlErrorCode)p : SqlErrorCode.Unknown;
    }

    /// <summary>
    /// Parse out the reason code from a ServerBusy error.
    /// </summary>
    /// <remarks>
    /// Basic idea extracted from http://msdn.microsoft.com/en-us/library/gg491230.aspx
    /// </remarks>
    private void ParseServerBusyError(ISqlError error)
    {
        int idx = error.Message.LastIndexOf("Code:");
        if (idx < 0)
        {
            return;
        }

        string reasonCodeString = error.Message.Substring(idx + "Code:".Length);
        int reasonCode;
        if (!int.TryParse(reasonCodeString, out reasonCode))
        {
            return;
        }

        int opsThrottledInt = (reasonCode & 3);
        this.OperationsThrottled = (OperationsThrottled)(Math.Max((int)OperationsThrottled, opsThrottledInt));

        int slightResourcesMask = reasonCode >> 8;
        int seriousResourcesMask = reasonCode >> 16;
        foreach (ResourcesThrottled resourceType in Enum.GetValues(typeof(ResourcesThrottled)))
        {
            if ((seriousResourcesMask & (int)resourceType) > 0)
            {
                this.SeriouslyExceededResources |= resourceType;
            }
            if ((slightResourcesMask & (int)resourceType) > 0)
            {
                this.SlightlyExceededResources |= resourceType;
            }
        }
    }
}

public interface ISqlError
{
    int Number { get; }
    string Message { get; }
}

public class SqlErrorWrapper : ISqlError
{
    public SqlErrorWrapper(SqlError error)
    {
        this.Number = error.Number;
        this.Message = error.Message;
    }

    public SqlErrorWrapper()
    {
    }

    public int Number { get; set; }
    public string Message { get; set; }
}

/// <summary>
/// Documents some of the ErrorCodes from SQL/SQL Azure.
/// I have not included all possible errors, only the ones I thought useful for modifying runtime behaviors
/// </summary>
/// <remarks>
/// Comments come from: http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
/// </remarks>
public enum SqlErrorCode : int
{
    /// <summary>We don't recognize the error code returned</summary>
    Unknown = 0,

    /// <summary>
    /// A SQL feature/function used in the query is not supported. You must fix the query before it will work.
    /// This is a rollup of many more-specific SQL errors
    /// </summary>
    FeatureNotSupported = 1,

    /// <summary>Probable cause is server maintenance/upgrade. Retry connection immediately.</summary>
    TransientServerError = 40197,

    /// <summary>The server is throttling one or more resources. Reasons may be available from other properties</summary>
    ServerBusy = 40501,

    /// <summary>
    /// You have reached the per-database cap on worker threads. Investigate long running transactions and reduce server load.
    /// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
    /// </summary>
    DatabaseWorkerThreadThrottling = 10928,

    /// <summary>
    /// The per-server worker thread cap has been reached. This may be partially due to load from other databases in a shared hosting environment (eg, SQL Azure).
    /// You may be able to alleviate the problem by reducing long running transactions.
    /// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
    /// </summary>
    ServerWorkerThreadThrottling = 10929,

    ExcessiveMemoryUsage = 40553,

    BlockedByFirewall = 40615,

    /// <summary>The database has reached the maximum size configured in SQL Azure</summary>
    ExceededDatabaseSizeQuota = 40544,

    /// <summary>A transaction ran for too long. This timeout seems to be 24 hours.</summary>
    /// <remarks>
    /// 24 hour limit taken from http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
    /// </remarks>
    TransactionRanTooLong = 40549,

    TooManyLocks = 40550,

    ExcessiveTempDBUsage = 40551,

    ExcessiveTransactionLogUsage = 40552,

    DatabaseUnavailable = 40613,

    CannotOpenServer = 40532,

    /// <summary>SQL Azure databases can have at most 128 firewall rules defined</summary>
    TooManyFirewallRules = 40611,

    /// <summary>
    /// Theoretically means the DB doesn't support encryption. However, this can be indicated incorrectly due to an error in the client library.
    /// Therefore, even though this seems like an error that won't fix itself, it's actually a retryable error.
    /// </summary>
    /// <remarks>
    /// http://social.msdn.microsoft.com/Forums/en/ssdsgetstarted/thread/e7cbe094-5b55-4b4a-8975-162d899f1d52
    /// </remarks>
    EncryptionNotSupported = 20,

    /// <summary>User failed to connect to the database. This is probably not recoverable.</summary>
    /// <remarks>
    /// Some good info on more-specific debugging: http://blogs.msdn.com/b/sql_protocols/archive/2006/02/21/536201.aspx
    /// </remarks>
    LoginFailed = 18456,

    /// <summary>Failed to connect to the database. Could be due to configuration issues, network issues, bad login... hard to tell</summary>
    ConnectionFailed = 4060,

    /// <summary>Client tried to call a stored procedure that doesn't exist</summary>
    StoredProcedureNotFound = 2812,

    /// <summary>The data supplied is too large for the column</summary>
    StringOrBinaryDataWouldBeTruncated = 8152
}
```