Skip to content

V17 - Background exception gracefully stops app instead of a fail state #21083

@AaronSadlerUK

Description

@AaronSadlerUK

Which Umbraco version are you using?

17.0.1

Bug summary

When a background service exits with an exception the app will stop gracefully instead of in a failed state.

This causes Docker to see the app as "complete" instead of "fail", when Docker is configured to only restart "failed" apps it will not restart the site as it sees it as a successful stop and not a failed stop.

Image

The log immediately before the app stopped:

{"@t":"2025-12-07T11:58:05.8844896Z","@mt":"Exception ({InstanceId}).","@l":"Error","@x":"Microsoft.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.\nThe statement has been terminated.\n   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\n   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\n   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)\n   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\n   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)\n   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)\n   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)\n   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()\n   at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)\n   at NPoco.Database.Execute(String sql, CommandType commandType, Object[] args)\n   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainWriteLock()\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)\nClientConnectionId:13cdc954-bc58-400b-86d0-ab85b208c78b\nError Number:1222,State:51,Class:16","InstanceId":"805555b6","SourceContext":"Umbraco.Cms.Infrastructure.Persistence.UmbracoDatabase","ProcessId":1,"ProcessName":"dotnet","ThreadId":35,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"ERROR"}
{"@t":"2025-12-07T11:58:05.8914787Z","@mt":"BackgroundService failed","@l":"Error","@x":"Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedWriteLockTimeoutException: Failed to acquire write lock for id: -347.\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)\n   at Umbraco.Cms.Infrastructure.Services.Implement.DistributedJobService.TryTakeRunnableAsync()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.RunRunnableJob()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.ExecuteAsync(CancellationToken stoppingToken)\n   at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)","EventId":{"Id":9,"Name":"BackgroundServiceFaulted"},"SourceContext":"Microsoft.Extensions.Hosting.Internal.Host","ProcessId":1,"ProcessName":"dotnet","ThreadId":35,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"ERROR"}
{"@t":"2025-12-07T11:58:05.8919737Z","@mt":"The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted.","@l":"Fatal","@x":"Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedWriteLockTimeoutException: Failed to acquire write lock for id: -347.\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)\n   at Umbraco.Cms.Infrastructure.Services.Implement.DistributedJobService.TryTakeRunnableAsync()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.RunRunnableJob()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.ExecuteAsync(CancellationToken stoppingToken)\n   at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)","EventId":{"Id":10,"Name":"BackgroundServiceStoppingHost"},"SourceContext":"Microsoft.Extensions.Hosting.Internal.Host","ProcessId":1,"ProcessName":"dotnet","ThreadId":35,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"FATAL"}
{"@t":"2025-12-07T11:58:05.8920798Z","@mt":"Application is shutting down...","SourceContext":"Microsoft.Hosting.Lifetime","ProcessId":1,"ProcessName":"dotnet","ThreadId":35,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"INFO "}
{"@t":"2025-12-07T11:58:06.7534111Z","@mt":"Exception ({InstanceId}).","@l":"Error","@x":"Microsoft.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.\nThe statement has been terminated.\n   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\n   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\n   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)\n   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\n   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)\n   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)\n   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)\n   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()\n   at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)\n   at NPoco.Database.Execute(String sql, CommandType commandType, Object[] args)\n   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainWriteLock()\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)\nClientConnectionId:2370dd21-83dc-4023-93e3-e95e608dd5b6\nError Number:1222,State:51,Class:16","InstanceId":"fe0feddd","SourceContext":"Umbraco.Cms.Infrastructure.Persistence.UmbracoDatabase","ProcessId":1,"ProcessName":"dotnet","ThreadId":37,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"ERROR"}
{"@t":"2025-12-07T11:58:06.7580452Z","@mt":"Failed to update server record in database.","@l":"Error","@x":"Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedWriteLockTimeoutException: Failed to acquire write lock for id: -331.\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)\n   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)\n   at Umbraco.Cms.Core.Services.Implement.ServerRegistrationService.TouchServer(String serverAddress, TimeSpan staleTimeout)\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.Jobs.ServerRegistration.TouchServerJob.RunJobAsync()","SourceContext":"Umbraco.Cms.Infrastructure.BackgroundJobs.Jobs.ServerRegistration.TouchServerJob","ProcessId":1,"ProcessName":"dotnet","ThreadId":37,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"963a2d424b37","Log4NetLevel":"ERROR"}

This particular log doesn't include the shutdown log, but the previous log does:

{"@t":"2025-12-06T15:02:22.7318995Z","@mt":"The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted.","@l":"Fatal","@x":"Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.\nOperation cancelled by user.\n ---> System.ComponentModel.Win32Exception (258): Unknown error 258\n   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\n   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)\n   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\n   at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)\n   at Microsoft.Data.SqlClient.SqlInternalTransaction.Commit()\n   at Microsoft.Data.SqlClient.SqlTransaction.Commit()\n   at NPoco.Database.CompleteTransactionImp(Boolean sync, CancellationToken cancellationToken)\n   at Umbraco.Cms.Infrastructure.Scoping.Scope.DisposeLastScope()\n   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()\n   at Umbraco.Cms.Infrastructure.Services.Implement.DistributedJobService.TryTakeRunnableAsync()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.RunRunnableJob()\n   at Umbraco.Cms.Infrastructure.BackgroundJobs.DistributedBackgroundJobHostedService.ExecuteAsync(CancellationToken stoppingToken)\n   at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)\nClientConnectionId:3f6d85ca-8c8f-4d6c-8554-d040bf21d789\nError Number:-2,State:0,Class:11","EventId":{"Id":10,"Name":"BackgroundServiceStoppingHost"},"SourceContext":"Microsoft.Extensions.Hosting.Internal.Host","ProcessId":1,"ProcessName":"dotnet","ThreadId":9,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"b781351e0405","Log4NetLevel":"FATAL"}
{"@t":"2025-12-06T15:02:22.7320158Z","@mt":"Application is shutting down...","SourceContext":"Microsoft.Hosting.Lifetime","ProcessId":1,"ProcessName":"dotnet","ThreadId":9,"ApplicationId":"d2b6c12ab2d22552b3a7faee0509e6139f633afc","MachineName":"b781351e0405","Log4NetLevel":"INFO "}

Specifics

The site is a "brochure" style site running in a Docker Swarm with 2 replicas.

Steps to reproduce

Not sure, it seems to happen whenever a BackgroundService exits with an exception, in this case a MainDom lock

Expected result / actual result

It should be doing 2 things:

  1. The site should stop in a failed state - So that the docker swarm restarts the site
  2. The MainDom should not be locked - This looks to be an ongoing unresolved issue? Failed to acquire write lock for id: -333 #14195

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions