Service Fabric - "object is closed" exception via ActorProxy - azure-service-fabric

I'm catching the following exception on my cluster, randomly when an actor calls another one via ActorProxy:
System.Fabric.FabricObjectClosedException: The object is closed. ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071BFE
at System.Fabric.Interop.NativeRuntime.IFabricKeyValueStoreReplica6.CreateTransaction()
at System.Fabric.KeyValueStoreReplica.CreateTransactionHelper(KeyValueStoreTransactionSettings settings)
at System.Fabric.Interop.Utility.WrapNativeSyncInvoke[TResult](Func`1 func, String functionTag, String functionArgs)
--- End of inner exception stack trace ---
at System.Fabric.Interop.Utility.WrapNativeSyncInvoke[TResult](Func`1 func, String functionTag, String functionArgs)
at Microsoft.ServiceFabric.Actors.Runtime.KvsActorStateProvider.<>c__DisplayClass14.b__13()
at Microsoft.ServiceFabric.Actors.Runtime.ActorStateProviderHelper.d__61.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Actors.Runtime.ActorStateManager.<ContainsStateAsync>d__17.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at PosteItaliane.Sin.StateManagement.ObservableState.SingleValueWrapper1.d__2.MoveNext() in C:\Users\maurosag\Source\Repos\Equitalia3\SIN\PI.Sin.StateManagement\ObservableState\SingleValueWrapper.cs:line 21
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at PosteItaliane.Sin.Utility.ServiceFabric.StatefulActor`1.get_State() ....
Searching the web, I cannot figure out how to resolve it; the only discussions I found concern PowerShell execution:
“This means that this replica got demoted from primary to secondary. The client will re-resolve and reconnect to the new primary if you let the exception bubble up. The existing processing will drain on the primary. I am resolving the issue as "By Design". Please feel free to reopen if you still have question.”
Can anyone help me?
Thanks in advance.


Cosmos DB EF ReadItemAsync exception occurs Response status code does not indicate success: Unauthorized (401);

The command I'm executing:
var feature = await container.ReadItemAsync<CosmosNormalizedFeatureModel>(guid, new Microsoft.Azure.Cosmos.PartitionKey(partitionKey));
Throws an exception:
Response status code does not indicate success: Unauthorized (401); Substatus: 0; ActivityId: ; Reason: ();
I don't believe this is true, but I don't see anything wrong either.
When I use GetItemLinqQueryable, I have no issues connecting to Cosmos.
I've verified that the partition key exists, is set to the correct property, and returns data.
I've verified that the guid/id exists and returns data.
I've verified that the container is set to the correct container.
Microsoft.Azure.Cosmos 3.20.1
Not sure what else I can check to troubleshoot the issue. Thanks!
Stack trace
at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode()
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.ProcessMessage[T](ResponseMessage responseMessage, Func`2 createResponse)
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.CreateItemResponse[T](ResponseMessage responseMessage)
at Microsoft.Azure.Cosmos.ContainerCore.<ReadItemAsync>d__56`1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.Azure.Cosmos.ClientContextCore.<RunWithDiagnosticsHelperAsync>d__38`1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at Microsoft.Azure.Cosmos.ClientContextCore.<OperationHelperWithRootTraceAsync>d__29`1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at xxx.<GetFeatureByGuid>d__7.MoveNext() in D:\xxx.cs:line 183
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at xxx.<GetNormalizedILIReportFeatureByGuid>d__10.MoveNext() in D:\xxx.cs:line 280
Based on the comments, you are using Bulk mode.
When using Bulk mode, operations are packed together to improve network performance (the operation type is not relevant; read operations can be packed with write operations) and sent as a single payload to the backend.
The payload is of a different type and is sent to a different API, so that the backend can unpack the operations, process them, and return a packed response.
This API requires the Write keys (because the package could contain any type of operation). The fact that you are using the Read-only keys is what is causing the 401. Ideally the backend should be more explicit in the error it returns, though.
The key being used to connect to Cosmos was a read-only key; with Bulk mode enabled, the point read requires a read/write key.
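As a rough illustration of the explanation above, a client that authenticates with a read-only key can be created with Bulk mode switched off, so that point reads go through the regular read path. This is only a sketch: the class name, method names, and all parameters (endpoint, readOnlyKey, databaseId, containerId) are placeholders made up for this example, not taken from the question.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper class; names and parameters are placeholders.
public static class ReadOnlyCosmosAccess
{
    public static CosmosClient CreateReadOnlyClient(string endpoint, string readOnlyKey)
    {
        var options = new CosmosClientOptions
        {
            // Bulk mode packs operations into a payload that needs a read/write key,
            // so it is left disabled for a client that only holds a read-only key.
            AllowBulkExecution = false
        };
        return new CosmosClient(endpoint, readOnlyKey, options);
    }

    public static async Task<T> ReadByIdAsync<T>(
        CosmosClient client, string databaseId, string containerId,
        string id, string partitionKey)
    {
        Container container = client.GetContainer(databaseId, containerId);

        // Plain point read: item id plus partition key value.
        ItemResponse<T> response =
            await container.ReadItemAsync<T>(id, new PartitionKey(partitionKey));
        return response.Resource;
    }
}
```

If the application also needs Bulk mode for writes, one option under the same assumptions is to keep a second client that is built with a read/write key and AllowBulkExecution = true.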

Some windows container pods restarting on AKS & throwing RabbitMq connection failure from 60 pods and after several restarts come to running state

After deploying around 60 pods on AKS that use Rebus with RabbitMq, about 15 of them restart several times during initialization and then reach the Running state. The components throw the following error:
Unhandled Exception: Rebus.Injection.ResolutionException: Could not resolve Rebus.Bus.IBus with decorator depth 0 - registrations: Rebus.Injection.Injectionist+Handler ---> RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> System.AggregateException: One or more errors occurred. ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: No such host is known
at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at RabbitMQ.Client.TcpClientAdapter.<ConnectAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at RabbitMQ.Client.Impl.TaskExtensions.<TimeoutAfter>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, Int32 timeout)
--- End of inner exception stack trace ---
at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 timeout, AddressFamily family)
at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, Int32 connectionTimeout, Int32 readTimeout, Int32 writeTimeout)
at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
--- End of inner exception stack trace ---
at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
at RabbitMQ.Client.Framing.Impl.AutorecoveringConnection.Init(IEndpointResolver endpoints)
at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
--- End of inner exception stack trace ---
at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
at Rebus.Internals.ConnectionManager.GetConnection()
at Rebus.RabbitMq.RabbitMqTransport.CreateQueue(String address)
at Rebus.Config.RebusConfigurer.<>c__DisplayClass12_0.<Start>b__26(IResolutionContext c)
at Rebus.Injection.Injectionist.ResolutionContext.Get[TService]()
--- End of inner exception stack trace ---
at Rebus.Injection.Injectionist.ResolutionContext.Get[TService]()
at Rebus.Injection.Injectionist.Get[TService]()
at Rebus.Config.RebusConfigurer.Start()
at Castle.Windsor.Installer.AssemblyInstaller.Install(IWindsorContainer container, IConfigurationStore store)
at Castle.Windsor.WindsorContainer.Install(IWindsorInstaller[] installers, DefaultComponentInstaller scope)
at Castle.Windsor.WindsorContainer.Install(IWindsorInstaller[] installers)
at RebusHost.Main(String[] args)
Although the RabbitMq server is reachable, some pods throw this error on startup and reach a successful Running state only after 3 to 5 restarts. I am not sure what prevents a pod from connecting on the first attempt. Any clue will be appreciated.
We are using Rebus 4.0 and RabbitMq 5.1.0.0. The components (pods) are deployed on a Windows node of AKS, and the "rabbitmq:3-management" Docker image runs on AKS under a Linux node, of course.

Postgres Npgsql.PostgresException deadlock detected

We are developing a group chat application and using PostgreSQL to store chat messages.
CREATE TABLE public.chatmessage
(
    chatmessageid uuid NOT NULL DEFAULT uuid_generate_v4(),
    text character varying COLLATE pg_catalog."default",
    planid uuid NOT NULL,
    userid uuid NOT NULL,
    createdat timestamp with time zone NOT NULL DEFAULT timezone('utc'::text, now()),
    updatedat timestamp with time zone,
    deleted boolean NOT NULL DEFAULT false,
    viewedallstatus boolean,
    vieweduserids uuid[],
    alloweduserids uuid[],
    CONSTRAINT chatmessage_pkey PRIMARY KEY (chatmessageid)
)
To manage read status, we store the user IDs of participants who have viewed a message in the vieweduserids column.
Under heavy use of a group chat with 5 or more participants, we get the following exception:
Npgsql.PostgresException (0x80004005): 40P01: deadlock detected
at Npgsql.NpgsqlConnector.<DoReadMessage>d__157.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlConnector.<ReadMessage>d__156.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Npgsql.NpgsqlConnector.<ReadMessage>d__156.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlDataReader.<NextResult>d__32.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlDataReader.<<NextResultAsync>b__31_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlCommand.<Execute>d__71.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlCommand.<ExecuteNonQuery>d__84.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlCommand.<>c__DisplayClass83_0.<<ExecuteNonQueryAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Dapper.SqlMapper.<ExecuteImplAsync>d__37.MoveNext() in C:\projects\dapper\Dapper\SqlMapper.Async.cs:line 646
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
I suspect the issue occurs because the application updates the same column from several sessions at the same time. The query that updates the vieweduserids column is as follows:
"Update ChatMessage Set ViewedUserIds = ViewedUserIds || @UserId Where PlanId = @PlanId And @UserId = Any(AllowedUserIds) And Not (@UserId = Any(ViewedUserIds)) And Deleted = False;"
How can we resolve this issue?
A deadlock is no problem unless it happens frequently.
The correct solution is for the application not to panic, but simply to retry the database transaction.
If deadlocks happen too frequently, you need to investigate and remedy the cause. Usually that means keeping your transactions short and small and always locking objects in a consistent order.
If you need specific help, read the PostgreSQL log file to find out which queries are involved. You'll have to understand which locks conflict to determine the root cause.
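The "simply retry the transaction" advice can be sketched as a small helper. This is a minimal sketch, not the asker's code: the class name TransientRetry, the attempt count, and the delay are assumptions chosen for illustration, and the caller supplies the predicate that decides which exceptions count as retryable.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an operation when the supplied predicate
// classifies the exception as transient (for example a detected deadlock).
public static class TransientRetry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        int baseDelayMs = 50)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // The delegate should run the whole transaction from the start.
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait a little longer after each failed attempt, then retry.
                await Task.Delay(baseDelayMs * attempt).ConfigureAwait(false);
            }
        }
    }
}
```

With Npgsql and Dapper, as in the question, the call might look like TransientRetry.ExecuteAsync(() => connection.ExecuteAsync(sql, parameters), ex => ex is PostgresException pg && pg.SqlState == "40P01"), where connection, sql, and parameters are the caller's own objects. Once the last attempt fails, the exception propagates to the caller unchanged.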

Service fabric TypeInitializationException during Application Upgrade

I am trying to upgrade the app version for one of our SF solutions, but the upgrade has failed multiple times because one of the services reports an issue during startup with the new version.
Here is what I see as 2 exceptions happening almost at the same time:
OnApply
Unexpected service exception. Type: System.TypeInitializationException Message: The type initializer for 'MyCompany.MyService.Interfaces.Models.MyUser' threw an exception. HResult: 0x80131534
Log record. Type: BeginTransaction LSN: 103498
at System.Fabric.Store.TStore`5.OnApplyAdd(TransactionBase txn, MetadataOperationData metadataOperationData, RedoUndoOperationData operationRedoUndo, Boolean isIdempotent, String applyType)
at System.Fabric.Store.TStore`5.<OnRecoveryApplyAsync>d__299.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.Store.TStore`5.<Microsoft-ServiceFabric-Replicator-IStateProvider2-ApplyAsync>d__237.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Replicator.DynamicStateManager.<OnApplyAsync>d__106.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Replicator.DynamicStateManager.<OnApplyAsync>d__105.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Replicator.OperationProcessor.<ApplyCallback>d__36.MoveNext().
And
Exception in OpenAsync. Type: System.TypeInitializationException Message: The type initializer for 'MyCompany.MyService.Interfaces.Models.MyUser' threw an exception. HResult: 0x80131534. Stack Trace: at Microsoft.ServiceFabric.Replicator.RecoveryManager.<PerformRecoveryAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Replicator.LoggingReplicator.<PerformRecoveryAsync>d__137.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Replicator.DynamicStateManager.<OpenAsync>d__109.MoveNext().
I would expect to see this kind of error in case any changes happened to the "MyUser" model, but it was not changed at all.
I am not sure whether this is related to the SF version we are using, 5.7.198, which is not the latest one.
Has anyone faced something similar, or does anyone have good ideas for a workaround?
P.S. This is a production system with real customers. Being able to upgrade without losing their data is a must, so re-creating the SF cluster is not an option.

get_Error_BPAAsimovNotReachedRetrying() not found when creating/validating cluster

I'm trying to create a standalone Service Fabric cluster using the 5.4.145.9494 SDK bits. When I run .\TestConfiguration.ps1 .\ClusterConfig.Unsecure.DevCluster.json, with no changes whatsoever to the downloaded SDK, I get the following error:
Test Config failed with exception: System.AggregateException: One or more errors occurred. ---> System.MissingMethodException: Method not found: 'System.String System.Fabric.Strings.StringResources.get_Error_BPAAsimovNotReachedRetrying()'.
at Microsoft.ServiceFabric.DeploymentManager.Common.StandaloneSettingsValidator.Validate()
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.IsJsonConfigModelValid(StandAloneInstallerJsonModel config)
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.AnalyzeClusterSetup(String configPath, String cabPath, Boolean usingClusterManifest, FabricPackageType fabricPackageType)
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.d__3.MoveNext()
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.BpaAnalyzeClusterSetup(String clusterConfigPath, String fabricPackagePath)
at Microsoft.ServiceFabric.Powershell.ClusterCmdletBase.TestConfig(String clusterConfigPath, String fabricPackagePath)
at System.Management.Automation.CommandProcessor.ProcessRecord()
---> (Inner Exception #0) System.MissingMethodException: Method not found: 'System.String System.Fabric.Strings.StringResources.get_Error_BPAAsimovNotReachedRetrying()'.
at Microsoft.ServiceFabric.DeploymentManager.Common.StandaloneSettingsValidator.Validate()
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.IsJsonConfigModelValid(StandAloneInstallerJsonModel config)
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.AnalyzeClusterSetup(String configPath, String cabPath, Boolean usingClusterManifest, FabricPackageType fabricPackageType)
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.DeploymentManager.BPA.BestPracticesAnalyzer.d__3.MoveNext()<---
The same error is printed when trying to use createservicefabriccluster.ps1.
I'm trying it on a Windows Server 2012 R2 machine. Interestingly, the same works just fine on another machine running Windows 10. There are other differences (the Windows Server 2012 R2 machine is in a secure environment with a bunch of access policies around network, disk access, etc.), but it's hard to tell what's actually causing validation to fail with a message like that ...
My question: How do I get past that "MissingMethodException" noise and learn the real issue?
Turns out that's the exception you get when the machine is already part of a previously defined standalone, non-Dev cluster. Running .\cleanFabric.ps1 made it work again.
Somebody should make the error message better ...