Keycloak: Cached clientScope not found

I keep getting this error in the Keycloak logs.
The client scope in question also cannot be found when I search for it in the Keycloak admin console.
The logs are attached below for reference:
2022-09-22 04:04:12,718 ERROR [org.key.ser.err.KeycloakErrorHandler] (executor-thread-610) Uncaught server error: java.lang.IllegalStateException: Cached clientScope not found: 1e84ef04-9ef9-44fe-b1bd-f45e6d4
    at org.keycloak.models.cache.infinispan.RealmAdapter.lambda$getClientScopesStream$3(RealmAdapter.java:1495)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1632)
    at
Steps to reproduce:
1. Create a client scope and attach it to a client.
2. Detach the client scope from the client.
3. Delete the client scope.
When this issue occurs, the well-known endpoint and many other endpoints start returning HTTP 500, apparently because they fetch the cached client scopes and run some validation on them.
Is there any configuration I am missing?

OpenSearch 1.3 > 2.3 upgrade, CloudFormation fails on domain update

I recently updated our CDK code to move our OpenSearch cluster from version 1.3 to 2.3. The cluster itself seems to have upgraded to a healthy state and is still accessible / usable by our application, but CloudFormation failed when attempting to update our domain resource with:
Resource handler returned message: "Resource handler returned message: "Invalid request provided: DP Nodes are OOS, Tags operation is not allowed"
This kicked the stack into UPDATE_ROLLBACK_FAILED, because the rollback itself is not allowed: the cluster cannot be downgraded back to 1.3.
I'm struggling to find any information about this error and am not quite sure how to resolve it to unblock the CloudFormation stack.
Things I have tried:
Digging through CloudWatch logs, which only revealed information pertaining to queries.
Forcing the rollback to complete while skipping the Domain resource. This got me back to an UPDATE_COMPLETE state, but each subsequent deploy of this stack will fail again because the core issue is not resolved.
This was an odd presentation of a permissions issue. As I was reading through some docs, I stumbled upon this section, which discusses changes to tag-based access control.
This led me to start looking into CloudTrail, where I found the exact error that was firing when this deploy happened. It was a little odd because the assumed role granted admin access to CloudFormation, but the last line of this event record caught my eye:
"sourceIPAddress": "cloudformation.amazonaws.com",
"userAgent": "cloudformation.amazonaws.com",
"errorCode": "ValidationException",
"errorMessage": "DP Nodes are OOS, Tags operation is not allowed",
"eventSource": "es.amazonaws.com",
Upon adding es.amazonaws.com to the trust relationship of that role, the deploy fully re-ran successfully.
Hopefully this helps someone else.
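The trust-relationship change described above can be sketched in Python. This is only an illustration of the edit to the policy document, not the exact policy from the original stack: the starting policy, the helper name add_service_principal, and the assumption that the role was trusted only by cloudformation.amazonaws.com are all hypothetical.

```python
import copy
import json


def add_service_principal(trust_policy: dict, service: str) -> dict:
    """Return a copy of trust_policy whose sts:AssumeRole statements also trust `service`."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        principal = statement.setdefault("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a wildcard principal; left untouched in this sketch
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service not in services:
            services = services + [service]
        principal["Service"] = services
    return policy


# Hypothetical starting point: a role trusted only by CloudFormation.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudformation.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

updated_policy = add_service_principal(example_policy, "es.amazonaws.com")
print(json.dumps(updated_policy, indent=2))
```

The printed document is what would go into the role's trust relationship; how it is applied (console, CDK, or CLI) depends on how the role is managed.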

Add Relying Party Trust is failing in ADFS SAML

I've spent quite a few hours fighting with these issues, so I thought a quick recap might be helpful for somebody else too.
First, when trying to import an RP from a metadata URL, I was getting this error:
An error occured during an attempt to read the federation metadata. Verify that the specified URL or hostname is a valid federation metadata endpoint.
...
Error message: The underlying connection was closed: An unexpected error occured on a send.
The problem turned out to be that Windows Server, at least up to 2016, uses TLS 1.0 by default for the .NET Framework (in which the ADFS configuration wizard is implemented), while my service hosting the metadata document required TLS 1.2 as the minimum version.
Dropping the minimum version to TLS 1.0 is a no-go from a security point of view, so the proper fix would be to enable TLS 1.2 as the default version on the ADFS server.
That would solve the issue (which I confirmed with a test), but then some of the other RPs that only support TLS 1.0 would stop working, so I had to give up on importing metadata directly from a URL and use the file import option instead.
In this case another error popped up, which happened to be:
An error occured during an attempt to read the federation metadata. Verify that the specified URL or hostname is a valid federation metadata endpoint.
...
Error message: Entity descriptor '...'. ID6018: Digest verification failed for reference '...'.
This one turned out to be my own fault: I had formatted the XML in the metadata file with line breaks and tabs to improve readability, whereas it is all on a single line originally. ADFS won't accept that, so the document must be exactly the same as it came out of the metadata endpoint.
The same issue might result in different error messages and codes, depending on the Windows and ADFS versions. For example, this one is possibly caused by a failed metadata integrity check as well:
An error occured during an attempt to read the federation metadata. Verify that the specified URL or hostname is a valid federation metadata endpoint.
...
Error message: Entity descriptor '...'. ID6013: The signature verification failed.
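A small Python sketch shows why reformatting the file breaks the integrity check described above. It is a simplification: a real XML signature digests a canonicalized form of the signed element with the algorithm named in the signature, not the raw file bytes. Canonicalization does keep whitespace inside element content, though, so added line breaks and tabs change the digest in the same way. The metadata snippet and entity ID below are made up.

```python
import hashlib


def digest(document: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes given."""
    return hashlib.sha256(document).hexdigest()


# Hypothetical, heavily shortened metadata; real documents are much larger.
original = (
    b'<EntityDescriptor entityID="https://sp.example.com">'
    b'<SPSSODescriptor/></EntityDescriptor>'
)

# The same elements after "pretty-printing" with line breaks and tabs.
reformatted = (
    b'<EntityDescriptor entityID="https://sp.example.com">\n'
    b'\t<SPSSODescriptor/>\n'
    b'</EntityDescriptor>'
)

original_digest = digest(original)
reformatted_digest = digest(reformatted)
digests_match = original_digest == reformatted_digest
print("digests match:", digests_match)
```

Since the signer computed its digest over the unformatted content, any verifier that recomputes it over the reformatted content will see a mismatch.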
After successfully importing a raw metadata file and adding a suitable Claim Issuance Policy, I finally got it working.

Failure when trying to delete service key of s3 storage service

Deleting a service key fails with internal server error.
The following command produces the error:
cf dsk -f storage storage-keys
Deleting key storage-keys for service instance storage as user...
FAILED
Server error, status code: 502, error code: 10001, message: Service broker failed to delete service binding for instance storage: Service broker error: Internal Server Error
The error seems to be specific to this service key. It is a service key of an S3 storage service. I tried for several days, always with the same error. Within the same time span, deleting other service keys (of MariaDB services) worked as expected.
This is a Swisscom-specific problem that we have experienced as well. The only way to resolve the error was to send a direct support request via this form:
https://developer.swisscom.com/support
They will then proceed to manually delete the binding and ideally fix the related server error.

WSO2 API MANAGER clustering Worker-Manager

This is regarding a WSO2 API Manager worker cluster configuration with an external Postgres database. I have used two databases: wso2_carbon for the registry and user management, and wso2_am for storing APIs. The respective XML files have been configured, and the Postgres scripts have been run to create the database tables. When wso2server.sh is run, the console log shows that clustering is enabled and lists the members of the domain. However, when I try to create APIs on https://:, it throws an error in the design phase itself.
ERROR - add:jag org.wso2.carbon.apimgt.api.APIManagementException: Error while checking whether context exists
[2016-12-13 04:32:37,737] ERROR - ApiMgtDAO Error while locating API: admin-hello-v.1.2.3 from the database
java.sql.SQLException: org.postgres.Driver cannot be found by jdbc-pool_7.0.34.wso2v2
As per the error message, the driver class name you have given is org.postgres.Driver, which is not correct. It should be org.postgresql.Driver. Double-check the master-datasources.xml config.
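A quick way to catch this kind of typo is to scan the datasource file for Postgres URLs whose driver class is not org.postgresql.Driver. The Python sketch below assumes the usual layout in which each configuration element contains url and driverClassName children; the sample XML, the datasource name, and the helper find_driver_mismatches are illustrative and not taken from the original setup.

```python
import xml.etree.ElementTree as ET

# JDBC URL prefix -> driver class expected for it.
EXPECTED_DRIVERS = {"jdbc:postgresql:": "org.postgresql.Driver"}


def find_driver_mismatches(xml_text: str) -> list:
    """Return (url, configured_driver, expected_driver) for every mismatch found."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    for config in root.iter("configuration"):
        url = (config.findtext("url") or "").strip()
        driver = (config.findtext("driverClassName") or "").strip()
        for prefix, expected in EXPECTED_DRIVERS.items():
            if url.startswith(prefix) and driver != expected:
                problems.append((url, driver, expected))
    return problems


# Hypothetical excerpt reproducing the mistake from the error message.
sample = """
<datasources-configuration>
  <datasources>
    <datasource>
      <name>WSO2AM_DB</name>
      <definition type="RDBMS">
        <configuration>
          <url>jdbc:postgresql://localhost:5432/wso2_am</url>
          <driverClassName>org.postgres.Driver</driverClassName>
        </configuration>
      </definition>
    </datasource>
  </datasources>
</datasources-configuration>
"""

mismatches = find_driver_mismatches(sample)
for url, driver, expected in mismatches:
    print(f"{url}: driver '{driver}' should be '{expected}'")
```

To check a real file, read its contents and pass them to find_driver_mismatches in place of the sample string.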

ServiceProxy throws ProtocolException, communication is not restored on retrying

We are seeing ProtocolExceptions while communicating with a service running in the cluster. The message and InnerException message:
System.ServiceModel.ProtocolException: You have tried to create a channel to a service that does not support .Net Framing.
---> System.IO.InvalidDataException: Expected record type 'PreambleAck', found '145'.
This service is running on a local dev cluster, and the exception is thrown after communicating successfully with the service.
The code that we use for communicating is:
var eventHandlerServiceClient = ServiceProxy.Create<IEventHandlerService>(eventHandlerTypeName, new Uri(ServiceFabricSettings.EventHandlerServiceName));
return await eventHandlerServiceClient.GetQueueLength();
We have retry logic (with increasing delays between the attempts), but this call never succeeds. So it looks like the service is in a faulted state and cannot recover from it.
Update
We are also seeing the following errors in the logs:
connection 0x1B6F9EB0 localhost:64002-[::1]:50376 target 0x1B64F3C0: invalid frame: length=0x1000100,type=514,header=28278,check=0x742E7465
Update 14-12-2015
If this ProtocolException is thrown, retries don't help. Even after hours of waiting, it still fails.
We log the endpoint address with
var spr = ServicePartitionResolver.GetDefault();
var x = await spr.ResolveAsync(
    new Uri(ServiceFabricSettings.EventHandlerServiceName),
    eventHandlerTypeName,
    new CancellationToken());
var endpointAddress = x.GetEndpoint().Address;
The resolved endpoint looks like
{"Endpoints":{"":"net.tcp:\/\/localhost:57999\/d6782e21-87c0-40d1-a505-ec6f64d586db\/a00e6931-aee6-4c6d-868a-f8003864a216-130945476153695343"}}
This endpoint is the same as reported by the Service Fabric Explorer.
From our logs, it seems that this service is working (it is reachable via another API method), but this specific call never succeeds.
This typically indicates a mismatched communication stack on the service and client sides. Once the service is up and running, check the endpoint of the service replica via Service Fabric Explorer. If that seems fine, check that the client is connecting to the right service. Resolve the partition using the ServicePartitionResolver (https://msdn.microsoft.com/en-us/library/azure/microsoft.servicefabric.services.servicepartitionresolver.aspx), passing the same arguments that you pass to ServiceProxy.
I'm seeing the same sort of errors. Looking at my code, I'm caching an ActorProxy. I'm going to change that and remove the caching, in case the cache is referencing an old instance of the service.
That appears to have fixed my issues. I'm guessing that the proxy caches the reference once it has been used, and if the service changes, that reference is out of date.
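The idea of not reusing a cached proxy across retries can be sketched independently of Service Fabric. The Python below is a generic pattern, not the Service Fabric API: create_proxy stands in for whatever factory builds the proxy, and StaleProxy, FreshProxy, and get_queue_length are made-up stand-ins used only to exercise the helper.

```python
import time


def call_with_fresh_proxy(create_proxy, operation, attempts=3, base_delay=0.0):
    """Run operation(proxy), building a new proxy for every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        proxy = create_proxy()  # never reuse a proxy from a failed attempt
        try:
            return operation(proxy)
        except Exception as error:
            last_error = error
            time.sleep(base_delay * (attempt + 1))  # increasing delay between attempts
    raise last_error


class StaleProxy:
    """Stand-in for a proxy that still points at an old service instance."""

    def get_queue_length(self):
        raise ConnectionError("stale endpoint")


class FreshProxy:
    """Stand-in for a proxy resolved against the current service instance."""

    def get_queue_length(self):
        return 7


# The first attempt gets a stale proxy, the second a freshly created one.
proxies = iter([StaleProxy(), FreshProxy()])
queue_length = call_with_fresh_proxy(
    lambda: next(proxies),
    lambda proxy: proxy.get_queue_length(),
)
print(queue_length)
```

Whether recreating the proxy is enough in a given cluster is an open question here; the sketch only shows how to avoid holding on to one instance across failures.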