OrientDB in distributed mode keeps getting ConcurrentModificationException

I'm using OrientDB Community Edition 2.2.9 with pyorient's binary serialiser (from the development branch).
I have 3 nodes running on AWS.
I'm only using 1 node as a master for writing/reading and the other nodes for reading and replication.
I have configured the nodes with the following:
<properties>
<entry value="2147483647" name="ridBag.embeddedToSbtreeBonsaiThreshold"/>
<entry value="-1" name="index.embeddedToSbtreeBonsaiThreshold"/>
</properties>
I'm not using Java, so the MVCC examples in the documentation are not really helping me.
I'm also not using transactions and I start the server with the following parameters:
java -Dcache.level1.enabled=false -Ddb.mvcc=false
I read in the docs that you can't disable MVCC any more, so I assume that setting is useless.
I use RabbitMQ and Celery to queue tasks. When running OrientDB in normal (non-distributed) mode and I get a "ConcurrentModificationException", Celery simply retries the task and it usually succeeds. When running in distributed mode, the task keeps failing because the vertex versions never seem to match up.
It doesn't matter what I do, I keep getting "ConcurrentModificationException".
I can see in the config that mvcc is still enabled.
I thought I had tried everything the documentation suggested, but all the nuances of running in distributed mode are scattered across the documentation rather than collected in one place. It's so easy to miss something :(
How can I avoid this issue?

Setting "writeQuorum" to 1 fixed this issue for me. Also, running "executionMode" as "asynchronous" also created problems.
Also, removing the following from my config also may have helped with this issue:
....
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
<parameters>
....
<parameter value="com.orientechnologies.orient.server.distributed.conflict.ODefaultReplicationConflictResolver" name="conflict.resolver.impl"/>
....
</parameters>
</handler>
...
Here is the default-distributed-db-config.json that got things working for me.
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "synchronous",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
I hope it helps someone.

Related

Service Fabric - 'System.Replicator' reported Warning for property 'RemoteReplicatorConnectionStatus'. Replica xyz cannot be reached

The error is
'System.Replicator' reported Warning for property 'RemoteReplicatorConnectionStatus'.
Replica 132295460844367404 cannot be reached to start the copy process. Error Code: CannotConnect, Target listen address: localhost:62352/5298ce62-a8b6-4c10-944c-ce861fb5abd9-132295460844367404;70bcec58-3f57-4a23-b787-7353d53e631d:fdd277399fb82af80e7f8a0f097d244d. Verify that ReplicatorAddress config is valid.
There are 3 replicas, and 2 of them are stuck InBuild. The error is reported as coming from the Primary replica, and the replicaId it complains about is that of one of the secondary replicas stuck InBuild.
Everything I find on this error relates to standalone clusters, but my cluster is Azure-generated. What are some causes of this error? It only happens for my stateful service when I deploy multiple replicas.
In the Primary replica events it shows the following error for each of the other 2 replicas:
"Description": "The api IReplicator.BuildReplica(132295460844367404) on node _default_4 is stuck. Start Time (UTC): 2020-03-24 17:55:24.215.",
If I set the replica count to 1 the error doesn't appear, until I try to upgrade the application, at which point it creates an idle replica for the swap, gets stuck on this error, and the upgrade hangs indefinitely.
The same application can be deployed to my local 5 node cluster with no errors.
I started commenting out code to see if I could get to a state where it was working, and eventually narrowed it down to the way I was overriding the replicator settings.
I was doing this:
public MyStateFulService(StatefulServiceContext context)
    : base(context, new ReliableStateManager(context, new ReliableStateManagerConfiguration(new ReliableStateManagerReplicatorSettings
    {
        MaxReplicationMessageSize = 1073741824
    })))
{ }
and changing it to this:
<Section Name="ReplicatorConfig">
  <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
  <Parameter Name="MaxReplicationMessageSize" Value="524288000" />
  <Parameter Name="MinLogSizeInMB" Value="4096" />
</Section>
resolved the issue. I assume I was overriding the default replicator endpoint by creating a new ReliableStateManagerReplicatorSettings object.
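For reference, here is a minimal sketch (using the same class name as the snippet above) of what the constructor looks like once the code-based override is removed; as far as I understand the defaults, the ReliableStateManager created by the base constructor then picks up the "ReplicatorConfig" section from the service's Settings.xml:
public MyStateFulService(StatefulServiceContext context)
    : base(context) // no explicit ReliableStateManagerReplicatorSettings; replicator settings come from the "ReplicatorConfig" section
{
}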

JPAM Configuration for Apache Drill

I'm trying to configure PLAIN authentication based on JPAM 1.1 and am going crazy because it still doesn't work after checking my syntax and settings countless times. When I start Drill with cluster-id and zk-connect only, it works, but with both options for PLAIN authentication it fails. Since I started with pam4j and only tried JPAM later on, I've kept JPAM for this post; in general I don't have any preference, I just want to get it done. I'm running Drill on CentOS in embedded mode.
I've done everything required according to the official documentation:
I downloaded JPAM 1.1, uncompressed it and put libjpam.so into a specific folder (/opt/pamfile/)
I've edited drill-env.sh with:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
I edited drill-override.conf with:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "local",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  },
  security: {
    auth.mechanisms: ["PLAIN"],
  },
  security.user.auth: {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam",
    pam_profiles: [ "sudo", "login" ]
  }
}
It throws the subsequent error:
Error: Failure in starting embedded Drillbit: org.apache.drill.exec.exception.DrillbitStartupException: Problem in finding the native library of JPAM (Pluggable Authenticator Module API). Make sure to set Drillbit JVM option 'java.library.path' to point to the directory where the native JPAM exists.:no jpam in java.library.path (state=,code=0)
I've run that *.sh file by hand to make sure that the necessary path is exported, since I don't know whether Drill expects that. The path to libjpam should be known now. I've started SQLLine with sudo et cetera. No luck. The documentation doesn't help; I don't get why it's so bad and, in my opinion, incomplete. Sadly there is zero explanation of how to troubleshoot or configure basic user authentication in detail.
Or do I have to do something that isn't stated but is expected? Are there any prerequisites for PLAIN authentication that aren't mentioned by Apache Drill itself?
Try changing:
export DRILLBIT_JAVA_OPTS="-Djava.library.path=/opt/pamfile/"
to:
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Djava.library.path=/opt/pamfile/"
It works for me.

Write logs to Application Insights from local Service Fabric

I am trying to integrate the Azure Application Insights service into a Service Fabric app for logging and instrumentation. I am running the Fabric code on my local VM. I followed the document here exactly [scenario 2]. Other resources on learn.microsoft.com also seem to indicate the same steps [e.g. https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-aggregation-eventflow].
For some reason, I don't see any event entries in Application Insights. There are no errors in the code when I do this:
ServiceEventSource.Current.ProcessedCountMetric("synced",sw.ElapsedMilliseconds, crc.DataTable.Rows.Count);
eventflowconfig.json contents
{
  "inputs": [
    {
      "type": "EventSource",
      "sources": [
        { "providerName": "Microsoft-ServiceFabric-Services" },
        { "providerName": "Microsoft-ServiceFabric-Actors" },
        { "providerName": "mystatefulservice" }
      ]
    }
  ],
  "filters": [
    {
      "type": "drop",
      "include": "Level == Verbose"
    }
  ],
  "outputs": [
    {
      "type": "ApplicationInsights",
      // (replace the following value with your AI resource's instrumentation key)
      "instrumentationKey": "XXXXXXXXXXXXXXXXXXXXXX",
      "filters": [
        {
          "type": "metadata",
          "metadata": "metric",
          "include": "ProviderName == mystatefulservice && EventName == ProcessedCountMetric",
          "operationProperty": "operation",
          "elapsedMilliSecondsProperty": "elapsedMilliSeconds",
          "recordCountProperty": "recordCount"
        }
      ]
    }
  ],
  "schemaVersion": "2016-08-11"
}
In ServiceEventSource.cs
[Event(ProcessedCountMetricEventId, Level = EventLevel.Informational)]
public void ProcessedCountMetric(string operation, long elapsedMilliSeconds, int recordCount)
{
    if (IsEnabled())
        WriteEvent(ProcessedCountMetricEventId, operation, elapsedMilliSeconds, recordCount);
}
EDIT
Adding the diagnostics pipeline code from Program.cs in the Fabric stateful service:
using (var diagnosticsPipeline =
    ServiceFabricDiagnosticPipelineFactory.CreatePipeline($"{ServiceFabricGlobalConstants.AppName}-mystatefulservice-DiagnosticsPipeline"))
{
    ServiceRuntime.RegisterServiceAsync("mystatefulserviceType",
        context => new mystatefulservice(context)).GetAwaiter().GetResult();

    ServiceEventSource.Current.ServiceTypeRegistered(Process.GetCurrentProcess().Id,
        typeof(mystatefulservice).Name);

    // Prevents this host process from terminating so services keep running.
    Thread.Sleep(Timeout.Infinite);
}
EventSource is a tricky technology; I have been working with it for a while and always have problems. The configuration looks good, and it is very hard to investigate without access to the environment, so I will make some suggestions.
There are a few catches you must be aware of:
If you are listening to ETW events from a different process, your process must run under a user that is a member of the 'Performance Log Users' group. Check which identity your service runs as and whether it is part of Performance Log Users, which has permission to create event sessions to listen for these events.
Ensure the events are being emitted correctly and that you can see them in the Diagnostic Events window; if they don't show up there, the problem is in the provider.
For testing purposes, comment out the if (IsEnabled()) check (a minimal sketch follows this list). It is an internal check that decides whether your events should be emitted. I have had situations where it was always false and skipped emitting events; it probably caches the result for a while, and the docs are not clear on how it should work.
Whenever possible, use the EventSource from the NuGet package instead of the framework one; the framework version is full of bugs and lacks fixes found in the NuGet version.
Application Insights is not real-time; it can sometimes take a few minutes to process your events. I would recommend outputting the events to a console or file first to check that the pipeline is listening correctly, and only afterwards enable the Application Insights output.
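To illustrate the IsEnabled() point, here is a minimal, test-only sketch of the event method from the ServiceEventSource.cs shown in the question (same event id and parameters); it simply bypasses the guard so you can confirm whether that check is what's suppressing the events:
[Event(ProcessedCountMetricEventId, Level = EventLevel.Informational)]
public void ProcessedCountMetric(string operation, long elapsedMilliSeconds, int recordCount)
{
    // Temporarily bypass the IsEnabled() check while diagnosing; restore it once events show up.
    WriteEvent(ProcessedCountMetricEventId, operation, elapsedMilliSeconds, recordCount);
}
Once the events appear in the Diagnostic Events window or in Application Insights, put the guard back, since skipping it writes events even when no listener is attached.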
The link you provided is quite outdated, and there's actually a much better way to log application error and exception info to Application Insights. For example, the above won't help with tracking the call hierarchy of an incoming request across multiple services.
Have a look at the Microsoft App Insights Service Fabric NuGet packages. They work great for:
Sending error and exception info
Populating the application map with all your services and their dependencies (including database)
Reporting on app performance metrics
Tracing service call dependencies end-to-end
Integrating with native as well as non-native SF applications

Logstash-Forwarder 3.1 state file .logstash-forwarder not updating

I am having an issue with Logstash-forwarder 3.1.1 on CentOS 6.5 where the state file /.logstash-forwarder is not updating as information is sent to Logstash.
I have found that as activity is logged by logstash-forwarder, the corresponding offset is not recorded in the /.logstash-forwarder 'logrotate' file. The /.logstash-forwarder file is being recreated each time 100 events are recorded, but it is not updated with new data. I know the file is being recreated because I changed its permissions to test, and the permissions are reset each time.
Below are my configurations (with some actual data scrubbed, shown between asterisks):
Logstash-forwarder 3.1.1
CentOS 6.5
/etc/logstash-forwarder
Note that the "paths" key does contain wildcards
{
  "network": {
    "servers": [ "*server*:*port*" ],
    "timeout": 15,
    "ssl ca": "/*path*/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [
        "/a/b/tomcat-*-*/logs/catalina.out"
      ],
      "fields": { "type": "apache", "time_zone": "EST" }
    }
  ]
}
Per the Logstash instructions for CentOS 6.5, I have configured the LOGSTASH_FORWARDER_OPTIONS value so it looks like the following:
LOGSTASH_FORWARDER_OPTIONS="-config /etc/logstash-forwarder -spool-size 100"
Below is the resting state of the /.logstash-forwarder logrotate file:
{"/a/b/tomcat-set-1/logs/catalina.out":{"source":"/a/b/tomcat-set-1/logs/catalina.out","offset":433564,"inode":*number1*,"device":*number2*},"/a/b/tomcat-set-2/logs/catalina.out":{"source":"/a/b/tomcat-set-2/logs/catalina.out","offset":18782151,"inode":*number3*,"device":*number4*}}
There are two sets of logs being captured here. The offset has stayed the same for 20 minutes while activity has occurred and been sent over to Logstash.
Can anyone give me any advice on how to fix this problem whether it be a configuration setting I missed or a bug?
Thank you!
After more research I found it was announced that Filebeat is now the preferred forwarder of choice. I even found a post by the maintainer of Logstash-Forwarder saying that the program is full of bugs and is no longer fully supported.
I have instead moved to CentOS 7 using the latest version of the ELK stack, with Filebeat as the forwarder. Things are going much more smoothly now!

Is it possible to have multiple Task tags in <Startup>, or do I have to merge these cmd files into one?

I am new to Azure development and to writing PowerShell scripts.
I want to run two cmd files as Azure startup tasks. I added these files to the solution and set their properties to "Copy Always". Then I added a new node to ServiceDefinition.csdef. Here it is:
<Startup>
  <Task commandLine="Startup\startupcmd.cmd > c:\logs\startuptasks.log" executionContext="elevated" taskType="background">
    <Environment>
      <Variable name="EMULATED">
        <RoleInstanceValue xpath="/RoleEnvironment/Deployment/#emulated" />
      </Variable>
    </Environment>
  </Task>
  <Task commandLine="Startup\disableTimeout.cmd" executionContext="elevated" />
</Startup>
It's not deploying, and I'm getting this error: Instance 0 of role Web is busy
Now to my question: is it possible to have multiple Task tags in <Startup>, or do I have to merge these cmd files into one?
As per definition:
The Startup element describes a collection of tasks that run when the role is started.
So the answer to your concrete question is: yes, you can define multiple startup tasks.
The Busy state is almost fine, in the sense that it is a bit better than cycling! What I would suggest is to enable Remote Desktop and connect to the instance to see what is going on with the startup tasks. Busy is reported until all simple tasks have completed and returned a 0 exit code. Your task may be failing or hanging for a while, and that's why you see Busy.