R2DBC MSSQL r2dbc.mssql.client.ReactorNettyClient : Connection has been closed by peer - reactive-programming

I have started to work on a Spring WebFlux and R2DBC project. Mainly, my code works fine.
But after some elements I am receiving this warning
r2dbc.mssql.client.ReactorNettyClient : Connection has been closed by peer
after this warning I am getting this exception and normally program stops to read from Flux which source is R2DBC driver.
ReactorNettyClient$MssqlConnectionClosedException: Connection unexpectedly closed
My main pipeline like this;
Sinks.Empty<Void> completionSink = Sinks.empty();
Flux<Event> events = service.getPairs(
taskProperties.A,
taskProperties.B);
events
.flatMap(some operation)
.doOnComplete(() -> {
log.info("Finished Job");
completionSink.emitEmpty(Sinks.EmitFailureHandler.FAIL_FAST);
})
.subscribe();
completionSink.asMono().block();
After run, flatMap requesting 256 element as a default, then after fetching trying to request(1) for next signal.
At somewhere between 280. and 320. element it is getting above error. It is not idempotent, sometimes it reads 280 element sometimes it is reading 303, 315 etc.
I think it is about network maybe? But not sure and cannot find the reason. Do I need a pool or something different?
Sorry if I missed anything, in case you want I will try to update here.
Thank you in advance
I have tried to change request size of flatMap to unbounded, adding scheduler, default r2dbc pool but for now I don't have any clue.

Related

Firestore "Stream Removed" Error After 3 Minutes Doing Nothing

I'm working on .NET Windows Form App which uses Google Cloud Firestore as Database. I've created functions (using Google.Cloud.Firestore NuGet Package functions) to read/write database documents. Everything working greatly but if app doesn't use any of this read/write functions more than 2-3 minutes, i'm getting this error: Grpc.Core.RpcException: 'Status(StatusCode="Unknown", Detail="Stream removed" But if uses read/write functions every 1-2 minutes, i do not get this error in short period. I can create a thread function to keep my database connection active but it causes unnecessary reads or writes. How can i solve it?
To Reproduce Error
string Path = AppDomain.CurrentDomain.BaseDirectory + #"AdminSDKName.json";
Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", Path);
FirestoreDb DataBase = FirestoreDb.Create("DatabaseID");
Query QRef = DataBase.Collection("CollectionID").Document("DocID").Collection("CollectionID").WhereEqualTo("isTrue", false);
QuerySnapshot snap = await QRef.GetSnapshotAsync();
Console.WriteLine(snap.Count.ToString());
Console.WriteLine("Waiting for 5 minutes..");
Task.Delay(300000).Wait();
Console.WriteLine("Waited for 5 minutes");
snap = await QRef.GetSnapshotAsync();
Console.WriteLine(snap.Count.ToString());
Console.WriteLine("Done without any error.");
I get error after "Waited for 5 minutes" line.
Update
If i connect my computer network to mobile phone network i do not get any error.
I have a feeling you're seeing a part of Firestore's connection management here. If that's indeed what we're seeing, the connection should be reestablished when needed and there's nothing you can change about this through the API.
A solution would be to find out what is closing the stream prematurely. The configuration of Grpc.Core to send a keepalive packet once a minute can be found in the Github.
I suggest you try the code on a few different networks if you can, keeping everything else the same, to see if you can identify what is closing the connection.
Please refer to the User guide.
If this does not resolve, update your question with minimal reproducible code.

How to debug further this dropped record in apache beam?

I am seeing intermittent dropped records(only for error messages though not for success ones). We have a test case that intermittenly fails/passes because of a lost record. We are using "org.apache.beam.sdk.testing.TestPipeline.java" in the test case. This is the relevant setup code where I have tracked the dropped record too ....
PCollectionTuple processed = records
.apply("Process RosterRecord", ParDo.of(new ProcessRosterRecordFn(factory))
.withOutputTags(TupleTags.OUTPUT_INTEGER, TupleTagList.of(TupleTags.FAILURE))
);
errors = errors.and(processed.get(TupleTags.FAILURE));
PCollection<OrderlyBeamDto<Integer>> validCounts = processed.get(TupleTags.OUTPUT_INTEGER);
PCollection<OrderlyBeamDto<Integer>> errorCounts = errors
.apply("Flatten Roster File Error Count", Flatten.pCollections())
.apply("Publish Errors", ParDo.of(new ErrorPublisherFn(factory)));
The relevant code in ProcessRosterRecordFn.java is this
if(dto.hasValidationErrors()) {
RosterIngestError error = new RosterIngestError(record.getRowNumber(), record.toTitleValue());
error.getValidationErrors().addAll(dto.getValidationErrors());
error.getOldValidationErrors().addAll(dto.getOldValidationErrors());
log.info("Tagging record row number="+record.getRowNumber());
c.output(TupleTags.FAILURE, new OrderlyBeamDto<>(error));
return;
}
I see this log for the lost record of Tagging record row for 2 rows that fail. After that however, inside the first line of ErrorPublisherFn.java, we log immediately after receiving each message. We only receive 1 of the 2 rows SOMETIMES. When we receive both, the test passes. The test is very flaky in this regard.
Apache Beam is really annoying in it's naming of threads(they are all the same name), so I added a logback thread hashcode to get more insight and I don't see any and the ErrorPublisherFn could publish #4 on any thread anyways.
Ok, so now the big question: How to insert more things to figure out why this is being dropped INTERMITTENTLY?
Do I have to debug apache beam itself? Can I insert other functions or make changes to figure out why this error is 'sometimes' lost on some test runs and not others?
EDIT: Thankfully, this set of tests are not testing errors upstream and this line "errors = errors.and(processed.get(TupleTags.FAILURE));" can be removed which forces me to remove ".apply("Flatten Roster File Error Count", Flatten.pCollections())" and in removing those 2 lines, the issue goes away for 10 test runs in a row(ie. can't completely say it is gone with this flaky stuff going on). Are we doing something wrong in the join and flattening? I checked the Error structure and rowNumber is a part of equals and hashCode so there should be no duplicates and I am not sure why it would be intermittently failure if there are duplicate objects either.
What more can be done to debug here and figure out why this join is not working in the TestPipeline?
How to get insight into the flatten and join so I can debug why we are losing an event and why it is only 'sometimes' we lose the event?
Is this a windowing issue? even though our job started with a file to read in and we want to process that file. We wanted a constant dataflow stream available as google kept running into limits but perhaps this was the wrong decision?

UndeliverableException thrown within a RxAndroidBle stream

I have a misbehaving BLE device (temp sensor) that keeps throwing a status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT) exception everytime i try to connect to the device. I'm not concerned about this exception as the device is faulty.
However, I keep getting notified that i've not handled the error correctly by rxjava2 when using RxAndroidBle(1.9.1); see here;
This is my code.
rxBleClient
.getBleDevice(macAddress)
.establishConnection(false)
.flatMapSingle { it.readRssi() }
.subscribe({ "test1:Success" }, { "test1:error" })
and the Error
I/RxBle#GattCallback: MAC='E9:CF:8A:D0:01:19' onConnectionStateChange(), status=8, value=0
D/RxBle#ClientOperationQueue: FINISHED ConnectOperation(147547253) in 10257 ms
D/RxBle#ConnectionOperationQueue: Connection operations queue to be terminated (MAC='E9:CF:8A:D0:01:19')
com.polidea.rxandroidble2.exceptions.BleDisconnectedException: Disconnected from MAC='E9:CF:8A:D0:01:19' with status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT)
at com.polidea.rxandroidble2.internal.connection.RxBleGattCallback$2.onConnectionStateChange(RxBleGattCallback.java:77)
at android.bluetooth.BluetoothGatt$1$4.run(BluetoothGatt.java:249)
at android.bluetooth.BluetoothGatt.runOrQueueCallback(BluetoothGatt.java:725)
at android.bluetooth.BluetoothGatt.-wrap0(Unknown Source:0)
at android.bluetooth.BluetoothGatt$1.onClientConnectionState(BluetoothGatt.java:244)
at android.bluetooth.IBluetoothGattCallback$Stub.onTransact(IBluetoothGattCallback.java:70)
at android.os.Binder.execTransact(Binder.java:697)
D/BleDeviceManagerNew$observeRssiTest: test1:error
E/plication$setupApp: Terminal Exception From RXJAVA was Not handled correctly
io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.polidea.rxandroidble2.exceptions.BleDisconnectedException: Disconnected from MAC='E9:CF:8A:D0:01:19' with status 8 (GATT_INSUF_AUTHORIZATION or GATT_CONN_TIMEOUT)
at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
at io.reactivex.internal.operators.observable.ObservableUnsubscribeOn$UnsubscribeObserver.onError(ObservableUnsubscribeOn.java:67)
at io.reactivex.internal.operators.observable.ObservableSubscribeOn$SubscribeOnObserver.onError(ObservableSubscribeOn.java:63)
I'm not sure what else I should do - i've implemented a 'catch all' solution but don't like this approach;
RxJavaPlugins.setErrorHandler { e -> Timber.e(e, "Terminal Exception From RXJAVA was Not handled correctly") }
but don't see that as a good solution as expected that i should be-able to handle exception on the steam. Any suggestions of where I went wrong?
Your code is fine. The library has a flaw that does not allow to achieve your desired behaviour. More on the topic is on this library's wiki page.
While it is possible to design an API that would not throw UndeliverableException it would need to have a separate error Observable or Completable for BluetoothAdapter turning off and a separate one for RxBleConnection disconnect. The user would be responsible to mix those into their chain appropriately.
Current API does not allow it.

Siemens S7-1200. TRCV_С. Error code: 893A; Event ID 02:253A

Please, help to solve the problem with communication establishment between PC and 1211C (6ES7-211-1BD30-0XB0 Firmware: V 2.0.2). I feel that I've made a stupid mistake somewhere, but can't figure out where exactly it is.
So, I'm using function TRCV_С...
The configuration seems to be okay:
When i set the CONT=1, the connection establishes without any problems...
But, when i set EN_R=1, I'm getting "error 893A".
That's what I have in my diagnostic buffer: (DB9 - is a block where the received data is supposed to be written)
There is an explanation given for "893A" in the manuals: Parameter contains the number of a DB that is not loaded. In diag. buffer its also written that DB9 is not loaded. But in my case it is loaded! So what should I do in this case?
it seems that DB were created or edited manually due to which they are miss aligned with FB instances try removing and DB and FB instances and then add again instances of FBs with automatically created DBs and do a offline dowonload

Debug missing messages in akka

I have the following architecture at the moment:
Load(Play app with basic interface for load tests) -> Gateway(Spray application with REST interface for incoming messages) -> Processor(akka app that works with MongoDB) -> MongoDB
Everything works fine as long as number of messages I am pushing through is low. However when I try to push 10000 events, that will eventully end up at MongoDB as documents, it stops at random places, for example on 742 message or 982 message and does nothing after.
What would be the best way to debug such situations? On the load side I am just pushing hard into the REST service:
for (i ← 0 until users) workerRouter ! Load(server, i)
and then in the workerRouter
WS.url(server + "/track/message").post(Json.toJson(newUser)).map { response =>
println(response.body)
true
}
On the spray side:
pathPrefix("track") {
path("message") {
post {
entity(as[TrackObj]) { msg =>
processors ! msg
complete("")
}
}
}
}
On the processor side it's just basically an insert into a collection. Any suggestions on where to start from?
Update:
I tried to move the logic of creating messages to the Gatewat, did a cycle of 1 to 10000 and it works just fine. However if spray and play are involed in a pipeline it interrupts and random places. Any suggestions on how to debug in this case?
In a distributed and parallel environment it is next to impossible to create a system that work reliably. Whatever debugging method you use it will only allow you to find a few bugs that will happen during the debug session.
Once our team spent 3 months(!) while tuning an application for a robust 24/7 working. And still there were bugs. Then we applied a method of Model checking (Spin). Within a couple of weeks we implemented a model that allowed us to get a robust application. However, model checking requires a bit different way of thinking and it can be difficult to start.
I moved the load test app to Spray framework and now it works like a charm. So I suppose the problem was somewhere in the way that I used WS API in Play framework:
WS.url(server + "/track/message").post(Json.toJson(newUser)).map { response =>
println(response.body)
true
}
The problem is resovled but not solved, won't work on a solution based on Play.