How to push tables through sockets - kdb

Considering the setup of kdb+ tick, how do the tables get pushed through the sockets?
In tick, it's possible for a process (let's say A) to subscribe to the tickerplant, which will then push the data for the subscribed 'tickers' to A as new data arrives.
I would like to do the same, but I was wondering how. As far as I know, inter-process communication between q processes is just the ability to send commands from one process to another, so that the commands are executed on the other side.
So how is it then possible to transport a complete table between processes?
I know the functions which do this in tick are .u.pub and .u.sub, but it's not clear to me how the tables are transported between the processes.
So I have two questions:
How does kdb+ tick do this?
How can I push a table from one process to the other in general?

Let's understand the simple process of doing this:
We have one server 'S' and one client 'C'. When 'C' calls the .u.sub function, that function connects to 'S' using its host and port and calls a specific function on 'S' (let's say 'request') with the subscription parameters.
On receiving this request, the 'request' function on 'S' adds the following entries to the subscription table it maintains for subscription requests:
-> Host and port of the client (from the incoming request)
-> Subscription params (for example, the client sends sym `VOD.L for subscription)
Now when 'S' gets a data update from the feed, it goes through its subscription table and checks for entries whose subscription param column value (sym in our case) matches the incoming data. It then makes a connection to each of them using the host and port from the table and calls their 'upd' function with the new data.
The only requirement is that the client has an 'upd' function defined on its side.
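Concretely, the arguments of an IPC message can be any q object, including a table, so "pushing a table" is just an asynchronous remote call whose argument happens to be a table. A minimal sketch (the port number and the trade schema here are made up for illustration):

/ --- client, started with: q -p 5001 ---
trade:([] sym:`$(); price:`float$())
upd:{[t;x] t insert x}              / t: table name (symbol), x: table of new rows

/ --- server ---
h:hopen `::5001                     / open a handle to the client
data:([] sym:`VOD.L`VOD.L; price:98.5 98.6)
neg[h] (`upd; `trade; data)         / async: evaluates upd[`trade;data] on the client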
This is a very basic process. kdb+ tick does this with extra optimizations and features, for example: a more optimized structure for maintaining the subscription table, log maintenance, log replay, unsubscription, recovery logic, a timer for publishing, and a lot more.
For more details, you can check the definitions of the functions in the 'u' namespace.
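To make the mechanism concrete, here is a stripped-down publisher along those lines, with one refinement: instead of reconnecting to each client by host and port, it stores the open handle of the calling client (.z.w), which is also what the real u.q does. The names subs, request and pubdata, and the trade schema, are just for illustration:

/ --- publisher, started with: q -p 5000 ---
subs:([] handle:`int$(); sym:`$())               / one row per (client handle; sym)

request:{[s] `subs insert (.z.w; s);}            / called remotely by subscribers

.z.pc:{[h] delete from `subs where handle=h;}    / clean up when a client disconnects

/ on each feed update, push the new rows to every interested subscriber
pubdata:{[data]
  {[h;d] neg[h] (`upd; `trade; d)}[;data] each
    distinct exec handle from subs where sym in data`sym;}

A client subscribes with (hopen `::5000)(`request;`VOD.L) and defines upd as above; a real implementation would also filter the pushed rows per subscriber rather than sending the whole update to everyone.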

Related

asynchronous bulk data validations service - GET or POST?

Here is a different scenario for GET-or-POST confusion. I am working on a web application built with a spring-boot microservice architecture, where there is a need to validate and update some bulk data from an Excel sheet.
There can be 500-1000 records in the Excel sheet, with 6 different columns, for bulk processing. Once the UI submits the Excel sheet to the server, the whole process from then on is asynchronous. There are microservice-to-microservice calls for which I am unsure whether to use GET or POST.
Here is the problem: I have 4 microservices (let's say orchestra-service, A-service, B-service and C-service).
OrchestraService creates a DTO list from the Excel sheet which will be used in further calls. Orchestra calls 'A'. 'A' validates the data against the DB, marks success and failure records in the DTO list object, and returns the list back to orchestra. Orchestra then calls 'B', which does a similar job to 'A' and returns the list back to orchestra.
Now orchestra calls 'C', which updates the success records in the database, updates the file status in the database, and also creates a new resulting Excel sheet with error messages per row, which will be emailed to the user later (a small report of sorts).
In the above microservice-to-microservice calls, only C updates the database and creates a resource on the server. For all the above calls I used the POST method, because I need a request body to pass my input list to each service.
According to the HTTP standards, am I doing this right?
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3
Per that section, providing a block of data, such as the fields entered into an HTML form, to a data-handling process should be a POST call.
Please advise me whether:
I should use POST only for 'C' and GET for the others, or
It should be POST for all, since the other services also take part in processing the data.
NOTE: services A, B and C do not all use every column of the Excel sheet, but various combinations of them. One column holds values 18 characters long, so I think the GET URL length limit could be a problem for a bulk operation.
HTTP protocol
There is no actual violation in passing information via GET, and as long as the request doesn't mutate state (identical requests behave the same), it's fine.
Microservice-wise
Now for clarification: are Service A and Service B actually needed? Aren't they part of the same domain as Service C, and couldn't they reside inside it?
It is more than good practice for a microservice to validate its own domain and return a collection of successes and failures with the relevant messages.
I had a similar question a few years back, and here is a possible solution for the first part of your question.
As mentioned by @Oreal Eraki in his answer, I would also question whether you need services A and B. If it's just validation and data transformation, it can be done in the same domain where the data is actually stored.

callback on table update in kdb

I want to implement a client-server mechanism in kdb where clients can register themselves to receive a callback when some table is updated.
I know how callbacks work in kdb, but I was not able to figure out how to bind table updates on the server to a function from which I can invoke the client's callback.
Basically you want to implement a 'publish-subscribe' mechanism. kdb+ already has a script 'u.q' in the tick library which provides exactly that:
https://code.kx.com/q/cookbook/publish-subscribe/
On the server, it maintains a list of clients along with their handles, subscribed tables and callback functions. You will have to change the function on the server which handles data inserts/updates so that it also publishes the data:
q) .u.pub[table name; table data]
This will take care of calling the callback function of each client registered for this table.
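For example, if the server ingests feed data through an upd function, publishing can be hooked in like this (a sketch; it assumes u.q has been loaded, and the trade schema is only illustrative):

\l u.q
trade:([] time:`timespan$(); sym:`$(); price:`float$())
.u.init[]                           / u.q builds its subscription state from the tables

/ ingest the new rows, then publish them to all subscribers of `trade
upd:{[t;x] t insert x; .u.pub[t; x];}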
On the client side, create a connection to the publisher and call the library function for subscription.
q) .u.sub[tablename;list_of_symbols_to_subscribe_to]
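A minimal subscriber might look like this (the publisher's port and the table/sym names are assumptions):

h:hopen `::5000                     / connect to the publisher
upd:{[t;x] show (t; x);}            / callback the publisher invokes on every update
h (".u.sub"; `trade; `VOD.L`BP.L)   / subscribe to `trade for two syms; ` means all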
You can also look into example publisher and subscriber code: https://github.com/KxSystems/cookbook/tree/master/pubsub

How to keep state consistent across distributed systems

When building distributed systems, one must ensure that the client and the server eventually end up with a consistent view of the data they are operating on, i.e. that they never get out of sync. Extra care is needed because the network cannot be considered reliable: in the case of a network failure, the client never knows whether the operation was successful, and may decide to retry the call.
Consider a microservice which exposes a simple CRUD API, and an unbounded set of clients, maintained in-house by the same team, by different teams and also by different companies.
In this example, a client requests the creation of a new entity, which the microservice successfully creates and persists, but the network fails and the client connection times out. The client will most probably retry, unknowingly persisting the same entity a second time. Here is one possible solution to this I came up with:
Use a client-generated identifier to prevent duplicate posts
This could be the primary key as-is, the client half of a client-and-server-generated composite key, or a token issued by the service. The service would either persist the entity, or reply with an OK message if an entity with that identifier is already present.
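The pattern is language-agnostic; here is a sketch in q, to match the rest of this page (the entities table and its schema are invented for illustration):

entities:([id:`guid$()] payload:())         / keyed on the client-generated id

/ idempotent create: persist on first sight, just acknowledge on a retry
create:{[e]
  $[(e`id) in exec id from entities;
    `ok;
    [`entities upsert (e`id; e`payload); `created]]}

Retrying create with the same id returns `ok without writing a second copy.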
But there is more to this: what if the client gives up after the network failure (but the entity got persisted), mutates its internal view of the entity, and later decides to persist it in the service with the same id? At that point, and in general, would it be reasonable for the service to just silently:
Update the existing entity with the state that the client posted
Or should the service answer with some more specific status code about what happened? The point is, the developer of the service cannot really influence the clients' design decisions.
So, what are some sensible practices for keeping state consistent across distributed systems and avoiding the most common pitfalls in the case of network and system failures?
There are some things that you can do to minimize the impact of the client-server out-of-sync situation.
The first measure you can take is to let the client generate the entity IDs, for example by using GUIDs. This prevents the server from generating a new entity every time the client retries a CreateEntityCommand.
In addition, you can make the command handling idempotent. This means that if the server receives a second CreateEntityCommand, it just silently ignores it (i.e. it does not throw an exception). Whether this is possible depends on the use case; some commands cannot be made idempotent (like updateEntity).
Another thing you can do is de-duplicate commands. This means that every command sent to a server must be tagged with a unique ID, which can also be a GUID. When the server receives a command with an ID that it has already processed, it ignores it and gives a positive response (i.e. 200), possibly including some meta-information about the fact that the command was already processed. Command de-duplication can be placed at the top of the stack, as a separate layer, independent of the domain (i.e. in front of the application layer).
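Sketched again in q for consistency; only the wrapper is the point here, and process stands in for a hypothetical domain handler:

processed:([cmd:`guid$()] resp:())          / command id -> cached response

/ run the handler once per command id; replay the cached response on a retry
dedup:{[id;c]
  $[id in exec cmd from processed;
    first exec resp from processed where cmd=id;
    [r:process c; `processed upsert (id; r); r]]}   / process: hypothetical handler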

Model parametrized API Call in Activity Diagram

I have an activity diagram with two swimlanes (Client and Server). I want to model a request call from the Client to the Server.
Is it correct to use signal notation for calls between systems? Are there alternatives?
The call is parametrized: the Client wants to send something which was created beforehand. How do I model this?
Thankful for any hint! Here's my example:
My answer has to be improved, but here is a first step.
The norm/spec says: "A SendSignalAction is an InvocationAction that creates a Signal instance and transmits the instance to the object given on its target InputPin. A SendSignalAction must have argument InputPins corresponding, in order, to each of the (owned and inherited) Properties of the Signal being sent, with the same type, ordering and multiplicity as the corresponding attribute."
And a SendSignalAction has an association to a target object, which is an input pin.
So for your question about Request:item, I would use input pins: one for the object from which the Signal is created and one to define the target (in the schema the target comes from an output pin, but a data store may be used). After sending the request, the client waits for the answer. The AcceptEvent is linked to a trigger (not shown on the schema) which is a signal, the one created by the server. But you cannot link the client's SendRequest to the server's ReceiveRequest, because this is not how it runs.
For the server, you can apply similar reasoning.
Concerning the parametrization of the call, I would use an InputPin to model the arguments of the Call, i.e. the Object sent by the Call, as shown below.
Signal and Call notations seem correct to me, but I am not used to having the send and receive actions in the same diagram, so I will suggest two alternatives.
1) First remove them...
2) Separate Client and Server Modelling
Let me know what you think about that and what seems clear to you...
I also recognize the tool you used so please find my project at:
https://www.dropbox.com/s/s1mx46cb3linop0/Project1.zip?dl=0
As I see it, it should be modeled like this:
The server runs in an independent loop and starts by waiting for a request. There's an object flow between Create request and Query result set. This symbolizes data placed in a queue (or whatever is appropriate). The receipt of the result set would be modeled below in a similar way; I just left that out for brevity.
You can also draw an object for the query set instead of the ActionPins.

OPC UA - Client - Milo - Best Practice - Subscription to Data Change

I started an OPC UA project using the Milo project to create an OPC UA client. I am still very new to OPC UA. Right now I am stuck looking for the best practice for reading values from several nodes after a data change of one specific node.
The information model looks like this:
RfidSensorType
On my server I will have several objects of this RfidSensorType. The client creates a subscription on the CurrentAtTag node to listen for data changes.
My Question:
When the value of CurrentAtTag changes, a callback function is invoked in my client with the UaMonitoredItem and the DataValue of CurrentAtTag.
In my application I also need to process (at the same time) the values of Station, IOLPort and CurrentValue, which change at that moment too.
How can I access those values within the callback for CurrentAtTag?
My only solution so far: use a synchronous read request within that callback.
-> Is that a legitimate approach?
My Research:
1) TriggeringService
I've seen that a TriggeringService exists, where monitored items send reports only if one specific node changes its value.
Problem: this will fire several callbacks, not just one; I need all the information at the same time in order to process it further.
2) Event Monitoring
In event monitoring one can select "event fields" which will be returned with each event notification. I am not sure whether I could select CurrentAtTag, Station, IOLPort and CurrentValue...
Just like you can subscribe to the server's ServerStatus (nodeid "i=2256"), you should be able to subscribe to the nodeid corresponding to 'RfidSensor_Station1'. The server will send a PublishResponse with data of type 'RfidSensorType' encoded as an ExtensionObject. The trick is decoding the ExtensionObject.
As Kevin pointed out, because 'RfidSensor_Station1' is not of node class 'Variable', it doesn't have a value attribute and you cannot monitor the node for data changes. If you are using a PLC, I might combine all properties of the sensor into a string or byte array, then monitor that new variable and parse the string in the client.
Or you could make a ReadRequest as you describe. That will work just fine.