Construct own message - apache-kafka

I have a user table, which obviously is not an outbox table. The table consists of uuid and email fields. I need to construct a message from these two fields and publish it to Kafka in the following format:
{
  "uuid": "11ebe013-ffd2-4523-b6b4-6318cbe7e6d1",
  "email": "jaron24@example.net",
  "staticObject": {
    "staticProperty1": "WEB",
    "staticProperty2": "random.com"
  },
  "staticProperty": "CREATED"
}
I don't want to create a new column in the table to hold a pre-constructed message (so the transforms.outbox.table.field.event.payload parameter is not suitable for this case). I want Kafka Connect to construct the message (with the schema above) for me.
Is it possible to achieve this via configuration?
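For context, this is roughly what the Debezium outbox EventRouter configuration in question looks like; this is a sketch only, with the transform name and column name chosen for illustration. The point is that transforms.outbox.table.field.event.payload points the router at an existing table column that already holds the event body, which is exactly the extra column the question wants to avoid:

transforms=outbox
transforms.outbox.type=io.debezium.transforms.outbox.EventRouter
# Points the router at an existing column holding the pre-built event body
transforms.outbox.table.field.event.payload=payload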

Translating a GUID to a text value, from an API response in a Power Automate Flow

I'm using MS Automate to solve an integration challenge between two systems we use in our Project Management lifecycle. I have a custom connector, written by the vendor of System A, which allows me to create a Flow in MS Automate that is triggered when a record is Created or Updated.
So far, so good. However, the method in the connector provided by System A returns the new or updated record with a number of fields whose values are GUIDs, because the fields are 'choice' type fields, e.g. Department, Status etc. What I end up with is a record where Status = "XXXXXX-000000-00000-00000" etc. The vendor also provides a RESTful API endpoint which I can query; it returns a JSON collection of fields, which includes a 'Choices' section for each field of this type, a standard JSON which looks like:
{
  "Id": "156e6c29-24b3-4413-af91-80a62a04d443",
  "Order": 110,
  "InternalName": "PrjStatus",
  "DisplayName": "Status",
  "ColumnType": 5,
  "ColumnAggregate": 0,
  "Choices": {
    "69659014-be4d-eb11-bf94-00155df8457c": "(0) Not Set",
    "c30c50e5-51cf-ea11-bfd3-00155de84703": "(1) On Track",
    "c40c50e5-51cf-ea11-bfd3-00155de84703": "(2) At Risk",
    "c50c50e5-51cf-ea11-bfd3-00155de84703": "(3) Off Track",
    "6a659014-be4d-eb11-bf94-00155df8457c": "(4) Not Tracked"
  },
  ...
}
Technical problem:
What I have is the GUID of the choice (not the field). I need to take the GUID, in this case "6a659014-be4d-eb11-bf94-00155df8457c" and translate it into "(4) Not Tracked" and store this in a variable to write to a SharePoint list. I need to do this for about 30 fields which are similar in the record.
I've created the flow and the connector has given me the record with a list of fields, some of which contain value GUIDs. I know which fields these are and I have the Display Names of these fields.
I have added an HTTP call to the provided API endpoint (let's call it GetFields), which has returned a 200 response, the body of which contains a JSON collection of the 50 or so fields in System A.
I can't work out how to parse the body of the response for the GUID I have for each field value, make sure I get the right corresponding text value, write it to a field variable, and then create a SharePoint record, all wrapped up in an MS Automate flow.
I hope I've understood you correctly, but from what I can work out you want to dynamically select the value of the choice from the GUID you've been provided (by whatever means).
I created a small flow to prove the concept. Firstly, these two steps set up the scenario, the first being the GUID you want to extract the choice value for and the second being the JSON object itself ...
The third step will take the value from the first variable and use it dynamically in an expression to extract that key from the JSON and return the value.
This is the expression ...
variables('JSON')?['Choices'][variables('Choice ID')]
You can see I'm just using the variable in the path expression over the top of the JSON object to extract the key I want.
This is the end result ...
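To make the lookup logic concrete outside of Power Automate, here is a minimal Python sketch of the same idea, assuming the GetFields response is a JSON array of field objects like the one above (the variable names here are purely illustrative):

import json

# Minimal stand-in for the GetFields response body: a list of field objects,
# each with a "DisplayName" and a "Choices" map of GUID -> display text.
body = '''[{
  "DisplayName": "Status",
  "Choices": {
    "6a659014-be4d-eb11-bf94-00155df8457c": "(4) Not Tracked"
  }
}]'''
fields = json.loads(body)

# The choice GUIDs we hold, keyed by each field's display name.
guid_by_display_name = {"Status": "6a659014-be4d-eb11-bf94-00155df8457c"}

resolved = {}
for field in fields:
    name = field.get("DisplayName")
    if name in guid_by_display_name:
        # Same idea as variables('JSON')?['Choices'][variables('Choice ID')]
        resolved[name] = field.get("Choices", {}).get(guid_by_display_name[name])

print(resolved)  # {'Status': '(4) Not Tracked'}

Repeating that dictionary lookup over the ~30 fields is all the flow expression is doing.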

MarkLogic Data Hub document metadata added by steps

According to the documentation:
Document metadata added by steps
For every content object outputted by a Data Hub step, regardless of the step type, Data Hub will add the following document metadata keys and values to the document wrapped by the content object:
datahubCreatedOn = the date and time at which the document is written
datahubCreatedBy = the MarkLogic user used to run the step
datahubCreatedInFlow = the name of the flow containing the step being run
datahubCreatedByStep = the name of the step being run
datahubCreatedByJob = the ID of the job being run; this will contain the job ID of every flow run on the step, with multiple values being space-delimited
Is there any possibility to add some extra metadata keys and values to the document?
It is possible to add additional static values in your headers options, or to use one of these keywords to add values dynamically:
{
  "headers": {
    "sources": [{
      "name": "loadCustomersJSON"
    }],
    "createdOn": "datahubCreatedOn",
    "createdBy": "datahubCreatedBy"
  }
}
You can also add values dynamically by using an interceptor (see https://docs.marklogic.com/datahub/5.6/flows/about-interceptors-custom-hooks.html) or by updating the header value in a custom step, if you are already using one (see https://docs.marklogic.com/datahub/5.6/modules/editing-custom-step-module.html).
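As a sketch of how an interceptor is wired up, a step's configuration can reference a custom module; the module path and vars below are placeholders, and the when value is the hook described in the linked interceptor docs:

{
  "interceptors": [{
    "path": "/custom-modules/step-interceptors/addMetadata.sjs",
    "when": "beforeContentPersisted",
    "vars": {
      "myCustomKey": "myCustomValue"
    }
  }]
}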

Update multiple records for PUT in Rest API

I have an API to update records in my database. The endpoint is as below:
PUT /student/name/{name}/roll/{rollNumber}/reg-number/{regNumber}
and body as
{
  "studentList": [
    {
      "fathersName": "string",
      "baseSubject": "string",
      "age": "string",
      "preferredLanguage": "string"
    }
  ]
}
Each record is uniquely identified by name, rollNumber and regNumber, and I want to update multiple records at once (I have a list in my request body).
What is the best way to achieve this? Should I pass arrays of name, rollNumber and regNumber as path params, with the corresponding records in the same order in studentList in the body, or should I have all the fields in the body itself and nothing in the path params?
I am following OpenAPI specs.
Update: Adding some more details to my question
I have a table whose primary key is a combination of name, rollNumber and regNumber.
I want to update multiple rows in a single PUT call. The fields that can be updated are passed in the request body, and the fields used to identify the rows to be updated are passed in the URI as path params.
I would like to know the correct approach and the REST convention for achieving this. The two options I have in mind are mentioned above.
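For illustration only, the second option (everything in the body, nothing in the path) could look like the sketch below; the /students path and the placement of the key fields inside each list item are assumptions for the sake of the example, not an established answer:

PUT /students

{
  "studentList": [
    {
      "name": "string",
      "rollNumber": "string",
      "regNumber": "string",
      "fathersName": "string",
      "baseSubject": "string",
      "age": "string",
      "preferredLanguage": "string"
    }
  ]
}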

Two different approaches to structure my NoSQL database: what to choose?

I currently get to work with DynamoDB and I have a question regarding the structure I should choose.
I set up Twilio to be able to receive WhatsApp messages from guests in a restaurant. Guests can send their feedback directly to my Twilio WhatsApp number. I receive that feedback via a webhook and save it in DynamoDB. The restaurant manager gets a dashboard (a React application) where he can monitor the feedback. While I'm starting with one restaurant / one WhatsApp number, I will add more users / restaurants over time.
Now I have the following two structures in mind. With the first idea, I would always create a new item when a guest sends a new message to the restaurant.
With the second idea, I would (most of the time) update an existing item. Only if the receiver / restaurant doesn't exist yet would a new item be created. Every other message to that restaurant just updates the existing item.
Do you have any advice on what's the best way forward?
First idea:
PK (primary key), Created (Epoch time), Receiver/Restaurant (phone number), Sender/Guest (phone number), Body (String)
Sample data:
1, 1574290885, 4917123525993, 4916034325342, "Example Message 1" # Restaurant McDonalds (4917123525993)
2, 1574291036, 4917123525993, 4917542358273, "Example Message 2" # different sender (4917542358273)
3, 1574291044, 4917123525993, 4916034325342, "Example Message 3" # same sender as pk 1 (4916034325342)
4, 1574291044, 4913423525123, 4916034325342, "Example Message 4" # Restaurant Burger King (4913423525123)
Second idea:
{
  Receiver (primary key),
  Messages: {
    {
      id,
      Created,
      From,
      Body
    }
  }
}
Sample data (the same data as for the first idea, but structured differently):
{
  Receiver: 4917123525993,
  Messages: {
    {
      Created: 1574290885,
      Sender: 4916034325342,
      Body: "Example Message 1"
    },
    {
      Created: 1574291036,
      Sender: 4917542358273,
      Body: "Example Message 2"
    },
    {
      Created: 1574291044,
      Sender: 4916034325342,
      Body: "Example Message 3"
    }
  }
}
{
  Receiver: 4913423525123,
  Messages: {
    {
      Created: 1574291044,
      Sender: 4916034325342,
      Body: "Example Message 4"
    }
  }
}
If I read this correctly, in both approaches the proposal is to save all messages received by a restaurant as a nested list (the Messages property looks like an object in the samples you've shared, but I assume it is an array, since that would make more sense).
One potential problem I foresee with this is that DynamoDB items have a size limit (400 KB). Agreed, this seems like a pretty large number, but you're bound to hit that limit pretty quickly if you use this application for something like a food-order delivery system.
Another potential issue is that querying on nested objects is not possible in DynamoDB, so the proposed structure would mostly require table scans for any filtering, greatly increasing operational costs.
Unlike with relational DBs, the structure of your data in document DBs depends heavily on the questions you want to answer most frequently. In fact, you should avoid designing your NoSQL schema until you know the questions you want to answer, your access patterns, and your data volumes.
To come up with a data model, I will assume you want to answer the following questions with your table:
Get all messages received by a restaurant, ordered by timestamp (ascending / descending can be determined in the query by specifying ScanIndexForward = true/false)
Get all messages sent by a user ordered by timestamp
Get all messages sent by a user to a restaurant, ordered by timestamp
Consider the following record structure:
{
  pk: <restaurant id>,          // Partition key of the main table
  sk: "<user id>:<timestamp>",  // Synthetic (generated) range key of the main table
  messageBody: <message content>,
  timestamp: <timestamp>        // Local secondary index (LSI) on this field
}
You insert a new record of this structure for each new message that comes into your system. This structure allows you to :
Efficiently query all messages received by a restaurant ID using only the partition key
Efficiently retrieve all messages received by a restaurant and sent by a user using pk = <restaurant id> and begins_with(sk, <user id>)
The LSI on timestamp allows for efficiently filtering messages based on creation time.
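As a minimal boto3 sketch of the second query above (the table name "messages" is an assumption, and the table is assumed to already exist with the pk/sk layout described):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("messages")  # assumed table name

# All messages received by a restaurant and sent by one user; because sk ends
# with the timestamp, results within that prefix are ordered by creation time.
response = table.query(
    KeyConditionExpression=Key("pk").eq("<restaurant id>")
    & Key("sk").begins_with("<user id>"),
    ScanIndexForward=False,  # newest first
)
for item in response["Items"]:
    print(item["messageBody"])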
However, this by itself does not allow you to query all messages sent by a user (to any restaurant, or to a specific restaurant). To do that, we can create a global secondary index (GSI) whose partition key is the user ID (the first component of the table's sk), with a synthetic range key consisting of the restaurant ID and the timestamp separated by a ':'.
GSI structure
{
  gsi_pk: <user Id>,
  gsi_sk: "<restaurant Id>:<timestamp>",
  messageBody: <message content>
}
messageBody is a non-key field projected onto the GSI.
The synthetic SK of the GSI helps make use of the different key matching modes that DynamoDB provides (less than, greater than, starts with, between).
This GSI allows us to answer the following questions:
Get all messages by a user (using only gsi_pk)
Get all messages by a user, sent to a particular restaurant, ordered by timestamp (gsi_pk = <user Id> and begins_with(gsi_sk, <restaurant Id>))
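Again as a boto3 sketch (the GSI name here is an assumption):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("messages")  # assumed table name

# All messages sent by a user to a particular restaurant, via the GSI.
response = table.query(
    IndexName="messages-by-user",  # assumed GSI name
    KeyConditionExpression=Key("gsi_pk").eq("<user id>")
    & Key("gsi_sk").begins_with("<restaurant id>"),
)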
The system has some duplication of data, but that is in line with one of the core ideas of DynamoDB, and of most NoSQL databases. I hope this helps!
Storing multiple messages in a single record has several issues:
The size of each write to the DB will grow over time (which translates to money and response time; in the worst case you may end up hitting the 400 KB limit).
Race conditions between multiple writes.
No way to aggregate messages by user or serve other access patterns.
And the worst part is that I don't see any benefit to storing multiple messages together. (Other than perhaps being able to query all of them together, which itself becomes a con as the size grows: you won't be able to ask for just the last 10 reviews; you will always have to fetch everything and then take the last 10.)
Hence, go for the option where the messages are stored separately.

Neo4J REST Unique Nodes

My question is two parts:
First, when trying to create a unique node using the REST Interface like below...
http://localhost:7474/db/data/index/node/people?uniqueness=create_or_fail
What is the meaning of the "people" portion of the URL? I'm under the impression that it is a label, but I'm not sure.
Second, if it is indeed a label, when I execute the following REST call...
http://localhost:7474/db/data/index/node/Test?uniqueness=create_or_fail
with this payload...
{
  "key": "name",
  "value": "test",
  "properties": {
    "lastName": "test",
    "name": "test",
    "type": "test",
    "firstName": "test"
  }
}
A node is created but does not have an associated label. It creates a label-less node that does still enforce uniqueness. How do I create a unique node using the REST API with a label?
I'm using neo4j 2.0.
You are correct. When you send in the JSON, it will create the node, or fail if it already exists, using the index 'people'.
When sending, your object needs a "key" and a "value", which determine how the index matching is done.
How are you determining that the node has no label? In the REST documentation I can see that labels are a separate URL call for a node; have you checked there?
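For what it's worth, in the Neo4j 2.0 REST API labels live under a separate per-node resource, so one approach (a sketch; the node ID 42 is a placeholder you would read from the "self" URL of the creation response) is to add the label in a second call:

import requests

# The node ID comes from the "self" URL returned by the unique-node creation.
node_id = 42

# POST to the node's labels resource to add a label to the existing node.
resp = requests.post(
    "http://localhost:7474/db/data/node/%d/labels" % node_id,
    json=["Test"],
)
print(resp.status_code)  # 204 No Content on success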