OrientDB ETL: how to skip a duplicate vertex but create the edge - orientdb

I am creating a communication graph.
Each message has a msgid and each person has a userid.
I have already created the message vertices, now i want to create the user vertices and an edge connecting a message vertex to the user vertex.
A user can get multiple messages (obviously).
My file contains:
msgid, userid, (and some other info i will assign to the edge)
The isssue that i am having is that in my file i have duplicate userids (because users can get multiple messages), i dont want to create another vertex with the user id so i skipDuplicates. But if i do skip duplicates the edge will not get created either. I do want multiple edges to the same user vertex as each edge represents one message.
How do i keep the User vertex unique but create the edge?
My current ETL .json file that works fine with the exception of what i have detailed above.
{
"source": { "file": { "path": "msgs.txt" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": {"separator": "\t"} },
{ "vertex": { "class": "user", "skipDuplicates": true } },
{ "edge": { "class": "sent_to", "joinFieldName": "msgid", "lookup":"message.id","direction": "in" } },
"edgeFields": { "n": "${input.n}" }
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/databases/communication",
"dbType": "graph",
"classes": [
{"name": "user", "extends": "V"},
{"name": "message", "extends": "V"},
{"name": "sent_to", "extends": "E"}
], "indexes": [
{"class":"user", "fields":["id"], "type":"UNIQUE" }
]
}
}
}

Okay, here is what i did and it seemed to work.
First i created the message vertices (as stated above, in the q.).
Then i created the user vertices.
Then to create the edge in between them i ran the following ETL on a file that had {userid, msgid, ...}
{
"source": { "file": { "path": "msgs1.txt" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": {"separator": "\t"} },
{ "merge": {"joinFieldName": "userid", "lookup": "user.id"} },
{ "vertex": { "class": "user", "skipDuplicates": true } },
{ "edge": { "class": "sent_to",
"joinFieldName": "msgid",
"lookup":"message.id",
"direction": "in",
"edgeFields": { "n": "${input.n}", "date": "${input.date}"}
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/databases/communication",
"dbType": "graph",
"classes": [
{"name": "user", "extends": "V"},
{"name": "message", "extends": "V"},
{"name": "sent_to", "extends": "E"}
],
"indexes": [
]
}
}
}
This created all the edges, even if there was more than one edge pointing to a user.
Hopefully this will help someone

Related

Update Azure FrontdoorPremium Web Application Firewall Policy by API

I'm trying to update an Frontdoor WAF policy by API following the article in the link below but I'm running into several issues.
-Article seems to be focused on Frontdoor Classic, not premium, so the json in the article doesn't work.
-Adding an empty tags value solves the tags error.
https://learn.microsoft.com/en-us/rest/api/frontdoorservice/webapplicationfirewall/policies/create-or-update?tabs=HTTP#skuname
Can't get anywhere with MS Support, hoping anyone here has experience with this.
HTTP Respons:
{
"errors": {
"sku": [
"Could not find member 'sku' on object of type 'WebApplicationFirewallPatchRequestModel'. Path 'sku', line 1, position 7070."
],
"tags": [
"Required property 'tags' not found in JSON. Path '', line 1, position 7104."
],
"location": [
"Could not find member 'location' on object of type 'WebApplicationFirewallPatchRequestModel'. Path 'location', line 1, position 12."
],
"properties": [
"Could not find member 'properties' on object of type 'WebApplicationFirewallPatchRequestModel'. Path 'properties', line 1, position 35."
]
},
"type": "https://tools.ietf.org/html/rfc7231#section-6.5.1",
"title": "One or more validation errors occurred.",
"status": 400,
"traceId": "00-1006d4208c3a8e569d9ec0ff3513ca31-cc06e3e858308547-01"
}
Json post (shortend):
{
"location": "global",
"properties": {
"customRules": {
"rules": [
{
"name": "AllowCDN",
"enabledState": "Enabled",
"priority": 110,
"ruleType": "MatchRule",
"rateLimitDurationInMinutes": 1,
"rateLimitThreshold": 100,
"matchConditions": [
{
"matchVariable": "RequestUri",
"selector": null,
"operator": "Contains",
"negateCondition": false,
"matchValue": [
"snip.azureedge.net",
"snip.azureedge.net"
],
"transforms": [
"Lowercase"
]
}
],
"action": "Allow"
}
]
},
"managedRules": {
"managedRuleSets": [
{
"ruleSetType": "Microsoft_DefaultRuleSet",
"ruleSetVersion": "2.1",
"ruleSetAction": "Block",
"ruleGroupOverrides": [],
"exclusions": []
},
{
"ruleSetType": "Microsoft_BotManagerRuleSet",
"ruleSetVersion": "1.0",
"ruleSetAction": null,
"ruleGroupOverrides": [
{
"ruleGroupName": "GoodBots",
"rules": [
{
"ruleId": "Bot200200",
"enabledState": "Enabled",
"action": "Block",
"exclusions": []
}
],
"exclusions": []
},
{
"ruleGroupName": "UnknownBots",
"rules": [
{
"ruleId": "Bot300200",
"enabledState": "Enabled",
"action": "Block",
"exclusions": []
},
{
"ruleId": "Bot300600",
"enabledState": "Enabled",
"action": "Block",
"exclusions": []
},
{
"ruleId": "Bot300700",
"enabledState": "Enabled",
"action": "Log",
"exclusions": []
},
{
"ruleId": "Bot300400",
"enabledState": "Enabled",
"action": "Log",
"exclusions": []
},
{
"ruleId": "Bot300300",
"enabledState": "Enabled",
"action": "Block",
"exclusions": []
}
],
"exclusions": []
}
],
"exclusions": []
}
]
},
"policySettings": {
"enabledState": "Enabled",
"mode": "Prevention",
"redirectUrl": null,
"customBlockResponseStatusCode": null,
"customBlockResponseBody": null,
"requestBodyCheck": "Enabled"
}
},
"sku": {
"name": "Premium_AzureFrontDoor"
}
}
Updating an existing Frontdoor Premium WAF policy doesn't work.
I was able to execute the Update REST API above though for my Azure Front Door Standard. The process I followed was to make the GET REST API Call first and then copy the response body and then make the updates required in the JSON and use this JSON as a request Body in the Update REST API. The reference JSON below worked for me.
{
"id": "/subscriptions/xxxxx/resourcegroups/xxxxx/providers/Microsoft.Network/frontdoorwebapplicationfirewallpolicies/xxxxx",
"type": "Microsoft.Network/frontdoorwebapplicationfirewallpolicies",
"name": "xxxxx",
"location": "Global",
"tags": {
"Reason": "Repro",
"CreatedDate": "12/29/2022 2:40:29 AM",
"CreatedBy": "xxxxx",
"OwningTeam": "xxxxx"
},
"sku": {
"name": "Standard_AzureFrontDoor"
},
"properties": {
"policySettings": {
"enabledState": "Enabled",
"mode": "Detection",
"redirectUrl": null,
"customBlockResponseStatusCode": 403,
"customBlockResponseBody": null,
"requestBodyCheck": "Disabled"
},
"customRules": {
"rules": [
{
"name": "testcustomrule",
"enabledState": "Enabled",
"priority": 100,
"ruleType": "MatchRule",
"rateLimitDurationInMinutes": 1,
"rateLimitThreshold": 100,
"matchConditions": [
{
"matchVariable": "SocketAddr",
"selector": null,
"operator": "GeoMatch",
"negateCondition": false,
"matchValue": [
"UY"
],
"transforms": []
}
],
"action": "Block"
},
{
"name": "testcustomrule2",
"enabledState": "Enabled",
"priority": 101,
"ruleType": "MatchRule",
"rateLimitDurationInMinutes": 1,
"rateLimitThreshold": 100,
"matchConditions": [
{
"matchVariable": "SocketAddr",
"selector": null,
"operator": "GeoMatch",
"negateCondition": false,
"matchValue": [
"AU"
],
"transforms": []
}
],
"action": "Block"
}
]
},
"managedRules": {
"managedRuleSets": []
},
"frontendEndpointLinks": [],
"securityPolicyLinks": [
{
"id": "/subscriptions/xxxxx/resourcegroups/xxxxx/providers/Microsoft.Cdn/profiles/xxxxx/securitypolicies/xxxxx"
}
],
"routingRuleLinks": [],
"resourceState": "Enabled",
"provisioningState": "Succeeded"
}
}

orientdb oetl error / I want to have pokec data

To configure the current pokec db ./oetl.sh I'm trying. However, the pipeline exit keeps occurring. I don't know what the problem is. Help me.The dburl is also a code that was well written but is now erased. I've tried all the settings, but they can't. If you do this, the data will be good, but it will not shut down automatically.
{
"config": {
"parallel": true
},
"source": {
"file": {
"path": "/home/yuna/soc-pokec-profiles.txt",
"lock" : true ,
"encoding" : "UTF-8"
}
},
"extractor": { "row": {} },
"transformers": [
{ "csv": {"columns":["id","public","completion_percentage","gender","region","region2","last_login","registration","AGE","body","I_am_working_in_field","spoken_languages","hobbies","I_most_enjoy_good_food","pets","body_type","my_eyesight","eye_color","hair_color","hair_type","completed_level_of_education","favourite_color","relation_to_smoking","relation_to_alcohol","on_pokec_i_am_looking_for","love_is_for_me","relation_to_casual_sex","my_partner_should_be","marital_status","children","relation_to_children","I_like_movies","I_like_watching_movie","I_like_music","I_mostly_like_listening_to_music","the_idea_of_good_evening","I_like_specialties_from_kitchen","fun","I_am_going_to_concerts","my_active_sports","my_passive_sports","profession","I_like_books","life_style","music","cars","politics","relationships","art_culture","hobbies_interests","science_technologies","computers_internet","education","sport","movies","travelling","health","companies_brands","more"],"separator": "/t","nullValue": "NULL"} },
{ "vertex": { "class": "Profile"} },
{"field":
{ "fieldName" : "id",
"expression" : "id.prefix('P')"
}
},
{"field":
{ "fieldNames" :
["region2","body","I_am_working_in_field","spoken_languages","hobbies","I_most_enjoy_good_food","pets","body_type","my_eyesight","eye_color","hair_color","hair_type","completed_level_of_education","favourite_color","relation_to_smoking","relation_to_alcohol","on_pokec_i_am_looking_for","love_is_for_me","relation_to_casual_sex","my_partner_should_be","marital_status","children","relation_to_children","I_like_movies","I_like_watching_movie","I_like_music","I_mostly_like_listening_to_music","the_idea_of_good_evening","I_like_specialties_from_kitchen","fun","I_am_going_to_concerts","my_active_sports","my_passive_sports","profession","I_like_books","life_style","music","cars","politics","relationships","art_culture","hobbies_interests","science_technologies","computers_internet","education","sport","movies","travelling","health","companies_brands","more"],
"operation": "remove"
}
}
],
"loader": {
"orientdb": {
"dbURL": "",
"dbType": "graph",
"wal": false,
"batchCommit": 10000,
"useLightweightEdges" : true,
"dbAutoCreateProperties": true,
"classes": [
{"name": "Profile", "extends": "V", "clusters": 3},
{"name": "Relation", "extends": "E"}
], "indexes": [
{"class":"Profile", "fields":["id:string"], "type":"UNIQUE" }
],
"settings": {
}
}
}

Loopback 3 get relation from embedded model

I'm using loopback 3 to build a backend with mongoDB.
So i have 3 models: Object, Attachment and AwsS3.
Object has a relation Embeds2Many to Attachment.
Attachment has a relation Many2One to AwsS3.
Objects look like that in mongoDB
[
{
"fieldA": "valueA1",
"attachments": [
{
"id": 1,
"awsS3Id": "1234"
},
{
"id": 2,
"awsS3Id": "1235"
}
]
},
{
"fieldA": "valueA2",
"attachments": [
{
"id": 4,
"awsS3Id": "1236"
},
{
"id": 5,
"awsS3Id": "1237"
}
]
}
]
AwsS3 looks like that in mongoDB
[
{
"id": "1",
"url": "abc.com/1"
},
{
"id": "2",
"url": "abc.com/2"
}
]
The question is: how can i get Objects included Attachment and AwsS3.url over the RestAPI?
I have try with the include and scope filter. But it didn't work. It look like, that this function is not implemented in loopback3, right? Here is what i tried over the GET request:
{
"filter": {
"include": {
"relation": "Attachment",
"scope": {
"include": {
"relation": "awsS3",
}
}
}
}
}
With this request i only got the Objects with Attachments without anything from AwsS3.
UPDATE for the relation definitons
The relation from Object to Attachment:
"Attachment": {
"type": "embedsMany",
"model": "Attachment",
"property": "attachments",
"options": {
"validate": true,
"forceId": false
}
},
The relation from Attachment to AwsS3
in attachment.json
"relations": {
"awsS3": {
"type": "belongsTo",
"model": "AwsS3",
"foreignKey": ""
}
}
in AwsS3.json
"relations": {
"attachments": {
"type": "hasMany",
"model": "Attachment",
"foreignKey": ""
}
}
Try this filter:
{ "filter": { "include": ["awsS3", "attachments"]}}}}

OrientDB ETL loading CSV with vertices in one file and edges in another

I have some data that is in 2 CSV files, one contains the vertices and the other file contains the edges are in the other file. I'm working out how to set this up using ETL and am close but not quite there yet--it mostly works but my edges have properties and I'm not sure that they're loading right. This question was helpful but I'm still missing something...
Here's my data:
vertices.csv:
label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03
edges.csv:
u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14
I import my vertices using this:
commonVertices.json:
{
"begin": [
{ "let": { "name": "$filePath",
"expression": "$fileDirectory.append($fileName)"
}
},
],
"config": { "log": "info"},
"source": { "file": { "path": "$filePath" } },
"extractor": { "csv": { "ignoreEmptyLines": true,
"nullValue": "N/A",
"dateFormat": "yyyy-mm-dd"
}
},
"transformers": [
{ "vertex": { "class": "myVertex" } },
{ "code": { "language": "Javascript",
"code": "print(' Current record: ' + record); record;" }
}
],
"loader": { "orientdb": {
"dbURL": "plocal:my_orientdb",
"dbType": "graph",
"batchCommit": 1000,
"classes": [ { "name": "myVertex", "extends", "V" },
],
"indexes": []
}
}
}
vertices.json:
{ "config": { "log": "info",
"fileDirectory": "./",
"fileName": "vertices.csv"
}
}
commonEdges.json:
{
"begin": [
{ "let": { "name": "$filePath",
"expression": "$fileDirectory.append($fileName )"
}
},
],
"config": { "log": "info"
},
"source": { "file": { "path": "$filePath" } },
"extractor": { "csv": { "ignoreEmptyLines": true,
"nullValue": "N/A",
"dateFormat": "yyyy-mm-dd"
}
},
"transformers": [
{ "merge": { "joinFieldName": "u", "lookup": "myVertex.label" } },
{ "edge": { "class": "myEdge",
"joinFieldName": "v",
"lookup": "myVertex.label",
"direction": "out",
"unresolvedLinkAction": "NOTHING"
}
},
{ "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:my_orientdb",
"dbType": "graph",
"batchCommit": 1000,
"useLightweightEdges": false,
"classes": [
{ "name": "myEdge", "extends", "E" }
],
"indexes": []
}
}
}
edges.json:
{
"config": {
"log": "info",
"fileDirectory": "./",
"fileName": "edges.csv"
}
}
I am running it with oetl.sh like this:
$ oetl.sh vertices.json commonVertices.json
$ oetl.sh edges.json commonEdges.json
Everything runs, but when I query the edges... I'm new to OrientDB, so maybe it is getting the properties in my edges, but when I query the edges I don't see the weight and date fields:
orientdb {db=my_orientdb}> SELECT FROM myEdge
+----+-----+------+-----+-----+
|# |#RID |#CLASS|out |in |
+----+-----+------+-----+-----+
|0 |#33:0|myEdge|#25:0|#26:0|
|1 |#34:0|myEdge|#26:0|#27:0|
+----+-----+------+-----+-----+
The vertex table contains the [weight] field from my edges.csv and the [date] field is getting clobbered in a weird way. The day of the month is getting overwritten to the day from the edge.csv file, which is undesirable, but it's odd to me that the month itself isn't also getting change:
orientdb {db=my_orientdb}> SELECT FROM myVertex
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|# |#RID |#CLASS |data |date |label|weight|out_myEdge|in_myEdge|
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|0 |#25:0|myVertex|0.1234|2015-01-17 00:06:00|v01 |12.4 |[#33:0] | |
|1 |#26:0|myVertex|0.5678|2015-01-14 00:09:00|v02 |17.9 |[#34:0] |[#33:0] |
|2 |#27:0|myVertex|0.9012|2015-01-03 00:01:00|v03 | | |[#34:0] |
+----+-----+--------+------+-------------------+-----+------+----------+---------+
I'm sure it's probably a simple tweak, any help would be great!
In edge transformer use edgeFields to bind properties in edges. Example:
"transformers": [
{ "merge": { "joinFieldName": "u", "lookup": "myVertex.label" } },
{ "edge": { "class": "myEdge",
"joinFieldName": "v",
"lookup": "myVertex.label",
"edgeFields": { "weight": "${input.weight}", "date": "${input.date}" },
"direction": "out",
"unresolvedLinkAction": "NOTHING"
}
},
{ "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
],
Hope it helps.

How to Create Only Edges Through OrientDb ETL

I have a CSV file, having Id1 and Id2. Id1 and Id2 are vertex of two different classes. I want to make edge between Id1 and Id2. Can this be achieved by ETL?
Can we add something into the edge configuration of transformers to achieve this.
I will assume, that
the two classes are A and B
A has Id1
B has Id2
the class of the edge is AtoB
A and B instances are present in the DB
The AtoB.csv is like
AId,BId
a1,b1
a2,b2
a2,b3
Then the following ETL config will do
{
"source": { "file": { "path": "...\AtoB.csv" } },
"extractor": { "csv": { } },
"transformers": [
{ "merge": {
"joinFieldName": "BId",
"lookup": "B.Id2",
"unresolvedLinkAction": "WARNING" } },
{ "vertex": { "class": "B" } },
{ "edge": {
"class": "AtoB",
"joinFieldName": "AId",
"lookup": "A.Id1",
"direction": "in" } },
{ "field": {
"fieldNames": ["AId", "BId"],
"operation": "remove" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:../databases/...",
"dbType": "graph",
"useLightweightEdges": false,
"classes": [
{ "name": "A", "extends": "V" },
{ "name": "B", "extends": "V" },
{ "name": "AtoB", "extends": "E" }
]
}
}
}
The result will be
(a1) ➡ (b1)
(a2) ➡ (b2)
(a2) ➡ (b3)