etl and many to many relation

etl and many to many relation - orientdb

In documentation the etl from csv use one to many feature, I would like to extends it to many to many. So I made 3 configs, one for post, one for comment and one for relation. Post and Comment are ok but when I launch relation I have got this error, what I'm doing wrong ?
commentId,postId
0,10
1,10
21,10
41,20
82,20
{
"source": { "file": { "path": "/tmp/relation.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "edge":
{ "class": "HasComments", "joinFieldName": "postId", "lookup": "Post.id", "direction": "out"},
{ "class": "HasComments", "joinFieldName": "commentId", "lookup": "Comment.id", "direction": "in"}
}
],
"loader": {
"orientdb": {
"dbURL": "plocal:/tmp/test",
"dbType": "graph",
"classes": [
{"name": "Post", "extends": "V"},
{"name": "Comment", "extends": "V"},
{"name": "HasComments", "extends": "E"}
],
"indexes": [
{"class":"Post", "fields":["id:integer"], "type":"UNIQUE" },
{"class":"Comment", "fields":["id:integer"], "type":"UNIQUE" }
]
}
}
}
OrientDB etl v.2.1.9-SNAPSHOT (build 2.1.x#r; 2016-01-07 10:51:24+0000) www.orientdb.com
BEGIN ETL PROCESSOR
[file] INFO Reading from file /tmp/relation.csv with encoding UTF-8
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1#72ade7e3' is not supported
ETL process halted: com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
Exception in thread "main" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
at com.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OETLProcessor.java:448)
at com.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.java:255)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:109)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1#72ade7e3' is not supported
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
... 3 more

One possible solution might be, after importing the Post and Comment classes through their json file, you can use another json file and import the class Relation
{
"source": { "file": { "path": "/tmp/relation.csv" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": { "separator": ","}
},
{ "vertex": { "class": "Relation" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:/tmp/test",
"dbType": "graph",
"classes": [
{"name": "Post", "extends": "V"},
{"name": "Comment", "extends": "V"},
{"name": "Relation", "extends": "V"},
{"name": "HasComments", "extends": "E"}
],
"indexes": [
{"class":"Post", "fields":["id:integer"], "type":"UNIQUE" },
{"class":"Comment", "fields":["id:integer"], "type":"UNIQUE" }
]
}
}
}
You'll get these records.
Using the following javascript function
var g=orient.getGraphNoTx();
var relation = g.command("sql","select from Relation");
for(i=0;i<relation.length;i++){
var relationMM=g.command("sql","select postId , commentId from "+ relation[i].getId());
var idPost=relationMM[0].getProperty("postId");
var idComment=relationMM[0].getProperty("commentId");
var post=g.command("sql","select from Post where id = " + idPost);
var comment=g.command("sql","select from Comment where id = " + idComment);
g.command("sql","create edge HasComments from " + post[0].getId() + " to " + comment[0].getId());
}
g.command("sql","drop class Relation unsafe");
You'll get the following structure.
This will be your graph.
UPDATE
You can use this code to check if the edge already exists
var counter=g.command("sql","select count(*) from HasComments where out=" + post[0].getId() + " and in=" + comment[0].getId());
if(counter[0].getProperty("count")==0){
g.command("sql","create edge HasComments from " + post[0].getId() + " to " + comment[0].getId());
}

Related

Cannot use resultSelector while developing an Azure DevOps extension

I am working on a custom extension for Azure Devops which already contains a service endpoint:
"type": "ms.vss-endpoint.service-endpoint-type"
In addition, I would like to create a custom Release Artifact Source:
“type”: “ms.vss-releaseartifact.release-artifact-type”
Following this documentation, my current struggle is in filling the fields under the Artifact Source using an external API. I tried many patterns in the following ‘resultSelector’ and ‘resultTemplate’, but couldn’t hit one that worked for me.
In my example, I would like to take all the ‘uri’ values under ‘builds’ in the json response and present them in the ‘definition’ inputDescriptor of the Artifact Source. All my attempts resulted in an empty combo-box, even though I can see the request reaching the required API.
The json I would like to parse into the combo-box:
{
"builds": [
{
"uri": "/build1",
"lastStarted": "2018-11-07T13:12:42.547+0000"
},
{
"uri": "/build2",
"lastStarted": "2018-11-09T15:40:30.315+0000"
},
{
"uri": "/build3",
"lastStarted": "2018-11-12T17:46:24.805+0000"
}
],
"uri": "https://<server-address>/api/build"
}
Can you please help me create the Mustache pattern to retrieve the above "uri" values?
I tried:
$.builds[*].uri
which doesn't seem to work.
Here's some more information in case it helps.
Service endpoint's datasources:
"dataSources": [
{
"name": "TestConnection",
"endpointUrl": "{{endpoint.url}}/api/plugins",
"resourceUrl": "",
"resultSelector": "jsonpath:$.values[*]",
"headers": [],
"authenticationScheme": null
},
{
"name": "BuildNames",
"endpointUrl": "{{endpoint.url}}/api/build",
"resultSelector": "jsonpath:$.builds[*].uri"
},
{
"name": "BuildNumbers",
"endpointUrl": "{{endpoint.url}}/api/builds/{{definition}}",
"resultSelector": "jsonpath:$.buildsNumbers[*].uri"
}
]
Artifact source:
"inputDescriptors": [
{
"id": "connection",
"name": "Artifactory service",
"inputMode": "combo",
"isConfidential": false,
"hasDynamicValueInformation": true,
"validation": {
"isRequired": true,
"dataType": "string",
"maxLength": 300
}
},
{
"id": "definition",
"name": "definition",
"description": "Name of the build.",
"inputMode": "combo",
"isConfidential": false,
"dependencyInputIds": [
"connection"
],
"validation": {
"isRequired": true,
"dataType": "string",
"maxLength": 300
}
},
{
"id": "buildNumber",
"name": "Build Number",
"description": "Number of the build.",
"inputMode": "combo",
"isConfidential": false,
"dependencyInputIds": [
"connection"
],
"validation": {
"isRequired": true,
"dataType": "string",
"maxLength": 300
}
}
],
"dataSourceBindings": [
{
"target": "definition",
"dataSourceName": "BuildNames",
"resultTemplate": "{ Value : \"{{uri}}\", DisplayValue : \"{{uri}}\" }"
},
{
"target": "versions",
"dataSourceName": "BuildNumbers",
"resultTemplate": "{ Value : \"{{uri}}\", DisplayValue : \"{{uri}}\" }"
},
{
"target": "latestVersion",
"dataSourceName": "BuildNumbers",
"resultTemplate": "{ Value : \"{{uri}}\", DisplayValue : \"{{uri}}\" }"
},
{
"target": "artifactDetails",
"resultTemplate": "{ Name: \"{{version}}\", downloadUrl : \"{{endpoint.url}}\" }"
},
{
"target": "buildNumber",
"dataSourceName": "BuildNumbers",
"resultTemplate": "{ Value : \"{{uri}}\", DisplayValue : \"{{uri}}\" }"
}
]
}
Any help provided will be highly appreciated.

The working combination for this case is:
dataSources:
{
"name": "BuildNames",
"endpointUrl": "{{endpoint.url}}/api/build",
"resultSelector": "jsonpath:$.builds[*]"
}
dataSourceBindings:
{
"target": "definition",
"dataSourceName": "BuildNames",
"resultTemplate": "{ \"Value\" : \"{{{uri}}}\", \"DisplayValue\" : \"{{{uri}}}\" }"
}

ElasticSearch Reindex API and painless script to access date field

I try to familiarize myself with the Reindexing API of ElasticSearch and the use of Painless scripts.
I have the following model:
"mappings": {
"customer": {
"properties": {
"firstName": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"lastName": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"dateOfBirth": {
"type": "date"
}
}
}
}
I would like to reindex all documents from test-v1 to test-v2 and apply a few transformations on them (for example extract the year part of dateOfBirth, convert a date value to a timestamp, etc) and save the result as a new field. But I got an issue when I tried to access it.
When I made the following call, I got an error:
POST /_reindex?pretty=true&human=true&wait_for_completion=true HTTP/1.1
Host: localhost:9200
Content-Type: application/json
{
"source": {
"index": "test-v1"
},
"dest": {
"index": "test-v2"
},
"script": {
"lang": "painless",
"inline": "ctx._source.yearOfBirth = ctx._source.dateOfBirth.getYear();"
}
}
And the response:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"ctx._source.yearOfBirth = ctx._source.dateOfBirth.getYear();",
" ^---- HERE"
],
"script": "ctx._source.yearOfBirth = ctx._source.dateOfBirth.getYear();",
"lang": "painless"
}
],
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"ctx._source.yearOfBirth = ctx._source.dateOfBirth.getYear();",
" ^---- HERE"
],
"script": "ctx._source.yearOfBirth = ctx._source.dateOfBirth.getYear();",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Unable to find dynamic method [getYear] with [0] arguments for class [java.lang.String]."
}
},
"status": 500
}
According to this tutorial Date fields are exposed as ReadableDateTime so they support methods like getYear, and getDayOfWeek. and indeed, the Reference mentions those as supported methods.
Still, the response mentions [java.lang.String] as the type of the dateOfBirth property. I could just parse it to e.g. an OffsetDateTime, but I wonder why it is a string.
Anyone has a suggestion what I'm doing wrong?

Populating only vertex from CSV file

Need help to know how should I populate my vertex class in orientdb with the csv file. The format in csv file is
name,type,status
xxxxx,ABC,3
yyyyy,ABC,1
zzzzz,123,5
--
I have a vertex and edges extended in OrientDB, where the vertex have 3 property name,type and status. I only want the vertex to get populated from csv, the edges will be created dynamically via API
I tried to create ETL file as below :
{
"source":{"file": { "path": "/tmp/ientdb-community-2.2.18/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/mydb",
"dbUser": "root",
"dbPassword": "root",
"dbType": "graph",
"classes": [
{"name": "MyObject", "extends": "V"},
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
I find that, if I use plocal the root/root credential is not working. And the classes are not as same as when logged in with remote (after starting server)

I tried your code and it works for me, this is what I get:
the only changes that I made to your code are: credential, and dbUrl plocal instead of remote:
{
"source":{"file": { "path": "mypath/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:mypath/databases/mydb",
"dbType": "graph",
"dbUser": "<user name>",
"dbPassword": "<user password>",
**BEGIN UPDATE**
"serverUser": "<server administrator user name, usually root>",
"serverPassword": "<server administrator user password that is provided at server startup>",
**END UPDATE**
"classes": [
{"name": "MyObject", "extends": "V"},
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
By the way I noticed that your path is called: ientdb-community-2.2.18 is that correct?
Hope it helps.
Regards.

Utilizing OrientDB ETL to create 2 vertices and a connected edge at every line of CSV

I'm utilizing OrientDB ETL tool to import a large amount of data in GBs. The format of the CSV is such that ( I'm using orientDB 2.2 ) :
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
I'm required to create Two vertices one with the value in Column1(key being the value itself) and another Vertex having values in column 2 & 3 ( Its key concatenated with both values and both present as attributes in the second vertex type, the 4th column will be the property of the edge connecting both of these vertices.
I used the below code and it works ok with some errors, one problem is all values in each csv row is stored as properties within the IpAddress vertex, Is there any way to store only the IpAddress in it. Secondly please can you let me know the method to concatenate two values read from the csv.
{
"source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
"extractor": { "csv": {"columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
"transformers": [
{ "merge": { "joinFieldName":"ip", "lookup":"IpAddress.ip" } },
{ "edge": { "class": "Located",
"joinFieldName": "address",
"lookup": "PhyLocation.loc",
"direction": "out",
"targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}"},
"edgeFields": { "confidence": "${input.prob}" },
"unresolvedLinkAction": "CREATE"
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/Bulk_Transfer_Test",
"dbType": "graph",
"dbUser": "root",
"dbPassword": "tiger",
"serverUser": "root",
"serverPassword": "tiger",
"classes": [
{"name": "IpAddress", "extends": "V"},
{"name": "PhyLocation", "extends": "V"},
{"name": "Located", "extends": "E"}
], "indexes": [
{"class":"IpAddress", "fields":["ip:string"], "type":"UNIQUE" },
{"class":"PhyLocation", "fields":["loc:string"], "type":"UNIQUE" }
]
}
}
}

"No nodes configured for partition" after creating a database via ETL

I've just created a custom database using the following ETL config,
{
"source": { "file": { "path": "./mydata.csv" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": {} },
{ "vertex": { "class": "MyClass" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:/opt/orientdb/databases/MyData",
"dbUser": "root",
"dbPassword": "qrefhiuqwriouhwqv",
"dbType": "graph",
"classes": [
{"name": "MyClass", "extends": "V"},
]
}
}
}
Now, when I go to the web console, I can see I have 433k records of type MyClass created at database MyData.
When I try to query it with "select from MyClass", I get the error
2015-04-06 23:56:25:541 SEVERE Internal server error:
com.orientechnologies.orient.server.distributed.ODistributedException:
No nodes configured for partition 'MyClass.[]' request:
id=-1 from=node1428362873334 task=command_sql(select from MyClass) userName= [ONetworkProtocolHttpDb]
What am I doing wrong?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

etl and many to many relation - orientdb

Related

Cannot use resultSelector while developing an Azure DevOps extension

ElasticSearch Reindex API and painless script to access date field

Populating only vertex from CSV file

Utilizing OrientDB ETL to create 2 vertices and a connected edge at every line of CSV

"No nodes configured for partition" after creating a database via ETL

Categories

Resources