Create Edges by ETL in same Class - orientdb

1) Class Items with ItemId and Name ready in the database.
2) CSV-file: two columns,
ItemId1,ItemId2001
ItemId1,ItemId2345
ItemId1,ItemId2381
...
ItemId2,ItemId8393
ItemId2,ItemId8743
..
etc.
Question:
How to define a ETL json-file to create Edges between ItemId1 and all the ItemId's in col#2, and between ItemId2 and its col#2-peers.

I tried to reproduce your problem.
I had this Items
and with this code I connected them
{
"source": { "file": { "path": "myPath/item.csv" } },
"extractor": {"row": {}},
"transformers": [{
"csv": {
"separator": ","
}
},
{
"command" : {
"command" : "create edge from (select from Item where idItem= '${input.idItem1}') to (select from Item where idItem= '${input.idItem2}')",
"output" : "edge"
}
}
],
"loader": {
"orientdb": {
"dbURL": "plocal:myPath/myDb",
"dbType": "graph"
}
}
}
and I got
Hope it helps.

Related

Gentics Mesh - Multilanguage support - Cross language in a list of node - GraphQL query

Gentics Mesh Version : v1.5.1
Intro:
Let suppose we have schema A with a field of type: list and list type: node and allowed schemas: B. (see (1)).
An instance of B node has been created (b1-EN) in language en and (b1-DE) in de.
An instance of B node has been created (b2-EN) in languages en.
An instance of A node has been created (a1-DE) in language de and b1-DE and b2-EN are added in the node list (Bs) of a1.
As result, when selecting de language in the Gentics Mesh CMS, Node a1-DE (de) has a list of 2 nodes b1-DE, b2-EN.
When the following GraphQL query is applied :
{
node(path: "/a1-DE") {
... on A {
path
uuid
availableLanguages
fields {
Bs {
... on B {
path
fields {
id
}
}
}
}
}
}
}
The result is :
{
"data": {
"node": {
"path": "/a1-DE",
"uuid": "30dfd534cdee40dd8551e6322c6b1518",
"availableLanguages": [
"de"
],
"fields": {
"Bs": [
{
"path": "/b1-DE",
"fields": {
"id": "b1-DE"
}
},
{
"path": null,
"fields": null
}
]
}
}
}
}
Question:
Why the result is not showing the b2-EN node in the list of nodes ? Is the query wrong ? What I would like to get as result is the default language version of the node (b2-EN) because the b2-DE is not contributed yet. so the expected result :
{
"data": {
"node": {
"path": "/a1-DE",
"uuid": "30dfd534cdee40dd8551e6322c6b1518",
"availableLanguages": [
"de"
],
"fields": {
"Bs": [
{
"path": "/b1-DE",
"fields": {
"id": "b1-DE"
}
},
{
"path": "/b2-EN",
"fields": {
"id": "b2-EN"
}
}
]
}
}
}
}
In the documentation (2):
The fallback to the configured default language will be applied if no other matching content found be found. Null will be returned if this also fails.
Can someone enlighten me ?
(1): Schema
{
"name": "A",
"container": false,
"autoPurge": false,
"displayField": "id",
"segmentField": "id",
"urlFields": [
"id"
],
"fields": [
{
"name": "Bs",
"type": "list",
"label": "Bs",
"required": false,
"listType": "node",
"allow": [
"B"
]
},
{
"name": "id",
"type": "string",
"label": "id",
"required": true
}
]
}
(2) https://getmesh.io/docs/graphql/#_multilanguage_support
There are some known issues and inconsistent behaviour when loading nodes via GraphQL. See this issue: https://github.com/gentics/mesh/issues/971
In your case, the queried list of nodes will always be in the configured default language (in mesh.yml). In your case this seems to be de. This is why the English-only node yields no result.
Until this is fixed, you can work around this issue by loading all languages of the node list:
{
node(path: "/a1-DE") {
... on A {
path
uuid
availableLanguages
fields {
Bs {
... on B {
languages {
path
language
fields {
id
}
}
}
}
}
}
}
}
You will the contents of all languages of the node list. This means that you will have to filter for the desired language in your code after receiving the response.

Populating only vertex from CSV file

Need help to know how should I populate my vertex class in orientdb with the csv file. The format in csv file is
name,type,status
xxxxx,ABC,3
yyyyy,ABC,1
zzzzz,123,5
--
I have a vertex and edges extended in OrientDB, where the vertex have 3 property name,type and status. I only want the vertex to get populated from csv, the edges will be created dynamically via API
I tried to create ETL file as below :
{
"source":{"file": { "path": "/tmp/ientdb-community-2.2.18/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/mydb",
"dbUser": "root",
"dbPassword": "root",
"dbType": "graph",
"classes": [
{"name": "MyObject", "extends": "V"},
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
I find that, if I use plocal the root/root credential is not working. And the classes are not as same as when logged in with remote (after starting server)
I tried your code and it works for me, this is what I get:
the only changes that I made to your code are: credential, and dbUrl plocal instead of remote:
{
"source":{"file": { "path": "mypath/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:mypath/databases/mydb",
"dbType": "graph",
"dbUser": "<user name>",
"dbPassword": "<user password>",
**BEGIN UPDATE**
"serverUser": "<server administrator user name, usually root>",
"serverPassword": "<server administrator user password that is provided at server startup>",
**END UPDATE**
"classes": [
{"name": "MyObject", "extends": "V"},
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
By the way I noticed that your path is called: ientdb-community-2.2.18 is that correct?
Hope it helps.
Regards.

Utilizing OrientDB ETL to create 2 vertices and a connected edge at every line of CSV

I'm utilizing OrientDB ETL tool to import a large amount of data in GBs. The format of the CSV is such that ( I'm using orientDB 2.2 ) :
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
I'm required to create Two vertices one with the value in Column1(key being the value itself) and another Vertex having values in column 2 & 3 ( Its key concatenated with both values and both present as attributes in the second vertex type, the 4th column will be the property of the edge connecting both of these vertices.
I used the below code and it works ok with some errors, one problem is all values in each csv row is stored as properties within the IpAddress vertex, Is there any way to store only the IpAddress in it. Secondly please can you let me know the method to concatenate two values read from the csv.
{
"source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
"extractor": { "csv": {"columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
"transformers": [
{ "merge": { "joinFieldName":"ip", "lookup":"IpAddress.ip" } },
{ "edge": { "class": "Located",
"joinFieldName": "address",
"lookup": "PhyLocation.loc",
"direction": "out",
"targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}"},
"edgeFields": { "confidence": "${input.prob}" },
"unresolvedLinkAction": "CREATE"
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/Bulk_Transfer_Test",
"dbType": "graph",
"dbUser": "root",
"dbPassword": "tiger",
"serverUser": "root",
"serverPassword": "tiger",
"classes": [
{"name": "IpAddress", "extends": "V"},
{"name": "PhyLocation", "extends": "V"},
{"name": "Located", "extends": "E"}
], "indexes": [
{"class":"IpAddress", "fields":["ip:string"], "type":"UNIQUE" },
{"class":"PhyLocation", "fields":["loc:string"], "type":"UNIQUE" }
]
}
}
}

OrientDB: missing "half edges"

I'm still playing with OrientDB.
Now I'm trying the schema functionalities, that look awesome :-)
I have two data files: joinA.txt and joinB.txt, which I used to populate a database with the following schema (the content of the two files is at the end of the post):
CREATE CLASS Employee EXTENDS V;
CREATE PROPERTY Employee.eid Integer;
CREATE PROPERTY Employee.name String;
CREATE PROPERTY Employee.eage Short;
CREATE INDEX Employee.eid unique_hash_index;
CREATE CLASS ExtendedProfile EXTENDS V;
CREATE CLASS XYZProfile EXTENDS ExtendedProfile;
CREATE PROPERTY XYZProfile.textual String;
-- SameAs can only connect Employees to ExtendedProfile
CREATE CLASS SameAs EXTENDS E; -- same employee across many tables
CREATE PROPERTY SameAs.out LINK ExtendedProfile;
CREATE PROPERTY SameAs.In LINK Employee;
The JSONs I gave to the ETL tool are, for JoinA:
{
"source": { "file": {"path": "the_path"}},
"extractor": {"csv": {
"separator": " ",
"columns": [
"eid:Integer",
"name:String",
"eage:Short"
]
}
},
"transformers": [
{"vertex": {"class": "Employee", "skipDuplicates": true}}
]
,"loader": {
"orientdb": {
"dbURL": "plocal:thepath",
"dbType": "graph",
"useLightweightEdges": false
}
}
}
and for JoinB:
{
"source": { "file": {"path": "thepath"}},
"extractor": {"csv": {
"separator": " ",
"columnsOnFirstLine": false,
"quote": "\"",
"columns": [
"id:String",
"textual:String"
]
}
},
"transformers": [
{"vertex": {"class": "XYZProfile", "skipDuplicates": true}},
{ "edge": { "class": "SameAs",
"direction": "out",
"joinFieldName": "id",
"lookup":"Employee.eid",
"unresolvedLinkAction":"ERROR"}},
],
"loader": {
"orientdb": {
"dbURL": "path",
"dbUser": "root",
"dbPassword": "pwd",
"dbType": "graph",
"useLightweightEdges": false}
}
}
Now, the problem is that when I run select expand(both()) from Employee I get the edges in the column out_SameAs, while when I run select expand(both()) from XYZProfile I get nothing.
This is weird since the first query told me that the #CLASS pointed by the edges is XYZProfile.
Does anybody know what's wrong with my example?
Cheers,
Alberto
JoinA:
1 A 10
2 B 14
3 C 22
JoinB:
1 i0
1 i1
2 i2
Check out your JSON file, I think there is an error on your JSON file. You forget to put [] at the beginning and ending of the JSON file.
It was actually my fault.
The line CREATE PROPERTY SameAs.In LINK Employee; was the problem: In should have been all lowercased, as pointed out here.

"No nodes configured for partition" after creating a database via ETL

I've just created a custom database using the following ETL config,
{
"source": { "file": { "path": "./mydata.csv" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": {} },
{ "vertex": { "class": "MyClass" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:/opt/orientdb/databases/MyData",
"dbUser": "root",
"dbPassword": "qrefhiuqwriouhwqv",
"dbType": "graph",
"classes": [
{"name": "MyClass", "extends": "V"},
]
}
}
}
Now, when I go to the web console, I can see I have 433k records of type MyClass created at database MyData.
When I try to query it with "select from MyClass", I get the error
2015-04-06 23:56:25:541 SEVERE Internal server error:
com.orientechnologies.orient.server.distributed.ODistributedException:
No nodes configured for partition 'MyClass.[]' request:
id=-1 from=node1428362873334 task=command_sql(select from MyClass) userName= [ONetworkProtocolHttpDb]
What am I doing wrong?