OrientDB: missing "half edges"

I'm still playing with OrientDB.
Now I'm trying the schema functionality, which looks awesome :-)
I have two data files: joinA.txt and joinB.txt, which I used to populate a database with the following schema (the content of the two files is at the end of the post):
CREATE CLASS Employee EXTENDS V;
CREATE PROPERTY Employee.eid Integer;
CREATE PROPERTY Employee.name String;
CREATE PROPERTY Employee.eage Short;
CREATE INDEX Employee.eid unique_hash_index;
CREATE CLASS ExtendedProfile EXTENDS V;
CREATE CLASS XYZProfile EXTENDS ExtendedProfile;
CREATE PROPERTY XYZProfile.textual String;
-- SameAs can only connect Employees to ExtendedProfile
CREATE CLASS SameAs EXTENDS E; -- same employee across many tables
CREATE PROPERTY SameAs.out LINK ExtendedProfile;
CREATE PROPERTY SameAs.In LINK Employee;
The JSONs I gave to the ETL tool are, for JoinA:
{
"source": { "file": {"path": "the_path"}},
"extractor": {"csv": {
"separator": " ",
"columns": [
"eid:Integer",
"name:String",
"eage:Short"
]
}
},
"transformers": [
{"vertex": {"class": "Employee", "skipDuplicates": true}}
]
,"loader": {
"orientdb": {
"dbURL": "plocal:thepath",
"dbType": "graph",
"useLightweightEdges": false
}
}
}
and for JoinB:
{
"source": { "file": {"path": "thepath"}},
"extractor": {"csv": {
"separator": " ",
"columnsOnFirstLine": false,
"quote": "\"",
"columns": [
"id:String",
"textual:String"
]
}
},
"transformers": [
{"vertex": {"class": "XYZProfile", "skipDuplicates": true}},
{ "edge": { "class": "SameAs",
"direction": "out",
"joinFieldName": "id",
"lookup":"Employee.eid",
"unresolvedLinkAction":"ERROR"}}
],
"loader": {
"orientdb": {
"dbURL": "path",
"dbUser": "root",
"dbPassword": "pwd",
"dbType": "graph",
"useLightweightEdges": false}
}
}
Now, the problem is that when I run select expand(both()) from Employee I get the edges in the column out_SameAs, while when I run select expand(both()) from XYZProfile I get nothing.
This is weird, since the first query told me that the #CLASS pointed to by the edges is XYZProfile.
Does anybody know what's wrong with my example?
Cheers,
Alberto
JoinA:
1 A 10
2 B 14
3 C 22
JoinB:
1 i0
1 i1
2 i2

Check your JSON file; I think it contains an error. You forgot to put [] at the beginning and end of the file.
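A quick way to rule JSON syntax problems in or out is to run the config through a strict parser before handing it to the ETL tool. The following is a minimal sketch in Python; `check_etl_config` is a hypothetical helper name, and the two sample strings are made up for illustration. Python's `json` module may be stricter than OrientDB's own parser (it rejects trailing commas, for instance), so a failure here is a hint rather than proof that the ETL run will fail.

```python
import json

def check_etl_config(text):
    """Return None if text parses as strict JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

# Illustrative inputs only: the second one has a trailing comma in the array.
good = '{"transformers": [{"vertex": {"class": "XYZProfile"}}]}'
bad = '{"transformers": [{"vertex": {"class": "XYZProfile"}},]}'

print(check_etl_config(good))
print(check_etl_config(bad))
```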

It was actually my fault.
The line CREATE PROPERTY SameAs.In LINK Employee; was the problem: In should have been all lowercase (in), as pointed out here.
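This kind of slip is easy to miss by eye, so a small check over the schema statements can help. Below is a rough Python sketch, assuming the schema is available as plain CREATE PROPERTY statements. The helper name `find_miscased_edge_links` and the regular expression are my own, not part of OrientDB; the script only flags link properties on the listed edge classes that are spelled like in/out but not in lowercase.

```python
import re

# Matches statements such as: CREATE PROPERTY SameAs.In LINK Employee;
PROPERTY_RE = re.compile(
    r"CREATE\s+PROPERTY\s+(\w+)\.(\w+)\s+LINK\b", re.IGNORECASE
)

def find_miscased_edge_links(statements, edge_classes):
    """Return (class, property) pairs named in/out but not in lowercase."""
    problems = []
    for statement in statements:
        match = PROPERTY_RE.search(statement)
        if not match:
            continue
        class_name, prop = match.groups()
        is_endpoint = prop.lower() in ("in", "out")
        if class_name in edge_classes and is_endpoint and prop != prop.lower():
            problems.append((class_name, prop))
    return problems

schema = [
    "CREATE PROPERTY SameAs.out LINK ExtendedProfile;",
    "CREATE PROPERTY SameAs.In LINK Employee;",
]
print(find_miscased_edge_links(schema, {"SameAs"}))
```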

Related

Populating only vertex from CSV file

I need help populating my vertex class in OrientDB from a CSV file. The format of the CSV file is:
name,type,status
xxxxx,ABC,3
yyyyy,ABC,1
zzzzz,123,5
--
I have vertex and edge classes extended in OrientDB, where the vertex has three properties: name, type and status. I only want the vertices to be populated from the CSV; the edges will be created dynamically via the API.
I tried to create an ETL file as below:
{
"source":{"file": { "path": "/tmp/ientdb-community-2.2.18/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/mydb",
"dbUser": "root",
"dbPassword": "root",
"dbType": "graph",
"classes": [
{"name": "MyObject", "extends": "V"}
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
I find that if I use plocal, the root/root credentials do not work, and the classes are not the same as when I log in with remote (after starting the server).
I tried your code and it works for me.
The only changes I made to your code are the credentials and the dbURL (plocal instead of remote):
{
"source":{"file": { "path": "mypath/config/data.csv" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": { "class": "MyObject" } }
],
"loader": {
"orientdb": {
"dbURL": "plocal:mypath/databases/mydb",
"dbType": "graph",
"dbUser": "<user name>",
"dbPassword": "<user password>",
**BEGIN UPDATE**
"serverUser": "<server administrator user name, usually root>",
"serverPassword": "<server administrator user password that is provided at server startup>",
**END UPDATE**
"classes": [
{"name": "MyObject", "extends": "V"}
], "indexes": [
{"class":"MyObject", "fields":["name:string"], "type":"UNIQUE" }
]
}
}
}
By the way, I noticed that your path contains ientdb-community-2.2.18. Is that correct?
Hope it helps.
Regards.

Create Edges by ETL in same Class

1) Class Items with ItemId and Name ready in the database.
2) CSV-file: two columns,
ItemId1,ItemId2001
ItemId1,ItemId2345
ItemId1,ItemId2381
...
ItemId2,ItemId8393
ItemId2,ItemId8743
..
etc.
Question:
How do I define an ETL JSON file to create edges between ItemId1 and all the ItemIds in column 2, and between ItemId2 and its column-2 peers?
I tried to reproduce your problem.
I had these Items
and with this code I connected them
{
"source": { "file": { "path": "myPath/item.csv" } },
"extractor": {"row": {}},
"transformers": [{
"csv": {
"separator": ","
}
},
{
"command" : {
"command" : "create edge from (select from Item where idItem= '${input.idItem1}') to (select from Item where idItem= '${input.idItem2}')",
"output" : "edge"
}
}
],
"loader": {
"orientdb": {
"dbURL": "plocal:myPath/myDb",
"dbType": "graph"
}
}
}
and I got
Hope it helps.
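To make it clearer what the command transformer above ends up issuing for each CSV row, here is a small Python simulation of the `${input.<field>}` substitution. It is only an illustration of the template expansion, not how the ETL tool is implemented; the `expand` function is hypothetical, and the row values are taken from the sample CSV in the question, assuming its two columns are named idItem1 and idItem2.

```python
import re

# Same command template as in the ETL config above.
TEMPLATE = (
    "create edge from (select from Item where idItem= '${input.idItem1}') "
    "to (select from Item where idItem= '${input.idItem2}')"
)

def expand(template, row):
    """Replace each ${input.<field>} placeholder with the row's value."""
    return re.sub(r"\$\{input\.(\w+)\}", lambda m: row[m.group(1)], template)

rows = [
    {"idItem1": "ItemId1", "idItem2": "ItemId2001"},
    {"idItem1": "ItemId1", "idItem2": "ItemId2345"},
]
for row in rows:
    print(expand(TEMPLATE, row))
```

One statement is produced per row, so every ItemId in the first column gets an edge to each of its second-column peers.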

Use ETL to load CSV data into OrientDB containing a SPATIAL index

I'm interested in loading some data into an OrientDB from some CSV files that contain spatial coordinates in WGS84 Lat/Long.
I'm using OrientDB 2.2.8 and have the Lucene spatial module added to my $ORIENTDB_HOME/lib directory.
I'm loading my data into a database using ETL and would like to add the spatial index but I'm not sure how to do this.
Say my CSV file has the following columns:
Label (string)
Latitude (float)
Longitude (float)
I've tried this in my ETL:
"loader": {
"orientdb": {
"dbURL": "plocal:myDatabase.orientdb",
"dbType": "graph",
"batchCommit": 1000,
"classes": [ { "name": "vertex", "extends": "V" } ],
"indexes": [ { "class": "vertex", "fields":["Label:string"], "type":"UNIQUE" },
{ "class": "Label", "fields":["Latitude:float","Longitude:float"], "type":"SPATIAL" }
]
}
}
but it's not working. I get the following error message:
ETL process has problem: com.orientechnologies.orient.core.index.OIndexException: Index with type SPATIAL and algorithm null does not exist.
Has anyone looked into creating spatial indices via ETL? Most of the stuff I'm seeing on this is using either Java or via direct query.
Thanks in advance for any advice.
I was able to get it to load using the legacy spatial capabilities.
I put together a cheesy dataset that has some coordinates for a few of the Nazca line geoglyphs:
Name,Latitude,Longitude
Hummingbird,-14.692131,-75.148892
Monkey,-14.7067274,-75.1475391
Condor,-14.6983457,-75.1283374
Spider,-14.694363,-75.1235815
Spiral,-14.688309,-75.122757
Hands,-14.694459,-75.113881
Tree,-14.693897,-75.114467
Astronaut,-14.745222,-75.079755
Dog,-14.706401,-75.130788
I used a script to create my GeoGlyph class, createVertexGeoGlyph.osql:
set echo true
connect PLOCAL:./nazca.orientdb admin admin
CREATE CLASS GeoGlyph EXTENDS V CLUSTERS 1
CREATE PROPERTY GeoGlyph.Name STRING
CREATE PROPERTY GeoGlyph.Latitude FLOAT
CREATE PROPERTY GeoGlyph.Longitude FLOAT
CREATE PROPERTY GeoGlyph.Tag EMBEDDEDSET STRING
CREATE INDEX GeoGlyph.index.Location ON GeoGlyph(Latitude,Longitude) SPATIAL ENGINE LUCENE
which I load into my database using
$ console.sh createVertexGeoGlyph.osql
I do it this way because it seems to work more consistently for me. I've had some difficulty getting the ETL engine to create defined properties from CSV imports when I've wanted it to. Sometimes it cooperates and creates my properties, and other times it has trouble.
So, the next step to get the data in is to create my .json files for the ETL process. I like to make two, one that is file-specific and another that is a common file since often I have datasets that span multiple files.
First, I have my nazca_lines.json file:
{
"config": {
"log": "info",
"fileDirectory": "./",
"fileName": "nazca_lines.csv"
}
}
Next is the commonGeoGlyph.json file:
{
"begin": [
{ "let": { "name": "$filePath", "expression": "$fileDirectory.append($fileName )" } }
],
"config": { "log": "debug" },
"source": { "file": { "path": "$filePath" } },
"extractor":
{
"csv": { "ignoreEmptyLines": true,
"nullValue": "N/A",
"separator": ",",
"columnsOnFirstLine": true,
"dateFormat": "yyyy-MM-dd"
}
},
"transformers": [
{ "vertex": { "class": "GeoGlyph" } },
{ "code": { "language":"Javascript",
"code": "print('>>> Current record: ' + record); record;" }
}
],
"loader": {
"orientdb": {
"dbURL": "plocal:nazca.orientdb",
"dbType": "graph",
"batchCommit": 1000,
"classes": [],
"indexes": []
}
}
}
There's more stuff in the file than is necessary; I use it as a template for a lot of stuff. In this case, I don't have to create my index in the ETL file itself because I already created it in the createVertexGeoGlyph.osql file.
To load the data I just use the oetl.sh script:
$ oetl.sh commonGeoGlyph.json nazca_lines.json
This is what's working for me... I'm sure there are better ways to do it, but this works. I'm posting this here to tie off the question. Hopefully someone will find this to be useful.

Utilizing OrientDB ETL to create 2 vertices and a connected edge at every line of CSV

I'm using the OrientDB ETL tool (OrientDB 2.2) to import a large amount of data, in the range of GBs. The format of the CSV is:
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
I need to create two vertices per row: one from the value in column 1 (keyed by the value itself), and another from the values in columns 2 and 3 (keyed by the concatenation of both values, with both also stored as attributes of this second vertex type). The fourth column will be a property of the edge connecting the two vertices.
I used the code below and it mostly works, with some errors. One problem is that all the values in each CSV row are stored as properties of the IpAddress vertex; is there any way to store only the IP address in it? Secondly, could you let me know how to concatenate two values read from the CSV?
{
"source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
"extractor": { "csv": {"columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
"transformers": [
{ "merge": { "joinFieldName":"ip", "lookup":"IpAddress.ip" } },
{ "edge": { "class": "Located",
"joinFieldName": "address",
"lookup": "PhyLocation.loc",
"direction": "out",
"targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}"},
"edgeFields": { "confidence": "${input.prob}" },
"unresolvedLinkAction": "CREATE"
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/Bulk_Transfer_Test",
"dbType": "graph",
"dbUser": "root",
"dbPassword": "tiger",
"serverUser": "root",
"serverPassword": "tiger",
"classes": [
{"name": "IpAddress", "extends": "V"},
{"name": "PhyLocation", "extends": "V"},
{"name": "Located", "extends": "E"}
], "indexes": [
{"class":"IpAddress", "fields":["ip:string"], "type":"UNIQUE" },
{"class":"PhyLocation", "fields":["loc:string"], "type":"UNIQUE" }
]
}
}
}
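For the concatenation part, one possible workaround (not an ETL feature, just preprocessing) is to add the combined key as an extra CSV column before running the import, and then use that column as the join field for the lookup. Here is a minimal Python sketch under that assumption. The column order follows the sample rows above; the function name `add_location_key` and the "_" separator are arbitrary choices for illustration.

```python
import csv
import io

def add_location_key(rows, separator="_"):
    """Yield each row with an extra column joining columns 2 and 3."""
    for ip, dpcb, address, prob in rows:
        yield [ip, dpcb, address, prob, f"{dpcb}{separator}{address}"]

# One sample row from the question, read from memory instead of a file.
sample = '"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"\n'
reader = csv.reader(io.StringIO(sample))

out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_ALL).writerows(add_location_key(reader))
print(out.getvalue())
```

For a real file of several GB, the same generator can be fed from `csv.reader(open(path, newline=""))` and written to a second file, so only one row is held in memory at a time.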

OrientDB: Create Edges with subselect

I have a simple tree structure in a MySQL table (id, parentId) with about 3 million vertices and wanted to import this into an OrientDB graph database. The ETL importer imports the vertices smoothly, but can't create edges (NullPointerException). The ETL does not even work on a plain database with the given examples in the documentation (http://orientdb.com/docs/last/Import-a-tree-structure.html throws the same exception), so I just imported the vertices and wanted to create the edges manually.
I have a vertex class (Address) with two properties (id, parentId) and I want to create the edges between these vertices (parentId -> id). Is there a simple way to do this instead of inserting the edges in a loop? Something like this in SQL:
INSERT INTO E (out, in) VALUES (SELECT parentId, id FROM Address)
Since edges shall only be created with CREATE EDGE, I guess OrientDB does not support such an operation by default. But maybe there is a workaround to create these 3 million edges?
I found it is easy to create a link between the two records:
CREATE LINK parentLink TYPE LINK FROM Address.parentId TO Address.Id
However, I cannot create Edges in such a way. I tried working with variables
CREATE EDGE isParentOf FROM (SELECT FROM Address) TO (SELECT FROM Address WHERE id = $current.parentId)
But that does not work.
Have you tried this ETL JSON?
{
"config": {"log": "debug", "parallel": true },
"extractor" : {
"jdbc": { "driver": "oracle.jdbc.driver.OracleDriver",
"url": "jdbc:oracle:thin:hostname/db",
"userName": "username",
"userPassword": "password",
"query": "select id, A.parentId from Address a where rownum<2" }
},
"transformers": [
{ "vertex": { "class": "Address" }},
{ "edge": { "class": "isParentOf",
"joinFieldName": "parentId",
"lookup": "Address.Id",
"direction": "in",
"skipDuplicates":true
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:server/db",
"dbUser": "user",
"dbPassword": "passwd!",
"dbType": "graph",
"classes": [
{"name": "Address", "extends": "V"},
{"name": "isParentOf", "extends": "E"}
], "indexes": [
{"class":"Address", "fields":["ID:string"], "type":"UNIQUE" }
]
}
}
}