I have a simple tree structure in a MySQL table (id, parentId) with about 3 million vertices, and I want to import it into an OrientDB graph database. The ETL importer imports the vertices smoothly but fails to create the edges (NullPointerException). The ETL does not even work on a fresh database with the examples given in the documentation (http://orientdb.com/docs/last/Import-a-tree-structure.html throws the same exception), so I imported only the vertices and now want to create the edges manually.
I have a vertex class (Address) with two properties (id, parentId), and I want to create the edges between these vertices (parentId -> id). Is there a simple way to do this instead of inserting the edges in a loop? Something like this in SQL:
INSERT INTO E (out, in) VALUES (SELECT parentId, id FROM Address)
Since edges shall only be created with CREATE EDGE, I guess OrientDB does not support such an operation by default. But maybe there is a workaround to create these 3 million edges?
I found it is easy to create a link between the two records:
CREATE LINK parentLink TYPE LINK FROM Address.parentId TO Address.Id
However, I cannot create edges that way. I tried working with variables:
CREATE EDGE isParentOf FROM (SELECT FROM Address) TO (SELECT FROM Address WHERE id = $current.parentId)
But that does not work.
Have you tried this ETL JSON configuration?
{
"config": {"log": "debug", "parallel": true },
"extractor" : {
"jdbc": { "driver": "oracle.jdbc.driver.OracleDriver",
"url": "jdbc:oracle:thin:hostname/db",
"userName": "username",
"userPassword": "password",
"query": "select id, A.parentId from Address a where rownum<2" }
},
"transformers": [
{ "vertex": { "class": "Address" }},
{ "edge": { "class": "isParentOf",
"joinFieldName": "parentId",
"lookup": "Address.Id",
"direction": "in",
"skipDuplicates":true
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:server/db",
"dbUser": "user",
"dbPassword": "passwd!",
"dbType": "graph",
"classes": [
{"name": "Address", "extends": "V"},
{"name": "isParentOf", "extends": "E"}
], "indexes": [
{"class":"Address", "fields":["ID:string"], "type":"UNIQUE" }
]
}
}
}
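If the ETL edge transformer keeps failing, one fallback is to generate the CREATE EDGE statements outside the database and run them in batches. The following is only a minimal sketch of that idea, assuming integer ids, rows exported from MySQL as (id, parentId) pairs, and the class and edge names used in the question. The helper names `edge_statement` and `batches` are made up for illustration. The lookups will only be fast if Address.id is indexed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple


def edge_statement(parent_id: int, child_id: int) -> str:
    """Build one CREATE EDGE statement from a parent vertex to its child."""
    return (
        "CREATE EDGE isParentOf "
        f"FROM (SELECT FROM Address WHERE id = {parent_id}) "
        f"TO (SELECT FROM Address WHERE id = {child_id})"
    )


def batches(rows: Iterable[Tuple[int, Optional[int]]], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` statements, skipping root rows (no parent)."""
    # Each row is (id, parentId); the edge goes parentId -> id.
    statements = (edge_statement(p, c) for c, p in rows if p is not None)
    while True:
        chunk = list(islice(statements, size))
        if not chunk:
            return
        yield chunk


# Example rows as (id, parentId); the first row is a root and produces no edge.
rows = [(1, None), (2, 1), (3, 1), (4, 2)]
for chunk in batches(rows, size=2):
    print(";\n".join(chunk) + ";")
```

Each printed chunk can then be sent to the server as one script, so that 3 million edges are not created in a single transaction.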
I am using IBM Graph on Bluemix and am new to it.
I created a graph named 'test' using the GUI provided by Bluemix and uploaded IBM's 'Music Festival' sample data into it.
Now I am trying to query all the vertices with the label 'attendee' using the query below:
def gt = graph.traversal();
gt.V().hasLabel("attendee");
But I am getting this error:
Error: Error encountered evaluating script def gt = graph.traversal();gt.V().hasLabel("attendee"); with reason com.thinkaurelius.titan.core.TitanException: Could not find a suitable index to answer graph query and graph scans are disabled: [(~label = attendee)]:VERTEX
I am not sure what I am doing wrong.
How can I get rid of this error and get the expected output?
Thanks
@Radhika, your Gremlin query is valid. However, some vendors (such as IBM Graph and Titan) only allow a traversal to start with a step that can be answered from an index. This ensures that your queries perform well. Calling hasLabel() by itself gives you the "Could not find a suitable index..." error because you can't create indexes for labels. What you need to do is follow this step with one that uses an indexed property, as in this query:
def gt = graph.traversal();
gt.V().hasLabel("band").has("genre","pop");
An index for genre has been created in the schema for the sample music festival data as you can see below
{
"propertyKeys": [
{ "name": "name", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "gender", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "age", "dataType": "Integer", "cardinality": "SINGLE" },
{ "name": "genre", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "monthly_listeners", "dataType": "String", "cardinality": "SINGLE" },
{ "name":"date","dataType":"String","cardinality":"SINGLE" },
{ "name":"time","dataType":"String","cardinality":"SINGLE" }
],
"vertexLabels": [
{ "name": "attendee" },
{ "name": "band" },
{ "name": "venue" }
],
"edgeLabels": [
{ "name": "bought_ticket", "multiplicity": "MULTI" },
{ "name":"advertised_to","multiplicity":"MULTI" },
{ "name":"performing_at","multiplicity":"MULTI" }
],
"vertexIndexes": [
{ "name": "vByName", "propertyKeys": ["name"], "composite": true, "unique": false },
{ "name": "vByGender", "propertyKeys": ["gender"], "composite": true, "unique": false },
{ "name": "vByGenre", "propertyKeys": ["genre"], "composite": true, "unique": false}
],
"edgeIndexes" :[
{ "name": "eByBoughtTicket", "propertyKeys": ["time"], "composite": true, "unique": false }
]
}
That's why the query above works, and you need to do the same:
If you don't have a schema, create one. You can model it after the one above or follow the API doc.
Create a (vertex/edge) index for the properties that you'll start your traversals from. In this example, those are name, gender and genre for the vertex properties and time for the edge properties.
Call the schema endpoint to add your schema to your graph.
It's recommended to create your schema before adding any data to your graph so that you don't have to reindex later. That will save you a lot of time.
Once you create your schema, you can't modify what you have already created, but you can add new properties and indexes later on.
See the code samples for Java and Node.js for the exact code to use.
I hope that helps.
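To check which properties a traversal may start from, you can read them straight out of the schema document. This is a small sketch in Python that works on a copy of the index sections of the schema above. The helper names `indexed_keys` and `can_start_from` are made up for illustration and are not part of any IBM Graph API.

```python
import json

# Index sections copied from the sample schema shown above.
schema = json.loads("""
{
  "vertexIndexes": [
    { "name": "vByName", "propertyKeys": ["name"], "composite": true, "unique": false },
    { "name": "vByGender", "propertyKeys": ["gender"], "composite": true, "unique": false },
    { "name": "vByGenre", "propertyKeys": ["genre"], "composite": true, "unique": false }
  ],
  "edgeIndexes": [
    { "name": "eByBoughtTicket", "propertyKeys": ["time"], "composite": true, "unique": false }
  ]
}
""")


def indexed_keys(schema: dict, kind: str) -> set:
    """Collect every property key covered by an index of the given kind."""
    keys = set()
    for index in schema.get(kind, []):
        keys.update(index["propertyKeys"])
    return keys


def can_start_from(schema: dict, key: str) -> bool:
    """True if a vertex traversal can begin with has(key, ...)."""
    return key in indexed_keys(schema, "vertexIndexes")


print(sorted(indexed_keys(schema, "vertexIndexes")))
print(can_start_from(schema, "genre"), can_start_from(schema, "age"))
```

With this schema, genre qualifies as a starting property while age does not, which matches the behaviour described above.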
I'm using the OrientDB ETL tool (OrientDB 2.2) to import a large amount of data (several GB). The CSV format looks like this:
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
I need to create two vertices per row: one from the value in column 1 (the value itself being the key), and another from the values in columns 2 and 3 (its key is the concatenation of both values, and both are also stored as attributes of the second vertex type). The 4th column becomes a property of the edge connecting the two vertices.
I used the configuration below, and it works with some errors. One problem is that all the values in each CSV row are stored as properties of the IpAddress vertex. Is there any way to store only the IP address in it? Second, how can I concatenate two values read from the CSV?
{
"source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
"extractor": { "csv": {"columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
"transformers": [
{ "merge": { "joinFieldName":"ip", "lookup":"IpAddress.ip" } },
{ "edge": { "class": "Located",
"joinFieldName": "address",
"lookup": "PhyLocation.loc",
"direction": "out",
"targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}"},
"edgeFields": { "confidence": "${input.prob}" },
"unresolvedLinkAction": "CREATE"
}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:/localhost/Bulk_Transfer_Test",
"dbType": "graph",
"dbUser": "root",
"dbPassword": "tiger",
"serverUser": "root",
"serverPassword": "tiger",
"classes": [
{"name": "IpAddress", "extends": "V"},
{"name": "PhyLocation", "extends": "V"},
{"name": "Located", "extends": "E"}
], "indexes": [
{"class":"IpAddress", "fields":["ip:string"], "type":"UNIQUE" },
{"class":"PhyLocation", "fields":["loc:string"], "type":"UNIQUE" }
]
}
}
}
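One way to sidestep the concatenation question is to build the combined key before the ETL runs. The sketch below is only an illustration of that workaround, not something the ETL tool provides: it appends a fifth column holding column 2 and column 3 joined by a separator. The function name `add_location_key` and the "|" separator are assumptions. The extractor would then need a fifth column (for example "loc:string") and the edge transformer would join on it, which is untested here.

```python
import csv
import io


def add_location_key(src, dst, sep="|"):
    """Copy CSV rows from src to dst, appending dpcb + sep + address as a key."""
    reader = csv.reader(src)
    # Quote every field so the output matches the style of the input file.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for ip, dpcb, address, prob in reader:
        writer.writerow([ip, dpcb, address, prob, f"{dpcb}{sep}{address}"])


# One row taken from the sample data above.
sample = '"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"\n'
out = io.StringIO()
add_location_key(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For real files, pass handles opened with open(path, newline="") instead of the in-memory buffers.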
I tried to import an edge with oetl using OrientDB 2.2.4, 2.2.5 and 2.2.6. The error is the same in all of these versions. With version 2.1 this error does not occur.
My JSON file is:
{
"config": {
"log": "info",
"parallel": false
},
"source": {
"file": {
"path": "/opt/orientdb/csvs_1milhao/metodo03/a10a.csv"
}
},
"extractor": {
"row": {
}
},
"transformers": [{
"csv": {
"separator": ",",
"columnsOnFirstLine": true,
"columns": ["psq_id_from:integer",
"pro_id_to:integer",
"ordem:integer"]
}
},
{
"command": {
"command": "create edge PUBLICOU from (SELECT FROM index:Pesquisador.psq_id WHERE key = ${input.psq_id_from}) to (SELECT FROM index:Producao.pro_id where key = ${input.pro_id_to})",
"output": "edge"
}
}],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/dbUmMilhaoM03",
"dbUser": "admin",
"dbPassword": "admin",
"dbType": "graph",
"standardElementConstraints": false,
"batchCommit": 1000,
"classes": [{
"name": "PUBLICOU",
"extends": "E"
}]
}
}
}
When I execute the oetl command, the result is:
root@teste:/opt/orientdb_226/bin# ./oetl.sh /opt/orientdb_226/scripts_orientdb/Db1Milhao/metodo03/a10a_psq_publicou_pro.json >> log_m03
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigurationException: Error on creating ETL processor
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:225)
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:176)
at com.orientechnologies.orient.etl.OETLProcessor.parseConfigAndParameters(OETLProcessor.java:144)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:108)
Caused by: com.orientechnologies.orient.etl.loader.OLoaderException: unable to manage remote db without server admin credentials
at com.orientechnologies.orient.etl.loader.OOrientDBLoader.manageRemoteDatabase(OOrientDBLoader.java:447)
at com.orientechnologies.orient.etl.loader.OOrientDBLoader.configure(OOrientDBLoader.java:391)
at com.orientechnologies.orient.etl.OETLProcessor.configureComponent(OETLProcessor.java:448)
at com.orientechnologies.orient.etl.OETLProcessor.configureLoader(OETLProcessor.java:262)
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:209)
... 3 more
When I execute with OrientDb 2.1 the result is:
Exception in thread "main" com.orientechnologies.orient.etl.OETLProcessHaltedException: com.orientechnologies.orient.core.exception.OCommandExecutionException: Source vertex '#-1:-1' not exists
But the indexes exist:
Name                   Type        Class           Properties   Engine  Actions
Atuacao.atu_id         UNIQUE      Atuacao         [atu_id]     SBTREE
dictionary             DICTIONARY                  [undefined]  SBTREE
Instituicao.ins_id     UNIQUE      Instituicao     [ins_id]     SBTREE
ORole.name             UNIQUE      ORole           [name]       SBTREE
OUser.name             UNIQUE      OUser           [name]       SBTREE
Pais.pai_id            UNIQUE      Pais            [pai_id]     SBTREE
Pesquisador.psq_id     UNIQUE      Pesquisador     [psq_id]     SBTREE
Producao.pro_id        UNIQUE      Producao        [pro_id]     SBTREE
Publicacao.pub_id      UNIQUE      Publicacao      [pub_id]     SBTREE
TipoPublicacao.tpu_id  UNIQUE      TipoPublicacao  [tpu_id]     SBTREE
Is this an OrientDB bug?
Try this as your command:
"command": "create edge PUBLICOU from (SELECT expand(rid) FROM index:Pesquisador.psq_id WHERE key = ${input.psq_id_from}) to (SELECT expand(rid) FROM index:Producao.pro_id where key = ${input.pro_id_to})"
This should work because, when you select from an index, the RID of the matching record is in the rid property.
Or, even better, you can select directly from the class instead of the index:
create edge PUBLICOU from (SELECT FROM Pesquisador WHERE psq_id = ${input.psq_id_from}) to (SELECT FROM Producao where pro_id = ${input.pro_id_to})
This way the query uses the indexes as well.
Ivan
I'm still playing with OrientDB.
Now I'm trying the schema functionalities, that look awesome :-)
I have two data files: joinA.txt and joinB.txt, which I used to populate a database with the following schema (the content of the two files is at the end of the post):
CREATE CLASS Employee EXTENDS V;
CREATE PROPERTY Employee.eid Integer;
CREATE PROPERTY Employee.name String;
CREATE PROPERTY Employee.eage Short;
CREATE INDEX Employee.eid unique_hash_index;
CREATE CLASS ExtendedProfile EXTENDS V;
CREATE CLASS XYZProfile EXTENDS ExtendedProfile;
CREATE PROPERTY XYZProfile.textual String;
-- SameAs can only connect Employees to ExtendedProfile
CREATE CLASS SameAs EXTENDS E; -- same employee across many tables
CREATE PROPERTY SameAs.out LINK ExtendedProfile;
CREATE PROPERTY SameAs.In LINK Employee;
The JSONs I gave to the ETL tool are, for JoinA:
{
"source": { "file": {"path": "the_path"}},
"extractor": {"csv": {
"separator": " ",
"columns": [
"eid:Integer",
"name:String",
"eage:Short"
]
}
},
"transformers": [
{"vertex": {"class": "Employee", "skipDuplicates": true}}
]
,"loader": {
"orientdb": {
"dbURL": "plocal:thepath",
"dbType": "graph",
"useLightweightEdges": false
}
}
}
and for JoinB:
{
"source": { "file": {"path": "thepath"}},
"extractor": {"csv": {
"separator": " ",
"columnsOnFirstLine": false,
"quote": "\"",
"columns": [
"id:String",
"textual:String"
]
}
},
"transformers": [
{"vertex": {"class": "XYZProfile", "skipDuplicates": true}},
{ "edge": { "class": "SameAs",
"direction": "out",
"joinFieldName": "id",
"lookup":"Employee.eid",
"unresolvedLinkAction":"ERROR"}}
],
"loader": {
"orientdb": {
"dbURL": "path",
"dbUser": "root",
"dbPassword": "pwd",
"dbType": "graph",
"useLightweightEdges": false}
}
}
Now, the problem is that when I run select expand(both()) from Employee I get the edges in the column out_SameAs, while when I run select expand(both()) from XYZProfile I get nothing.
This is weird, since the first query told me that the @CLASS pointed to by the edges is XYZProfile.
Does anybody know what's wrong with my example?
Cheers,
Alberto
JoinA:
1 A 10
2 B 14
3 C 22
JoinB:
1 i0
1 i1
2 i2
Check your JSON file; I think it contains an error. You forgot to put [] at the beginning and end of the JSON file.
It was actually my fault.
The line CREATE PROPERTY SameAs.In LINK Employee; was the problem: In should have been all lowercase (in), as pointed out here.
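Because this kind of casing slip is easy to miss, a quick check over the schema script can catch it before any data is loaded. The snippet below is a rough sketch only: the function name `miscased_edge_properties` is made up, it relies on simple regular expressions rather than a real SQL parser, and it only looks at classes that extend E directly.

```python
import re

# Classes declared as direct subclasses of E (the base edge class).
CLASS_RE = re.compile(r"CREATE\s+CLASS\s+(\w+)\s+EXTENDS\s+E\b", re.IGNORECASE)
# Property declarations of the form Class.property.
PROP_RE = re.compile(r"CREATE\s+PROPERTY\s+(\w+)\.(\w+)", re.IGNORECASE)


def miscased_edge_properties(script: str) -> list:
    """Return Class.property names where in/out is not written in lowercase."""
    edge_classes = set(CLASS_RE.findall(script))
    problems = []
    for cls, prop in PROP_RE.findall(script):
        if cls in edge_classes and prop.lower() in ("in", "out") and prop != prop.lower():
            problems.append(f"{cls}.{prop}")
    return problems


script = """
CREATE CLASS Employee EXTENDS V;
CREATE CLASS SameAs EXTENDS E;
CREATE PROPERTY SameAs.out LINK ExtendedProfile;
CREATE PROPERTY SameAs.In LINK Employee;
"""
print(miscased_edge_properties(script))
```

Run against the schema from the question, it flags only SameAs.In.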
I have 3 CSV files to load in an OrientDB graph.
People
Product
Purchases
People.csv is like
person_id;name
1;francesco
2;luca
Product.csv is like
product_id;product_name
101;apple
102;banana
Purchases.csv is like
person_id;product_id;avg_price
1;101;$1.10
2;101;$1.08
1;102;$5.34
First I load all the people and the products with 2 different ETL jobs.
Each job loads vertices.
How can I periodically load just the edges using the OrientDB ETL as people buy new products?
All the transformers, and particularly EDGE, output an OrientVertex, which can only be INSERTed by the loader step.
(The EDGE transformer adds edge properties to the vertex, but the actual action is an INSERT of the vertex.) Is there a way to update a vertex using the ETL?
Regards,
Francesco
An ETL JSON configuration with these transformers should import the "Purchase" edges from Purchases.csv and update the avg_price of each purchased product:
"transformers": [
{ "merge": { "joinFieldName": "product_id", "lookup": "Product.id" } },
{ "vertex": {"class": "Product", "skipDuplicates": true} },
{ "edge": { "class": "Purchase",
"joinFieldName": "person_id",
"lookup": "Person.id",
"direction": "in"
}
},
{ "field": { "fieldNames": ["person_id", "product_id"], "operation": "remove" } }
]
Class and attribute names ("Product.id", "Person", etc.) may differ depending on your DB schema.
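For completeness, here is a sketch of how those transformers might sit inside a full configuration file, assembled in Python and dumped as JSON. The file path and dbURL are placeholders, and the ";" separator is taken from the CSV samples in the question. The source, extractor and loader sections only reuse keys that appear in the other configurations on this page. The whole thing is untested against a real database.

```python
import json

# Transformers exactly as given in the answer above.
transformers = [
    {"merge": {"joinFieldName": "product_id", "lookup": "Product.id"}},
    {"vertex": {"class": "Product", "skipDuplicates": True}},
    {"edge": {"class": "Purchase",
              "joinFieldName": "person_id",
              "lookup": "Person.id",
              "direction": "in"}},
    {"field": {"fieldNames": ["person_id", "product_id"], "operation": "remove"}},
]

config = {
    # Placeholder path: point this at the purchases file to load.
    "source": {"file": {"path": "/path/to/Purchases.csv"}},
    # The sample files use ';' as separator and have a header line.
    "extractor": {"csv": {"separator": ";", "columnsOnFirstLine": True}},
    "transformers": transformers,
    # Placeholder database URL.
    "loader": {"orientdb": {"dbURL": "plocal:/path/to/db", "dbType": "graph"}},
}

text = json.dumps(config, indent=2)
print(text)
```

Writing the output to a file gives a configuration that can be passed to oetl.sh each time a new purchases file arrives.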