Using OrientDB ETL to load file of Edges - orientdb

I am running OrientDB 2.1.2 from the AWS Marketplace AMI. I have already used ETL to load up two sets of vertices. Now I'm trying to load up a file of Edges into OrientDB with ETL and getting: IllegalArgumentException: destination vertex is null. I've looked at the documentation and some other examples on the net and my ETL config looks correct to me. I was hoping someone might have an idea.
My two V subclasses are:
Author (authorId, authGivenName, authSurname) and an index on authorId
Abstract (abstractId) with an index on abstractId
My E subclass
Authored - no properties or indices defined on it
My Edge file
(authorId, abstractId) - \t separated fields with one header line with those names
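For reference, a minimal sample of such a file (using the IDs that appear in the debug output) would look like:

```
authorId	abstractId
9-s2.0-10039026700	2-s2.0-29144536313
```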
My ETL config:
{
"config": { "log":"debug"},
"source" : { "file": { "path":"/root/poc1_Datasets/authAbstractEdge1.tsv" }},
"extractor":{ "row":{} },
"transformers":[
{ "csv":{ "separator": "\t" } },
{ "merge": {
"joinFieldName": "authorId",
"lookup":"Author.authorId"
} },
{ "vertex":{ "class":"Author" } },
{ "edge" : {
"class": "Authored",
"joinFieldName": "abstractId",
"lookup": "Abstract.abstractId",
"direction": "out"
}}
],
"loader":{
"orientdb":{
"dbURL":"remote:localhost/DataSpine1",
"dbType":"graph",
"wal":false,
"tx":false
} }
}
When I run ETL with this config and file I get:
OrientDB etl v.2.1.2 (build #BUILD#) www.orientdb.com
BEGIN ETL PROCESSOR
[file] DEBUG Reading from file /root/poc1_Datasets/authAbstractEdge1.tsv
[0:csv] DEBUG Transformer input: authorId abstractId
[0:csv] DEBUG parsing=authorId abstractId
[0:csv] DEBUG Transformer output: null
2016-06-09 12:15:04:088 WARNI Transformer [csv] returned null, skip rest of pipeline execution [OETLPipeline]
[1:csv] DEBUG Transformer input: 9-s2.0-10039026700 2-s2.0-29144536313
[1:csv] DEBUG parsing=9-s2.0-10039026700 2-s2.0-29144536313
[1:csv] DEBUG document={authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:csv] DEBUG Transformer output: {authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG Transformer input: {authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG joinValue=9-s2.0-10039026700, lookupResult=Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:merge] DEBUG merged record Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2 with found record={authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG Transformer output: Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:vertex] DEBUG Transformer input: Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:vertex] DEBUG Transformer output: v(Author)[#12:10046021]
[1:edge] DEBUG Transformer input: v(Author)[#12:10046021]
[1:edge] DEBUG joinCurrentValue=2-s2.0-29144536313, lookupResult=Abstract#13:16626366{abstractId:2-s2.0-29144536313} v1
Error in Pipeline execution: java.lang.IllegalArgumentException: destination vertex is null
java.lang.IllegalArgumentException: destination vertex is null
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:888)
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:832)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.createEdge(OEdgeTransformer.java:188)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:117)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:114)
at com.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OETLProcessor.java:487)
at com.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.java:291)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:161)
ETL process halted: com.orientechnologies.orient.etl.OETLProcessHaltedException: java.lang.IllegalArgumentException: destination vertex is null
As I look at the debug, it appears that the MERGE successfully found the Author vertex and the EDGE found the Abstract Vertex successfully (based on seeing the RIDs in the output). I'm stumped as to why I'm getting the Exception. Thanks in advance for any pointers.

Have you tried whether Teleporter, the new ETL tool in version 2.2, solves this problem?
There is a description of the new ETL product at this link.

I actually discovered that the ETL loader in OrientDB version 2.2.2 seems to have solved this issue. (Note: version 2.2.0 still had the same issue)
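If you are stuck on a 2.1.x release, one workaround that may be worth trying is the edge transformer's unresolvedLinkAction option, which controls what happens when a lookup fails to resolve a vertex. This is only a sketch (the rest of the config stays as above); SKIP drops the edge instead of halting the pipeline, so rows that trigger the bug are lost rather than fatal:

```json
{ "edge" : {
    "class": "Authored",
    "joinFieldName": "abstractId",
    "lookup": "Abstract.abstractId",
    "direction": "out",
    "unresolvedLinkAction": "SKIP"
}}
```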

Related

org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"

I am trying to use the confluent Kafka s3 connector using confluent-4.1.1.
s3-sink
"value.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
When I run Kafka connectors for the s3 sink, I get this error message:
ERROR WorkerSinkTask{id=singular-s3-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.DataException: Invalid JSON for array default value: "null"
at io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:468)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My Schema contains only 1 array type field and its schema is like this
{"name":"item_id","type":{"type":"array","items":["null","string"]},"default":[]}
I am able to see the deserialized message using the kafka-avro-console-consumer command. I have seen a similar question, but in that case the Avro serializer was used for the key as well.
./confluent-4.1.1/bin/kafka-avro-console-consumer --topic singular_custom_postback --bootstrap-server localhost:9092 -max-messages 2
"item_id":[{"string":"15552"},{"string":"37810"},{"string":"38061"}]
"item_id":[]
I cannot put the entire output I get from the console consumer as it contains sensitive user information, so I have added the only array type field in my schema.
Thanks in advance.
The method io.confluent.connect.avro.AvroData.defaultValueFromAvro(AvroData.java:1649) is called to convert the Avro schema of the message you read into the Connect sink's internal schema. I believe it is not related to the data of your message. That is why the AbstractKafkaAvroDeserializer can successfully deserialize your message (e.g. via kafka-avro-console-consumer): your message is valid Avro. The above exception may occur if your default value is null while null is not a valid value for your field, e.g.
{
"name":"item_id",
"type":{
"type":"array",
"items":[
"string"
]
},
"default": null
}
I would propose remotely debugging Connect to see what exactly is failing.
Same problem as the question that you have linked to.
In the source code, you can see this condition.
case ARRAY: {
if (!jsonValue.isArray()) {
throw new DataException("Invalid JSON for array default value: " + jsonValue.toString());
}
The exception can be thrown when the schema type is defined as type:"array" but the payload itself has a null value (or any other value type) rather than an actual array, regardless of what you have defined as your schema default value. The default is only applied when the items element is missing entirely, not when "items":null.
Other than that, I would suggest a schema like so, i.e. a record object, not just a named array, with a default of an empty array, not null.
{
"type" : "record",
"name" : "Items",
"namespace" : "com.example.avro",
"fields" : [ {
"name" : "item_id",
"type" : {
"type" : "array",
"items" : [ "null", "string" ]
},
"default": []
} ]
}
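As a general Avro rule (worth double-checking against the specification), the default of an array-typed field must itself be a JSON array; only the elements may be null when the items are a union starting with "null". Contrasting the two cases:

```
invalid (default is null, which does not parse as an array):
  {"name":"item_id","type":{"type":"array","items":["null","string"]},"default":null}

valid (default is an empty array, matching the array type):
  {"name":"item_id","type":{"type":"array","items":["null","string"]},"default":[]}
```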

StreamEnabled table property causes Serverless failure

I'm using Serverless to deploy my AWS cloudformation stack. On one of my tables, I enable streams via "StreamEnabled": true. When this is enabled, I get an error on deployment: Encountered unsupported property StreamEnabled.
If I remove the property, I get a validation exception: ValidationException: Stream StreamEnabled was null.
I found a git issue that was addressed and apparently fixed (here), but after upgrading to v1.3, I'm still getting the same errors on deployment.
Can anyone lend insight as to what the issue may be?
It is enabled by default. You can check it from the shell:
aws dynamodbstreams list-streams
{
"Streams": [
{
"TableName": "MyTableName-dev",
"StreamArn": "arn:aws:dynamodb:eu-west-2:0000000000000:table/MyTableName-dev/stream/2018-10-26T15:06:25.995",
"StreamLabel": "2018-10-26T15:06:25.995"
}
]
}
And:
aws dynamodbstreams describe-stream --stream-arn "arn:aws:dynamodb:eu-west-2:00000000000:table/MyTableName-dev/stream/2018-10-26T15:06:25.995"
{
"StreamDescription": {
"StreamLabel": "2018-10-26T15:06:25.995",
"StreamStatus": "ENABLED",
"TableName": "MyTableName-dev",
"Shards": [
{
"ShardId": "shardId-000000000000000-0000000f",
"SequenceNumberRange": {
"StartingSequenceNumber": "00000000000000000000000"
}
}
],
"CreationRequestDateTime": 1540566385.987,
"StreamArn": "arn:aws:dynamodb:eu-west-2:0000000000000000:table/MyTableName-dev/stream/2018-10-26T15:06:25.995",
"KeySchema": [
{
"KeyType": "HASH",
"AttributeName": "application_id"
}
],
"StreamViewType": "KEYS_ONLY"
}
}
This is not a solution, but having discovered that fact I realized that I don't actually have an issue.
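For context, the likely reason StreamEnabled is rejected is that CloudFormation's AWS::DynamoDB::Table resource only accepts StreamViewType under StreamSpecification; the stream is enabled simply by the block being present. A sketch of that fragment (resource name invented, other required table properties omitted):

```json
"MyTableName": {
  "Type": "AWS::DynamoDB::Table",
  "Properties": {
    "StreamSpecification": {
      "StreamViewType": "KEYS_ONLY"
    }
  }
}
```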

error in accessing tables from hive using apache drill

I am trying to read the data from my table abc, which is in Hive, using Drill. For that I have created a Hive storage plugin with the configuration below:
{
"type": "hive",
"enabled": true,
"configProps": {
"hive.metastore.uris": "thrift://<ip>:<port>",
"fs.default.name": "hdfs://<ip>:<port>/",
"hive.metastore.sasl.enabled": "false",
"hive.server2.enable.doAs": "true",
"hive.metastore.execute.setugi": "true"
}
}
With this I am able to see the databases in Hive, but when I try to access any table in a particular database
select * from hive.db.abc;
it throws the following error
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION
ERROR: From line 1, column 15 to line 1, column 18: Object 'abc' not
found within 'hive.db' SQL Query null [Error Id:
b6c56276-6255-4b5b-a600-746dbc2f3d67 on centos2.example.com:31010]
(org.apache.calcite.runtime.CalciteContextException) From line 1,
column 15 to line 1, column 18: Object 'abc' not found within
'hive.db' sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():62
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():423
org.apache.calcite.runtime.Resources$ExInstWithCause.ex():463
org.apache.calcite.sql.SqlUtil.newContextException():800
org.apache.calcite.sql.SqlUtil.newContextException():788
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():4703
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():127
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2972
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():267
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2957
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():267
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3216
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
org.apache.calcite.sql.SqlSelect.validate():226
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
org.apache.drill.exec.planner.sql.SqlConverter.validate():190
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():630
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():202
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():174
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():146
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():84
org.apache.drill.exec.work.foreman.Foreman.runSQL():567
org.apache.drill.exec.work.foreman.Foreman.run():264
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748 Caused By
(org.apache.calcite.sql.validate.SqlValidatorException) Object 'abc'
not found within 'hive.db'
sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
sun.reflect.NativeConstructorAccessorImpl.newInstance():62
sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
java.lang.reflect.Constructor.newInstance():423
org.apache.calcite.runtime.Resources$ExInstWithCause.ex():463
org.apache.calcite.runtime.Resources$ExInst.ex():572
org.apache.calcite.sql.SqlUtil.newContextException():800
org.apache.calcite.sql.SqlUtil.newContextException():788
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():4703
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():127
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2972
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():267
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2957
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():267
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3216
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
org.apache.calcite.sql.SqlSelect.validate():226
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
org.apache.drill.exec.planner.sql.SqlConverter.validate():190
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():630
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():202
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():174
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():146
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():84
org.apache.drill.exec.work.foreman.Foreman.runSQL():567
org.apache.drill.exec.work.foreman.Foreman.run():264
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
You should upgrade to a newer Hive version. For Drill 1.13 that is Hive 2.3.2: starting from Drill 1.13, Drill leverages version 2.3.2 of the Hive client [1].
Support for Hive 3.0 is upcoming [2].
Also, please follow the guide with the necessary Hive plugin configurations for your environment [3]. You can omit the "hive.metastore.sasl.enabled", "hive.server2.enable.doAs" and "hive.metastore.execute.setugi" properties, since you have specified the default values [4]. For "hive.metastore.uris" and "fs.default.name", specify the same values as in your hive-site.xml.
[1] https://drill.apache.org/docs/hive-storage-plugin
[2] https://issues.apache.org/jira/browse/DRILL-6604
[3] https://drill.apache.org/docs/hive-storage-plugin/#hive-remote-metastore-configuration
[4] https://github.com/apache/hive/blob/rel/release-2.3.2/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L824
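To narrow the problem down, it may also help to confirm what Drill actually sees before selecting from the table. These are standard Drill SQL statements ('db' and 'abc' are the names from the question):

```sql
SHOW DATABASES;                     -- the hive.* schemas should be listed
USE hive.db;
SHOW TABLES;                        -- 'abc' should appear here if the plugin sees it
SELECT * FROM hive.db.abc LIMIT 1;
```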

How to properly use context variables in OrientDB ETL configuration file?

Summary
Trying to learn about OrientDB ETL configuration json file.
Assuming a CSV file where:
each row is a single vertex
a 'class' column gives the intended class of the vertex
there are multiple classes for the vertices (Foo, Bar, Baz)
How do I set the class of the vertex to be the value of the 'class' column?
Efforts to Troubleshoot
I have spent a LOT of time in the OrientDB ETL documentation trying to solve this. I have tried many different combinations of let and block and code components. I have tried variable names like className and $className and ${classname}.
Current Results:
The code component is able to correctly print the value of 'className', so I know that it is being set correctly.
The vertex component isn't referencing the variable correctly, and consequently sets the class of each vertex to null.
Context
I have a freshly created database (PLOCAL GRAPH) on localhost called 'deleteme'.
I have an vertex CSV file (nodes.csv) that looks like this:
id,name,class
1,Jack,Foo
2,Jill,Bar
3,Gephri,Baz
And an ETL configuration file (test.json) that looks like this:
{
"config": {
"log": "DEBUG"
},
"source": {"file": {"path": "nodes.csv"}},
"extractor": {"csv": {}},
"transformers": [
{"block": {"let": {"name": "$className",
"value": "$input.class"}}},
{"code": {"language": "Javascript",
"code": "print(className + '\\n'); input;"}},
{"vertex": {"class": "$className"}}
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost:2424/deleteme",
"dbUser": "admin",
"dbPassword": "admin",
"dbType": "graph",
"tx": false,
"wal": false,
"batchCommit": 1000,
"classes": [
{"name": "Foo", "extends": "V"},
{"name": "Bar", "extends": "V"},
{"name": "Baz", "extends": "V"}
]
}
}
}
And when I run the ETL job, I have output that looks like this:
aj#host:~/bin/orientdb-community-2.1.13/bin$ ./oetl.sh test.json
OrientDB etl v.2.1.13 (build 2.1.x#r9bc1a54a4a62c4de555fc5360357f446f8d2bc84; 2016-03-14 17:00:05+0000) www.orientdb.com
BEGIN ETL PROCESSOR
[file] INFO Reading from file nodes.csv with encoding UTF-8
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Foo' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
+ extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 1001ms [0 warnings, 0 errors]
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Bar' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Baz' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[csv] DEBUG document={id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
Foo
[1:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer output: v(null)[#3:0]
[csv] DEBUG document={id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
Bar
[2:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer output: v(null)[#3:1]
[csv] DEBUG document={id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
Baz
[3:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer output: v(null)[#3:2]
END ETL PROCESSOR
+ extracted 3 rows (4 rows/sec) - 3 rows -> loaded 3 vertices (4 vertices/sec) Total time: 1684ms [0 warnings, 0 errors]
Oh, and what does DEBUG orientdb: found 0 vertices in class 'null' mean?
Try this. I wrestled with this for a while too, but the below setup worked for me.
Note that setting #class before the vertex transformer will initialize a Vertex with the proper class.
"transformers": [
{"block": {"let": {"name": "$className",
"value": "$input.class"}}},
{"code": {"language": "Javascript",
"code": "print(className + '\\n'); input;"}},
{ "field": {
"fieldName": "#class",
"expression": "$className"
}
},
{"vertex": {}}
]
To get your result, you could use ETL to import the data from the CSV into a class named "Generic".
Then, through a JS function, "separateClass()", create new classes, taking their names from the 'class' property imported from the CSV, and move the vertices from the Generic class into the new classes.
The JSON file:
{
"source": { "file": {"path": "data.csv"}},
"extractor": { "row": {}},
"begin": [
{ "let": { "name": "$className", "value": "Generic"} }
],
"transformers": [
{"csv": {
"separator": ",",
"nullValue": "NULL",
"columnsOnFirstLine": true,
"columns": [
"id:Integer",
"name:String",
"class:String"
]
}
},
{"vertex": {"class": "$className", "skipDuplicates": true}}
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/test",
"dbType": "graph"
}
}
}
After importing the data with ETL, create the function in JavaScript:
var g = orient.getGraphNoTx();
var queryResult= g.command("sql", "SELECT FROM Generic");
//example vertex fields: ID, NAME, CLASS
if (!queryResult.length) {
print("Empty");
} else {
//for each value create or insert in class
for (var i = 0; i < queryResult.length; i++) {
var className = queryResult[i].getProperty("class").toString();
//check if className has already been created
var countClass = g.command("sql","select from V where #class = '"+className+"'");
if (!countClass.length) {
g.command("sql","CREATE CLASS "+className+" extends V");
g.command("sql"," CREATE PROPERTY "+className+".id INTEGER");
g.command("sql"," CREATE PROPERTY "+className+".name STRING");
g.commit();
}
var id = queryResult[i].getProperty("id").toString();
var name = queryResult[i].getProperty("name").toString();
g.command("sql","INSERT INTO "+className+ " (id, name) VALUES ("+id+",'"+name+"')");
g.commit();
}
//remove class generic
g.command("sql","truncate class Generic unsafe");
}
The result should look like the one shown in the picture.

Neo4j: Change legacy index from exact to fulltext

In my Neo4j (2.1.1 Community Edition) database I have Lucene legacy index in place called node_auto_index:
GET http://localhost:7474/db/data/index/node/
{
"node_auto_index": {
"template": "http://localhost:7474/db/data/index/node/node_auto_index/{key}/{value}",
"provider": "lucene",
"type": "exact"
}
}
Now I would like to change the type from "exact" to "fulltext". How can I do that using REST? I tried the following approaches but neither of them worked:
DELETE and recreate
I tried to delete it first before recreating as "fulltext", but it is read-only:
DELETE http://localhost:7474/db/data/index/node/node_auto_index/node_auto_index
{
"message": "read only index",
"exception": "UnsupportedOperationException",
"fullname": "java.lang.UnsupportedOperationException",
"stacktrace": [
"org.neo4j.kernel.impl.coreapi.AbstractAutoIndexerImpl$ReadOnlyIndexToIndexAdapter.readOnlyIndex(AbstractAutoIndexerImpl.java:254)",
"org.neo4j.kernel.impl.coreapi.AbstractAutoIndexerImpl$ReadOnlyIndexToIndexAdapter.delete(AbstractAutoIndexerImpl.java:290)",
"org.neo4j.server.rest.web.DatabaseActions.removeNodeIndex(DatabaseActions.java:437)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.deleteNodeIndex(RestfulGraphDatabase.java:935)",
"java.lang.reflect.Method.invoke(Unknown Source)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"java.lang.Thread.run(Unknown Source)"
]
}
POST to replace
POST http://localhost:7474/db/data/index/node/
{
"name" : "node_auto_index",
"config" : {
"to_lower_case" : "true",
"type" : "fulltext",
"provider" : "lucene"
}
}
 
{
"message": "Supplied index configuration:\n{to_lower_case=true, type=fulltext, provider=lucene}\ndoesn't match stored config in a valid way:\n{provider=lucene, type=exact}\nfor 'node_auto_index'",
"exception": "IllegalArgumentException",
"fullname": "java.lang.IllegalArgumentException",
"stacktrace": [
"org.neo4j.kernel.impl.coreapi.IndexManagerImpl.assertConfigMatches(IndexManagerImpl.java:168)",
"org.neo4j.kernel.impl.coreapi.IndexManagerImpl.findIndexConfig(IndexManagerImpl.java:149)",
"org.neo4j.kernel.impl.coreapi.IndexManagerImpl.getOrCreateIndexConfig(IndexManagerImpl.java:209)",
"org.neo4j.kernel.impl.coreapi.IndexManagerImpl.getOrCreateNodeIndex(IndexManagerImpl.java:314)",
"org.neo4j.kernel.impl.coreapi.IndexManagerImpl.forNodes(IndexManagerImpl.java:302)",
"org.neo4j.server.rest.web.DatabaseActions.createNodeIndex(DatabaseActions.java:398)",
"org.neo4j.server.rest.web.RestfulGraphDatabase.jsonCreateNodeIndex(RestfulGraphDatabase.java:830)",
"java.lang.reflect.Method.invoke(Unknown Source)",
"org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
"java.lang.Thread.run(Unknown Source)"
]
}
For all future readers of this question:
I faced a similar situation and found a much cleaner approach to fix this. Instead of deleting the node_auto_index, try the following steps.
Open the shell / command line for the DB:
neo4j-sh (0)$ index --get-config node_auto_index
==> {
==>   "provider": "lucene",
==>   "type": "exact"
==> }
neo4j-sh (0)$ index --set-config node_auto_index type fulltext
==> INDEX CONFIGURATION CHANGED, INDEX DATA MAY BE INVALID
neo4j-sh (0)$ index --get-config node_auto_index
==> {
==>   "provider": "lucene",
==>   "type": "fulltext"
==> }
neo4j-sh (0)$
Worked perfectly fine for me. Hope this helps someone in need :-)
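Afterwards you can verify the change over REST using the same endpoint as in the question; the index should now report the new type (exact response formatting may vary):

```
GET http://localhost:7474/db/data/index/node/

{
  "node_auto_index": {
    "template": "http://localhost:7474/db/data/index/node/node_auto_index/{key}/{value}",
    "provider": "lucene",
    "type": "fulltext"
  }
}
```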
Neo4j does not allow deleting the auto indexes node_auto_index and relationship_auto_index, neither via REST nor via any other API.
However, there is a dirty trick to do the job. This trick will delete all auto and other legacy indexes; it does not touch the schema indexes. Be warned: it is a potentially dangerous operation, so make sure you have a valid backup in place. Stop the database and then do a
rm -rf data/graph.db/index*
Restart the database and all auto and legacy indexes are gone.