In my OrientDB 2.0.7 database CU242176 there is a class M_PERM with the following fields:
PERM_DESC: string;
PERM_ID: integer not null;
PERM_NAME: string.
In my DB2 9.1 database CU242176 there is a table M_PERM with the same structure, containing 14 rows. I imported the data with the OrientDB-ETL module. There were no errors, but no data ended up in the class, even though the class and the index on PERM_ID were created.
Here is my config:
{
"config":{
"log": "debug"
},
"extractor" : {
"jdbc":
{ "driver": "com.ibm.db2.jcc.DB2Driver",
"url": "jdbc:db2://ITS-C:50000/CU242176",
"userName": "metr",
"userPassword": "metr1",
"query": "select PERM_DESC,PERM_ID,PERM_NAME from METR.M_PERM"
}
},
"transformers":[
],
"loader" : {
"orientdb": {
"dbURL": "plocal:c:/Program Files/orientdb-community-2.0.7/databases/CU242176",
"dbUser": "admin",
"dbPassword": "admin",
"dbAutoCreate": false,
"standardElementConstraints": false,
"tx":true,
"wal":false,
"batchCommit":1000,
"dbType": "document",
"classes":[{"name": "M_PERM"}],
"indexes": [{"class":"M_PERM", "fields":["PERM_ID:integer"], "type":"UNIQUE" }]
}
}
}
Log of the executed command (oetl config_Import_M_PERM_JDBC.json):
OrientDB etl v.2.0.7 (build #BUILD#) www.orientechnologies.com
[orientdb] DEBUG Opening database 'plocal:c:/Program Files/orientdb-community-2.0.7/databases/CU242176'...
2015-04-29 14:39:34:562 WARNING {db=CU242176} segment file 'database.ocf' was not closed correctly last time [OSingleFileSegment]
BEGIN ETL PROCESSOR
[orientdb] DEBUG orientdb: found 0 documents in class 'null'
END ETL PROCESSOR
extracted 29 records (0 records/sec) - 29 records -> loaded 14 documents (0 documents/sec) Total time: 159ms [0 warnings, 0 errors]
How do I resolve this issue so that the 14 rows actually end up in my class?
Instead of:
"classes": [{"name": "M_PERM"}],
use:
"class": "M_PERM"
I can't see this documented anywhere but it worked for me.
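Applied to the loader section of the config above, the change looks roughly like this (a sketch; only the class/classes line differs from the original):

"loader" : {
  "orientdb": {
    "dbURL": "plocal:c:/Program Files/orientdb-community-2.0.7/databases/CU242176",
    "dbUser": "admin",
    "dbPassword": "admin",
    "dbAutoCreate": false,
    "standardElementConstraints": false,
    "tx": true,
    "wal": false,
    "batchCommit": 1000,
    "dbType": "document",
    "class": "M_PERM",
    "indexes": [{"class": "M_PERM", "fields": ["PERM_ID:integer"], "type": "UNIQUE"}]
  }
}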
I want to synchronize 3 tables from a PostgreSQL database to a self-hosted Elasticsearch instance, and to do so I use PGSync.
To build this stack, I followed this tutorial.
When I start the Docker containers everything works well (except for some errors in pgsync, but that's expected since the tables don't exist yet). After that, I restore my database from a dump (the tables have approximately 30,000, 9,000,000 and 13,000,000 rows respectively). After the restore, PGSync detects the new rows in the database and syncs them to Elasticsearch.
My problem is that after that first synchronization, PGSync keeps detecting new rows:
Polling db cardpricetrackerprod: 61 item(s)
Polling db cardpricetrackerprod: 61 item(s)
but the synchronization is never performed.
Here is what my schema looks like:
[
{
"database": "mydb",
"index": "elastic-index-first-table",
"nodes": {
"table": "first_table",
"schema": "public",
"columns": [
"id",
...
]
}
},
{
"database": "mydb",
"index": "elastic-index-second-table",
"nodes": {
"table": "second_table",
"schema": "public",
"columns": [
"id",
...
]
}
},
{
"database": "mydb",
"index": "elastic-index-third-table",
"nodes": {
"table": "third_table",
"schema": "public",
"columns": [
"id",
...
]
}
}
]
Have I missed a configuration step?
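For reference, with a schema file like this, the stack is typically initialized once per schema and then kept running in daemon mode so that changes keep being applied. A sketch of the standard PGSync command line; the file name schema.json is a placeholder and the flags assume the stock pgsync CLI:

bootstrap --config schema.json
pgsync --config schema.json --daemon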
My goal is to import 25M edges into a graph which has about 50M vertices. Target time:
The current import speed is ~150 edges/sec. The speed over a remote connection was about 100 edges/sec.
extracted 20,694,336 rows (171 rows/sec) - 20,694,336 rows -> loaded 20,691,830 vertices (171 vertices/sec) Total time: 35989762ms [0 warnings, 4 errors]
extracted 20,694,558 rows (156 rows/sec) - 20,694,558 rows -> loaded 20,692,053 vertices (156 vertices/sec) Total time: 35991185ms [0 warnings, 4 errors]
extracted 20,694,745 rows (147 rows/sec) - 20,694,746 rows -> loaded 20,692,240 vertices (147 vertices/sec) Total time: 35992453ms [0 warnings, 4 errors]
extracted 20,694,973 rows (163 rows/sec) - 20,694,973 rows -> loaded 20,692,467 vertices (162 vertices/sec) Total time: 35993851ms [0 warnings, 4 errors]
extracted 20,695,179 rows (145 rows/sec) - 20,695,179 rows -> loaded 20,692,673 vertices (145 vertices/sec) Total time: 35995262ms [0 warnings, 4 errors]
I tried to enable parallel in the ETL config, but it looks like it is completely broken in OrientDB 2.2.12 (an inconsistency with the multi-threading changes in 2.1?) and gives me nothing but the 4 errors in the log above. Naive parallelism (running 2+ ETL processes) is also impossible with a plocal connection.
My config:
{
"config": {
"log": "info",
"parallel": true
},
"source": {
"input": {}
},
"extractor": {
"row": {
"multiLine": false
}
},
"transformers": [
{
"code": {
"language": "Javascript",
"code": "(new com.orientechnologies.orient.core.record.impl.ODocument()).fromJSON(input);"
}
},
{
"merge": {
"joinFieldName": "_ref",
"lookup": "Company._ref"
}
},
{
"vertex": {
"class": "Company",
"skipDuplicates": true
}
},
{
"edge": {
"joinFieldName": "with_id",
"lookup": "Person._ref",
"direction": "in",
"class": "Stakeholder",
"edgeFields": {
"_ref": "${input._ref}",
"value_of_share": "${input.value_of_share}"
},
"skipDuplicates": true,
"unresolvedLinkAction": "ERROR"
}
},
{
"field": {
"fieldNames": [
"with_id",
"with_to",
"_type",
"value_of_share"
],
"operation": "remove"
}
}
],
"loader": {
"orientdb": {
"dbURL": "plocal:/mnt/disks/orientdb/orientdb-2.2.12/databases/df",
"dbUser": "admin",
"dbPassword": "admin",
"dbAutoDropIfExists": false,
"dbAutoCreate": false,
"standardElementConstraints": false,
"tx": false,
"wal": false,
"batchCommit": 1000,
"dbType": "graph",
"classes": [
{
"name": "Company",
"extends": "V"
},
{
"name": "Person",
"extends": "V"
},
{
"name": "Stakeholder",
"extends": "E"
}
]
}
}
}
Data sample:
{"_ref":"1072308006473","with_to":"person","with_id":"010703814320","_type":"is.stakeholder","value_of_share":10000.0} {"_ref":"1075837000095","with_to":"person","with_id":"583600656732","_type":"is.stakeholder","value_of_share":15925.0} {"_ref":"1075837000095","with_to":"person","with_id":"583600851010","_type":"is.stakeholder","value_of_share":33150.0}
The server's specs are: a Google Cloud instance with PD-SSD, 6 CPUs and 18 GB RAM.
By the way, on the same server I managed to get ~3k/sec when importing vertices over a remote connection (still too slow, but acceptable for my current dataset).
And the question: is there any reliable way to increase the import speed to, let's say, 10k inserts per second, or at least 5k? I would not like to turn off the indexes; it is still millions of records, not billions.
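For what it's worth, the merge and edge transformers above look every record up by Company._ref and Person._ref, so the cost of those lookups dominates at this scale. The loader can declare indexes on those fields using the same "indexes" syntax as in the first ETL config on this page; a sketch, not verified against this dataset, and NOTUNIQUE on _ref is an assumption:

"indexes": [
  { "class": "Company", "fields": ["_ref:string"], "type": "NOTUNIQUE" },
  { "class": "Person", "fields": ["_ref:string"], "type": "NOTUNIQUE" }
]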
UPDATE
After a few hours the performance continues to deteriorate.
extracted 23,146,912 rows (56 rows/sec) - 23,146,912 rows -> loaded 23,144,406 vertices (56 vertices/sec) Total time: 60886967ms [0 warnings, 4 errors]
extracted 23,146,981 rows (69 rows/sec) - 23,146,981 rows -> loaded 23,144,475 vertices (69 vertices/sec) Total time: 60887967ms [0 warnings, 4 errors]
extracted 23,147,075 rows (39 rows/sec) - 23,147,075 rows -> loaded 23,144,570 vertices (39 vertices/sec) Total time: 60890356ms [0 warnings, 4 errors]
I tried to import an edge with oetl using OrientDB 2.2.4/2.2.5/2.2.6. In all versions the error is the same. If I use version 2.1 the error doesn't occur.
My JSON file is:
{
"config": {
"log": "info",
"parallel": false
},
"source": {
"file": {
"path": "/opt/orientdb/csvs_1milhao/metodo03/a10a.csv"
}
},
"extractor": {
"row": {
}
},
"transformers": [{
"csv": {
"separator": ",",
"columnsOnFirstLine": true,
"columns": ["psq_id_from:integer",
"pro_id_to:integer",
"ordem:integer"]
}
},
{
"command": {
"command": "create edge PUBLICOU from (SELECT FROM index:Pesquisador.psq_id WHERE key = ${input.psq_id_from}) to (SELECT FROM index:Producao.pro_id where key = ${input.pro_id_to})",
"output": "edge"
}
}],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/dbUmMilhaoM03",
"dbUser": "admin",
"dbPassword": "admin",
"dbURL": "remote:localhost/dbUmMilhaoM03",
"dbType": "graph",
"standardElementConstraints": false,
"batchCommit": 1000,
"classes": [{
"name": "PUBLICOU",
"extends": "E"
}]
}
}
}
When I execute the oetl command, the result is:
root@teste:/opt/orientdb_226/bin# ./oetl.sh /opt/orientdb_226/scripts_orientdb/Db1Milhao/metodo03/a10a_psq_publicou_pro.json >> log_m03
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigurationException: Error on creating ETL processor
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:225)
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:176)
at com.orientechnologies.orient.etl.OETLProcessor.parseConfigAndParameters(OETLProcessor.java:144)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:108)
Caused by: com.orientechnologies.orient.etl.loader.OLoaderException: unable to manage remote db without server admin credentials
at com.orientechnologies.orient.etl.loader.OOrientDBLoader.manageRemoteDatabase(OOrientDBLoader.java:447)
at com.orientechnologies.orient.etl.loader.OOrientDBLoader.configure(OOrientDBLoader.java:391)
at com.orientechnologies.orient.etl.OETLProcessor.configureComponent(OETLProcessor.java:448)
at com.orientechnologies.orient.etl.OETLProcessor.configureLoader(OETLProcessor.java:262)
at com.orientechnologies.orient.etl.OETLProcessor.parse(OETLProcessor.java:209)
... 3 more
When I execute it with OrientDB 2.1 the result is:
Exception in thread "main" com.orientechnologies.orient.etl.OETLProcessHaltedException: com.orientechnologies.orient.core.exception.OCommandExecutionException: Source vertex '#-1:-1' not exists
But the indexes exist:
Name Type Class Properties Engine
Atuacao.atu_id UNIQUE Atuacao [atu_id] SBTREE
dictionary DICTIONARY [undefined] SBTREE
Instituicao.ins_id UNIQUE Instituicao [ins_id] SBTREE
ORole.name UNIQUE ORole [name] SBTREE
OUser.name UNIQUE OUser [name] SBTREE
Pais.pai_id UNIQUE Pais [pai_id] SBTREE
Pesquisador.psq_id UNIQUE Pesquisador [psq_id] SBTREE
Producao.pro_id UNIQUE Producao [pro_id] SBTREE
Publicacao.pub_id UNIQUE Publicacao [pub_id] SBTREE
TipoPublicacao.tpu_id UNIQUE TipoPublicacao [tpu_id] SBTREE
Is this an OrientDB bug?
Try this as your command:
"command": "create edge PUBLICOU from (SELECT expand(rid) FROM index:Pesquisador.psq_id WHERE key = ${input.psq_id_from}) to (SELECT expand(rid) FROM index:Producao.pro_id where key = ${input.pro_id_to})"
This should work because when you select from an index, the rid associated with the matching record is stored in the property rid, so it has to be expanded.
Or, even better, you can select directly from the class instead of the index:
create edge PUBLICOU from (SELECT FROM Pesquisador WHERE psq_id = ${input.psq_id_from}) to (SELECT FROM Producao where pro_id = ${input.pro_id_to})
This way the indexes are still used.
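Plugged back into the ETL config from the question, the command transformer then looks roughly like this (a sketch that only swaps in the class-based query; everything else is unchanged):

{
  "command": {
    "command": "create edge PUBLICOU from (SELECT FROM Pesquisador WHERE psq_id = ${input.psq_id_from}) to (SELECT FROM Producao WHERE pro_id = ${input.pro_id_to})",
    "output": "edge"
  }
}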
Ivan
I have installed OrientDB V 2.1.11 on a Mac - El Capitan.
I am following the instructions as per the OrientDB documentation.
http://orientdb.com/docs/last/Import-from-DBMS.html
When I run oetl.sh I get a null pointer exception. I assume it is connecting to the Oracle instance.
Json config:
{
"config": {
"log": "error"
},
"extractor" : {
"jdbc": { "driver": "oracle.jdbc.OracleDriver",
"url": "jdbc:oracle:thin:#<dbUrl>:1521:<dbSid>",
"userName": "username",
"userPassword": "password",
"query": "select sold_to_party_nbr from customer" }
},
"transformers" : [
{ "vertex": { "class": "Company"} }
],
"loader" : {
"orientdb": {
"dbURL": "plocal:../databases/BetterDemo",
"dbUser": "admin",
"dbPassword": "admin",
"dbAutoCreate": true
}
}
}
Error:
sharon.oconnor$ ./oetl.sh ../loadFromOracle.json
OrientDB etl v.2.1.11 (build UNKNOWN#rddb5c0b4761473ae9549c3ac94871ab56ef5af2c; 2016-02-15 10:49:20+0000) www.orientdb.com
Exception in thread "main" java.lang.NullPointerException
at com.orientechnologies.orient.etl.transformer.OVertexTransformer.begin(OVertexTransformer.java:53)
at com.orientechnologies.orient.etl.OETLPipeline.begin(OETLPipeline.java:72)
at com.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OETLProcessor.java:465)
at com.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.java:269)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:116)
The data in Oracle looks like this:
0000281305
0000281362
0000281378
0000281381
0000281519
0000281524
0000281563
0000281566
0000281579
0000281582
0000281623
0000281633
I have created a Company class with a sold_to_party_nbr string property in the BetterDemo database.
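For reference, a class and property like that are typically created with something along these lines (a sketch in OrientDB SQL; that Company extends V is an assumption consistent with the vertex transformer in the config):

CREATE CLASS Company EXTENDS V
CREATE PROPERTY Company.sold_to_party_nbr STRING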
How can I debug further to figure out what is wrong?
I created an ES index (with the MongoDB river plugin) with the following information:
{
"type": "mongodb",
"mongodb": {
"db": "mydatabase",
"collection": "Users"
},
"index": {
"name": "users",
"type": "user"
}
}
When I insert a simple object like:
{
"name": "Joe",
"surname": "Black"
}
everything works without a problem (I can see the data using the ES Head web interface).
But when I insert a bigger object, it doesn't get indexed:
{
"object": {
"text": "Let's do it again!",
"boolTest": false
},
"type": "coolType",
"tags": [
""
],
"subObject1": {
"count": 0,
"last3": [],
"array": []
},
"subObject2": {
"count": 0,
"last3": [],
"array": []
},
"subObject3": {
"count": 0,
"last3": [],
"array": []
},
"usrID": "5141a5a4d8f3a79c09000001",
"created": Date(1363527664000),
"lastUpdate": Date(1363527664000)
}
Where could the problem be?
Thank you for your help!
EDIT: This is the error from the ES console:
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [stream] tried to parse as object, but got EOF, has a concrete value been provided to it?
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:486)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:430)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:318)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:431)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
[2013-03-20 10:35:05,697][WARN ][org.elasticsearch.river.mongodb.MongoDBRiver$Indexer] failed to execute
failure in bulk execution: [0]: index [stream], type [stream], id [514982c9b7f3bfbdb488ca81], message [MapperParsingException[object mapping for [stream] tried to parse as object, but got EOF, has a concrete value been provided to it?]]
[2013-03-20 10:35:05,698][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver$Indexer] Indexed 1 documents, 1 insertions, 0 updates, 0 deletions, 0 documents per second
Which version of the MongoDB river are you using?
Please look at issue #26 [1]. It contains examples of indexing large JSON documents without any issue.
If you can still reproduce the issue, please provide more details: river settings, MongoDB (version, specific settings), Elasticsearch (version, specific settings).
https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/26