Only data from node 1 visible in a 2-node OrientDB cluster

I created a 2-node OrientDB cluster by following the steps below, but when the cluster runs in distributed mode, only the data present on one of the nodes is accessible. Can you please help me debug this issue? The OrientDB version is 2.2.6.
Steps involved:
Used plocal mode in the ETL tool and stored part of the data on node 1 and the other part on node 2. The data belongs to just one vertex class. (On checking from the console, the data has been ingested properly on each node.)
Then started both nodes in distributed mode; the data from only one machine is accessible.
The default-distributed-db-config.json file is specified below:
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "undefined",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "address": {
      "servers": [ "orientmaster" ]
    },
    "address_1": {
      "servers": [ "orientslave1" ]
    },
    "*": {
      "servers": [ "<NEW_NODE>" ]
    }
  }
}
Two clusters were created for the vertex class address, namely address and address_1. The data on machine orientslave1 is stored via the ETL tool into cluster address_1; similarly, the data on machine orientmaster is stored into cluster address. (I've ensured that the two cluster IDs are different at creation time.)
However, when these two machines are connected together in distributed mode, only the data in cluster address_1 is visible.
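For reference, the cluster map allows more than one server per cluster. Below is a sketch of the clusters section under the assumption that both nodes should hold a replica of both clusters (replication rather than sharding); I have not verified whether this changes what is visible:
"clusters": {
  "internal": {
  },
  "address": {
    "servers": [ "orientmaster", "orientslave1" ]
  },
  "address_1": {
    "servers": [ "orientslave1", "orientmaster" ]
  },
  "*": {
    "servers": [ "<NEW_NODE>" ]
  }
}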
The ETL JSON is attached below:
{
  "source": { "file": { "path": "/home/ubuntu/labvolume1/DataStorage/geo1_5lacs.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": [ "place:string" ] } },
  "transformers": [
    { "vertex": { "class": "ADDRESS", "skipDuplicates": true } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/home/ubuntu/labvolume1/orientdb/databases/ETL_Test1",
      "dbType": "graph",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "wal": false,
      "tx": false,
      "classes": [
        { "name": "ADDRESS", "extends": "V", "clusters": 1 }
      ],
      "indexes": [
        { "class": "ADDRESS", "fields": [ "place:string" ], "type": "UNIQUE" }
      ]
    }
  }
}
Please let me know if there is anything I'm doing wrong.

Related

Adding a custom tag based on topicName (wildcard) using JmxTrans to send Kafka JMX metrics to InfluxDB

Basically, what I wanted to achieve was to get the MessagesInPerSec metric for every topic in Kafka and to add the topic name as a custom tag in InfluxDB, so that I can query by topic rather than by the 'ObjDomain' definition. Below is my JmxTrans configuration (note the wildcard on the topic, used to fetch the MessagesInPerSec JMX attribute for all topics):
{
  "servers": [
    {
      "port": "9581",
      "host": "192.168.43.78",
      "alias": "kafka-metric",
      "queries": [
        {
          "outputWriters": [
            {
              "#class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
              "url": "http://192.168.43.78:8086/",
              "database": "kafka",
              "username": "admin",
              "password": "root"
            }
          ],
          "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*",
          "attr": [
            "Count",
            "MeanRate",
            "OneMinuteRate",
            "FiveMinuteRate",
            "FifteenMinuteRate"
          ],
          "resultAlias": "newTopic"
        }
      ],
      "numQueryThreads": 2
    }
  ]
}
which yields a result in InfluxDB as follows:
[name=newTopic, time=1589425526087, tags={attributeName=FifteenMinuteRate,
className=com.yammer.metrics.reporting.JmxReporter$Meter, objDomain=kafka.server,
typeName=type=BrokerTopicMetrics,name=MessagesInPerSec,topic=backblaze_smart},
precision=MILLISECONDS, fields={FifteenMinuteRate=1362.9446063537794, _jmx_port=9581
}]
and creates a tag with the whole objDomain specified in the config, but I wanted to have topic as a separate tag, i.e. something as follows:
[name=newTopic, time=1589425526087, tags={attributeName=FifteenMinuteRate,
className=com.yammer.metrics.reporting.JmxReporter$Meter, objDomain=kafka.server,
topic=backblaze_smart,
typeName=type=BrokerTopicMetrics,name=MessagesInPerSec,topic=backblaze_smart},
precision=MILLISECONDS, fields={FifteenMinuteRate=1362.9446063537794, _jmx_port=9581
}]
I was not able to find any adequate documentation on how to use the wildcard value of topic as a separate tag with jmxtrans and write it to InfluxDB.
You just need to add the following additional properties to the InfluxDB output writer. Just make sure you are using the latest jmxtrans release. The docs are here: https://github.com/jmxtrans/jmxtrans/wiki/InfluxDBWriter
"typeNames": ["topic"],
"typeNamesAsTags": "true"
I have listed your config with the above modifications:
{
  "servers": [
    {
      "port": "9581",
      "host": "192.168.43.78",
      "alias": "kafka-metric",
      "queries": [
        {
          "outputWriters": [
            {
              "#class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
              "url": "http://192.168.43.78:8086/",
              "database": "kafka",
              "username": "admin",
              "password": "root",
              "typeNames": ["topic"],
              "typeNamesAsTags": "true"
            }
          ],
          "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*",
          "attr": [
            "Count",
            "MeanRate",
            "OneMinuteRate",
            "FiveMinuteRate",
            "FifteenMinuteRate"
          ],
          "resultAlias": "newTopic"
        }
      ],
      "numQueryThreads": 2
    }
  ]
}

Populating only vertices from a CSV file

I need help with how I should populate my vertex class in OrientDB from a CSV file. The format of the CSV file is:
name,type,status
xxxxx,ABC,3
yyyyy,ABC,1
zzzzz,123,5
--
I have vertex and edge classes extended in OrientDB, where the vertex has 3 properties: name, type, and status. I only want the vertices to be populated from the CSV; the edges will be created dynamically via the API.
I tried to create an ETL file as below:
{
  "source": { "file": { "path": "/tmp/ientdb-community-2.2.18/config/data.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "MyObject" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/mydb",
      "dbUser": "root",
      "dbPassword": "root",
      "dbType": "graph",
      "classes": [
        { "name": "MyObject", "extends": "V" }
      ],
      "indexes": [
        { "class": "MyObject", "fields": [ "name:string" ], "type": "UNIQUE" }
      ]
    }
  }
}
I find that if I use plocal, the root/root credentials do not work. Also, the classes are not the same as the ones I see when logged in via remote (after starting the server).
I tried your code and it works for me.
The only changes I made to your code are the credentials, and a plocal dbURL instead of remote:
{
  "source": { "file": { "path": "mypath/config/data.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "MyObject" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:mypath/databases/mydb",
      "dbType": "graph",
      "dbUser": "<user name>",
      "dbPassword": "<user password>",
      **BEGIN UPDATE**
      "serverUser": "<server administrator user name, usually root>",
      "serverPassword": "<server administrator user password that is provided at server startup>",
      **END UPDATE**
      "classes": [
        { "name": "MyObject", "extends": "V" }
      ],
      "indexes": [
        { "class": "MyObject", "fields": [ "name:string" ], "type": "UNIQUE" }
      ]
    }
  }
}
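A note on why root/root likely failed with plocal: as far as I know, root is a server-level user (set in orientdb-server-config.xml or at first server startup), while dbUser/dbPassword are database-level credentials (admin/admin by default for a new database). With a plocal URL the ETL process opens the database directly, so the database-level credentials are the ones checked; serverUser/serverPassword are used for server-level operations such as creating the database.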
By the way, I noticed that your path is called ientdb-community-2.2.18; is that correct?
Hope it helps.
Regards.

Utilizing OrientDB ETL to create 2 vertices and a connecting edge for every line of a CSV

I'm using the OrientDB ETL tool to import a large amount of data (GBs). The format of the CSV is as follows (I'm using OrientDB 2.2):
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
I'm required to create two vertices: one with the value in column 1 (the key being the value itself) and another vertex holding the values in columns 2 and 3 (its key is the concatenation of both values, and both are present as attributes on this second vertex type). The 4th column will be a property of the edge connecting these two vertices.
I used the code below and it works OK, with some errors. One problem is that all the values in each CSV row are stored as properties within the IpAddress vertex; is there any way to store only the IP address in it? Secondly, can you please let me know how to concatenate two values read from the CSV?
{
  "source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": [ "ip:string", "dpcb:string", "address:string", "prob:string" ] } },
  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    { "edge": {
        "class": "Located",
        "joinFieldName": "address",
        "lookup": "PhyLocation.loc",
        "direction": "out",
        "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
        "edgeFields": { "confidence": "${input.prob}" },
        "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/Bulk_Transfer_Test",
      "dbType": "graph",
      "dbUser": "root",
      "dbPassword": "tiger",
      "serverUser": "root",
      "serverPassword": "tiger",
      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": [ "ip:string" ], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": [ "loc:string" ], "type": "UNIQUE" }
      ]
    }
  }
}
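One possible direction for both questions, sketched with the ETL field transformer documented for 2.2: a field transformer with an expression can build the concatenated key before the edge transformer runs, and the same transformer with "operation": "remove" can drop fields so they are not stored on the IpAddress vertex. This is an untested sketch; the intermediate loc field and the switch of the edge join to loc are my assumptions about the intended schema:
"transformers": [
  { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
  { "field": { "fieldName": "loc", "expression": "dpcb.append(address)" } },
  { "edge": {
      "class": "Located",
      "joinFieldName": "loc",
      "lookup": "PhyLocation.loc",
      "direction": "out",
      "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
      "edgeFields": { "confidence": "${input.prob}" },
      "unresolvedLinkAction": "CREATE"
    }
  },
  { "field": { "fieldName": "loc", "operation": "remove" } },
  { "field": { "fieldName": "dpcb", "operation": "remove" } },
  { "field": { "fieldName": "address", "operation": "remove" } },
  { "field": { "fieldName": "prob", "operation": "remove" } }
]
The remove steps come after the edge transformer so the looked-up values are still available when the edge and the PhyLocation vertex are built; only ip should then remain on the IpAddress vertex.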

OrientDB: setting up a sharded OrientDB

OrientDB version 2.1.11.
I'm trying to set it up on 3 nodes, and I want to shard the data as described in default-distributed-db-config.json:
write node1 -> node1, node2
write node2 -> node2, node3
write node3 -> node3, node1
{
  "autoDeploy": true,
  "hotAlignment": false,
  "executionMode": "undefined",
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "person_node1": {
      "servers": ["node1", "node2"]
    },
    "person_node2": {
      "servers": ["node2", "node3"]
    },
    "person_node3": {
      "servers": ["node3", "node1"]
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
but when I started the nodes, they didn't work like this. Sometimes they work like this (copied from the log file):
"person_node1": {
"servers": ["node1"]
},
"person_node2": {
"servers": ["node2"]
},
"person_node3": {
"servers": ["node3"]
},
"*": {
"servers": ["node2","node1","node3","<NEW_NODE>"]
},
Is there any detailed document describing the configuration?
Thanks.
The configuration is always updated by removing the absent nodes. If you want a stickier configuration, set "hotAlignment": true; after 2.1.10 it is safe to set it to true.
We're working on providing more flexible behaviour.
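In the configuration above, that amounts to flipping a single flag at the top of the file:
"autoDeploy": true,
"hotAlignment": true,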

"No nodes configured for partition" after creating a database via ETL

I've just created a custom database using the following ETL config:
{
  "source": { "file": { "path": "./mydata.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "MyClass" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/opt/orientdb/databases/MyData",
      "dbUser": "root",
      "dbPassword": "qrefhiuqwriouhwqv",
      "dbType": "graph",
      "classes": [
        { "name": "MyClass", "extends": "V" }
      ]
    }
  }
}
Now, when I go to the web console, I can see I have 433k records of type MyClass created in the database MyData.
When I try to query it with "select from MyClass", I get the error:
2015-04-06 23:56:25:541 SEVERE Internal server error:
com.orientechnologies.orient.server.distributed.ODistributedException:
No nodes configured for partition 'MyClass.[]' request:
id=-1 from=node1428362873334 task=command_sql(select from MyClass) userName= [ONetworkProtocolHttpDb]
What am I doing wrong?