Sharding in OrientDB distributed database

I am creating a distributed database in OrientDB 2.2.6 with 3 nodes, namely master1, master2 and master3. I modified the hazelcast.xml and orientdb-server-config.xml files on each of the nodes. I used a common default-distributed-db-config.json on all 3 nodes, shown below.
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "undefined",
  "readYourWrites": true,
  "failureAvailableNodesLessQuorum": false,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "address": {
      "owner": "master1",
      "servers": [ "master1" ]
    },
    "address_1": {
      "owner": "master1",
      "servers": [ "master1" ]
    },
    "ip": {
      "owner": "master2",
      "servers": [ "master2" ]
    },
    "ip_1": {
      "owner": "master2",
      "servers": [ "master2" ]
    },
    "id": {
      "owner": "master3",
      "servers": [ "master3" ]
    },
    "id_1": {
      "owner": "master3",
      "servers": [ "master3" ]
    },
    "*": {
      "servers": [ "<NEW_NODE>" ]
    }
  }
}
Then I started the distributed servers on master1, master2 and master3, in that order, and let them synchronize the default DB. Then I created a database and three classes (Address, IP, ID) with their properties and indexes on the master1 machine. As specified in the default-distributed-db-config.json file, the Address class has two clusters, both residing on the master1 machine, and the IP class has two clusters residing on the master2 machine.
When I insert values into the Address class, as expected they go into master1's clusters, following the round-robin strategy. But when I insert values for IP from the master2 machine, a new cluster is created on master1 and the records are inserted there. Basically, all the values end up on the master1 machine. When I run list clusters, the clusters on the master2 and master3 machines are empty.
So I could not distribute the data across the three nodes; everything is stored on a single machine. How do I shard the data? Is there any issue with the way I am inserting the data?
Thanks

In current OrientDB releases, write operations (create/update/delete) are not forwarded; only reads are. For this reason, the client should be connected to the server that handles the cluster you want your data written to.
Usually this isn't a problem, because a local cluster is selected, but writing to a specific cluster on a remote server is not supported yet.
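For illustration, a minimal sketch of what "connect to the server that handles the cluster" can look like from a client, here using the pyorient Python driver; the database name, credentials and field name are placeholders, not taken from the question:
import pyorient

# Connect directly to the node that owns the "ip" clusters (master2),
# since create/update/delete operations are executed locally and not forwarded.
client = pyorient.OrientDB("master2", 2424)
client.connect("root", "root_password")       # placeholder server credentials
client.db_open("mydb", "admin", "admin")      # placeholder database name

# Insert straight into one of master2's clusters. A plain "INSERT INTO IP ..."
# issued against master2 would also pick one of its local clusters.
client.command("INSERT INTO CLUSTER:ip SET value = '10.0.0.1'")

client.db_close()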

Related

Kubernetes - ExternalIP vs InternalIP

When I run the following command to get info from my on-prem cluster,
kubectl cluster-info dump
I see the following for each node.
On master
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.47"
},
{
"type": "InternalIP",
"address": "10.10.15.66"
},
{
"type": "InternalIP",
"address": "10.10.15.47"
},
{
"type": "InternalIP",
"address": "169.254.6.180"
},
{
"type": "Hostname",
"address": "k8s-dp-masterecad4834ec"
}
],
On worker node1
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.57"
},
{
"type": "InternalIP",
"address": "10.10.15.57"
},
{
"type": "Hostname",
"address": "k8s-dp-worker5887dd1314"
}
],
On worker node2
"addresses": [
{
"type": "ExternalIP",
"address": "10.10.15.33"
},
{
"type": "InternalIP",
"address": "10.10.15.33"
},
{
"type": "Hostname",
"address": "k8s-dp-worker6d2f4b4c53"
}
],
My questions here are:
1.) Why do some nodes have different ExternalIP and InternalIP addresses while others don't?
2.) For the node that has different ExternalIP and InternalIP addresses, both are in the same CIDR range and both can be reached from outside. What is so internal / external about these two IP addresses? (What is their purpose?)
3.) Why does one node have a random 169.x.x.x IP address?
I am still trying to learn more about Kubernetes and it would be greatly helpful if someone could help me understand. I use Contiv as the network plug-in.
What you see is part of the status of these nodes:
InternalIP: IP address of the node accessible only from within the cluster
ExternalIP: IP address of the node accessible from everywhere
Hostname: hostname of the node as reported by the kernel
These fields are set when a node is added to the cluster and their exact meaning depends on the cluster configuration and is not completely standardised, as stated in the Kubernetes documentation.
So the values you see are whatever your specific Kubernetes configuration sets them to; with another configuration you would get different values.
For example, on Amazon EKS, each node has a distinct InternalIP, ExternalIP, InternalDNS, ExternalDNS, and Hostname (identical to InternalIP). Amazon EKS sets these fields to the corresponding values of the node in the cloud infrastructure.
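The same address list can also be read programmatically from the node status; a short sketch using the official Kubernetes Python client, assuming a kubeconfig that already points at the cluster:
from kubernetes import client, config

# Load credentials from the local kubeconfig
# (use config.load_incluster_config() when running inside a pod).
config.load_kube_config()

v1 = client.CoreV1Api()
for node in v1.list_node().items:
    # node.status.addresses is the same list that `kubectl cluster-info dump`
    # shows: ExternalIP, InternalIP and Hostname entries populated when the
    # node registers, depending on the cloud provider / network plug-in.
    for addr in node.status.addresses:
        print(node.metadata.name, addr.type, addr.address)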

AWS RDS Stack update always replaces the DB Cluster

I first restored an Aurora RDS cluster from a cluster snapshot using a CloudFormation template. Then I removed the snapshot identifier, updated the password and performed a stack update, keeping everything else in the CFT unchanged. But the stack always prints the
Requested update requires the creation of a new physical resource;
hence creating one.
message and starts creating a new cluster. Here is my CFT for the cluster.
"DatabaseCluster": {
"Type": "AWS::RDS::DBCluster",
"DeletionPolicy": "Snapshot",
"Properties": {
"BackupRetentionPeriod": {
"Ref": "BackupRetentionPeriod"
},
"Engine": "aurora-postgresql",
"EngineVersion": {
"Ref": "EngineVersion"
},
"Port": {
"Ref": "Port"
},
"MasterUsername": {
"Fn::If" : [
"isUseDBSnapshot",
{"Ref" : "AWS::NoValue"},
{"Ref" : "MasterUsername"}
]
},
"MasterUserPassword": {
"Fn::If" : [
"isUseDBSnapshot",
{"Ref" : "AWS::NoValue"},
{"Ref" : "MasterPassword"}
]
},
"DatabaseName": {
"Fn::If" : [
"isUseDBSnapshot",
{"Ref" : "AWS::NoValue"},
{"Ref" : "DBName"}
]
},
"SnapshotIdentifier" : {
"Fn::If" : [
"isUseDBSnapshot",
{"Ref" : "SnapshotIdentifier"},
{"Ref" : "AWS::NoValue"}
]
},
"PreferredBackupWindow": "01:00-02:00",
"PreferredMaintenanceWindow": "mon:03:00-mon:04:00",
"DBSubnetGroupName": {"Ref":"rdsDbSubnetGroup"},
"StorageEncrypted":{"Ref" : "StorageEncrypted"},
"DBClusterParameterGroupName": {"Ref" : "RDSDBClusterParameterGroup"},
"VpcSecurityGroupIds": [{"Ref" : "CommonSGId"}]
}
}
According to the AWS RDS CloudFormation documentation, updating MasterUserPassword does not require cluster replacement.
Is there anything wrong with my CFT, or is this an issue with AWS?
If you just wish to update the password of the DB cluster, you shouldn't remove the snapshot identifier. I understand that you might be worried about losing data when the snapshot is restored again.
However, that is not the case with CloudFormation. CloudFormation checks precisely which changes you have made and performs the relevant operation. If you are changing just the password, it will not tamper with your data, whatever state it is in.
However, removing the snapshot identifier tells CloudFormation that you want a DB that is no longer based on that snapshot, so it will replace your DB cluster.
Check the below link for more details on what happens on changing each parameter.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbcluster.html#cfn-rds-dbcluster-snapshotidentifier
It clearly specifies that any change in the snapshot identifier will result in replacement.
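One way to see in advance whether an update will replace the cluster is to preview it with a change set before executing it; a sketch with boto3, where the stack name, parameter list and new password are placeholders for your own values:
import boto3

cfn = boto3.client("cloudformation")

# Create a change set that reuses the current template and only overrides
# the password parameter; the other parameters keep their previous values.
cfn.create_change_set(
    StackName="aurora-stack",                  # placeholder stack name
    ChangeSetName="password-update",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "MasterPassword", "ParameterValue": "NewSecret123"},
        {"ParameterKey": "SnapshotIdentifier", "UsePreviousValue": True},
        # ... repeat with UsePreviousValue=True for the remaining parameters
    ],
)
cfn.get_waiter("change_set_create_complete").wait(
    StackName="aurora-stack", ChangeSetName="password-update"
)

# Each ResourceChange reports whether CloudFormation plans a replacement
# ("True", "False" or "Conditional") before anything is actually modified.
for change in cfn.describe_change_set(
    StackName="aurora-stack", ChangeSetName="password-update"
)["Changes"]:
    rc = change["ResourceChange"]
    print(rc["LogicalResourceId"], rc["Action"], rc.get("Replacement"))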

OrientDB distributed replica on embedded server

We are setting up a distributed OrientDB database on an embedded server (we are using OrientDB v2.2.31). We would like to have a master-replica configuration, but we have encountered some issues in doing that.
We have set the default-distributed-db-config.json file in the following way, both for the master and for the replica:
{
  "autoDeploy": true,
  "hotAlignment": true,
  "executionMode": "asynchronous",
  "readQuorum": 1,
  "writeQuorum": 1,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "orientdb_master": "master",
    "orientdb_replica1": "replica"
  },
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
"orientdb_master" and "orientdb_replica1" are the hostnames associated to the the master and slave server, respectively.
We start the master server first and then the other server: the connection between them takes place without problems, but the server that should be the replica is actually another master (and so, we have a multi-master configuration).
How can we specify that the second server is a replica? There are other parameters that it is necessary to set?
Thanks in advance
Instead of using orientdb_replica1 (the hostname), you should use the node name you assigned at startup. You can find it in config/orientdb-server-config.xml.
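For reference, the node name lives in the distributed (Hazelcast) plugin section of config/orientdb-server-config.xml, and it is that value which must appear under "servers" in default-distributed-db-config.json. A trimmed sketch of the relevant section, with an example value:
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
    <parameters>
        <!-- use this value (not the hostname) in the "servers" map of
             default-distributed-db-config.json, e.g. "replica1": "replica" -->
        <parameter name="nodeName" value="replica1"/>
        <!-- the enabled, configuration.db.default and configuration.hazelcast
             parameters are omitted here -->
    </parameters>
</handler>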

Only data from node 1 visible in a 2 node OrientDB cluster

I created a 2-node OrientDB cluster by following the steps below, but in distributed mode only the data present on one of the nodes is accessible. Can you please help me debug this issue? The OrientDB version is 2.2.6.
Steps involved:
1. Used plocal mode in the ETL tool and stored part of the data on node 1 and the other part on node 2. The data actually belongs to just one vertex class. (On checking from the console, the data has been ingested properly.)
2. Then started both nodes in distributed mode; the data from only one machine is accessible.
The default-distributed-db-config.json file is shown below:
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "undefined",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "address": {
      "servers": [ "orientmaster" ]
    },
    "address_1": {
      "servers": [ "orientslave1" ]
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
There are two clusters created for the vertex class ADDRESS, namely address and address_1. The data on machine orientslave1 is stored using the ETL tool into cluster address_1; similarly, the data on machine orientmaster is stored into cluster address. (I've ensured that the two cluster ids are different at the time of creation.)
However, when these two machines are connected together in distributed mode, only the data in cluster address_1 is visible.
The ETL JSON is attached below:
{
  "source": { "file": { "path": "/home/ubuntu/labvolume1/DataStorage/geo1_5lacs.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": ["place:string"] } },
  "transformers": [
    { "vertex": { "class": "ADDRESS", "skipDuplicates": true } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/home/ubuntu/labvolume1/orientdb/databases/ETL_Test1",
      "dbType": "graph",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "wal": false,
      "tx": false,
      "classes": [
        {"name": "ADDRESS", "extends": "V", "clusters": 1}
      ],
      "indexes": [
        {"class": "ADDRESS", "fields": ["place:string"], "type": "UNIQUE"}
      ]
    }
  }
}
Please let me know if there is anything I'm doing wrong.
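A small diagnostic that may help narrow this down is to ask each node, over the remote protocol, how many records it sees in each ADDRESS cluster; a sketch with the pyorient driver, assuming the database name ETL_Test1 from the ETL configuration above and placeholder root credentials:
import pyorient

# Count the records of both ADDRESS clusters as seen from each node,
# to check which clusters each server actually serves after joining
# the distributed setup.
for host in ("orientmaster", "orientslave1"):
    client = pyorient.OrientDB(host, 2424)
    client.connect("root", "root_password")     # placeholder credentials
    client.db_open("ETL_Test1", "admin", "admin")
    for cluster in ("address", "address_1"):
        result = client.command("SELECT count(*) AS n FROM CLUSTER:" + cluster)
        print(host, cluster, result[0].oRecordData["n"])
    client.db_close()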

Grant access to RDS layer using Cloudformation for app layer

I have an RDS database that I bring up using Cloudformation. Now I have a Cloudformation document that brings up my app server tier. How can I grant my app servers access to the RDS instance?
If the RDS instance was created by my Cloudformation document, I know I could do this:
"DBSecurityGroup": {
"Type": "AWS::RDS::DBSecurityGroup",
"Properties": {
"EC2VpcId" : { "Ref" : "VpcId" },
"DBSecurityGroupIngress": { "EC2SecurityGroupId": { "Fn::GetAtt": [ "AppServerSecurityGroup", "GroupId" ]} },
"GroupDescription" : "Frontend Access"
}
}
But the DBSecurityGroup will already exist by the time I run my app cloudformation. How can I update it?
Update: Following what huelbois pointed out to me below, I understood that I could just create an AWS::EC2::SecurityGroupIngress in my app Cloudformation. As I am using a VPC and the code huelbois posted is for EC2-Classic, I can confirm that this works:
In RDS Cloudformation:
"DbVpcSecurityGroup" : {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Enable JDBC access on the configured port",
"VpcId" : { "Ref" : "VpcId" },
"SecurityGroupIngress" : [ ]
}
}
And in app Cloudformation:
"specialRDSRule" : {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties" : {
"IpProtocol": "tcp",
"FromPort": 5432,
"ToPort": 5432,
"GroupId": {"Ref": "DbSecurityGroupId"},
"SourceSecurityGroupId": {"Ref": "InstanceSecurityGroup"}
}
}
where DbSecurityGroupId is the id of the group set up above (something like sg-27324c43) and is a parameter to the app Cloudformation document.
When you want to use already existing resources in a CloudFormation template, you can use the previously created ids, instead of Ref or GetAtt.
In your example, you can use:
{ "EC2SecurityGroupId": "sg-xxxNNN" }
where "sg-xxxNNN" is the id of your DB SecurityGroup (not sure of the DB SecurityGroup prefix, since we don't use EC2-classic but VPC).
I would recommend using a parameter for your SecurityGroup in your template.
*** Update ***
For your specific setup, I would use a "DBSecurityGroupIngress" resource to authorize a new security group on your RDS instance.
In your first stack (RDS), you create an empty DBSecurityGroup like this:
"DBSecurityGroup": {
"Type": "AWS::RDS::DBSecurityGroup",
"Properties": {
"EC2VpcId" : { "Ref" : "VpcId" },
"DBSecurityGroupIngress": [],
"GroupDescription" : "Frontend Access"
}
}
This DBSecurityGroup is referred to by the DBInstance. (I guess you have specific requirements for using DBSecurityGroup instead of VPCSecurityGroup.)
In your App stack, you create a DBSecurityGroupIngress resource, which is a child of the DBSecurityGroup you created in the first stack:
"specialRDSRule" : {
"Type":"AWS::RDS::DBSecurityGroupIngress",
"Properties" : {
"DBSecurityGroupName": "<the arn of the DBSecurityGroup>",
"CIDRIP": String,
"EC2SecurityGroupId": String,
"EC2SecurityGroupName": String,
"EC2SecurityGroupOwnerId": String
}
}
You need the ARN of the DBSecurityGroup, which has the form "arn:aws:rds:<region>:<account-id>:secgrp:<security-group-name>". The other parameters come from your App stack; I'm not sure you need all of them (I don't use EC2-Classic security groups, only VPC).
Reference : http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-security-group-ingress.html
We use the same mechanism with VPC SecurityGroups, with Ingress & Egress rules, so we can have two SGs reference each other.
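For completeness, the VPC rule that the AWS::EC2::SecurityGroupIngress resource creates in the question's update can also be added or double-checked imperatively; a sketch with boto3, where both security-group ids are placeholders:
import boto3

ec2 = boto3.client("ec2")

# Allow the app-tier security group to reach the DB security group on the
# PostgreSQL port, mirroring the specialRDSRule resource shown above.
ec2.authorize_security_group_ingress(
    GroupId="sg-0db00000000000000",              # placeholder: DB security group id
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [
            {"GroupId": "sg-0app0000000000000"}  # placeholder: app security group id
        ],
    }],
)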