OrientDB specify cluster from JSON structure - orientdb

I'm working with the OrientDB document database, version 2.1.11.
I'm trying to save a document from a JSON object, but I need the sub-object in it to be inserted into a specific cluster. Is there a way to tell OrientDB, from the JSON, which cluster to save into?
Here is an example of the JSON to be saved:
{
  "#rid": "#18:0",
  "#class": "Supplier",
  "#type": "d",
  "#version": 1,
  "code": "SUP1",
  "active": true,
  "language": "en",
  "divisions": [ // LinkList
    {
      "#class": "Division",
      "code": "tre",
      "rate": "5",
      "#type": "d",
      "description": {
        "fr": "",
        "en": "rew"
      }
    }
  ],
  "createdDate": "2016-05-04 09:24:35",
  "name": "Supplier1",
  "currency": "CAD"
}
How do I specify that "#class":"Division" has to go into a specific cluster? Can the sub-object in a JSON structure indicate which cluster has to be updated?
I'm using the Java API database.save(doc, "supplier_1") to save the Supplier object into a specific cluster (i.e. "supplier_1").

I don't think it's possible to do that. As you can read here, records stored as embedded types have no #rid, so as long as they're contained inside the main record you can't put them into a specific cluster.
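If each Division really needs to live in its own cluster, one workaround is to save it as a separate record in the target cluster and reference it from the Supplier's link list instead of embedding it. A rough sketch against the document Java API (the cluster name "division_1" is hypothetical, and 'database' / 'doc' are assumed to be your open database and the Supplier document from the question):
// save the Division as its own record in a chosen cluster
ODocument division = new ODocument("Division");
division.field("code", "tre");
division.field("rate", "5");
database.save(division, "division_1"); // hypothetical cluster for Division records
// reference the saved record as a link instead of embedding it
List<OIdentifiable> divisions = new ArrayList<OIdentifiable>();
divisions.add(division);
doc.field("divisions", divisions);
database.save(doc, "supplier_1");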
Hope it helps.
Ivan

Related

Loopback (DB2) - Can not create an instance of PersistedModel that uses a schema other than the userid

I am trying to define a model that is based on the PersistedModel to access a table in DB2, call it MY_SCHEMA.MY_TABLE.
I created the model MY_TABLE, based on PersistedModel, with a Data Source (datasources.json) where the definition includes the attribute "schema": "MY_SCHEMA". The data source also contains the userid my_userid, used for the connection.
Current Behavior
When I try to call the API for this model, it tries to access the table my_userid.MY_TABLE.
Expected Behavior
It should access MY_SCHEMA.MY_TABLE.
The DB2 instance happens to be on a System Z. I have created a table called my_userid.MY_TABLE and that works; however, for the solution we are trying to build, multiple schemas are required.
Note that this only appears to be an issue with Db2 on System Z. I can change schemas on Db2 LUW.
What LoopBack connector are you using? What version? Can you also check what version of loopback-ibmdb is installed in your node_modules folder?
AFAICT, LoopBack's DB2-related connectors support a schema field; see https://github.com/strongloop/loopback-ibmdb/blob/master/lib/ibmdb.js#L96-L100
self.schema = this.username;
if (settings.schema) {
self.schema = settings.schema.toUpperCase();
}
self.connStr += ';CurrentSchema=' + self.schema;
Have you considered configuring the database connection using DSN instead of individual fields like hostname and username?
In your datasource config JSON:
"dsn": "DATABASE={dbname};HOSTNAME={hostname};UID={username};PWD={password};CurrentSchema=MY_SCHEMA"

How to transform and extract fields in Kafka sink JDBC connector

I am using a 3rd party CDC tool that replicates data from a source database into Kafka topics. An example row is shown below:
{
  "data": {
    "USER_ID": {
      "string": "1"
    },
    "USER_CATEGORY": {
      "string": "A"
    }
  },
  "beforeData": {
    "Data": {
      "USER_ID": {
        "string": "1"
      },
      "USER_CATEGORY": {
        "string": "B"
      }
    }
  },
  "headers": {
    "operation": "UPDATE",
    "timestamp": "2018-05-03T13:53:43.000"
  }
}
What configuration is needed in the sink file in order to extract all the (sub)fields under data and headers, ignore those under beforeData, and have the target table that the Kafka sink writes to contain the following fields:
USER_ID, USER_CATEGORY, operation, timestamp
I went through the transformation list in Confluent's docs but was not able to work out how to use them to achieve the target above.
I think you want ExtractField, and unfortunately it's a Map.get operation, which means (1) nested fields cannot be extracted in one pass and (2) multiple fields need multiple transforms.
That being said, you might attempt this (untested):
transforms=ExtractData,ExtractHeaders
transforms.ExtractData.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.ExtractData.field=data
transforms.ExtractHeaders.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.ExtractHeaders.field=headers
If that doesn't work, you might be better off implementing your own Transformations package that can at least drop values from the Struct / Map.
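For reference, a bare-bones custom transformation that just drops the beforeData branch might look roughly like this (an untested sketch; it only handles the schemaless Map case, and the class name is illustrative):
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Illustrative custom SMT: removes the "beforeData" branch from a schemaless (Map) value.
public class DropBeforeData<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    @SuppressWarnings("unchecked")
    public R apply(R record) {
        if (!(record.value() instanceof Map)) {
            return record; // this sketch only handles schemaless values
        }
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());
        value.remove("beforeData");
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                null, value, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no options in this sketch
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure
    }

    @Override
    public void close() {
        // no resources to release
    }
}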
If you're willing to list specific field names, you can solve this by:
Using a Flatten transform to collapse the nesting (which will convert the original structure's paths into dot-delimited names)
Using a Replace transform with rename to make the field names be what you want the sink to emit
Using another Replace transform with whitelist to limit the emitted fields to those you select
For your case it might look like:
"transforms": "t1,t2,t3",
"transforms.t1.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.t2.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.t2.renames": "data.USER_ID:USER_ID,data.USER_CATEGORY:USER_CATEGORY,headers.operation:operation,headers.timestamp:timestamp",
"transforms.t3.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.t3.whitelist": "USER_ID,USER_CATEGORY,operation,timestamp",

How to store and retrieve an image in OrientDB using Java?

Hi, I am new to OrientDB. I searched about this on Google and found this:
http://orientdb.com/docs/last/Binary-Data.html
Maybe this question is not valid, but I have a doubt: what will be the type of the element that stores the binary data
1. if we are trying to save the image as a schema-full property?
2. if we are trying to save the image as a schema-less property?
As mentioned in the above document:
ODocument doc = new ODocument();
doc.field("binary", "Binary data".getBytes());
doc.save();
Where will 'doc' get saved?
Would it be possible to give some example of how to save image/binary data and retrieve it?
The data type for binary properties is OType.BINARY.
If you don't specify a class for the document, it will be saved in the "default" cluster. Then you can query it with SELECT FROM cluster:default WHERE ...
BUT I strongly discourage you from doing that; please also consider that in v3.0 automatic save to the default cluster is no longer supported (but you can still call doc.save("default") explicitly).
In general it's much better to create a specific class and save your docs there, e.g.:
// create the schema (only the first time, of course)
OClass imageClass = db.getMetadata().getSchema().createClass("Image");
imageClass.createProperty("binary", OType.BINARY); // if you want it schemaful
ODocument doc = db.newInstance("Image");
doc.field("binary", "Binary data".getBytes());
doc.save();
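To read it back, something like this should work (a minimal sketch assuming the "Image" class created above; error handling omitted):
// iterate the stored Image documents and pull the raw bytes back out
for (ODocument stored : db.browseClass("Image")) {
    byte[] bytes = stored.field("binary");
    // e.g. write the bytes to a file or wrap them in a ByteArrayInputStream
}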

How to model nested JSON data on Redshift to query a specific nested property

I have the following JSON file structure on S3:
{
  "userId": "1234",
  "levelA": {
    "LevelB": [
      {
        "bssid": "University",
        "timestamp": "153301355685"
      },
      {
        "bssid": "Mall",
        "timestamp": "153301355688"
      }
    ]
  }
}
Now one of our future queries would be:
Return the total of users who saw bssid=University
So in my case it will return 1 (because userId=1234 contains that bssid's value)
Is Redshift the right solution for me for this type of query? In the case that it is, how can I model it?
The easiest way to model this would be to create a table with one row for each combination of userId and bssid:
userId, bssid, timestamp
1234,University,153301355685
1234,Mall,153301355688
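With rows in that shape, the example question ("total of users who saw bssid=University") becomes a simple aggregate, along these lines (the table name user_bssid is just illustrative):
SELECT COUNT(DISTINCT userid)
FROM user_bssid
WHERE bssid = 'University';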
The difficult part would be converting your JSON (contained in multiple files) into a suitable format for Redshift.
While Amazon Redshift can import data in JSON format, it would not handle the one-to-many relationship within your nested data.
Amazon Redshift also has a JSON_EXTRACT_PATH_TEXT Function that can extract data from a JSON string, but again it wouldn't handle the one-to-many relationship in your data.
I would recommend transforming your data into the above format prior to loading into Redshift. This would need to be done with an external script or ETL tool.
If you are frequently generating such files, a suitable method would be to trigger an AWS Lambda function whenever one of these files is stored in the Amazon S3 bucket. The Lambda function would then parse the file and output the CSV format, ready for loading into Redshift.

Ensure fields with same type across documents in a collection in mongodb

I am in the process of migrating a database from MySQL to MongoDB. However, I am running into a problem where MongoDB changes the stored type based on the length/value of the string/integer data used to initialize it. Is there a way to prevent this? I want the types to be the same across a collection.
I am new to this technology and apologize if I missed something. I looked around and could not find a solution to this. Any pointers are greatly appreciated.
thanks,
Asha
If you're writing your migration application in C++, check out the BSONObjBuilder class in "bson/bsonobjbuilder.h". If you create your individual BSON documents using the "append" methods of BSONObjBuilder, the builder will use the static types of the fields to set the appropriate BSON type in the output object.
For example:
int count = /*something from a mysql query*/;
std::string name = /*something else from a mysql query*/;
BSONObjBuilder builder;
builder.append("count", count);
builder.append("name", name);
BSONObj result = builder.obj();