Cosmos DB unique key - nosql

Is it possible to create a unique constraint on a property inside a nested array in JSON?
I have JSON like this:
{
  "id": 111,
  "DataStructs": [
    {
      "Name": "aaaaa",
      "DataStructCells": [
        {
          "RowName": "Default",
          "ColumnName": "Perfect / HT",
          "CellValue": "0.1"
        },
        {
          "RowName": "Default",
          "ColumnName": "100% / HT",
          "CellValue": "0.2"
        }
      ]
    }
  ]
}
I want to add a unique key to prevent two entries with the same RowName ("Default") from being added.
When creating the collection I added the unique key /DataStructs/DataStructCells/RowName, but it doesn't work.

No, that is not possible. Unique key constraints only apply across separate documents within a logical partition; they cannot enforce uniqueness inside an array of a single document. To achieve your requirement, you can split the array into separate documents in the same logical partition. You can refer to How do I define unique keys involving properties in embedded arrays in Azure Cosmos DB?
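A minimal sketch of that split, assuming a partition key such as /parentId (the parentId property and the id values below are illustrative, not from the question): each cell becomes its own document, and the container is created with the unique key path /RowName.

{
  "id": "111-cell-1",
  "parentId": "111",
  "Name": "aaaaa",
  "RowName": "Default",
  "ColumnName": "Perfect / HT",
  "CellValue": "0.1"
}
{
  "id": "111-cell-2",
  "parentId": "111",
  "Name": "aaaaa",
  "RowName": "Default",
  "ColumnName": "100% / HT",
  "CellValue": "0.2"
}

Because both documents share the partition key value "111", the second insert is rejected by a unique key on /RowName alone; if RowName should only be unique per column, the unique key can combine the paths /RowName and /ColumnName instead.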

Related

MongoDB index for nested map values

I have a MongoDB collection that contains documents with a nested map, similar to the following document:
{
  "_id": "1",
  "accounts": {
    "account-id-1": { "email": "example1@example.com", ... },
    "account-id-2": { "email": "example2@example.com", ... }
  }
}
The accounts map contains account IDs as keys and the remaining account data as values/objects. Now I want to add an index on the email field of the nested objects, but I can't do that by defining the field path as one normally would for nested fields, e.g. accounts.account-id-1.email, because the middle part (account-id-1) is different for each entry.
I have read about wildcard indexes, but it seems to me that the index expression always ends with the special wildcard symbol $**, and never has it in the middle.
My question is whether it's possible to define such an index in the following way or similarly: accounts.$**.email, so that only the email field gets indexed.
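For reference, the closest form I'm aware of is a wildcard index scoped to the accounts subtree, ending in $** as the question describes; a sketch in the mongo shell, assuming a collection named users (note that this indexes every field under accounts, not only email):

// Wildcard index over everything nested under "accounts",
// which covers accounts.<any-id>.email among other fields.
db.users.createIndex({ "accounts.$**": 1 })

// Example query that can use the index (the account id is illustrative).
db.users.find({ "accounts.account-id-1.email": "example1@example.com" })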

Is there a way to define single fields that are never indexed in firestore in all collections

I understand that indexes have a cost in Firestore. Most of the time we simply store objects without really thinking about indexes, even if we don't want most of the fields to be indexed.
If I understand correctly, every field at every level is indexed, i.e. for the following document in pseudo-JSON:
{
  "root_field1": "abc"          (indexed)
  "root_field2": "def"          (indexed)
  "root_field3": {
    "sub_field1": "ghi"         (indexed)
    "sub_field2": "jkl"         (indexed)
    "sub_field3": {
      "inner_field1": "mno"     (indexed)
      "inner_field2": "pqr"     (indexed)
    }
  }
}
Let's assume that I have the following record:
{
  "name": "abc",
  "birthdate": "2000-01-01",
  "gender": "m"
}
Let's assume that I just want the field "name" to be indexed. One solution (A), without having to specify every field, is to define the document this way (i.e. move the root fields into a sub-level unindexed) and exclude unindexed from being indexed:
{
  "name": "abc",
  "unindexed": {
    "birthdate": "2000-01-01",
    "gender": "m"
  }
}
Ideally I would like to just specify a prefix such as _ to prevent a field from being indexed, but there is no global solution for that:
{
  "name": "abc",
  "_birthdate": "2000-01-01",
  "_gender": "m"
}
Is my solution (A) correct and is there a more elegant generic solution?
Thanks!
According to the documentation:
https://cloud.google.com/firestore/docs/query-data/indexing
Add a single-field index exemption
Single-field index exemptions allow you to override automatic index
settings for specific fields in a collection. You can add a
single-field exemption from the console:
Go to the Single Field Indexes section.
Click Add Exemption.
Enter a Collection ID and Field path.
Select new indexing settings for this field. Enable or disable
automatically updated ascending, descending, and array-contains
single-field indexes for this field.
Click Save Exemption.
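As a complement to the console steps, a minimal Node.js sketch (assuming the firebase-admin SDK and an illustrative users collection, neither of which is mentioned in the question) that writes a document shaped like solution (A); the exemption steps above can then be applied to the field paths under unindexed (for example unindexed.birthdate):

// Writes a document in the solution (A) shape: fields that should not be
// indexed are grouped under the "unindexed" map.
const admin = require("firebase-admin");

admin.initializeApp();
const db = admin.firestore();

async function saveUser() {
  await db.collection("users").doc("user-abc").set({
    name: "abc",                 // indexed automatically
    unindexed: {
      birthdate: "2000-01-01",   // target of a single-field exemption
      gender: "m",               // target of a single-field exemption
    },
  });
}

saveUser().catch(console.error);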

Unique multikey in mongodb for field of array of embedded fields

I have this Document structure:
id: "xxxxx",
name: "John",
pets: [
{
"id": "yyyyyy",
"type": "Chihuahua",
},
{
"id": "zzzzzz",
"type": "Labrador",
}
]
The pets field is an array of embedded documents (it does not reference any other collection).
I want the pet id to be unique across documents and within the document itself, but the official MongoDB docs say it's not possible and don't offer another solution:
For unique indexes, the unique constraint applies across separate documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a
unique multikey index, a document may have array elements that result
in repeating index key values as long as the index key values for that
document do not duplicate those of another document.
https://docs.mongodb.com/manual/core/index-multikey/
I have tried this using the MongoDB Go driver:
_, err = collection.Indexes().CreateOne(context.TODO(), mongo.IndexModel{
    Keys:    bson.M{"pets.id": 1},
    Options: options.Index().SetUnique(true),
})
but as the docs say, it allows two pets of the same person to have the same ID, while not allowing a pet of a different person to share an ID with a pet of the first person...
Is there any way to enforce this in MongoDB?
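One workaround, not mentioned in the question (the pets collection and ownerId field below are illustrative), is to store each pet as its own document in a separate collection that references its owner. A unique index on a plain, non-array field is enforced across all documents, so duplicate pet IDs are rejected both for the same owner and across owners; a mongo shell sketch:

// Unique index on the pet id in its own collection.
db.pets.createIndex({ id: 1 }, { unique: true })

db.pets.insertOne({ id: "yyyyyy", type: "Chihuahua", ownerId: "xxxxx" })
db.pets.insertOne({ id: "zzzzzz", type: "Labrador", ownerId: "xxxxx" })

// Fails with a duplicate key error, even though the owner differs.
db.pets.insertOne({ id: "yyyyyy", type: "Poodle", ownerId: "wwwww" })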

How to do bulkCreate with updateOnDuplicate when there are composite keys?

I'm using Postgres/Sequelize.
I need to do a bulk update and the table is partitioned, so I can't use "INSERT INTO ~ ON CONFLICT".
It looks like I can use bulkCreate with the 'updateOnDuplicate' option, but I don't know how to define multiple keys. There is no primary key in the table, but I know two columns together make records unique.
In this case, how do I do the bulk update?
Model.bulkCreate(dataToUpdate, { updateOnDuplicate: ["user_id", "token", "created_at"] })
In my case I was using an array of name/value JSON objects in my model's indexes fields, which was causing the issue; updating it to an array of strings fixed the problem in Sequelize version 6.
indexes: [
  {
    name: "payment_clearance_pkey",
    unique: true,
    fields: ["payment_module_id", "account_id"]
  }
]
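Putting the pieces together, a hedged sketch of how the model definition and the bulkCreate call might look in Sequelize v6 (the model name, the amount column, and the connection string are illustrative assumptions; the composite unique index uses an array of strings for fields, and updateOnDuplicate lists the columns to overwrite when the unique pair already exists):

const { Sequelize, DataTypes } = require("sequelize");
const sequelize = new Sequelize(process.env.DATABASE_URL);

// Model with a composite unique index on (payment_module_id, account_id),
// declared with "fields" as an array of strings.
const PaymentClearance = sequelize.define("payment_clearance", {
  payment_module_id: { type: DataTypes.INTEGER, allowNull: false },
  account_id: { type: DataTypes.INTEGER, allowNull: false },
  amount: { type: DataTypes.DECIMAL },
}, {
  indexes: [
    {
      name: "payment_clearance_pkey",
      unique: true,
      fields: ["payment_module_id", "account_id"],
    },
  ],
});

async function bulkUpsert(dataToUpdate) {
  // Rows matching an existing (payment_module_id, account_id) pair have
  // their "amount" updated instead of raising a unique-constraint error.
  await PaymentClearance.bulkCreate(dataToUpdate, {
    updateOnDuplicate: ["amount"],
  });
}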

How to query nested fields in MongoDB using Presto

I'm setting up a Presto cluster which I'd like to use to query a MongoDB instance. Data in my Mongo instance has the following structure:
{
  _id: <value>
  somefield: <value>
  otherfield: <value>
  nesting_1: {
    nested_field_1_1: <value>
    nested_field_1_2: <value>
    ...
  }
  nesting_2: {
    nesting_2_1: {
      nested_field_2_1_1: <value>
      nested_field_2_1_2: <value>
      ...
    }
    nesting_2_2: {
      nested_field_2_2_1: <value>
      nested_field_2_2_2: <value>
      ...
    }
  }
}
Just by plugging it in, Presto correctly identifies and creates columns for the values at the top level (e.g. somefield, otherfield) and at the first nesting level -- that is, it creates a column for nesting_1 whose content is a row(nested_field_1_1 <type>, nested_field_1_2 <type>, ...), and I can query table.nesting_1.nested_field_1_1.
However, fields with an extra nesting layer (e.g. nesting_2 and everything within it) are missing from the table schema. Presto's documentation for the MongoDB connector does mention that:
At startup, this connector tries guessing fields’ types, but it might not be correct for your collection. In that case, you need to modify it manually. CREATE TABLE and CREATE TABLE AS SELECT will create an entry for you.
While that seems to explain my use case, it's not very clear on how to "modify it manually" -- a CREATE TABLE statement doesn't seem appropriate, as the table is already there. The documentation also has a section on how to declare fields and their types, but it's also not very clear on how to deal with multiple nesting levels.
My question is: how do I setup Presto's MongoDB connector so that I can query fields in the third nesting layer?
Answers can assume that:
all nested fields' names are known;
there are only 3 layers;
there is no need to preserve the layered table layout (i.e. I don't mind if my resulting Presto table has all nested fields as unique columns like somefield, rather than one field with rows like nesting_1 in the above example);
extra points if the solution doesn't require me to explicitly declare the names and types of all columns in the third layer, as I have over 1500 of them -- but this is not a hard requirement.
In mongodb.properties, the property mongodb.schema-collection can be used to describe the schema of your MongoDB collections. As described in the documentation, this property is optional and the default is _schema.
it's not very clear on how to "modify it manually" -- a CREATE TABLE statement doesn't seem appropriate, as the table is already there.
It is supposed to be created and populated automatically, but what I've noticed is that it is not populated until some queries are executed, and it only generates the schema for the collections that are queried.
However, there is an open bug: some fields/columns are not automatically picked up.
Also, once an entry for a collection is created/populated, it won't be updated automatically; any update needs to be done manually (if the collection starts to have new fields, they won't be detected automatically).
To manually update the schema, each field/column is just another entry in the fields array. As mentioned in the docs, it has three parts:
name: Name of the column in the Presto table; it needs to match the name of the collection field.
type: Presto type of the column (see the available types in the documentation); the ROW type can be used for nested properties.
hidden: Hides the column from DESCRIBE <table name> and SELECT *. Defaults to false.
My question is: how do I setup Presto's MongoDB connector so that I can query fields in the third nesting layer?
The schema definition for a MongoDB collection like the one you posted will contain something like:
...
"fields": [
  {
    "name": "_id",
    "type": "ObjectId",
    "hidden": true
  },
  {
    "name": "somefield",
    "type": "varchar",
    "hidden": false
  },
  {
    "name": "otherfield",
    "type": "varchar",
    "hidden": false
  },
  {
    "name": "nesting_1",
    "type": "row(nested_field_1_1 varchar, nested_field_1_2 bigint)",
    "hidden": false
  },
  {
    "name": "nesting_2",
    "type": "row(nesting_2_1 row(nested_field_2_1_1 varchar, nested_field_2_1_2 varchar), nesting_2_2 row(nested_field_2_2_1 varchar, nested_field_2_2_2 varchar))",
    "hidden": false
  }
]
...
It can be queried using . over the columns, like:
SELECT nesting_2.nesting_2_1.nested_field_2_1_1 FROM table;
If the Mongo collection being queried does not have a fixed schema indicated in the _schema collection, Presto is not able to infer the document structure.
If you prefer, the option is to explicitly declare the schema in the connector configuration, using the mongodb.schema-collection property, as described in the documentation. You can set it to a different Mongo collection that stores these schema entries, and create this collection directly.
Nested fields can be declared using the ROW data type, which is also described in the docs and behaves like what would be a struct or dictionary in other programming languages.
You can create a collection in MongoDB, for example "presto_schema", in your database and insert a sample schema like this:
db.presto_schema.insertOne({
  "table": "your_collection",
  "fields": [
    {
      "name": "_id",
      "type": "ObjectId",
      "hidden": true
    },
    {
      "name": "last_name",
      "type": "varchar",
      "hidden": false
    },
    {
      "name": "id",
      "type": "varchar",
      "hidden": false
    }
  ]
})
In your presto mongodb.properties, add the property like this:
mongodb.schema-collection=presto_schema
From now on, Presto will use "presto_schema" instead of the default "_schema" when querying.
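To cover the third nesting layer from the original question, the same schema collection can also hold a ROW-typed entry for nesting_2. A sketch (the field and sub-field names are taken from the example structure above; the varchar types are assumptions):

// Add a ROW-typed column definition for nesting_2 to the schema entry,
// so its nested fields become queryable from Presto.
db.presto_schema.updateOne(
  { "table": "your_collection" },
  { $push: { "fields": {
      "name": "nesting_2",
      "type": "row(nesting_2_1 row(nested_field_2_1_1 varchar, nested_field_2_1_2 varchar), nesting_2_2 row(nested_field_2_2_1 varchar, nested_field_2_2_2 varchar))",
      "hidden": false
  } } }
)

After that, a query such as SELECT nesting_2.nesting_2_1.nested_field_2_1_1 FROM your_collection should resolve the nested column.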