Query to Find and Remove PG Duplicated JSONB Keys and Values - postgresql

I would like to update JSONB column values by removing duplicate and similar keys.
JSONB VALUES:
{"Service Types": {"INSURANCE": true, "Insurance": true}}
{"Service Types": {"HOSPITALS": true, "Hospitals": true}}
{"Service Types": {"DENTISTS": true, "Dentists": true}}
{"Service Types": {"Physicians": true, "PHYSICIANS & SURGEONS": true}}
EXPECTED RESULT:
{"Service Types": {"Insurance": true}}
{"Service Types": {"Hospitals": true}}
{"Service Types": {"Dentists": true}}
{"Service Types": {"PHYSICIANS & SURGEONS": true}}
EXAMPLE OF DUPLICATES
Duplicates: {"INSURANCE": true, "Insurance": true}
EXPECTED: {"Insurance": true}
SIMILARITIES:
{"Physicians": true, "PHYSICIANS & SURGEONS": true}
EXPECTED: {"Physicians": true} or {"PHYSICIANS & SURGEONS": true}
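Exact duplicates here are keys that differ only in letter case, so the core rule can be sketched in plain Python before translating it into a jsonb UPDATE. The survivor policy (keep the first key seen) is an assumption; "similarities" like Physicians vs. PHYSICIANS & SURGEONS need a fuzzier rule (e.g. prefix matching) and are not covered by this sketch:

```python
# Sketch, not the final SQL: dedupe object keys that differ
# only by case, keeping the first key seen per lowercase group.
def dedup_keys(service_types):
    seen = {}
    for key, value in service_types.items():
        # "INSURANCE" and "Insurance" both map to the group "insurance"
        seen.setdefault(key.lower(), (key, value))
    return dict(seen.values())

row = {"Service Types": {"INSURANCE": True, "Insurance": True}}
row["Service Types"] = dedup_keys(row["Service Types"])
# one survivor per case-insensitive group remains
```

In PostgreSQL the same grouping can be expressed with jsonb_each over the nested object, DISTINCT ON (lower(key)) to pick one entry per group, and jsonb_object_agg to rebuild the object.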

Related

Is there any way of creating a unique index which allows empty values in mongodb?

I am trying to create an index for documents with this schema:
{
_id: "",
name: ""
}
I want an index that enforces unique names but still permits inserting documents whose name is "" (the empty string).
I tried using this:
db.collection.createIndex(
{name: 1},
{unique: true, partialFilterExpression: {name: {$type: "string"}}}
)
An empty string "" is still a string, so partialFilterExpression: {name: {$type: "string"}} will index this document.
In principle, it would be this one:
db.collection.createIndex(
{name: 1},
{unique: true, partialFilterExpression: {name: {$ne: ""}}}
)
However, partialFilterExpression supports only these operators:
equality expressions (i.e. field: value or using the $eq operator),
$exists: true expression,
$gt, $gte, $lt, $lte expressions,
$type expressions,
$and operator,
$or operator,
$in operator
It looks like this is not possible. The only way would be to skip the attribute entirely and use { partialFilterExpression: { name: { $exists: true } } }
You can use the string sort ordering to skip the empty string.
Have the partial filter require that name be type string and greater than "":
partialFilterExpression: {$and: [
{name: {$type: "string"}},
{name: {$gt: ""}}
]}
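The trick works because, in BSON string ordering (as in Python's), the empty string sorts before every non-empty string, so name > "" is true exactly for non-empty strings. A small sketch of the predicate the partial filter expresses:

```python
# Mirrors {$and: [{name: {$type: "string"}}, {name: {$gt: ""}}]}:
# a document enters the unique index only if name is a non-empty string.
def would_be_indexed(name):
    return isinstance(name, str) and name > ""

candidates = ["alice", "", None, "bob"]
indexed = [n for n in candidates if would_be_indexed(n)]
# "" and None are skipped, so repeating them never violates uniqueness
```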

Complex count query in MongoDB

I have JSON documents with the following structure:
{"data": {
"key1": "value1",
"key2": "value2",
"manualTests": [{"name": "component1", "passed": false, "x": 12},
{"name": "component2", "passed": true},
{"name": "component3", "passed": false, "responseTime": 5}],
"automaticTests": [{"name": "component4", "passed": false},
{"name": "component5", "passed": true, "conversion": "Z"},
{"name": "component6", "passed": false}],
"semiautomaticTests": [{"name": "component7", "passed": true},
{"name": "component8", "passed": true},
{"name": "component9", "passed": true}]
}}
My MongoDB collection contains a huge number of these documents, and I need a list of all the components that have not passed their tests. So the desired output would be:
{
"component1": 150,
"component2": 35,
"component3": 17,
"component4": 5,
"component5": 3,
"component6": 1
}
The numbers are made up; for each component they show how many times it failed its test. How do I calculate this in MongoDB? The format is not strict; the main requirement is that the output contains the name of each failing component and its failure count across the whole sample.
You can try the aggregation query below. The sample documents nest the test arrays under data, so the field paths use the data. prefix.
$match keeps only documents with at least one failed component.
$project uses $filter to extract the failed components from each array, then $concatArrays to merge them into one array.
$unwind flattens that array and $group counts the occurrences of each failed component.
db.colname.aggregate([
{"$match":{
"$or":[
{"data.manualTests.passed":false},
{"data.automaticTests.passed":false},
{"data.semiautomaticTests.passed":false}
]
}},
{"$project":{
"tests":{
"$concatArrays":[
{"$filter":{"input":"$data.manualTests","as":"mt","cond":{"$eq":["$$mt.passed",false]}}},
{"$filter":{"input":"$data.automaticTests","as":"at","cond":{"$eq":["$$at.passed",false]}}},
{"$filter":{"input":"$data.semiautomaticTests","as":"st","cond":{"$eq":["$$st.passed",false]}}}
]
}
}},
{"$unwind":"$tests"},
{"$group":{"_id":"$tests.name","count":{"$sum":1}}}
])
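The pipeline's logic can be sanity-checked outside the database. A minimal Python equivalent of the filter/concat/group steps, assuming documents shaped like the example (test arrays nested under data):

```python
from collections import Counter

def count_failed(docs):
    # Same steps as the pipeline: pick failed entries from the three
    # test arrays, merge them, and count occurrences per component name.
    counts = Counter()
    for doc in docs:
        data = doc["data"]
        for field in ("manualTests", "automaticTests", "semiautomaticTests"):
            for test in data.get(field, []):
                if test["passed"] is False:
                    counts[test["name"]] += 1
    return dict(counts)

doc = {"data": {
    "manualTests": [{"name": "component1", "passed": False},
                    {"name": "component2", "passed": True}],
    "automaticTests": [{"name": "component4", "passed": False}],
    "semiautomaticTests": [{"name": "component7", "passed": True}],
}}
# count_failed([doc, doc]) counts each failure once per document
```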

MongoDB: count number of matches for OR condition query

Given a MongoDB with nested documents
collection = client.test.or_example
new_documents = [
{'_id': 1, 'proportions':{'A': 0.3, 'B': 0.1}},
{'_id': 2, 'proportions':{'C': 0.3, 'D': 0.1}},
{'_id': 3, 'proportions':{'A': 0.3, 'C': 0.3}},
{'_id': 4, 'proportions':{'B': 0.1, 'D': 0.3}},
{'_id': 5, 'proportions':{'A': 0.1, 'B': 0.3}}]
collection.insert_many(new_documents)
I can construct a query that uses OR conditions
collection.find({'$or': [{'proportions.A': {'$gt': 0.2}},
{'proportions.B': {'$gt': 0.2}},
{'proportions.C': {'$gt': 0.2}}]})
which returns four documents (_ids 1, 2, 3, 5). Now I'd like to sort these documents by the number of OR conditions they satisfy, i.e. 3, 1, 2, 5 (with respective match counts 2, 1, 1, 1).
I've been experimenting with counting the number of OR matches in an aggregation pipeline, but can't get it to work. I've managed to create a related field "coverage", but my current try for "number_matches" isn't valid syntax.
results = collection.aggregate([
{
'$match': {'$or': [{'proportions.A': {'$gt': 0.2}},
{'proportions.B': {'$gt': 0.2}},
{'proportions.C': {'$gt': 0.2}}]}
},
{
'$addFields':
{
'coverage': {'$sum': [ '$proportions.A', '$proportions.B', '$proportions.C']},
'number_matches': {'$sum': [ {'cond': [{'proportions.A': {'$gt': 0.2}}, 1, 0]},
{'cond': [{'proportions.B': {'$gt': 0.2}}, 1, 0]},
{'cond': [{'proportions.C': {'$gt': 0.2}}, 1, 0]} ] }
}
},
{
'$sort': {'number_matches': -1}
}
])
Also, my current try feels rather convoluted, so I hope there might be a simpler way.
I'm looking for a solution that works with MongoDB 3.4, but in case there's a more elegant or faster solution for 3.6, I'd also be interested in that.
Maybe you will need to add an intermediate step:
Execute the $match.
Create a match flag for each proportion (matchA, matchB, matchC).
Sum the flags to get the number of matches.
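Concretely, the $addFields attempt above is close; the missing piece is that conditions inside aggregation expressions must use $cond with aggregation comparison operators ($gt taking an [expression, value] array), not query syntax. A sketch of a corrected stage (an assumption based on that fix, not a tested answer) plus a pure-Python check of the same logic against the sample documents:

```python
# Hypothetical corrected stage: $cond + aggregation-style $gt.
# $addFields requires MongoDB 3.4+. A missing field compares lower
# than any number in BSON, so $gt yields false and $cond adds 0.
add_fields = {'$addFields': {
    'number_matches': {'$sum': [
        {'$cond': [{'$gt': ['$proportions.A', 0.2]}, 1, 0]},
        {'$cond': [{'$gt': ['$proportions.B', 0.2]}, 1, 0]},
        {'$cond': [{'$gt': ['$proportions.C', 0.2]}, 1, 0]},
    ]}
}}

# Pure-Python check of the same counting logic:
docs = [
    {'_id': 1, 'proportions': {'A': 0.3, 'B': 0.1}},
    {'_id': 2, 'proportions': {'C': 0.3, 'D': 0.1}},
    {'_id': 3, 'proportions': {'A': 0.3, 'C': 0.3}},
    {'_id': 4, 'proportions': {'B': 0.1, 'D': 0.3}},
    {'_id': 5, 'proportions': {'A': 0.1, 'B': 0.3}},
]

def number_matches(doc):
    # p.get(k, 0) mirrors the missing-field behaviour of $gt above
    p = doc['proportions']
    return sum(1 for k in 'ABC' if p.get(k, 0) > 0.2)

matched = sorted((d for d in docs if number_matches(d) > 0),
                 key=lambda d: (-number_matches(d), d['_id']))
order = [d['_id'] for d in matched]  # 3 first (two matches), then 1, 2, 5
```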

MongoDB: How to create sparse+unique indexes with optional keys?

I have a collection stats.daily that stores multi-dimensional data, but depending on the type each document requires additional specific keys.
These keys (per type) are also required to be unique.
For example, here the url key is required because type is "url":
{
"site": 1,
"type": "url",
"url": "http://google.com/",
"totals": {
"variable1": 12, // incrementing values
"variable2": 32
}
}
Another example:
{
"site": 1,
"type": "domain",
"domain": "google.com",
"totals": {...}
}
So I create the indexes:
db.coll.createIndex({site: 1, type: 1, url: 1}, {unique: true, sparse: true});
db.coll.createIndex({site: 1, type: 1, domain: 1}, {unique: true, sparse: true});
It doesn't work; it throws the exception E11000 duplicate key error index.
That makes sense for the unique constraint, but I expected the sparse option to handle the missing keys.
What's the best solution to accomplish what I need?
Edit:
In some cases, depending on the type, there may be more than one related key:
{
"site": 1,
"type": "search",
"engine": "google",
"term": "programming",
"totals": {...}
}
And the unique index would have to include these two new keys.
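A sparse compound index still indexes a document as long as any of its keys is present, and site and type exist in every document, so all non-url documents land in the url index with url = null and collide. On MongoDB 3.2+, a partial unique index per type-specific key sidesteps this; a sketch with pymongo, where coll stands in for a collection handle:

```python
# Sketch: a partial unique index only covers documents where `url`
# exists, so documents of other types never collide on a null url.
# (Assumes MongoDB >= 3.2; `coll` would be a pymongo collection handle.)
url_index = {
    "keys": [("site", 1), ("type", 1), ("url", 1)],
    "options": {
        "unique": True,
        "partialFilterExpression": {"url": {"$exists": True}},
    },
}
# coll.create_index(url_index["keys"], **url_index["options"])
# ...and analogous indexes for "domain", or ("engine", "term"), etc.
```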

How do I move data from one field to another in MongoDB?

I want to move my data from the field "demographic.education" into "demographic.school". How could I do this?
For example:
db.users.update({"demographic.education":{$exists: true, $ne: ""}}, {$set: {"demographic.school":demographic.education}})
You can use the $rename update operator for this:
db.users.update({"demographic.education": {$exists: true, $ne: ""}}, {$rename: {"demographic.education": "demographic.school"}})
Documentation
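For reference, $rename removes the old field and stores its value under the new name in one update. Its per-document effect can be modelled in Python (a rough sketch; the real operator handles edge cases this model ignores):

```python
def apply_rename(doc, old_path, new_path):
    # Rough model of {$rename: {old_path: new_path}} on one document:
    # remove the old field and store its value under the new name.
    def resolve(d, parts):
        for p in parts[:-1]:
            d = d.setdefault(p, {})
        return d
    old_parts, new_parts = old_path.split("."), new_path.split(".")
    parent = resolve(doc, old_parts)
    if old_parts[-1] in parent:
        value = parent.pop(old_parts[-1])
        resolve(doc, new_parts)[new_parts[-1]] = value
    return doc

user = {"demographic": {"education": "BSc"}}
apply_rename(user, "demographic.education", "demographic.school")
# user is now {"demographic": {"school": "BSc"}}
```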