In MongoDB, how to clone a column in a collection?

Is there a way to add a new column to a collection that is a clone of an existing column in the same collection?
PersonTable
_id | Name
1 | John
Result
_id | Name | Name(cloned)
1 | John | John
Hopefully without a foreach loop.

You can use a bulkWrite operation:
// Assumes PersonTable is a Mongoose model and the source field is stored as "Name"
const persons = await PersonTable.find({})
const updateTable = await PersonTable.bulkWrite(
  persons.map((person) => ({
    updateOne: {
      filter: { _id: person._id },
      // Set only the new field instead of rewriting the whole document
      update: { $set: { clonedName: person.Name } }
    }
  }))
)
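As a hedged sketch of the same idea, the bulk operations can be built by a small pure function, which makes them easy to inspect before sending them to the server. The helper names `buildCloneOps` and `buildClonePipeline` and the field name `clonedName` are assumptions for illustration. The second helper builds an update pipeline: on MongoDB 4.2 and later, `updateMany` accepts an aggregation pipeline, so the server can copy the field itself and no client-side loop is needed.

```javascript
// Hypothetical helper: one updateOne operation per document,
// setting only the cloned field (field names are illustrative).
function buildCloneOps(docs, sourceField, targetField) {
  return docs.map((doc) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { [targetField]: doc[sourceField] } },
    },
  }));
}

// Hypothetical helper: pipeline-style update (MongoDB 4.2+).
// "$Name" refers to each document's own Name field, so the copy happens server-side.
function buildClonePipeline(sourceField, targetField) {
  return [{ $set: { [targetField]: "$" + sourceField } }];
}

const sampleDocs = [{ _id: 1, Name: "John" }];
const ops = buildCloneOps(sampleDocs, "Name", "clonedName");
const pipeline = buildClonePipeline("Name", "clonedName");

console.log(JSON.stringify(ops));
console.log(JSON.stringify(pipeline));
// With a real collection (name assumed), the pipeline would be passed as:
//   db.PersonTable.updateMany({}, pipeline)
```

The pipeline form is the closer match to "without a foreach loop", since nothing is read back into the application.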

Related

How to handle a nested array in Druid

My json is as below:
{
"id":11966121,
"employer_id":175,
"account_attributes":[
{
"id":155387028,
"is_active":false,
"created_at":"2018-06-06T02:12:25.243Z",
"updated_at":"2021-03-15T17:38:04.598Z"
},
{
"id":155387062,
"is_active":true,
"created_at":"2018-06-06T02:12:25.243Z",
"updated_at":"2021-03-15T17:38:04.598Z"
}
],
"created_at":"2017-12-13T18:31:04.000Z",
"updated_at":"2021-03-14T23:50:43.180Z"
}
I want to parse the message and produce a table with account_attributes flattened.
For the sample payload, the output should have two rows:
id       | account_attributes_id | is_active | created_at               | updated_at
11966121 | 155387028             | false     | 2018-06-06T02:12:25.243Z | 2021-03-15T17:38:04.598Z
11966121 | 155387062             | true      | 2018-06-06T02:12:25.243Z | 2021-03-15T17:38:04.598Z
Is this possible?
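Independently of what Druid itself supports, the desired shape can be produced by a pre-processing step before the data reaches Druid. The following is only a minimal sketch of that flattening in plain JavaScript; the function name `flattenAccountAttributes` is hypothetical, and where such a step would run (producer, stream processor, batch job) is left open.

```javascript
// Hypothetical pre-processing step: emit one flat row per account_attributes entry.
// The timestamps are taken from the nested entry, matching the expected output above.
function flattenAccountAttributes(message) {
  return message.account_attributes.map((attr) => ({
    id: message.id,
    account_attributes_id: attr.id,
    is_active: attr.is_active,
    created_at: attr.created_at,
    updated_at: attr.updated_at,
  }));
}

const message = {
  id: 11966121,
  employer_id: 175,
  account_attributes: [
    { id: 155387028, is_active: false, created_at: "2018-06-06T02:12:25.243Z", updated_at: "2021-03-15T17:38:04.598Z" },
    { id: 155387062, is_active: true, created_at: "2018-06-06T02:12:25.243Z", updated_at: "2021-03-15T17:38:04.598Z" },
  ],
  created_at: "2017-12-13T18:31:04.000Z",
  updated_at: "2021-03-14T23:50:43.180Z",
};

const rows = flattenAccountAttributes(message);
console.log(rows);
```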

Cross-venue visitor reporting approach in Location Based Service system

I'm looking for an approach to build a cross-venue visitor report for my client. He wants an HTTP API that returns the total number of unique customers who visited more than one shop within a day range (the API must respond in 1-2 seconds).
A sample of the raw data (millions of records in reality):
--------------------------
DAY | CUSTOMER | VENUE
--------------------------
1 | cust_1 | A
2 | cust_2 | A
3 | cust_1 | B
3 | cust_2 | A
4 | cust_1 | C
5 | cust_3 | C
6 | cust_3 | A
Now, I want to calculate the cross-visitor report. IMO the steps would be as follows:
Step 1: aggregate raw data from day 1 to 6
--------------------------
CUSTOMER | VENUE VISIT
--------------------------
cust_1 | [A, B, C]
cust_2 | [A]
cust_3 | [A, C]
Step 2: produce the final result
Total unique cross-customer: 2 (cust_1 and cust_3)
I've tried some solutions:
I first used MongoDB to store the data, then used Flask to write an API on top of MongoDB's utilities: aggregation, $addToSet, $group, count... But the API's response time is unacceptable.
Then I switched to Elasticsearch, hoping its aggregations would help, but it does not seem to support a pipeline group command on the output of the first "terms" aggregation.
After that, I read about Redis Sets, Sorted Sets, ... but they couldn't help.
Could you please give me a clue on how to solve my problem?
Thanks in advance!
You can easily do this with Elasticsearch by leveraging one date_histogram aggregation to bucket by day, two terms aggregations (first bucket by customer and then by venue) and then only select the customers which visited more than one venue any given day using the bucket_selector pipeline aggregation. It looks like this:
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "customers": {
          "terms": {
            "field": "customer.keyword"
          },
          "aggs": {
            "venues": {
              "terms": {
                "field": "venue.keyword"
              }
            },
            "cross_selector": {
              "bucket_selector": {
                "buckets_path": {
                  "venues_count": "venues._bucket_count"
                },
                "script": {
                  "source": "params.venues_count > 1"
                }
              }
            }
          }
        }
      }
    }
  }
}
In the result set, you'll get customers 1 and 3 as expected.
UPDATE:
Another approach uses a scripted_metric aggregation to implement the logic yourself. It's a bit more complicated and might not perform well depending on the number of documents and the hardware you have, but the following algorithm yields the response 2, exactly as you expect. Note that the params._agg / params._aggs syntax used below applies to Elasticsearch 6.x; from 7.0 on, the scripts use state and states instead.
POST sales/_search
{
  "size": 0,
  "aggs": {
    "unique": {
      "scripted_metric": {
        "init_script": "params._agg.visits = new HashMap()",
        "map_script": "def cust = doc['customer.keyword'].value; def venue = doc['venue.keyword'].value; def venues = params._agg.visits.get(cust); if (venues == null) { venues = new HashSet(); } venues.add(venue); params._agg.visits.put(cust, venues)",
        "combine_script": "def merged = new HashMap(); for (v in params._agg.visits.entrySet()) { def cust = merged.get(v.key); if (cust == null) { merged.put(v.key, v.value) } else { cust.addAll(v.value); } } return merged",
        "reduce_script": "def merged = new HashMap(); for (agg in params._aggs) { for (v in agg.entrySet()) {def cust = merged.get(v.key); if (cust == null) {merged.put(v.key, v.value)} else {cust.addAll(v.value); }}} def unique = 0; for (m in merged.entrySet()) { if (m.value.size() > 1) unique++;} return unique"
      }
    }
  }
}
Response:
{
"took": 1413,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0,
"hits": []
},
"aggregations": {
"unique": {
"value": 2
}
}
}
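To make the map/reduce logic of the scripted_metric easier to follow, here is a minimal sketch of the same algorithm in plain JavaScript, outside Elasticsearch. The function names `mapShard` and `reduceShards` are hypothetical, and the split of the sample data into two "shards" is arbitrary; it only illustrates that each shard builds its own customer-to-venues map and the maps are merged at the end.

```javascript
// Map step: build customer -> Set(venues) for one shard's documents.
function mapShard(docs) {
  const visits = new Map();
  for (const { customer, venue } of docs) {
    if (!visits.has(customer)) visits.set(customer, new Set());
    visits.get(customer).add(venue);
  }
  return visits;
}

// Reduce step: merge the per-shard maps, then count customers with more than one venue.
function reduceShards(shardMaps) {
  const merged = new Map();
  for (const visits of shardMaps) {
    for (const [customer, venues] of visits) {
      if (!merged.has(customer)) merged.set(customer, new Set());
      for (const venue of venues) merged.get(customer).add(venue);
    }
  }
  let unique = 0;
  for (const venues of merged.values()) {
    if (venues.size > 1) unique++;
  }
  return unique;
}

// Sample data from the question, split arbitrarily into two "shards".
const shardA = [
  { day: 1, customer: "cust_1", venue: "A" },
  { day: 2, customer: "cust_2", venue: "A" },
  { day: 3, customer: "cust_1", venue: "B" },
  { day: 3, customer: "cust_2", venue: "A" },
];
const shardB = [
  { day: 4, customer: "cust_1", venue: "C" },
  { day: 5, customer: "cust_3", venue: "C" },
  { day: 6, customer: "cust_3", venue: "A" },
];

const crossVisitors = reduceShards([mapShard(shardA), mapShard(shardB)]);
console.log(crossVisitors); // 2 (cust_1 and cust_3)
```

Because every customer's venue set is held in memory, the cost grows with the number of distinct customers in the day range, which is the performance caveat mentioned above.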

Embedding fields in all mongodb documents

I have a collection with documents that follows this structure:
child:
{
id: int
name: string
age: int
dob: date
school: string
class: string
}
I would like to embed certain fields, into something like this:
child:
{
id : int
personalInfo {
name: string
age: int
dob: date
}
educationInfo {
school: string
class: string
}
}
How would one go about doing this in code? I am new to MongoDB, so I apologize if my syntax is incorrect. All of the fields have one-to-one relationships with the child (i.e. one child has one id, one name, one age, one school etc.), so I'm also wondering if embedding is even necessary.
Please try using $set to create the new fields personalInfo and educationInfo, and $unset to remove the old fields (name, age, etc.). Before doing so, it is better to check through $exists that all those fields are present. Here is sample code:
var personFields = [ "name", "age", "dob" ];
var educationFields = [ "school", "class" ];
var query = {};
// Only match documents that still have every flat field
personFields.forEach(function(k){ query[k] = {$exists: 1}; });
educationFields.forEach(function(k){ query[k] = {$exists: 1}; });
db.collection.find(query).forEach(function(doc){
    var personalInfo = {};
    var educationInfo = {};
    // Copy each flat field into the sub-document it belongs to
    for (var k in doc) {
        if (personFields.indexOf(k) !== -1) {
            personalInfo[k] = doc[k];
        } else if (educationFields.indexOf(k) !== -1) {
            educationInfo[k] = doc[k];
        }
    }
    // Add the embedded fields and drop the old top-level ones in one update
    db.collection.update({_id: doc._id},
        {$set: {
            personalInfo: personalInfo,
            educationInfo: educationInfo},
         $unset: {'name': '',
                  'age': '',
                  'dob': '',
                  'school': '',
                  'class': ''}});
});
It's OK to embed them; that's what document databases are for. So if you need a migration, you'll basically use MongoDB functions like update, with $set and $unset.
See more here: https://docs.mongodb.org/manual/reference/method/db.collection.update/
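As a hedged sketch of the migration step described above, the $set/$unset update for one document can be built by a small pure function, so the mapping from flat fields to embedded fields lives in one place. The helper name `buildEmbedUpdate` and the sample values are assumptions for illustration, not part of the original answers.

```javascript
// Hypothetical helper: given a flat document and a mapping of
// embedded-field name -> list of source fields, build a $set/$unset update.
function buildEmbedUpdate(doc, groups) {
  const set = {};
  const unset = {};
  for (const [target, fields] of Object.entries(groups)) {
    const embedded = {};
    for (const field of fields) {
      if (field in doc) {
        embedded[field] = doc[field]; // copy into the sub-document
        unset[field] = "";            // and mark the old top-level field for removal
      }
    }
    set[target] = embedded;
  }
  return { $set: set, $unset: unset };
}

const groups = {
  personalInfo: ["name", "age", "dob"],
  educationInfo: ["school", "class"],
};

// Illustrative document; the values are made up.
const child = { _id: 1, name: "Ann", age: 9, dob: "2015-01-01", school: "Elm", class: "4B" };
const update = buildEmbedUpdate(child, groups);
console.log(JSON.stringify(update, null, 2));
// In the shell this would be applied with something like:
//   db.collection.update({ _id: child._id }, update)
```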

How to update nested object (i.e., doc field having object type) in single document in mongodb

I have a collection whose documents have an object-type field named price (see below). I just want to update that field by adding new key-value pairs to it or overwriting existing ones.
Suppose I have this collection (in the db):
[
{
_id: 1,
price: {
amazon: 102.1,
apple: 500
}
},
....
....
];
Now I want to write a query that either updates a key in price or inserts it if it does not exist there.
Suppose this is the input data to update/insert with:
var key1 = 'ebay', value1 = 300; // will insert
var key2 = 'amazon', value2 = 100; // will update
Assume the doc with _id: 1 for now.
Something like the $addToSet operator? (Though $addToSet only works for arrays, and I want to work within an object.)
Expected output:
[
{
_id: 1,
price: {
amazon: 100, // updated
apple: 500,
ebay: 300 // inserted
}
},
....
....
];
How can I achieve this?
Thanks.
You could construct the update document dynamically to use the dot notation and the $set operator to do the update correctly. Using your example above, you'd want to run the following update operation:
db.collection.update(
{ "_id": 1 },
{
"$set": { "price.ebay": 300, "price.amazon": 100 }
}
)
So, given the data input, you would want to construct an update document like { "price.ebay": 300, "price.amazon": 100 }
With the inputs as you have described
var key1 = 'ebay', value1 = 300; // will insert
var key2 = 'amazon', value2 = 100; // will update
Construct the update object:
var query = { "_id": 1 },
update = {};
update["price."+key1] = value1;
update["price."+key2] = value2;
db.collection.update(query, {"$set": update});
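When the number of keys is not known in advance, the same dot-notation update can be built from an object of key-value pairs. This is a minimal sketch; the helper name `buildNestedSet` is hypothetical.

```javascript
// Hypothetical helper: turn { ebay: 300, amazon: 100 } into
// { $set: { "price.ebay": 300, "price.amazon": 100 } } for a given parent field.
function buildNestedSet(parentField, values) {
  const set = {};
  for (const [key, value] of Object.entries(values)) {
    set[`${parentField}.${key}`] = value;
  }
  return { $set: set };
}

const priceUpdate = buildNestedSet("price", { ebay: 300, amazon: 100 });
console.log(JSON.stringify(priceUpdate));
// {"$set":{"price.ebay":300,"price.amazon":100}}
// It would then be passed to the update as before:
//   db.collection.update({ "_id": 1 }, priceUpdate)
```

Because $set with dot notation touches only the named keys, other entries in price (such as apple) are left as they are.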

SQL Server ROW_NUMBER() with PARTITION BY in MongoDB for returning a subset of rows

How can I write the query below using the MongoDB C# driver?
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
What I am trying to achieve is
Collection
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Cap|20|CD345
Cap|5|EC123
Expected result is
ProductName|Price|SKU
Cap|10|AB123
Bag|5|ED567
Here is one attempt (please ignore the object and field names):
public List<ProductOL> Search(ProductOL obj, bool topOneOnly)
{
List<ProductOL> products = new List<ProductOL>();
var database = MyMongoClient.Instance.OpenToRead(dbName: ConfigurationManager.AppSettings["MongoDBDefaultDB"]);
var collection = database.GetCollection<RawBsonDocument>("Products");
List<IMongoQuery> build = new List<IMongoQuery>();
if (!string.IsNullOrEmpty(obj.ProductName))
{
var ProductNameQuery = Query.Matches("ProductName", new BsonRegularExpression(obj.ProductName, "i"));
build.Add(ProductNameQuery);
}
if (!string.IsNullOrEmpty(obj.BrandName))
{
var brandNameQuery = Query.Matches("BrandName", new BsonRegularExpression(obj.BrandName, "i"));
build.Add(brandNameQuery);
}
var fullQuery = Query.And(build.ToArray());
products = collection.FindAs<ProductOL>(fullQuery).SetSortOrder(SortBy.Ascending("ProductName")).ToList();
if (topOneOnly)
{
var tmpProducts = new List<ProductOL>();
foreach (var item in products)
{
if (tmpProducts.Any(x => x.ProductName== item.ProductName)) { }
else
tmpProducts.Add(item);
}
products = tmpProducts;
}
return products;
}
My mongo query works and gives me the right results, but it is not efficient when I am dealing with huge data, so I was wondering whether MongoDB has any concepts like SQL Server's ROW_NUMBER() and partitioning.
If your query returns the expected results but isn't efficient, you should look into index usage with explain(). Given your query generation code includes conditional clauses, it seems likely you will need multiple indexes to efficiently cover common variations.
I'm not sure how the C# code you've provided relates to the original SQL query, as they seem to be entirely different. I'm also not clear how grouping is expected to help your query performance, aside from limiting the results returned.
Equivalent of the SQL query
There is no direct equivalent of ROW_NUMBER() .. PARTITION BY grouping in the MongoDB versions this answer targets (MongoDB 5.0 and later add a $setWindowFields stage with a $documentNumber operator, which is closer), but you should be able to work out the desired result using either the Aggregation Framework (fastest) or Map/Reduce (slower but more functionality). The MongoDB manual includes an Aggregation Commands Comparison as well as usage examples.
As an exercise in translation, I'll focus on your SQL query which is pulling out the first product match by ProductName:
SELECT SubSet.*
FROM ( SELECT T.ProductName ,
T.Price ,
ROW_NUMBER() OVER ( PARTITION BY T.ProductName ORDER BY T.ProductName ) AS ProductRepeat
FROM myTable T
) SubSet
WHERE SubSet.ProductRepeat = 1
Setting up the test data you provided:
db.myTable.insert([
{ ProductName: 'Cap', Price: 10, SKU: 'AB123' },
{ ProductName: 'Bag', Price: 5, SKU: 'ED567' },
{ ProductName: 'Cap', Price: 20, SKU: 'CD345' },
{ ProductName: 'Cap', Price: 5, SKU: 'EC123' },
])
Here's an aggregation query in the mongo shell which will find the first match per group (ordered by ProductName). It should be straightforward to translate that aggregation query to the C# driver using the MongoCollection.Aggregate() method.
I've included comments with the rough equivalent SQL fragment in your original query.
db.myTable.aggregate([
    // Apply a sort order so the $first product is somewhat predictable
    // ( "ORDER BY T.ProductName")
    { $sort: {
        ProductName: 1
        // Should really have additional sort by Price or SKU (otherwise order may change)
    }},
    // Group by Product Name
    // (" PARTITION BY T.ProductName")
    { $group: {
        _id: "$ProductName",
        // Find first matching product details per group (can use $$CURRENT in MongoDB 2.6 or list specific fields)
        // "SELECT SubSet.* ... WHERE SubSet.ProductRepeat = 1"
        Price: { $first: "$Price" },
        SKU: { $first: "$SKU" },
    }},
    // Rename _id to match expected results
    { $project: {
        _id: 0,
        ProductName: "$_id",
        Price: 1,
        SKU: 1,
    }}
])
Results given the test data appear to be what you were looking for:
{ "Price" : 10, "SKU" : "AB123", "ProductName" : "Cap" }
{ "Price" : 5, "SKU" : "ED567", "ProductName" : "Bag" }
Notes:
This aggregation query uses the $first operator, so if you want to find the second or third product per grouping you'd need a different approach (eg. $group and then take the subset of results needed in your application code)
If you want predictable results for finding the first item in a $group there should be more specific sort criteria than ProductName (for example, sorting by ProductName & Price or ProductName & SKU). Otherwise the order of results may change in future as documents are added or updated.
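To illustrate the second note, here is a minimal sketch in plain JavaScript (not MongoDB code) of "first row per group" with an explicit tie-breaker. The function name `firstPerGroup` is hypothetical, and SKU is chosen as the tie-breaker only as an example; with it, the row picked per ProductName no longer depends on insertion order.

```javascript
// Compare two values with plain < and > so the result does not depend on locale.
function compareValues(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: sort by group key, then by a tie-break key,
// and keep the first row seen for each group (ROW_NUMBER() = 1 per partition).
function firstPerGroup(rows, groupKey, tieBreakKey) {
  const sorted = [...rows].sort(
    (a, b) =>
      compareValues(a[groupKey], b[groupKey]) ||
      compareValues(a[tieBreakKey], b[tieBreakKey])
  );
  const seen = new Set();
  const result = [];
  for (const row of sorted) {
    if (!seen.has(row[groupKey])) {
      seen.add(row[groupKey]);
      result.push(row);
    }
  }
  return result;
}

const products = [
  { ProductName: "Cap", Price: 10, SKU: "AB123" },
  { ProductName: "Bag", Price: 5, SKU: "ED567" },
  { ProductName: "Cap", Price: 20, SKU: "CD345" },
  { ProductName: "Cap", Price: 5, SKU: "EC123" },
];

const firstRows = firstPerGroup(products, "ProductName", "SKU");
console.log(firstRows);
// [ { ProductName: 'Bag', Price: 5, SKU: 'ED567' },
//   { ProductName: 'Cap', Price: 10, SKU: 'AB123' } ]
```

In the aggregation pipeline, the equivalent change would be a $sort on both ProductName and SKU before the $group stage.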
Thanks to @Stennie: with the help of his answer I could come up with this C# aggregation code:
var match = new BsonDocument
{
{
"$match",
new BsonDocument{
{"ProductName", new BsonRegularExpression("cap", "i")}
}
}
};
var group = new BsonDocument
{
{"$group",
new BsonDocument
{
{"_id", "$ProductName"},
{"SKU", new BsonDocument{
{
"$first", "$SKU"
}}
}
}}
};
var project = new BsonDocument{
{
"$project",
new BsonDocument
{
{"_id", 0 },
{"ProductName","$_id" },
{"SKU", 1}
}}};
var sort = new BsonDocument{
{
"$sort",
new BsonDocument
{
{
"ProductName",1 }
}
}};
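// Note: a $sort placed after $group only orders the output; to make $first predictable, a $sort must also come before $group
var pipeline = new[] { match, group, project, sort };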
var aggResult = collection.Aggregate(pipeline);
var products= aggResult.ResultDocuments.Select(BsonSerializer.Deserialize<ProductOL>).ToList();
Using AggregateArgs
AggregateArgs args = new AggregateArgs();
List<BsonDocument> piple = new List<BsonDocument>();
piple.Add(match);
piple.Add(group);
piple.Add(project);
piple.Add(sort);
args.Pipeline = piple;
// var pipeline = new[] { match, group, project, sort };
var aggResult = collection.Aggregate(args);
products = aggResult.Select(BsonSerializer.Deserialize<ProductOL>).ToList();