How to get only the latest document for every user - MongoDB

I have one mongo collection like this:
| user_id | log        | created_at |
| 1       | "login"    | 1490688500 |
| 1       | "logout"   | 1490688400 |
| 2       | "view_xxx" | 1490688300 |
| 2       | "cpd"      | 1490688100 |
How can I get only the latest log for every user, such as:
| 1 | "login"    | 1490688500 |
| 2 | "view_xxx" | 1490688300 |

You can use the MongoDB aggregation framework and run the following command (note that after the descending sort, $first picks the newest document in each group; $last would return the oldest):
db.collection.aggregate([
  { '$sort' : { 'created_at' : -1 } },
  { '$group' : {
      '_id' : '$user_id',
      'log' : { '$first' : '$log' },
      'created_at' : { '$first' : '$created_at' }
  } }
])
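Run against the sample data above, this should return one document per user, for example:
{ "_id" : 1, "log" : "login", "created_at" : 1490688500 }
{ "_id" : 2, "log" : "view_xxx", "created_at" : 1490688300 }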
docs:
https://docs.mongodb.com/manual/reference/operator/aggregation/last/
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/
https://docs.mongodb.com/manual/reference/operator/aggregation/group/

Related

Query to create nested JSON object with PostgreSQL

I'm trying to turn the result of a query into nested JSON using the array_to_json and row_to_json functions that were added in PostgreSQL. I'm having trouble figuring out the best way to build nested objects. Here is the query I've tried against table test3; the table contents and the current output follow:
select array_to_json(array_agg(row_to_json(test3)))
from (
  select student_name, student_id, promoted
  from test3
  where loc in (select max(loc) from test3)
) test3;
student_id | student_name | promoted | maths | science | loc
-----------|--------------|----------|-------|---------|------------------------------
        19 | John         | yes      | 75    | 76      | 2022-06-28 06:10:27.25911-04
        12 | Peter        | no       | 79    | 65      | 2022-06-28 06:10:27.25911-04
        87 | Steve        | yes      | 69    | 76      | 2022-06-28 06:59:57.216754-04
        98 | Smith        | yes      | 78    | 82      | 2022-06-28 06:59:57.216754-04
[ { "student_name": "Steve", "student_id" : 87, "promoted" :
"yes", }, { "student_name": "Smith", "student_id" : 98,
"promoted" : "yes",
} ]
But I need to generate the JSON output below:
{ "students" : [
{ "student_name" : "Steve", "Details":{
"student_id" : 87,
"promoted" : "yes" }
}, { "student_name" : "Smith", "Details":{
"student_id" : 98,
"promoted" : "yes" }}]}

MongoDB embedding

I'm a novice with MongoDB, so please accept my apologies if this question is rudimentary.
I want to embed documents from a CSV file into the places collection, but I'm not sure how to do this in MongoDB Compass.
Any help would be appreciated.
places Collection
{
  "_id" : 123456,
  "acceptedPaymentModes" : "cash",
  "address" : {
    "city" : "victoria",
    "country" : "Mexico",
    "state" : "tamaulipas",
    "street" : "frente al tecnologico"
  }
}
CSV file (openingHours)
PlaceId | hours | days
123456 | 00:00-23:30 | Mon;Tue;Wed;Thu;Fri;
123456 | 00:00-23:30 | Sat;
123456 | 00:00-23:30 | Sun;
Is it possible to embed the opening hours into the places collection?
Thanks in advance.
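The original excerpt does not include an answer; a minimal mongo shell sketch, assuming the CSV is first imported into a helper collection (the name openingHoursImport is an assumption for illustration):
// Import the CSV first, for example:
//   mongoimport --db=test --collection=openingHoursImport --type=csv --headerline openingHours.csv
// Then push each CSV row into an openingHours array on the matching place:
db.openingHoursImport.find().forEach(function (row) {
  db.places.updateOne(
    { _id: row.PlaceId },
    { $push: { openingHours: { hours: row.hours, days: row.days } } }
  );
});
Compass can also import the CSV into a collection, but the embedding step still needs an update like the one above.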

How to concatenate documents from different collections in one with Mongo and the aggregation framework?

I have 4 collections in my database
(with the names of their attributes indicated in parentheses):
1. posts (attributes: CreationDate, PostId, PostTypeId and UserId),
2. users (attributes: CreationDate and Id),
3. votes (attributes: CreationDate, Id and UserId), and
4. comments (attributes: CreationDate, Id and UserId).
I want to create a collection named facts that merges the information contained in the previous collections: each document of facts will come from one of the collections posts, users, votes, or comments.
The structure of the collection facts will be:
document from | PostId  | PostTypeId | UserId  | VoteId  | CommentId
posts         | present | present    | present | null    | null
users         | null    | null       | present | null    | null
votes         | present | null       | present | present | null
comments      | present | null       | present | null    | present
As you see this is going to be a very sparse collection (it is going to have a lot of null values).
How can I construct this fact collection using MongoDB and the aggregation framework?
I tried this to merge the posts and users collections, but it did not show any results:
respuestas = db.users.aggregate( [
{'$lookup': {
'from': "posts",
'localField': "Id",
'foreignField': "OwnerUserId",
'as': "p"}
},
{ '$unwind': '$p'},
{
"$group": {
"_id": "$Id",
"users": {
"$push": {
'CreationDate' : '$CreationDate',
'Post' : '',
'PostType' : '',
'UserId': '$Id',
'VoteId' : '',
'CommentId' : ''
}
},
"p": {
"$push": {
'CreationDate' : '$p.CreationDate',
'Post' : '$p.Id',
'PostType' : '$p.PostTypeId',
'UserId': '$p.OwnerUserId',
'VoteId' : '',
'CommentId' : ''
}
}
}
},
{'$limit': 20}
])
list(respuestas)
Finally, I found a way to do what I was looking for. I'm posting the solution in case someone benefits from it.
db.users.aggregate( [
{ '$project' :
{
'CreationDate' : '$CreationDate',
'Post' : '',
'PostType' : '',
'UserId': '$Id',
'VoteId' : '',
'CommentId' : ''
}
},
{'$out': "StackOverflowFacts"}
])
db.votes.aggregate( [
{ '$project' :
{
'CreationDate' : '$CreationDate',
'Post' : '$PostId',
'PostType' : '',
'UserId': '$UserId',
'VoteId' : '$Id',
'CommentId' : ''
}
},
{'$merge' : { 'into' : "StackOverflowFacts" } }
])
db.comments.aggregate( [
{ '$project' :
{
'CreationDate' : '$CreationDate',
'Post' : '$PostId',
'PostType' : '',
'UserId': '$UserId',
'VoteId' : '',
'CommentId' : '$Id'
}
},
{'$merge' : { 'into' : "StackOverflowFacts" } }
])
db.posts.aggregate( [
{ '$project' :
{
'CreationDate' : '$CreationDate',
'Post' : '$Id',
'PostType' : '$PostTypeId',
'UserId': '$OwnerUserId',
'VoteId' : '',
'CommentId' : ''
}
},
{'$merge' : { 'into' : "StackOverflowFacts" } }
])
As you can see, with $out you can create a new collection from the output of a pipeline. Then, using $merge, you can append documents to the collection you created.
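On MongoDB 4.4 or newer, a hedged alternative sketch (not part of the original solution): $unionWith can build the same collection in a single pipeline instead of four separate runs:
db.users.aggregate([
  { '$project' : { 'CreationDate' : '$CreationDate', 'Post' : '', 'PostType' : '', 'UserId' : '$Id', 'VoteId' : '', 'CommentId' : '' } },
  // Append the projected votes, comments and posts documents to the stream
  { '$unionWith' : { 'coll' : "votes", 'pipeline' : [
      { '$project' : { 'CreationDate' : '$CreationDate', 'Post' : '$PostId', 'PostType' : '', 'UserId' : '$UserId', 'VoteId' : '$Id', 'CommentId' : '' } } ] } },
  { '$unionWith' : { 'coll' : "comments", 'pipeline' : [
      { '$project' : { 'CreationDate' : '$CreationDate', 'Post' : '$PostId', 'PostType' : '', 'UserId' : '$UserId', 'VoteId' : '', 'CommentId' : '$Id' } } ] } },
  { '$unionWith' : { 'coll' : "posts", 'pipeline' : [
      { '$project' : { 'CreationDate' : '$CreationDate', 'Post' : '$Id', 'PostType' : '$PostTypeId', 'UserId' : '$OwnerUserId', 'VoteId' : '', 'CommentId' : '' } } ] } },
  // Drop the source _id values so documents from different collections cannot collide
  { '$unset' : '_id' },
  { '$out' : "StackOverflowFacts" }
])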
You can also make queries on the new collection faster by creating indexes with db.StackOverflowFacts.create_index.
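For instance, a minimal PyMongo sketch (indexing on UserId is just an illustrative choice):
# Index the fact collection on UserId to speed up per-user lookups
db.StackOverflowFacts.create_index("UserId")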
You can visualize the beginning of the collection by doing:
list(db.StackOverflowFacts.find().limit( 5 ))

Error on querying FIWARE Orion with orderBy custom attribute

I am facing the following issue while querying Orion with orderBy, asking for the results to be returned in chronological order. My data looks like this:
{
  "id": "some_unique_id",
  "type": "sensor",
  "someVariable": {
    "type": "String",
    "value": "1",
    "metadata": {}
  },
  "datetime": {
    "type": "DateTime",
    "value": "2018-09-21T00:38:57.00Z",
    "metadata": {}
  },
  "humidity": {
    "type": "Float",
    "value": 55.9,
    "metadata": {}
  },
  "someMoreVariables": {
    "type": "Float",
    "value": 6.29,
    "metadata": {}
  }
}
My call looks like this:
http://my.ip.com:1026/v2/entities?type=sensor&offset=SOMENUMBERMINUS30&limit=30&orderBy=datetime
Unfortunately, the response is the following:
{
  "error": "InternalServerError",
  "description": "Error at querying MongoDB"
}
Both tenant and subtenant are used in the call, the Orion version is 1.13.0-next, and the tenant has been indexed inside MongoDB. I am running Orion and MongoDB in separate Docker containers on the same server.
As always, any help will be highly appreciated.
EDIT1: After fgalan's recommendation, I am adding the relevant records from the log (I am sorry, I didn't do it from the beginning):
BadInput some.ip
time=2018-10-16T07:47:36.576Z | lvl=ERROR | corr=bf8461dc-d117-11e8-b0f1-0242ac110003 | trans=1539588749-153-00000013561 | from=some.ip | srv=some_tenant | subsrv=/some_subtenant | comp=Orion | op=AlarmManager.cpp[211]:dbError | msg=Raising alarm DatabaseError: nextSafe(): { $err: "Executor error: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.", code: 17144 }
time=2018-10-16T07:47:36.576Z | lvl=ERROR | corr=bf8461dc-d117-11e8-b0f1-0242ac110003 | trans=1539588749-153-00000013561 | from=some.ip | srv=some_tenant | subsrv=/some_subtenant | comp=Orion | op=AlarmManager.cpp[235]:dbErrorReset | msg=Releasing alarm DatabaseError
From the above, it is clear that indexing is required. I have already done that according to fgalan's answer to another question I had in the past: Indexing Orion
EDIT2: The MongoDB index listing after indexing:
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "orion.entities"
},
{
"v" : 2,
"key" : {
"location.coords" : "2dsphere"
},
"name" : "location.coords_2dsphere",
"ns" : "orion.entities",
"2dsphereIndexVersion" : 3
},
{
"v" : 2,
"key" : {
"creDate" : 1
},
"name" : "creDate_1",
"ns" : "orion.entities"
}
]
You have an index {creDate: 1}, which is fine if you order by entity creation date using dateCreated (or don't specify the orderBy parameter, as creation date is the default ordering):
GET /v2/entities
GET /v2/entities?orderBy=dateCreated
However, if you plan to order by a different attribute defined by you (as I understand datetime is) and you get the OperationFailed: Sort operation used more than the maximum error, then you have to create an index on the value of that attribute. In particular, you have to create this index:
{ "attrs.datetime.value": 1 }
EDIT: as suggested by a comment to this answer, the command for creating the above index typically is:
db.entities.createIndex({"attrs.datetime.value": 1});
EDIT2: have a look at this section in the documentation for more detail on this kind of index.
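For reference, a listing like the one shown in EDIT2 of the question is what the standard mongo shell helper returns:
// List the indexes on the entities collection of the orion database
db.entities.getIndexes()
After creating the index, an entry with key { "attrs.datetime.value" : 1 } should appear in this list as well.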

Spark DataFrame: pivot with sorting

I am reading the following JSON file into a DataFrame in Spark:
{"id" : "a", "country" : "uk", "date" : "2016-01-01"}
{"id" : "b", "country" : "uk", "date" : "2016-01-02"}
{"id" : "c", "country" : "fr", "date" : "2016-02-01"}
{"id" : "d", "country" : "de", "date" : "2016-03-01"}
{"id" : "e", "country" : "tk", "date" : "2016-04-01"}
{"id" : "f", "country" : "be", "date" : "2016-05-01"}
{"id" : "g", "country" : "nl", "date" : "2016-06-01"}
{"id" : "h", "country" : "uk", "date" : "2016-06-01"}
I then apply groupBy to it and pivot on date; here's the (pseudo)code:
val df = spark.read.json("file.json")
val dfWithFormattedDate = df.withColumn("date", date_format(col("date"), "yyyy-MM"))
dfWithFormattedDate.groupBy("country").pivot("date").agg(countDistinct("id").alias("count")).orderBy("country")
This gives me a DataFrame with country and the pivoted dates (months) as columns. I would then like to order the results in descending order of total count. However, count is not one of the columns, and I can't apply pivot after calling count() on groupBy, as that returns a Dataset and not a RelationalGroupedDataset. I have tried the following as well:
dfWithFormattedDate.groupBy("country").pivot("date").count()
This does not give me a count column either. Is there any way I can have both count and the pivoted dates in the resulting Dataset so that I can order by count descending?
Update
Here's the current output:
country|2016-01|2016-02|2016-03| ....
fr | null | 1 | null |
be | null | null | null |
uk | 2 | null | null |
Here's the expected output:
country|count|2016-01|2016-02|2016-03| ....
uk | 3 | 2 | null | null |
fr | 1 | null | 1 | null |
be | 1 | null | null | null |
As you can see, I need the count column in the result, with the rows ordered in descending order of count. Ordering without explicitly having a count column is fine as well.
If our starting point is this DataFrame:
import org.apache.spark.sql.functions.{date_format ,col, countDistinct}
val result = df.withColumn("date", date_format(col("date"), "yyyy-MM"))
.groupBy("country").pivot("date").agg(countDistinct("id").alias("count"))
.na.fill(0)
We can then simply calculate the row sum over all columns except the country column:
import org.apache.spark.sql.functions.desc
val test = result.withColumn("count",
    result.columns.filter(_ != "country") // all pivoted month columns
      .map(c => col(c))
      .reduce((x, y) => x + y)) // row-wise sum of the month counts
  .orderBy(desc("count"))
test.show()
+-------+-------+-------+-------+-------+-------+-------+-----+
|country|2016-01|2016-02|2016-03|2016-04|2016-05|2016-06|count|
+-------+-------+-------+-------+-------+-------+-------+-----+
| uk| 2| 0| 0| 0| 0| 1| 3|
| be| 0| 0| 0| 0| 1| 0| 1|
| de| 0| 0| 1| 0| 0| 0| 1|
| tk| 0| 0| 0| 1| 0| 0| 1|
| nl| 0| 0| 0| 0| 0| 1| 1|
| fr| 0| 1| 0| 0| 0| 0| 1|
+-------+-------+-------+-------+-------+-------+-------+-----+
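If the column order from the expected output matters (count immediately after country), a small follow-up sketch can reorder the columns (col is already imported above):
// Put "country" and "count" first; keep the pivoted month columns afterwards
val cols = Seq("country", "count") ++ test.columns.filterNot(Seq("country", "count").contains(_))
val reordered = test.select(cols.map(col): _*)
reordered.show()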