I'm trying to turn the result of a query into nested JSON using PostgreSQL's array_to_json and row_to_json functions, and I'm having trouble figuring out the best way to build the nested objects. Here is what I've tried with table test3. The query:
select array_to_json(array_agg(row_to_json(test3)))
from (
  select student_name, student_id, promoted
  from test3
  where loc in (select max(loc) from test3)
) test3;
Table test3 contains:
 student_id | student_name | promoted | maths | Science | loc
------------+--------------+----------+-------+---------+------------------------------
         19 | John         | yes      |    75 |      76 | 2022-06-28 06:10:27.25911-04
         12 | Peter        | no       |    79 |      65 | 2022-06-28 06:10:27.25911-04
         87 | Steve        | yes      |    69 |      76 | 2022-06-28 06:59:57.216754-04
         98 | Smith        | yes      |    78 |      82 | 2022-06-28 06:59:57.216754-04
[ { "student_name": "Steve", "student_id" : 87, "promoted" :
"yes", }, { "student_name": "Smith", "student_id" : 98,
"promoted" : "yes",
} ]
But I need to generate the JSON output as below:
{ "students" : [
{ "student_name" : "Steve", "Details":{
"student_id" : 87,
"promoted" : "yes" }
}, { "student_name" : "Smith", "Details":{
"student_id" : 98,
"promoted" : "yes" }}]}
I'm a novice to MongoDB, so please accept my apologies if this question is rudimentary.
I'm looking to embed the documents from a CSV file into the places collection, and I'm not sure how to do this in MongoDB Compass.
Any help would be appreciated.
places Collection
{
"_id" : 123456,
"acceptedPaymentModes": "cash",
"address": {
"city": "victoria",
"country": "Mexico",
"state": "tamaulipas",
"street": "frente al tecnologico"
}
}
CSV file (openingHours)
PlaceId | hours | days
123456 | 00:00-23:30 | Mon;Tue;Wed;Thu;Fri;
123456 | 00:00-23:30 | Sat;
123456 | 00:00-23:30 | Sun;
Is it possible to embed the opening hours into the places collection?
Thanks in advance.
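A minimal sketch of one way this could be done, assuming the CSV has first been imported into a temporary openingHours collection (Compass's Import Data feature or mongoimport can do that) and that the columns arrive as fields named PlaceId, hours and days:

// Sketch only: push each imported CSV row onto an openingHours array
// of the matching places document (matching PlaceId against _id).
// Note: PlaceId must have the same type as places._id (number vs. string).
db.openingHours.find().forEach(function (row) {
  db.places.updateOne(
    { _id: row.PlaceId },
    { $push: { openingHours: { hours: row.hours, days: row.days } } }
  );
});

Once the places documents are updated, the temporary openingHours collection can be dropped.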
I have a dataframe in which one of the string-type columns contains a list of items that I want to explode and make part of the parent dataframe. How can I do it?
Here is the code to create a sample dataframe:
from pyspark.sql import Row
from collections import OrderedDict

def convert_to_row(d: dict) -> Row:
    return Row(**OrderedDict(sorted(d.items())))

df = sc.parallelize([
    {"arg1": "first",  "arg2": "John", "arg3": '[{"name" : "click", "datetime" : "1570103345039", "event" : "entry" }, {"name" : "drag", "datetime" : "1580133345039", "event" : "exit" }]'},
    {"arg1": "second", "arg2": "Joe",  "arg3": '[{"name" : "click", "datetime" : "1670105345039", "event" : "entry" }, {"name" : "drop", "datetime" : "1750134345039", "event" : "exit" }]'},
    {"arg1": "third",  "arg2": "Jane", "arg3": '[{"name" : "click", "datetime" : "1580105245039", "event" : "entry" }, {"name" : "drop", "datetime" : "1650134345039", "event" : "exit" }]'},
]).map(convert_to_row).toDF()
Running this code will create a dataframe as shown below:
+------+----+--------------------+
| arg1|arg2| arg3|
+------+----+--------------------+
| first|John|[{"name" : "click...|
|second| Joe|[{"name" : "click...|
| third|Jane|[{"name" : "click...|
+------+----+--------------------+
The arg3 column contains a JSON list which I want to explode into separate columns. I want the dataframe to have the following columns:
arg1 | arg2 | arg3 | name | datetime | event
How can I achieve that?
You need to specify an array schema in the from_json function:
from pyspark.sql.functions import explode, from_json
schema = 'array<struct<name:string,datetime:string,event:string>>'
df.withColumn('data', explode(from_json('arg3', schema))) \
.select(*df.columns, 'data.*') \
.show()
+------+----+--------------------+-----+-------------+-----+
| arg1|arg2| arg3| name| datetime|event|
+------+----+--------------------+-----+-------------+-----+
| first|John|[{"name" : "click...|click|1570103345039|entry|
| first|John|[{"name" : "click...| drag|1580133345039| exit|
|second| Joe|[{"name" : "click...|click|1670105345039|entry|
|second| Joe|[{"name" : "click...| drop|1750134345039| exit|
| third|Jane|[{"name" : "click...|click|1580105245039|entry|
| third|Jane|[{"name" : "click...| drop|1650134345039| exit|
+------+----+--------------------+-----+-------------+-----+
Note: if your Spark version does not support simpleString format for schema, try the following:
from pyspark.sql.types import ArrayType, StringType, StructType, StructField
schema = ArrayType(
    StructType([
        StructField('name', StringType()),
        StructField('datetime', StringType()),
        StructField('event', StringType())
    ])
)
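The programmatic schema plugs into the same expression as before, for example:

# Same transformation as above, using the ArrayType schema object.
df.withColumn('data', explode(from_json('arg3', schema))) \
  .select(*df.columns, 'data.*') \
  .show()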
I am facing the following issue while querying Orion with orderBy, asking for the results to be returned in chronological order. My data looks like this:
{
"id": "some_unique_id",
"type": "sensor",
"someVariable": {
"type": "String",
"value": "1",
"metadata": {}
},
"datetime": {
"type": "DateTime",
"value": "2018-09-21T00:38:57.00Z",
"metadata": {}
},
"humidity": {
"type": "Float",
"value": 55.9,
"metadata": {}
},
"someMoreVariables": {
"type": "Float",
"value": 6.29,
"metadata": {}
  }
}
My call looks like this:
http://my.ip.com:1026/v2/entities?type=sensor&offset=SOMENUMBERMINUS30&limit=30&orderBy=datetime
Unfortunately, the response is the following:
{
"error": "InternalServerError",
"description": "Error at querying MongoDB"}
Both tenant and subtenant are used in the call, the Orion version is 1.13.0-next, and the tenant has been indexed inside MongoDB. I am running Orion and MongoDB in different Docker containers on the same server.
As always, any help will be highly appreciated.
EDIT1: Following fgalan's recommendation, I am adding the relevant records from the log (I am sorry, I didn't do it from the beginning):
BadInput some.ip
time=2018-10-16T07:47:36.576Z | lvl=ERROR | corr=bf8461dc-d117-11e8-b0f1-0242ac110003 | trans=1539588749-153-00000013561 | from=some.ip | srv=some_tenant | subsrv=/some_subtenant | comp=Orion | op=AlarmManager.cpp[211]:dbError | msg=Raising alarm DatabaseError: nextSafe(): { $err: "Executor error: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.", code: 17144 }
time=2018-10-16T07:47:36.576Z | lvl=ERROR | corr=bf8461dc-d117-11e8-b0f1-0242ac110003 | trans=1539588749-153-00000013561 | from=some.ip | srv=some_tenant | subsrv=/some_subtenant | comp=Orion | op=AlarmManager.cpp[235]:dbErrorReset | msg=Releasing alarm DatabaseError
From the above, it is clear that indexing is required. I have already done that according to fgalan's answer to another question I had in the past: Indexing Orion
EDIT2: Orion response after indexing:
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "orion.entities"
},
{
"v" : 2,
"key" : {
"location.coords" : "2dsphere"
},
"name" : "location.coords_2dsphere",
"ns" : "orion.entities",
"2dsphereIndexVersion" : 3
},
{
"v" : 2,
"key" : {
"creDate" : 1
},
"name" : "creDate_1",
"ns" : "orion.entities"
}
]
You have an index {creDate: 1}, which is fine if you order by entity creation date using dateCreated (or don't specify the orderBy parameter, as creation date is the default ordering):
GET /v2/entities
GET /v2/entities?orderBy=dateCreated
However, if you plan to order by a different attribute defined by you (as I understand datetime is) and you get the OperationFailed: Sort operation used more than the maximum error, then you have to create an index on the value of that attribute. In particular, you have to create this index:
{ attrs.datetime.value: 1 }
EDIT: as suggested by a comment to this answer, the command for creating the above index typically is:
db.entities.createIndex({"attrs.datetime.value": 1});
EDIT2: have a look at this section in the documentation for more detail on this kind of index.
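To verify the index is in place, the listing from EDIT2 of the question can be repeated; after the createIndex call it should contain an additional entry whose key is { "attrs.datetime.value" : 1 }:

// List the indexes of the entities collection (run against the tenant's database).
db.entities.getIndexes();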
I have one mongo collection like this:
| user_id | log | create_at |
| 1 | "login" | 1490688500 |
| 1 | "logout" | 1490688400 |
| 2 | "view_xxx" | 1490688300 |
| 2 | "cpd" | 1490688100 |
How can I get only the latest log entry for every user, such as:
| 1 | "login" | 1490688500 |
| 2 | "view_xxx" | 1490688300 |
You can use the MongoDB aggregation framework and run the following command:
db.collection.aggregate(
    [
        // sort ascending so that $last picks the most recent entry per user
        { '$sort' : { 'create_at' : 1 } },
        { '$group' : { '_id' : '$user_id', 'log' : { '$last' : '$log' }, 'create_at' : { '$last' : '$create_at' } } }
    ]
)
docs:
https://docs.mongodb.com/manual/reference/operator/aggregation/last/
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/
https://docs.mongodb.com/manual/reference/operator/aggregation/group/
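With the sample data above, this pipeline should return one document per user (the order of the groups themselves is not guaranteed), roughly:

{ "_id" : 1, "log" : "login", "create_at" : 1490688500 }
{ "_id" : 2, "log" : "view_xxx", "create_at" : 1490688300 }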
With this dataset, I'm trying to write a query which returns, for every document, the relevant article and its rank. But I don't know if it is possible with only a mongo query.
content_business : {
    id : {….} ,
    content : {
        uid : "01234",
        FL : "bla",
        langRef : 1,
        articles : [
            {
                name : "aName",
                rank : 104
            },
            {
                name : "unNom",
                rank : 102
            }
        ]
    }
}
A content contains a langRef. This is the index to use to get the right article. Here, with the value 1, it means that the relevant article is the one at index 1: { name : "unNom", rank : 102 }.
For the moment I do a db.find({ FL : "bla" }), then with an external program I get the rank of the related article (something like articles[langRef].rank).
But I'm not sure it is the best solution.
Have you got any idea ?
Regards,
Blured.
It is not really a sort, as far as I understand.
What I have as input:
content_business : {
    id : {….} ,
    content : {
        uid : "01234",
        FL : "bla",
        langRef : 1,
        articles : [
            {
                name : "aName",
                rank : 104
            },
            {
                name : "unNom",
                rank : 102
            }
        ]
    }
},
{
    id : {….} ,
    content : {
        uid : "99999",
        FL : "bla",
        langRef : 0,
        articles : [
            {
                name : "aaaa",
                rank : 888
            },
            {
                name : "bbbb",
                rank : 102
            }
        ]
    }
}
What I'd like as a result:
{
    id : {...},
    content : { name : "unNom", rank : 102 }
},
{
    id : {...},
    content : { name : "aaaa", rank : 888 }
}
The first document gives the name "unNom" and rank 102, as the specified langRef is 1; the second document gives the name "aaaa" and rank 888, as the specified langRef is 0.
Regards,
Blured.
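A minimal sketch of how this could be done entirely in MongoDB with the aggregation framework, assuming the documents live in a content_business collection and that content is a field of each document, as the excerpts above suggest; $arrayElemAt (MongoDB 3.2+) picks the article at the langRef position:

// Sketch only: for each matching document, keep just the article
// located at index content.langRef of the content.articles array.
db.content_business.aggregate([
  { $match: { "content.FL": "bla" } },
  { $project: {
      content: { $arrayElemAt: [ "$content.articles", "$content.langRef" ] }
  } }
]);

This returns documents of the form { _id, content: { name, rank } }, which matches the desired result above.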