I have a table like this in my database:
+----+--------------+--------+--------+
| id | service_name | doc_id | org_id |
+----+--------------+--------+--------+
|  1 | new service  |     12 |    119 |
|  2 | new service  |     24 |    119 |
|  3 | old service  |     13 |    118 |
|  4 | old service  |     14 |    118 |
|  5 | new service  |     20 |    119 |
+----+--------------+--------+--------+
I want to group all the doc_ids according to the service_name column.
I have tried the following.
In my controller:
$where_person['org_id'] = $this->post('org_id');
$result_insert = $this->$model_name->fetch_doctor_services($where_person);
In my Model:
function fetch_doctor_services($where) {
    $this->db->select('service_name, doc_id')->from('services');
    $this->db->group_by('service_name');
    $this->db->where($where);
    return $this->db->get()->result();
}
But it does not output the data as I want, i.e. grouped by service_name with all the doc_ids that belong to that service_name.
Where am I going wrong here?
Currently my output is like this:
{ "data":
[ { "service_name": "new service", "doc_id": "12" },
{
"service_name": "old service", "doc_id": "13" }
]
}
You need to use GROUP_CONCAT. See the code below for how to use it:
$this->db->select('service_name, GROUP_CONCAT( doc_id) ')->from('services');
$this->db->group_by('service_name');
$this->db->where($where);
return $this->db->get()->result();
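Note that the Query Builder may try to escape what you put in select(), which can mangle the GROUP_CONCAT() call; passing FALSE as the second argument to select() turns that escaping off. A sketch of the full model method with that change and an assumed alias doc_ids for the concatenated column:
function fetch_doctor_services($where) {
    // FALSE stops CodeIgniter from escaping the GROUP_CONCAT() expression
    $this->db->select('service_name, GROUP_CONCAT(doc_id) AS doc_ids', FALSE)->from('services');
    $this->db->group_by('service_name');
    $this->db->where($where);
    return $this->db->get()->result();
}
Each row of the result then carries one service_name and a comma-separated doc_ids string, e.g. 12,24,20 for "new service" with the sample data above.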
I have a MongoDB collection called places_log. The schema for places_log documents is as follows:
{
'type': {
'type': 'string',
'required': True,
'allowed': ['in', 'out']
},
'fence_name': {
'type': 'string',
'required': True
},
'time': {
'type': 'datetime',
'required': True
}
}
When a query is made to get all the documents of this collection sorted by fence_name and time, say the output is as follows:
+------------+------+-------+
| fence_name | type | time  |
+------------+------+-------+
| abc        | in   | 08:30 |
| abc        | in   | 08:32 |
| abc        | out  | 09:45 |
| abc        | in   | 15:18 |
| abc        | out  | 16:20 |
| abc        | out  | 16:25 |
| lmn        | in   | 12:30 |
| pqr        | in   | 12:40 |
| pqr        | out  | 13:52 |
| pqr        | out  | 13:58 |
| xyz        | out  | 19:43 |
| xyz        | out  | 19:45 |
+------------+------+-------+
I want a query which will return the following result: for each fence, when there are simultaneous ins I want only the latest in, and when there are simultaneous outs I want only the latest out.
+------------+------+-------+
| fence_name | type | time  |
+------------+------+-------+
| abc        | in   | 08:32 |
| abc        | out  | 09:45 |
| abc        | in   | 15:18 |
| abc        | out  | 16:25 |
| lmn        | in   | 12:30 |
| pqr        | in   | 12:40 |
| pqr        | out  | 13:58 |
| xyz        | out  | 19:45 |
+------------+------+-------+
Basically, there is a feature where the user can create multiple fences on the map, and we store the times when the user's vehicle enters or exits each fence. Due to some edge cases, we are getting multiple 'in' events in a row (the "simultaneous" ins above) without an 'out' event in between, which should not be possible. So I am trying to come up with a query where I take only the last 'in' of such a run and use that time as the time the vehicle entered the fence.
But the vehicle can enter and exit a fence multiple times, so I have to get all of those ins and outs as well.
Doing an aggregation with $group and $last does not keep the ins and outs that are not simultaneous (the separate visits get collapsed). For the following aggregation:
[
    { "$sort": { "fence_name": 1, "time": 1 } },
    {
        "$group": {
            "_id": {
                "fence_name": "$fence_name",
                "type": "$type"
            },
            "time": { "$last": "$time" }
        }
    }
]
We will get something like this:
+------------+------+-------+
| fence_name | type | time  |
+------------+------+-------+
| abc        | in   | 15:18 |
| abc        | out  | 16:25 |
| lmn        | in   | 12:30 |
| pqr        | in   | 12:40 |
| pqr        | out  | 13:58 |
| xyz        | out  | 19:45 |
+------------+------+-------+
Here, I don't get the second time the vehicle entered and exited the fence 'abc'.
I want to get the multiple ins and outs which are not simultaneous.
And it would be even better if I could get something like this:
+------------+-------+-------+
| fence_name | in    | out   |
+------------+-------+-------+
| abc        | 08:32 | 09:45 |
| abc        | 15:18 | 16:25 |
| lmn        | 12:30 | null  |
| pqr        | 12:40 | 13:58 |
| xyz        | null  | 19:45 |
+------------+-------+-------+
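One way to get this (a sketch, not necessarily the only or best way) is to read the documents back sorted by fence_name and time and collapse runs of the same type per fence in application code, keeping the last event of each run. The collection and field names below match the question; the client and database names are placeholders:
from pymongo import MongoClient

client = MongoClient()                      # assumes a reachable MongoDB instance
logs = client["mydb"]["places_log"]         # "mydb" is a placeholder database name

cursor = logs.find({}, {"_id": 0, "fence_name": 1, "type": 1, "time": 1}) \
             .sort([("fence_name", 1), ("time", 1)])

collapsed = []
for doc in cursor:
    prev = collapsed[-1] if collapsed else None
    if prev and prev["fence_name"] == doc["fence_name"] and prev["type"] == doc["type"]:
        # same fence and same type as the previous event: part of a "simultaneous" run,
        # so keep only the latest one
        collapsed[-1] = doc
    else:
        collapsed.append(doc)

# optional second pass for the in/out column layout: walk `collapsed` per fence
# and pair each 'in' with the 'out' that follows it, filling the missing side with None
For the sample data above, collapsed contains exactly the eight rows of the first desired table.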
I have an aggregate table that needs to be updated frequently from a table that is regenerated with new data every few hours. Since the aggregate table grows a lot with every update, I need an efficient way to update it.
Can someone please show me how to merge information from a new table into an aggregate table, as shown below?
For example:
val aggregate_table = Seq(("A", 10),("B", 20),("C", 30),("D", 40)).toDF("id", "total")
| id | total |
|-----+-------|
| "A" | 10 |
| "B" | 20 |
| "C" | 30 |
| "D" | 40 |
val new_info_table = Seq(("X"),("B"),("C"),("B"),("A"),("A"),("C"),("B")).toDF("id")
| id |
|-----|
| "X" |
| "B" |
| "C" |
| "B" |
| "A" |
| "A" |
| "C" |
| "B" |
The resulting aggregate_table after it has been merged with new_info_table:
| id | total |
|-----+-------|
| "A" | 12 |
| "B" | 23 |
| "C" | 33 |
| "D" | 40 |
| "X" | 1 |
I need to "extract" some data contained in an Iterable[MyObject] (it was a RDD[MyObject] before a groupBy).
My initial RDD[MyObject] :
|-----------|---------|----------|
| startCity | endCity | Customer |
|-----------|---------|----------|
| Paris | London | ID | Age |
| | |----|-----|
| | | 1 | 1 |
| | |----|-----|
| | | 2 | 1 |
| | |----|-----|
| | | 3 | 50 |
|-----------|---------|----------|
| Paris | London | ID | Age |
| | |----|-----|
| | | 5 | 40 |
| | |----|-----|
| | | 6 | 41 |
| | |----|-----|
| | | 7 | 2 |
|-----------|---------|----|-----|
| New-York | Paris | ID | Age |
| | |----|-----|
| | | 9 | 15 |
| | |----|-----|
| | | 10| 16 |
| | |----|-----|
| | | 11| 46 |
|-----------|---------|----|-----|
| New-York | Paris | ID | Age |
| | |----|-----|
| | | 13| 7 |
| | |----|-----|
| | | 14| 9 |
| | |----|-----|
| | | 15| 60 |
|-----------|---------|----|-----|
| Barcelona | London | ID | Age |
| | |----|-----|
| | | 17| 66 |
| | |----|-----|
| | | 18| 53 |
| | |----|-----|
| | | 19| 11 |
|-----------|---------|----|-----|
I need to count the customers by age range, grouped by startCity - endCity.
The final result should be:
|-----------|---------|-------------|
| startCity | endCity | Customer |
|-----------|---------|-------------|
| Paris | London | Range| Count|
| | |------|------|
| | |0-2 | 3 |
| | |------|------|
| | |3-18 | 0 |
| | |------|------|
| | |19-99 | 3 |
|-----------|---------|-------------|
| New-York | Paris | Range| Count|
| | |------|------|
| | |0-2 | 0 |
| | |------|------|
| | |3-18 | 3 |
| | |------|------|
| | |19-99 | 2 |
|-----------|---------|-------------|
| Barcelona | London | Range| Count|
| | |------|------|
| | |0-2 | 0 |
| | |------|------|
| | |3-18 | 1 |
| | |------|------|
| | |19-99 | 2 |
|-----------|---------|-------------|
At the moment I'm doing this by counting over the same data 3 times (first with the 0-2 range, then 3-18, then 19-99).
Like this:
// given ite: Iterable[MyObject]
ite.count(x => x.age match {
  case Some(age) => age >= 0 && age < 2
  case _ => false
})
It works and gives me an Int, but I don't think it is efficient at all since I have to count over the data many times. What's the best way to do this, please?
Thanks
EDIT: The Customer object is a case class.
def computeRange(age: Int) =
  if (age <= 2)
    "0-2"
  else if (age <= 10)
    "2-10"
  // etc, you get the idea
Then, with an RDD of case class MyObject(id : String, age : Int)
rdd
  .map(x => computeRange(x.age) -> 1)
  .reduceByKey(_ + _)
Edit:
If you need to group by some columns, you can do it this way, provided that you have an RDD[(SomeColumns, Iterable[MyObject])]. The following lines would give you a map that associates each "range" with its number of occurrences.
def computeMapOfOccurances(list: Iterable[MyObject]): Map[String, Int] =
  list
    .map(_.age)
    .map(computeRange)
    .groupBy(x => x)
    .mapValues(_.size)
val result1 = rdd
  .mapValues(computeMapOfOccurances(_))
And if you need to flatten your data, you can write:
val result2 = result1
  .flatMapValues(_.toSeq)
Assuming that you have the Customer object as a case class as below:
case class Customer(ID: Int, Age: Int)
And your RDD[MyObject] is an RDD of the case class below:
case class MyObject(startCity: String, endCity: String, customer: List[Customer])
So, using the above case classes, the input that you showed in table format would be as below:
MyObject(Paris,London,List(Customer(1,1), Customer(2,1), Customer(3,50)))
MyObject(Paris,London,List(Customer(5,40), Customer(6,41), Customer(7,2)))
MyObject(New-York,Paris,List(Customer(9,15), Customer(10,16), Customer(11,46)))
MyObject(New-York,Paris,List(Customer(13,7), Customer(14,9), Customer(15,60)))
MyObject(Barcelona,London,List(Customer(17,66), Customer(18,53), Customer(19,11)))
And you've also mentioned that after grouping you have an Iterable[MyObject], which is equivalent to the step below:
val groupedRDD = rdd.groupBy(myobject => (myobject.startCity, myobject.endCity)) //groupedRDD: org.apache.spark.rdd.RDD[((String, String), Iterable[MyObject])] = ShuffledRDD[2] at groupBy at worksheetTest.sc:23
So the next step is to use mapValues to iterate through the Iterable[MyObject], count the ages belonging to each range, and finally convert to the output you require, as below:
val finalResult = groupedRDD.mapValues(x => {
  val rangeAge = Map("0-2" -> 0, "3-18" -> 0, "19-99" -> 0)
  val list = x.flatMap(y => y.customer.map(z => z.Age)).toList
  updateCounts(list, rangeAge).map(x => CustomerOut(x._1, x._2)).toList
})
where updateCounts is a recursive function
def updateCounts(ageList: List[Int], map: Map[String, Int]): Map[String, Int] = ageList match {
  case head :: tail =>
    if (head >= 0 && head < 3)
      updateCounts(tail, map ++ Map("0-2" -> (map("0-2") + 1)))
    else if (head >= 3 && head < 19)
      updateCounts(tail, map ++ Map("3-18" -> (map("3-18") + 1)))
    else
      updateCounts(tail, map ++ Map("19-99" -> (map("19-99") + 1)))
  case Nil => map
}
and CustomerOut is another case class
case class CustomerOut(Range: String, Count: Int)
So the finalResult is as below:
((Barcelona,London),List(CustomerOut(0-2,0), CustomerOut(3-18,1), CustomerOut(19-99,2)))
((New-York,Paris),List(CustomerOut(0-2,0), CustomerOut(3-18,4), CustomerOut(19-99,2)))
((Paris,London),List(CustomerOut(0-2,3), CustomerOut(3-18,0), CustomerOut(19-99,3)))
I have a table structure with 2 tables like this:
result table: one row with generic info plus a row UUID.
--------------------------
| uuid | name | other |
--------------------------
| result1 | foo | bar |
--------------------------
| result2 | foo2 | bar2 |
--------------------------
criteria_result:
-----------------------------------
| result_uuid | crit_uuid | value |
-----------------------------------
| result1 | crit1 | 7 |
-----------------------------------
| result1 | crit2 | 8 |
-----------------------------------
| result1 | crit3 | 9 |
-----------------------------------
| result1 | crit7 | 4 |
-----------------------------------
| result2 | crit1 | 2 |
-----------------------------------
What I need is 1 row per row of the result table, but with all the matching criteria_result rows grouped inside it, e.g.:
----------------------------------------------------
| uuid | name | result_crit |
----------------------------------------------------
| result1 | foo | [
| | crit1 | crit2 | crit3 | crit7 |
| | 7 | 8 | 9 | 4 |]
----------------------------------------------------
| result2 | foo2 | [
| | crit1 |
| | 2 | ]
----------------------------------------------------
Or even
-----------------------------------------
| uuid | name | result_crit |
-----------------------------------------
| result1 | foo | [ | name | value |
| crit1 | 7 |
| crit2 | 8 |
| crit3 | 9 |
| crit7 | 4 | ]
-----------------------------------------
-----------------------------------------
| result2 | foo2 | [ | name | value |
| crit1 | 2 | ]
-----------------------------------------
Anything that gives me only 1 row per result when I export it, but that also has all the criteria of that row/result in a sub-array/object.
SELECT
result.uuid,
result.name,
criteria_result.result_uuid
FROM
public.criteria_result,
public.result
WHERE
result.uuid = criteria_result.result_uuid;
I tried CUBE, GROUP BY, and GROUPING SETS, but I can't seem to get it right or find the answer :/
Thanks
Note: I'm on a recent Postgres, 9.5.1.
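CUBE and GROUPING SETS won't give you nested rows; what produces the shape you describe on 9.5 is JSON aggregation with a plain GROUP BY. A sketch using only the tables and columns from the question (json_object_agg for the first layout, json_agg of per-criterion objects for the second):
-- one row per result, criteria as a {crit_uuid: value} JSON object
SELECT r.uuid,
       r.name,
       json_object_agg(cr.crit_uuid, cr.value) AS result_crit
FROM   public.result r
JOIN   public.criteria_result cr ON cr.result_uuid = r.uuid
GROUP  BY r.uuid, r.name;

-- or one row per result with an array of {name, value} objects
SELECT r.uuid,
       r.name,
       json_agg(json_build_object('name', cr.crit_uuid, 'value', cr.value)) AS result_crit
FROM   public.result r
JOIN   public.criteria_result cr ON cr.result_uuid = r.uuid
GROUP  BY r.uuid, r.name;
If results without any criteria must still appear, switch to a LEFT JOIN and add FILTER (WHERE cr.result_uuid IS NOT NULL) to the aggregate so the NULLs from the outer join are skipped.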