How can I use GROUP BY-style aggregation in FileMaker? It's a problem similar to "Filemaker sum by group". Can someone explain how to use a summary field with GetSummary?
Id | Value
1  | 50
1  | 50
2  | 10
2  | 5
1  | 100
2  | 15
Create a new summary field (ValueSummary) defined as the Total of the Value field.
Sort the records by the Id field.
Use GetSummary ( ValueSummary ; Id ) to get the summary for a particular Id value.
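For instance, with the sample data above and the records sorted by Id, GetSummary ( ValueSummary ; Id ) returns 200 on the records where Id is 1 (50 + 50 + 100) and 30 on the records where Id is 2 (10 + 5 + 15).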
I have a delta table as shown below. There is an id_no column and columns for counts of signals. There will be around 300 signals, so the number of columns will be approximately 601 in the final delta table.
The requirement is to have a single record for each id_no in the final table.
+-----+--------------+-----------------+--------------+-----------------+--------------+-----------------+--------------------+..............+------------------+--------------------+----------------------------+
|id_no|sig01_total |sig01_valid_total|sig02_total |sig02_valid_total|sig03_total |sig03_valid_total|sig03_total_valid |..............|sig300_valid_total|sig300_total_valid |load_timestamp |
+-----+--------------+-----------------+--------------+-----------------+--------------+-----------------+--------------------+..............|------------------+--------------------+----------------------------+
|050 |25 |23 |45 |43 |66 |60 |55 |..............|60 |55 |2021-08-10T16:58:30.054+0000|
|051 |78 |70 |15 |14 |10 |10 |9 |..............|10 |9 |2021-08-10T16:58:30.054+0000|
|052 |88 |88 |75 |73 |16 |13 |13 |..............|13 |13 |2021-08-10T16:58:30.054+0000|
+-----+--------------+-----------------+--------------+-----------------+--------------+-----------------+--------------------+..............+------------------+--------------------+----------------------------+
I could perform the upsert based on the id_no using the delta merge option, as shown below.
targetDeltaTable.alias("t")
.merge(
sourceDataFrame.alias("s"),
"t.id_no = s.id_no")
.whenMatched().updateAll()
.whenNotMatched().insertAll()
.execute()
This code will execute weekly, and the final table has to be updated weekly. Not all signals may be available in every week's data.
But the issue with merging is: if the id_no already exists, I also have to check whether each signal column already exists in the existing table.
If the signal column exists, I have to add the new count to the existing count.
If the signal column does not exist for that particular id_no, I have to add that signal as a new column to the existing row.
I tried the delta table upsert command shown below, but since the set of columns is not static on every data load, and considering the huge number of columns, it did not succeed.
DeltaTable.forPath(spark, "/data/events/target/")
  .as("t")
  .merge(
    updatesDF.as("s"),
    "t.id_no = s.id_no")
  .whenMatched
  .updateExpr( // some condition to be applied here to check whether the column exists or not?
    Map(
      "sig01_total" -> "t.sig01_total + s.sig01_total"
      // ... and so on for the remaining signal columns ...
    ))
  .whenNotMatched
  .insertAll()
  .execute()
How can I achieve this requirement? Any leads appreciated!
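One possible direction (a sketch only, not a tested solution; the helper name buildUpdateExprs is made up here, and columns that are genuinely new to the target would still need Delta schema evolution, e.g. spark.databricks.delta.schema.autoMerge.enabled, or an explicit ALTER TABLE before the merge): build the updateExpr map programmatically from the signal columns the source and target have in common, instead of writing the 300+ entries by hand.

import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame

// Build "column -> SQL expression" pairs for every signal column present in both
// the target table and this week's data, adding the incoming count to the existing
// one; coalesce treats a missing/null count as 0.
def buildUpdateExprs(target: DeltaTable, updates: DataFrame): Map[String, String] = {
  val targetCols = target.toDF.columns.toSet
  val signalCols = updates.columns.filter(c => c != "id_no" && c != "load_timestamp")
  val summed = signalCols
    .filter(c => targetCols.contains(c))
    .map(c => c -> s"coalesce(t.$c, 0) + coalesce(s.$c, 0)")
    .toMap
  summed + ("load_timestamp" -> "s.load_timestamp")
}

val target = DeltaTable.forPath(spark, "/data/events/target/")
target.as("t")
  .merge(updatesDF.as("s"), "t.id_no = s.id_no")
  .whenMatched.updateExpr(buildUpdateExprs(target, updatesDF))
  .whenNotMatched.insertAll()
  .execute()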
I'm trying to make an SSRS report show 20 rows per page, and I've tried using
=Ceiling(RowNumber(Nothing)/20)
THIS DOES NOT WORK.
My rows contain a field that counts a quantity, and the paging ends up following that field, breaking when it adds up to 20 rather than after 20 rows. Is there any way to make the tablix object print only a set number of rows per page, so it's not reliant on the group by of the dataset?
I've also tried changing that 'Nothing' to a field in the dataset, but that throws an error.
Here's how my report tablix is set up:
| Part | Part Description | Part Qty | Part Weight | Total Weight |
|[Part]|[Part Description]|[SUM(PartQty)]|[Part Weight]|[SUM(TotalWeight)]
What I expect to get with the =Ceiling expression:
| Part | Part Description | Part Qty | Part Weight | Total Weight |
|PART1 |Part 1 Desc |12 |5.00 |[60.00]
|PART2 |Part 2 Desc |3 |5.00 |[15.00]
|PART3 |Part 3 Desc |5 |5.00 |[25.00]
|PART4 |Part 4 Desc |7 |5.00 |[35.00]
...Continue until 20 rows then page break
This is what I'm getting:
| Part | Part Description | Part Qty | Part Weight | Total Weight |
|PART1 |Part 1 Desc |12 |5.00 |[60.00]
|PART2 |Part 2 Desc |3 |5.00 |[15.00]
|PART3 |Part 3 Desc |5 |5.00 |[25.00]
--Page Break
|PART4 |Part 4 Desc |7 |5.00 |[35.00]
The field PartQty counts up until it equals 20, and then there's a page break.
You need to do the following:
Add a new row group and set the "group on" option to =CEILING(RowNumber(Nothing)/20). This should be the outermost group, so if you only had a details group you could right-click the details group and choose "Add Group => Parent Group".
Delete the new column if one was just created - we don't need it.
Right-click the new row group you created in the Row Groups panel at the bottom of the screen and click "Group Properties".
In the "Page Breaks" section, set the option to "Between each instance of a group".
Note: while this will do what you want, be aware that it won't actually set the number of rows per page; it sets the number of rows per group and then puts a break between each group. If you set the value in CEILING to 100, for instance, each group would probably go over several pages.
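For example, with 85 detail rows, =CEILING(RowNumber(Nothing)/20) evaluates to 1 for rows 1-20, 2 for rows 21-40, and so on, up to 5 for rows 81-85, so you get five groups and therefore five pages.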
I have a DataFrame whose Department values must come from a given set of values.
-----------------------
Id Name Department
-----------------------
1 John Sales
2 Martin Maintenance
3 Keith Sales
4 Rob Unknown
5 Kevin Unknown
6 Peter Maintenance
------------------------
The valid values for Department are stored in a String Array.
['Sales','Maintenance','Training']
If the Department value in the DataFrame is anything other than the allowed values, it must be replaced by 'Training'. So the new DataFrame will be -
-----------------------
Id Name Department
-----------------------
1 John Sales
2 Martin Maintenance
3 Keith Sales
4 Rob Training
5 Kevin Training
6 Peter Maintenance
------------------------
What could be a feasible solution?
You can achieve your requirement by using the when/otherwise, concat and lit built-in functions, as shown below:
val validDepartments = Array("Sales","Maintenance","Training")
import org.apache.spark.sql.functions._
df.withColumn("Department",
    when(concat(validDepartments.map(x => lit(x)): _*).contains(col("Department")), col("Department"))
      .otherwise("Training"))
  .show(false)
which should give you
+---+------+-----------+
|Id |Name  |Department |
+---+------+-----------+
|1  |John  |Sales      |
|2  |Martin|Maintenance|
|3  |Keith |Sales      |
|4  |Rob   |Training   |
|5  |Kevin |Training   |
|6  |Peter |Maintenance|
+---+------+-----------+
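One caveat with the concat/contains approach above: because it checks substring containment against "SalesMaintenanceTraining", a value such as "Sale" would also slip through. A variant using the built-in isin function (sketched against the same df) avoids that:

import org.apache.spark.sql.functions.{col, lit, when}

val validDepartments = Array("Sales", "Maintenance", "Training")

// Keep Department only when it exactly matches one of the allowed values,
// otherwise replace it with "Training".
df.withColumn(
    "Department",
    when(col("Department").isin(validDepartments: _*), col("Department"))
      .otherwise(lit("Training"))
  )
  .show(false)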
A simple udf function should fulfil your requirement too:
val validDepartments = Array("Sales","Maintenance","Training")
import org.apache.spark.sql.functions._
def containsUdf = udf((department: String) => if (validDepartments.contains(department)) department else "Training")
df.withColumn("Department", containsUdf(col("Department"))).show(false)
which should give you the same result
I hope the answer is helpful
I want to be able to add up to 10 tags to a record in a database (MongoDB), but I don't want to add 10 columns with the related indexes on each of them. So I thought I'd store the unique sum of these tags instead.
e.g. (with 6 tags)
|------------|
|value | tag |
|------------|
|1 |a |
|2 |b |
|4 |c |
|8 |d |
|16 |e |
|32 |f |
|------------|
e.g.
a + b = 3
b + c + d = 14
I then store just the sum of the value in Mongo.
These combinations are always unique and I can "reconstitute" them back into tags when pulled from persistent storage using iteration.
// Reconstruct the tags from the stored sum, checking the largest values first.
int tagSum = storedSum;   // the summed value read back from Mongo
foreach (var tag in tagCollection.OrderByDescending(t => (int)t))
{
    if (tagSum >= (int)tag)
    {
        TagProperty.Add(tag);
        tagSum -= (int)tag;
    }
}
My problem, however, is that I thought there must be a mathematical formula I could use to query for a specific tag, e.g. find the "c" tag by passing in the value 4. Either I'm wrong or I cannot find it.
I'm happy to go with the Multikeys solution in Mongo but I have a lot of other data to index and using 1 index instead of 10 would just be nicer‽
Multikeys is the right approach to this problem as it can find any document from the single index on the array without a table-scan. In your case you can just put the appropriate selection of letters representing the tags into the array: ["a", "d", "e"].
In more complicated cases where each field could contain the same tag values, for example song names, album names, artist names, ... I sometimes add the tag phrase twice: once on its own and once with the field name prepended, e.g. "artist:Hello". Now I can search on the tag word occurring in any field OR on it occurring in a specific field, and either way it will use the index to find matching records.
Convert the tags a-f to an integer 0-5 (call it tagValue); then the number you want is 1 << tagValue.
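For example, with the values above, tag c has tagValue 2 and bit value 1 << 2 = 4; a stored sum of 14 (b + c + d) contains it because 14 & 4 = 4, which is non-zero, while a sum of 3 (a + b) does not because 3 & 4 = 0.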
I have a table as follows
|GroupID | UserID |
--------------------
|1 | 1 |
|1 | 2 |
|1 | 3 |
|2 | 1 |
|2 | 2 |
|3 | 20 |
|3 | 30 |
|5 | 200 |
|5 | 100 |
Basically, this creates a "group" that user IDs get associated with, so when I want to request the members of a group I can query this table.
Users have the option of leaving a group, and creating a new one.
When all users have left a group, that GroupID no longer appears in my table.
Let's pretend this is for a chat application where users might close and open chats constantly; the group IDs will add up very quickly, but the number of chats will realistically not reach millions of chats with hundreds of users.
I'd like to recycle the group ID numbers, so that when I go to insert a new record, if group 4 is unused (as is the case above), it gets assigned.
There are good reasons not to do this, but it's pretty straightforward in PostgreSQL. The technique--using generate_series() to find gaps in a sequence--is useful in other contexts, too.
WITH group_id_range AS (
SELECT generate_series((SELECT MIN(group_id) FROM groups),
(SELECT MAX(group_id) FROM groups)) group_id
)
SELECT min(gir.group_id)
FROM group_id_range gir
LEFT JOIN groups g ON (gir.group_id = g.group_id)
WHERE g.group_id IS NULL;
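With the sample data above (group IDs 1, 2, 3 and 5), generate_series() produces 1 through 5, the LEFT JOIN leaves 4 unmatched, and the query returns 4.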
That query will return NULL if there are no gaps or if there are no rows at all in the table "groups". If you want to use this to return the next group id number regardless of the state of the table "groups", use this instead.
WITH group_id_range AS (
SELECT generate_series(
(COALESCE((SELECT MIN(group_id) FROM groups), 1)),
(COALESCE((SELECT MAX(group_id) FROM groups), 1))
) group_id
)
SELECT COALESCE(min(gir.group_id), (SELECT MAX(group_id)+1 FROM groups))
FROM group_id_range gir
LEFT JOIN groups g ON (gir.group_id = g.group_id)
WHERE g.group_id IS NULL;
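With the same sample data this version also returns 4; on an empty "groups" table it returns 1, and when there are no gaps it falls back to MAX(group_id) + 1.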