Spotfire - Calculate average only if there are minimum 3 values - average

I want to create a cross table in Spotfire in which the average is calculated only when there are at least 3 values. If there are no values or fewer than 3 values, the average should be blank.
+-------+-----+---------+
| Month | Age | Average |
+-------+-----+---------+
| 1 | 10 | |
| 2 | 11 | |
| 3 | 2 | 7.7 |
| 4 | | |
| 5 | 13 | |
| 6 | 14 | |
| 7 | | |
| 8 | 19 | |
| 9 | 20 | |
| 10 | 21 | 20 |
+-------+-----+---------+

If I'm understanding you correctly, you want to group by Month, and then have something like this as your aggregation:
If(Count()>2,Avg([Age]),null) as [AverageAge_3Min]
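To make the intended logic concrete outside Spotfire, here is a minimal Python sketch of the same idea: group the rows, then return an average only for groups with at least 3 non-empty values. The function name `avg_if_min_count` and the sample groups are hypothetical. The sketch counts only non-empty values; whether the expression above behaves the same way depends on what Count() counts when called without an argument, so treat that as an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def avg_if_min_count(
    rows: Iterable[Tuple[Hashable, Optional[float]]],
    min_count: int = 3,
) -> Dict[Hashable, Optional[float]]:
    """Average per group, or None when a group has fewer than min_count values."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key]  # touch the key so groups with only empty values still appear
        if value is not None:
            groups[key].append(value)
    return {
        key: (sum(values) / len(values) if len(values) >= min_count else None)
        for key, values in groups.items()
    }


# Hypothetical sample data: group "A" has 3 values, group "B" only 2.
sample = [("A", 10), ("A", 11), ("A", 2), ("B", 13), ("B", None), ("B", 14)]
averages = avg_if_min_count(sample)
```

Group "A" gets an average because it has three values, while group "B" stays blank (None) because one of its three rows is empty.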

Facet a Multi-value (MVA) type field in Sphinx

I executed the query below in Sphinx:
select MVA_FIELD from mySphinxIndex facet MVA_FIELD order by count(*) desc;
The result looks like this:
+----------------------------+----------+
| MVA_FIELD | count(*) |
+----------------------------+----------+
| | 664 |
| 0 | 536 |
| 13 | 439 |
| 4,13 | 8 |
| 19,13 | 8 |
| 18,13,20 | 8 |
| 8,17,18 | 8 |
| 8,18,13 | 8 |
| 8,15,18 | 8 |
| 8,13,20 | 7 |
| 17,13 | 7 |
| 18,19,20 | 7 |
| 8,17 | 7 |
| 13,17,19 | 7 |
| 11,6 | 7 |
| 6,11,13 | 7 |
| 15,18 | 7 |
| 11,13,20 | 7 |
| 11,13,17 | 7 |
| 6,18,19 | 6 |
| 7,20 | 6 |
| 8,11,13 | 6 |
| 13,17,20 | 6 |
I want to get the count of each id in MVA_FIELD. For example, I just want the count of 0, 4, 13, ... each id separately. How can I achieve this?
Honestly, I don't know how to do it with the FACET sugar, but with a normal GROUP BY query I would just use the GROUPBY() function when grouping by an MVA attribute:
SELECT GROUPBY() AS value,COUNT(*) FROM mySphinxIndex GROUP BY MVA_FIELD ORDER BY COUNT(*) DESC;
From the docs
A special GROUPBY() function is also supported. It returns the GROUP BY key. That is particularly useful when grouping by an MVA value, in order to pick the specific value that was used to create the current group.
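As an illustration of the result the question asks for (one count per individual id rather than per combination), here is a small Python sketch. It is not Sphinx code: the function name `count_mva_values` and the sample rows are hypothetical, and it assumes each document should count once for every distinct id it carries.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_mva_values(rows: Iterable[Sequence[int]]) -> Counter:
    """Count how many rows contain each individual id of a multi-value field."""
    counts: Counter = Counter()
    for ids in rows:
        # set() makes a row count once per distinct id, even if an id repeats
        counts.update(set(ids))
    return counts


# Hypothetical rows, each holding the ids of one document's MVA field.
rows = [[], [0], [13], [4, 13], [19, 13], [18, 13, 20]]
counts = count_mva_values(rows)
```

Here id 13 is counted four times, because it occurs in four different rows, regardless of which other ids accompany it.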

Redshift Distribution By Child Columns

My Situation
I have some tables in my Redshift cluster that all break down into either an order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id has a one-to-many relationship with shipment_id, and shipment_id has a one-to-many relationship with shipment_item_id.
My Question
I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same nodes across the tables since they are grouped by order_id. My question is, when I have to join on shipment_id or shipment_item_id then will redshift know that the records are on the same nodes, or will it still broadcast the tables since they aren't joined on order_id?
Example Tables
unified_order shipment_details
+----------+-------------+------------------+ +-------------+-----------+--------------+
| order_id | shipment_id | shipment_item_id | | shipment_id | ship_day | ship_details |
+----------+-------------+------------------+ +-------------+-----------+--------------+
| 1 | 1 | 1 | | 1 | 1/1/2017 | stuff |
| 1 | 1 | 2 | | 2 | 5/1/2017 | other stuff |
| 1 | 1 | 3 | | 3 | 6/14/2017 | more stuff |
| 1 | 2 | 4 | | 4 | 5/13/2017 | less stuff |
| 1 | 2 | 5 | | 5 | 6/19/2017 | that stuff |
| 1 | 3 | 6 | | 6 | 7/31/2017 | what stuff |
| 2 | 4 | 7 | | 7 | 2/5/2017 | things |
| 2 | 4 | 8 | +-------------+-----------+--------------+
| 3 | 5 | 9 |
| 3 | 5 | 10 |
| 4 | 6 | 11 |
| 5 | 7 | 12 |
| 5 | 7 | 13 |
+----------+-------------+------------------+
Distribution
distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 2 | 4 |
| 1 | 1 | 2 | 5 |
| 1 | 1 | 3 | 6 |
| 1 | 5 | 7 | 12 |
| 1 | 5 | 7 | 13 |
| 2 | 2 | 4 | 7 |
| 2 | 2 | 4 | 8 |
| 3 | 3 | 5 | 9 |
| 3 | 3 | 5 | 10 |
| 4 | 4 | 6 | 11 |
+------+----------+-------------+------------------+
The Amazon Redshift documentation does not go into detail about how information is shared between nodes, but it is doubtful that it "broadcasts the tables".
Rather, information is probably sent between nodes based on need -- only the relevant columns would be shared, and possibly only sub-ranges of the data.
Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against real queries to determine performance.
Follow the recommendations from Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes and consult Amazon Redshift Best Practices for Designing Queries to improve queries.
You can EXPLAIN your query to see how data will be distributed (or not) during the execution. In this doc you'll see how to read the query plan:
Evaluating the Query Plan

Tibco Spotfire - Calculate average only if there are minimum 3 values in a column - see desc

I want to calculate an average in Spotfire only when there are at least 3 values. If there are no values or just 2 values, the average should be blank.
Raw data:
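+---------+-----+---------+
| Product | Age | Average |
+---------+-----+---------+
| 1       |     |         |
| 2       |     |         |
| 3       | 10  |         |
| 4       | 12  |         |
| 5       | 13  | 11      |
| 6       |     |         |
| 7       | 18  |         |
| 8       | 19  |         |
| 9       | 20  | 19      |
| 10      | 21  | 20      |
+---------+-----+---------+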
The only way I could really do this is with 3 calculated columns. Insert these calculated columns in this order:
If(Min(If([Age] IS NULL,0,[Age])) over (LastPeriods(3,[Product]))<>0,1) as [BitFlag]
Avg([Age]) over (LastPeriods(3,[Product])) as [TempAvg]
If([BitFlag]=1,[TempAvg]) as [Average]
This will give you the following results. You can ignore / hide the two columns you don't care about.
RESULTS
+---------+-----+---------+------------------+------------------+
| Product | Age | BitFlag | TempAvg | Average |
+---------+-----+---------+------------------+------------------+
| 1 | | | | |
| 2 | | | | |
| 3 | 10 | | 10 | |
| 4 | 12 | | 11 | |
| 5 | 13 | 1 | 11.6666666666667 | 11.6666666666667 |
| 6 | | | 12.5 | |
| 7 | 18 | | 15.5 | |
| 8 | 19 | | 18.5 | |
| 9 | 20 | 1 | 19 | 19 |
| 10 | 21 | 1 | 20 | 20 |
| 11 | | | 20.5 | |
| 12 | 22 | | 21.5 | |
| 13 | 36 | | 29 | |
| 14 | | | 29 | |
| 15 | 11 | | 23.5 | |
| 16 | 23 | | 17 | |
| 17 | 14 | 1 | 16 | 16 |
+---------+-----+---------+------------------+------------------+
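The three calculated columns can be mirrored in a short Python sketch, which may help when checking the expected output by hand. The function name `rolling_avg_if_full` is hypothetical, and the sketch makes one simplifying assumption that differs from the Spotfire expressions: it requires a full window of 3 rows with no empty Age, instead of mapping empty values to 0 and testing the window minimum.

```python
from typing import List, Optional, Sequence


def rolling_avg_if_full(
    ages: Sequence[Optional[float]], window: int = 3
) -> List[Optional[float]]:
    """Average of the last `window` rows, or None unless all of them have a value."""
    result: List[Optional[float]] = []
    for i in range(len(ages)):
        recent = ages[max(0, i - window + 1): i + 1]
        # Emit an average only for a complete window without empty values
        if len(recent) == window and all(v is not None for v in recent):
            result.append(sum(recent) / window)
        else:
            result.append(None)
    return result


# Age column for products 1..17, copied from the results table (None = empty).
ages = [None, None, 10, 12, 13, None, 18, 19, 20, 21,
        None, 22, 36, None, 11, 23, 14]
averages = rolling_avg_if_full(ages)
```

With this data, only products 5, 9, 10 and 17 receive an average, which matches the rows where BitFlag is 1 in the table.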

emacs org mode: how do i refer to the current row number?

I would like to use the current row number of my Org table in cell calculations, either in relation to the table as a whole or in relation to an hline.
If I have the following table:
|---+---+---|
| x | y | z |
|---+---+---|
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
|---+---+---|
#+TBLFM: @II..@III$1=2::$2=4::$3=$1*$2
How do I change it so that each cell in the y column is equal to its table row number, as shown if you turn on grid mode in Org? The resulting table would look like:
|---+----+----|
| x | y | z |
|---+----+----|
| 2 | 2 | 4 |
| 2 | 3 | 6 |
| 2 | 4 | 8 |
| 2 | 5 | 10 |
| 2 | 6 | 12 |
| 2 | 7 | 14 |
| 2 | 8 | 16 |
| 2 | 9 | 18 |
| 2 | 10 | 20 |
|---+----+----|
(defmath passIndex (x)
x
)
Number rows:
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
#+TBLFM: $1=passIndex(@#)
Number columns:
| 1 | 2 | 3 | 4 | 5 |
#+TBLFM: @1=passIndex($#)
Number rows with header row:
| header |
|--------|
| 2 |
| 3 |
| 4 |
| 5 |
#+TBLFM: $1=passIndex(@#)

How to fill right in org-mode?

Perhaps I missed this in the documentation, but can anyone point me in the direction of how to fill right across a series of columns in Emacs's org-mode? I believe I saw how to fill down, but I do not recall seeing how to fill right.
Edit: For example, I am looking for a way to take:
| 8 | 6 | 7 | 5 | 3 | 0 | 9 |
| :=@1$1*2 | | | | | | |
And turn it into:
| 8 | 6 | 7 | 5 | 3 | 0 | 9 |
| :=@1$1*2 | :=@1$2*2 | :=@1$3*2 | :=@1$4*2 | :=@1$5*2 | :=@1$6*2 | :=@1$7*2 |
Which evaluates to:
| 16 | 12 | 14 | 10 | 6 | 0 | 18 |
Starting with this state:
| 8 | 6 | 7 | 5 | 3 | 0 | 9 |
| | | | | | | |
#+TBLFM: @2=@1*2
You get to this state:
| 8 | 6 | 7 | 5 | 3 | 0 | 9 |
| 16 | 12 | 14 | 10 | 6 | 0 | 18 |
#+TBLFM: @2=@1*2
by pressing C-c C-c while on the line with TBLFM.