Conditional sum (sumif) in org-table - Emacs

I have a table like this:
#+NAME: ENTRY
|------+--------|
| Item | Amount |
|------+--------|
| A    |    100 |
| B    |     20 |
| A    |    120 |
| C    |     40 |
| B    |     50 |
| A    |     20 |
| C    |     16 |
|------+--------|
and then I need to sum each item in another table:
#+NAME: RESULT
|------+-----|
| Item | Sum |
|------+-----|
| A    | 240 |
| B    |  70 |
| C    |  56 |
|------+-----|
I've tried using vlookup and remote references in this table, but I'm not able to sum the resulting list directly, like this:
#+TBLFM: $2=vsum((vconcat (org-lookup-all $1 '(remote(ENTRY,@2$1..@>$1)) '(remote(ENTRY,@2$2..@>$2)))))
But it does not give the right answer.
So I have to use a placeholder column to hold the resulting list and then sum it:
#+NAME: RESULT
|------+--------------+-----|
| Item | Placeholder  | Sum |
|------+--------------+-----|
| A    | [100 120 20] | 240 |
| B    | [20 50]      |  70 |
| C    | [40 16]      |  56 |
|------+--------------+-----|
#+TBLFM: $2='(vconcat (org-lookup-all $1 '(remote(ENTRY,@2$1..@>$1)) '(remote(ENTRY,@2$2..@>$2))))::$3=vsum($2)
Is there a better solution for this?

One way to do this without vsum:
#+TBLFM: $2='(apply '+ (mapcar 'string-to-number (org-lookup-all $1 '(remote(ENTRY,@2$1..@>$1)) '(remote(ENTRY,@2$2..@>$2)))))
If you want to use a Calc function, you can always use calc-eval:
#+TBLFM: $2='(calc-eval (format "vsum(%s)" (vconcat (org-lookup-all $1 '(remote(ENTRY,@2$1..@>$1)) '(remote(ENTRY,@2$2..@>$2))))))
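And if you only need a one-off conditional sum outside the RESULT table, here is a minimal Babel sketch along the same lines (assuming Org's default header handling, so each row of ENTRY arrives as a list like ("A" 100)):
#+begin_src emacs-lisp :var entry=ENTRY
  ;; Sum the Amount column over the rows whose Item is "A".
  (apply #'+ (mapcar #'cadr
                     (seq-filter (lambda (row) (equal (car row) "A"))
                                 entry)))
#+end_src
#+RESULTS:
: 240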


PostgreSQL: Transforming rows into columns when more than three columns are needed

I have a table like the following one:
+---------+-------+-------+-------------+
| Section | Group | Level | Fulfillment |
+---------+-------+-------+-------------+
| A       | Y     | 1     | 82.2        |
| A       | Y     | 2     | 23.2        |
| A       | M     | 1     | 81.1        |
| A       | M     | 2     | 28.2        |
| B       | Y     | 1     | 89.1        |
| B       | Y     | 2     | 58.2        |
| B       | M     | 1     | 32.5        |
| B       | M     | 2     | 21.4        |
+---------+-------+-------+-------------+
And this would be my desired output:
+---------+-------+--------------------+--------------------+
| Section | Group | Level1_Fulfillment | Level2_Fulfillment |
+---------+-------+--------------------+--------------------+
| A       | Y     | 82.2               | 23.2               |
| A       | M     | 81.1               | 28.2               |
| B       | Y     | 89.1               | 58.2               |
| B       | M     | 32.5               | 21.4               |
+---------+-------+--------------------+--------------------+
Thus, for each section and group I'd like to obtain their percentages of fulfillment for levels 1 and 2. To achieve this, I've tried crosstab(), but the function returns an error ("The provided SQL must return 3 columns: rowid, category, and values.") because I'm using more than three columns (I need to keep both section and group as identifiers for each row). Is it possible to use crosstab in this case?
Regards.
I find crosstab() unnecessarily complicated to use and prefer conditional aggregation:
select section,
       "group",
       max(fulfillment) filter (where level = 1) as level_1,
       max(fulfillment) filter (where level = 2) as level_2
from the_table
group by section, "group"
order by section;
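That said, crosstab() can be made to work here: its two-argument form accepts "extra" columns between the row id and the category column. A sketch (it requires CREATE EXTENSION tablefunc, assumes the table name the_table as above, and builds a composite row id because the function groups rows by the first column only):
SELECT *
FROM crosstab(
  $$ SELECT section || '/' || "group", section, "group", level, fulfillment
     FROM the_table
     ORDER BY 1, 4 $$,
  $$ VALUES (1), (2) $$
) AS ct(rowid text, section text, "group" text,
        level1_fulfillment numeric, level2_fulfillment numeric);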

What is the best way to implement KNN in PostgreSQL using a custom distance metric?

Let me explain my problem.
I have a table with this shape:
+----+------+-----------+-----------+
| ID | A    | B         | W         |
+----+------+-----------+-----------+
| 1  | 534  | [a,b,c]   | [4,6,2]   |
| 2  | 534  | [a,b,d,e] | [6,3,6,2] |
| …  | …    | …         | …         |
| 54 | 667  | [a,b,r,e] | [4,6,2,3] |
| 55 | 8789 | [d]       | [9]       |
| 56 | 8789 | [a,b,d]   | [7,2,3]   |
| 57 | 8789 | [d,e,f,g] | [4,2,2,8] |
| …  | …    | …         | …         |
+----+------+-----------+-----------+
The query I need to perform is the following: given an input with A, B and W values (e.g. A=8789; B=[a,b]; W=[3,2]), I need to find the "closest" row in the table that has the same value of A.
I've already defined my custom distance function.
The naive approach would be something like (given the input in the example):
SELECT * FROM my_table T, dist_function(T.B, T.W, ARRAY['a','b'], ARRAY[3,2]) AS dist
WHERE T.A = 8789
ORDER BY dist ASC
LIMIT 7;
In my understanding this is a classical KNN problem, for which I realize some support already exists:
KNN-GiST
GiST & SP-GiST
SP-Gist example
I'm just not sure which index is the best fit here.
Thanks.

Copy the date in an org table

Suppose a spreadsheet like this in an org table:
|------------+-------+------------+--------+--------+------------|
| Date       | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A     |       2.64 |      1 |   2.64 | materials  |
|            | B     |      52.67 |      2 | 105.34 | diagnosis  |
|            | C     |       3.08 |      1 |   3.08 | materials  |
|            | D     |       3.85 |      2 |    7.7 | materials  |
|            | E     |      33.66 |      2 |  67.32 | materials  |
|            | F     |         40 |      1 |     40 | treatments |
|            | G     |       16.5 |      1 |   16.5 | materials  |
|            | H     |          4 |      3 |     12 | treatments |
|            | I     |         40 |      1 |     40 | bed        |
|            | M     |          6 |     13 |     78 | treatments |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: $5=$3*$4
How could I copy the date 2019/09/17 down to the bottom of the Date column?
The link that @manandearth posted in the comments describes how to duplicate (perhaps with slight modifications) the entries in a column. Briefly, pressing S-RET in an empty cell duplicates the contents of the non-empty cell above it. If the contents are numeric, the "duplication" involves a slight modification: it increments the value by 1. The same happens with a date: it advances the date to the next day, but the date has to be in a format that Org mode recognizes: either an active date <YYYY-MM-DD> or an inactive date [YYYY-MM-DD]. The increment is 1 by default, but it can be changed by setting the variable org-table-copy-increment to a different value. That's the "interactive" case I mention in my comment.
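As an aside, if the automatic increment is unwanted, it can be switched off; a one-line sketch for an init file:
(setq org-table-copy-increment nil)  ; make S-RET copy the field above verbatim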
The other way to fill a column in a table is by using a formula. For example here's a formula to fill the first column with a copy of the first entry in the column:
#+TBLFM: @3$1..@>$1 = @2$1
This says: set all rows from row 3 (@3) to the last row (@>) of column 1 ($1) to the value of the cell in row 2 (@2), column 1 ($1). Note that row 1 is the header. Press C-c C-c on the table formula line above and ... wait, what happened?
|------------+-------+------------+--------+--------+------------|
| Date       | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A     |       2.64 |      1 |   2.64 | materials  |
|  13.196078 | B     |      52.67 |      2 | 105.34 | diagnosis  |
|  13.196078 | C     |       3.08 |      1 |   3.08 | materials  |
|  13.196078 | D     |       3.85 |      2 |    7.7 | materials  |
|  13.196078 | E     |      33.66 |      2 |  67.32 | materials  |
|  13.196078 | F     |         40 |      1 |     40 | treatments |
|  13.196078 | G     |       16.5 |      1 |   16.5 | materials  |
|  13.196078 | H     |          4 |      3 |     12 | treatments |
|  13.196078 | I     |         40 |      1 |     40 | bed        |
|  13.196078 | M     |          6 |     13 |     78 | treatments |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: @3$1..@>$1 = @2$1
It does not quite work in this case for a technical reason: Org mode uses Calc in table formula calculations and Calc looks at 2019/09/17 and says: "Aha, I have to divide 2019 by 9 and then divide the result by 17", and fills the rest of the column with the result of the divisions: 13.196078. You may have meant 2019/09/17 to be a date, but Org mode does not know that: it gives it to Calc which interprets it as an arithmetic expression. The solution here is the same as in the linked answer: make Org mode aware that it's a date by making it either an active date: <2019-09-17> or an inactive date: [2019-09-17]:
|------------------+-------+------------+--------+--------+------------|
| Date             | Items | Unit Price | Amount | Amount | Categories |
|------------------+-------+------------+--------+--------+------------|
| [2019-09-17]     | A     |       2.64 |      1 |   2.64 | materials  |
| [2019-09-17 Tue] | B     |      52.67 |      2 | 105.34 | diagnosis  |
| [2019-09-17 Tue] | C     |       3.08 |      1 |   3.08 | materials  |
| [2019-09-17 Tue] | D     |       3.85 |      2 |    7.7 | materials  |
| [2019-09-17 Tue] | E     |      33.66 |      2 |  67.32 | materials  |
| [2019-09-17 Tue] | F     |         40 |      1 |     40 | treatments |
| [2019-09-17 Tue] | G     |       16.5 |      1 |   16.5 | materials  |
| [2019-09-17 Tue] | H     |          4 |      3 |     12 | treatments |
| [2019-09-17 Tue] | I     |         40 |      1 |     40 | bed        |
| [2019-09-17 Tue] | M     |          6 |     13 |     78 | treatments |
|------------------+-------+------------+--------+--------+------------|
#+TBLFM: @3$1..@>$1 = @2$1
This does not do automatic incrementation, but if that's what you want, it's easy to accomplish: Calc can do calculations on dates, so we can increment daily by adding to the date in each row the row number minus 2 (e.g. row 3 would get an increment of 3 - 2 = 1, row 4 would get 4 - 2 = 2, etc.). To accomplish this, you have to get the row number of the current row: the idiom is @#. Then the formula becomes:
#+TBLFM: @3$1..@>$1 = @2$1 + @# - 2
and the table becomes:
|------------------+-------+------------+--------+--------+------------|
| Date             | Items | Unit Price | Amount | Amount | Categories |
|------------------+-------+------------+--------+--------+------------|
| [2019-09-17]     | A     |       2.64 |      1 |   2.64 | materials  |
| [2019-09-18 Wed] | B     |      52.67 |      2 | 105.34 | diagnosis  |
| [2019-09-19 Thu] | C     |       3.08 |      1 |   3.08 | materials  |
| [2019-09-20 Fri] | D     |       3.85 |      2 |    7.7 | materials  |
| [2019-09-21 Sat] | E     |      33.66 |      2 |  67.32 | materials  |
| [2019-09-22 Sun] | F     |         40 |      1 |     40 | treatments |
| [2019-09-23 Mon] | G     |       16.5 |      1 |   16.5 | materials  |
| [2019-09-24 Tue] | H     |          4 |      3 |     12 | treatments |
| [2019-09-25 Wed] | I     |         40 |      1 |     40 | bed        |
| [2019-09-26 Thu] | M     |          6 |     13 |     78 | treatments |
|------------------+-------+------------+--------+--------+------------|
#+TBLFM: @3$1..@>$1 = @2$1 + @# - 2
The various anomalies of the display of dates (do we include the day of the week? do we include the time?) might be worked around using org-time-stamp-custom-formats but that gets us into waters that I have not explored.
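For completeness, a sketch of that route (untested, per the caveat above): org-display-custom-times has to be enabled for org-time-stamp-custom-formats to take effect.
;; Display dates as plain YYYY-MM-DD, hiding the day-of-week name
(setq org-display-custom-times t
      org-time-stamp-custom-formats '("<%Y-%m-%d>" . "<%Y-%m-%d %H:%M>"))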

Create dummy variables in PostgreSQL

Is it possible to create a dummy variable when querying?
For instance, the query below gives me only the observations that satisfy the var2 condition. I also want the remaining observations, but with some kind of tag on them (0/1 indicator values would be sufficient):
SELECT DISTINCT ON (id) id, var1, var2, var3
FROM table
WHERE var2 = ANY('{blue,yellow}');
Have
+-----+------+--------+------+
| id  | Var1 | Var2   | Var3 |
+-----+------+--------+------+
| 345 | 12   | Blue   | 3456 |
| 345 | 12   | Red    | 2134 |
| 346 | 45   | Blue   | 3451 |
| 347 | 25   | yellow | 1526 |
+-----+------+--------+------+
Want
+-----+------+--------+------+--------------------+
| id  | Var1 | Var2   | Var3 | Indicator variable |
+-----+------+--------+------+--------------------+
| 345 | 12   | Blue   | 3456 | 1                  |
| 345 | 12   | Red    | 2134 | 0                  |
| 346 | 45   | Blue   | 3451 | 1                  |
| 347 | 25   | yellow | 1526 | 1                  |
+-----+------+--------+------+--------------------+
Instead of an expression in WHERE, you can use an expression in the SELECT output list:
=> select a, a = any('{1,2,3,5,7}') as asmallprime
   from generate_series(1,10) as a;
 a  | asmallprime
----+-------------
  1 | t
  2 | t
  3 | t
  4 | f
  5 | t
  6 | f
  7 | t
  8 | f
  9 | f
 10 | f
(10 rows)
Tometzky's answer is sufficient, but if you want something more complex you can also use a CASE expression.
Tometzky's example using CASE, with an extra indicator:
SELECT a, CASE WHEN a = any('{1,2,3,5,7}') THEN 'YES'
               WHEN a = any('{4,9}') THEN 'SQUARE'
               ELSE 'NO' END AS asmallprime
FROM generate_series(1,10) AS a;
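Applied to the table in the question, a sketch (the table name is assumed; casting the boolean to int yields the 0/1 indicator, and the array literal is case-sensitive, hence 'Blue'):
SELECT id, var1, var2, var3,
       (var2 = ANY('{Blue,yellow}'))::int AS indicator_variable
FROM the_table;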

How to compute the dot product of two columns (treating a full column as a vector)?

Given this table:
| a | b | c |
|---+---+---|
| 3 | 4 |   |
| 1 | 2 |   |
| 1 | 3 |   |
| 2 | 2 |   |
I want to get the dot product of the two columns a and b; the result should be equal to (3*4)+(1*2)+(1*3)+(2*2), which is 21.
I don't want to use a clumsy formula like (B1*B2+C1*C2+D1*D2+E1*E2), because I actually have a large table waiting to be calculated.
I know Emacs's Calc tool has a "vprod" function which can do this sort of thing, but I don't know how to turn a full column into a vector.
Can anybody tell me how to achieve this? I'd appreciate it!
In Emacs Calc, the simple product of two vectors computes the dot product.
This works (I put the result in @6$3; the parentheses can also be omitted):
| a | b |  c |
|---+---+----|
| 3 | 4 |    |
| 1 | 2 |    |
| 1 | 3 |    |
| 2 | 2 |    |
|---+---+----|
|   |   | 21 |
#+TBLFM: @6$3=(@I$1..@II$1)*(@I$2..@II$2)
@I and @II span from the first hline to the second.
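If you prefer Emacs Lisp over Calc, an equivalent Lisp formula should also work; a sketch (the ;N flag makes Org interpolate the fields as numbers rather than strings):
#+TBLFM: @6$3 = '(apply #'+ (cl-mapcar #'* '(@I$1..@II$1) '(@I$2..@II$2)));N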
This can be solved using babel and R in org-mode:
#+name: mytable
| a | b | c |
|---+---+---|
| 3 | 4 |   |
| 1 | 2 |   |
| 1 | 3 |   |
| 3 | 2 |   |
#+begin_src R :var mytable=mytable
sum(mytable$a * mytable$b)
#+end_src
#+RESULTS:
: 23