I just want to extract distinct values, i.e. one matching vend_nm per item_cd (any one of the values is fine). Tableau Prep doesn't have a unique or distinct function, and the answers I find on Google say to use a group (aggregate) step, but that doesn't give the result I need. Look at this example. The desired output is unique on the item_cd field.
For example
item_cd vend_nm
100001 A
100001 A
100002 B
100001 C
100003 D
Output
item_cd vend_nm n
100001 A 2
100001 C 1
100002 B 1
100003 D 1
Desired Output
item_cd vend_nm
100001 A
100002 B
100003 D
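To pin down the desired logic independently of Tableau Prep, here is a minimal Python sketch. It assumes "any value is okay" can be satisfied by keeping the first vend_nm seen for each item_cd; the names `rows` and `first_per_key` are illustrative only and are not part of any Tableau API.

```python
# Hypothetical in-memory copy of the example rows: (item_cd, vend_nm).
rows = [
    ("100001", "A"),
    ("100001", "A"),
    ("100002", "B"),
    ("100001", "C"),
    ("100003", "D"),
]


def first_per_key(pairs):
    """Keep the first vend_nm seen for each item_cd, in input order."""
    seen = {}
    for item_cd, vend_nm in pairs:
        # setdefault stores the value only the first time a key appears.
        seen.setdefault(item_cd, vend_nm)
    return list(seen.items())


distinct_rows = first_per_key(rows)
print(distinct_rows)
```

Grouping on both fields yields one row per (item_cd, vend_nm) pair, which is why 100001 appears twice in the grouped output; keying on item_cd alone is what collapses it to one row.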
I want to create a report that shows the max price for each symbol, so I wrote the following query. It works fine on PROD but fails on UAT, so I would like to know whether the query is appropriate.
select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier
The above query returns 2 rows for each symbol instead of 1. The following is the result of the inner query, i.e. select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31)
t:([]time:8#2019.03.11D09:00+"v"$0 4 8 10;sym:8#`GOOG`GOOG`MSFT`MSFT;src:8#`L`O`N`O;price:36.01 35.01 35.5 31.1 39.01 38.01 33.5 32.1;size:8#1427 708 7810 1100)
time sym src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:04.000000000 GOOG O 35.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:10.000000000 MSFT O 31.1
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:04.000000000 GOOG O 38.01
2019.03.11D09:00:08.000000000 MSFT N 33.5
2019.03.11D09:00:10.000000000 MSFT O 32.1
And the output of select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier is:
t[0,2,4,7]
time sym src price
---------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:10.000000000 MSFT O 32.1
I suspect that there is something missing from the dataset that you have provided in the question. The prices in your inner query's result are all floats with fractional parts, and size is a long, so it doesn't make sense that size=(max;price) would return any results.
To answer your question in the most general sense, the max price by sym can be obtained with
select from t where price=(max;price) fby sym
Applying this to the inner result you have provided
q)select from t where price=(max;price) fby sym
time sym src price size
-------------------------------------------------
2019.03.11D09:00:08.000000000 MSFT N 35.5 7810
2019.03.11D09:00:00.000000000 GOOG L 39.01 1427
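For readers less familiar with fby, the same filter can be sketched in Python: compute the maximum price per sym, then keep the rows whose price equals their group's maximum. This is only an illustration of the logic, not how kdb+ evaluates it; the tuple layout and the function name `max_price_rows` are assumptions, and timestamps are shortened to the time of day.

```python
# Rows copied from the example table t: (time, sym, src, price, size).
t = [
    ("09:00:00", "GOOG", "L", 36.01, 1427),
    ("09:00:04", "GOOG", "O", 35.01, 708),
    ("09:00:08", "MSFT", "N", 35.5, 7810),
    ("09:00:10", "MSFT", "O", 31.1, 1100),
    ("09:00:00", "GOOG", "L", 39.01, 1427),
    ("09:00:04", "GOOG", "O", 38.01, 708),
    ("09:00:08", "MSFT", "N", 33.5, 7810),
    ("09:00:10", "MSFT", "O", 32.1, 1100),
]


def max_price_rows(rows):
    """Return rows whose price equals the maximum price of their sym."""
    best = {}
    for _, sym, _, price, _ in rows:
        # Track the largest price seen so far for each sym.
        best[sym] = max(price, best.get(sym, price))
    # Filter in original row order, like a where clause.
    return [r for r in rows if r[3] == best[r[1]]]


result = max_price_rows(t)
for row in result:
    print(row)
```

If several rows tie for the maximum within a sym, all of them are kept, which matches the behaviour of an equality filter.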
I have tables with date and sym columns, and each date might have multiple syms. I want to number the occurrences of each symbol within each date.
For example:
date sym
-------------------
2019.06.04 ABC
2019.06.04 DEF
2019.06.04 ABC
2019.06.05 DEF
2019.06.05 ABC
will give me
date sym c
-------------------
2019.06.04 ABC 1
2019.06.04 DEF 1
2019.06.04 ABC 2 / here ABC appears for the second time on this date.
2019.06.05 DEF 1
2019.06.05 ABC 1
This may be a little cleaner. Here the c column is just a running sum over the rows in each group, where the groups are the combinations of date and sym.
q)t:([]date:2019.06.04+0 0 0 1 1;sym:`ABC`DEF`ABC`DEF`ABC)
q)update c:sums i=i by date,sym from t
date sym c
----------------
2019.06.04 ABC 1
2019.06.04 DEF 1
2019.06.04 ABC 2
2019.06.05 DEF 1
2019.06.05 ABC 1
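The running count per group can also be sketched in Python, which may help when checking the expected numbering by hand. The dates are kept as plain strings and the name `number_occurrences` is hypothetical.

```python
from collections import defaultdict

# Example rows from the question: (date, sym).
rows = [
    ("2019.06.04", "ABC"),
    ("2019.06.04", "DEF"),
    ("2019.06.04", "ABC"),
    ("2019.06.05", "DEF"),
    ("2019.06.05", "ABC"),
]


def number_occurrences(pairs):
    """Append a 1-based occurrence counter per (date, sym) group."""
    counts = defaultdict(int)
    out = []
    for date, sym in pairs:
        counts[(date, sym)] += 1  # running count within the group
        out.append((date, sym, counts[(date, sym)]))
    return out


numbered = number_occurrences(rows)
for row in numbered:
    print(row)
```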
To count the occurrences of syms by date across all of the tables in an HDB, we can run a count by date for each of the partitioned tables in .Q.pt and then fold pj (plus join) over the results, since the tables have matching keys. As pj only returns keys that are present in its left argument, we need to add back the rows of the right table whose keys are missing, because each date might be missing different syms
q)cntTabs:{2!0!update c:count each sym,sym:first each sym from select sym by date from x} each .Q.pt
q){t:pj[x;y];t,k!y k:key[y] except key[t]}/[cntTabs]
I have a partitioned table, similar to below table:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8);
date a a_test b_test c
----------------------------
2019.01.01 1 2 3 6
2019.01.01 2 3 4 7
2019.01.01 3 4 5 8
Now, I want to fetch the date column and all columns whose names have the suffix "_test" from table t.
Expected output:
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In my original table, there are more than 100 columns with name having _test so below is not a practical solution in this case.
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like the one below, but to no avail:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable then you should use a functional qSQL statement to perform the query. The following can be used in your case
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_test"]
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
A functional select is probably the best way to go here if you require adding further filters.
?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
The functional form of any select query can be obtained by applying -5! (parse) to the qSQL statement given as a string.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator as in below:
{[x;s] (cols[x] where cols[x] like s)#x}[ tab;"a*"]
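The idea behind all of these answers is the same: build the column list from a name pattern, then select by that list. A minimal Python sketch of that idea, assuming a column-oriented table held as a dict of lists (the name `select_suffix` and the `keep` parameter are hypothetical):

```python
# Column-oriented copy of the example table t.
table = {
    "date": ["2019.01.01"] * 3,
    "a": [1, 2, 3],
    "a_test": [2, 3, 4],
    "b_test": [3, 4, 5],
    "c": [6, 7, 8],
}


def select_suffix(tab, suffix, keep=("date",)):
    """Return the `keep` columns plus every column ending in `suffix`."""
    wanted = list(keep) + [
        name for name in tab if name.endswith(suffix) and name not in keep
    ]
    return {name: tab[name] for name in wanted}


selected = select_suffix(table, "_test")
print(list(selected))
```

Matching on the full suffix rather than on any underscore avoids picking up unrelated columns that merely contain "_" somewhere in their name.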
Assume I have a table of events, with Timestamp and Type.
t1, 'b'
t2, 'x'
t3, 's'
t4, 'b'
How can I get a rolling count, i.e. a list of all timestamps with the cumulative number of events up to that timestamp, sort of like a count version of sums?
For example, for 'b' I'd like a table
't1', 1
't2', 1
't3', 1
't4', 2
Here is one way to do it, although there may be a more clever way. This uses sums:
//table definition
tab:([]a:`t1`t2`t3`t4;b:"bxsb")
//rolling sum of 1 by column b
update sums count[i]#1 by b from tab
Results in:
a b x
------
t1 b 1
t2 x 1
t3 s 1
t4 b 2
If you wanted to replace column b with the result, you would simply put b: in front of the sums.
One way:
q)t:([]p:asc 4?.z.p+til 1000;t:`b`x`s`b)
q)asc `p xcols ungroup select p,til count i by t from t
p t x
---------------------------------
2017.05.16D09:42:48.259062090 b 0
2017.05.16D09:42:48.259062585 x 0
2017.05.16D09:42:48.259062683 s 0
2017.05.16D09:42:48.259062858 b 1
PS: Note that I have started the sequence at 0, as if to say "I've had 0 events prior to this row", instead of beginning at 1 as per your example. It goes with your requirement "number of events up to that ts". If you need it to start at 1, just add 1: '1+til count i'. Also ensure your times are sorted so that the sequence makes sense.
With table t as below:
q)show t: ([]ts:.z.t - desc "u"$(til 4);symb:`b`x`z`b)
ts symb
-----------------
09:46:56.384 b
09:47:56.384 x
09:48:56.384 z
09:49:56.384 b
using a vector conditional:
q)select ts, cum_count:sums ?[symb=`b;1;0] from t
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
The same, but with a function taking symb as a parameter:
q){select ts, cum_count:sums ?[symb=x;1;0] from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
In fact you don't need a vector conditional because you can just sum the booleans directly:
q){select ts, cum_count:sums symb=x from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
This also works
update x:1+til count i by b from tab
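The boolean-sum approach (a running sum of "is this row the wanted type?") translates directly to other languages. A small Python sketch of it, using itertools.accumulate as the counterpart of sums; the name `cumulative_count` is hypothetical:

```python
from itertools import accumulate

# Example events from the question: (timestamp, type).
events = [("t1", "b"), ("t2", "x"), ("t3", "s"), ("t4", "b")]


def cumulative_count(evts, wanted):
    """For every timestamp, count events of type `wanted` seen so far."""
    # 1 where the row matches, 0 otherwise.
    flags = (1 if typ == wanted else 0 for _, typ in evts)
    # accumulate yields the running sum of the flags.
    return [(ts, n) for (ts, _), n in zip(evts, accumulate(flags))]


running = cumulative_count(events, "b")
print(running)
```

This keeps one row per timestamp, as asked in the question, whereas the grouped versions number the events within each type separately.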
I have been looking at this problem for a while, and while I know I could do this programmatically in LINQ, I started thinking about solutions that would scale if this were a very large data set. I'm building my experience with SQL and believe there is a way to get the result without performing an insert.
What I have is data that looks like this:
ids type total
A01 x 1
A01 x 2
A01 x 3
A01 y 4
B01 y 2
B01 x 3
B01 y 1
C01 x 1
C01 y 2
C01 x 5
C01 y 6
What I want is data that looks like this:
id x_total y_total
A01 6 4
B01 3 3
C01 6 8
Is my belief incorrect?
...
SUM(CASE type WHEN 'x' THEN total ELSE 0 END),
SUM(CASE type WHEN 'y' THEN total ELSE 0 END)
...
Group by
Id
Sorry hard to give full answer on phone
This is called a pivot table, and there are a number of ways to accomplish it.
If you're using SQL Server 2005 or later, the PIVOT operator (MSDN) is a neat option:
select id, [x], [y]
from temp d
PIVOT ( sum(total) for type in ([x],[y]) ) p
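Both SQL answers compute one conditional sum per type for each id. As a way to check the expected numbers, the same aggregation can be sketched in Python; this is only an illustration of the logic, and the name `pivot_totals` and its `types` parameter are hypothetical.

```python
from collections import defaultdict

# Example rows from the question: (ids, type, total).
rows = [
    ("A01", "x", 1), ("A01", "x", 2), ("A01", "x", 3), ("A01", "y", 4),
    ("B01", "y", 2), ("B01", "x", 3), ("B01", "y", 1),
    ("C01", "x", 1), ("C01", "y", 2), ("C01", "x", 5), ("C01", "y", 6),
]


def pivot_totals(records, types=("x", "y")):
    """Sum `total` per id, with one bucket per type."""
    # Each id starts with a zero total for every requested type.
    totals = defaultdict(lambda: dict.fromkeys(types, 0))
    for ids, typ, total in records:
        if typ in totals[ids]:  # types outside `types` are ignored
            totals[ids][typ] += total
    return {ids: dict(buckets) for ids, buckets in totals.items()}


pivoted = pivot_totals(rows)
for ids, buckets in pivoted.items():
    print(ids, buckets["x"], buckets["y"])
```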