Currently I'm trying to take a list of values from my table and order them alphanumerically so that they appear from numbers to letters. For example, I have this data set:
3
8
0.64
0.64 + 2.8
70
90
AK
050LL (Beta)
070
PQ
W3
0.5
0.6
0.8
040
070
1.2
1.5
1.6
100
150
187
2.8
250
3.0
6.3
800
8mm
And I want it to print 0.5 first and W3 last. I am using LPAD in the ORDER BY, but the data displays as shown above, with no useful ordering. Is there a way I can sort these by numbers, then numbers+letters, and finally letters in PostgreSQL? Do I need some special clause for the ordering to be correct?
The SQL statement:
SELECT *
FROM data_table
ORDER BY LPAD(parameter_type, 10) ASC
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;
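One possible way to express that three-way ordering (a sketch only; the regexes and the cast are illustrative, and values such as 0.64 + 2.8 would fall into the second group and may need extra handling):

SELECT *
FROM data_table
ORDER BY
    CASE
        WHEN parameter_type ~ '^[0-9.]+$' THEN 1   -- purely numeric values first
        WHEN parameter_type ~ '^[0-9]'    THEN 2   -- values starting with a digit
        ELSE 3                                     -- letters last
    END,
    CASE WHEN parameter_type ~ '^[0-9.]+$'
         THEN parameter_type::numeric END,         -- sort group 1 by numeric value
    parameter_type                                 -- text sort within groups 2 and 3
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;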
I have this table and I want to add another column to calculate the average of the seconds column.
For example:
my table:
id  avg
1   2.5
2   3.2
3   4.1
4   0.8
my desired table:
id  avg  daily_avg
1   2.5  2.65
2   3.2  2.65
3   4.1  2.65
4   0.8  2.65
Is there any simple and short way to do it?
I'm using PostgreSQL.
Thanks
demo: db<>fiddle
You can use the AVG() window function:
SELECT
    id, avg,
    AVG(avg) OVER () AS daily_avg
FROM mytable
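If the intent is a per-day average rather than an average over the whole table, a PARTITION BY on a date column would do it. A sketch only, assuming a hypothetical date_col column that does not appear in the sample data:

SELECT
    id, avg,
    AVG(avg) OVER (PARTITION BY date_col) AS daily_avg
FROM mytable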
Are there any instances where a negative time type could give unexpected results when used for specific purposes? For example, when time deltas are calculated between negative and non-negative time values, there do not appear to be any issues.
time          val
00:00:31.384  -0.3170017
00:06:00.139  0.9033492
00:07:01.099  -0.7661049
Then, for the purpose of a window join later over a 10-minute window:
win:00:10:00;
winForJoin: (neg win;00:00:00) +\:(exec time from data);
first[winForJoin] gives -00:09:28.616 -00:03:59.861 -00:02:58.901
winForJoin[1]-winForJoin[0] gives 10 minutes as expected
If I understand correctly, you're asking how would a window join behave if the opening interval was a negative time? (due to the interval subtraction taking the values into negative territory, relative to 00:00).
The simple answer is that it won't behave any differently than if the times were numbers, but in practice you may see results you don't expect depending on how your table is set up and what you're trying to achieve.
Taking the example in the official wiki as a starting point: https://code.kx.com/q/ref/wj/
q)t:([]sym:3#`ibm;time:10:01:01 10:01:04 10:01:08;price:100 101 105)
q)a:101 103 103 104 104 107 108 107 108
q)b:98 99 102 103 103 104 106 106 107
q)q:([]sym:`ibm; time:10:01:01+til 9; ask:a; bid:b)
q)f:`sym`time
q)w:-2 1+\:t.time
/add volume too so it's easier to follow:
q)s:908 360 522 257 858 585 90 683 90;
q)update size:s from `q
/add an alternative range which has negative starting time
q)w2:(-11:00;1)+\:t.time
The window join takes all rows in q whose times are between the pairs of time ranges:
q)q[`time]within/:flip w
110000000b
011110000b
000001111b
Under the covers it's asking: are these positive numbers (the quote times) between those two positive numbers (the window range)? There's no reason it can't also ask: are these positive numbers between this negative number and this positive number?
q)q[`time]within/:flip w2
110000000b
111110000b
111111111b
You'll notice that all of them are greater than the negative time - meaning that it will include all rows from the beginning of the q table, up until the end time of that pair. This can be considered expected behaviour - if your start time is negative you must mean "from the beginning of time" - aka, all rows from the beginning of the table.
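For instance (illustrative, reusing the negative start time from the question), any non-negative quote time satisfies a negative lower bound:

q)10:01:01 within (-00:09:28.616;10:01:02)
1b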
Comparing sum of size shows how the results differ:
q)wj[w;f;t;(q;(sum;`size))]
sym time price size
-----------------------
ibm 10:01:01 100 1268
ibm 10:01:04 101 1997
ibm 10:01:08 105 1448
q)wj[w2;f;t;(q;(sum;`size))]
sym time price size
-----------------------
ibm 10:01:01 100 1268
ibm 10:01:04 101 2905
ibm 10:01:08 105 4353
Finally, where it might get complicated: it depends on what "negative" time means in your table. If you're at 00:00 (midnight) and you subtract 10 minutes, are you trying to access data from 23:50 the day before? Or does 00:00 represent the starting time (row zero) of your table? If you're trying to access 23:50 from the day before, then you will have problems, because 23:50 is NOT between your negative start time and your positive end time, e.g.:
q)23:50 within(-00:58:59;10:01:02)
0b
Again this all depends on how your data looks and what you're trying to do
I am trying to merge two data tables (Tables A and B) in Spotfire 7.10 using Insert Columns, to give the resultant Table C. My problem is I cannot get the join I need on Depth, because Depth in Tables A and B are not exact matches. What I need is to match Table B to Table A based on the nearest Depth value, i.e. Depth 10.5 (Table B) matches Depth 10 (Table A). Is this possible in Spotfire, or using a TERR R script?
Table A
Depth  data
10     2
20     4
30     3
40     5
50     7

Table B
Depth  data 2
10.5   100
30.5   112
50.5   125

Table C
Depth  data  data 2
10     2     100
20     4
30     3     112
40     5
50     7     125
Many thanks for any help.
It depends on the range of values you may have in both tables for Depth, but you may find that simply rounding the Depth value in Table B to the nearest 10 will suffice. Then you can join based on this.
Round([Depth]/10,0)*10
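With the values in Table B, 10.5, 30.5 and 50.5 round to 10, 30 and 50 respectively, which then line up with the Depth values in Table A for the join.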
I have a price series and I'd like to know the indices where there has been a change of x bips. I worked out a very ugly way to accomplish this in a loop e.g.
q)bips:200
q)level:0.001*bips / 0.2
q)price: 1.0 1.1 1.3 1.8 1.9 2.0 2.3
q)ix:0
q)lastix:0
q)result:enlist lastix
q)do[count price;if[abs(price[ix]-price[lastix])>level;result,:ix;lastix:ix];ix:ix+1];
q)result
0 2 3 6
This is a simple O(n) algorithm that walks through the price series keeping a marked index (lastix): starting from the first element, it looks for a price whose difference from the marked price is greater than the level; when found, it saves that index in the result and moves lastix to it. Is there a more idiomatic way to do it?
My if condition inside the loop is somewhat flawed; I don't know exactly why, but if I check abs(price[lastix]-price[ix]) instead of abs(price[ix]-price[lastix]) it doesn't give correct results.
UPDATE: I was aware of deltas, but it compares consecutive elements only, and that's not what I need in my OP. I apologize if the price series example in the OP was ambiguous and led to correct results by simply using deltas. Here is a counter-example with a new price series:
q)price: 1.0 1.1 1.21 1.42 1.4 1.32 1.63
q)where abs deltas price > level
,0
and this is not correct. The correct result which is produced by the accepted answer is still
0 2 3 6
I think you're looking for something like this maybe:
f:{where differ{$[level<abs[y-x];y;x]}\[x]}
This carries forward the last value that satisfied your condition, using the conditional $[;;] with the scan adverb, and then uses differ to pick out where the condition was satisfied and the carried value was updated.
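For instance, with level set to 0.2 as in the question, running f on the counter-example series from the update gives the indices the OP expects:

q)level:0.2
q)f 1.0 1.1 1.21 1.42 1.4 1.32 1.63
0 2 3 6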
If I've understood your problem correctly, the same result should come from
newprice:1 1.1 1.3 1.8 1.9 2 2.1
since the final value is more than 0.2 greater than 1.8, the last value at which the level was updated.
q)f newprice
0 2 3 6
Thanks,
Ryan
I'm not sure if this is exactly what you're looking for, but deltas will give you the change between consecutive pairs:
q)deltas price
1 0.1 0.2 0.5 0.1 0.1 0.3
Checking for your condition returns a boolean list:
q)level<=deltas price
1011001b
Finally 'where' will return the indices:
q)where level<=deltas price
0 2 3 6
Thanks,
Jamie
level:0.001*bips:200;
result:where level<=abs deltas price:1.0 1.1 1.3 1.8 1.9 2.0 2.3;
result
0 2 3 6
Is this close to what you're looking for?
deltas gives the difference between each value and the previous one, abs takes the absolute value, and then you're comparing each difference against "level", which you have predefined, using where to find the associated indices.
You've included index 0 in your answer but if you want to exclude it you can use the two argument form of deltas:
q)where level<=abs deltas[price 0;price]
2 3 6
Where the first argument sets the initial value to take away, in this case the first element of the price list.
An example of where this may be beneficial: if you were running the function for each date in a partitioned db, you could pass in the last value from the previous day to ensure you don't get indices where there wasn't a significant difference in bips.
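For illustration only, a rough sketch of that pattern, assuming a hypothetical partitioned table trade with date and price columns (the names are made up):

/seed deltas with the previous date's last price for the given date d
q)sigIdx:{[d] where level<=abs deltas[exec last price from trade where date=d-1;exec price from trade where date=d]}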
I have a table called timings where we are storing 1 million response timings for load testing. Now we need to divide this data into 100 groups, i.e. the first 500 records as one group and so on, and calculate the percentile of each group rather than the average.
So far I have tried this query:
SELECT quartile
     , avg(data)
     , max(data)
FROM (
    SELECT data
         , ntile(500) OVER (ORDER BY data) AS quartile
    FROM data
) x
GROUP BY quartile
ORDER BY quartile
but how do I find the percentile?
Usually, if you want to know the percentile, you are safer using cume_dist than ntile. That is because ntile behaves strangely when given few inputs. Consider:
=# select v,
       ntile(100) OVER (ORDER BY v),
       cume_dist() OVER (ORDER BY v)
   FROM (VALUES (1), (2), (4), (4)) x(v);
v | ntile | cume_dist
---+-------+-----------
1 | 1 | 0.25
2 | 2 | 0.5
4 | 3 | 1
4 | 4 | 1
You can see that ntile only uses the first 4 out of 100 buckets, where cume_dist always gives you a number from 0 to 1. So if you want to find out the 99th percentile, you can just throw away everything with a cume_dist under 0.99 and take the smallest v from what's left.
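As a sketch, using the table and column names from the query above, the 99th percentile could be found like this:

SELECT min(data) AS p99
FROM (
    SELECT data, cume_dist() OVER (ORDER BY data) AS cd
    FROM data
) x
WHERE cd >= 0.99;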
If you are on Postgres 9.4+, then percentile_cont and percentile_disc make it even easier, because you don't have to construct the buckets yourself. The former even gives you interpolation between values, which again may be useful if you have a small data set.
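For example (9.4+), where percentile_disc returns an actual value from the data and percentile_cont interpolates between values:

SELECT percentile_disc(0.99) WITHIN GROUP (ORDER BY data) AS p99,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY data) AS p99_interpolated
FROM data;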
Edit:
Please note that since I originally answered this question, Postgres has gotten additional aggregate functions to help with this. See percentile_disc and percentile_cont here. These were introduced in 9.4.
Original Answer:
ntile is how one calculates percentiles (among other n-tiles, such as quartile, decile, etc.).
ntile groups the table into the specified number of buckets as equally as possible. If you specified 4 buckets, that would be a quartile. 10 would be a decile.
For percentile, you would set the number of buckets to be 100.
I'm not sure where the 500 comes in here... if you want to determine which percentile your data is in (i.e. divide the million timings as equally as possible into 100 buckets), you would use ntile with an argument of 100, and the groups would have more than 500 entries.
If you don't care about avg or max, you can drop a bunch from your query. It would look something like this:
SELECT data, ntile(100) over (order by data) AS percentile
FROM data
ORDER BY data
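If you then need a single cut-off value, e.g. the value at the 99th percentile, one rough way (a sketch only) is to take the largest value assigned to bucket 99:

SELECT max(data) AS p99
FROM (
    SELECT data, ntile(100) OVER (ORDER BY data) AS percentile
    FROM data
) x
WHERE percentile = 99;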