SQL: How to select the first record per day, assuming that each day contains more than 1 value

I am trying to write a SQL query whose results show the first value (ID) per user per day for the last year.
I tried the query below and can get results for a single day, but when I change the time range to > 2021-06-01, it does not give me the results I expect.
select *
from table
where value in
    (
      select min(value)
      from table
      where valueid = x
      group by user
    )
  and Time = '2022-05-30'
  and value is not null
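For reference, a common way to get the first row per user per day is a window function rather than a correlated min(). A minimal sketch, assuming the placeholder names from the query above and a database that supports ROW_NUMBER() (names that collide with reserved words are quoted):
select *
from (
  select t.*,
         row_number() over (partition by "user", cast("Time" as date)
                            order by "Time") as rn  -- rank rows within each (user, day)
  from "table" t
  where "Time" >= '2021-06-01'
    and value is not null
) ranked
where rn = 1;  -- keep only the earliest row per user per day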

Related

How to increment value in counter table

In my table I have the following schema:
id - integer | date - text | name - text | count - integer
I just want to count some actions.
I want to insert 1 when a row for date = '30-04-2019' does not exist yet.
I want to add 1 when the row already exists.
My idea is:
UPDATE "call" SET count = (1 + (SELECT count
FROM "call"
WHERE date = '30-04-2019'))
WHERE date = '30-04-2019'
But it does not work when the row doesn't exist.
Is this possible without extra triggers, etc.?
You can use a writeable CTE to achieve this. Additionally, the UPDATE statement can be simplified to a plain set count = count + 1; there is no need for a sub-select.
with updated as (
  update "call"
    set count = count + 1
  where date = '30-04-2019'
  returning id
)
insert into "call" (date, count)
select '30-04-2019', 1
where not exists (select *
                  from updated);
If the update did not find a row, the where not exists condition will be true and the insert will be executed.
Note that the above is not safe for concurrent execution from multiple transactions. If you want to make this safe, create a unique index on the date column. Then use an INSERT ... ON CONFLICT instead:
insert into "call" (date, count)
values ('30-04-2019', 1)
on conflict (date)
do update
set count = "call".count + 1;
Again: the above requires a unique index (or constraint) on the date column.
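For example, such an index could be created like this (the index name is arbitrary):
create unique index call_date_idx on "call" (date);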
Unrelated to the immediate problem, but: storing dates in a text column is a really, really bad idea. You should change your table definition and change the data type for the "date" column to date.
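A minimal sketch of that migration in PostgreSQL, assuming every stored value really is in DD-MM-YYYY format:
alter table "call"
  alter column date type date using to_date(date, 'DD-MM-YYYY');  -- converts each text value in place
The USING clause rewrites every existing value; the statement fails if any row holds a differently formatted string, so check the data first.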

How to query the first row efficiently?

I have a table with a large number of records:
date instrument price
2019.03.07 X 1.1
2019.03.07 X 1.0
2019.03.07 X 1.2
...
When I query for the day opening price, I use:
1 sublist select from prices where date = 2019.03.07, instrument = `X
It takes a long time to execute because it selects all the prices on that day and then takes the first one.
I also tried:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
In Oracle an equivalent would be:
select * from prices where date = to_date(...) and instrument = 'X' and rownum = 1
and Oracle will stop immediately when it finds the first record.
How to do this in KDB (e.g. stop immediately after it finds the first record)?
In kdb, where subclauses in select statements are executed sequentially, i.e. only those records which pass the first "test" get passed to the second test. With that in mind, looking at your two attempts:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
This doesn't (necessarily) return anything, because by the time it gets to the i=0 check, you've already filtered out some records (possibly including the first record in the original table, which would have i=0)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
This one should work. First you filter by date. Then within the records for that date, you select the records for instrument `X. Then within those records, you take the record where i is the first i (where i has already been filtered down, so first i is simply the index of the first record [still the index from the original table, not the filtered down version])
The q-SQL equivalent for that is select[n], which also performs better than other approaches in most cases. A positive 'n' will return the first n records and a negative 'n' the last n records.
q) select[1] from prices where date = 2019.03.07, instrument = `X
There is no built-in functionality to stop after the first match. You could write a custom function for that, but it would probably execute slower than the supported version above.
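For completeness, the negative form mentioned above works the same way; a quick sketch fetching the last record of that day instead:
q) select[-1] from prices where date = 2019.03.07, instrument = `X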

PostgreSQL Calculating a Consecutive session

I have a very large table that contains 4 columns:
1) the status property a member's state has changed to: online, offline, game_lobby, or load_screen
2) the status property a member's state has changed from: online, offline, game_lobby, or load_screen
3) the member's ID number
4) the timestamp of when the status property changed
I want to calculate the average time all members spend online, which would be the difference between the timestamp of when a state changes from online to offline and the timestamp of when a state changes from offline to online:
sample dataset
Using the sample linked above, the average calculated would be (01/03/2016 15:32:05 - 01/02/2016 07:18:32 + 03/14/2016 05:46:41 - 03/14/2016 04:09:04) / 2
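(Worked out from those sample values: the first session lasts 116013 seconds and the second 5857 seconds, so the average would be (116013 + 5857) / 2 = 60935 seconds, roughly 16.9 hours.)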
Here's what I wrote, which gave me a few negative averages calculated for certain members, which can't be right:
with sessions as
(
  select
    date_trunc('week', sc.occurred_at) as week,
    sc.occurred_at,
    sc.id,
    timestampdiff(second, lag(sc.occurred_at) over (order by sc.id asc, sc.occurred_at),
                  sc.occurred_at) / 3600 as session
  from state_changes sc
  where
    ((from_state = 'offline' and to_state = 'online') or
     (from_state = 'offline' and to_state = 'online'))
    and occurred_at at time zone 'America/New_york' > '2016-01-01'
)
select week, avg(session), id
from sessions
group by 1, 3;
I can roll-up the averages into a single value instead of by member, but what I wrote is clearly wrong since a small number of the averages are returning negative. Does anyone have any suggestions?
You are basically interested in the time period between going from offline->online and then later going back ?->offline. So the trick is to get only those records in a sub-query and then compute the lag over those two. Your code has problems in exactly those two spots; see the code below. The main query then takes the average and throws out the offline->online rows.
SELECT date_trunc('week', logout) AS week,
       avg(extract(epoch from logout - login)), -- in seconds
       id
FROM (
  SELECT lag(occurred_at) OVER (PARTITION BY id ORDER BY occurred_at) AS login,
         occurred_at AS logout,
         id,
         to_state
  FROM state_changes
  WHERE (from_state = 'offline' OR to_state = 'offline')
    AND occurred_at > '2016-01-01') sub
WHERE to_state = 'offline'
GROUP BY 1, 3;
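Since extract(epoch from ...) yields seconds, you can divide the result out if you prefer hours; e.g. replace the avg line above with:
avg(extract(epoch from logout - login)) / 3600.0 -- in hours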

excluding rows from resultset in postgres

This is my result set.
I am returning this result set on the basis of refid, using WHERE refid IN.
Here, I need to apply some logic without any kind of programming (SQL query only):
if the result set contains a period for a particular refid, then the other rows with the same refid must not be returned.
For example, 2667105 has a period, so myid = 612084598 must not be returned in the result set.
I think this can be achieved using CASE, but I have no idea how to use it; I don't know whether the CASE statement should go in the SELECT statement or in the WHERE clause...
EDIT:
This is how it is supposed to work:
myid = 612084598 is the default row for refid = 2667105, but if someone specifically wants the refid for period = 6, then it must return all rows except myid = 612084598.
But if I am looking for period = 12, no specific refid is present in the database for that period, so it must return all rows except the first one, i.e. all rows with the refid that is the default one.
The problem definition is not very clear, but try this:
with cte as (
  select
    *,
    first_value(period) over (partition by refid order by myid) as fv
  from test
)
select myid, refid, period
from cte
where period is not null or fv is null

DB2 Timestamp select statement

I am trying to run a simple query which gets me data based on timestamp, as follows:
SELECT *
FROM <table_name>
WHERE id = 1
AND usagetime = timestamp('2012-09-03 08:03:06')
WITH UR;
This does not seem to return a record to me, whereas this record is present in the database for id = 1.
What am I doing wrong here?
The datatype of the column usagetime is correct, set to timestamp.
@bhamby is correct. By leaving the microseconds off of your timestamp value, your query will only match a usagetime of exactly 2012-09-03 08:03:06.000000.
If you don't have the complete timestamp value captured from a previous query, you can specify a ranged predicate that will match on any microsecond value for that time:
...WHERE id = 1 AND usagetime BETWEEN '2012-09-03 08:03:06' AND '2012-09-03 08:03:07'
or
...WHERE id = 1 AND usagetime >= '2012-09-03 08:03:06'
AND usagetime < '2012-09-03 08:03:07'
You might want to use the TRUNC function on your column when comparing with the string format, so it compares only down to seconds, not microseconds.
SELECT * FROM <table_name> WHERE id = 1
AND TRUNC(usagetime, 'SS') = '2012-09-03 08:03:06';
If you want to truncate to minutes, hours, etc., that is also possible; just use the appropriate notation instead of 'SS':
hour ('HH'), minute ('MI'), year ('YEAR' or 'YYYY'), month ('MONTH' or 'MM'), day ('DD')
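For example, to match anything within the same minute, following the same pattern as above:
SELECT * FROM <table_name> WHERE id = 1
AND TRUNC(usagetime, 'MI') = '2012-09-03 08:03:00';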