'select count' returns "OS reports: No such file or directory" - kdb

I get this
"./2017.10.14/:./2017.10.15/tableName. OS reports: No such file or directory"
when I try to do
select count i by date from tableName where date within(.z.d-5;.z.d)
works if I do
select colName from trackFeedStats where date within(.z.d-5;.z.d)
I assume it's one of the columns acting strange ...
UPDATE: the issue seems to be mainly when using count i by colName1,colName2,colName3
UPDATE: I checked permissions and everything seems to be alright, table is in the given partition (2017.10.14), no symlinks
UPDATE: I am looking for suggestions on fixing the db. the query is not that important

This can occur for a number of reasons:
The file/directory legitimately isn't there. Have you checked the database directories to see if the table is in each date slice? If not you should look into .Q.chk for filling missing dates.
You can also get this error if you don't have permission to read the files/directories in that database. They could have been written by a different user etc.

I think 'count i' could be the culprit here. It behaves strangely when applied against partitioned databases.
Although you have specified a list of dates in your where clause, under the hood, kdb is executing 'count i' against all table partitions in the database.
For reference, .Q.pn maintains partition counts -
q).Q.pn
tabA |
tabB |
but, as demonstrated here, 'count i' will execute against all date partitions.
q)select count i from tabA where date within 2020.10.10 2020.09.12;
q).Q.pn
tabA | 8001 7998 8101 0 0 8002 8102 7940 7999 0 0 0 0 0 0 0 0 0 0 0 0
tabB | ()
For some reason, .Q.pn doesn't update if you add an additional constraint to your where clause.
q)select count i from tabA where date within 2020.10.10 2020.09.12,price>0;
q).Q.pn
tabA |
tabB |
and the query will actually run. For example, in this database, there are some empty partitions for tabB.
q)select count i from tabB where date within .z.d + -7 0
'./2020.09.30/tabB. OS reports: No such file or directory
but, if we add some other constraint to the where clause, the query will run as we want it to.
q)select count i from tabB where date within .z.d + -7 0,not null sym
x
-----
16948
Running 'count' against a column will work.
q)select count sym from tabB where date within .z.d + -7 0
x
-----
16948
As a workaround, you could update your query to perform the 'count' against some key column, if one exists. Alternatively, you could fill in the missing partitions, by running .Q.chk - https://code.kx.com/q/ref/dotq/#qchk-fill-hdb

Related

User Sessions | Month's Since Last Active Using SQL

UserID
CalMonth
ActiveFlag
Months_since_last_active
A
1/1/2021
1
0
A
2/1/2021
1
A
3/1/2021
2
A
4/1/2021
1
0
B
1/1/2021
1
0
B
2/1/2021
1
B
3/1/2021
1
0
Problem --> The first 3 colums are given. Generate the last one 'Months_since_last_active' by adding 1 until the use is active again
My Solution as below:
With active_sessions as (
Select
User_Id
, CalMonth
, active flag as current_flag
, LAG (ActiveFlag,1) over (partition by User_Id order by CalMonth) as previous_flag
)
Select User_Id, CalMonth, current_flag, sum(case when current_flag =1 then 0
when current_flag IS NULL then Months_since_last_active + 1
END
) as Months_since_last_active
from active_sessions
order by 1,2
I was asked the above question in an interview and told that my proposed solution would not work because:
When it comes to 3/1/2021 and beyond, the previous values of 'Months_since_last_active' are not in the table yet -- they are only in the code
If I wanted to use LAG function, then it'd take innumerable LAG functions to achieve what I was trying to achieve
I will appreciate if someone can comment on my solution.
Your solution has 3 major problems, 2 of them may be related to copy/past errors. The active_sessions CTE is missing the from clause, so there is no data source. Then the main portion uses the aggregate function SUM, however, the query has no group by which is required for the aggregate function. These are easily corrected. The other issue concerns the LAG function and your use of it.
First off in the CTE you alias the result as previous_flag, then in the main query you reference Months_since_last_active which does not exist yet. I think this is the source of the interviewer's first point.
The interviewer's second point also stems form the LAG function. As written it always looks back exactly 1 row, but from the current row yet it needs to look back 2 rows for (userid, calmonth) = ('A', 2021-03-01), and 3 rows for (A, 2021-04-01), etc. Basically you need to look back to to the last row with active_flag = 1. This leads directly to the it'd take innumerable LAG functions as you do not know how far beck you need to look. Suppose you had 30-40 or more inactive rows between active rows. You need a LAG(activeflag,n) ... for each possibility.
A solution. I dislike the problem statement it should not contain by adding 1 until the use is active again (is it yours or theirs). Either way this is an XY. If theirs they should be telling you what to solve, i.e. find number of months since last active. If yours you have created the problem for yourself. The problem statement should not say anything about how to solve the it. I will ignore that portion of the problem (And in a real interview I would/have ignored it, but be prepared to explain why).
What you have a a version of a Gaps And Islands (google it, you will find more that to think about). In this version lets consider each row with activeflag = 'Y' an as island, and anything else as a gap. Nor what you are looking for is the length of the gaps between islands. In the following the island_num CTE does 2 things. It assigns a sequence number to each row for a (userid, calmonth) and generates a boolean for each island. The `gap_points' then joins the results with itself, selecting the assigned for the max island whose calmonth is less than the current rows calmonth. In the main part the Months_since_last_active is assigned 0 if the current row is an island, and the difference between the generated row numbers if it is a gap. (see demo)
with island_num (userid, cal_month, active_flag, is_island, row_num) as
( select am.*
, case when am.activeflag = 1 then true else false end is_island
, row_number() over (partition by am.userid order by am.calmonth) rn
from active_month am
) -- select * from island_num
, gap_points(userid, cal_month, active_flag, is_island, row_num, island_row) as
( select *
from island_num i1
join lateral
(select max(row_num)
from island_num i2
where i1.userid = i2.userid
and i2.cal_month < i1.cal_month
and i2.is_island
) s0
on true
) --select * from gap_points;
select userid "User Id"
, cal_month "Cal Month"
, active_flag "Active Flag"
, case when is_island then 0
else row_num - island_row
end "Months_since_last_active"
from gap_points;

postgresql: how to get the rows greater than a timestamp and also previous nearest row

I have postgresql table with time and state
time state
2021-01-30 09:59:57+00 0
2021-01-30 09:50:36+00 1
2021-01-30 08:38:05+00 0
2021-01-30 08:29:37+00 1
2021-01-30 08:05:16+00 0
2021-01-30 07:42:27+00 1
i am looking for rows greater than 2021-01-30 07:45:00+00 but also want to have the nearest row before also. In this case 2021-01-30 07:42:27+00 similary w.r.t to less than
any option to include the previous one row near to the condition
Use the Window variant of MAX function and union that with standard select. (see fiddle)
select time_ts "Time", state
from test
where time_ts >= '2021-01-30 07:45+00'::timestamp
union
Select *
from (select max (time_ts) over(), state
from test
where time_ts < '2021-01-30 07:45+00'::timestamp
limit 1
) f
order by 1;
In Reply to " max (time_ts) over() can you explain what it does "
First see Documentation on Window functions. Max is one of the Aggregate functions that also has a Window variant. Within Window functions the Over(...) builds groups (partitions) within row set selected, using Over() clause creates a single group over the entire row set. The advantage here is that is avoids the "Group By" required by the standard aggregate (which could produce multiple rows). While selecting a single row it will select the same row for each row in the result set thus the limit 1. You can see this with the following:
select time_ts, max(time_ts) over() from test;
Note: You used TIME as a column name, but that is a valid Postgres data type. I do not use data types as column names. I get confused over talking about the column or the data type.

Issue with using percentile_cont function in Postgresql

This is my table
ID Total
1 2019.21
3 87918.32
2 562900.3
3 982688.98
1 56788.34
2 56792.32
3 909728.23
Now I would like to find the 25th,50th,75th,90th and 100th percentile of the values (Total) in the above Table. Assume my table consists of Whole Lot of data (some 2 Million Records of the same format) . I've Used the Following code :
CODE :
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY Total) as disc_func
FROM my_table
The Error I've come across :
ERROR: syntax error at or near "("
LINE 3: percentile_disc(0.5) WITHIN GROUP (ORDER BY total...
You use PostgreSQL < 9.4 . It does not support WITHIN GROUP
https://www.postgresql.org/docs/9.4/static/functions-aggregate.html
https://www.postgresql.org/docs/9.3/static/functions-aggregate.html

Postgres Crosstab Dynamic Number of Columns

In Postgres 9.4, I have a table like this:
id extra_col days value
-- --------- --- -----
1 rev 0 4
1 rev 30 5
2 cost 60 6
i want this pivoted result
id extra_col 0 30 60
-- --------- -- -- --
1 rev 4 5
2 cost 6
this is simple enough with a crosstab.
but i want the following specifications:
day column will be dynamic. sometimes increments of 1,2,3 (days), 0,30,60 days (accounting months), and sometimes in 360, 720 (accounting years).
range of days will be dynamic. (e.g., 0..500 days versus 1..10 days).
the first two columns are static (id and extra_col)
The return type for all the dynamic columns will remain the same type (in this example, integer)
Here are the solutions I've explored, none of which work for me for the following reasons:
Automatically creating pivot table column names in PostgreSQL -
requires two trips to the database.
Using crosstab_hash - is not dynamic
From all the solutions I've explored, it seems the only one that allows this to occur in one trip to the database requires that the same query be run three times. Is there a way to store the query as a CTE within the crosstab function?
SELECT *
FROM
CROSSTAB(
--QUERY--,
$$--RUN QUERY AGAIN TO GET NUMBER OF COLUMNS--$$
)
as ct (
--RUN QUERY AGAIN AND CREATE STRING OF COLUMNS WITH TYPE--
)
Every solution based on any buildin functionality needs to know a number of output columns. The PostgreSQL planner needs it. There is workaround based on cursors - it is only one way, how to get really dynamic result from Postgres.
The example is relative long and unreadable (the SQL really doesn't support crosstabulation), so I will not to rewrite code from blog here http://okbob.blogspot.cz/2008/08/using-cursors-for-generating-cross.html.

Select unique values sorted by date

I am trying to solve an interesting problem. I have a table that has, among other data, these columns (dates in this sample are shown in European format - dd/mm/yyyy):
n_place_id dt_visit_date
(integer) (date)
========== =============
1 10/02/2012
3 11/03/2012
4 11/05/2012
13 14/06/2012
3 04/10/2012
3 03/11/2012
5 05/09/2012
13 18/08/2012
Basically, each place may be visited multiple times - and the dates may be in the past (completed visits) or in the future (planned visits). For the sake of simplicity, today's visits are part of planned future visits.
Now, I need to run a select on this table, which would pull unique place IDs from this table (without date) sorted in the following order:
Future visits go before past visits
Future visits take precedence in sorting over past visits for the same place
For future visits, the earliest date must take precedence in sorting for the same place
For past visits, the latest date must take precedence in sorting for the same place.
For example, for the sample data shown above, the result I need is:
5 (earliest future visit)
3 (next future visit into the future)
13 (latest past visit)
4 (previous past visit)
1 (earlier visit in the past)
Now, I can achieve the desired sorting using case when in the order by clause like so:
select
n_place_id
from
place_visit
order by
(case when dt_visit_date >= now()::date then 1 else 2 end),
(case when dt_visit_date >= now():: date then 1 else -1 end) * extract(epoch from dt_visit_date)
This sort of does what I need, but it does contain repeated IDs, whereas I need unique place IDs. If I try to add distinct to the select statement, postgres complains that I must have the order by in the select clause - but then the unique won't be sensible any more, as I have dates in there.
Somehow I feel that there should be a way to get the result I need in one select statement, but I can't get my head around how to do it.
If this can't be done, then, of course, I'll have to do the whole thing in the code, but I'd prefer to have this in one SQL statement.
P.S. I am not worried about the performance, because the dataset I will be sorting is not large. After the where clause will be applied, it will rarely contain more than about 10 records.
With DISTINCT ON you can easily show additional columns of the row with the resulting n_place_id:
SELECT n_place_id, dt_visit_date
FROM (
SELECT DISTINCT ON (n_place_id) *
,dt_visit_date < now()::date AS prio -- future first
,#(now()::date - dt_visit_date) AS diff -- closest first
FROM place_visit
ORDER BY n_place_id, prio, diff
) x
ORDER BY prio, diff;
Effectively I pick the row with the earliest future date (including "today") per n_place_id - or latest date in the past, failing that.
Then the resulting unique rows are sorted by the same criteria.
FALSE sorts before TRUE
The "absolute value" # helps to sort "closest first"
More on the Postgres specific DISTINCT ON in this related answer.
Result:
n_place_id | dt_visit_date
------------+--------------
5 | 2012-09-05
3 | 2012-10-04
13 | 2012-08-18
4 | 2012-05-11
1 | 2012-02-10
Try this
select n_place_id
from
(
select *,
extract(epoch from (dt_visit_date - now())) as seconds,
1 - SIGN(extract(epoch from (dt_visit_date - now())) ) as futurepast
from #t
) v
group by n_place_id
order by max(futurepast) desc, min(abs(seconds))