How can I get the last 10 records - PostgreSQL

I have a DB with roughly 60,000 records in it, and I need to obtain the last 10 records entered. Can this be done via Postgres? I was thinking of maybe setting the offset to 59,990 and the limit to 10 or something similar, but I'm not sure if this would be efficient?

Something like the following perhaps:
SELECT * FROM your_table
ORDER BY your_timestamp DESC
LIMIT 10
If you want the result sorted ascending by the timestamp, you can wrap this in another query and sort again (see the sketch below). You might want to take a look at the execution plan, but this shouldn't be too inefficient.
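A minimal sketch of that wrapped form (table and column names are the placeholders from the snippet above):
SELECT * FROM (
    SELECT * FROM your_table
    ORDER BY your_timestamp DESC
    LIMIT 10
) AS last_ten
ORDER BY your_timestamp ASC;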

ORDER BY id DESC LIMIT 10
If id is indexed, this will be very efficient. The sort column could naturally also be a timestamp.
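If the sort column is not already indexed (a serial primary key typically is), a plain btree index lets Postgres satisfy the ORDER BY ... DESC LIMIT 10 with a cheap backward index scan; the names here are placeholders:
CREATE INDEX your_table_ts_idx ON your_table (your_timestamp);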

Related

Getting the total number of records in a single knexjs query when using the limit() method

I use knexjs and PostgreSQL. Is it possible in knexjs to get the total number of records from the same query in which the limit is used?
For example:
knex.select().from('project').limit(50)
Is it possible to somehow get the total number of records in the same query if there are more than 50?
The question arose because my real query is much more complex, with many subqueries and conditions, and I would not like to run it twice: once to get the data and once more (using the .count() method) to get the total number of records.
I do not know your abstraction layer (knexjs?), but you should be able to add the window version of the count() function to your select list. In plain SQL it looks something like the following, where ... represents your current select list:
select ..., count(*) over() total_rows
from project
limit 5;
This works because the window count function counts all rows the query selects, after the WHERE and other clauses are evaluated but before the LIMIT clause is applied. Note: this adds a column to the result set with the same value in every row.
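For completeness, a sketch of the same idea combined with the kind of filtering and ordering a real query would have; status and created_at are hypothetical columns standing in for your subqueries and conditions:
select p.*, count(*) over () as total_rows
from project p
where p.status = 'open'
order by p.created_at desc
limit 50;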

PostgreSQL: How to do a SELECT clause with a condition iterating through a range of values?

Hi everyone. This is my first post on Stack Overflow, so sorry if it is clumsy in any way.
I work in Python and make PostgreSQL-style requests to a Google BigQuery database. The data structure looks like this:
[sample-of-data image omitted; the table has at least time, pair, and price columns]
where time is represented in nanoseconds and is not regularly spaced (it is captured in real time).
What I want to do is select, say, the mean price over a minute, for each minute in a time range that I would like to give as a parameter.
This time range is currently a list of timestamps that I build externally, making sure they are separated by one minute each:
[1606170420000000000, 1606170360000000000, 1606170300000000000, 1606170240000000000, 1606170180000000000, ...]
My question is: how can I extract this list of mean prices given that list of time intervals?
Ideally I'd expect something like
SELECT AVG(price) OVER( PARTITION BY (time BETWEEN time_intervals[i] AND time_intervals[i+1] for i in range(len(time_intervals))) )
FROM table_name
but I know that doesn't make sense...
My temporary solution is to aggregate many SELECT ... UNION DISTINCT clauses, one for each minute interval. But as you can imagine, this is not very efficient... (I need up to 60*24 = 1440 samples)
Now, there may very well already be an answer to this question, but since I'm not even sure how to formulate it, I have found nothing yet. Any link and/or tip would be a great help.
Many thanks in advance.
First of all, your sample data appears to be at nanosecond resolution, and you are looking for averages at minute (sixty-second) resolution.
Please try this:
select div(time, 60000000000) as minute,
       pair,
       avg(price) as avg_price
from your_table
group by div(time, 60000000000), pair;
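If you would rather see each bucket as a readable timestamp than as a minute index, something like this should work (assuming time really is epoch nanoseconds, so the minute index times 60 is epoch seconds):
select timestamp_seconds(div(time, 60000000000) * 60) as minute_start,
       pair,
       avg(price) as avg_price
from your_table
group by 1, 2;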
If you want to control the intervals as you said in your comment, then please try something like this (I do not have access to BigQuery):
with time_ivals as (
    select tick,
           lead(tick) over (order by tick) as next_tick
    from unnest(
        [1606170420000000000, 1606170360000000000,
         1606170300000000000, 1606170240000000000,
         1606170180000000000, ...]) as tick
)
select t.tick, y.pair, avg(y.price) as avg_price
from time_ivals t
join your_table y
  on y.time >= t.tick
 and y.time < t.next_tick
group by t.tick, y.pair;

See length (count) of query results in workbench

I just started using MySQL Workbench (6.1). The default limit for queries is 1,000, and that's fine; I want to keep that.
But the results from the action output message will therefore always say "1000 rows returned".
Is there a setting to see the number of records that would be returned had there been no limit? For sanity-checking query results?
I know this is late by a few years, but I think you're asking for a way to see the total row count at the bottom of the results pane, like in SQL Server, where the messages pane tells you how many rows were returned. I was looking for exactly what you are asking for as well, and it seems there is no way to get it. If you have an ID in your table that is numeric and in numeric order, you could order by ID descending and look at the biggest number there, as sketched below. That is what I've decided to do.
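A minimal sketch of that workaround; it assumes an auto-increment id with no deleted rows, since gaps would inflate the estimate:
SELECT id FROM your_table ORDER BY id DESC LIMIT 1;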
The result is not always "1000 rows returned". If there are fewer records than that, you will get the actual count. If you want to know the total number of rows in a table, do a select count(*) from table. Alternatively, you can switch off the automatic limit and have all records returned by MySQL Workbench, but that can be time- and memory-consuming for large tables.
I think removing the row limit will help. By default, MySQL Workbench limits the result set to 1,000 rows, but you can always disable the limit. Check out https://superuser.com/questions/240291/how-to-remove-1000-row-limit-in-mysql-workbench-queries for how to do that.
You can run a second query to check that:
select count(*) from (your original query) as t;
This will return the total number of rows in the actual result.
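If the server is MySQL 8.0 or newer, a window count can fold the total into the original query instead of running it twice, the same trick as in the knexjs answer above (your_table is a placeholder):
select t.*, count(*) over () as total_rows
from your_table t
limit 1000;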
You can use the SQL count function. It returns the total number of rows a query would return.
A sample query:
select count(*) from tableName where field1 = value1
In Workbench, in the dropdown menu at the top, set the limit to "Don't Limit". Then run the query to extract data from the table. Then, in the output pane below, the total count of the query results will be displayed in the message column.

Query distinct values from historical database

If I run this query on a large historical database without specifying a date, will kdb be smart enough to retrieve status values from an index and not bring the database down?
select distinct status from trades
The only way kdb can possibly tell all the distinct status values is by reading from every partition. Yes, this will take a lot of memory, but unless you yourself want to maintain a cache of all distinct status values, there is nothing else you can do. As previously mentioned, an attribute will speed the query up, but the query time will still scale with the number of partitions.
To retrieve using an index, kdb provides the `g#` attribute. distinct alone can take more time, depending on the size of your table (it will be a linear search without the `g#` attribute).
Check this: http://code.kx.com/q4m3/8_Tables/#88-attributes
Let's look at a simple example:
q) a: 10000000#1 2 3 5
q) b:`g#a
q) \ts distinct a
68 134217888
q) \ts distinct b
0 288
The difference shows that the `g#` attribute saves a lot of time and space during searching. This is because the `g#` attribute creates and maintains an index on the vector.

SQL Sum and Group By for a running Tally?

I'm completely rewriting my question to simplify it. Sorry if you read the prior version. (The previous version of this question included a very complex query example that created a distraction from what I really need.) I'm using SQL Express.
I have a table of lessons.
LessonID  StudentID  StudentName  LengthInMinutes
1         1          Chuck        120
2         2          George       60
3         2          George       30
4         1          Chuck        60
5         1          Chuck        10
These would be ordered by date. (Of course the actual table is thousands of records with dates and other lesson-related data but this is a simplification.)
I need to query this table such that I get all rows (or a subset of rows by a date range or by student), but I need my query to add a new column we might call PriorLessonMinutes. That is, the sum of all minutes of all lessons for the same student in lessons of PRIOR dates only.
So the query would return:
LessonID  StudentID  StudentName  LengthInMinutes  PriorLessonMinutes
1         1          Chuck        120              0
2         2          George       60               0
3         2          George       30               60   (the sum of Length from row 2 only)
4         1          Chuck        60               120  (the sum of Length from row 1 only)
5         1          Chuck        10               180  (the sum of Length from rows 1 and 4)
In essence, I need a running tally of the sum of prior lesson minutes for each student. Ideally the tally shouldn't include the current row, but if it does, no big deal as I can do subtraction in the code that receives the query.
Further, (and this is important) if I retrieve only a subset of records, (for example by a date range) PriorLessonMinutes must be a sum that considers rows that are NOT returned.
My first idea was to use SUM() and GROUP BY Student, but unless I'm mistaken that isn't right, because it would sum the minutes of all rows for each student, including rows that come after the current row, which aren't relevant to the sum I need.
OPTIONS I'M REJECTING: I could scan through all rows in the code that receives them (although this would force me to retrieve all rows unnecessarily), but that's obviously inefficient. I could also put a real data field in there and populate it, but this too presents problems when other records are deleted or altered.
I have no idea how to write such a query together. Any guidance?
This is a great opportunity to use Windowed Aggregates. The trick is that you need SQL Server 2012 Express. If you can get it, then this is the query you are looking for:
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
Note that it returns NULLs instead of 0s (zeroes): for each student's first lesson the window frame is empty, so the sum is NULL. If you insist on zeroes, use the COALESCE function to turn the NULLs into zeroes, as in the sketch below.
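A sketch of that COALESCE variant:
select *,
       coalesce(sum(LengthInMinutes)
                  over (partition by StudentId order by LessonId
                        rows between unbounded preceding and 1 preceding),
                0) as PriorLessonMinutes
from Lessons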
I suggest using a nested query to limit the number of rows returned:
select * from
(
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
) as NestedLessons
where LessonId > 3 -- this is an example of a filter
This way the filter is applied after the aggregation is complete.
Now, if you want to apply a filter that doesn't affect the aggregation (like only querying data for a certain student), you should apply the filter to the inner query, as pruning the rows that don't affect the computation early (like data for other students) will improve the performance.
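A sketch of that variant, with the student filter pushed into the inner query (it doesn't change the running sum, since the window is partitioned by StudentId) and the row filter kept outside:
select * from
(
    select *,
           sum(LengthInMinutes)
             over (partition by StudentId order by LessonId
                   rows between unbounded preceding and 1 preceding)
           as PriorLessonMinutes
    from Lessons
    where StudentId = 1  -- safe inside: the window is per student anyway
) as NestedLessons
where LessonId > 3       -- still applied after the aggregation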
I feel the following code will serve your purpose. Check it:
select Students.StudentID, Students.First, Students.Last,
       sum(Lessons.LengthInMinutes) as TotalPriorMinutes
from Lessons
join Students on Lessons.StudentID = Students.StudentID
where Lessons.StartDateTime < getdate()
  and Lessons.StartDateTime >= '20090130 00:00:00'
  and Lessons.StartDateTime < '20790101 00:00:00'
group by Students.StudentID, Students.First, Students.Last