Talend filter rows for newest date - filtering

I have a table with n entries for each id, and each entry has a timestamp. I need to keep only the row with the latest timestamp for each id. I sort by id, then by desc(timestamp). But what should I do next? tMemorizeRows requires you to specify how many rows to memorize, but n is not constant. Can I somehow use tFilterRow?
Many thanks for any advice.

After sorting the data by ID asc and Timestamp desc, you can use tAggregateRow and choose ID as the group-by column. Use "first" for all the remaining columns. Because the data is already sorted by ID asc and Timestamp desc, this gives you the first value of each column for each ID, which is the row with the latest timestamp.
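The same "sort, then take the first row per group" idea can be sketched outside Talend. This is only an illustration of the logic, not of tAggregateRow itself; the function name `keep_latest` and the sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (id, timestamp, value); not taken from the question.
rows = [
    (1, "2023-01-01 10:00:00", "a"),
    (2, "2023-01-03 09:00:00", "b"),
    (1, "2023-01-05 12:00:00", "c"),
    (2, "2023-01-02 08:00:00", "d"),
    (1, "2023-01-03 11:00:00", "e"),
]

def keep_latest(rows):
    """Return one row per id: the one with the latest timestamp."""
    # Sort by timestamp descending first, then do a stable sort by id ascending.
    # The result is ordered by id asc, timestamp desc.
    by_ts_desc = sorted(rows, key=itemgetter(1), reverse=True)
    by_id = sorted(by_ts_desc, key=itemgetter(0))
    # groupby needs sorted input; the first row of each group is the newest one.
    return [next(group) for _, group in groupby(by_id, key=itemgetter(0))]

latest = keep_latest(rows)
```

The timestamps here are ISO-formatted strings, so comparing them as text matches chronological order. With another format you would need to parse them first.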

Related

How to get latest data for a column when using grouping in postgres

I am using postgres alongside sequelize. I have encountered a case where I need to write a custom query which groups the records by a particular field. I know that for the remaining columns that are not used for grouping, I need to use an aggregate function like SUM. But the problem is that for some columns I need to get the value from the latest row (sorted DESC by created_at). I see no function in SQL to do so. Is my only option to write subqueries, or is there a better way? Thanks.
For better understanding, look at the picture below: I want to group the records by address. So after the query there should only be two records, one with sydney and the other with new york. But when it comes to the distance, I want the result of the query to contain the distance from the row that was most recently created, i.e. the one with the latest created_at.
So the final two query results should be:
sydney 100 2022-09-05 18:14:53.492131+05:45
new york 40 2022-09-05 18:14:46.23328+05:45
select address, distance, created_at
from (
    select address, distance, created_at,
           row_number() over (partition by address order by created_at desc) as rn
    from your_table  -- placeholder: replace with your actual table name
) x
where rn = 1
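To try the row_number() approach without a Postgres instance, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table name `trips` and the older rows are invented for the example; only the two expected results come from the question, with the timestamps shortened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name and sample data.
conn.execute("CREATE TABLE trips (address TEXT, distance INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("sydney", 70, "2022-09-05 18:14:30"),
        ("sydney", 100, "2022-09-05 18:14:53"),
        ("new york", 25, "2022-09-05 18:14:40"),
        ("new york", 40, "2022-09-05 18:14:46"),
    ],
)

# Number the rows within each address, newest first, and keep only row 1.
latest_rows = conn.execute(
    """
    SELECT address, distance, created_at
    FROM (
        SELECT address, distance, created_at,
               ROW_NUMBER() OVER (PARTITION BY address
                                  ORDER BY created_at DESC) AS rn
        FROM trips
    ) AS x
    WHERE rn = 1
    ORDER BY address
    """
).fetchall()

for row in latest_rows:
    print(row)
```

In PostgreSQL specifically, SELECT DISTINCT ON (address) ... ORDER BY address, created_at DESC is another way to get one row per address, but it is not portable to other databases.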

Create SQL Column Counting Frequency of Value in other column

I have the first three columns in SQL. I want to create a 4th column called Count which counts the number of times each unique name appears in the Name column. I want my results to appear like the dataset below, so I don't want to do a COUNT and GROUP BY.
What is the best way to achieve this?
You can use COUNT as a window function:
SELECT *,COUNT(*) OVER(PARTITION BY name ORDER BY year,month) count
FROM T
ORDER BY year,month
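One thing worth checking against your expected output: with an ORDER BY inside OVER, the default window frame makes COUNT(*) a running count per name, while without it you get the total count per name on every row. A minimal sketch with Python's sqlite3 module (SQLite 3.25+), using invented sample data and the alias `cnt`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the original dataset is not shown in the post.
conn.execute("CREATE TABLE T (name TEXT, year INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("ann", 2021, 1), ("bob", 2021, 2), ("ann", 2021, 3), ("ann", 2022, 1)],
)

# ORDER BY inside the window: counts rows of the same name seen so far.
running = conn.execute(
    """
    SELECT name, year, month,
           COUNT(*) OVER (PARTITION BY name ORDER BY year, month) AS cnt
    FROM T
    ORDER BY year, month
    """
).fetchall()

# No ORDER BY inside the window: counts all rows of the same name.
total = conn.execute(
    """
    SELECT name, year, month,
           COUNT(*) OVER (PARTITION BY name) AS cnt
    FROM T
    ORDER BY year, month
    """
).fetchall()

print(running)
print(total)
```

Which of the two you want depends on whether the Count column should grow over time or show the overall frequency.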

Postgres pagination with non-unique keys?

Suppose I have a table of events with (indexed) columns id : uuid and created : timestamp.
The id column is unique, but the created column is not. I would like to walk the table in chronological order using the created column.
Something like this:
SELECT * FROM events WHERE created >= $<after> ORDER BY created ASC LIMIT 10
Here $<after> is a template parameter that is taken from the previous query.
Now, I can see two issues with this:
Since created is not unique, the order will not be fully defined. Perhaps the sort should be id, created?
Each row should only be on one page, but with this query the last row is always included on the next page.
How should I go about this in Postgres?
SELECT * FROM events
WHERE created >= $<after> and (id >= $<id> OR created > $<after>)
ORDER BY created ASC ,id ASC LIMIT 10
That way, events that share the same timestamp value are ordered by id, and you can split pages anywhere.
You can say the same thing this way:
SELECT * FROM events
WHERE (created,id) >= ($<after>,$<id>)
ORDER BY created ASC ,id ASC LIMIT 10
For me, this produces a slightly better plan.
An index on (created, id) will help performance most, but in many circumstances an index on created may suffice.
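Here is a small sketch of walking a table with the row-value comparison, using Python's sqlite3 module (row values need SQLite 3.15+). It assumes the cursor values are taken from the last row of the previous page, so it uses a strict > to avoid returning that row again; with >= the cursor row itself would reappear. The helper `fetch_page`, the index name, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: short text ids stand in for uuids.
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("a", "2024-01-01"), ("b", "2024-01-01"), ("c", "2024-01-01"),
        ("d", "2024-01-02"), ("e", "2024-01-02"),
    ],
)
conn.execute("CREATE INDEX events_created_id ON events (created, id)")

def fetch_page(conn, last_row=None, page_size=2):
    """Return the next page after last_row, an (id, created) tuple or None."""
    if last_row is None:
        return conn.execute(
            "SELECT id, created FROM events ORDER BY created, id LIMIT ?",
            (page_size,),
        ).fetchall()
    last_id, last_created = last_row
    # Strictly greater than the last row seen, compared as a (created, id) pair.
    return conn.execute(
        """
        SELECT id, created FROM events
        WHERE (created, id) > (?, ?)
        ORDER BY created, id
        LIMIT ?
        """,
        (last_created, last_id, page_size),
    ).fetchall()

pages = []
last_row = None
while True:
    page = fetch_page(conn, last_row)
    if not page:
        break
    pages.append([row[0] for row in page])
    last_row = page[-1]

print(pages)
```

Each row lands on exactly one page, even though three events share the same created value.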
First, as you said, you should enforce a total ordering. Since the main thing you care about is created, you should start with that. id could be the secondary ordering, a tie breaker invisible to the user that just ensures the ordering is consistent. Secondly, instead of messing around with conditions on created, you could just use an offset clause to return later results:
SELECT * FROM events ORDER BY created ASC, id ASC LIMIT 10 OFFSET <10 * page number>
-- Note that page number is zero based
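A quick sketch of the OFFSET variant, again with Python's sqlite3 module and invented sample data. It also shows one trade-off to be aware of: if a row is inserted between two page requests, the offsets shift and a row can show up twice. The helper name `fetch_page_by_offset` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data.
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "2024-01-01"), ("b", "2024-01-01"),
     ("c", "2024-01-02"), ("d", "2024-01-02")],
)

def fetch_page_by_offset(conn, page_number, page_size=2):
    """Return the ids on a zero-based page, ordered by created then id."""
    rows = conn.execute(
        "SELECT id FROM events ORDER BY created ASC, id ASC LIMIT ? OFFSET ?",
        (page_size, page_size * page_number),
    ).fetchall()
    return [row[0] for row in rows]

first_page = fetch_page_by_offset(conn, 0)

# A row with an earlier timestamp arrives between the two requests.
conn.execute("INSERT INTO events VALUES ('0', '2023-12-31')")

second_page = fetch_page_by_offset(conn, 1)
print(first_page, second_page)
```

The database also has to read and discard the skipped rows, so large offsets get slower. For a table that does not change while you page through it, and for modest page numbers, OFFSET is the simpler option.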

How do I order my query by a field and still group by a subset of that field in db2?

Sorry if the title is confusing. Here is the query I have
Select MONTH(DATE(TIMESTAMP)), SUM(FIELD1), SUM(FIELD2) from TABLE WHERE TIMESTAMP BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59' GROUP BY MONTH(DATE(TIMESTAMP))
This lets me get the month number out of the query. The problem is that right now it sorts the months 1, 2, 3, 4, ... even though the range spans two separate years. I need to be able to sort this query by year, then month.
If I add "ORDER BY TIMESTAMP" at the end of my query I get this error:
Column TIMESTAMP or expression in SELECT list not valid. SQLCODE=-122
Also, I changed the field names for this question to keep it clear; the field isn't actually called TIMESTAMP.
You need to group by year, then month:
SELECT YEAR(YourField),
       MONTH(YourField),
       SUM(Field1),
       SUM(Field2)
FROM Table
WHERE...
GROUP BY
    YEAR(YourField),
    MONTH(YourField)
ORDER BY
    YEAR(YourField),
    MONTH(YourField)
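To see the effect of grouping and ordering by year and then month, here is a sketch using Python's sqlite3 module. SQLite has no YEAR() or MONTH() functions, so strftime stands in for them here; the table name `readings`, the column names, and the data are invented for the example. Only the date range comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample data spanning two years.
conn.execute("CREATE TABLE readings (ts TEXT, field1 INTEGER, field2 INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2009-12-15 08:00:00", 1, 10),
        ("2010-01-10 08:00:00", 2, 20),
        ("2009-11-05 08:00:00", 3, 30),
        ("2010-01-20 08:00:00", 4, 40),
        ("2010-02-01 08:00:00", 5, 50),
    ],
)

# strftime('%Y') and strftime('%m') play the role of DB2's YEAR() and MONTH().
monthly = conn.execute(
    """
    SELECT CAST(strftime('%Y', ts) AS INTEGER) AS yr,
           CAST(strftime('%m', ts) AS INTEGER) AS mon,
           SUM(field1),
           SUM(field2)
    FROM readings
    WHERE ts BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59'
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()

for row in monthly:
    print(row)
```

The months of 2009 now come before the months of 2010 instead of being sorted purely by month number.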

sql date order by problem

I have an image table which has 2 or more rows with the same date. Now I'm trying to do ORDER BY created_date DESC, which works fine and shows the rows in the same position, but when I change the query and try again, it shows them in different positions. I don't have any other ORDER BY field, so I'm a bit confused about why it's doing this and how I can fix it.
Can you please help with this?
To get reproducible results you need to have columns in your order by clause that together are unique. Do you have an ID column? You can use that to tie-break:
ORDER BY created_date DESC, id
I suspect that this is happening because MySQL is not given any ordering information other than ORDER BY created_date DESC, so it does whatever is most convenient for MySQL depending on its complicated inner workings (caching, indexing, etc.). Assuming you have a unique key id, you could do:
SELECT * FROM table t ORDER BY t.created_date DESC, t.id ASC
This gives you the same result every time, because the column after the comma in ORDER BY acts as a secondary ordering rule that is applied whenever the first rule does not decide the order between two rows.
To have consistent results, you will need to add at least one more column to the ORDER BY clause. Since the values in the created_date column are not unique, there is no defined order among rows that share a date. Defining the column as a timestamp rather than a date would make duplicate values less likely, but only a unique column guarantees a fixed order.
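A minimal sketch of the tie-breaker, using Python's sqlite3 module instead of MySQL; the table name `images` and the rows are made up. Three rows share a date, and the unique id decides their relative order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data: ids 1, 3 and 4 share the same created_date.
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, created_date TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [(3, "2024-05-01"), (1, "2024-05-01"), (2, "2024-05-02"), (4, "2024-05-01")],
)

# created_date decides first; id breaks ties, so the order is fully defined.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM images ORDER BY created_date DESC, id ASC"
    )
]
print(ordered_ids)
```

Without the second column, any order of ids 1, 3 and 4 would be a valid answer to the query, which is why the positions can change between runs.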