I have a SELECT statement that inserts data into a #TempTable; it looks like this:
select null as ID, Name, AnotherId, date into #TempTable from Table
The resulting #temptable looks like this:
| Id   | Name    | AnotherId | Datetime            |
----------------------------------------------------
| null | Login   | 10        | 2016-01-01 15:00:00 |
| null | Command | 10        | 2016-01-01 15:00:01 |
| null | Login   | 20        | 2016-01-01 15:01:00 |
| null | Command | 10        | 2016-01-01 15:01:00 |
| null | Logout  | 10        | 2016-01-01 15:01:01 |
| null | Command | 20        | 2016-01-01 15:01:02 |
| null | Logout  | 20        | 2016-01-01 15:02:00 |
I would like to populate the Id column with a unique ID, subject to these conditions:
When there is a Login, assign a new unique Id (e.g. give the 1st Login Id = 1).
Then for the next Login, assign Id = 2, and so on.
For the Commands and the Logout between a Login and a Logout that share the same AnotherId, assign the corresponding Login's Id (e.g. for AnotherId = 10, all rows with AnotherId = 10 should get Id = 1).
How should I proceed? Any help appreciated.
Edit: The results I want:
| Id | Name    | AnotherId | Datetime            |
| 1  | Login   | 10        | 2016-01-01 15:00:00 |
| 1  | Command | 10        | 2016-01-01 15:00:01 |
| 2  | Login   | 20        | 2016-01-01 15:01:00 |
| 1  | Command | 10        | 2016-01-01 15:01:00 |
| 1  | Logout  | 10        | 2016-01-01 15:01:01 |
| 2  | Command | 20        | 2016-01-01 15:01:02 |
| 2  | Logout  | 20        | 2016-01-01 15:02:00 |
If I understand correctly, you want logins to have incremental ids, with all rows in-between having the same id.
Another way of expressing this is that the id is the number of logins on or before a given row.
In SQL Server 2012+, you can do this using ANSI standard cumulative sum functionality:
select sum(case when name = 'login' then 1 else 0 end) over
(partition by anotherId order by datetime) as ID,
Name, AnotherId, date
into #TempTable
from Table;
In earlier versions of SQL Server you can do this using outer apply.
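For instance, a minimal outer apply sketch of the same idea (the id is the count of logins on or before each row), assuming the same table and column names as above, with Table bracketed because it is a reserved word:
-- for each row, count the logins that happened on or before it
select l.login_count as ID, t.Name, t.AnotherId, t.date
into #TempTable
from [Table] t
outer apply (select count(*) as login_count
             from [Table] t2
             where t2.date <= t.date and t2.Name = 'login'
            ) l;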
EDIT:
The above (although useful) was not a complete understanding of the question. Instead:
select (case when name = 'login' then ID
else max(ID) over (partition by AnotherId order by DateTime)
end) as Id,
Name, AnotherId, date
into #TempTable
from (select sum(case when name = 'login' then 1 else 0 end) over
(order by datetime) as ID,
Name, AnotherId, date
from Table
) t;
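A quick way to experiment with these queries on the sample rows is a table variable standing in for Table:
-- sample data from the question
declare @t table (Name varchar(10), AnotherId int, date datetime);
insert into @t values
('Login',   10, '2016-01-01T15:00:00'),
('Command', 10, '2016-01-01T15:00:01'),
('Login',   20, '2016-01-01T15:01:00'),
('Command', 10, '2016-01-01T15:01:00'),
('Logout',  10, '2016-01-01T15:01:01'),
('Command', 20, '2016-01-01T15:01:02'),
('Logout',  20, '2016-01-01T15:02:00');
Substituting @t for Table (and dropping the into clause) makes it easy to check how the ids come out, including how ties on datetime are broken.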
I am trying to find the daily count of frequent visitors in a very large dataset. Frequent visitors in this case are visitor IDs seen on 2 distinct days in a rolling 3-day period.
My data set looks like the below:
ID | Date       | Location | State | Brand |
1  | 2020-01-02 | A        | CA    | XYZ   |
1  | 2020-01-03 | A        | CA    | BCA   |
1  | 2020-01-04 | A        | CA    | XYZ   |
1  | 2020-01-06 | A        | CA    | YQR   |
1  | 2020-01-06 | A        | WA    | XYZ   |
2  | 2020-01-02 | A        | CA    | XYZ   |
2  | 2020-01-05 | A        | CA    | XYZ   |
This is the result I am going for. The count in the Visits column is the number of distinct days from the Date column, per ID, within the current day and the 2 preceding days. So for ID 1 on 2020-01-05, there was a visit on the 3rd and 4th, so the count is 2.
Date       | ID   | Visits | Frequent Prior 3 Days
2020-01-01 | null | null   | null
2020-01-02 | 1    | 1      | No
2020-01-02 | 2    | 1      | No
2020-01-03 | 1    | 2      | Yes
2020-01-03 | 2    | 1      | No
2020-01-04 | 1    | 3      | Yes
2020-01-04 | 2    | 1      | No
2020-01-05 | 1    | 2      | Yes
2020-01-05 | 2    | 1      | No
2020-01-06 | 1    | 2      | Yes
2020-01-06 | 2    | 1      | No
2020-01-07 | 1    | 1      | No
2020-01-07 | 2    | 1      | No
2020-01-08 | 1    | 1      | No
2020-01-09 | 1    | null   | null
I originally tried to use the following expression to get the result for the Visits column, but I end up with 3 in every successive row once the count first reaches 3 for that ID:
count(ID) over (partition by ID order by Date ASC rows between 3 preceding and current row) as visits
I've scoured the forum, but every somewhat similar question seems to involve counting the values rather than the dates, and I haven't been able to figure out how to tweak it to get what I need. Any help is much appreciated.
You can aggregate the dataset by user and date, then use window functions with a range frame to look back over the rolling 3-day window.
You did not tell us which database you are running, and not all databases support window range frames or use the same syntax for interval literals. In standard SQL, you would go:
select
    id,
    date,
    count(*) cnt_visits,
    case
        when count(*) over(
            partition by id
            order by date
            range between interval '2' day preceding and current row
        ) >= 2
        then 'Yes'
        else 'No'
    end is_frequent_visitor
from mytable
group by id, date
After the group by there is one row per (id, date), so the windowed count(*) is the number of distinct visit days in the 3-day window (the current day plus the interval '2' day looking back).
On the other hand, if you want a record for every user and every day (even when there is no visit), then it is a bit different. You can generate the full user/day grid first, then bring in the table with a left join:
select
    i.id,
    d.date,
    count(t.id) cnt_visits,
    case
        when sum(count(t.id)) over(
            partition by i.id
            order by d.date
            rows between 2 preceding and current row
        ) >= 2
        then 'Yes'
        else 'No'
    end is_frequent_visitor
from (select distinct id from mytable) i
cross join (select distinct date from mytable) d
left join (select distinct id, date from mytable) t
    on t.date = d.date
    and t.id = i.id
group by i.id, d.date
Because the grid guarantees one row per day, a rows frame of 2 preceding works here; the select distinct in the left join makes two visits on the same day count as a single day.
I would be inclined to approach this by expanding out the days and visitors using a cross join and then using window functions. Assuming you have all dates in the data, so two preceding rows always correspond to the two preceding days:
select i.id, d.date,
count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) as cnt_visits,
(case when count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) >= 2
then 'Yes' else 'No'
end) as is_frequent_visitor
from (select distinct id from t) i cross join
(select distinct date from t) d left join
(select distinct id, date from t) t
on t.date = d.date and
t.id = i.id;
Scenario:
I have a table, events_table, that consists of records that are inserted by a webhook based on messages I send to my users:
"column_name" (type)
- "time_stamp" (timestamp with time zone)
- "username" (varchar)
- "delivered" (int)
- "action" (int)
Sample Data:
| time_stamp | username | delivered | action |
|:----------------|:---------|:----------|:-------|
|1349733421.460000| user1 | 1 | null |
|1549345346.460000| user3 | 1 | 1 |
|1524544421.460000| user1 | 1 | 1 |
|1345444421.570000| user7 | 1 | null |
|1756756761.980000| user9 | 1 | null |
|1234343421.460000| user171 | 1 | 1 |
|1843455621.460000| user5 | 1 | 1 |
| ... | ... | ... | ... |
The "delivered" column is null by default and 1 when delivered. The "action" column is null by default and is 1 when opened.
Problem:
Using PostgreSQL, how can I count the number of distinct individuals ("username") who opened an email in the 30 days leading up to the Monday of each week?
Ideal query results:
| date | count |
|:----------------|:----------|
| 02/24/2020 | 1,234,123 |
| 02/17/2020 | 234,123 |
| 02/10/2020 | 1,234,123 |
| 02/03/2020 |12,341,213 |
| ... | ... |
My attempt:
This is the extent of what I've tried; it gives me the count for the previous week:
SELECT
date_trunc('week', to_timestamp("time_stamp")) as date,
count("username") as count,
lag(count(1), 1) over (order by "date") as "count_previous_week"
FROM events_table
WHERE "delivered" = 1
and "action" = 1
GROUP BY 1 order by 1 desc
This is my attempt at writing this query.
First I get the lowest and highest dates from the data set, adding 7 days to the highest date to make sure I include data up to today.
I then run generate_series against these two values with an interval of 7 days, which gives me every single Monday between the two points (we can't rely on just the Mondays within your data set, in case there is an empty week).
Then I simply subquery and aggregate the data based on the generate_series output.
select
    __weeks.week_begins,
    (
        select count(distinct "username")
        from events_table
        where to_timestamp("time_stamp")::date
                  between week_begins - '30 days'::interval and week_begins
          and "delivered" = 1
          and "action" = 1
    ) as "count"
from
(
    select generate_series(_.min_date, _.max_date, '7 days'::interval)::date as week_begins
    from
    (
        select
            min(date_trunc('week', to_timestamp("time_stamp"))::date) as min_date,
            max(date_trunc('week', to_timestamp("time_stamp"))::date) + 7 as max_date
        from events_table
        where "delivered" = 1
          and "action" = 1
    ) as _
) as __weeks
order by
    __weeks.week_begins
I'm not particularly keen on this query because the query planner visits the same table twice, but I can't think of another way to structure it.
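One way to avoid the second scan is to filter the events into a CTE once and reuse it both for the series bounds and for the per-week counts. A rough sketch of that idea (before PostgreSQL 12 a CTE is always materialized; on 12+ you may want WITH ... AS MATERIALIZED to keep the single scan):
with opened as (
    -- scan events_table once, keeping only delivered-and-opened rows
    select to_timestamp("time_stamp")::date as visit_date, "username"
    from events_table
    where "delivered" = 1
      and "action" = 1
),
weeks as (
    -- every Monday from the first week to one week past the last
    select generate_series(
               min(date_trunc('week', visit_date))::date,
               max(date_trunc('week', visit_date))::date + 7,
               '7 days'::interval
           )::date as week_begins
    from opened
)
select
    w.week_begins,
    count(distinct o."username") as "count"
from weeks w
left join opened o
    on o.visit_date between w.week_begins - 30 and w.week_begins
group by w.week_begins
order by w.week_begins;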
On SQL Server 2008R2, I am using this script:
SELECT a.id,
       a.ea1,
       a.ea2
FROM database1table1 AS a
WHERE a.id LIKE N'Active';
The result set looks like this:
+-----+-----+---------------+---------------+
| Row | ID | EA1 | EA2 |
+-----+-----+---------------+---------------+
| 1 | 1 | wf#email.co | NULL |
| 2 | 1 | NULL | wf2#email.co |
| 3 | 1 | NULL | NULL |
| 4 | 2 | NULL | NULL |
| 5 | 3 | wf3#email.co | NULL |
+-----+-----+---------------+---------------+
etc.
ID = business number.
EA = email address.
In the above output, there are three rows where ID=1, but only two of those have email addresses.
I want my result to output only the IDs where no row has an email address. So for this example, the output should only include the rows where ID=2.
I have tried adding this WHERE clause:
AND (a.EA1 IS NULL) AND (a.EA2 IS NULL);
It's still returning rows where ID=1, because one of the rows there has no email address.
Can anyone please suggest an amendment to my script which would only return the row where ID=2?
Many thanks
Try with NOT EXISTS:
SELECT
*
FROM
Tbl T
WHERE
T.EA1 IS NULL AND
T.EA2 IS NULL AND
NOT EXISTS
(
SELECT 1 FROM Tbl IT
WHERE
IT.ID = T.ID AND
(
IT.EA1 IS NOT NULL OR
IT.EA2 IS NOT NULL
)
)
Alternatively, collapse each ID to a single row with MAX() (which ignores NULLs) and keep only the IDs where every email column stays NULL:
;WITH CTE
AS
(
    SELECT ID, MAX(ROW) AS RW, MAX(EA1) AS EA1, MAX(EA2) AS EA2
    FROM #TEMP
    GROUP BY ID
)
SELECT * FROM CTE WHERE EA1 IS NULL AND EA2 IS NULL
Output:
ID RW EA1 EA2
2 4 NULL NULL
So, I have the following table:
time     | name   | ID   |
12:00:00 | access | 1    |
12:05:00 | select | null |
12:10:00 | update | null |
12:15:00 | insert | null |
12:20:00 | out    | null |
12:30:00 | access | 2    |
12:35:00 | select | null |
The table is bigger (approx. 1-1.5 million rows); there will be IDs equal to 2, 3, 4, etc., with rows in between.
The following should be the result:
time     | name   | ID |
12:00:00 | access | 1  |
12:05:00 | select | 1  |
12:10:00 | update | 1  |
12:15:00 | insert | 1  |
12:20:00 | out    | 1  |
12:30:00 | access | 2  |
12:35:00 | select | 2  |
What is the simplest method to update the rows without filling up the transaction log? Something like one ID at a time.
You can do it with a subquery:
UPDATE t
SET t.ID = (SELECT TOP 1 s.ID
            FROM YourTable s
            WHERE s.time < t.time AND s.name = 'access'
            ORDER BY s.time DESC)
FROM YourTable t
WHERE t.name <> 'access'
An index on (name, time) that includes ID will help the subquery.
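For example (the index name is just illustrative):
CREATE NONCLUSTERED INDEX IX_YourTable_name_time
    ON YourTable (name, time)
    INCLUDE (ID);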
You can do it using a CTE as below:
;WITH myCTE
AS ( SELECT time
, name
, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY time ) AS [rank]
, ID
FROM YourTable
)
UPDATE myCTE
SET myCTE.ID = myCTE.rank
SELECT *
FROM YourTable ORDER BY ID
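If the concern is filling the transaction log, the first update can also be run in batches so each transaction stays small. A rough sketch assuming SQL Server, with an illustrative batch size and using NULL to find rows that still need an ID:
WHILE 1 = 1
BEGIN
    -- update at most 50000 rows per statement, so each
    -- implicit transaction (and its log usage) stays bounded
    UPDATE TOP (50000) t
    SET t.ID = (SELECT TOP 1 s.ID
                FROM YourTable s
                WHERE s.time < t.time AND s.name = 'access'
                ORDER BY s.time DESC)
    FROM YourTable t
    WHERE t.name <> 'access'
      AND t.ID IS NULL;  -- only rows not updated yet

    IF @@ROWCOUNT = 0 BREAK;  -- stop when nothing is left
END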
I have a table like this
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
1 | 1 | 2009-01-01 | 100 |
2 | 1 | 2009-01-02 | 20 |
3 | 1 | 2009-01-03 | 50 |
4 | 2 | 2009-01-01 | 80 |
5 | 2 | 2009-01-04 | 30 |
For each contract I need to fetch the latest event, with the amount associated with that event, and get something like this:
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
3 | 1 | 2009-01-03 | 50 |
5 | 2 | 2009-01-04 | 30 |
I can't figure out how to group the data correctly. Any ideas?
Thanks in advance.
SQL 2k5/2k8:
with cte_ranked as (
    select *
         , row_number() over (
               partition by ContractId order by EventDate desc) as [rank]
    from [table])
select *
from cte_ranked
where [rank] = 1;
SQL 2k:
select t.*
from [table] as t
join (
    select max(EventDate) as MaxDate
         , ContractId
    from [table]
    group by ContractId) as mt
    on t.ContractId = mt.ContractId
    and t.EventDate = mt.MaxDate