limit results based upon 1 column - tsql

I'm a SQL newbie, so any help is greatly appreciated! The id is the ID of a person who is in a program, and sti is the ID of a step they are currently on or have completed. What I'm having a hard time figuring out is how to limit the results so that each person ID shows only once. I can't get DISTINCT to work at all.
Here is the query:
SELECT
s.[PersonAliasId] AS [id]
, s.[StepTypeId] AS [sti]
FROM
[Step] s
ORDER BY s.[PersonAliasId]
The results from the above query are:
id sti
11126 19
47331 19
66693 7
68110 19
74838 7
89867 1
89867 2
110105 19
122059 19
122059 21
130273 7
139876 19
150180 19
161929 7
165926 19
169329 19
171922 19
There are multiple steps that we are tracking for each person. When they have completed one step and then moved to another, they show up in this query twice. For example, person 122059 has completed step ID 19 and is currently on step ID 21. I don't really care about the multiple step numbers showing; I only need each person ID to be returned once. Can anyone help me figure out what I'm doing wrong?

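From my understanding of your question, you only care about getting each id once. DISTINCT applies to the whole row, so a person with two different steps still appears twice. You could instead group by the id and use the MAX() function to get the highest step for each id. GROUP BY already returns one row per id, so DISTINCT is not needed:
SELECT
s.PersonAliasId AS id,
MAX(s.StepTypeId) AS sti
FROM Step s
GROUP BY s.PersonAliasId
ORDER BY s.PersonAliasId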
Let me know if I misunderstood anything
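To see the difference between DISTINCT and GROUP BY on a small scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The rows are a subset of the output in the question; the in-memory table is only an assumption for illustration, and MAX(StepTypeId) is the highest step ID, which is not necessarily the most recent step.

```python
import sqlite3

# In-memory stand-in for the Step table (a subset of the rows in the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Step (PersonAliasId INTEGER, StepTypeId INTEGER)")
conn.executemany(
    "INSERT INTO Step VALUES (?, ?)",
    [(11126, 19), (89867, 1), (89867, 2), (122059, 19), (122059, 21), (130273, 7)],
)

# DISTINCT de-duplicates whole rows, so people with two steps still appear twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT PersonAliasId, StepTypeId FROM Step "
    "ORDER BY PersonAliasId, StepTypeId"
).fetchall()

# GROUP BY collapses each person to one row; MAX picks a single step to report.
grouped_rows = conn.execute(
    "SELECT PersonAliasId AS id, MAX(StepTypeId) AS sti "
    "FROM Step GROUP BY PersonAliasId ORDER BY PersonAliasId"
).fetchall()

print(len(distinct_rows), "rows with DISTINCT")
for row in grouped_rows:
    print(row)
```

If only the IDs are needed, SELECT DISTINCT PersonAliasId on its own (without the step column) would also return each person once.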


PostgreSQL - Is it possible to write a PostgreSQL query that limits the number of results it returns based on specific criteria?

I know the wording of my title is vague so I will try to clarify my question.
tasks_table
user_id completed_date task_type task_id
1 11/14/2021 A 34
1 11/13/2021 B 35
1 11/11/2021 A 36
1 11/09/2021 B 37
2 11/12/2021 A 38
2 11/02/2021 A 39
2 11/14/2021 B 40
2 10/14/2021 B 41
The table I am working with has more fields than this, but, these are the ones that are pertinent to the question. The task type can be either A or B.
What I am currently trying to do is get a result set that contains at most two tasks per user_id, one of task type A and one of task type B, that have been completed in the past 7 days. For example, the query should generate the following result set:
user_id completed_date task_type task_id
1 11/14/2021 A 34
1 11/13/2021 B 35
2 11/12/2021 A 38
2 11/14/2021 B 40
A user may have done tasks of only one type in that time period, but it is guaranteed that any given user will have done at least one task within that time. My question: is it possible to write a query that returns such a result, or would I have to query for a more general result and then trim the set down through logic in my JPA code?
To select the most recent task, if one exists, for each user_id and task_type within the last 7 days, you can try this:
SELECT DISTINCT ON (t.user_id, t.task_type) t.*
FROM tasks_table AS t
WHERE t.completed_date >= current_date - interval '7 days'
ORDER BY t.user_id, t.task_type, t.completed_date DESC
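DISTINCT ON is specific to PostgreSQL. As a cross-check of the logic, here is a rough sketch of the same "latest row per (user_id, task_type)" idea written with ROW_NUMBER(), run against Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). This is a swapped-in technique, not the query above: the dates are stored as ISO strings, and the reference date 2021-11-14 is passed in as a parameter instead of current_date so that the sample data from the question falls inside the 7-day window. Both choices are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks_table "
    "(user_id INTEGER, completed_date TEXT, task_type TEXT, task_id INTEGER)"
)
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO tasks_table VALUES (?, ?, ?, ?)",
    [
        (1, "2021-11-14", "A", 34), (1, "2021-11-13", "B", 35),
        (1, "2021-11-11", "A", 36), (1, "2021-11-09", "B", 37),
        (2, "2021-11-12", "A", 38), (2, "2021-11-02", "A", 39),
        (2, "2021-11-14", "B", 40), (2, "2021-10-14", "B", 41),
    ],
)

# Number the rows within each (user_id, task_type) group, newest first,
# and keep only the first row of each group.
query = """
SELECT user_id, completed_date, task_type, task_id
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, task_type
               ORDER BY completed_date DESC
           ) AS rn
    FROM tasks_table AS t
    WHERE completed_date >= date(?, '-7 days')
) AS ranked
WHERE rn = 1
ORDER BY user_id, task_type
"""
latest = conn.execute(query, ("2021-11-14",)).fetchall()
for row in latest:
    print(row)
```

The same ROW_NUMBER() query shape should also work in PostgreSQL, where DISTINCT ON is usually the shorter way to write it.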

bulk import 80 lines of data via API

I have a tool that every x hours creates a set of y lines, which I would simply like to add to a column in a specific Smartsheet sheet. Then, every x hours, I would like to overwrite these values with the new ones. The number of lines can differ from run to run.
As I read the API, in order to add or update anything I need to get all the row and column IDs of the sheet in question.
Isn't there an easy way to build a JSON document with a set of data and a column name, and have it add rows automatically as needed?
Data example is:
21
23
43
23
12
23
43
23
12
34
54
23
and then it could be:
23
23
55
4
322
12
3
455
3
AUTO
I find it hard to believe that I need to read so much information into a script just to add rows of data. Nothing fancy.
I'd like to stick to plain cURL or Python.
Thanks
If you want to add this data as new rows, this is fairly simple. It's only if you would like to replace existing data in existing rows where you would need to specify the row id.
The Python SDK allows you to specify just a single column id, like so:
# A new row needs only the target column id, not a row id.
row_a = smartsheet.models.Row()
row_a.cells.append({
'column_id': 642523719853956,
'value': 'New Status',
'strict': False
})
For more details, please see the API documentation regarding adding rows.
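For the cURL route, a minimal sketch of building the request body for a whole list of values might look like the following. The helper name build_rows is hypothetical, the column id is the placeholder from the snippet above, and the field names (toBottom, cells, columnId, value, strict) reflect my understanding of the row and cell objects in the Smartsheet REST API, so please check them against the API documentation before relying on this.

```python
import json

def build_rows(column_id, values):
    """Return one row object per value, each with a single cell in column_id.

    Hypothetical helper: the field names are assumed to match the
    Smartsheet REST API row/cell objects.
    """
    return [
        {
            "toBottom": True,  # append each new row at the bottom of the sheet
            "cells": [{"columnId": column_id, "value": value, "strict": False}],
        }
        for value in values
    ]

COLUMN_ID = 642523719853956  # placeholder column id from the answer above
payload = build_rows(COLUMN_ID, [21, 23, 43, 23, 12])
body = json.dumps(payload)
print(body)
```

The resulting JSON array would be the body of a single add-rows request, so the 80 lines can go in one call. Overwriting existing rows is a different operation and still needs the row ids, as noted above.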

reshape and merge in stata

I have three data sets:
The first is called education.dta. It contains individuals (students) with their achieved education for each year from 1990 to 2000. It is originally in wide format, but I can easily reshape it to long. It is shown in wide format below:
id educ_90 educ_91 ... educ_00 cohort
1 0 1 1 87
2 1 1 2 75
3 0 0 2 90
The second is called graduate.dta. It contains information on when individuals (students) finished high school. However, this data set does not cover several years; it is only a "snapshot" of each individual at the time they finish high school, plus characteristics of the student such as background (for example, parents' occupation).
id schoolid county cohort ...
1 11 123 87
2 11 123 75
3 22 243 90
The third data set is called teachers.dta. It contains information about all teachers at the high schools, such as their education, whether they work full or part time, gender, and so on. This data set is long.
id schoolid county year education
22 11 123 2011 1
21 11 123 2001 1
23 22 243 2015 3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education.dta and graduation.dta. Then I run a loop so that all the variables in graduation.dta take the same value over all years, for example:
forv j = 1990/2000 {
gen county`j' = .
replace county`j' = county
}
However, when I reshape to long afterwards, Stata reports that variable id does not uniquely identify the observations.
Further, I have tried to first reshape education.dta to long and then merge either 1:m or m:1 with education.dta as master, using graduation.dta.
However, Stata again reports that id is not unique. How do I deal with this?
In next step I want to merge the above with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain that I have exactly the format of your data. It would be helpful if you gave us a toy dataset to look at using dataex (and it could even help you figure out the problem yourself!).
But to start: because you are seeing that id is not unique, you need to figure out why an id might appear more than once in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful for exploring the data in this way.
Because you want your dataset in long format, I suggest reshaping education.dta to long first, then doing something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once), and finally something like merge 1:1 schoolid year using "teachers.dta" to get your final dataset. Note that merge 1:1 only works if schoolid and year uniquely identify observations in both datasets; if several students or teachers share a school and year, you will need m:1, 1:m, or joinby instead.
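To make the uniqueness requirement concrete, here is a small language-neutral sketch in plain Python (not Stata) of the reshape-then-merge order suggested above. The records are toy values based on the question, and the helper duplicate_ids is hypothetical; it plays roughly the role of a duplicates check on id.

```python
from collections import Counter

# Toy stand-ins for education.dta (wide) and graduate.dta.
education_wide = [
    {"id": 1, "educ_90": 0, "educ_91": 1, "cohort": 87},
    {"id": 2, "educ_90": 1, "educ_91": 1, "cohort": 75},
]
graduate = [
    {"id": 1, "schoolid": 11, "county": 123},
    {"id": 2, "schoolid": 11, "county": 123},
]

def duplicate_ids(rows, key="id"):
    """Return the sorted key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Reshape wide to long: one record per (id, year suffix).
education_long = [
    {"id": r["id"], "year": name[len("educ_"):], "educ": value, "cohort": r["cohort"]}
    for r in education_wide
    for name, value in r.items()
    if name.startswith("educ_")
]

# After the reshape, id alone is expected to repeat in the long data...
print("repeated ids in long data:", duplicate_ids(education_long))

# ...so the merge is many-to-one, which needs id to be unique on the graduate side.
if duplicate_ids(graduate):
    raise ValueError("id is not unique in graduate; an m:1 merge would fail")
lookup = {r["id"]: r for r in graduate}
merged = [{**row, **lookup[row["id"]]} for row in education_long if row["id"] in lookup]
print(len(merged), "merged records")
```

If the check on the graduate side fails in the real data, that is the duplication to investigate before merging.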

How to get the total working hours of a particular project (a foreign key that may be duplicated) using a SQL query

I'm working on an iPhone application that uses SQLite as the backend.
In my application I have two tables in my DB: ProjectsTable and Project_Hours_Table.
The first table holds the list of projects with their unique_id.
The second table holds project IDs (as a foreign key) and the respective working hours.
My requirement is to get each project ID together with the sum of that project's working hours,
as follows:
Project_ID Hours
1 1
2 0
3 5 (1+4)
4 11 (5+6)
5 2
6 13 (7+6)
7 3
Can anyone please suggest how to write this query?
Thanks in advance.
select Project_Id,
sum(Hours) as Hours
from Proj_HoursTable
group by Project_Id
Since both of the required fields are available in Proj_HoursTable, there is no need to join to ProjectsTable. Adding the project name to the output would require a join to ProjectsTable (this assumes project id and project name are 1 to 1):
select pjt.ProjName,
pht.Project_Id,
sum(pht.Hours) as Hours
from Proj_HoursTable pht join ProjectsTable pjt
on pjt.Proj_Id = pht.Project_Id
group by pht.Project_Id, pjt.ProjName
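As a quick check of the grouping query, here is a minimal sketch using Python's built-in sqlite3 module with the hours from the question. The table and column names follow this answer, and the project names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProjectsTable (Proj_Id INTEGER PRIMARY KEY, ProjName TEXT)")
conn.execute(
    "CREATE TABLE Proj_HoursTable "
    "(Project_Id INTEGER REFERENCES ProjectsTable(Proj_Id), Hours INTEGER)"
)
# Hypothetical project names; the hours rows mirror the question's example.
conn.executemany(
    "INSERT INTO ProjectsTable VALUES (?, ?)",
    [(i, f"Project {i}") for i in range(1, 8)],
)
conn.executemany(
    "INSERT INTO Proj_HoursTable VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 1), (3, 4), (4, 5), (4, 6), (5, 2), (6, 7), (6, 6), (7, 3)],
)

# One row per project with its total hours.
totals = conn.execute(
    "SELECT Project_Id, SUM(Hours) AS Hours "
    "FROM Proj_HoursTable GROUP BY Project_Id ORDER BY Project_Id"
).fetchall()

# Total for a single project, with the id bound as a parameter.
(hours_for_4,) = conn.execute(
    "SELECT SUM(Hours) FROM Proj_HoursTable WHERE Project_Id = ?", (4,)
).fetchone()

print(totals)
print(hours_for_4)
```

Letting SQLite do the SUM avoids fetching every row into the application and adding the values up by hand.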
First, get the Proj_Id of the project whose total hours you want to show and store it in a variable.
Then run this query, binding that value to the placeholder:
select * from Project_Hours_Table where Proj_Id = ?
This query returns all rows for the specific Proj_Id that you need.
Store each row in an NSMutableDictionary, then read the value of the hours field using the dictionary key @"hour".
The value comes back as an NSString, so convert it to an integer with
int hourCount = [StringHourValue intValue];
and add the values up over all rows to get the total number of hours.

Calculating change in leaders for baseball stats in MSSQL

Imagine I have an MSSQL 2005 table (bbstats) that updates weekly, showing various cumulative categories of baseball accomplishments for a team:
week 1
Player H SO HR
Sammy 7 11 2
Ted 14 3 0
Arthur 2 15 0
Zach 9 14 3
week 2
Player H SO HR
Sammy 12 16 4
Ted 21 7 1
Arthur 3 18 0
Zach 12 18 3
I wish to highlight textually where there has been a change in leader for each category.
So after week 2 there would be nothing to report on hits (H); Zach has joined Arthur with the most strikeouts (SO) at 18; and Sammy is the new leader in home runs (HR) with 4.
So I would want to set up a process something like this:
a) save the past data (week 1) as table bbstatsPrior,
b) update bbstats with the new results (I do not need assistance with this),
c) compare the two tables for the player (or players, with ties) with the max value for each column and output only where they differ,
d) move on to the next column and repeat.
In any real-world example there would be significantly more columns to calculate for.
Thanks
Responding to Brent's comments: I am really after any changes in the leaders for each category.
So I would have something like
select top 1 with ties player
from bbstatsPrior
order by H desc
and
select top 1 with ties player,H
from bbstats
order by H desc
I then want to compare the player from each query (do I need to use temp tables?). If they differ, I want to output the result of the second select statement. For the H category Ted is the leader in both tables, but for other categories there are changes between the weeks.
I can then loop through the columns using
select name from sys.all_columns sc
where sc.object_id=object_id('bbstats') and name <>'player'
If the number of stats doesn't change often, you could easily just write a single query to get this data. Join bbStats to bbStatsPrior where bbstatsprior.week < bbstats.week and bbstats.week = @weekNumber. Then do a simple comparison between bbstats.Hits and bbStatsPrior.Hits to get your difference.
If the stats change often, you could use dynamic SQL to do this for all columns that match a certain pattern, or that are in a list of columns taken from sys.columns for that table.
You could also add a flag column for each stat column to designate the leader, using a correlated subquery to find the max value for that column and checking whether it is equal to the current record's value.
This might get you started, but I'd recommend posting what you currently have, and the community can help you from there.
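As a rough illustration of the per-column comparison described in the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server, loaded with the week 1 and week 2 numbers from the question. The helper leaders is hypothetical, and the column names are interpolated from a fixed list in the script, not from user input. A "WHERE column = (SELECT MAX(column) ...)" subquery returns the same tied leaders as TOP 1 WITH TIES and should carry over to T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
data = {
    "bbstatsPrior": [("Sammy", 7, 11, 2), ("Ted", 14, 3, 0),
                     ("Arthur", 2, 15, 0), ("Zach", 9, 14, 3)],
    "bbstats": [("Sammy", 12, 16, 4), ("Ted", 21, 7, 1),
                ("Arthur", 3, 18, 0), ("Zach", 12, 18, 3)],
}
for table, rows in data.items():
    conn.execute(f"CREATE TABLE {table} (player TEXT, H INTEGER, SO INTEGER, HR INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)

def leaders(table, column):
    """Return (sorted leader names, leading value) for a column, ties included."""
    rows = conn.execute(
        f"SELECT player, {column} FROM {table} "
        f"WHERE {column} = (SELECT MAX({column}) FROM {table}) ORDER BY player"
    ).fetchall()
    return [name for name, _ in rows], rows[0][1]

# Keep only the categories whose set of leaders differs between the two weeks.
changes = {}
for column in ("H", "SO", "HR"):
    before, _ = leaders("bbstatsPrior", column)
    after, value = leaders("bbstats", column)
    if before != after:
        changes[column] = (after, value)

for column, (names, value) in changes.items():
    print(column, names, value)
```

With more stat columns, the tuple of column names could be built from the table's metadata, which is the role sys.all_columns plays in the question.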