I need to retrieve last of these results? (T-SQL) - tsql

This query:
SELECT refPatient_id,actDate,refReason_id,refClinic_id,active
FROM PatientClinicHistory
WHERE refClinic_id = 24
GROUP BY refPatient_id,actDate,refReason_id,refClinic_id,active
ORDER BY refPatient_id,actDate
returns this result:
refPatient_id actDate refReason_id refClinic_id active
============= ==================== ============ ============ ======
15704 2009-02-09 12:48:00 19 24 0
15704 2009-02-10 10:25:00 23 24 1
15704 2009-02-10 10:26:00 19 24 0
15704 2009-02-12 10:16:00 23 24 1
15704 2009-02-13 15:41:00 19 24 0
15704 2009-04-14 17:48:00 19 24 0
15704 2009-06-24 16:06:00 19 24 0
15731 2009-05-20 12:19:00 19 24 0
16108 2009-07-20 11:08:00 19 24 0
16139 2009-03-02 13:55:00 19 24 0
16569 2009-07-13 15:57:00 20 24 0
17022 2009-06-02 16:02:00 19 24 0
17022 2009-08-19 15:08:00 19 24 0
17022 2009-09-01 15:47:00 21 24 0
17049 2009-02-02 16:49:00 19 24 0
17049 2009-02-04 15:16:00 19 24 0
17063 2009-07-22 11:35:00 21 24 0
17063 2009-07-28 10:14:00 22 24 1
17502 2008-12-15 17:25:00 19 24 0
I need to get every patient's last passive action row (active = 0) (So I need to obtain the maximum actDate for each patient).
Should I write a new query after I get all these results in order to filter it?
Edited:
Thank you for your responses. Actually, I need to get the last action for each patient.
e.g:
17022 2009-06-02 16:02:00 19 24 0
17022 2009-08-19 15:08:00 19 24 0
17022 2009-09-01 15:47:00 21 24 0
I need to keep only the last row (max actDate for each patient):
17022 2009-09-01 15:47:00 21 24 0

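You could try taking actDate out of the GROUP BY and using MAX(actDate) instead. Note that refReason_id stays in the GROUP BY, so this returns one row per patient and reason rather than strictly one per patient: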
SELECT refPatient_id,Max(actDate),refReason_id,refClinic_id,active
FROM PatientClinicHistory
WHERE refClinic_id = 24
AND active = 0
GROUP BY refPatient_id,refReason_id,refClinic_id,active
ORDER BY refPatient_id

You could use a CTE:
;WITH PatientClinicHistoryNew AS
(
SELECT refPatient_id,actDate,refReason_id,refClinic_id,active
FROM PatientClinicHistory
WHERE refClinic_id = 24
GROUP BY refPatient_id,actDate,refReason_id,refClinic_id,active
-- no ORDER BY here: SQL Server rejects ORDER BY inside a CTE unless TOP, OFFSET or FOR XML is used
)
SELECT refPatient_id, MAX(actDate) AS lastActDate
FROM PatientClinicHistoryNew
WHERE active = 0
GROUP BY refPatient_id
ORDER BY refPatient_id

SELECT refPatient_id,MAX(actDate)
FROM PatientClinicHistory
WHERE refClinic_id = 24
GROUP BY refPatient_id
will calculate the maximum actDate for each patient. Is that what you want?
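If the whole last row is needed rather than just the date, one common pattern is to join the table back to the per-patient MAX(actDate). Below is a minimal sketch of that idea, run against an in-memory SQLite database from Python only so that it is self-contained. The sample rows are copied from the question; the aliases h, m and lastDate are made up for illustration. The SELECT uses only standard SQL and should carry over to T-SQL, but it has not been run on SQL Server here.

```python
import sqlite3

# In-memory stand-in for the real table (assumption: same column names as the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PatientClinicHistory ("
    "refPatient_id INTEGER, actDate TEXT, refReason_id INTEGER, "
    "refClinic_id INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO PatientClinicHistory VALUES (?, ?, ?, ?, ?)",
    [
        (17022, "2009-06-02 16:02:00", 19, 24, 0),
        (17022, "2009-08-19 15:08:00", 19, 24, 0),
        (17022, "2009-09-01 15:47:00", 21, 24, 0),
        (17063, "2009-07-22 11:35:00", 21, 24, 0),
        (17063, "2009-07-28 10:14:00", 22, 24, 1),
    ],
)

# Join each row to its patient's latest actDate and keep only the matching row.
LAST_ROW_PER_PATIENT = """
SELECT h.refPatient_id, h.actDate, h.refReason_id, h.refClinic_id, h.active
FROM PatientClinicHistory AS h
JOIN (
    SELECT refPatient_id, MAX(actDate) AS lastDate
    FROM PatientClinicHistory
    WHERE refClinic_id = 24
    GROUP BY refPatient_id
) AS m
  ON m.refPatient_id = h.refPatient_id AND m.lastDate = h.actDate
WHERE h.refClinic_id = 24
ORDER BY h.refPatient_id
"""
last_rows = conn.execute(LAST_ROW_PER_PATIENT).fetchall()
print(last_rows)
```

If two rows of one patient share the same latest actDate, this pattern returns both of them. To restrict the result to passive actions only, the condition active = 0 would have to go into both the inner and the outer WHERE clause.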

How to average data per week?

I hope someone could help me. I am starting to use R.
First of all, I would like to know whether R can determine the week of the year from the date my data was collected. I did this manually, but it takes a long time and increases the chance of making a mistake.
I am also interested in getting the average of each week. For example, I have 2 data points in week 21.
An example of my data:
Week  Date  Class 1 g/plant  Total g/plant  10 berry weight  Brix
21 26/05/2022 34.53571429 34.53571429 25.7 11.55
21 28/05/2022 35.39285714 39.25 27.1 10.98
22 31/05/2022 41.17857143 41.17857143 22.8 11.8
22 03/06/2022 57.60714286 57.60714286 22.2 10.91
23 06/06/2022 23.67857143 23.67857143 26.4 12.3
23 09/06/2022 23.60714286 24.14285714 24.7 12.63
24 14/06/2022 18.82142857 19.78571429 26.4 12.8
24 18/06/2022 20.78571429 20.78571429 30 12.05
25 21/06/2022 3.178571429 3.25 22.2 10.3
25 23/06/2022 0 0 0 0
25 25/06/2022 0 0 0 0
26 28/06/2022 0 0 0 0
26 01/07/2022 0 0 0 0
27 05/07/2022 0 0 0 0
27 09/07/2022 0 0 0 0
28 12/07/2022 0 0 0 0
28 14/07/2022 0 0 0 0
28 16/07/2022 0 0 0 0
30 26/07/2022 50.89285714 50.89285714 27.6 9.85
30 29/07/2022 19.39285714 19.39285714 19.1 10.58
31 02/08/2022 68.57142857 68.57142857 25 8.91
31 06/08/2022 58.75 58.75 24.9 8.81
32 09/08/2022 46.57142857 46.57142857 17.7 8.92
32 11/08/2022 24.25 24.25 17.2 9.77
32 13/08/2022 32.14285714 32.14285714 16 20.41
33 16/08/2022 53.14285714 53.14285714 19.7 10.09
33 20/08/2022 57.96428571 59.25 17.8 9.49
34 25/08/2022 28.10714286 28.10714286 18 9.99
35 30/08/2022 81.03571429 81.60714286 19.6 10.89
35 02/09/2022 22.53571429 22.53571429 14.8 10.04
36 06/09/2022 36.53571429 38.96428571 17.9 11.18
36 09/09/2022 24.5 25.71428571 17.3 10.48
37 16/09/2022 57.35714286 60.96428571 21.2 12.21
38 21/09/2022 5.142857143 7.142857143 13.5 11.58
39 30/09/2022 29.9047619 31.76190476 16.4 15.49
40 07/10/2022 22.9047619 24.47619048 16.4 15.12
41 12/10/2022 14.61904762 14.85714286 12.5 14.14
42 19/10/2022 15.57142857 17.04761905 15.6 14.24
43 26/10/2022 20.14285714 22.0952381 17.6 12.32
Thank you in advance!
Alex
I am not sure what to do.
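A minimal sketch of the two steps asked about (derive the week number from the date, then average per week), written in Python rather than R and using only a few (date, Brix) pairs copied from the table above. It assumes the Week column follows ISO 8601 week numbering, which agrees with the rows checked here; the names samples and weekly_means are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# A few (date, Brix) pairs copied from the question's table.
samples = [
    ("26/05/2022", 11.55),
    ("28/05/2022", 10.98),
    ("31/05/2022", 11.8),
    ("03/06/2022", 10.91),
]

def weekly_means(rows):
    """Group (dd/mm/yyyy, value) pairs by ISO week number and average each group."""
    by_week = defaultdict(list)
    for text, value in rows:
        # isocalendar() returns (ISO year, ISO week, ISO weekday); index 1 is the week.
        week = datetime.strptime(text, "%d/%m/%Y").isocalendar()[1]
        by_week[week].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}

means = weekly_means(samples)
print(means)
```

Grouping on the week number alone is only safe while the data stays within one year; for several seasons the ISO year would need to be part of the grouping key as well.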

kdb - KDB Apply logic where column exists - data validation

I'm trying to perform some simple logic on a table, but I'd like to verify that the columns exist before doing so, as a validation step. My data uses standard column names, though they are not always present in each data source.
While the following seems to work (just validating AAA at present), I need to expand it to ensure that PRI_AAA (and eventually many other variables) is present as well.
t: $[`AAA in cols `t; temp: update AAA_VAL: AAA*AAA_PRICE from t;()]
This is a two-part question:
1. This seems quite tedious for each variable (imagine AAA-ZZZ inputs and their derivatives). Is there a clever way to leverage a dictionary (or table) to check whether a number of variables exist, or to insert a placeholder column of zeros if they do not?
2. Similarly, can we store a formula or instructions to apply within a dictionary (or table) to validate and return a calculation (e.g. BBB_VAL: BBB*BBB_PRICE)? Some calculations would depend on others (e.g. BBB_Tax_Basis = BBB_VAL - BBB_COSTS), so there could be iterative issues.
Thanks in advance!
A functional update may be the best way to achieve this if your intention is to update many columns of a table in a similar fashion.
func:{[t;x]
if[not x in cols t;t:![t;();0b;(enlist x)!enlist 0]];
:$[x in cols t;
![t;();0b;(enlist`$string[x],"_VAL")!enlist(*;x;`$string[x],"_PRICE")];
t;
];
};
This function will update t with *_VAL columns for any column you pass as an argument, while first also adding a zero column for any missing columns passed as an argument.
q)t:([]AAA:10?100;BBB:10?100;CCC:10?100;AAA_PRICE:10*10?10;BBB_PRICE:10*10?10;CCC_PRICE:10*10?10;DDD_PRICE:10*10?10)
q)func/[t;`AAA`BBB`CCC`DDD]
AAA BBB CCC AAA_PRICE BBB_PRICE CCC_PRICE DDD_PRICE AAA_VAL BBB_VAL CCC_VAL DDD DDD_VAL
---------------------------------------------------------------------------------------
70 28 89 10 90 0 0 700 2520 0 0 0
39 17 97 50 90 40 10 1950 1530 3880 0 0
76 11 11 0 0 50 10 0 0 550 0 0
26 55 99 20 60 80 90 520 3300 7920 0 0
91 51 3 30 20 0 60 2730 1020 0 0 0
83 81 7 70 60 40 90 5810 4860 280 0 0
76 68 98 40 80 90 70 3040 5440 8820 0 0
88 96 30 70 0 80 80 6160 0 2400 0 0
4 61 2 70 90 0 40 280 5490 0 0 0
56 70 15 0 50 30 30 0 3500 450 0 0
As you've already mentioned, to cover point 2, a dictionary of functions might be the best way to go.
q)dict:raze{(enlist`$string[x],"_VAL")!enlist(*;x;`$string[x],"_PRICE")}each`AAA`BBB`DDD
q)dict
AAA_VAL| * `AAA `AAA_PRICE
BBB_VAL| * `BBB `BBB_PRICE
DDD_VAL| * `DDD `DDD_PRICE
And then a slightly modified function...
func:{[dict;t;x]
if[not x in cols t;t:![t;();0b;(enlist x)!enlist 0]];
:$[x in cols t;
![t;();0b;(enlist`$string[x],"_VAL")!enlist(dict`$string[x],"_VAL")];
t;
];
};
yields a similar result.
q)func[dict]/[t;`AAA`BBB`DDD]
AAA BBB CCC AAA_PRICE BBB_PRICE CCC_PRICE DDD_PRICE AAA_VAL BBB_VAL DDD DDD_VAL
-------------------------------------------------------------------------------
70 28 89 10 90 0 0 700 2520 0 0
39 17 97 50 90 40 10 1950 1530 0 0
76 11 11 0 0 50 10 0 0 0 0
26 55 99 20 60 80 90 520 3300 0 0
91 51 3 30 20 0 60 2730 1020 0 0
83 81 7 70 60 40 90 5810 4860 0 0
76 68 98 40 80 90 70 3040 5440 0 0
88 96 30 70 0 80 80 6160 0 0 0
4 61 2 70 90 0 40 280 5490 0 0
56 70 15 0 50 30 30 0 3500 0 0
Here's another approach which handles dependent/cascading calculations and also figures out which calculations are possible or not depending on the available columns in the table.
q)show map:`AAA_VAL`BBB_VAL`AAA_RevenueP`AAA_RevenueM`BBB_Other!((*;`AAA;`AAA_PRICE);(*;`BBB;`BBB_PRICE);(+;`AAA_Revenue;`AAA_VAL);(%;`AAA_RevenueP;1e6);(reciprocal;`BBB_VAL));
AAA_VAL | (*;`AAA;`AAA_PRICE)
BBB_VAL | (*;`BBB;`BBB_PRICE)
AAA_RevenueP| (+;`AAA_Revenue;`AAA_VAL)
AAA_RevenueM| (%;`AAA_RevenueP;1000000f)
BBB_Other | (%:;`BBB_VAL)
func:{c:{$[0h=type y;.z.s[x]each y;-11h<>type y;y;y in key x;.z.s[x]each x y;y]}[y]''[y];
![x;();0b;where[{all in[;cols x]r where -11h=type each r:(raze/)y}[x]each c]#c]};
q)t:([] AAA:1 2 3;AAA_PRICE:1 2 3f;AAA_Revenue:10 20 30;BBB:4 5 6);
q)func[t;map]
AAA AAA_PRICE AAA_Revenue BBB AAA_VAL AAA_RevenueP AAA_RevenueM
---------------------------------------------------------------
1 1 10 4 1 11 1.1e-05
2 2 20 5 4 24 2.4e-05
3 3 30 6 9 39 3.9e-05
/if the right columns are there
q)t:([] AAA:1 2 3;AAA_PRICE:1 2 3f;AAA_Revenue:10 20 30;BBB:4 5 6;BBB_PRICE:4 5 6f);
q)func[t;map]
AAA AAA_PRICE AAA_Revenue BBB BBB_PRICE AAA_VAL BBB_VAL AAA_RevenueP AAA_RevenueM BBB_Other
--------------------------------------------------------------------------------------------
1 1 10 4 4 1 16 11 1.1e-05 0.0625
2 2 20 5 5 4 25 24 2.4e-05 0.04
3 3 30 6 6 9 36 39 3.9e-05 0.02777778
The only caveat is that a map entry can't use the same column name as both its key and inside its value, i.e. it cannot re-use column names. It's also assumed that all symbols in your map are column names (not global variables), though it could be extended to cover that.
EDIT: if you have a large number of column maps then it will be easier to define it in a more vertical fashion like so:
map:(!). flip(
(`AAA_VAL; (*;`AAA;`AAA_PRICE));
(`BBB_VAL; (*;`BBB;`BBB_PRICE));
(`AAA_RevenueP;(+;`AAA_Revenue;`AAA_VAL));
(`AAA_RevenueM;(%;`AAA_RevenueP;1e6));
(`BBB_Other; (reciprocal;`BBB_VAL))
);
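For readers who do not use q, the idea behind the map-based approach (apply every formula whose inputs are available, including inputs produced by earlier formulas) can be sketched in Python. This is not a translation of the q function above: it uses a simple repeat-until-nothing-changes loop instead of recursive expansion, and the names FORMULAS and apply_formulas, as well as the dict-of-lists table layout, are assumptions made for illustration.

```python
import operator

# Hypothetical formula map: new column -> (function, argument names or constants).
FORMULAS = {
    "AAA_VAL": (operator.mul, "AAA", "AAA_PRICE"),
    "BBB_VAL": (operator.mul, "BBB", "BBB_PRICE"),
    "AAA_RevenueP": (operator.add, "AAA_Revenue", "AAA_VAL"),
    "AAA_RevenueM": (operator.truediv, "AAA_RevenueP", 1e6),
    "BBB_Other": (lambda v: 1 / v, "BBB_VAL"),
}

def apply_formulas(table, formulas):
    """Return a copy of table (dict of equal-length column lists) with every computable formula added."""
    result = dict(table)
    pending = dict(formulas)
    n_rows = len(next(iter(result.values())))
    progress = True
    # Keep sweeping until a full pass adds no new column.
    while pending and progress:
        progress = False
        for name, (func, *args) in list(pending.items()):
            # String arguments are column names; anything else is a constant.
            if all(a in result for a in args if isinstance(a, str)):
                cols = [result[a] if isinstance(a, str) else [a] * n_rows for a in args]
                result[name] = [func(*row) for row in zip(*cols)]
                del pending[name]
                progress = True
    return result

t = {"AAA": [1, 2, 3], "AAA_PRICE": [1.0, 2.0, 3.0], "AAA_Revenue": [10, 20, 30], "BBB": [4, 5, 6]}
out = apply_formulas(t, FORMULAS)
print(sorted(out))
```

With BBB_PRICE missing, BBB_VAL and BBB_Other are simply skipped, which mirrors the first q example above. Formulas that can never be satisfied stay in pending and are dropped silently; a real validation step would probably want to report them.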

How to get list of neighbors with distance N from index in matrix?

I have a matrix like this:
35 1 6 26 19 24
3 32 7 21 23 25
31 9 2 22 27 20
8 28 33 17 10 15
30 5 34 12 14 16
4 36 29 13 18 11
I want a list of neighbors with distance 3 for each cell. For example,
the list of neighbors with distance 3 for (1, 1) is:
[8, 28, 33, 17, 26, 21, 22]
Visual explanation:
[35] 1 6 |26| 19 24
3 32 7 |21| 23 25
31 9 2 |22| 27 20
-------------------
8 28 33 |17| 10 15
-------------------
30 5 34 12 14 16
4 36 29 13 18 11
The list of neighbors with distance 3 for (3, 3) is
[4, 36, 29, 13, 18, 11, 24, 25, 20, 15, 16]
Visual explanation:
35 1 6 26 19 |24|
3 32 7 21 23 |25|
31 9 [2] 22 27 |20|
8 28 33 17 10 |15|
30 5 34 12 14 |16|
------------------------
4 36 29 13 18 |11|
------------------------
Generate an all-zero "index matrix" idx of the same size as your matrix A, and set the "seed" to 1:
A = [ ...
35 1 6 26 19 24; ...
3 32 7 21 23 25; ...
31 9 2 22 27 20; ...
8 28 33 17 10 15; ...
30 5 34 12 14 16; ...
4 36 29 13 18 11 ...
]
idx = zeros(size(A));
idx(3, 2) = 1
We get:
A =
[...]
idx =
0 0 0 0 0 0
0 0 0 0 0 0
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Now, we use 2-D convolution, i.e. MATLAB's conv2 function, to create the correct index matrix for the distance d (here d = 3):
d = 3;
idx = logical(conv2(idx, ones(2*d+1), 'same') - conv2(idx, ones(2*d-1), 'same'))
(Convolution is the key to success.)
Then, we get:
idx =
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
1 1 1 1 1 0
Since we have already cast the indices to logical, we can directly access the proper elements in the matrix A:
B = A(idx).'
The final result:
B =
4 36 29 13 19 23 27 10 14 18
Note that this result differs from yours because you wrote (3, 2) in your second example but actually marked (3, 3) as the "seed".
Hope that helps!
Disclaimer: Tested with Octave 5.1.0, but also works with MATLAB Online.
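As a cross-check of the expected lists, the same "ring" can be described without convolution: a cell belongs to it when the larger of its row offset and column offset from the seed equals d. A minimal Python sketch of that reading follows; the function name ring_neighbors is made up, and indices are 1-based to match the question.

```python
A = [
    [35, 1, 6, 26, 19, 24],
    [3, 32, 7, 21, 23, 25],
    [31, 9, 2, 22, 27, 20],
    [8, 28, 33, 17, 10, 15],
    [30, 5, 34, 12, 14, 16],
    [4, 36, 29, 13, 18, 11],
]

def ring_neighbors(matrix, row, col, d):
    """Values whose max(row offset, column offset) from the 1-based seed (row, col) equals d, in row-major order."""
    r0, c0 = row - 1, col - 1
    return [
        value
        for r, line in enumerate(matrix)
        for c, value in enumerate(line)
        if max(abs(r - r0), abs(c - c0)) == d
    ]

print(ring_neighbors(A, 3, 3, 3))
print(ring_neighbors(A, 3, 2, 3))
```

The values come out in row-major order, whereas A(idx) in MATLAB/Octave walks the matrix column by column, so the two agree as sets but not necessarily in ordering.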

How to remove all the rows from a matrix that match values in another vector?

I am building an exclude vector so that every row of the matrix user whose second-column value appears in exclude is removed. How do I do that efficiently, without a for loop that iterates through user once for each item in exclude?
My code below does not work:
count=0;
% Just showing how I am constructing `exclude`, to show that it can be long.
% So, manually removing each item from `exclude` is not an option.
% And using a for loop to iterate through each element in `exclude` can be inefficient.
for b=1:size(user_cat,1)
if user_cat(b,4)==0
count=count+1;
exclude(count,1) = user_cat(b,1);
end
end
% This is the important line of focus. You can ignore the previous parts.
user = user(user(:,2)~=exclude(:),:);
The last line gives the following error:
Error using ~=
Matrix dimensions must agree.
So, I am having to use this instead:
for b=1:size(exclude,1)
user = user(user(:,2)~=exclude(b,1),:);
end
Example:
user=[1433100000.00000 26 620260 7 1433100000000.00 0 0 2 1 100880 290 23
1433100000.00000 26 620260 7 1433100000000.00 0 0 2 1 100880 290 23
1433100000.00000 25 620160 7 1433100000000.00 0 0 2 1 100880 7274 22
1433100000.00000 21 619910 7 1433100000000.00 24.1190000000000 120.670000000000 2 0 100880 53871 21
1433100000.00000 19 620040 7 1433100000000.00 24.1190000000000 120.670000000000 2 0 100880 22466 21
1433100000.00000 28 619030 7 1433100000000.00 24.6200000000000 120.810000000000 2 0 100880 179960 16
1433100000.00000 28 619630 7 1433100000000.00 24.6200000000000 120.810000000000 2 0 100880 88510 16
1433100000.00000 28 619790 7 1433100000000.00 24.6200000000000 120.810000000000 2 0 100880 12696 16
1433100000.00000 7 36582000 7 1433100000000.00 0 0 2 0 100880 33677 14
1433000000.00000 24 620010 7 1433000000000.00 0 0 2 1 100880 3465 14
1433000000.00000 4 36581000 7 1433000000000.00 0 0 2 0 100880 27809 12
1433000000.00000 20 619960 7 1433000000000.00 0 0 2 1 100880 860 11
1433000000.00000 30 619760 7 1433000000000.00 25.0060000000000 121.510000000000 2 0 100880 34706 10
1433000000.00000 33 619910 7 1433000000000.00 0 0 2 0 100880 15060 9
1433000000.00000 26 619740 6 1433000000000.00 0 0 2 0 100880 52514 8
1433000000.00000 18 619900 6 1433000000000.00 0 0 2 0 100880 21696 8
1433000000.00000 16 619850 6 1433000000000.00 24.9910000000000 121.470000000000 2 0 100880 10505 1
1433000000.00000 16 619880 6 1433000000000.00 24.9910000000000 121.470000000000 2 0 100880 1153 1
1433000000.00000 28 619120 6 1433000000000.00 0 0 2 0 100880 103980 24
1433000000.00000 21 619870 6 1433000000000.00 0 0 2 0 100880 1442 24];
exclude=[ 3
4
7
10
17
18
19
28
30
33 ];
Desired output:
1433100000.00000 26 620260 7 1433100000000.00 0 0 2 1 100880 290 23
1433100000.00000 26 620260 7 1433100000000.00 0 0 2 1 100880 290 23
1433100000.00000 25 620160 7 1433100000000.00 0 0 2 1 100880 7274 22
1433100000.00000 21 619910 7 1433100000000.00 24.1190000000000 120.670000000000 2 0 100880 53871 21
1433000000.00000 24 620010 7 1433000000000.00 0 0 2 1 100880 3465 14
1433000000.00000 20 619960 7 1433000000000.00 0 0 2 1 100880 860 11
1433000000.00000 26 619740 6 1433000000000.00 0 0 2 0 100880 52514 8
1433000000.00000 16 619850 6 1433000000000.00 24.9910000000000 121.470000000000 2 0 100880 10505 1
1433000000.00000 16 619880 6 1433000000000.00 24.9910000000000 121.470000000000 2 0 100880 1153 1
1433000000.00000 21 619870 6 1433000000000.00 0 0 2 0 100880 1442 24
Use ismember to find which elements in the second column of user appear in exclude; these mark the rows to be removed. Negate that logical index to get the rows to be kept, and use it to index into the matrix:
user = user(~ismember(user(:,2),exclude),:);
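The same membership test can be sketched in plain Python, for comparison only. The rows below are shortened to two columns and the name drop_excluded is made up; the second-column values and the exclude list are taken from the question's example.

```python
def drop_excluded(rows, exclude, column=1):
    """Keep only the rows whose value at `column` (0-based) is not in `exclude`."""
    excluded = set(exclude)  # a set makes each membership check cheap
    return [row for row in rows if row[column] not in excluded]

# Shortened rows: (timestamp, second-column value); only the second column matters here.
user = [(1433100000.0, v) for v in
        [26, 26, 25, 21, 19, 28, 28, 28, 7, 24, 4, 20, 30, 33, 26, 18, 16, 16, 28, 21]]
exclude = [3, 4, 7, 10, 17, 18, 19, 28, 30, 33]

kept = drop_excluded(user, exclude)
print([row[1] for row in kept])
```

Ten rows survive, matching the ten rows of the desired output above.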

TimeGrouper, pandas

I use TimeGrouper from pandas.tseries.resample to sum monthly returns over 6-month periods, as follows:
6m_return = monthly_return.groupby(TimeGrouper(freq='6M')).aggregate(numpy.sum)
where monthly_return is like:
2008-07-01 0.003626
2008-08-01 0.001373
2008-09-01 0.040192
2008-10-01 0.027794
2008-11-01 0.012590
2008-12-01 0.026394
2009-01-01 0.008564
2009-02-01 0.007714
2009-03-01 -0.019727
2009-04-01 0.008888
2009-05-01 0.039801
2009-06-01 0.010042
2009-07-01 0.020971
2009-08-01 0.011926
2009-09-01 0.024998
2009-10-01 0.005213
2009-11-01 0.016804
2009-12-01 0.020724
2010-01-01 0.006322
2010-02-01 0.008971
2010-03-01 0.003911
2010-04-01 0.013928
2010-05-01 0.004640
2010-06-01 0.000744
2010-07-01 0.004697
2010-08-01 0.002553
2010-09-01 0.002770
2010-10-01 0.002834
2010-11-01 0.002157
2010-12-01 0.001034
The 6m_return is like:
2008-07-31 0.003626
2009-01-31 0.116907
2009-07-31 0.067688
2010-01-31 0.085986
2010-07-31 0.036890
2011-01-31 0.015283
However I want to get the 6m_return starting 6m from 7/2008 like the following:
2008-12-31 ...
2009-06-30 ...
2009-12-31 ...
2010-06-30 ...
2010-12-31 ...
I tried the different input options (e.g. loffset) in TimeGrouper, but they don't work.
Any suggestion will be really appreciated!
The problem can be solved by adding closed = 'left':
df.groupby(pd.TimeGrouper('6M', closed = 'left')).aggregate(numpy.sum)
TimeGrouper, which other answers suggest, is deprecated and has since been removed from pandas; Grouper replaces it. So a solution to your question using Grouper is:
df.groupby(pd.Grouper(freq='6M', closed='left')).aggregate(numpy.sum)
This is a workaround for what seems to be a bug, but give it a try and see if it works for you.
In [121]: ts = pandas.date_range('7/1/2008', periods=30, freq='MS')
In [122]: df = pandas.DataFrame(pandas.Series(range(len(ts)), index=ts))
In [124]: df[0] += 1
In [125]: df
Out[125]:
0
2008-07-01 1
2008-08-01 2
2008-09-01 3
2008-10-01 4
2008-11-01 5
2008-12-01 6
2009-01-01 7
2009-02-01 8
2009-03-01 9
2009-04-01 10
2009-05-01 11
2009-06-01 12
2009-07-01 13
2009-08-01 14
2009-09-01 15
2009-10-01 16
2009-11-01 17
2009-12-01 18
2010-01-01 19
2010-02-01 20
2010-03-01 21
2010-04-01 22
2010-05-01 23
2010-06-01 24
2010-07-01 25
2010-08-01 26
2010-09-01 27
2010-10-01 28
2010-11-01 29
2010-12-01 30
I've used integers to help confirm that the sums are correct. The workaround that seems to work is to add a month to the front of the dataframe to trick the TimeGrouper into doing what you need.
In [127]: df2 = pandas.DataFrame([0], index = [df.index.shift(-1, freq='MS')[0]])
In [129]: df2.append(df).groupby(pandas.TimeGrouper(freq='6M')).aggregate(numpy.sum)[1:]
Out[129]:
0
2008-12-31 21
2009-06-30 57
2009-12-31 93
2010-06-30 129
2010-12-31 165
Note the final [1:] is there to trim off the first group.
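To double-check the target numbers independently of pandas, the binning can be reproduced with the standard library alone. The sketch below is not what TimeGrouper or Grouper does internally: it simply assigns each month to its calendar half-year (Jan-Jun or Jul-Dec) and labels the bin with the half-year's last day, which happens to match the desired output because the data starts in July. The name half_year_sums is made up for illustration.

```python
import calendar
from datetime import date

def half_year_sums(series):
    """Sum (date, value) pairs into Jan-Jun / Jul-Dec bins labelled by the bin's last day."""
    sums = {}
    for day, value in series:
        end_month = 6 if day.month <= 6 else 12
        last_day = calendar.monthrange(day.year, end_month)[1]
        label = date(day.year, end_month, last_day)
        sums[label] = sums.get(label, 0) + value
    return dict(sorted(sums.items()))

# Same test data as above: 30 month starts from 2008-07-01 with values 1..30.
months = [date(2008 + (6 + i) // 12, (6 + i) % 12 + 1, 1) for i in range(30)]
totals = half_year_sums(zip(months, range(1, 31)))
for label, total in totals.items():
    print(label, total)
```

The sums agree with the workaround's output (21, 57, 93, 129, 165). For data that does not start on a half-year boundary, the bins would need to be anchored to the first observation instead.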