egen tag skipping - tags

I want to tag new occurrences of a new agent_type. I use the following code to do this:
clear
input long patid float(how_many_drugs agent_type eventdate) byte tag4
01 3 4 14962 1
01 3 5 14962 1
01 3 4 14997 0
01 3 9 14997 0
01 3 5 15025 0
01 3 9 15040 1
01 3 4 15040 0
01 3 5 15082 0
end
format %td eventdate
label values agent_type drugstypes1
label def drugstypes1 4 "alpha blocker", modify
label def drugstypes1 5 "ace_inhib", modify
label def drugstypes1 9 "loop", modify
label def drugstypes1 13 "CCB", modify
egen tag4=tag (patid agent_type_new how_many_drugs)
The code works fine, until we reach the first occurrence of "loop" where a tag is NOT generated. Rather, the tag is generated on the second occurrence of "loop".
Why is this happening and how can I make it work to make a tag on first occurrence?
I have made sure that the data were sorted by patid and eventdate before running the tag code.

As the original author of this egen function tag(), I can comment on its intent.
The intent is not to tag first occurrences as such. The intent is to tag just one of several occurrences which so far as the user is concerned are equivalent.
As it happens, there are only two systematic ways to tag equivalent occurrences, to tag the first or the last. As groups could be as small as one observation, any rule must work on groups that small. For groups of one, choosing the first is the same as choosing the last, but otherwise that is not so. I chose to tag the first in the original code (long since adopted into official Stata), but that is arbitrary.
So why is this happening to you? The function feels totally free to re-sort the data temporarily, as looking at the code will show you:
viewsource _gtag.ado
This is what is biting.
You want to tag the first occurrences of each distinct value of each drug type for each patient. That is one line, as at bottom. I don't understand why how_many_drugs is used in your code.
clear
input long patid float(how_many_drugs agent_type eventdate) byte tag4
01 3 4 14962 1
01 3 5 14962 1
01 3 4 14997 0
01 3 9 14997 0
01 3 5 15025 0
01 3 9 15040 1
01 3 4 15040 0
01 3 5 15082 0
end
format %td eventdate
label values agent_type drugstypes1
label def drugstypes1 4 "alpha blocker", modify
label def drugstypes1 5 "ace_inhib", modify
label def drugstypes1 9 "loop", modify
label def drugstypes1 13 "CCB", modify
bysort patid agent_type (eventdate) : gen first = _n == 1
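For readers who want to check the logic outside Stata, here is a minimal Python sketch of the same idea: within each (patid, agent_type) group, flag the row with the earliest eventdate. The function name tag_first and the tuple layout are illustrative assumptions, not part of the original code.

```python
# Rows mirror the example data: (patid, agent_type, eventdate as a daily date number).
rows = [
    (1, 4, 14962),
    (1, 5, 14962),
    (1, 4, 14997),
    (1, 9, 14997),
    (1, 5, 15025),
    (1, 9, 15040),
    (1, 4, 15040),
    (1, 5, 15082),
]


def tag_first(rows):
    """Return one 0/1 flag per row: 1 for the earliest eventdate
    in each (patid, agent_type) group, 0 otherwise."""
    earliest = {}  # (patid, agent_type) -> index of the earliest row seen so far
    for i, (patid, agent, eventdate) in enumerate(rows):
        key = (patid, agent)
        # Strict "<" keeps the first row encountered when dates tie.
        if key not in earliest or eventdate < rows[earliest[key]][2]:
            earliest[key] = i
    first_rows = set(earliest.values())
    return [1 if i in first_rows else 0 for i in range(len(rows))]


first = tag_first(rows)
print(first)
```

In this sketch the first "loop" row (agent_type 9 on 14997) is flagged, which is the result the question asks for.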

I've been playing with this for a little while, and in the end abandoned egen tag(). I couldn't understand why it wouldn't pick up the first occurrence of every agent_type so instead I've opted for this:
bys patid agent_type (eventdate): gen n=_n
sort patid eventdate
replace n=0 if n!=1


Filling a calendar using Arrayformula or LOOKUP

I've made a calendar sheet and would like to fill it using an Arrayformula or some kind of Lookup.
The problem is, the code in each cell is different, do I need it all to be the same code or is it possible to do an Arrayformula that does a different formula for each line?
I spent ages getting the calendar code working but would now like to simplify the code and I'm not sure what my next step should be:
https://docs.google.com/spreadsheets/d/1u_J7bmOFyDlYXhcL5dW3CHFJ1esySAKK_yPc6nFTdLA/edit?usp=sharing
Any advice would be much appreciated.
I've added a new sheet in your file called 'Aresvik'.
The green cells have new formula.
Cell B3 can be =date(B1,1,1)
Then each successive month can be =eomonth(B3,0)+1, =eomonth(J3,0)+1 etc.
The date formula in cell B5 is:
=arrayformula(iferror(vlookup(sequence(7,7,1),{array_constrain(sequence(40,1),day(eomonth(B3,0))+weekday(B3,3),1),query({flatten(split(rept(",",day(eomonth(B3,0))-1),",",0,0));sequence(day(eomonth(B3,0)),1,1)},"offset "&day(eomonth(B3,0))-weekday(B3,3)&" ",0)},2,false),))
It can be copied to each other cell below Mo, so B5 will change to J5, R5, Z5 etc.
Notes
The concept revolves around using the SEQUENCE function to generate a grid of numbers, 6 rows, 7 columns:
sequence(6,7)
which looks like this:
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
Then using these numbers in a VLOOKUP to get a corresponding date for the calendar. If the first of the month falls on a Thursday (April 2021), the vlookup range needs 3 gaps at the top of the list of dates. player0 has a more elegant solution than my original query using offset, so I've incorporated it below. Cell Z3 is the date 1/4/2021:
=arrayformula(
iferror(
vlookup(sequence(6,7),
{sequence(day(eomonth(Z3,0))+weekday(Z3,2),1,0),
{iferror(sequence(weekday(Z3,2),1)/0,);sequence(day(eomonth(Z3,0)),1,Z3)}},
2,false)
,))
The first column in the vlookup range is:
sequence(day(eomonth(Z3,0))+weekday(Z3,2),1,0)
which is an array of numbers from 0, corresponding with the number of days in the month plus the number of gaps before the 1st day.
The second column in the vlookup range is:
{iferror(sequence(weekday(Z3,2),1)/0,);sequence(day(eomonth(Z3,0)),1,Z3)}},
It is a single column built from two arrays in this format: {x;y}, where y sits below x because of the ;.
These are the gaps: iferror(sequence(weekday(Z3,2),1)/0,), followed by the date numbers: sequence(day(eomonth(Z3,0)),1,Z3)
(The example below is for May 2021, which starts on a Saturday, so rows 0 to 5 are gaps and 44317 is 1 May 2021):
0
1
2
3
4
5
6 44317
7 44318
8 44319
9 44320
10 44321
11 44322
12 44323
13 44324
14 44325
15 44326
16 44327
17 44328
18 44329
19 44330
20 44331
21 44332
22 44333
23 44334
24 44335
25 44336
26 44337
27 44338
28 44339
29 44340
30 44341
31 44342
32 44343
33 44344
34 44345
35 44346
36 44347
The vlookup takes each number in the initial sequence (6x7 layout), and brings back the corresponding date from col2 in the range, based on a match in col1.
When the first day of the month is a Monday, iferror(sequence(weekday(BB1,2),1)/0,) generates a gap in col2 of the vlookup range. This is why col1 in the vlookup range has to start with 0.
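The lookup idea described above (a 6x7 grid of positions, each mapped to a date once the leading gaps are counted) can be sketched in Python as follows. This only illustrates the logic and is not a translation of the sheet formula; the function name month_grid is an assumption, and Python's weekday() counts Monday as 0 where WEEKDAY(...,2) counts it as 1.

```python
from datetime import date, timedelta


def month_grid(year, month):
    """Return a 6x7 grid (weeks start on Monday) of date objects,
    with None in the cells that fall outside the month."""
    first = date(year, month, 1)
    gaps = first.weekday()  # Monday=0, so this is the number of blanks before the 1st
    # Days in the month: first day of the next month minus first day of this one.
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    n_days = (next_first - first).days
    # Lookup table: grid position (1..42) -> date, like the two-column vlookup range.
    lookup = {gaps + d: first + timedelta(days=d - 1) for d in range(1, n_days + 1)}
    return [[lookup.get(row * 7 + col + 1) for col in range(7)] for row in range(6)]


grid = month_grid(2021, 4)
for week in grid:
    print([d.day if d else None for d in week])
```

For April 2021 the first week comes out as three blanks followed by 1 to 4, matching the "3 gaps" case for a month that starts on a Thursday.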
I've updated the sheet at https://docs.google.com/spreadsheets/d/1u_J7bmOFyDlYXhcL5dW3CHFJ1esySAKK_yPc6nFTdLA/edit#gid=68642071
Values on the calendar are dates so the formatting has to be d.
If you want numbers, then use:
=arrayformula(
iferror(
vlookup(sequence(6,7),
{sequence(day(eomonth(Z3,0))+weekday(Z3,2),1,0),
{iferror(sequence(weekday(Z3,2),1)/0,);sequence(day(eomonth(Z3,0)),1)}},
2,false)
,))
Shorter solution:
=INDEX(IFNA(VLOOKUP(SEQUENCE(6, 7), {SEQUENCE(DAY(EOMONTH(B3, ))+WEEKDAY(B3, 2), 1, ),
{IFERROR(ROW(INDIRECT("1:"&WEEKDAY(B3, 2)))/0); SEQUENCE(DAY(EOMONTH(B3, )), 1, B3)}}, 2, )))

Find exact word in order in SphinxQL

I've got an indexed column that contains some numbers (ids) and I need to extract rows that match exactly some numbers in a given order.
For example
"Give me rows that contain 1 followed by 1 followed by 25 followed by 30"
1 1 1 2 2 25 25 26 30 31 => is valid
1 1 1 2 2 25 25 26 31 32 => is not valid
1 1 1 2 2 2 2 2 25 30 30 => is valid
I'm trying with 1 >> 1 >> 2 >> 2 but it does not work (I think because it matches "1" as a single character and not as a "word")
The strict order operator is << , so
1 << 1 << 25 << 30
should work.
Matching partial words or single characters (as opposed to whole words) would only work if you specifically enabled it, e.g. with min_infix_len=1, and would probably only match if you have enable_keywords=1 (unless sphinx is old enough to have enable_stat=0).
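To make the expected semantics concrete, here is a small Python sketch of "whole words in the given order, gaps allowed", run against the three sample rows from the question. It only models what the question asks for; it is not how Sphinx evaluates the << operator internally, and the helper name matches_in_order is an assumption.

```python
def matches_in_order(text, wanted):
    """True if every word in `wanted` appears in `text` as a whole word,
    in the given order, with any number of other words in between."""
    words = iter(text.split())
    # `w in words` consumes the iterator up to the hit, so each
    # search resumes just after the previous match.
    return all(w in words for w in wanted)


query = ["1", "1", "25", "30"]
rows = [
    "1 1 1 2 2 25 25 26 30 31",
    "1 1 1 2 2 25 25 26 31 32",
    "1 1 1 2 2 2 2 2 25 30 30",
]
results = [matches_in_order(r, query) for r in rows]
print(results)
```

Because the text is split on whitespace, "1" only matches the word 1 and never the leading digit of a longer id.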

Formatting rows of crystal report output

I am new to Crystal Reports (2008) and need help with a formatting problem.
I have output sample as below in crystal report:
srNo Name ID assigned_number
==================================
1 aaa 111 1
2 bbb 222 2
3 ccc 333 3
4 ddd 444 23
5 fff 445 32
6 ggg 432 1
7 ffr 435 2
8 rty 654 43
9 ttt 434 33
10 trt 343 1
11 rre 346 2
12 gth 543 3
13 fgr 644 54
14 yyy 431 2
15 tut 323 3
16 hyj 777 4
17 juu 322 32
Have a look at the last column, assigned_number. I want to highlight the rows (with a row color) whenever the last column values are 1, 2, 3 consecutively (not just 1, 2 or 2, 3).
So here srNo 1 to 3 and 10 to 12 should be highlighted with a row color, as the last column values are 1, 2, 3 (consecutively).
Let me know if it's not clear.
Thanks
You right-click on the field in your assigned_number column and choose Format Field. Then in the Border tab you check the Background box and enter a conditional formula under the "x+2" icon next to Background.
The formula is a little tricky. I have not tested this but it could go something like:
if previous ({assigned_number}) = 1 and
next({assigned_number}) = 3 then crRed
else crWhite
This will color the row with the 2 in it. Unfortunately "next" and "previous" are limited to one record each way, so that won't work for the rows with 1 and 3.
EDIT:
This formula will work but also highlight 1,2 and 2,3 combos. Even with a formula trying to get the previous 2 records ( the 1,2 when you're at 3) or next 2 (2,3 when you're at 1) doesn't work.
if {assigned_number} in [1, 2, 3] and
previous({assigned_number}) = 1 and
next({assigned_number}) = 3 or
{assigned_number} = 1 and
next({assigned_number}) = 2 or
{assigned_number} = 3 and
previous({assigned_number}) = 2
then crRed
else crWhite
If Right(assigned_number) in [1,2,3]
Then crred
else crwhite.
Now you can extend this formula to any number of values.
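The rule from the question (highlight a row only when it is part of a complete 1, 2, 3 run) can be checked outside Crystal with a short Python sketch like the one below. It looks at windows of three records, which is exactly what next and previous cannot do on their own; the function name highlight_runs is an assumption, and this is not Crystal formula syntax.

```python
def highlight_runs(values, pattern=(1, 2, 3)):
    """Return one boolean per value: True if that value belongs to a
    complete consecutive occurrence of `pattern`."""
    flags = [False] * len(values)
    n = len(pattern)
    for start in range(len(values) - n + 1):
        if tuple(values[start:start + n]) == tuple(pattern):
            for i in range(start, start + n):
                flags[i] = True
    return flags


# assigned_number column from the sample report, in srNo order.
assigned = [1, 2, 3, 23, 32, 1, 2, 43, 33, 1, 2, 3, 54, 2, 3, 4, 32]
flags = highlight_runs(assigned)
highlighted = [i + 1 for i, flag in enumerate(flags) if flag]  # srNo is 1-based
print(highlighted)
```

On the sample data this flags srNo 1 to 3 and 10 to 12, and leaves the partial runs at srNo 6 to 7 and 14 to 15 alone.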

SSRS 2008 tablix results missing values

I have a report that takes an order number as a parameter and shows a tablix with each part number on a row along with the part description, number ordered, number shipped, number remaining to ship, and number on backorder.
SSMS shows the query returns the same number of rows as the tablix shows. However, the tablix has blanks in several places. I have no filters, no visibility settings, and no special conditions. I have zeros set to display as '-'.
The blanks occur below identical values in two columns: Number Shipped and Number Remaining. That is, a value is not shown (only in these two columns) if it is the same as the value above it, like this:
Item Desc #Ordered #Shipped #Remaining #Backorder
1H abc 4 4 - -
2R def 1 - 1 0
5L ghi 6 6 3
7P jkl 6 6 - -
9Q mno 6 -
There should be a - (for zero) for 5L under #Shipped. 9Q should have a 6 under #Shipped and a - under #Remaining, like this:
Item Desc #Ordered #Shipped #Remaining #Backorder
1H abc 4 4 - -
2R def 1 - 1 0
5L ghi 6 - 6 3
7P jkl 6 6 - -
9Q mno 6 6 - -
What is going on?
In the query, try isnull(#Shipped,'-') to catch the rest of the blanks.
If that doesn't work, use the TextBox Expression:
=switch(len(#Shipped)>0,#Shipped,True,"-")
This will replace the blank values with a dash to match the others.

groupby functions to get subsequent value

In my data I have stock volumes with order sequences and times. I need to go through each part of an order and find when it ends, by grabbing the time of the next part of the chain.
I am just starting in Python and I would do this by subsetting each stock into its own pool, then doing another loop to find the time of the next order for that sequence. Ultimately, in R/Matlab you could go X$time[1:end-1] <- X$time[2:end,]
My question: can I use the df.groupby['sequence'].{for each entry get the time from the subsequent entry}???
I think last() would give me the last value of that entire sequence; I would like the time at which the next entry of that sequence starts/appears.
I have a set of type:
sequence time
a 1
b 1
a 3
a 5
b 2
I would like
sequence time nexttime
a 1 3
b 1 2
a 3 5
a 5 999
b 2 999
In [24]: df
Out[24]:
sequence time
0 a 1
1 b 1
2 a 3
3 a 5
4 b 2
In [25]: df['nexttime'] = df.groupby('sequence').time.shift(-1).fillna(999)
In [26]: df
Out[26]:
sequence time nexttime
0 a 1 3
1 b 1 2
2 a 3 5
3 a 5 999
4 b 2 999
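As a self-contained script, the session above can be sketched as follows. shift(-1) leaves NaN in the last row of each group, which makes the column floating point; the astype(int) at the end is an added assumption that whole-number times are wanted back.

```python
import pandas as pd

df = pd.DataFrame(
    {"sequence": ["a", "b", "a", "a", "b"], "time": [1, 1, 3, 5, 2]}
)

# Within each sequence, pull the time from the following row of that sequence.
next_time = df.groupby("sequence")["time"].shift(-1)

# The last row of each group has no successor (NaN); use 999 as the sentinel
# from the question, then cast back to integers.
df["nexttime"] = next_time.fillna(999).astype(int)

print(df)
```

The rows keep their original order, so there is no need to sort by sequence first or to merge the result back.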