Convert weekly dates to monthly week data using Stata - date

I have a %tw (weekly) date variable in Stata and I want to generate a monthly-week variable from it. As in the example below, the variable Date2 holds 1999w14 and I want to generate 1999mayw1 from it. How can I approach this?
Date2 date2
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w14 1999mayw1
1999w15 1999mayw2
1999w15 1999mayw2
1999w15 1999mayw2
1999w15 1999mayw2
1999w15 1999mayw2
1999w15 1999mayw2
1999w15 1999mayw2

Stata has no concept of the first week in a given month. How could it? Weeks don't map neatly on to months, or even vice versa, unless you are talking about February in non-leap years which has exactly 4 weeks, but starting on different days in different years.
If you have some particular concept of a week, e.g. that it starts on Monday, then tell us what it is.
Stata week 1 always starts on 1 January, 2 on 8 January, and so forth, and week 52 always has either 8 or 9 days. If your data match those definitions, fine. Otherwise you may need to do some reading to work out what matches your problem.
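For a quick check of those rules (my own illustration, not from the original thread), you can display the first day of a few Stata weeks:
display %td dofw(yw(1999, 1))     // 01jan1999: week 1 starts on 1 January
display %td dofw(yw(1999, 14))    // 02apr1999: week 14 is the 1999w14 in the question
display %td dofw(yw(1999, 52))    // 24dec1999: week 52 runs 8 days to 31 December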
If you issue the command below in Stata, you will get clickable links to .pdf versions.
. search week, sj
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-12-4 dm0065_1 . . . . . Stata tip 111: More on working with weeks, erratum
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q4/12 SJ 12(4):765 (no commands)
lists previously omitted key reference
SJ-12-3 dm0065 . . . . . . . . . . Stata tip 111: More on working with weeks
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q3/12 SJ 12(3):565--569 (no commands)
discusses how to convert data presented in yearly and weekly
form to daily dates and how to aggregate such data to months
or longer intervals
SJ-10-4 dm0052 . . . . . . . . . . . . . . . . Stata tip 68: Week assumptions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q4/10 SJ 10(4):682--685 (no commands)
tip on Stata's solution for weeks and on how to set up
your own alternatives given different definitions of the
week
That said, if you're content to take your data as Stata weeks, you can convert to months and label as you wish. This script requires that you download labmask from the Stata Journal site (start with search labmask, sj).
clear
set obs 10
* example data: the first 10 weeks of 1999 as Stata weekly dates
gen wdate = yw(1999, _n)
format wdate %tw
list
* month containing the first day of each week
gen mdate = mofd(dofw(wdate))
format mdate %tm
* number the weeks within each month
bysort mdate (wdate) : gen week = _n
gen label = string(mdate, "%tm") + "w" + string(week)
clonevar wdate2 = wdate
* labmask: install from the Stata Journal site after -search labmask-
labmask wdate2, values(label)
list, sepby(mdate)
+-----------------------------------------------+
| wdate mdate week label wdate2 |
|-----------------------------------------------|
1. | 1999w1 1999m1 1 1999m1w1 1999m1w1 |
2. | 1999w2 1999m1 2 1999m1w2 1999m1w2 |
3. | 1999w3 1999m1 3 1999m1w3 1999m1w3 |
4. | 1999w4 1999m1 4 1999m1w4 1999m1w4 |
5. | 1999w5 1999m1 5 1999m1w5 1999m1w5 |
|-----------------------------------------------|
6. | 1999w6 1999m2 1 1999m2w1 1999m2w1 |
7. | 1999w7 1999m2 2 1999m2w2 1999m2w2 |
8. | 1999w8 1999m2 3 1999m2w3 1999m2w3 |
9. | 1999w9 1999m2 4 1999m2w4 1999m2w4 |
|-----------------------------------------------|
10. | 1999w10 1999m3 1 1999m3w1 1999m3w1 |
+-----------------------------------------------+

Related

text processing to select date range

I have the input below and I want to select the lines whose dates fall within the next 2 weeks, or 3 weeks, and so on.
0029L5 08/19/2017 00:57:33
0182L5 08/19/2017 05:53:57
0183L5 02/17/2018 00:00:16
0091L5 10/19/2022 00:00:04
0045L5 07/27/2017 09:03:56
0059L5 08/14/2017 00:51:50
0100L5 08/20/2017 01:25:39
0111L5 08/21/2017 00:46:15
0128L5 08/21/2017 12:38:51
D00054 07/21/2017 09:01:19
So the desired output, if say I want 2 weeks from now, is:
0045L5 07/27/2017 09:03:56
D00054 07/21/2017 09:01:19
But if I want, say, 4 weeks, then the output should be:
0045L5 07/27/2017 09:03:56
0059L5 08/14/2017 00:51:50
D00054 07/21/2017 09:01:19
One way:
awk '{split($2,a,"/");split($3,b,":"); x=mktime(a[3]" "a[1]" "a[2]" "b[1]" "b[2]" "b[3]);y=systime();}x>y && x<(y+(n*7*24*60*60))' n=2 file
where n indicates the number of weeks (note that mktime and systime are not in POSIX awk; GNU awk provides them)
split($2,a,"/") => Split the 2nd column on the basis of / and store in array a
split($3,b,":") => Split the 3rd column on the basis of : and store in array b
mktime => gives the time in seconds
x contains the time in file in seconds
y contains the current time in seconds
Here's one solution using bash where file is the name of your file:
while read r; do dd=$(($(date -d "${r:6}" +%s) - $(date +%s))); echo $(($dd/(3600*24))); done < file
This will compute the date difference in seconds between the date in ${r:6} (substring of the current row) and today's date $(date +%s) and convert it to days.
To output only lines where the date difference is less than 2 weeks (1209600 seconds)
while read r; do dd=$(($(date -d "${r:6}" +%s) - $(date +%s))); if [ "$dd" -lt 1209600 ]; then echo $r; fi; done < file
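As a small variation on the above (a sketch only, still assuming GNU date with -d, and adding the same lower bound the awk answer uses), the window length can be made a parameter:
weeks=4                                   # 2, 3, 4, ... weeks from now
limit=$(( weeks * 7 * 24 * 60 * 60 ))     # window size in seconds
now=$(date +%s)
while read -r r; do
    dd=$(( $(date -d "${r:6}" +%s) - now ))   # seconds from now until the line's date
    if [ "$dd" -gt 0 ] && [ "$dd" -lt "$limit" ]; then echo "$r"; fi
done < file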
This works fine. Please let me know if anybody has another, simpler solution for AIX.
awk '{split($2,a,"/");split($3,b,":"); print $1,b[3],b[2],b[1],a[2],a[1],a[3]}' /tmp/TLD_1 | head -10 | while read media sec min hour day mon year; do month=$((10#$mon-1)); expiry=$(perl -e 'use Time::Local; print timegm(@ARGV[0,1,2,3], $ARGV[4], $ARGV[5]), "\n";' $sec $min $hour $day $month $year); current=$(date +%s); twoweeks=$(($current + (2*7*24*60*60))); if [ "$expiry" -gt "$current" -a "$expiry" -lt "$twoweeks" ]; then echo "$media $mon/$day/$year $hour:$min:$sec"; fi; done
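For anyone reading later, here is the same pipeline broken across lines purely for readability; it is the identical logic, not a new method (the head -10 from the one-liner is kept):
awk '{ split($2, a, "/"); split($3, b, ":")
       print $1, b[3], b[2], b[1], a[2], a[1], a[3] }' /tmp/TLD_1 |
head -10 |
while read media sec min hour day mon year; do
    month=$(( 10#$mon - 1 ))          # Time::Local months run 0-11
    expiry=$(perl -e 'use Time::Local; print timegm(@ARGV[0,1,2,3], $ARGV[4], $ARGV[5]);' \
             "$sec" "$min" "$hour" "$day" "$month" "$year")
    current=$(date +%s)
    twoweeks=$(( current + 2*7*24*60*60 ))
    if [ "$expiry" -gt "$current" ] && [ "$expiry" -lt "$twoweeks" ]; then
        echo "$media $mon/$day/$year $hour:$min:$sec"
    fi
done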

How can I fit 4 items into an RDF triple?

I have this table, representing the refugee population for each country in each year.
| Country | 2000 | 2001 | 2002 | 2003 |
|---------|-------|-------|-------|-------|
| USA | 56213 | 67216 | 67233 | 12367 |
| Chile | 26137 | 12345 | 23345 | 21312 |
How can I make it clear in the RDF triple that the value is the population for that particular year? I cannot find any existing vocabulary to reuse. My idea is to coin my own URI with a local name such as year2000population, so that the statement would be:
dbo:USA :year2000population 56213 ;
:year2001population 67216 .
But I'm not happy with this solution, it seems like it is wrong to me.
You can use a pattern for n-ary relationships, which basically means you introduce an anonymous individual for each row in your table and attach each property to that individual, rather than forcing everything into a single triple.
See https://www.w3.org/TR/swbp-n-aryRelations/ for details of the pattern.
Here is a concrete example of reifying the property (à la n-ary relationships) using your data. The basic technique is to create an object representing the data that belongs together, then refer to it with an object property, :popyear in the example:
:py2000USA :population 56213 ;
:year 2000 .
:py2001USA :population 67216 ;
:year 2001 .
:py2002USA :population 67233 ;
:year 2002 .
:py2003USA :population 12367 ;
:year 2003 .
dbo:USA :popyear :py2000USA ;
:popyear :py2001USA ;
:popyear :py2002USA ;
:popyear :py2003USA .
:py2000Chile :population 26137 ;
:year 2000 .
:py2001Chile :population 12345 ;
:year 2001 .
:py2002Chile :population 23345 ;
:year 2002 .
:py2003Chile :population 21312 ;
:year 2003 .
dbo:Chile :popyear :py2000Chile ;
:popyear :py2001Chile ;
:popyear :py2002Chile ;
:popyear :py2003Chile .
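If it helps, the reified data remain easy to query; a rough SPARQL sketch (the prefix declarations for : and dbo: are assumed and not shown) would be:
SELECT ?country ?year ?population
WHERE {
  ?country :popyear ?py .          # e.g. dbo:USA or dbo:Chile
  ?py :year ?year ;
      :population ?population .
}
ORDER BY ?country ?year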

Boolean function confusion

I'm trying to do the K-map for F(A,B,C,D) = A'B'C'D' + AC'D' + B'CD' + A'BCD + BC'D. I'm getting a little confused because not all the variable groupings have the same number of variables: some have 4 and some have 3. Is this equivalent to F(A,B,C,D) = F(0,2,4,5,7)? I don't know if you have to do something extra if there's a variable missing, like in the 2nd grouping (AC'D') there's no B. Do we have to do something to compensate for the missing variable, or is this just 4?
I'm not sure where those numbers 0,4,2,5,7 are coming from; a Karnaugh map (assuming that's what you meant) simply specifies the truth output for given inputs.
If a variable is missing from a term, then it has no effect on the outcome, so either of its two possible values gives the same truth output. So, in essence, the following two expressions are identical:
AC'D' <=> A(B)C'D' + A(B')C'D'
If more than one variable is missing, then you simply allow for more than two possibilities (2^n, where n is the number of missing variables). So A on its own would expand to:
ABCD + ABCD' + ABC'D + ABC'D' + AB'CD + AB'CD' + AB'C'D + AB'C'D'
(A matched against the 2^3 = 8 possibilities for the B, C and D variables).
Hence the map for your particular function:
A'B'C'D' + AC'D' + B'CD' + A'BCD + BC'D
would be, for term one A'B'C'D':
AB:00 01 10 11
CD:00 T . . .
01 . . . .
10 . . . .
11 . . . .
or'ed with term two AC'D', equivalent to ABC'D' + AB'C'D':
AB:00 01 10 11
CD:00 . . T T
01 . . . .
10 . . . .
11 . . . .
or'ed with term three B'CD', which expands to AB'CD' + A'B'CD':
AB:00 01 10 11
CD:00 . . . .
01 . . . .
10 T . T .
11 . . . .
or'ed with term four BC'D, equal to ABC'D + A'BC'D:
AB:00 01 10 11
CD:00 . . . .
01 . T . T
10 . . . .
11 . . . .
and, finally, or'ed with term five A'BCD, which needs no expansion since all four variables are present:
AB:00 01 10 11
CD:00 . . . .
01 . . . .
10 . . . .
11 . T . .
Combining all these gives you:
AB:00 01 10 11
CD:00 T . T T
01 . T . T
10 T . T .
11 . T . .
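As an aside on the F(0,2,4,5,7) part of the question: reading the T cells off this combined map (my own tally, worth double-checking) gives minterms 0, 2, 5, 7, 8, 10, 12 and 13, so in sum-of-minterms form the function is
F(A,B,C,D) = Σ(0, 2, 5, 7, 8, 10, 12, 13)
For example, AC'D' alone contributes ABC'D' (binary 1100 = 12) and AB'C'D' (binary 1000 = 8).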
A term with 3 variables just means that it covers 2 cells in the Karnaugh map.
AB 00 01 11 10
CD
00 XX YYYYYY
01
11
10
So XX is A'B'C'D' and YYYYYY is AC'D'.
The point of AC'D' is that the value of B is irrelevant, so it can be 1 or 0, which is why the term covers 2 cells.
Good luck
Where a variable is missing from a term, that is sometimes called a don't-care condition: if you are making a truth table for the expression, that variable can take either 0 or 1 without changing the term's value, so you can reduce the expression accordingly or mark the variable with an X (cross).

date and time strings cannot be read correctly in SAS

I have the following sample data to read into SAS
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
However, there is a strange problem bothering me.
Here is my first try.
data test;
informat DT yymmdd10.
TM $TIME8.
orig $3.
;
format DT yymmddd10.
TM TIME8.
orig $3.
;
input
@1 DT_temp
@11 TM_temp
@19 orig
;
datalines;
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
run;
The result shows
DT TM orig
. . NYA
. . SNW
. . NYA
. . NSL
. . DEN
. . PHX
. . FTW
. . HOB
. . HGR
. . RCH
. . NAS
This means the date and time are not read correctly. A workaround I have right now is to read everything as strings first and then convert them to a date and a time respectively.
data test;
informat DT_temp $10.
TM_temp $8.
orig $3.
;
format DT yymmddd10.
TM TIME8.
orig $3.
;
input
@1 DT_temp
@11 TM_temp
@19 orig
;
DT=input(strip(DT_temp),yymmdd10.);
TM=input(strip(TM_temp),time8.);
drop DT_temp TM_temp;
datalines;
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
run;
In this way, everything gets the correct format.
orig DT TM
NYA 2012-05-03 17:36:00
SNW 2012-05-04 10:29:00
NYA 2012-05-24 18:45:00
NSL 2012-05-29 22:24:00
DEN 2012-05-31 7:26:00
PHX 2012-05-26 6:10:00
FTW 2012-05-02 2:30:00
HOB 2012-05-02 20:45:00
HGR 2012-05-01 3:01:00
RCH 2012-05-01 20:30:00
NAS 2012-05-11 12:00:00
Basically, these two methods use the same informats. I was wondering why the first method does not work. I would appreciate any kind of help. Thank you very much.
Your "first try" code has a couple errors, but I'm guessing they were introduced while writing the question.
Because you are using column-oriented input, you need to specify the format to be used for each variable. Here is a corrected version:
data test;
informat DT yymmdd10.
TM TIME8.
orig $3.
;
format DT yymmddd10.
TM TIME8.
orig $3.
;
input
@1 DT yymmdd10.
@11 TM TIME8.
@19 orig $3.
;
datalines;
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
run;
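As a small follow-up (my own addition, not part of the answer): if a single datetime value is ultimately wanted rather than separate date and time variables, the two can be combined with dhms():
data test2;
   set test;
   * TM is seconds since midnight, so pass it as the seconds argument of dhms(date, hour, minute, second);
   DTTM = dhms(DT, 0, 0, TM);
   format DTTM datetime20.;
run;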

copy and replacing from file1 to file specific line

I have two files, file1.traj and file2.traj. Both files contain the same kind of data, arranged in the same format.
The first line of both files is a comment. At line 7843 of both files there is a Cartesian coordinate X, Y and Z, and at line 15685
there is another Cartesian coordinate X, Y and Z. There are 7841 lines between the two coordinate lines.
What I want to do is copy ONLY the X Y Z coordinates from file1.traj to file2.traj throughout the whole file.
I tried to use the paste command but I can't apply it to specified lines alone.
Here I show the data format in the file. I added the line numbers only for clarity.
line.1 trajectory generated by ptraj
line.2 5.844 4.178 7.821 6.423 4.054 8.578 6.606 4.907 6.827 7.557
line.3 4.385 6.722 6.877 6.384 7.283 5.950 6.884 7.565 7.668 6.282
line.2 8.474 7.721 7.127 8.928 7.628 7.205 6.259 8.589 6.712 6.110
line.3 7.712 8.602 6.643 8.151 8.654 7.495 6.940 7.183 4.871 6.108
line.4 7.887 4.864 7.755 7.814 3.754 8.697 7.267 3.724 7.081 7.633
line.5 2.478 6.246 8.089 2.604 8.026 8.853 3.943 6.623 5.754 4.529
.
.
.
. 1.516 41.749 54.260 0.108 41.176 54.536 -0.626 40.627 53.818 -0.303
. 41.920 42.179 3.556 3.251 41.623 3.530 2.472 42.558 2.678 3.304
. 44.723 1.496 5.937 44.339 1.355 6.803 44.866 0.614 5.593 52.401
line.7842 86.323 2.974 52.385 85.816 3.785 51.879 85.808 2.359
line.7843 104.140 159.533 88.303
line.7844 4.792 5.052 8.317 5.279 4.463 8.898 5.663 5.341 7.220 6.267
line.7845 4.438 7.137 6.477 6.566 7.627 5.857 7.407 7.936 7.301 6.170
. 8.741 7.647 7.020 9.023 7.315 7.107 6.475 8.171 6.435 6.413
. 7.823 8.416 6.704 8.208 8.473 7.582 6.560 7.126 5.141 5.816
.
.
.
. 52.050 7.905 42.026 38.561 1.747 39.847 39.375 2.235 39.972 38.634
. 1.382 38.965 0.810 0.477 39.394 0.717 -0.349 39.867 0.222 1.081
. 39.847 43.073 5.033 2.756 43.387 5.428 1.942 42.256 4.598 2.511
line.15683 47.302 4.261 7.071 47.801 4.632 7.799 47.256 4.968 6.428 54.279
line.15684 0.498 3.477 53.964 0.612 2.580 53.500 0.612 4.021
line.15685 104.140 159.533 88.303
line.15686 4.970 4.868 7.979 5.342 4.250 8.612 5.988 5.450 7.184 6.903
line.15687 4.861 7.246 6.381 6.921 7.550 5.526 7.597 7.536 6.953 7.009
.
.
Is it possible to do this with awk or sed?
Using awk you can do it. I don't think it's a perfect answer, but here is a trick.
sushanth@mymachine:~$ cat file1.txt
104.140 159.533 88.303
5.844 4.178 7.821 6.423 4.054 8.578 6.606 4.907 6.827 7.557
4.385 6.722 6.877 6.384 7.283 5.950 6.884 7.565 7.668 6.282
8.474 7.721 7.127 8.928 7.628 7.205 6.259 8.589 6.712 6.110
104.140 159.533 88.303
7.712 8.602 6.643 8.151 8.654 7.495 6.940 7.183 4.871 6.108
7.887 4.864 7.755 7.814 3.754 8.697 7.267 3.724 7.081 7.633
2.478 6.246 8.089 2.604 8.026 8.853 3.943 6.623 5.754 4.529
Here I have used awk to print only the lines shorter than 35 characters, i.e. the coordinate lines (redirect with > coords.txt if you want to keep them):
sushanth@mymachine:~$ awk 'length < 35' file1.txt
104.140 159.533 88.303
104.140 159.533 88.303
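Building on that extraction, a rough two-pass awk sketch of the actual replacement the question asks for might look like the following. It assumes the coordinate lines are the only lines with exactly three fields (true of the sample shown) and is untested on the real .traj files:
awk 'NR == FNR    { if (NF == 3) coord[FNR] = $0; next }   # pass 1 (file1.traj): remember coordinate lines by line number
     FNR in coord { print coord[FNR]; next }               # pass 2 (file2.traj): substitute them at the same positions
                  { print }' file1.traj file2.traj > file2.new
If the result looks right, replace the original with: mv file2.new file2.traj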