SED Multiple search items - sed

SED question
I need to print any lines that have contain 11 for November or 12 for December.
My two questions are:
How do I search for more than one item I.E. print lines with the value 11 and 12?
How do I tell the search to look in column 4 which has the dates?
What I have so far:
sed -n -e '/11/,/12/p' datebook
File datebook:
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
Igor Chevsky:385-375-8395:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Jennifer Cowan:548-834-2348:583 Laurel Ave., Kingsville, TX 83745:10/1/35:58900
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
Karen Evich:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Ephram Hardy:293-259-5395:235 CarltonLane, Joliet, IL 73858:8/12/20:56700
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
Lesley Kirstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
William Kopf:846-836-2837:6937 Ware Road, Milton, PA 93756:9/21/46:43500
Sir Lancelot:837-835-8257:474 Camelot Boulevard, Bath, WY 28356:5/13/69:24500
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Arthur Putie:923-835-8745:23 Wimp Lane, Kensington, DL 38758:8/31/69:126000
Popeye Sailor:156-454-3322:945 Bluto Street, Anywhere, USA 29358:3/19/35:22350
Jose Santiago:385-898-8357:38 Fife Way, Abilene, TX 39673:1/5/58:95600
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale, CA 94087:5/19/66:34200
Yukio Takeshida:387-827-1095:13 Uno Lane, Ashville, NC 23556:7/1/29:57000
Vinh Tranh:438-910-7449:8235 Maple Street, Wilmington, VM 29085:9/23/63:68900

How do I tell the search to look in column 4 which has the dates?
This is an indication that you should use awk because sed doesn't have the concept of fields. An awk solution would be
awk -v FS=":" '$4 ~ /^1[12]\/.*/{print}' datebook
Output
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
Deciphering the solution
FS=":" sets the the field/column delimiter to colon.
$4 represents the column four in your input file which is the date in the format mm/dd/yy
The ~ in $4 ~ /^1[12]\/.*/ means we do a regex match in which
^ represents the beginning of the string
[12] can match either one or two.
Since the regex part itself is delimited by / you need to escape any literal / as in \/

It appears that you want to select lines where the first characters after the third colon on the line are 11/ or 12/ (since the data formats appear to be pre-Y2K-style US-format dates with mm/dd/yy notation). So you write:
$ sed -n '/^\([^:]*:\)\{3\}1[12]\//p' datebook
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
James Ikeda:834-938-8376:23445 Aster Ave., Allentown, NJ 83745:12/1/38:45000
Barbara Kertz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/1/46:268500
$
The ^ matches at the start of a line; the \([^:]*]:\) part looks for a series of zero or more non-colons followed by a colon; the \{3\} requires 3 of them; the 1[12]\/ demands 11/ or 12/ after that; the p prints.
I observe that the initial statement says 'contain 11 for November or 12 for December', but your first numbered question says 'value 11 and 12'. These are contradictory; a given date field can only start with one or the other, not both. I've assumed that 'or' is what you intended.

Related

Add a label indicating duplicate names [duplicate]

This question already has answers here:
Add double quotation on duplicated name
(4 answers)
Closed 5 years ago.
I tried to use
sed 's/ */:/' file | awk -F: '{ if (arr[$1":"$2]) print "\""$1"\":"$2; else { arr[$1":"$2]++; print $0 }}'
but cannot get ideal output. Thanks.
The following is the file information and the desired output that I want.
Text File:
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
Karen Evich:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Karen Evich:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Fred Fardbarkle:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Paco Gutierrez:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Jesse Neal:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Required output: Add stars indicating duplicated names
Jon DeLoach:408-253-3122:123 Park St., San Jose, CA 04086:7/25/53:85100
*Karen Evich*:284-758-2857:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100
*Karen Evich*:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
*Karen Evich*:284-758-2867:23 Edgecliff Place, Lincoln, NB 92743:11/3/35:58200
*Fred Fardbarkle*:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
*Fred Fardbarkle*:674-843-1385:20 Parak Lane, DeLuth, MN 23850:4/12/23:780900
Lori Gortz:327-832-5728:3465 Mirlo Street, Peabody, MA 34756:10/2/65:35200
*Paco Gutierrez*:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
*Paco Gutierrez*:835-365-1284:454 Easy Street, Decatur, IL 75732:2/28/53:123500
*Jesse Neal*:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
*Jesse Neal*:408-233-8971:45 Rose Terrace, San Francisco, CA 92303:2/3/36:25000
Zippy Pinhead:834-823-8319:2356 Bizarro Ave., Farmount, IL 84357:1/1/67:89500
Give a test to this. Seems to work ok.
$ awk -F":" 'NR==FNR{a[$1]++;next}(a[$1]>1){sub($1,"*" $1 "*")}1' file1 file1
Explanation:
This code reads the same file twice. This maybe has a performance penalty depending on the filesize.
-F":" : Global Input Fields Delimiter is defined as :
NR==FNR{a[$1]++;next} : The code in { } is executed when NR==FNR = the first file is read by awk
a[$1]++ : Creates an array a with index $1 and value ++ => +1 for each $1 found. So for record 1 we have a[Jon DeLoach]=1. For Record2 a[Karen Evich]=1, for record 3 a[Karen Evich]++ => 2,etc
next : instructs awk to go to the next record and skip the rest script.
(a[$1]>1){sub($1,"*" $1 "*")}1 : This condition & action is performed on the second file. For each a[$1] found in second file with a value >1 (as has been finalized when the first file finished), we insert * around $1 using awk sub function. sub function applies substitution directly to $0 = Whole record.
1 : prints the whole record of the second file.

Mongodb find documents

I have a MongoDB instance which contains a translation of texts:
{
"_id" : ObjectId("57c68ba415f4d42b6ecd9ee7"),
"en" : "Adana (pronounced [aˈda.na]) is a major city in southern Turkey. The city is situated on the Seyhan river, 35 km (22 mi) inland from the Mediterranean Sea, in south-central Anatolia. It is the administrative seat of the Adana Province and has a population of 1.7 million,[1] making it the fifth most populous city in Turkey. Adana-Mersin polycentric metropolitan area, with a population of 3 million, stretches over 70 km (43 mi) east-west and 25 km (16 mi) north-south; encompassing the cities of Mersin, Tarsus and Adana.",
"sw" : "Adana (Kigiriki Άδανα) ni mji mkubwa katika nchi ya Uturuki. Kwa mujibu wa sensa iliyofanyika mwaka wa 2000, mji una wakazi wapatao 1,130,710 waishio huko,[2] na kuufanya kuwa mmoja kati ya miji mitano mikubwa ya Uturuku (baada ya Istanbul, Ankara, İzmir na Bursa). Mwaka wa 2006 mji wa Adana umekadiriwa kufikia iadadi ya wakazi wapatao 1,271,894. Huu ndiyo mji mkuu wa Mkoa wa Adana."
}
{
"_id" : ObjectId("57c68ba915f4d42b6ecd9eea"),
"en" : "Addis Ababa or Addis Abeba (the spelling used by the official Ethiopian Mapping Authority),(Amharic: አዲስ አበባ? Addis Abäba IPA: [adˈdis ˈabəba] ( listen), \"new flower\"; Oromo: Finfinne,[3][4] [fɪnˈfɪ́n.nɛ́] \"Natural Spring(s)\"), is the capital and largest city of Ethiopia. Finfinne is its Oromo name. It has a population of 3,384,569 according to the 2007 population census, with annual growth rate of 3.8%. This number has been increased from the originally published 2,738,248 figure and appears to be still largely underestimated.[2][5]",
"sw" : "Addis Ababa (pia Addis Abeba; kwa Kiamhara አዲስ አበባ, \"Ua Jipya\"; kwa Kioromo Finfinne) ni mji mkuu wa Ethiopia na wa Umoja wa Afrika."
}
{
"_id" : ObjectId("57c68bab15f4d42b6ecd9eec"),
"en" : "Adelaide of Italy (931 – 16 December 999), also called Adelaide of Burgundy, was the second wife of Holy Roman Emperor Otto the Great[2] and was crowned as the Holy Roman Empress with him by Pope John XII in Rome on February 2, 962. Empress Adelaide was perhaps the most prominent European woman of the 10th century; she was regent of the Holy Roman Empire as the guardian of her grandson in 991-995.[2]",
"sw" : "Adelaide wa Italia (takriban 931 – 16 Desemba, 999) alikuwa binti wa Rudolf II, mfalme wa Burgundia. Kwanza aliolewa na Lothar, mfalme wa Italia. Alipofariki Lothar, Adelaide aliolewa na Otto I, mfalme wa Ujerumani. Aliishi maisha matakatifu. Sikukuu yake ni 16 Desemba."
}
What I would like to do is to select one specific record. For example I expect to select the last record by doing this:
db.wiki.find({"sw": "Adelaide wa Italia"}).pretty();
But the mongo shell returns nothing.
Indeed, I know that I can create an index and do something like:
db.wiki.find({$text: {$search: "\"Adelaide wa Italia\""}}).pretty();
which indeed returns the record as expected.
What am I doing wrong in the non-index searching please?
In this case you should use search with regex:
db.wiki.find({"sw": /Adelaide wa Italia/}).pretty();
The way you are doing it by:
db.wiki.find({"sw": "Adelaide wa Italia"}).pretty();
you simply tell Mongo to return you all documents where sw is equal to Adelaide wa Italia but you want to get all documents which contains this phrase in sw field instead.

SAS: Unable to read dates in the program. How to fix it?

Here's my code. I am unable to read the dates from the input, it keeps giving me incorrect format, I tried changing a few times to mmddyy10. mmddyy8. and others but it still does not read them in correctly.
data master_patients;
infile datalines;
input account_number name $8-16 address $17-34 date MMDDYYYY10. gender $1.
insurance_code $49-51 updated_date mmddyyyy10.;
datalines;
620135 Smith 234 Aspen St. 12-21-1975 m CBC 02-16-1998
645722 Miyamoto 65 3rd Ave. 04-03-1936 f MCR 05-30-1999
645739 Jensvold 505 Glendale Ave. 06-15-1960 f HLT 09-23-1993
874329 Kazoyan 76-C La Vista . . MCD 01-15-2003
;
proc print data=master_patients;
run;
Could you please point out where I am going wrong? Thanks for any help.
I recommend a specific informat, rather than anydtdte though it helps you get started. It will ensure that your data is correct.
data master_patients;
infile datalines;
informat date updated_date mmddyy10.;
format date updated_date date9.;
input account_number name $ 8-16 address $ 17-34 date gender $1.
insurance_code $ 49-51 updated_date;
datalines;
620135 Smith 234 Aspen St. 12-21-1975 m CBC 02-16-1998
645722 Miyamoto 65 3rd Ave. 04-03-1936 f MCR 05-30-1999
645739 Jensvold 505 Glendale Ave. 06-15-1960 f HLT 09-23-1993
874329 Kazoyan 76-C La Vista . . MCD 01-15-2003
;
run;
There are two main problems. First the informat name does not have 4 Y's in it. Just 2. Second you don't have the column pointer in the right place when you are trying to read 10 characters as a date so that you are getting a blank and then the first 9 characters of the date. SAS cannot represents dates in the second or third century AD. Try MDY(12,21,197) and see what happens.
data master_patients;
infile datalines firstobs=2;
input account_number name $8-16 address $17-34 #36 date MMDDYY10.
gender $1. insurance_code $49-51 #53 updated_date mmddyy10.
;
datalines;
----+----1----+----2----+----3----+----4----+----5----+----6----+
620135 Smith 234 Aspen St. 12-21-1975 m CBC 02-16-1998
645722 Miyamoto 65 3rd Ave. 04-03-1936 f MCR 05-30-1999
645739 Jensvold 505 Glendale Ave. 06-15-1960 f HLT 09-23-1993
874329 Kazoyan 76-C La Vista . . MCD 01-15-2003
;
proc print data=master_patients;
run;
For modified list input for this problem.Just add ":" between variable name and informat.
data master_patients;
infile datalines;
input account_number name $8-16 address $17-34 date : mmddyy10. gender $1.
insurance_code $49-51 updated_date : mmddyy10.;
datalines;
620135 Smith 234 Aspen St. 12-21-1975 m CBC 02-16-1998
645722 Miyamoto 65 3rd Ave. 04-03-1936 f MCR 05-30-1999
645739 Jensvold 505 Glendale Ave. 06-15-1960 f HLT 09-23-1993
874329 Kazoyan 76-C La Vista . . MCD 01-15-2003
;
proc print data=master_patients;
run;
Please note if you don't add ":" , just change mmddyy10. to anydtdte. , the data read into dataset may Not correct.

subtraction between two columns with awk or sed

I have some text files. I need to do the subtraction between second and fourth columns in each file. The subtracted values should print to the original files as fifth column. How can I do this with awk or sed?
HII 62.0 HII 35.1
MEE 21.3 MEE 21.3
GLL 42.3 GLL 18.5
ASS 105.9 ASS 105.9
RRG 65.6
GLL 48.3
SES 83.5
Desired output
HII 62.0 HII 35.1 26.9
MEE 21.3 MEE 21.3 0
GLL 42.3 GLL 18.5 23.8
ASS 105.9 ASS 105.9 0
RRG 65.6
GLL 48.3
SES 83.5
If the third and fourth columns are blank, no need to subtract.
awk 'NF == 2 { print }
NF == 4 { print $0, $2 - $4 }'
That could all be fitted onto one line, but it clearer what it is doing when it is spread over two lines.
If you want more control over the format, you can use printf() instead of just print.
After sanitizing trailing spaces in the data, it produces:
HII 62.0 HII 35.1 26.9
MEE 21.3 MEE 21.3 0
GLL 42.3 GLL 18.5 23.8
ASS 105.9 ASS 105.9 0
RRG 65.6
GLL 48.3
SES 83.5
This might work for you (GNU sed & Bash):
sed -ri '/^\S+\s+(\S+)\s+\S+\s+(\S+)/s//echo "&\t$(echo \1-\2|bc)"/e' file

Insert missing comma (perl)

Line:
výkonná produkce: Martin Scorsese, Timothy Van Patten střih: Kate Sanford, Tim Streeto herec(s): Hong Eun Hee (Mi Geum (soţia lui Sang Ok)) Han Hee (Jang Mi Ryung) Kim Yoo Mi (Yoon Chae Yeon) Jae-ho Song (Im Bong Hoek (tatăl lui Sang Ok))
I want insert, with perl, missing comma like this:
výkonná produkce: Martin Scorsese, Timothy Van Patten střih: Kate Sanford, Tim Streeto herec(s): Hong Eun Hee (Mi Geum (soţia lui Sang Ok)), Han Hee (Jang Mi Ryung), Kim Yoo Mi (Yoon Chae Yeon), Jae-ho Song (Im Bong Hoek (tatăl lui Sang Ok))
Thanks.
Without any details on constraints, we have to guess what you need. I'm guessing this:
s/\) /), /g