Why is sed returning more characters than requested?

In part of my script I am trying to generate a list of the years and months in which files were submitted. Since each filename contains a timestamp, I should be able to cut the filenames after the month and then filter them with sort and uniq. However, sed produces an outlier for one of the files.
I am using this command sequence:
ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
This works most of the time, but in some cases it outputs much more of the timestamp:
$ ls
service-parent-20181119092630.json service-parent-20181123134132.json service-parent-20181202124532.json service-parent-20190121091830.json service-parent-20190125124209.json
service-parent-20181119101003.json service-parent-20181126104300.json service-parent-20181211095939.json service-parent-20190121092453.json service-parent-20190128163539.json
service-parent-20181120095850.json service-parent-20181127083441.json service-parent-20190107035508.json service-parent-20190122093608.json
service-parent-20181120104838.json service-parent-20181129155835.json service-parent-20190107042234.json service-parent-20190122115053.json
$ ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
service-parent-201811
service-parent-201811201048
service-parent-201812
service-parent-201901
I have also tried this variation but the second output line is still returned:
ls -1 service*json | sed -e "s|\(.*201.\{3\}\).*json$|\1|g" | sort |uniq
Can somebody explain why service-parent-201811201048 is returned with more than the 3 requested characters after 201?
Thanks.

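.* is greedy, so it matches as much as it can while still leaving room for 201...; in service-parent-20181120104838.json the last place that fits is 201048 (day 20, then 10:48), so service-parent-201811201048 is what gets captured.
You might try ls -1 service*json | sed -e "s|\(.*-201...\).*json$|\1|g" | sort |uniq, which requires a dash (-) immediately before 201....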
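To see the greedy behaviour in isolation, the two patterns can be run on the one problematic name. A small sketch, assuming a POSIX shell and any sed; the variable name is only for illustration:

```shell
name=service-parent-20181120104838.json

# Greedy: .* extends as far right as possible, so the LAST "201..." wins.
# Prints: service-parent-201811201048
printf '%s\n' "$name" | sed -e 's|\(.*201...\).*json$|\1|'

# With a dash required before 201 there is only one place the group can end.
# Prints: service-parent-201811
printf '%s\n' "$name" | sed -e 's|\(.*-201...\).*json$|\1|'
```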

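Parsing the output of ls is not recommended, because file names can contain spaces, newlines and other special characters. Please try this instead (the <<< here-string requires bash, ksh or zsh):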
for i in service*json; do
sed -e "s|^\(service-.*-201[0-9]\{3\}\).*json$|\1|g" <<< "$i"
done | sort | uniq
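If every name really ends in a 14-digit YYYYMMDDHHMMSS timestamp followed by .json (an assumption based on the listing in the question), the shell can do the cutting itself, with no sed and no here-string. A sketch in POSIX sh; list_months is a hypothetical helper name:

```shell
# Hypothetical helper: print each distinct "service-...-YYYYMM" prefix once.
# Assumes names end in DDHHMMSS.json, i.e. exactly 8 characters after the month.
list_months() {
  for f in service*json; do
    [ -e "$f" ] || continue            # glob matched nothing: skip the literal pattern
    printf '%s\n' "${f%????????.json}" # strip DDHHMMSS.json from the end
  done | sort -u
}
```

Run list_months from the directory holding the files. Because each ? matches exactly one character and the match is anchored at the end of the name, the greedy-match problem cannot arise here.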

Your problem is explained at https://stackoverflow.com/a/54565973/1745001 (i.e. .* is greedy, so it extends to the last 201... on the line), but try this:
$ ls | sed -E 's/(-[0-9]{6}).*/\1/' | sort -u
service-parent-201811
service-parent-201812
service-parent-201901
The above requires a sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed.
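For a sed without -E, the same leftmost match can be written as a POSIX basic regular expression, where the interval is spelled \{6\}. A sketch that also feeds the names from a glob instead of ls:

```shell
# BRE version: \( \) groups and \{6\} is the POSIX interval syntax.
# The first "-" followed by six digits is "-YYYYMM"; everything after it is dropped.
printf '%s\n' service*json | sed 's/\(-[0-9]\{6\}\).*/\1/' | sort -u
```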

Related

How to remove after second period in a string using sed

In my script I have a version number, such as 15.03.2, stored in the variable $STRING. These numbers always change. I want to strip it down to 15.03 (or whatever it will be next time).
How do I remove everything after the second . using sed?
Something like:
$(echo "$STRING" | sed "s/\.^$\.//")
(I don't know what ^, $ and others do, but they look related, so I just guessed.)
I think the better tool here is cut:
echo '15.03.2' | cut -d . -f -2
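The shell can also do this without starting cut or sed, using only POSIX parameter expansion. A minimal sketch; upto_second_period is a hypothetical helper name:

```shell
# Hypothetical helper: print $1 up to (not including) its second period.
upto_second_period() {
  rest=${1#*.*.}                 # remove the shortest prefix containing two periods
  if [ "$rest" = "$1" ]; then    # fewer than two periods: nothing to cut
    printf '%s\n' "$1"
  else
    kept=${1%".$rest"}           # quoted, so $rest is matched literally
    printf '%s\n' "$kept"
  fi
}
```

For example, STRING=$(upto_second_period "$STRING") turns 15.03.2 into 15.03.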
This might work for you (GNU sed):
sed 's/\.[^.]*//2g' file
This removes the second and every later occurrence of a period followed by zero or more non-period characters.
$ echo '15.03.2' | sed 's/\([^.]*\.[^.]*\)\..*/\1/'
15.03
More generally, to keep the first N periods and cut everything from the next one onwards (N = 2, 3 and 4 below):
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){2}[^.]*)\..*/\1/'
15.03.2
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){3}[^.]*)\..*/\1/'
15.03.2.3
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){4}[^.]*)\..*/\1/'
15.03.2.3.4
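Since the three commands above differ only in the interval count, N can be passed in as a parameter. A sketch, assuming a sed with -E (GNU or BSD) and a positive integer N; keep_periods is a hypothetical name:

```shell
# Hypothetical helper: keep the first N periods of $2 and cut at the next one.
# $1 must be a positive integer; it is substituted straight into the regex.
keep_periods() {
  printf '%s\n' "$2" | sed -E "s/(([^.]*\.){$1}[^.]*)\..*/\1/"
}
```

keep_periods 1 "$STRING" gives the same result as the two-field commands above.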

sed with filename from pipe

In a folder I have many files with several parameters in their filenames, e.g. (with just one parameter) file_a1.0.txt, file_a1.2.txt, etc.
These are generated by a C++ program, and I need to take the most recently generated one. I don't know a priori what the value of this parameter will be when the program terminates. After that I need to copy the 2nd line of this last file.
To copy the 2nd line of any file, I know that this sed command works:
sed -n 2p filename
I also know how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
How do I combine these two operations? Presumably I can pipe the output of the 2nd command to sed, but I don't know how to pass the filename from the pipe as the input file of that sed command.
You can use this:
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
(OR)
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)
You can put a command in backticks to substitute its output at that point in another command, so:
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively, and preferably because $(...) is easier to nest:
sed -n 2p $(ls -rt name*.txt | tail -1)
-r in ls is reverse order.
-r, --reverse
reverse order while sorting
But combining it with tail -1 is not a good idea.
With the change below (head -1, and no -r option to ls), head can exit as soon as it has the first name instead of tail waiting for the whole listing, although ls itself still has to read and sort every entry before printing anything:
sed -n 2p $(ls -t1 name*.txt | head -1 )
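All of the above parse ls, which breaks on unusual file names. A sketch that picks the newest file with a glob and test's -nt operator instead (supported by bash, dash, ksh and busybox sh); second_line_of_newest is a hypothetical name:

```shell
# Hypothetical helper: print line 2 of the most recently modified file_a*.txt.
second_line_of_newest() {
  newest=
  for f in file_a*.txt; do
    [ -e "$f" ] || continue        # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f                    # f is newer than the best so far
    fi
  done
  [ -n "$newest" ] && sed -n 2p "$newest"
}
```

Run second_line_of_newest in the directory holding the files. When two files have the same modification time, the one the glob lists first is kept.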
I was looking for a similar solution: taking the file names from a pipe of grep results and feeding them to sed. I've copied my search-and-replace answer here; perhaps this example helps, as it calls sed for each of the names found in the pipe.
This command simply finds all the files:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while IFS= read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that weren't actually modified (sed -i rewrites every file it is given, whether or not anything matched). Feeding it only the grep results means only files containing the target text are touched, which may also improve speed.
Be sure to back up your files and test before using. File names containing newlines (and, without IFS=, leading or trailing whitespace) will not be handled correctly, since the loop reads one name per line.
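If GNU grep and GNU xargs are available, the file names can be passed NUL-terminated instead of one per line, which also copes with newlines in names: grep's -Z ends each name printed by -l with a NUL byte, and xargs -0 splits on NUL. A sketch; replace_foo_with_bar is a hypothetical wrapper, and -r (do not run sed if nothing matched) is a GNU xargs option:

```shell
# Hypothetical wrapper around the pipeline; run it from the directory to process.
replace_foo_with_bar() {
  # -r recurse, -i ignore case, -l list matching files, -Z NUL after each name
  grep -rilZ --exclude='this_shell.sh' foo ./* |
    xargs -0 -r sed -i 's/foo/bar/gi'   # GNU sed: in-place, case-insensitive
}
```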
FWIW, I had some problems using the tail method: it seems that the entire listing was generated before tail could pick out the last item.

Change sed line separator to NUL to act as "xargs -0" prefilter?

I'm running a command line like this:
filename_listing_command | xargs -0 action_command
Where filename_listing_command uses null bytes to separate the files -- this is what xargs -0 wants to consume.
The problem is that I want to filter out some of the files. Something like this:
filename_listing_command | sed -e '/\.py/!d' | xargs ac
but I need to use xargs -0.
How do I change the line separator that sed wants from newline to NUL?
If you've landed on this question looking for an answer and are using GNU sed 4.2.2 or later, sed now has a -z option that does exactly what the OP is asking for.
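With GNU sed's -z, the filter from the question can stay in sed. A small sketch in which list_names is a hypothetical stand-in for filename_listing_command, and printf only shows what xargs receives:

```shell
# Hypothetical stand-in for filename_listing_command: NUL-separated names.
list_names() { printf '%s\0' 'a b.py' 'notes.txt' 'run.py'; }

# GNU sed -z: each NUL-terminated name is one record, so $ anchors at its end.
# '/\.py$/!d' keeps only .py names, as the question's '/\.py/!d' intends.
list_names | sed -z '/\.py$/!d' | xargs -0 printf '<%s>\n'
```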
Pipe it through grep:
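filename_listing_command | grep -vzZ '\.py$' | xargs -0 action_command
The -z makes grep treat both its input and its output as NUL-terminated records, and -v inverts the match (excludes). -Z only changes how grep terminates the file names it prints (e.g. with -l), so it is harmless but not needed here.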
Edit:
Try this if you prefer to use sed:
filename_listing_command | sed 's/[^\x0]*\.py\x0//g' | xargs -0 action_command
If none of your file names contain newline, then it may be easier to read a solution using GNU Parallel:
filename_listing_command | grep -v '\.py$' | parallel ac
Learn more about GNU Parallel http://www.youtube.com/watch?v=OpaiGYxkSuQ
With help from Tom Hale and that answer, we have:
sed -nzE "s/^$PREFIX(.*)/\1/p"

Filter text based in a multiline match criteria

I have the following sed command, which I need to execute as a single line:
cat File | sed -n '
/NetworkName/ {
N
/\n.*ims3/ p
}' | sed -n 1p | awk -F"=" '{print $2}'
I need to execute the above command as a single line. Can anyone please help?
Assume that the contents of File are:
System.DomainName=shayam
System.Addresses=Fr6
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=AS
System.DomainName=ims5.com
System.DomainName=Ram
System.Addresses=Fr9
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims7.com
System.DomainName=mani
System.Addresses=Hello
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims3.com
After executing the command, you should get only Peer as the output. Can anyone please help me out?
You can use a single nawk command, and you can lose the useless cat:
nawk -F"=" '/NetworkName/{n=$2;getline;if($2~/ims3/){print n} }' file
You can use sed as well, as proposed by others, but I prefer less regex and less clutter.
The above saves the value of the network name in n, then gets the next line and checks its 2nd field against ims3. If it matches, it prints the value of n.
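A variant of the same idea without getline (whose return value the one-liner above does not check) carries the network name forward for exactly one line. A sketch in portable awk; network_for_ims3 is a hypothetical wrapper, and unlike the sed -n 1p pipeline it prints every match, not only the first:

```shell
# Hypothetical wrapper: print the NetworkName whose next line's value contains ims3.
network_for_ims3() {
  awk -F'=' '
    net != "" && $2 ~ /ims3/ { print net }       # previous line was a NetworkName
    { net = ($1 ~ /NetworkName$/) ? $2 : "" }   # remember it for one line only
  ' "$1"
}
```

With the sample data, network_for_ims3 File prints Peer.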
Put that code in a separate .sh file, and run it as your single-line command.
cat File | sed -n '/NetworkName/ { N; /\n.*ims3/ p }' | sed -n 1p | awk -F"=" '{print $2}'
Assuming that you want the network name for the domain ims3, this command line works without sed:
grep -B 1 ims3 File | head -n 1 | awk -F"=" '{print $2}'
So, you want the network name where the domain name on the following line includes 'ims3', and not the one where the following line includes 'ims7' (even though the network names in the example are the same).
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
This avoids abuse of felines, too (not to mention reducing the number of commands executed).
Tested on MacOS X 10.6.4, but there's no reason to think it won't work elsewhere too.
However, empirical evidence shows that Solaris sed is different from MacOS sed. It can all be done in one sed command, but it needs three lines:
sed -n '/NetworkName/{N
/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;}
}' File
Tested on Solaris 10.
You just need to put -e pretty much everywhere you'd break the command at a newline or have a semicolon. You don't need the extra call to sed or awk or cat.
sed -n -e '/NetworkName/ {' -e 'N' -e '/\n.*ims3/ s/[^\n]*=\(.*\).*/\1/P' -e '}' File

Extracting a string from a file name

My script takes a file name in the form R#TYPE.TXT (# is a number and TYPE is two or three characters).
I want my script to give me TYPE. What should I do to get it? I guess I need to use awk or sed.
I'm using /bin/sh (which is a requirement)
You can use awk:
$ echo R1CcC.TXT | awk '{sub(/.*[0-9]/,"");sub(".TXT","")}{print}'
CcC
or
$ echo R1CcC.TXT | awk '{gsub(/.*[0-9]|\.TXT$/,"");print}'
CcC
And if sed is really what you want:
$ echo R9XXX.TXT | sed 's/R[0-9]\(.*\)\.TXT/\1/'
XXX
I think this is what you are looking for.
$ echo R3cf.txt | sed "s/.[0-9]\(.*\)\..*/\1/"
cf
If TXT is always upper case and the filename always starts with R, you could do something like:
$ echo R3cf.TXT | sed "s/R[0-9]\(.*\)\.TXT/\1/"
cf
You can use just the shell (depending on what shell your /bin/sh is):
f=R9ABC.TXT
f="${f%.TXT}" # remove the extension
type="${f#R[0-9]}" # remove the first bit
echo "$type" # ==> ABC
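The answer above assumes a single digit after R. If # can be several digits, and assuming TYPE itself never contains a digit, a longest-prefix removal handles it. This needs a POSIX shell rather than the historic Bourne shell; get_type is a hypothetical name:

```shell
# Hypothetical helper, POSIX sh only: R<digits><TYPE>.TXT -> TYPE.
# Assumes TYPE contains no digits, so ##*[0-9] strips the whole number.
get_type() {
  t=${1%.TXT}      # drop the extension
  t=${t#R}         # drop the leading R
  t=${t##*[0-9]}   # drop everything up to and including the last digit
  printf '%s\n' "$t"
}
```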