Printing a line in a certain percentile of a large text file - sed

I am looking for a single-line command to print a line at a certain percentile of a large text file. My preferred solution is something based on sed, wc -l, and/or head/tail, as I already know how to do it with awk and wc -l. To make it clearer: if my file has 1K lines of text, I need to print, for example, the (95% * 1K)th line of that file.

In bash:
head -n "$(echo "$(wc -l < file) * 95 / 100" | bc)" file | tail -n 1

head -n "$(wc -l < file | awk '{print int(0.95*$1)}')" file | tail -n 1
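A sketch of the same idea with sed doing the line selection (the file name and its contents are invented for the demo; integer arithmetic truncates, just like the bc and awk versions):

```shell
# Build a 10-line demo file, compute the 95th-percentile line number,
# then print just that line with sed.
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 > file

n=$(wc -l < file)             # total line count
target=$(( n * 95 / 100 ))    # 10 * 95 / 100 = 9 (truncated)
[ "$target" -lt 1 ] && target=1

sed -n "${target}p" file      # prints: line9
```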

Related

How to extract text from file to file using sed or grep?

My example string is in txt file /www/meteo/last.txt:
a:3:{i:0;s:4:"6.13";i:1;s:5:"19.94";i:2;s:5:"22.13";}
I would like to get line by line 3 numbers from that file to a new file.
(those values are temperatures, so they change over time; a new reading every 10 minutes)
New file /www/meteo/new.txt: (line by line)
6.13
19.94
22.13
Try this awk method
awk -F'"' 'BEGIN{OFS="\n"} {print $2,$4,$6}' last.txt > new.txt
Output:
cat new.txt
6.13
19.94
22.13
Or, if you wanted to use sed or grep:
sed -r 's/([^"]*)("[^"]*")([^"]*)/\2\n/g;s/"//g' /www/meteo/last.txt
grep -Eo '"[^"]*"' /www/meteo/last.txt | sed 's/"//g'
If you want a specific value, let's say the second temperature, you can combine grep and sed:
grep -Eo '"[^"]*"' /www/meteo/last.txt | sed -n '2p'
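A quick check of the grep-based pipelines above against the sample string (the path is shortened to last.txt here; stripping the quotes before selecting the line gives the bare second value):

```shell
printf '%s\n' 'a:3:{i:0;s:4:"6.13";i:1;s:5:"19.94";i:2;s:5:"22.13";}' > last.txt

# all three values, one per line
grep -Eo '"[^"]*"' last.txt | sed 's/"//g'
# prints: 6.13, 19.94, 22.13 (one per line)

# only the second value, quotes removed
grep -Eo '"[^"]*"' last.txt | sed 's/"//g' | sed -n '2p'
# prints: 19.94
```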

KSH: sed command to search and replace last occurrence of a string in a file

I have tried to search for the last occurrence of a string in a file using sed. On HP-UX, the tac command is not available.
For Ex: Below is the data in file,
A|2121212|666666666 | 2|01|2 |B|1111111111 |234234234 |00001148|
B|2014242|8888888888| 3|12|3 |B|22222222222 |45345345 |00001150|
C|4545456|4444444444| 4|31|4 |B|3333333333333 |4234234 |00001148|
I'm trying:
cat $filename | sed 's/00001148/00001147/g'
It changes 00001148 to 00001147 for both occurrences. I need to find only the last occurrence of |00001148| and replace it with another number; currently my sed command changes both instances of 00001148.
EDIT
To match the last line, use $
sed '$s/00001148/00001147/g' $filename
will give the output:
A|2121212|666666666 | 2|01|2 |B|1111111111 |234234234 |00001148|
B|2014242|8888888888| 3|12|3 |B|22222222222 |45345345 |00001150|
C|4545456|4444444444| 4|31|4 |B|3333333333333 |4234234 |00001147|
If the matching line is the last line in the file, use tail instead of cat
tail -1 $filename | sed 's/00001148/00001147/g'
The tail command selects the last lines of the file; here the -1 option tells it to take just one line.
If the match is not on the last line:
grep "00001148" $filename | tail -1 | sed 's/00001148/00001147/g'
The grep command finds all the occurrences, tail selects the last one, and sed makes the replacement. (Note that this prints only the modified line; it does not edit the file.)
A way with awk and sed:
sed '1!G;h;$!d' file | awk '/00001148/&&!x{sub("00001148","00001147");x=1}1' | sed '1!G;h;$!d'
It can probably all be done in sed alone:
sed -n '1h;1!H;${x;s/\(.*|\)00001148|/\1OtherNumberHere|/;p;}' YourFile
sed's pattern matching is greedy, so .* grabs as much as it can; the match therefore runs from the start of the accumulated file to the last |00001148| (if any), which is why only the final occurrence is replaced.
Try this:
tac $filename | sed '0,/00001148/{s/00001148/00001147/}' | tac
tac reverses your file, the sed command replaces the first occurrence it finds (which is the last one in the original order), and the second tac restores the original line order.
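A sketch of the tac | sed | tac approach on data where the last occurrence is not on the last line (the 0,/re/ address is a GNU sed extension, and tac comes with GNU coreutils; the data lines are invented):

```shell
printf '%s\n' 'A|00001148|' 'B|00001150|' 'C|00001148|' 'D|00001151|' > data

# Reverse the file, replace the first match (i.e. the last match in the
# original order), then reverse back.
tac data | sed '0,/00001148/{s/00001148/00001147/}' | tac
# prints the four lines with only C's 00001148 changed to 00001147
```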

sed with filename from pipe

In a folder I have many files with several parameters in their filenames, e.g. (with just one parameter) file_a1.0.txt, file_a1.2.txt, etc.
These are generated by a C++ program, and I need to take the last one generated (in time). I don't know a priori what the value of this parameter will be when the program terminates. After that I need to copy the 2nd line of this last file.
To print the 2nd line of any file, I know that this sed command works:
sed -n 2p filename
I know also how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
how do I combine these two operations? Certainly it is possible to pipe the output of the second operation into sed, but I don't know how to use a filename coming from a pipe as the input to that sed command.
You can use this,
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
or
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)
Typically you can put a command in backticks to insert its output at a particular point in another command, so
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively, and preferred because it is easier to nest:
sed -n 2p $(ls -rt name*.txt | tail -1)
-r in ls is reverse order.
-r, --reverse
reverse order while sorting
But it is not a good idea to combine it with tail -1.
With the change below (head -1, and no -r option to ls), performance can be better, because you needn't wait for the whole listing to be produced and piped to the tail command:
sed -n 2p $(ls -t1 name*.txt | head -1 )
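Putting it together as a sketch (the filenames and timestamps are invented; GNU touch -d is used to backdate one file so the ordering is deterministic):

```shell
mkdir -p demo_dir && cd demo_dir
printf 'old header\nold value\n' > file_a1.0.txt
printf 'new header\nnew value\n' > file_a1.2.txt
touch -d '2020-01-01' file_a1.0.txt   # make the first file older

# -t sorts newest first, head -1 takes it, sed prints its 2nd line
sed -n 2p "$(ls -t1 file_a*.txt | head -1)"
# prints: new value
```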
I was looking for a similar solution: taking the file names from a pipe of grep results to feed to sed. I've copied my answer here for the search and replace; perhaps this example can help, as it calls sed for each of the names found in the pipe.
This command simply finds all the files:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having timestamps changed on files that were not modified. Feeding in the grep results means only the files that actually contain the target text are touched (which likely improves performance as well).
Be sure to back up your files and test before using. It may not work in some environments for files with embedded spaces. (?)
FWIW, I had some problems using the tail method; it seems that the entire listing was generated before tail was called on just the last item.

find file names by searching the last lines for pattern

I have to find, among a large number of large ASCII files, all the files that contain a specific pattern. At the moment I'm doing that with
grep -l <pattern> <files>
and it's very slow.
But I know that the pattern appears in the last 10 lines, if it exists. Is there an elegant possibility to search only the last lines to speed up the search, e.g. with awk?
You can simply check each file's last lines yourself. Note that grep -l prints "(standard input)" rather than the filename when it reads from a pipe, so use grep -q and echo the name on a match:
for f in $files; do
    if tail -n 10 "$f" | grep -q "$pattern"; then
        echo "$f"
    fi
done
To see only a specific number of lines from the end of a file, the command syntax is as follows:
tail [+ number] [-l] [-b] [-c] [-r] [-f] [-c number | -n number] [file]
You can then pipe its output to grep to perform your specific search, i.e.:
tail -n 10 <fileName> | grep <pattern>
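The loop above can be exercised as a sketch (the filenames and pattern are invented; grep -q exits successfully on a match without printing anything):

```shell
tmp=$(mktemp -d)
seq 1 100 > "$tmp/with_match.txt"
echo 'PATTERN here' >> "$tmp/with_match.txt"   # match in the last 10 lines
seq 1 100 > "$tmp/no_match.txt"                # no match anywhere

for f in "$tmp"/*.txt; do
    if tail -n 10 "$f" | grep -q 'PATTERN'; then
        echo "$f"
    fi
done
# prints only the path of with_match.txt
```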

Filter text based in a multiline match criteria

I have the following sed pipeline, and I need to execute it as a single line:
cat File | sed -n '
/NetworkName/ {
N
/\n.*ims3/ p
}' | sed -n 1p | awk -F"=" '{print $2}'
How can I write the above as a single line?
Assume that the contents of the File is
System.DomainName=shayam
System.Addresses=Fr6
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=AS
System.DomainName=ims5.com
System.DomainName=Ram
System.Addresses=Fr9
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims7.com
System.DomainName=mani
System.Addresses=Hello
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims3.com
After executing the command, you should get only Peer as the output.
You can use a single nawk command, and you can lose the useless cat:
nawk -F"=" '/NetworkName/{n=$2;getline;if($2~/ims3/){print n} }' file
You can use sed as well, as proposed by others, but I prefer less regex and less clutter.
The command above saves the value of the network name in n, then reads the next line and checks its 2nd field against ims3; if it matches, it prints the value of n.
Put that code in a separate .sh file, and run it as your single-line command.
cat File | sed -n '/NetworkName/ { N; /\n.*ims3/ p }' | sed -n 1p | awk -F"=" '{print $2}'
Assuming that you want the network name for the domain ims3, this command line works without sed:
grep -B 1 ims3 File | head -n 1 | awk -F"=" '{print $2}'
So, you want the network name where the domain name on the following line includes 'ims3', and not the one where the following line includes 'ims7' (even though the network names in the example are the same).
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
This avoids abuse of felines, too (not to mention reducing the number of commands executed).
Tested on MacOS X 10.6.4, but there's no reason to think it won't work elsewhere too.
However, empirical evidence shows that Solaris sed is different from MacOS sed. It can all be done in one sed command, but it needs three lines:
sed -n '/NetworkName/{N
/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;}
}' File
Tested on Solaris 10.
You just need to put -e pretty much everywhere you'd break the command at a newline or put a semicolon. You don't need the extra calls to sed, awk, or cat.
sed -n -e '/NetworkName/ {' -e 'N' -e '/\n.*ims3/ s/[^\n]*=\(.*\).*/\1/P' -e '}' File
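As a sketch, the earlier one-liner can be checked against a trimmed version of the sample File (only the NetworkName/DomainName pairs are kept; the expected answer is Peer):

```shell
cat > File <<'EOF'
System.NetworkName=AS
System.DomainName=ims5.com
System.NetworkName=Peer
System.DomainName=ims7.com
System.NetworkName=Peer
System.DomainName=ims3.com
EOF

# join each NetworkName line with the next line, keep the pair that
# mentions ims3, and print just the network name
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
# prints: Peer
```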