compare first 60 characters and delete the duplicate row


How can I use sed to check consecutive lines whose first 10 characters are the same? If they match, the second of the two lines should be deleted.
Example:
Before
ABCDEF123456
123456ABCDEF
123456789012
123456789090
After
ABCDEF123456
123456ABCDEF
123456789012

This might work for you (GNU sed):
sed 'N;P;/^\(.\{10\}\).*\n\1/d;D' file
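Read two lines, print the first, then compare the first ten characters of the first line with the start of the second. If they are the same, delete both lines from the pattern space (the first has already been printed); otherwise delete only the first and carry on with the second.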
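As far as I can tell, the sed one-liner tests lines in pairs, so in a run of three lines sharing a prefix the third line would survive. A minimal awk sketch that compares every line with the one directly before it is shown here; the function name dedupe_prefix and its optional length argument are made up for illustration.

```shell
# Hypothetical helper: drop a line when its first n characters (default 10)
# equal the first n characters of the line directly before it.
dedupe_prefix() {
  awk -v n="${1:-10}" '
    { key = substr($0, 1, n) }          # prefix of the current line
    NR == 1 || key != prev { print }    # print unless it repeats the previous prefix
    { prev = key }                      # remember the prefix for the next line
  '
}

# Sample input from the question
printf '%s\n' ABCDEF123456 123456ABCDEF 123456789012 123456789090 | dedupe_prefix
```

With the question's input this prints the three lines listed under "After".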

Related

Add any number of whitespaces to file

I have a plain text file:
line1_text
line2_text
I need to add a number of whitespaces between the two lines.
Adding 10 whitespaces is easy.
But say I need to add 10000 whitespaces, how would I achieve that using sed?
P.S. This is for experimental purposes

There undoubtedly is a sed method to do this but, since sed does not have any natural understanding of arithmetic, it is not a natural choice for this problem. By contrast, awk understands arithmetic and can readily, for example, print an empty line n times for any integer value of n.
As an example, consider this input file:
$ cat infile
line1_text
line2_text
This code will add as many blank lines as you like before any line that contains the string line2_text:
$ awk -v n=5 '/line2_text/{for (i=1;i<=n;i++)print""} 1' infile
line1_text





line2_text
If you want 10,000 blank lines instead of 5, then replace n=5 with n=10000.
How it works
-v n=5
This defines an awk variable n with value 5.
/line2_text/{for (i=1;i<=n;i++)print""}
Every time a line matches the regex line2_text, a for loop runs that prints an empty line n times.
1
This is awk's shorthand for print-the-line and it causes every line from input to be printed to the output.
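The question asks for whitespace rather than blank lines. Under that reading, a sketch of the same awk idea could emit one line of n spaces before the matching line. The name add_space_line is hypothetical, and the regex is passed as an argument here instead of being hard-coded.

```shell
# Hypothetical helper: before each line matching regex $2, print a line of $1 spaces.
add_space_line() {
  awk -v n="$1" -v pat="$2" '
    $0 ~ pat { for (i = 1; i <= n; i++) printf " "; print "" }  # n spaces, then a newline
    1                                                          # print every input line
  '
}

# Show the length of each output line to make the spaces visible
printf 'line1_text\nline2_text\n' | add_space_line 10000 line2_text | awk '{ print length($0) }'
```

The final awk only reports line lengths, so the inserted line shows up as 10000 between the two lines of length 10.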

This might work for you (GNU sed):
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}' file
This appends the hold space to the first line. The hold space is built up to the required number of spaces by a loop that doubles its contents each time round (powers of 2). Doubling may overshoot, so a newline is inserted after the 10000th space as a delimiter and everything from the newline onwards is chopped off.
To change spaces to newlines, use:
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /\n&/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;s/ /\n/g;x;G}' file
In essence the same can be achieved by adding one space at a time (shown here for 20 spaces; it is very slow for large numbers):
sed -r '/line1_text/{x;:a;/ {20}/bb;s/^/ /;ta;:b;x;G}' file
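The doubling idea described above (grow by powers of 2, then trim the overshoot) can be sketched outside sed as well. This is an illustration in plain shell, not the answer's method; make_spaces is a hypothetical name.

```shell
# Hypothetical helper: print exactly $1 spaces, built by repeated doubling.
make_spaces() {
  n=$1
  s=' '
  # Double the string until it is at least n characters long (1, 2, 4, 8, ...).
  while [ "${#s}" -lt "$n" ]; do
    s="$s$s"
  done
  # The last doubling may overshoot; the precision in %.Ns trims to n characters.
  printf "%.${n}s" "$s"
}

# Count the characters produced
make_spaces 10000 | wc -c
```

Reaching 10000 characters takes 14 doublings, compared with 10000 iterations for the one-space-at-a-time loop.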

sed: Is there an option to substitute the Nth AND Mth occurrence of a match per line?

I believe I am asking about pattern flags.
I am familiar with the global pattern flag 'g'
sed 's/pattern/sub/g'
And I know I can substitute the Nth occurrence of a match by using a number.
sed 's/pattern/sub/2'
But suppose I wanted to substitute the Nth AND Mth match on a line.
Example:
Remove the 3rd and 5th word of the following string
Input: "one two three four five six"
Output: "one two four six"

This might work for you (GNU sed):
sed -r 's/\S+\s*//5;s///3' file
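This removes the fifth and then the third group of non-space characters, each together with any whitespace that follows it. The empty regex in the second command reuses the regex from the first.
N.B. The removals are done in reverse order, 5 then 3, so that the first removal does not shift the position of the second.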
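To see why the order matters, both orders can be run side by side. This sketch uses POSIX character classes and -E in place of the GNU-specific \S, \s and -r; that substitution is my own, made in the hope of keeping it portable.

```shell
line='one two three four five six'

# Higher occurrence first: removing word 5 leaves words 1-4 where they were.
echo "$line" | sed -E 's/[^[:space:]]+[[:space:]]*//5; s///3'

# Lower occurrence first: once word 3 is gone, "six" has become the 5th word.
echo "$line" | sed -E 's/[^[:space:]]+[[:space:]]*//3; s///5'
```

The first command gives "one two four six"; the second gives "one two four five " (with a trailing space), because it removes "six" instead of "five".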

detect two consecutive lines matching a pattern with sed

I am looking for two consecutive lines matching a certain pattern, say containing word 'pat' using sed and have noticed that I am able to detect it sometimes with this command:
sed -n 'N; /.*pat.*\n.*pat.*/p'
but this command fails when the matching pair does not start on an odd-numbered line, and I assume that is because it only examines lines 1+2, 3+4, 5+6, etc. If this is the case, what would be the correct way to do this?

Why does it need to be sed? May I suggest awk?
awk '{/pat/?f++:f=0} f==2' file
If pat is found, increment f by 1.
If pat is not found, reset f to 0.
If f==2, print the line (the second line of the matching pair).

This might work for you (GNU sed):
sed '$!N;/pattern.*\n.*pattern/p;D' file
This keeps 2 lines in the pattern space and prints both of them if the regexp matches. D then drops the first of the two lines, so the window slides forward one line at a time.
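The awk answer above prints only the second line of a matching pair. If both lines are wanted from awk, one possible sketch remembers the previous line; print_pairs is a hypothetical name and the regex is passed as an argument.

```shell
# Hypothetical helper: print both lines of every consecutive pair matching regex $1.
print_pairs() {
  awk -v pat="$1" '
    $0 ~ pat {
      if (hit) { print prev; print }   # previous line matched too: print the pair
      hit = 1; prev = $0               # remember this line for the next comparison
      next
    }
    { hit = 0 }                        # a non-matching line breaks the run
  '
}

# The pair sits on lines 2 and 3, the case the N-only command misses
printf '%s\n' a 'pat 1' 'pat 2' b 'pat 3' c | print_pairs pat
```

This prints "pat 1" and "pat 2". In a run of three matching lines the middle one is printed twice, once for each pair it belongs to.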

sed delete remaining characters in line except first 5

What would be the sed command to delete all characters in a line except the first 5?
I've tried going 'backwards' on this (reversing the line and deleting), but it's not the most elegant solution.

This might work for you (GNU sed):
echo '1234567890' | sed 's/.//6g'
12345
Or:
echo '1234567890' | cut -c-5
12345

Try this (it captures the first 5 characters of the line in the first group, matches whatever follows, and replaces the whole match with the first group):
sed 's/^\(.\{5\}\).*/\1/'
Or the alternative suggested by mouviciel:
sed 's/^\(.....\).*/\1/'
(it is more readable as long as the number of first characters you want does not grow too large)
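When the number of characters to keep varies, the counted form can be wrapped in a small function. This is a sketch only; keep_first is a hypothetical name and its argument is assumed to be a positive integer.

```shell
# Hypothetical helper: keep only the first $1 characters of every line.
keep_first() {
  sed "s/^\(.\{$1\}\).*/\1/"
}

echo '1234567890' | keep_first 5
echo 'abc' | keep_first 5
```

The first call prints 12345. The second prints abc unchanged, because a line shorter than the requested length does not match the pattern at all.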

Use sed to delete a matched regexp and the line (or two) underneath it

OK I found this question:
How do I delete a matching line, the line above and the one below it, using sed?
and just spent the last hour trying to write something that will match a string and delete the line containing the string and the line beneath it (or a variant - delete 2 lines beneath it).
I feel I'm now typing random strings. Please somebody help me.

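If I've understood correctly, to delete the matching line and one line after it:
/matchstr/{N;d;}
To delete the matching line and two lines after it:
/matchstr/{N;N;d;}
N appends the next line of input to the pattern space.
d deletes the pattern space, which by then holds the matching line and the line(s) after it.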
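One edge case, as far as I know: when the match is on the last line there is nothing for N to read, and GNU sed then prints the pattern space and exits, so the matching line stays in the output. Guarding N with $! is a common way around that. A sketch, with delete_match_and_next as a hypothetical name and the regex assumed not to contain a slash:

```shell
# Hypothetical helper: delete each line matching regex $1 and the line after it.
# $!N skips N on the last line, where there is no next line to append.
delete_match_and_next() {
  sed -e "/$1/{" -e '$!N' -e 'd' -e '}'
}

printf '%s\n' one two three four five | delete_match_and_next two
```

This prints one, four and five. If "two" were the last line of input, it would still be deleted.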

You can use awk, e.g. search for the word "two" and skip it together with the 2 lines after it:
$ cat file
one
two
three
four
five
six
seven
eight
$ awk -vnum=2 '/two/{for(i=0;i<=num;i++)getline}1' file
one
five
six
seven
eight
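As far as I can tell, getline leaves $0 unchanged once the input runs out, so when fewer than num lines follow the match the getline loop would still print the last line it managed to read. A counter-based sketch avoids calling getline; skip_after is a hypothetical name. Note one difference: a second match inside the skipped block restarts the count here, which the getline version does not do.

```shell
# Hypothetical helper: drop each line matching regex $2 plus the $1 lines after it.
skip_after() {
  awk -v num="$1" -v pat="$2" '
    $0 ~ pat { skip = num + 1 }   # the matching line itself counts as one
    skip > 0 { skip--; next }     # still inside the block to drop
    1                             # print everything else
  '
}

printf '%s\n' one two three four five six seven eight | skip_after 2 two
```

On the sample file this gives the same output as the getline version: one, five, six, seven, eight.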
