Delete line char using sed when not matching a pattern - sed

After my grep, I generated a file which has data as key and count but on separate lines:
$Value, "Some", $233kS
:2343
$AnotherCount, JunkValue
:38585
YetAnother, Nothing
:38484
I want a file like:
$Value, "Some", $233kS:2343
$AnotherCount, JunkValue:38585
YetAnother, Nothing:38484
The count pattern is fixed; it is always of the form :[0-9]*
Is this possible using sed or any other one-line command?
I looked at replacing lines, but I only want to remove the newline when the line does not match the count pattern.
I am interested in a solution that also works for the extended problem:
$Value, "Some", $233kS
$AnotherCount, JunkValue
:38585
YetAnother, Nothing
:38484
Should output:
$Value, "Some", $233kS$AnotherCount, JunkValue:38585
YetAnother, Nothing:38484
Basically, every line that does not match the pattern should lose its trailing newline character.

Any of these might help you out:
$ awk '!(FNR%2){ print b $0 }{b=$0}' file
$ paste -sd "\0\n" file
Both of these commands assume that each odd line needs to be concatenated with the even line that follows it.
Note: according to POSIX, paste treats "\0" as an empty string, not as the <NUL> character.
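For the extended problem, where several non-count lines may precede a count line, a minimal awk sketch could look like the following. The function name join_until_count is only illustrative, and the sketch assumes that a count line consists solely of : followed by digits.

```shell
# Hypothetical helper: join every run of lines up to and including the
# next count line (a line made of ':' followed only by digits).
join_until_count() {
  awk '
    { printf "%s", $0; pending = 1 }           # emit the line without its newline
    /^:[0-9]+$/ { print ""; pending = 0 }      # a count line closes the output line
    END { if (pending) print "" }              # terminate a trailing partial line
  ' "$@"
}
```

It can be called as join_until_count file, or fed from a pipe.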

Related

Replacing all occurrence after nth occurrence in a line in perl

I need to replace all occurrences of a string after nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My output data:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
I tried using sed: sed 's/://3g' test.txt
Unfortunately, the g option combined with the occurrence number is not working as expected; instead, it replaces all the occurrences.
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
FS=OFS=""
}
{
j=0;
for(i=0; ++i<=NF;)
if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
With GNU awk, you could use gensub as follows. This is based entirely on the shown samples, where the OP wants to remove : from the 3rd occurrence onwards. gensub splits each line into two parts, and all colons are then removed from the second part (from the 3rd colon onwards), as per the OP's requirement.
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
firstPart=restPart=""
firstPart=gensub(regex, "\\1\\2", "1", $0)
restPart=gensub(regex,"\\3","1",$0)
gsub(/:/,"",restPart)
print firstPart restPart
}
' Input_file
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ( undef, $first, @rest ) = split /:/;
print ":$first:", join ( "", @rest ),"\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
You can use the perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match
^ - start of string (here, a line)
(?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
(*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
The replacement only runs if the current line starts with :account_id: (see the trailing if /^:account_id:/).
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and if the field number is greater than 2, just append the current field value to result, else, append the value + output field separator; then print the result.
I would use GNU AWK in the following way if n is fixed and equal to 2. Let the content of file.txt be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: I use : as the field separator and an empty string as the output field separator. This alone removes every :, so I add back the ones that have to be preserved: the 1st (before the second field) and the 2nd (after the second field). Beware that I tested this solely on this data, so test it with more inputs before relying on it.
(tested in gawk 4.2.1)
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : by a newline.
Make a copy of the line.
Delete all occurrences of :'s.
Append the amended line to the copy.
Join the two lines by removing everything from third occurrence of the copy to the third occurrence of the amended line.
N.B. A newline is the best delimiter to use with sed, because the lines presented to sed's commands initially contain no newlines. The important property of the delimiter is that it is unique, so any character will do as long as it does not occur anywhere in the data set.
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
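The loop generalizes to other values of n. As a rough sketch, here is a hypothetical wrapper named strip_colons_after that takes n as its first argument. It assumes a sed that supports -E, such as GNU or BSD sed.

```shell
# Hypothetical helper: keep the first n colons of each line, delete the rest.
strip_colons_after() {
  n=$1; shift
  # Loop: while a colon follows the first n colon-terminated fields, remove it.
  sed -E -e ':a' -e "s/^(([^:]*:){$n}[^:]*):/\\1/" -e 'ta' "$@"
}
```

For the sample data, strip_colons_after 2 file should produce the same lines as the one-liner above.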
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

Find and print two blocks of lines from file with sed in one pass

I am trying to come up with an sed command to find and print two blocks of variable number of lines from a text file that look like this:
...
INFO first block to match
id: "value"
...
last line of the first block
INFO next irrelevant block
id: "different value"
...
INFO second block to match
id: "value"
...
last line of the second block
...
I only have prior knowledge of the id value and the fact that each block starts with a line that has "INFO". I want to match each block from that first line and not include the first line of the next block in the output:
INFO first block to match
id: "value"
...
last line of the first block
INFO second block to match
id: "value"
...
last line of the second block
Ideally I would prefer to do it in a single pass, not have the file scanned multiple times from top to bottom. Currently I have this (it only matches the first block, and I need both):
sed -n -e "/INFO/{"'$!'"{N;/INFO.*id: \"value\"/{:l;p;n;/^[^\\[]/bl;}}}" file.log
EDIT
Linebreak between blocks is certainly nice, but entirely optional.
EDIT 2
Please note that INFO and id: "value" do not have to be in the beginning of the line, and all other words in my example are arbitrary and not known in advance. There can be any number of blocks (including 0) between and around the ones I need to match.
sed is powerful, terse, and dumb. awk is smarter!
awk '/^INFO/{f = /match/? 1: 0} f'
edit: I see you want a linebreak between each "block"; will update if I find a tighter way:
awk '/^INFO/{f = /match/? 1: 0; if(i++) $0 = RS $0} f'
/^INFO/{action}: Execute {action} only on lines beginning with "INFO"
variable = if ? then : else: Conditional Expression (ternary operator)
if(i++): The first time this is evaluated, i will be zero, thus the expression will be false. This prevents an extra line break at the first block.
$0 = RS $0: Prepend a Record Separator (newline) to $0 (entire record)
f: If f is nonzero, {print $0} is implied.
This might work for you (GNU sed):
sed -nE ':a;/^INFO/{N;/^id: "value"/M!D;:b;H;$!{n;/^INFO/!bb};x;s/^/x/;/^x{2}/{s/^x*.//p;q};x;ba}' file
This solution stores the required blocks in the hold space, prefixed by a counter. Once the required number of blocks are stored the counters are removed, the blocks printed and the process quit.
The solution (based only on the input provided) supposes that an id (if it exists) always follows the INFO line.
Here is an alternative solution using a combination of sed and awk. It allows you to parse the input blockwise or recordwise. This approach relies on setting awk record separator (RS) to the empty string which makes awk read a full block in at a time.
So there are 2 steps:
Make the input record-parsable.
Process each record.
For your example this could be something like this:
sed '1!s/^INFO/\n&/' infile | awk '/id: "value"/' RS= ORS='\n\n'
Output:
INFO first block to match
id: "value"
...
last line of the first block
INFO second block to match
id: "value"
...
last line of the second block
awk is good for this, and if you could set RS to a multi-character expression it would be ideal. (gnu awk allows this, but why bother with gnu awk when there is perl?)
perl -wnle 'BEGIN{$/="INFO"; undef $\} print "$/$_" if m/id: \"value\"/' input
Basically, this sets the record separator ($/) to the string "INFO" (so now each of your "records" is a "line" to perl). If the record matches the pattern id: "value", it is printed with "INFO" prepended to the start. (Without -l, perl would retain the record separator at the end of each record, which is not quite what you want.) By omitting the "undef $\", you can get an extra newline between records. Some code golf could probably cut the length of this in half, but my perl is a bit rusty. Waiting for the shorter version in comments.
This may or may not be what you want depending on what your real data looks like:
$ awk '/INFO/{info=$0; f=0} /id: "value"/{print info; f=1} f' file
INFO first block to match
id: "value"
...
last line of the first block
INFO second block to match
id: "value"
...
last line of the second block
or if you want to do more with each block than just print it as you go then some variation of this is better:
$ awk '
/INFO/ { prt() }
{ block = block $0 ORS }
END { prt() }
function prt() {
if (block ~ /id: "value"/) {
printf "%s", block
}
block=""
}
' file
INFO first block to match
id: "value"
...
last line of the first block
INFO second block to match
id: "value"
...
last line of the second block
The above will behave the same using any awk in any shell on any UNIX box.
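If the id is only known at run time, the same buffering idea can be wrapped so that the id is passed in as an argument. The following is a sketch only: the function name print_blocks_with_id and the awk variable want are made up for illustration. It uses index() for a literal substring test, so regex metacharacters in the id need no escaping (an id containing backslashes would still need care because of how awk -v treats escapes).

```shell
# Hypothetical helper: print each INFO-delimited block containing id: "<id>".
print_blocks_with_id() {
  id=$1; shift
  awk -v want="id: \"$id\"" '
    function emit() {
      if (index(block, want)) printf "%s", block   # literal match, not a regex
      block = ""
    }
    /INFO/ { emit() }              # an INFO line starts a new block
    { block = block $0 ORS }       # accumulate the current block
    END { emit() }                 # do not forget the last block
  ' "$@"
}
```

Usage would be along the lines of print_blocks_with_id value file.log.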

Using sed to remove embedded newlines

What is a sed script that will remove the "\n" character but only if it is inside "" characters (delimited string), not the \n that is actually at the end of the (virtual) line?
For example, I want to turn this file
"lalala","lalalslalsa"
"lalalala","lkjasjdf
asdfasfd"
"lalala","dasdf"
(line 2 has an embedded \n ) into this one
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
(Line 2 and 3 are now joined, and the real line feed was replaced with the character string \\n (or any other easy to spot character string, I'm not picky))
I don't just want to remove every other newline as a previous question asked, nor do I want to remove ALL newlines, just those that are inside quotes. I'm not wedded to sed, if awk would work, that's fine too.
The file being operated on is too large to fit in memory all at once.
sed is an excellent tool for simple substitutions on a single line, but for anything else you should use awk, e.g.:
$ cat tst.awk
{
if (/"$/) {
print prev $0
prev = ""
}
else {
prev = prev $0 " \\\\n "
}
}
$ awk -f tst.awk file
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
Below was my original answer but after seeing @NeronLeVelu's approach of just testing for a quote at the end of the line I realized I was doing this in a much too complicated way. You could just replace gsub(/"/,"&") % 2 below with /"$/ and it'd work the same but the above code is a simpler implementation of the same functionality and will now handle embedded escaped double quotes as long as they aren't at the end of a line.
$ cat tst.awk
{ $0 = saved $0; saved="" }
gsub(/"/,"&") % 2 { saved = $0 " \\\\n "; next }
{ print }
$ awk -f tst.awk file
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
The above only stores 1 output line in memory at a time. It just keeps building up an output line from input lines while the number of double quotes in that output line is an odd number, then prints the output line when it eventually contains an even number of double quotes.
It will fail if you can have double quotes inside your quoted strings escaped as \", not "", but you don't show that in your posted sample input so hopefully you don't have that situation. If you have that situation you need to write/use a real CSV parser.
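If backslash-escaped quotes (\") can occur, one possible stopgap short of a real CSV parser is to leave them out of the quote count. The following is only a sketch: the function name join_quoted_newlines is made up, and it would still miscount a field that ends in an escaped backslash (\\").

```shell
# Hypothetical helper: join lines while inside a quoted field, ignoring \" .
join_quoted_newlines() {
  awk '
    {
      line = $0
      gsub(/\\"/, "", line)              # drop backslash-escaped quotes before counting
      quotes += gsub(/"/, "&", line)     # running count of unescaped quotes
      if (quotes % 2) {                  # odd so far: still inside a quoted field
        buf = buf $0 " \\\\n "
      } else {                           # even: the record is complete
        print buf $0
        buf = ""; quotes = 0
      }
    }
    END { if (buf != "") print buf }     # flush an unterminated record
  ' "$@"
}
```

As with the scripts above, only one output line is held in memory at a time.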
sed -n ':load
/"$/ !{N
b load
}
:cycle
s/^\(\([^"]*"[^"]*"\)*\)\([^"]*"[^"]*\)\n/\1\3 \\\\n /
t cycle
p' YourFile
Load lines into the working buffer until a closing line (one ending with ") is found or the end of input is reached.
Replace a \n that follows any number of complete open/close " pairs and then a single opening " (with no other " in between), counting from the start of the buffer, with the escaped version of the newline (in effect, replace leading string + \n with leading string + escaped newline).
If a substitution occurred, try another one (:cycle and t cycle).
Print the result.
Continue until the end of the file.
Thanks to @Ed Morton for the remark about the escaped newline.

Use sed to replace word in 2-line pattern

I try to use sed to replace a word in a 2-line pattern with another word. When in one line the pattern 'MACRO "something"' is found then in the next line replace 'BLOCK' with 'CORE'. The "something" is to be put into a reference and printed out as well.
My input data:
MACRO ABCD
CLASS BLOCK ;
SYMMETRY X Y ;
Desired outcome:
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
My attempt in sed so far:
sed 's/MACRO \([A-Za-z0-9]*\)/,/ CLASS BLOCK ;/MACRO \1\n CLASS CORE ;/g' input.txt
The above did not work giving message:
sed: -e expression #1, char 30: unknown option to `s'
What am I missing?
I'm open to one-liner solutions in perl as well.
Thanks,
Gert
Using a perl one-liner in slurp mode:
perl -0777 -pe 's/MACRO \w+\n\s*CLASS \KBLOCK ;/CORE ;/g' input.txt
Or using a streaming example:
perl -pe '
s/^\s*\bCLASS \KBLOCK ;/CORE ;/ if $prev;
$prev = $_ =~ /^MACRO \w+$/
' input.txt
Explanation:
Switches:
-0777: Slurp files whole
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
When in one line the pattern 'MACRO "something"' is found then in the
next line replace 'BLOCK' with 'CORE'.
sed works on lines of input. If you want to perform substitution on the next line of a specified pattern, then you need to add that to the pattern space before being able to do so.
The following might work for you:
sed '/MACRO/{N;s/\(CLASS \)BLOCK/\1CORE/;}' filename
Quoting from the documentation:
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then sed
exits without processing any more commands.
If you want to make use of address range as in your attempt, then you need:
sed '/MACRO/,/CLASS BLOCK/{s/\(CLASS\) BLOCK/\1 CORE/}' filename
I'm not sure why you need a backreference for substituting the macro name.
You could try this awk command also,
awk '{print}/MACRO/ {getline; sub (/BLOCK/,"CORE");{print}}' file
It prints all lines as they are and performs the replacement on the line that follows a line containing the word MACRO.
Since getline has so many pitfalls, I try not to use it, so:
awk 'f {sub(/BLOCK/,"CORE"); f=0} /MACRO/ {f=1} 1' file
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
This could do it
#!/usr/bin/awk -f
BEGIN {
RS = ";"
}
/MACRO/ {
sub("BLOCK", "CORE")
}
{
printf "%s", (s++ ? ";" : "") $0
}
"line" ends with ;
substitute CORE for BLOCK in "lines" containing MACRO
print ; followed by the "line", unless it is the first "line"

sed/awk/cut/grep - Best way to extract string

I have a results.txt file that is structured in this format:
Uncharted 3: Javithaxx l Rampant l Graveyard l Team Deathmatch HD (D1VpWBaxR8c)
Matt Darey feat. Kate Louise Smith - See The Sun (Toby Hedges Remix) (EQHdC_gGnA0)
The Matrix State (SXP06Oax70o)
Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)
I want to create a new file by extracting the YouTube URL ID given in the last characters of each line, like "8aOdRACuXiU"
I'm trying to build a URL like this in a new file:
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Note, I appended the &hd=1 to the string that I am trying to be replaced. I have tried using Linux reverse and cut but reverse or rev munges my data. The hard part here is that each line in my text file will have entries with parentheses and I only care about getting the data between the last set of parentheses. Each line has a variable length so that isn't helpful either. What about using grep and .$ for the end of the line?
In summary, I want to extract the youtube ID from results.txt and export it to a new file in the following format: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Using awk:
awk '{
v = substr( $NF, 2, length( $NF ) - 2 )
printf "%s%s%s\n", "http://www.youtube.com/watch?v=", v, "&hd=1"
}' infile
It yields:
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
$ sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!' results.txt
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Here, .*(\(.*\)) looks for the last occurrence of a pair of parentheses, and captures the characters inside those parentheses. The captured group is then inserted into the URL using \1.
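A slightly stricter variant, as a sketch: it anchors the final pair of parentheses to the end of the line and silently skips lines that do not end in "(ID)". The function name youtube_urls and the output file name urls.txt are only illustrative.

```shell
# Hypothetical helper: turn each line ending in "(ID)" into a watch URL.
youtube_urls() {
  # [^()]* keeps the capture inside the final pair of parentheses;
  # -n together with the p flag drops lines that do not end in "(...)".
  sed -n 's!.*(\([^()]*\))[[:space:]]*$!http://www.youtube.com/watch?v=\1\&hd=1!p' "$@"
}
```

To write the result to a new file: youtube_urls results.txt > urls.txt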
Using a perl one-liner :
perl -lne 'printf "http://www.youtube.com/watch?v=%s&hd=1\n", $& if /[^\(]+(?=\)$)/' file.txt
Or multi-line version :
perl -lne '
printf(
"http://www.youtube.com/watch?v=%s&hd=1\n",
$&
) if /[^\(]+(?=\)$)/
' file.txt