replace spaces only in between quotation marks

I have a line from a log file:
field 1234 "text in quotes" 1234 "other text in quotes"
I would like to replace the spaces between quotes so that I can then extract columns using space as the delimiter. The result could be something like
field 1234 "text#in#quotes" 1234 "other#text#in#quotes"
I was not able to work out a working regex for sed myself.
Thanks a lot for your help.
Martin

Pipe your log file through this awk command:
awk -F\" '{OFS="\"";for(i=2;i<NF;i+=2)gsub(/ /,"#",$i);print}'
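The one-liner is terse, so here is a commented sketch of the same idea wrapped in a shell function. The name quote_spaces is made up for illustration, and the sketch assumes that quotes are balanced and never escaped inside a quoted string.

```shell
# quote_spaces: hypothetical helper name, not part of any tool.
# Splits each line on double quotes, so the even-numbered fields are the
# quoted strings, then replaces spaces in those fields with "#".
quote_spaces() {
  awk -F'"' '
    BEGIN { OFS = "\"" }              # rejoin fields with the quote we split on
    {
      for (i = 2; i < NF; i += 2)     # fields 2, 4, ... lie between quotes
        gsub(/ /, "#", $i)
      print
    }
  ' "$@"
}

printf '%s\n' 'field 1234 "text in quotes" 1234 "other text in quotes"' | quote_spaces
```

Lines without any quote have a single field, so the loop never runs and they pass through unchanged.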

Thanks for all answers.
This is the Perl one-liner I finally used:
perl -pe 's{("[^\"]+")}{($x=$1)=~s/ /#/g;$x}ge'
It produces the required output:
field 1234 "text#in#quotes" 1234 "other#text#in#quotes"

Ruby(1.9+)
$ cat file
field 1234 "text in quotes" 1234 "other text in quotes"
$ ruby -ne 'print $_.gsub(/(\".*?\")/){|x| x.gsub(/\s+/,"#") }' file
field 1234 "text#in#quotes" 1234 "other#text#in#quotes"

With a double quote as RS, every even-numbered record is the text inside a pair of double quotes, so replace the spaces in those records only. Since the output record separator is a newline by default, change it to a double quote as well. Note that ORS is also printed after the last record, so the output ends with one extra double quote.
awk 'BEGIN {RS="\"";ORS="\"" }{if (NR%2==0){gsub(/ /,"#",$0);print $0}else {print $0}}' InputText.txt
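A sketch of the same RS idea that writes the quote before each record rather than after it, so that no quote trails the last record. The function name is hypothetical, and balanced quotes are assumed.

```shell
# quote_spaces_rs: hypothetical helper name.
quote_spaces_rs() {
  awk '
    BEGIN { RS = "\"" }               # records are the stretches of text between quotes
    NR > 1 { printf "\"" }            # put back the quote that ended the previous record
    NR % 2 == 0 { gsub(/ /, "#") }    # even records were inside quotes
    { printf "%s", $0 }               # no ORS, so newlines come only from the input
  ' "$@"
}

printf '%s\n' 'field 1234 "text in quotes" 1234 "other text in quotes"' | quote_spaces_rs
```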

If you decide to swap sed for the more feature-rich Perl, here is a one-liner that does what you need:
line='field 1234 "text in quotes" 1234 "other text in quotes"'
echo "$line" | perl -pe 's#("[^"]*")#sub{$p=$1; $p =~ tr/ /#/; return $p}->()#eg'
Output: field 1234 "text#in#quotes" 1234 "other#text#in#quotes"
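
The question asked for sed, which none of the answers above use. One possible sed-only sketch is a loop that replaces one in-quote space per pass until none are left. It assumes balanced, unescaped quotes and a sed that accepts -E (GNU and BSD sed both do); the function name is made up.

```shell
# quote_spaces_sed: hypothetical helper name.
# The pattern consumes complete "..." pairs from the start of the line,
# then one more opening quote and the non-space text after it, so the
# space it finally matches is always inside a quoted string.
quote_spaces_sed() {
  sed -E -e ':a' \
      -e 's/^(([^"]*"[^"]*")*[^"]*"[^" ]*) /\1#/' \
      -e 'ta' "$@"
}

printf '%s\n' 'field 1234 "text in quotes" 1234 "other text in quotes"' | quote_spaces_sed
```

The loop reruns the substitution from the start of the line each time, which is fine for log lines but would be slow on very long ones.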

Related

How to exclude end of lines of textfiles via terminal?

Given a file ./wordslist.txt with lines of the form <word> <number_of_occurrences>, such as:
aš toto 39626
ir 35938
tai 33361
tu 28520
kad 26213
...
How can I strip the digits at the end of each line so that output.txt contains data such as:
aš toto
ir
tai
tu
kad
...
Note:
sed, find, cut or grep preferred. I cannot use something that keeps only [a-z] characters, since my data can contain ASCII letters, non-ASCII letters, Chinese characters, digits, etc.

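I suggest:
cut -d " " -f 1 wordslist.txt > output.txt
Note that this keeps only the first space-separated field, so a multi-word entry such as "aš toto" is reduced to "aš". To strip just the trailing number instead:
sed -E 's/ [0-9]+$//' wordslist.txt > output.txt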

Use awk to print the first word of each line (this also drops the rest of a multi-word entry such as "aš toto"):
awk '{print $1}' your_file > your_new_file

awk solution to simply print input line excluding last column
$ awk '{NF--; print}' wordslist.txt
aš toto
ir
tai
tu
kad
Note:
This will only work in some awks. Per POSIX, incrementing NF adds a null field, but decrementing NF is undefined behavior (thanks @EdMorton for the info).
This doesn't check if last column is numeric and field separation in output will be single space only
If there can be empty lines in input file, use awk 'NF{NF--}1'
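Given that decrementing NF is not portable, one alternative sketch removes the last field with sub() instead. The function name is hypothetical. Unlike NF--, it leaves the original spacing between the remaining fields intact and leaves single-field lines untouched.

```shell
# drop_last_field: hypothetical helper name.
# Deletes the final whitespace-separated field together with the
# blanks in front of it; lines with only one field are not changed.
drop_last_field() {
  awk '{ sub(/[ \t]+[^ \t]+[ \t]*$/, ""); print }' "$@"
}

printf 'aš toto 39626\nir 35938\n' | drop_last_field
```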

The following works :
sed -r 's/ [0-9]+$//g' wordslist.txt
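The same substitution can be written with basic regular expressions only, so that neither -r nor -E is needed. This is a sketch with a made-up function name; it strips the last column only when that column is purely numeric.

```shell
# strip_trailing_count: hypothetical helper name.
# Removes trailing whitespace-plus-digits; everything else is kept,
# including multi-word entries and non-ASCII text.
strip_trailing_count() {
  sed 's/[[:space:]][[:space:]]*[0-9][0-9]*$//' "$@"
}

printf 'aš toto 39626\nir 35938\nno-number here\n' | strip_trailing_count
```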

Can grep or sed show only words that match multiple search patterns in a line?

I am wondering whether grep or sed can print the matched strings grouped by input line, i.e. all matches from one line together on one output line.
TestCase1: File1 contains below text
The Sun
Thunder The Rain They say
They say The dance
If I use this command:
egrep -o 'The|They' File1
The output I get is:
The
The
They
They
The
But, my expected output should be as below:
The
The They
They The
I am aware that in grep the option -o, --only-matching prints only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Edit: Please also suggest how to filter on exact word matches with multiple match strings,
i.e. <The> and <They> as exact, space-separated words.
TestCase2: File2 contains below text
The Sun
Thunder The Rain They say
They say The dance
They're dancing with them in the dorm
The sun is shining the east and they scream.
The expected output is:
The
The They
They The
the
The the they
How to approach this?

With GNU awk for FPAT:
$ awk -v FPAT='\\<[Tt]hey?\\>' '{$1=$1}1' file
The
The They
They The
They the
The the they
Note that this cannot avoid identifying They when it appears in They're. If that's really an issue and you want to look for space-separated complete strings, then this might be what you want:
$ awk '{c=0; for (i=1;i<=NF;i++) if ($i ~ /^[Tt]hey?$/) printf "%s%s", (c++?OFS:""), $i; print ""}' file
The
The They
They The
the
The the they
If not, let us know.
The above was run against this iteration of the OP's posted sample input:
$ cat file
The Sun
Thunder The Rain They say
They say The dance
They're dancing with them in the dorm
The sun is shining the east and they scream.

Best do it with Perl:
~$ perl -nE 'say /They? /g' File1
The
The They
They The
EDIT : Add new conditions. The regex still matches all but the lowercase the. Adding the i flag makes the match case-insensitive and matches all your test strings.
$ perl -nE 'say /They? /ig' File1
The
The They
They The
the
The the they
There is a little bit of a trick here: the match also picks up the space after the ? and prints it in the output. E.g. the first line of output is really "The_\n", where "_" stands for the space character. This may or may not be acceptable. One way to remove the spaces and reassemble the string would be:
$ perl -nE 'say join " ", map {substr $_,0,-1} /They? /ig' File1
As to your question about matching full words <The> and <They>, as you put it, the ? in They? indicates that the 'y' is optional. I.e. matches 0 or 1 times. Therefore the pattern is considering 'The' and 'They' as full words, one or the other, followed by a space. You could rewrite the pattern as:
$ perl -nE 'say /(?:They|The) /ig' File1
This produces the same output.
Now that you are considering lowercase the, you may run into more edge-case "gotchas", such as words that end in "the". "loathe" and "tythe" come to mind.
$ echo "I'm loathe to cringe and tythe socks" >> File1
$ perl -nE 'say /They? /ig' File1
The
The They
They The
the
The the they
the the <--- not wanted!
You can then add the \b test in to match on word boundaries (as in zdim's answer):
$ perl -nE 'say /\bThey? /ig' File1
The
The They
They The
the
The the they
<-- But you get this empty line where no match occurs
So to refine further, you could only print if the line matches. Like this:
$ perl -nE 'say /\bThey? /ig if /\bThey? /i' File1
The
The They
They The
the
The the they
Then, I'm sure, you can find more edge cases that will blow it all up and force further refinement.

Things are not fully specified, so here are a couple of possibilities.
To catch all words starting with The and print them with a space in between:
perl -wnE'say join " ", /\bThe\w*/g' file
where \b is a word-boundary, a zero-width anchor, and \w is a word character. Using \S (a non-space character) is yet more permissive.
For only The or They, you can instead use
perl -wnE'say join " ", /\bThey?\b/g' file
where y? makes y optional.
To allow the as well, use [tT] instead of T in the pattern, or /i to ignore case for all characters.
It's been clarified in comments that punctuation after The|They isn't allowed, and that a lowercase t is. Then we need to constrain the match by whitespace, not by a word boundary, and use [tT] as mentioned:
perl -wnE'say join " ", /\b([Tt]hey?)\s/g' file
Now the capturing parentheses () are needed, since \s does consume a character, unlike the \b before.
This prints the desired output with the provided input.

awk to the rescue!
$ awk -v p="They?" '$0~p{for(i=1;i<=NF;i++) if($i~p) printf "%s",$i OFS; print ""}' file
The
The They
They The

try one more awk:
awk '{while(match($0,/The|They/)){string=substr($0,RSTART,RLENGTH);VAL=VAL?VAL OFS string:string;$0=substr($0,RSTART+RLENGTH+1);};print VAL;VAL=""}' Input_file
The same solution in multi-line form:
awk '{
while(match($0,/The|They/)){
string=substr($0,RSTART,RLENGTH);
VAL=VAL?VAL OFS string:string;
$0=substr($0,RSTART+RLENGTH+1);
};
print VAL;
VAL=""
}
' Input_file
Will add the explanation shortly for same.
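Until that explanation arrives, here is a commented sketch of a similar match() loop. It is not the author's exact code: the names are made up, it scans a copy of the line instead of overwriting $0, and it skips lines with no match. Like the original, it matches substrings, so a word such as "Then" would contribute "The".

```shell
# extract_the_they: hypothetical helper name.
extract_the_they() {
  awk '{
    rest = $0                         # unscanned remainder of the line
    out = ""                          # matches collected so far
    while (match(rest, /They?/)) {    # leftmost match; the optional y is taken when present
      word = substr(rest, RSTART, RLENGTH)
      out = (out == "" ? word : out " " word)
      rest = substr(rest, RSTART + RLENGTH)   # continue after this match
    }
    if (out != "") print out          # skip lines without any match
  }' "$@"
}

printf '%s\n' 'The Sun' 'Thunder The Rain They say' 'They say The dance' | extract_the_they
```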

Matching line subfield delimited with square brackets

I have a file whose lines contain fields delimited with square brackets, for example:
[tag "x"][severity "y"][id "z"][client 1]
I need to extract the data from the client field, but I am struggling to find the best way to do this. Obviously it's too advanced for the likes of cut.
I have been trying to use sed (and I'm not even sure sed is the "best" or "most appropriate" tool), but a sed regex like this doesn't seem to work:
sed 's/^.*\[client\(.*\)/\1/g'
I'm guessing the "most appropriate" tool is probably Perl with some sort of Perl module?

In Perl, you can capture each bracket contents like so:
$ perl -lne 'print $1 while /(?<=\[)([^\]]+)(?=\])/g' file
tag "x"
severity "y"
id "z"
client 1
So then if you only want the client match you can do:
$ perl -lne 'for (/(?<=\[)([^\]]+)(?=\])/g) { print if /^client\b/ }' file
client 1
As pointed out in comments, /\[([^\]]+)\]/g is maybe a little more efficient.
$ perl -lne 'for (/\[([^\]]+)\]/g) { print if /^client\b/}' file
client 1

You don't show your expected output so it's a guess but based on what it looks like the script you posted is attempting to do - is this what you want?
$ sed 's/.*\[client *\([^]]*\).*/\1/g' file
1

I would use tr -d.
echo '[tag "x"][severity "y"][id "z"][client 1]' | tr -d '[]'
tag "x"severity "y"id "z"client 1

echo '[tag "x"][severity "y"][id "z"][client 1]' | awk -F'[][]+' '{print $5}'
client 1
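Printing $5 relies on client being the fourth bracketed field. A sketch that looks the field up by name instead, using the same field separator; the function name is hypothetical, and it assumes the field always has the form "client" followed by a single space and the value.

```shell
# client_field: hypothetical helper name.
# Splits on runs of "[" and "]", then prints the value of whichever
# field starts with "client ", wherever it sits on the line.
client_field() {
  awk -F'[][]+' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^client /) print substr($i, 8)   # skip the 7 characters of "client "
  }' "$@"
}

printf '%s\n' '[tag "x"][severity "y"][id "z"][client 1]' | client_field
```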

Printing text between regexps

I tried the '/pat1/,/pat2/p', but I want to print only the text between the patterns, not the whole line. How do I do that?

A pattern range is for multiline patterns. This is how you'd do that:
sed -n '/pat1/,/pat2/{/pat1\|pat2/!p}' inputfile
-n - don't print by default
/pat1/,/pat2/ - within the two patterns inclusive
/pat1\|pat2/!p - print everything that's not one of the patterns
What you may be asking for is what's between two patterns on the same line. One of the other answers will do that.
Edit:
A couple of examples:
$ cat file1
aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz
$ sed -n '/start/,/end/{/start\|end/!p}' file1
this is what
I want
You can shorten it by telling sed to use the most recent pattern again (//):
$ sed -n '/.*start.*/,/^[0-9]\{3\} end 0*$/{//!p}' file1
this is what
I want
As you can see, I didn't have to duplicate the long, complicated regex in the second part of the command.

sed -r 's/pat1(.*)pat2/\1/g' somefile.txt
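As written, that command keeps whatever precedes pat1 and follows pat2, and it still prints lines that contain neither pattern. A variant sketch that prints only the text between the two patterns, and only for lines where both occur; pat1 and pat2 are the placeholders from the question and the function name is made up.

```shell
# between_pats: hypothetical helper name.
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, reduced to the text between pat1 and pat2.
between_pats() {
  sed -n 's/.*pat1\(.*\)pat2.*/\1/p' "$@"
}

printf '%s\n' 'other text' 'xx pat1MIDDLEpat2 yy' | between_pats
```

Both .* parts are greedy, so with several occurrences on one line this picks the text between the last pat1 that is still followed by a pat2 and the last pat2.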

I don't know what kind of pattern you used, but I think it is also possible with regular expressions.
cat myfile | sed -r 's/^(.*)pat1(.*)pat2(.*)$/\2/g'

You can use awk (GNU awk, since RT is a gawk extension).
$ cat file
other TEXT
pat1 text i want pat2
pat1 TEXT I
WANT
pat2
other text
$ awk -vRS="pat2" 'RT{gsub(/.*pat1/,"");print}' file
text i want
TEXT I
WANT
The solution also works for patterns that span multiple lines.

How do I print the word after a regex match, but not after a similar word?

I want an awk or sed command to print the word after a regexp match.
I want to find the word that follows a given word, but not the one that follows a similar-looking word.
The file looks like this:
somethingsomething
X-Windows-Icon=xournal
somethingsomething
Icon=xournal
somethingsomething
somethingsomething
I want "xournal" from the line that says "Icon=xournal". This is as far as I have got. I have tried an awk command too, but it was also unsuccessful.
cat "${file}" | grep 'Icon=' | sed 's/.*Icon=//' >> /tmp/text.txt
But this matches both lines, so the text file ends up with xournal twice, which I don't want.

Use ^ to anchor the pattern at the beginning of the line. And you can even do the grepping directly within sed:
sed -n '/^Icon=/ { s/.*=//; p; }' "$file" >> /tmp/text.txt
You could also use awk, which I think reads a little better. Using = as the field separator, if field 1 is Icon then print field 2:
awk -F= '$1=="Icon" {print $2}' "$file" >> /tmp/text.txt
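The same anchoring idea also repairs the original grep pipeline. A minimal sketch, with a made-up function name; cut -f2- keeps the whole value even if it contains further = signs.

```shell
# icon_value: hypothetical helper name.
# The ^ anchor keeps X-Windows-Icon= lines out; cut drops the key.
icon_value() {
  grep '^Icon=' "$@" | cut -d= -f2-
}

printf '%s\n' 'somethingsomething' 'X-Windows-Icon=xournal' 'Icon=xournal' | icon_value
```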

This might be useful even though Perl is not one of the tags.
If you are interested in Perl, this small program will do the task for you:
#!/usr/bin/perl -w
while (<>)
{
    if (/^Icon=/)
    {
        print $';
    }
}
This is the output:
C:\Documents and Settings\Administrator>io.pl new2.txt
xournal
Explanation:
while (<>) reads the input line by line from the file given as an argument on the command line.
/^Icon=/ is the regex used in the if condition; the ^ anchor keeps it from matching X-Windows-Icon=.
$' holds the part of the line after the match, which is what gets printed.

All you need is:
sed -n 's/^Icon=//p' file
