sh - remove text outside two strings - sed

I need help to filter a part of text from my original logs:
<variable>
<status type="String"><![CDATA[-1]]></status>
<errorCode type="String"><![CDATA[[bpm]]]></errorCode>
<mensagens type="MensagemSistema[]">
<item>
<msg_err type="String"><![CDATA[ERROR1-This is error: - THIS TEXT IS VARIABLE.]]</msg_err>
<msg_err_stack type="String"><![CDATA[stack_trace]]></msg_err_stack>
</item>
</mensagens>
</variable>
The part that I want is:
<msg_err type="String"><![CDATA[ERROR1-This is error: - THIS TEXT IS VARIABLE.]]>
... and this text is variable.
I tried to do this with sed, but I couldn't find an example of removing the text outside two strings. One more thing: this is on Unix.
thanks in advance
Tiago

You could try the below sed command,
$ echo '<msg_err type="String"><![CDATA[ERROR1-This is error 1.]]></msg_err>' | sed 's/.*\[\([^][]*\).*/\1/g'
ERROR1-This is error 1.
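To run the same idea against the whole log rather than a single echoed line, something like the following might work. This is only a sketch: the file name error.xml and its cut-down contents are stand-ins for the real log, and it assumes each msg_err element sits on one line. It matches the closing ]] without requiring the > so it also copes with the line as posted.

```shell
# Hypothetical sample file reproducing the relevant lines of the log.
cat > error.xml <<'EOF'
<item>
<msg_err type="String"><![CDATA[ERROR1-This is error: - THIS TEXT IS VARIABLE.]]</msg_err>
<msg_err_stack type="String"><![CDATA[stack_trace]]></msg_err_stack>
</item>
EOF

# -n suppresses the default output; the trailing p prints only the lines
# where the substitution succeeded, so msg_err_stack is skipped.
msg=$(sed -n 's/.*<msg_err type="String"><!\[CDATA\[\(.*\)]].*/\1/p' error.xml)
echo "$msg"
```

Because the pattern requires a space after msg_err, the msg_err_stack line does not match.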

This looks like a job for an XML parser. The Perl module XML::Simple is capable of retrieving the data you want:
perl -MXML::Simple -e '$xml = XMLin(\*STDIN); print $xml->{'mensagens'}->{'item'}->{'msg_err'}->{'content'};' < error.xml
Output:
ERROR1-This is error: - THIS TEXT IS VARIABLE.
Note that I added a > to close the CDATA in the msg_err tag, as I assumed this to be a typo.

Is there any way to encode Multiple columns in a csv using base64 in Shell?

I need to replace several columns of a CSV file with their base64-encoded values, while leaving the first line untouched because it contains the header. I tried this for one column as shown below, but because the command skips the first line of the file, the header is not printed.
gawk 'BEGIN { FS="|"; OFS="|" } NR >=2 { cmd="echo "$4" | base64 -w 0";cmd | getline x;close(cmd); print $1,$2,$3,x}' awktest
o/p:
12|A|B|Qw==
13|C|D|RQ==
36|Z|V|VQ==
Question: the header is not shown in the output. What should I do to get the header in the output? Also, can I use a loop here to replace multiple columns?
input:
10|A|B|C|5|T|R
12|A|B|C|6|eee|ff
13|C|D|E|9|dr|xrdd
36|Z|V|U|7|xc|xd
Required output:
10|A|B|C|5|T|R
12|A|B|encodedvalue|6|encodedvalue|ff
13|C|D|encodedvalue|9|encodedvalue|xrdd
36|Z|V|encodedvalue|7|encodedvalue|xd
Is this possible? Have researched a lot but could not find a proper explanation. I am new to shell. Kindly help. Many thanks!!!!
It looks like you can just sequence conditionals. This may not be the best way of solving the header issue, but it's intuitive.
BEGIN { FS="|"; OFS="|" } NR ==1 {print} NR >=2 { cmd="echo "$4" | base64 -w 0";cmd | getline x;close(cmd); print $1,$2,$3,x}
As for using a loop to affect multiple columns... Loops in bash are hard. Awk is its own language and does have looping constructs (for, while), but it's not clear you need a loop. If only a reasonable number of fields need modifying, you can parameterize the existing command by the field index and then pipe through however many instances of it. It won't be as performant as doing it all in a single pass of awk, but that's probably OK.
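For what it's worth, a single-pass version with an awk for loop could look roughly like this. It is a sketch only: the column list "4 6" is read off the required output in the question, the sample file is recreated here for the demo, and it assumes the field values contain no quotes or other shell metacharacters and are short enough that base64 does not wrap its output.

```shell
# Sample input copied from the question.
printf '%s\n' '10|A|B|C|5|T|R' '12|A|B|C|6|eee|ff' \
    '13|C|D|E|9|dr|xrdd' '36|Z|V|U|7|xc|xd' > awktest

encoded=$(awk 'BEGIN {
    FS = OFS = "|"
    n = split("4 6", cols, " ")   # columns to encode (assumed)
}
NR == 1 { print; next }           # header passes through unchanged
{
    for (i = 1; i <= n; i++) {
        c = cols[i]
        # printf %s avoids the trailing newline that echo would add
        cmd = "printf %s \"" $c "\" | base64"
        cmd | getline enc
        close(cmd)
        $c = enc                  # assigning a field rebuilds $0 with OFS
    }
    print
}' awktest)
echo "$encoded"
```

Spawning one base64 process per field is slow on large files, so this is meant for modest inputs.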

Error in Linkdatagen: Use of uninitialized value $chr in concatenation (.) or string

Hi, I was trying to use linkdatagen, which is a Perl-based tool. It requires a VCF file (produced with mpileup from SAMtools) and a HapMap annotation file (provided). I have followed the instructions, but the moment I run the Perl script provided, I get this error.
The commands I used are:
samtools mpileup -d10000 -q13 -Q13 -gf hg19.fa -l annotHapMap2U.txt samplex.bam | bcftools view -cg -t0.5 - > samplex.HM.vcf
perl vcf2linkdatagen.pl -variantCaller mpileup -annotfile annotHapMap2U.txt -pop CEU -mindepth 10 -missingness 0 samplex.HM.vcf > samplex.brlmm
Use of uninitialized value $chr in concatenation (.) or string at vcf2linkdatagentest.pl line 487, <IN> line 1.
... and it goes on and on. I have emailed the authors and haven't heard from them yet. Can anyone here please help me? What am I doing wrong?
The perl script is :
http://bioinf.wehi.edu.au/software/linkdatagen/vcf2linkdatagen.pl
The HapMap file can be downloaded from the website mentioned below.
http://bioinf.wehi.edu.au/software/linkdatagen/
Thanks so much
Ignoring lines starting with #, vcf2linkdatagen.pl expects the first field of the first line of the VCF to contain something of the form "chrsomething", and your file doesn't meet that expectation. Examples from a comment in the code:
chr1 888659 . T C 226 . DP=26;AF1=1;CI95=1,1;DP4=0,0,9,17;MQ=49;FQ=-81 GT:PL:GQ 1/1:234,78,0:99
chr1 990380 . C . 44.4 . DP=13;AF1=7.924e-09;CI95=1.5,0;DP4=3,10,0,0;MQ=49;FQ=-42 PL 0
The warning means that a variable used in a string is not initialized (undefined). It is an indication that something might be wrong. The line in question can be traced to this statement
my $chr = $1 if ($tmp[0] =~ /chr([\S]+)/);
It is bad practice to use postfix if statements on a my statement.
As ikegami notes, a workaround for this might be
my ($chr) = $tmp[0] =~ /chr([\S]+)/;
But since the match failed, $chr will still be undefined and you will likely get the same warning. The only way to solve this is to know more about the purpose of this variable and whether the error should be fatal or not. The author has not handled this case, so we do not know.
If you want to know more about the problem, you might add a debug line such as this:
warn "'chr' value not found in the string '$tmp[0]'" unless defined $chr;
Typically, an error like this occurs when someone gives input to a program that the author did not expect. So if you see which lines give this warning, you might find out what to do about it.
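If you would rather check the input before touching the Perl script, a quick scan of the VCF might help. The sketch below is only an illustration: sample.vcf and its three lines are made up, and it assumes the script looks at the first whitespace-separated field of each non-header line.

```shell
# Hypothetical VCF fragment: one header line, one record with the
# "chr" prefix and one record without it.
printf '##fileformat=VCFv4.1\nchr1\t888659\t.\tT\tC\n1\t990380\t.\tC\t.\n' > sample.vcf

# Print the line number and first field of every non-header line whose
# first field does not contain "chr" followed by at least one character.
bad=$(awk '!/^#/ && $1 !~ /chr./ { print NR ": " $1 }' sample.vcf)
echo "$bad"
```

Any line it reports is one where the regex in the script would fail to set $chr.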

Replace text between brackets

This is the first time I've tried to do this on a Mac (I use repl.bat on Windows) and I'm struggling to figure out the correct syntax.
Basically, I have a file called defines.h that has stuff like this in it:
#define GAME_VERSION (2060)
#define TESTHOOK
#define PRERELEASE
#define pi 3.1415926536f
#define FX32_ONE (4096)
And I want to replace the text between the brackets or after the equals with a passed in environment variable (I also want to replace things like GAME_VERSION= in other files)
I've been trying to use sed, perl and awk but can't seem to get the syntax right. Could someone talk me through this please?
I have managed to do this so far:
echo "GAME_VERSION (2000)" | sed s/'([^)]*)'/'3000'/g
The first 2 replies haven't changed the defines.h file at all and I'm not sure if I'm doing something else wrong or because I didn't show more of the file's contents before. Not sure if relevant but the contents of defines.h get printed to the terminal console also.
In this specific case:
NewValue=713705
sed "/^[[:space:]]*\(#define[[:space:]]*\)*GAME_VERSION/ s/[0-9][0-9]*/${NewValue}/" defines.h
assuming:
more than one line in defines.h may have a number assigned
only one uncommented line defines GAME_VERSION
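That should work with both formats:
REPLACE="5000"
sed "s/\([(=]\)[0-9][0-9]*/\1$REPLACE/g" defines.h
Gives the output:
GAME_VERSION (5000)
GAME_VERSION=5000
It searches for an opening bracket or an equals sign ([(=]) followed by one or more digits ([0-9][0-9]*) and replaces the digits with the contents of the environment variable $REPLACE. The [0-9][0-9]* form is used rather than [0-9]\+ because \+ is a GNU extension that the sed shipped with macOS may not support.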
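On the follow-up that defines.h itself is not changed: sed writes the edited text to standard output and leaves the file alone unless told otherwise. The -i option edits in place, but its syntax differs between GNU sed (-i) and the BSD sed on macOS (-i ''), so a temporary file is one portable way around that. A sketch, using a made-up three-line defines.h and a NewValue chosen just for the demo:

```shell
# Hypothetical cut-down copy of the header from the question.
printf '%s\n' '#define GAME_VERSION (2060)' '#define TESTHOOK' \
    '#define FX32_ONE (4096)' > defines.h

NewValue=3000
# Edit only the GAME_VERSION line, write the result to a temporary file,
# then move it over the original once sed has succeeded.
sed "/^#define[[:space:]]\{1,\}GAME_VERSION[[:space:]]/ s/([^)]*)/(${NewValue})/" \
    defines.h > defines.h.tmp && mv defines.h.tmp defines.h
cat defines.h
```

The other #define lines, including FX32_ONE, are left untouched because the address restricts the substitution to the GAME_VERSION line.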

use identifying symbols to identify and edit line/string, then append line/string to previous line in file

Using standard linux utilities (sed and awk, I am guessing)
Sorry about the vague title, I don't really know how to describe the request much better. An easier way to do so is to provide a simple example. I have a file with the following content:
www.example.com
johnsmith#gmail.com
fredflintstone#gmail.com
bettyboop#gmail.com
www.example2.com
kylejohnson#gmail.com
www.example3.com
chadbrown#gmail.com
joshbeck#gmail.com
www.example4.com
tomtom#gmail.com
jeffjeffries#gmail.com
billnorman#gmail.com
stankubrick#gmail.com
andrewanders#gmail.com
So, what I want to do is convert the above to:
www.example.com,johnsmith#gmail.com,fredflintstone#gmail.com,bettyboop#gmail.com
www.example2.com,kylejohnson#gmail.com
www.example3.com,chadbrown#gmail.com,joshbeck#gmail.com,
www.example4.com,tomtom#gmail.com,jeffjeffries#gmail.com,billnorman#gmail.com,stankubrick#gmail.com,andrewanders#gmail.com
I am thinking that the easiest thing to do would be to execute something along the lines of: if the line contains an "#" symbol, input a comma at the beginning of the line/string and then append that line/string to the preceding line. Anyone have any ideas? It would be simpler, I think, if there were a uniform number of email addresses associated with each website, but this is not the case.
Thanks in advance!
A simple approach
awk '{s=/#/?",":"\n";printf s"%s",$0}' file
www.example.com,johnsmith#gmail.com,fredflintstone#gmail.com,bettyboop#gmail.com
www.example2.com,kylejohnson#gmail.com
www.example3.com,chadbrown#gmail.com,joshbeck#gmail.com
s=/#/?",":"\n" Does the line contain #? If yes, set s=","; if no, set s="\n" (newline).
printf s"%s",$0 prints $0 using s followed by %s as the format. If the line has #, it prints a comma and then $0; if not, it prints a newline and then $0.
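One side effect of the one-liner is that the output starts with an empty line and does not end with a newline. A variant that avoids both might look like this; it is a sketch on a made-up four-line file called sites.txt, not a drop-in replacement tested on the real data.

```shell
# Hypothetical shortened input in the same shape as the question's file.
printf '%s\n' 'www.example.com' 'johnsmith#gmail.com' \
    'fredflintstone#gmail.com' 'www.example2.com' 'kylejohnson#gmail.com' > sites.txt

# Lines containing # are prefixed with a comma; other lines start a new
# output line, except the very first one. END adds the final newline.
joined=$(awk '{ printf "%s%s", (/#/ ? "," : (NR > 1 ? "\n" : "")), $0 }
END { if (NR) print "" }' sites.txt)
echo "$joined"
```

Passing the separator as an argument to a fixed "%s%s" format also keeps any % characters in the data from being interpreted by printf.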
Try this awk program:
/^[[:space:]]*www\./ {
if (f) {print line}
f=1; line=$0;
next
}
f {
line=(line "," $0)
}
END {
if (f) {print line}
}

How to pass multiple values to a file using the stream editor (sed)

I have an XML file which requires 2 values to be passed dynamically. Can anyone please assist with my query.
#!/usr/bin/ksh
sed s/a/$1/b/$2/g FILE_PATH/FILE_A_INPUT.xml > FILE_PATH/FILE_A.xml
I used the above command in a .sh script, but it errored out.
RUN_THIS.sh 1 2
sed: Function s/a/1/g/b/2/g cannot be parsed
Try this instead:
sed "s/a/$1/g;s/b/$2/g" INPUT > OUTPUT
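The same thing can be written with one -e per substitution, which some people find easier to read. Here is a small sketch; replace_two is a made-up function standing in for the script, and it assumes neither argument contains a slash or other characters that are special to sed.

```shell
# Hypothetical stand-in for the script: $1 replaces every "a",
# $2 replaces every "b".
replace_two() {
    sed -e "s/a/$1/g" -e "s/b/$2/g"
}

out=$(printf 'a-b-a\n' | replace_two 1 2)
echo "$out"
```

The substitutions run in order on each line, so if the first value itself contains a "b", the second substitution will change it as well.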