Matlab execute UNIX command not working in newer versions - matlab

I have a script that uses this line of code:
system(['cat ' inputfile ' | tr -d ''\000'' | tr -d ''\015'' >& tempfile.txt']);
to go through a text file and delete some special characters and then put it into a temp file.
This line of code works in Matlab2012 but not in 2017 as it leads to this error:
tr: Illegal byte sequence
cat: stdout: Broken pipe
Does anyone know how to get around this issue? Thank you!

tr is rejecting bytes that are not valid in the current locale's encoding. Force the C locale for tr so that it treats the input as raw bytes (see https://unix.stackexchange.com/questions/141420/tr-complains-of-illegal-byte-sequence):
system(['cat ' inputfile ' | LC_ALL=C tr -d ''\000\015'' >& tempfile.txt']);
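The locale fix can also be tried directly in a shell, outside MATLAB. This is only a minimal sketch: the helper name clean_bytes and the sample bytes are made up for illustration.

```shell
# Hypothetical helper: delete NUL (\000) and CR (\015) bytes from stdin.
# LC_ALL=C makes tr treat its input as raw bytes instead of text in the
# current (possibly UTF-8) locale, which avoids "Illegal byte sequence".
clean_bytes() {
  LC_ALL=C tr -d '\000\015'
}

# Sample input containing a NUL, a CR and a byte (\377) that is not valid UTF-8.
printf 'a\000b\r\n\377c\n' | clean_bytes | od -c
```

Setting LC_ALL only on the tr command keeps the rest of the pipeline in its original locale.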

Related

Yet another example of sed needing some escape foo

I'm trying to use sed to edit a file within a makefile. When I edit a date, in the format xxxx-yy-zz it works fine. When I try to edit a version number with a format of x.y.z, it fails. I'm pretty certain this is because I need to escape the . in the version for the grep part of sed. I used this answer but it doesn't work, and I'm not good enough at this to figure it out (similar advice here). I can't give a working example due to use of external files, but here is the basic idea:
SHELL := /bin/bash # bash is needed for manipulation of version number
PKG_NAME=FuncMap
TODAY=$(shell date +%Y-%m-%d)
PKG_VERSION := $(shell grep -i '^version' $(PKG_NAME)/DESCRIPTION | cut -d ':' -f2 | cut -d ' ' -f2)
PKG_DATE := $(shell grep -i '^date' $(PKG_NAME)/DESCRIPTION | cut -d ':' -f2)
## Increment the z in of x.y.z
XYZ=$(subst ., , $(PKG_VERSION))
X=$(word 1, $(XYZ))
Y=$(word 2, $(XYZ))
Z=$(word 3, $(XYZ))
Z2=$$(($(Z)+1))
NEW_VERSION=$(addsuffix $(addprefix .,$(Z2)), $(addsuffix $(addprefix ., $(Y)), $(X)))
OLD_VERSION=$(echo "$(PKG_VERSION)" | sed -e 's/[]$.*[\^]/\\&/g' )
all: info update
info:
	@echo "Package: " $(PKG_NAME)
	@echo "Current/Pending version numbers: " $(PKG_VERSION) $(NEW_VERSION)
	@echo "Old date: " $(PKG_DATE)
	@echo "Today: " $(TODAY)
	@echo "OLD_VERSION: " $(OLD_VERSION)
update: $(shell find $(PKG_NAME) -name "DESCRIPTION")
	@echo "Editing DESCRIPTION to increment version"
	$(shell sed 's/$(OLD_VERSION)/$(NEW_VERSION)/' $(PKG_NAME)/DESCRIPTION > $(PKG_NAME)/TEST)
	@echo "Editing DESCRIPTION to update the date"
	$(shell sed 's/$(PKG_DATE)/$(TODAY)/' $(PKG_NAME)/DESCRIPTION > $(PKG_NAME)/TEST)
And this gives as output:
Package: FuncMap
Current/Pending version numbers: 1.0.1000 1.0.1001
Current date: 2000-07-99
Today: 2015-07-11
OLD_VERSION:
sed: first RE may not be empty
Editing DESCRIPTION to increment version
Editing DESCRIPTION to update the date
Obviously the sed on the version number is not working (the date is handled fine, and current/pending versions are correct, and the date is properly changed in the external file). Besides this particular problem, I'm sure a lot of this code is suboptimal - don't laugh! I don't know make nor shell scripting very well...
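OLD_VERSION is empty because you omitted make's shell function: without it, make treats $(echo ...) as a reference to an undefined variable and expands it to nothing. Note also that a literal $ in a makefile has to be written as $$; otherwise make expands $. to an empty string before sed ever sees it:
OLD_VERSION=$(shell echo "$(PKG_VERSION)" | sed -e 's/[]$$.*[\^]/\\&/g' )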
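Outside of make, the escaping step can be checked on its own in a plain shell. A minimal sketch, assuming the version string contains no / (the helper name escape_bre is made up here, and it does not escape the s command delimiter):

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# basic regular expression ( ] $ . * [ \ ^ ) so the text matches literally.
escape_bre() {
  printf '%s\n' "$1" | sed -e 's/[]$.*[\^]/\\&/g'
}

old=$(escape_bre "1.0.1000")
printf '%s\n' "$old"    # prints 1\.0\.1000

# Use the escaped string as the pattern of a substitution.
printf 'Version: 1.0.1000\n' | sed "s/$old/1.0.1001/"    # prints Version: 1.0.1001
```

In a plain shell the $ inside the single-quoted sed script needs no doubling; the $$ is only required when the command lives in a makefile.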

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts in stack overflow for this, but couldn't get my code work, hence I am posting a new question.
Below is the content of file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file into separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the XML tags around the base64 contents. I am trying the commands below, but they do not work as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
Below command:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, I am expecting only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i, in the output. Where am I making a mistake?
When I run the command below, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces below string
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question to update the source file content. The XML in the source file does not contain any newline characters, so wc -l temp outputs 1. The current solution does not work in that case; I tried it and it failed.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added \1 -> to show which file name goes with which content; for the content only, just remove that part.
This is the POSIX form, so with GNU sed use --posix.
It assumes the base64-encoded content is on the same line as its surrounding tag; content spread over several lines would need some modification.
Thanks to JID for the full explanation below.
How it works
sed -n
The -n means no printing so unless explicitly told to print, then there will be no output from sed
's_
This is to substitute the following regex using _ to separate regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The escaped parentheses \( \) form a capture group; they must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside them is captured. [^"]* means zero or more occurrences of any character that is not a double quote, so this simply captures whatever lies between the two quotes.
>\([^<]*\)
Another capture group, this time capturing everything between the > and the next <
.*
Everything else on the line
_\1 -> \2
This is the replacement, so replace everything in the regex before with the first capture group then a -> and then the second capture group.
_p
Means print the line
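The pieces explained above can be tried on a couple of sample lines. This is a sketch under the same assumption as the answer (one <dp:file> element per line, starting at the tag); the helper name name_and_content is made up for illustration.

```shell
# Hypothetical helper wrapping the sed command explained above.
name_and_content() {
  sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p'
}

# Two sample lines, one <dp:file> element per line (assumed layout).
printf '%s\n' \
  '<dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file>' \
  '<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file>' |
  name_and_content
```

Each output line has the form name -> content, for example temporary://test.txt -> XJzLXJlc3VsdHMtYWN0aW9uX18i.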
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected when the file contains just one line.
The command below works for a file containing only a single line:
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null, this sed command prints the warning sed: Missing newline at end of file.
The reason is as follows:
The default Solaris sed ignores an unterminated last line so as not to break existing scripts, because the original Unix implementation required every line to end with a newline.
GNU sed is more relaxed, and the POSIX implementation processes the line but prints a warning.

Parsing HTML on the command line; How to capture text in <strong></strong>?

I'm trying to grab data from HTML output that looks like this:
<strong>Target1NoSpaces</strong><span class="creator"> ....
<strong>Target2 With Spaces</strong><span class="creator"> ....
I'm using a pipe train to whittle down the data to the targets I'm trying to hit. Here's my approach so far:
grep "/strong" output.html | awk '{print $1}'
Grep on "/strong" to get the lines with the targets; that works fine.
Pipe to 'awk '{print $1}'. That works in case #1 when the target has no spaces, but fails in case #2 when the target has spaces..only the first word is preserved as below:
<strong>Target1NoSpaces</strong><span
<strong>Target2
Do you have any tips on hitting the target properly, either in my awk or in different command? Anything quick and dirty (grep, awk, sed, perl) would be appreciated.
Try pup, a command line tool for processing HTML. For example:
$ pup 'strong text{}' < file.html
Target1NoSpaces
Target2 With Spaces
To search via XPath, try xpup.
Alternatively, for a well-formed HTML/XML document, try html-xml-utils.
One way using mojolicious and its DOM parser:
perl -Mojo -E '
g("http://your.web")
->dom
->find("strong")
->each( sub { if ( $t = shift->text ) { say $t } } )'
Using Perl regex's look-behind and look-ahead feature in grep. It should be simpler than using awk.
grep -oP "(?<=<strong>).*?(?=</strong>)" file
Output:
Target1NoSpaces
Target2 With Spaces
Addition:
The same look-behind/look-ahead regex in Ruby, with the /m flag, can also match values that span multiple lines:
ruby -e 'File.read(ARGV.shift).scan(/(?<=<strong>).*?(?=<\/strong>)/m).each{|e| puts "----------"; puts e;}' file
Input:
<strong>Target
A
B
C
</strong><strong>Target D</strong><strong>Target E</strong>
Output:
----------
Target
A
B
C
----------
Target D
----------
Target E
Here's a solution using xmlstarlet
xml sel -t -v //strong input.html
Trying to parse HTML without a real HTML parser is a bad idea. Having said that, here is a very quick and dirty solution to the specific example you provided. It will not work when there is more than one <strong> tag on a line, when the tag runs over more than one line, etc.
awk -F '<strong>|</strong>' '/<strong>/ {print $2}' filename
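The limitation about several <strong> tags on one line can be worked around by looping with awk's match() function. This is still a quick and dirty sketch, not a real HTML parser: the helper name strong_text is made up, and it assumes the tags carry no attributes, contain no nested tags and do not span lines.

```shell
# Hypothetical helper: print the text of every <strong>...</strong> pair,
# including several pairs on the same line.
strong_text() {
  awk '{
    line = $0
    while (match(line, /<strong>[^<]*<\/strong>/)) {
      # Skip the 8 characters of <strong> and drop the 9 of </strong>.
      print substr(line, RSTART + 8, RLENGTH - 17)
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

printf '%s\n' '<strong>Target1NoSpaces</strong><strong>Target2 With Spaces</strong><span class="creator">' |
  strong_text
```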
You never need grep with awk and the field separator doesn't have to be whitespace:
$ awk -F'<|>' '/strong/{print $3}' file
Target1NoSpaces
Target2 With Spaces
You should really use a proper parser for this however.
Since you tagged perl
perl -ne 'print "$1\n" while /<strong>(.*?)<\/strong>/g' input.html
I am surprised no one has mentioned the W3C HTML-XML-utils
curl -Ss https://stackoverflow.com/questions/18746957/parsing-html-on-the-command-line-how-to-capture-text-in-strong-strong |
hxnormalize -x |
hxselect -s '\n' strong
output:
<strong class="fc-black-750 mb6">Stack Overflow
for Teams</strong>
<strong>Teams</strong>
To capture only content:
curl -Ss https://stackoverflow.com/questions/18746957/parsing-html-on-the-command-line-how-to-capture-text-in-strong-strong |
hxnormalize -x |
hxselect -s '\n' -c strong
Stack Overflow
for Teams
Teams

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on its own line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
cat in_file | tr -d '(' | tr ')' '\n' > out_file
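The two-tr pipeline above keeps the space that separated the pairs, so every line after the first starts with a blank, and the original line ends produce empty lines. A small variation, sketched here with a made-up helper name, also deletes spaces and the original newlines so that each pair ends up alone on its line:

```shell
# Hypothetical helper: delete "(", spaces and existing newlines, then turn
# every ")" into a newline so each pair ends up on its own line.
pairs_per_line() {
  tr -d '( \n' | tr ')' '\n'
}

printf '(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)\n' | pairs_per_line
```

This relies on every pair being closed by a ")" and on there being no other text in the file.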
As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
We probably all know this, but just to emphasize:
Simple dedicated tools usually take less time to run than awk or sed doing the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100000 lines long, each line containing the characters "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed
cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

How to remove set of special characters (see attachment)

The character is special, and I cannot paste it as code because the forum does not support it. Here is how it looks in code format: [32;1m
The box (first character) appears as a left arrow in the file (see the links below).
Here is a picture of how the character looks: http://www.dodaj.rs/f/2u/ar/3B1Q7J4Q/sample.jpg
And here is the file containing what I want to remove: http://hotfile.com/dl/124448134/58e08a0/File.log.html
Here is the complete file:
[32;1m/var/log/daemon.log file is rotated1...[0m
[32;1m/var/log/daemon.log file is rotated2...[0m
[37;1m/var/log/daemon.log file is rotated3...[0m
[35;1m/var/log/daemon.log file is rotated3...[0m
[33;1mhello[0m
[33;1mthis is sample[0m
[33;1mwhats up?[0m
I want to delete all the unnecessary characters so that the output is:
/var/log/daemon.log file is rotated1...
/var/log/daemon.log file is rotated2...
/var/log/daemon.log file is rotated3...
/var/log/daemon.log file is rotated3...
hello
this is sample
whats up?
I tried to delete the special characters with sed like this:
cat File.log | sed 's/[!@#\$%^&*()]//g' | sed -e 's/37;1m//g' > output.log
but it does nothing.
Can someone please write the code that does what I need?
Thx.
EDIT: After posting, the arrow character is not visible on the forum...
sed -e 's/[[:cntrl:]]//g' -e 's/\[32;1m//g' -e 's/\[33;1m//g' -e 's/\[35;1m//g' -e 's/\[37;1m//g' -e 's/\[0m//g'
echo '[32;1m/var/log/daemon.log file is rotated1...[0m' | awk -F'1m' '{sub(/\[0m/,"",$2);print $2}'
/var/log/daemon.log file is rotated1...
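Rather than listing every colour code, the ESC character and the bracketed sequence can be matched together with one pattern. This is a minimal sketch, assuming the unwanted sequences are all ANSI colour codes of the form ESC [ digits-and-semicolons m; the helper name strip_ansi is made up for illustration.

```shell
# Hypothetical helper: remove ANSI colour sequences (ESC [ ... m) from stdin.
# printf '\033' produces the literal ESC character for the sed pattern.
strip_ansi() {
  sed "s/$(printf '\033')\[[0-9;]*m//g"
}

printf '\033[32;1m/var/log/daemon.log file is rotated1...\033[0m\n' | strip_ansi
```

Usage on the log from the question would then be strip_ansi < File.log > output.log.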