Python 3.4 under Windows 7 encoding issue - \u2014 - encoding

I have an em dash character in my Python code, which I use to split a line read from a text file.
with open(path, 'r') as r:
    number = r.readline()
    num = number.split(' — ')[1].replace('\n',' — ')
It worked fine under Ubuntu with Python 3.4, but when I run the code under Windows 7 (also Python 3.4) I get the following error:
num = number.split(' \u2014 ')[1].replace('\n',' \u2014 ')
IndexError: list index out of range
I'm sure the code should work, so the problem seems to be the encoding.
I'd appreciate any help fixing my program. I've tried adding "# -*- coding: utf-8 -*-" at the top of the file, without any result.
SOLUTION: open(path, mode, encoding='UTF8')
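Why the explicit encoding helps: when no encoding is given, open() decodes the file with the platform's preferred encoding, which on Windows is usually not UTF-8. The UTF-8 bytes of the em dash then decode to other characters, the separator is never found, and split() returns a single element. A minimal sketch, assuming a hypothetical one-line sample file written to a temporary location:

```python
import os
import tempfile

EM_DASH = "\u2014"

# Hypothetical sample file: one UTF-8 line of the form "label <em dash> value".
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("id " + EM_DASH + " 42\n")
    path = tmp.name

# Decode explicitly as UTF-8 instead of relying on the platform default.
with open(path, "r", encoding="utf-8") as r:
    number = r.readline()

# Same expression as in the question, with the dash written as an escape.
num = number.split(" " + EM_DASH + " ")[1].replace("\n", " " + EM_DASH + " ")

os.remove(path)
print(ascii(num))  # ascii() avoids console encoding problems when printing
```

The "# -*- coding: utf-8 -*-" comment only declares the encoding of the source file itself; it has no effect on how open() decodes data files.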

When you do:
num = number.split(' — ')[1].replace('\n',' — ')
you assume that the string number contains ' — ' and then take the second field ([1]). If number does not contain it, split() returns a one-element list, so only [0] exists and [1] raises the index-out-of-range error. Check first:
if ' — ' in number:
    num = number.split(' — ')[1].replace('\n',' — ')
else:
    num = number.replace('\n',' — ')
Furthermore, as you are now on Windows, you might want to check for '\r\n' as well as '\n', depending on which end-of-line character(s) the file uses.
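The guard and the line-ending advice can be combined in one helper. This is only a sketch: the function name second_field is hypothetical, and it uses str.partition, which never raises when the separator is missing and returns everything after the first separator.

```python
EM_DASH = "\u2014"
SEPARATOR = " " + EM_DASH + " "

def second_field(line):
    """Return the text after the first separator, or the whole line if the
    separator is absent, with the line ending replaced by the separator."""
    line = line.rstrip("\r\n")              # handles both '\n' and '\r\n'
    head, sep, tail = line.partition(SEPARATOR)
    field = tail if sep else head           # sep is '' when nothing was found
    return field + SEPARATOR

print(ascii(second_field("id " + EM_DASH + " 42\r\n")))
print(ascii(second_field("no dash here\n")))
```

Note that a file opened in text mode with the default newline handling already has '\r\n' translated to '\n', so the explicit rstrip mainly matters for data read in binary mode or with newline=''.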


How to escape special chars when using glib.string.escape()

According to the documentation of glib.string.escape():
Escapes the special characters '\b', '\f', '\n', '\r', '\t', '\v', '\' and '"' in the string source by inserting a '\' before them.
Additionally all characters in the range 0x01-0x1F (everything below SPACE) and in the range 0x7F-0xFF (all non-ASCII chars) are replaced with a '\' followed by their octal representation. Characters supplied in exceptions are not escaped.
Now I don't want to escape the characters in the range 0x7F-0xFF. How should I write the exceptions argument?
My example code does not work:
shellcmd = "bash -c \""+file.get_string(title,"List").escape("0x7F-0xFF")+"\"";
print("shellcmd: %s\n", shellcmd);
Process.spawn_command_line_sync (shellcmd,
    out ls_stdout, out ls_stderr, out ls_status);
if(ls_status!=0){ list = ls_stderr.split("\n"); }
else{ list = ls_stdout.split("\n"); }
This works:
shellcmd = "bash -c \""+file.get_string(title,"Check").replace("\"","\\\"")+"\"";
You actually have to put the characters 0x7f to 0xff in the exceptions argument. So something like:
shellcmd = "bash -c \""+file.get_string(title,"List").escape("\x7F\x80\x81\x82…\xfe\xff")+"\"";
You would need to list them all manually.
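To see why the exceptions argument has to contain the bytes themselves rather than a range description such as "0x7F-0xFF", here is a rough Python model of the escaping rule quoted in the question. It is a sketch based only on that description, not GLib's actual code; the names escape_sketch and HIGH_BYTES are hypothetical, and the three-digit octal width is an assumption.

```python
# Bytes that get a named backslash escape, per the quoted documentation.
_NAMED = {
    0x08: b"\\b", 0x0C: b"\\f", 0x0A: b"\\n", 0x0D: b"\\r",
    0x09: b"\\t", 0x0B: b"\\v", 0x5C: b"\\\\", 0x22: b'\\"',
}

def escape_sketch(source, exceptions=b""):
    """Escape bytes the way the quoted description says, skipping exceptions."""
    keep = set(exceptions)                  # every listed byte is left alone
    out = bytearray()
    for byte in source:
        if byte in keep:
            out.append(byte)
        elif byte in _NAMED:
            out += _NAMED[byte]
        elif byte < 0x20 or byte >= 0x7F:
            out += b"\\%03o" % byte         # assumed three-digit octal form
        else:
            out.append(byte)
    return bytes(out)

# The whole range can be generated instead of typed out by hand.
HIGH_BYTES = bytes(range(0x7F, 0x100))

sample = "\u00e9\n".encode("utf-8")         # e-acute followed by a newline
print(escape_sketch(sample))                # high bytes become octal escapes
print(escape_sketch(sample, HIGH_BYTES))    # high bytes pass through
```

Under this model the string "0x7F-0xFF" only exempts the literal characters '0', 'x', '7', 'F' and '-', which is why the attempt in the question has no visible effect on non-ASCII text.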
Looking more generally at your code, you seem to be constructing a command to run. This is a very bad idea and you should never do it. It is wide open to code injection. Use Process.spawn_sync() and pass it an argument vector instead.
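The argument-vector idea is not specific to Vala. As an illustration only, the sketch below uses Python's standard subprocess module as a stand-in for Process.spawn_sync(); the child program is the Python interpreter itself, a hypothetical placeholder that just echoes its first argument. Because no shell parses the string, quotes, backticks and $( ) arrive unchanged and are never executed.

```python
import subprocess
import sys

# Text that would be dangerous if pasted into a shell command line.
TRICKY = 'say "hi"; echo $(whoami) `id` \\'

def run_with_argv(untrusted):
    """Pass untrusted text as one argv element; no shell is involved."""
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.returncode, result.stdout, result.stderr

status, out, err = run_with_argv(TRICKY)
print(status, repr(out))
```

With this approach there is nothing to escape in the first place, so the question of which characters to exempt does not arise.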

Extracting values from a single file

I have a file with multiple lines; but a specific line contains tons of information, with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but with no success. So, I was wondering if you could give me some insights.
So, here you have one fraction of the unique line of the given document I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim here is first to extract all the values that are between the braces after "habitat.set.prob={" and put them on a single line in a text file.
It is also important to extract the numbers that appear just before the expression "[&length_range=", which in this case are "6" and "10". They are the labels of the sets of numbers after "prob={".
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always right before "[&length_range=". What comes before the label is not a fixed expression; it is actually a random number.
The goal is to end up with a file with the following characteristics:
6 0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command, at least to try to extract the set of numbers, but it didn't work:
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
Well... I made some progress.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
Many thanks in advance.
Cheers,
Luiz
Something like this:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
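The first part, '/a/,/b/', is an address range: it selects the lines from the first match of pattern a through the next match of pattern b, possibly spanning multiple lines. The -n option tells sed not to print anything by default.
In '/a/,/b/{s/c/d/p; s/e/f/p}'
there are two substitution commands inside the curly braces, each with the p (print) flag, so only lines on which a substitution succeeded are printed.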
I am not sure how far you have dug into this yourself, so I am not providing the complete answer, but I hope this helps:
For the first part, getting the number (which you call the label): you didn't mention whether there is any specific pattern around it, so try this (data is the file which contains the actual input). You will need to work out how to get the whole number and tweak the RE a bit.
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part, which gives the numbers between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now take this as a starting point and work on your output to get the desired result!
To explain a bit:
In the first command, I capture the digits that sit between anything (.*) and (.*)length_range [you can match the characters [ and & literally by putting \ in front of them].
In the second command, I capture the pattern between habitat.set.prob and DLOOP and then use tr to remove the braces.
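For comparison, the label and the value list can be paired in a single pass with a regular expression. This is a sketch in Python rather than sed, assuming each label is immediately followed by "[&length_range={" and that the value list contains no closing brace; the function name extract and the sample string are hypothetical.

```python
import re

PATTERN = re.compile(
    r"(\d+)\[&length_range=\{"           # label right before [&length_range={
    r".*?"                               # skip ahead as little as possible
    r"habitat\.set\.prob=\{([^}]*)\}"    # values inside the braces
    r",DLOOP\.rate_median",
    re.DOTALL,                           # let the gap span line breaks
)

def extract(text):
    """Return one 'label values' string per annotated block."""
    return ["{} {}".format(label, values)
            for label, values in PATTERN.findall(text)]

# Shortened, hypothetical input in the same shape as the question's data.
sample = ("(6[&length_range={0.19,0.01},habitat.set.prob={0.01,0.03},"
          "DLOOP.rate_median=0.04,length=1,"
          "10[&length_range={0.19,0.2},habitat.set.prob={0.21,0.33},"
          "DLOOP.rate_median=0.04,length=")

for line in extract(sample):
    print(line)
```

Writing the returned lines to a file gives the "label values" layout the question asks for.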
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string p = "1:2:3:4"; // input string: single digits separated by ':'
    int arr[4] = {};      // zero-initialized array that receives the digits
    for (size_t i = 0, j = 0; i < p.length() && j < 4; i++) { // walk the string
        if (p[i] == ':') { continue; } // skip the separators
        arr[j] = p[i] - '0';           // convert the digit character to its value
        j++;
    }
    cout << "String={" << arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "}"; // print the array
    return 0;
}

Printing a warning message over multiple lines

I am trying to print a warning message that is a little long and includes 2 variable calls. Here's my code:
warning( 'MATLAB:questionable_argument', ...
    'the arguments dt (%d) and h (%d) are sub-optimal. Consider increasing nt or decreasing nx.', ...
    dt, h )
Obviously, the line of text extends to the right when viewing the MATLAB code. How can I break it so it wraps nicely? I've tried multiple things but keep getting syntax errors.
As suggested in comments, just insert a \n where you want to break the line. You can also use a variable for the text, to make it easy to read also within the code:
txt = sprintf(['the arguments dt (%d) and h (%d) are sub-optimal.\n'...
'Consider increasing nt or decreasing nx.'],dt,h);
warning( 'MATLAB:questionable_argument',txt)
If you just embed escape characters such as \n in a warning string, it will not work:
warning('Hi there.\nPlease do not do that.')
will just print out:
Warning: Hi there.\nPlease do not do that.
However, if you pre-format the text using sprintf , then all the escape characters will work. For instance:
warnText = sprintf('Hi there.\nPlease do not do that.');
warning(warnText)
Produces what you want:
Warning: Hi there.
Please do not do that.
A simpler version than the one EBH provided:
str1 = 'text 1';
str2 = 'text 2';
str3 = 'etc.';
str = sprintf('\n%s \n%s \n%s \n',str1,str2,str3);
warning(str)

Matlab: How to print " ' " character

I am trying to create the following string:
javaaddpath ('C:\MatlabUserLib\ParforProgMonv2')
However, I could only do the following
command = sprintf('%s ', varargin{1}, '(', varargin{2}, ')');
and that gives me:
javaaddpath ( C:\MatlabUserLib\ParforProgMonv2 )
UPDATE:
Based on Dan's suggestion, I used the following:
command = sprintf('%s', varargin{1}, '(', '''', varargin{2}, '''', ')')
Use two single quotation marks inside the string. See the docs on formatting strings; this concept is known as an escape character (to help you google such things in the future).
command = sprintf('%s ', varargin{1}, '(''', varargin{2}, ''')')
Although I think you might prefer
command = sprintf('%s (''%s'')', varargin{1}, varargin{2})
or if you have no other varargins (which I guess is very unlikely but anyway)
command = sprintf('%s (''%s'')', varargin{:})
There are a couple of ways around this. First you could declare your path as a string variable then pass the string to your command, eg,
path = 'my/path'
javaaddpath (path)
Or you can escape special characters inside the string. For a single quote, double it:
myString = ''' Hi there! '''
disp(myString)

EDIFACT macro (readable message structure)

I'm working in the EDI area and would like some help with an EDIFACT macro to make EDIFACT files more readable.
The message looks like this:
data'data'data'data'
I would like to have the macro converting the structure to:
data'
data'
data'
data'
Please let me know how to do this.
Thanks in advance!
BR
Jonas
If you merely want to view the files in a more readable format, try downloading the Softshare EDI Notepad. It's a fairly good tool just for that purpose, it supports X12, EDIFACT and TRADACOMS standards, and it's free.
Replacing in VIM (assuming that the standard EDIFACT separators/escape characters for UNOA character set are in use):
:s/\([^?]'\)\(.\)/\1\r\2/g
Breaking down the regex:
\([^?]'\) - Search for a ' that occurs after any character except ? (the standard escape character) and capture these two characters as the first group. These are the last two characters of each segment.
\(.\) - Capture any single character following the segment terminator (i.e. don't match if the segment terminator is already at the end of a line).
Then replace all matches on this line, inserting a new line between the segment terminator and the beginning of the next segment.
Without the check for the escape character you could end up with this:
...
FTX+AAR+++FORWARDING?: Freight under Vendor?'
s care.'
NAD+BY+9312345123452'
CTA+PD+0001:Terence Trent D?'
Arby'
...
instead of this:
...
FTX+AAR+++FORWARDING?: Freight under Vendor?'s care.'
NAD+BY+9312345123452'
CTA+PD+0001:Terence Trent D?'Arby'
...
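The same rule (break after a ' unless it is escaped by ?) can be sketched outside Vim as well. The Python version below is illustrative only: the function name split_segments is hypothetical, it assumes the default UNOA separators, and like the Vim pattern it does not handle an escaped question mark directly before a terminator (??').

```python
import re

# An apostrophe that is not preceded by the release character '?' and that
# is followed by more data on the same line.
_TERMINATOR = re.compile(r"(?<!\?)'(?=[^\r\n])")

def split_segments(message):
    """Put each EDIFACT segment on its own line."""
    return _TERMINATOR.sub("'\n", message)

message = ("FTX+AAR+++FORWARDING?: Freight under Vendor?'s care.'"
           "NAD+BY+9312345123452'"
           "CTA+PD+0001:Terence Trent D?'Arby'")
print(split_segments(message))
```

The lookahead plays the role of \(.\) in the Vim pattern: a terminator that already ends a line is left alone, so running the function twice does not add blank lines.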
Is this what you are looking for?
Option Explicit
Dim stmOutput: Set stmOutput = CreateObject("ADODB.Stream")
stmOutput.Open
stmOutput.Type = 2 'adTypeText
stmOutput.Charset = "us-ascii"
Dim stm: Set stm = CreateObject("ADODB.Stream")
stm.Type = 1 'adTypeBinary
stm.Open
stm.LoadFromFile "EDIFACT.txt"
stm.Position = 0
stm.Type = 2 'adTypeText
stm.Charset = "us-ascii"
Dim c: c = ""
Do Until stm.EOS
    c = stm.ReadText(1)
    Select Case c
        Case Chr(39)
            stmOutput.WriteText c & vbCrLf
        Case Else
            stmOutput.WriteText c
    End Select
Loop
stm.Close
Set stm = Nothing
stmOutput.SaveToFile "EDIFACT.with-CRLF.txt"
stmOutput.Close
Set stmOutput = Nothing
WScript.Echo "Done."