How to concatenate two UTF-8 strings in Erlang?

I have two variables as below:
A = <<"سعید"/utf8>>,
B = <<"حیدری"/utf8>>,
How can I concatenate A and B?
C = <<A/utf8, B/utf8>>.
The line above raises exception error: bad argument

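utf8 is just an encoding. A UTF-8 binary is a binary like any other, so use the bytes type specifier. The utf8 specifier in a binary constructor expects an integer code point, not a binary, which is why <<A/utf8, B/utf8>> fails: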
1> A = <<"سعید"/utf8>>,
1> B = <<"حیدری"/utf8>>,
1> C = <<A/bytes, B/bytes>>.
<<216,179,216,185,219,140,216,175,216,173,219,140,216,175,
216,177,219,140>>
2> io:put_chars([C, $\n]).
سعیدحیدری
ok
P.S.: The result is shown reversed because of web browser behavior. It is shown in correct order in the console.

How can I extract first and last line from multiple text blocks separated with new line?

I have a file containing multiple tests, with the detailed actions of each test written one beneath another. The test blocks are separated from one another by a blank line. I want to extract only the first and last line of each block and put them on one line per block in a new file. Here is an example:
input.txt:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
.....
[testX]
duration
summary=x
code=
Results= PASS
output.txt should be something like this:
test1 PASS
test2 FAIL
...
testX PASS
eg2:
[Linux_MP3Enc_xffv.2_Con_37_003]
type = testcase
summary = MP3 encoder test
ActionGroup[Linux_Enc] = PASS
ActionGroup[Linux_Playb] = PASS
ActionGroup[Linux_Pause_Resume] = PASS
ActionGroup[Linux_Fast_Seek] = PASS
Duration = 230.607398987 s
Total_Result = PASS
[Composer__vtx_007]
type = testcase
summary = composer
Background[0xff000000] = PASS
Background[0xffFFFFFF] = PASS
Background[0xffFF0000] = PASS
Background[0xff00FF00] = PASS
Background[0xff0000FF] = PASS
Background[0xff00FFFF] = PASS
Background[0xffFFFF00] = PASS
Background[0xffFF00FF] = PASS
Duration = 28.3567230701 s
Total_Result = PASS
[Videox_Rotate_008]
type = testcase
summary = rotation
Rotation[0] = PASS
Rotation[1] = PASS
Rotation[2] = PASS
Rotation[3] = PASS
Duration = 14.0116529465 s
Total_Result = PASS
Thank you!
Short and simple GNU awk (RS='' switches awk to paragraph mode, so this assumes the blocks are separated by blank lines):
awk -F= -v RS='' '{print $1 $NF}' file
[Linux_MP3Enc_xffv.2_Con_37_003] PASS
[Composer__vtx_007] PASS
[Videox_Rotate_008] PASS
If you do not like the brackets:
awk -F'[]=[]' -v RS='' '{print $2 $NF}' file
Linux_MP3Enc_xffv.2_Con_37_003 PASS
Composer__vtx_007 PASS
Videox_Rotate_008 PASS
Using sed as tagged (although other tools would probably be more natural to use) :
sed -nE '/^\[.*\]$/h;s/^Results= ?//;t r;b;:r;H;x;s/\n/ /;p'
Explanation :
/^\[.*\]$/h # matches the [...] lines, put them in the hold buffer
s/^Results= ?// # matches the Results= lines, discards the useless part
t r;b # on lines which matched, jump to label r;
# otherwise jump to the end (and start processing the next line)
:r;H;x;s/\n/ /;p # label r; append the pattern space (which contains the end of the Results= line)
# to the hold buffer. Switch Hold buffer and pattern space,
# replace the linefeed in the pattern space by a space and print it
One way to solve this is using a regular expression such as:
(?<testId>test\d+)(?:.*\n){4}.*(?<outcome>PASS|FAIL)
The regex matches your sample input and stores the test id (e.g. "test1") in the capture group named "testId" and the outcome (e.g. "PASS") in the capture group "outcome".
The regex can be used in any language with regex support. The code below shows how to do it in Python.
import re
# Read from input.txt
with open('input.txt', 'r') as f:
    indata = f.read()
# Modify the regex slightly to fit Python regex syntax
# (raw string, so that \d and \n reach the regex engine unchanged)
pattern = r'(?:.*)(?P<testId>test\d+)(?:.*\n){4}.*(?P<outcome>PASS|FAIL)'
# Get an iterator which yields all matches
matches = re.finditer(pattern, indata)
# Combine the matches into a list of strings
outputs = ['{} {}'.format(m.group('testId'), m.group('outcome')) for m in matches]
# Join all rows into one string
output = '\n'.join(outputs)
# Write to output.txt
with open('output.txt', 'w') as f:
    f.write(output)
Running the above script on input.txt containing:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
[test444]
duration
summary=x
code=
Results= PASS
yields a file output.txt containing:
test1 PASS
test2 FAIL
test444 PASS
In order to print the first and last line from the block, how about:
awk -v RS="" '{
n = split($0, a, /\n/)
print a[1]
print a[n]
}' input.txt
Result for the second example (eg2):
[Linux_MP3Enc_xffv.2_Con_37_003]
Total_Result = PASS
[Composer__vtx_007]
Total_Result = PASS
[Videox_Rotate_008]
Total_Result = PASS
The awk man page says:
If RS is set to the null string, then records are separated by blank lines.
This feature makes it easy to split the input into blocks separated by blank lines.
Hope this helps.
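For comparison, the same paragraph-mode idea can be sketched in Python. This is only an illustration, assuming (as the awk answers do) that blocks are separated by blank lines. The function names first_and_last_lines and summarize are made up for this example.

```python
def first_and_last_lines(text):
    """Return (first_line, last_line) for each blank-line-separated block."""
    pairs = []
    block = []
    # The extra "" at the end flushes the final block even without a trailing blank line.
    for line in text.splitlines() + [""]:
        if line.strip():
            block.append(line.strip())
        elif block:
            pairs.append((block[0], block[-1]))
            block = []
    return pairs


def summarize(text):
    """Format each block as '<name> <result>', for example 'test1 PASS'."""
    rows = []
    for first, last in first_and_last_lines(text):
        name = first.strip("[]")                  # drop the surrounding brackets
        result = last.split("=", 1)[-1].strip()   # keep what follows the first '='
        rows.append(f"{name} {result}")
    return rows


sample = """[test1]
duration
Results= PASS

[test2]
duration
Results=FAIL
"""
print("\n".join(summarize(sample)))
```

With the sample above this prints test1 PASS and test2 FAIL on separate lines.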

Extracting certain part of a string using strtok

I'm trying to extract a part of the string by using strtok(), but I am unable to get complete output.
For input:
string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
Output:
>> strtok(string)
ans =
'3_5_2_spd_20kmin_corrected_1_20190326.txt'
>> strtok(string,'.txt')
ans =
'3_5_2_spd_20kmin_correc'
>> strtok(string,'0326')
ans =
'_5_'
>> strtok(string,'2019')
ans =
'3_5_'
>> strtok(string,'.txt')
ans =
'3_5_2_spd_20kmin_correc'
I expect the output 3_5_2_spd_20kmin_corrected_1_20190326, but the actual output was 3_5_2_spd_20kmin_correc. Why is that and how can I get the correct output?
strtok treats every character inside the second input argument as a separate delimiter.
For example, when calling:
strtok("3_5_2_spd_20kmin_corrected_1_20190326.txt",'.txt')
MATLAB treats '.', 't' and 'x' as three separate delimiters, so it splits your input at the first t it encounters (the one in "corrected") and returns 3_5_2_spd_20kmin_correc.
In your other example, '2019' is likewise not a single delimiter but a set of delimiters: '2', '0', '1' and '9'. The first one encountered in the string (left to right) is '2', right after '3_5_'. That's why it returns '3_5_'.
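The per-character delimiter behaviour can be reproduced with a small Python sketch. strtok_like is a hypothetical helper written for this illustration, not a MATLAB or Python library function. It mimics only the first return value of strtok: skip leading delimiter characters, then read up to the next delimiter character.

```python
def strtok_like(text, delimiters):
    """Mimic the first output of MATLAB's strtok (illustrative helper only)."""
    delims = set(delimiters)  # every character is its own delimiter
    start = 0
    # Skip any delimiter characters at the start of the text.
    while start < len(text) and text[start] in delims:
        start += 1
    end = start
    # Read until the next delimiter character (or the end of the text).
    while end < len(text) and text[end] not in delims:
        end += 1
    return text[start:end]


name = "3_5_2_spd_20kmin_corrected_1_20190326.txt"
print(strtok_like(name, ".txt"))  # 3_5_2_spd_20kmin_correc
print(strtok_like(name, "2019"))  # 3_5_
print(strtok_like(name, "0326"))  # _5_
```

The third call shows the leading '3' being skipped because it is itself one of the delimiter characters, which matches the '_5_' result in the question.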
To achieve your expected output, I think you would be better off using strsplit instead:
result = strsplit(string,".txt");
result{1}
extractBefore does what you're looking to do:
>> string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> extractBefore(string,'.txt')
ans =
'3_5_2_spd_20kmin_corrected_1_20190326'
If your strings are file names/paths, and your goal is to extract the file name without extension, the best option would be to use fileparts, like so:
>> str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> [~, name] = fileparts(str)
name =
'3_5_2_spd_20kmin_corrected_1_20190326'

Extra text when using jsondecode

I am trying to get to the point where I can create a graph from data that I am supposed to read from a text file.
So in my code I use fopen to open the text file and textscan to scan it, then make a string out of it. Using split, I want to cut off the first part of every line and keep the second part, so that I can decode it as JSON and then use the information.
So my text file consists of two lines of information:
123456.99 :: working completed: result=0 , data ="{"day":"monday", "breakfast":"sandwich"}"
123456.99 :: working completed: result=0 , data ="{"day":"tuesday", "breakfast":"bread"}"
The first part of my code:
fileID = fopen('test1');
text = textscan(fileID, '%s', 'delimiter','\n','whitespace','');
strLog = string(text{1});
res = split(strLog, "data =");
json_str = res(:, 2)
And as a result I get a 2x1 string array. Output:
json_str =
2×1 string array
""{"day":"monday", "breakfast":"sandwich"}""
""{"day":"tuesday", "breakfast":"bread"}""
This is where I got stuck.
My first idea was to call cellfun and apply jsondecode.
But I got
Error using jsondecode JSON syntax error at line 1, column 4
(character 4): extra text.
But it makes no sense to me, since that should be the " from "day", which should be okay for JSON!?
In json_str you have quote marks " at the start and end. These need to be removed for jsondecode to work. For example J = jsondecode(json_str{1}(2:end-1)).
You can then use cellfun to process all elements. For example,
S = cellfun(@(x)jsondecode(x(2:end-1)),json_str)
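The same fix, sketched in Python for illustration: cut the line at the marker, drop the one wrapping quote on each side, and only then decode. extract_payload is a hypothetical helper name, and the "data =" marker is taken from the log lines in the question.

```python
import json


def extract_payload(line, marker="data ="):
    """Return the decoded JSON object that follows `marker` in a log line."""
    raw = line.split(marker, 1)[1].strip()
    # Remove the single wrapping quote on each side; the inner quotes belong to the JSON.
    if raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
    return json.loads(raw)


line = '123456.99 :: working completed: result=0 , data ="{"day":"monday", "breakfast":"sandwich"}"'
print(extract_payload(line))  # {'day': 'monday', 'breakfast': 'sandwich'}
```

Without the slicing step the decoder first sees the empty string "" and then unexpected text after it, which is the same kind of "extra text" failure reported by jsondecode.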

python3.4 under Windows7 encoding issue - \u2014

I use an em dash character in my Python code to split a line read from a text file.
with open(path, 'r') as r:
    number = r.readline()
    num = number.split(' — ')[1].replace('\n',' — ')
It worked fine under Ubuntu with python3.4, but when running the code under Windows 7 (python3.4) I get the following error.
num = number.split(' \u2014 ')[1].replace('\n',' \u2014 ') IndexError:
list index out of range
I'm sure that it should work, and it seems that the problem is in the encoding.
I will appreciate any help fixing my program. I've tried setting "# -*- coding: utf-8 -*-" without any result.
SOLUTION WAS open(path, mode, encoding='UTF8')
When you do:
num = number.split(' — ')[1].replace('\n',' — ')
you assume that the string number contains the ' — ' separator and then take the second field ([1]). If number does not contain it, then [1] does not exist, only [0] exists, and you get the index out of range error. You can guard against that:
if ' — ' in number:
    num = number.split(' — ')[1].replace('\n',' — ')
else:
    num = number.replace('\n',' — ')
Furthermore, as you are now on Windows, you might want to check for '\r\n' as well as '\n', depending on which end-of-line characters the file uses.
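A minimal sketch that combines the two points above: pass the encoding explicitly so the result does not depend on the platform default, and handle a missing separator instead of indexing blindly. The helper name read_second_field and the file name numbers.txt are made up for this example, and the sketch returns only the field after the separator rather than reproducing the original replace call.

```python
import os
import tempfile

SEP = " \u2014 "  # space, em dash, space


def read_second_field(path):
    """Return the text after the em dash separator on the first line, or None."""
    # An explicit encoding avoids relying on the platform's default code page.
    with open(path, "r", encoding="utf-8") as f:
        line = f.readline().rstrip("\r\n")
    _, found, rest = line.partition(SEP)
    return rest if found else None


# Small demo with a temporary UTF-8 file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first \u2014 second\n")
    print(read_second_field(path))  # second
```

str.partition always returns three parts, so a line without the separator yields None here instead of raising IndexError.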

How to pass attribute text as a variable to select a single node in xml using VBScript?

I'm trying to select a single node in an XML file with VBScript, using the following code:
Set node = xmlDoc.selectSingleNode(".//node()[@name = 'anything']")
This works perfectly if I write the value I need as literal text.
But I need to pass 'anything' as a variable X.
I tried the following, but neither works:
xmlDoc.selectSingleNode(".//node()[@name = X]")
xmlDoc.selectSingleNode(".//node()[@name = '&X&']")
Any suggestions are appreciated.
Just concatenate properly:
>> X = "abc"
>> WScript.Echo ".//node()[@name = '" & X & "']"
>>
.//node()[@name = 'abc']
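The same principle, building the XPath string from the variable's value rather than writing the variable's name inside the quotes, can be sketched in Python with the standard library's ElementTree. The helper find_by_name and the sample document are invented for this illustration, and the quote check is an added assumption about what a safe value looks like.

```python
import xml.etree.ElementTree as ET


def find_by_name(root, name):
    """Return the first descendant whose name attribute equals `name`, or None."""
    # The value is spliced into the expression, so reject a quote that would break it.
    if "'" in name:
        raise ValueError("name must not contain a single quote")
    return root.find(f".//*[@name='{name}']")


doc = ET.fromstring(
    "<root><item name='abc'>first</item><item name='xyz'>second</item></root>"
)
x = "abc"
node = find_by_name(doc, x)
print(node.text)  # first
```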