I've asked a question about this before, but I still do not understand what to do.
I need to produce the canonicalized header and body for an email. I've read this piece of documentation many times. Could someone give an example, because I cannot wrap my head around this:
3.4.1. The "simple" Header Canonicalization Algorithm
The "simple" header canonicalization algorithm does not change header
fields in any way. Header fields MUST be presented to the signing or
verification algorithm exactly as they are in the message being
signed or verified. In particular, header field names MUST NOT be
case folded and whitespace MUST NOT be changed.
3.4.2. The "relaxed" Header Canonicalization Algorithm
The "relaxed" header canonicalization algorithm MUST apply the
following steps in order:
Convert all header field names (not the header field values) to
lowercase. For example, convert "SUBJect: AbC" to "subject: AbC".
Unfold all header field continuation lines as described in
[RFC5322]; in particular, lines with terminators embedded in
continued header field values (that is, CRLF sequences followed by
WSP) MUST be interpreted without the CRLF. Implementations MUST
NOT remove the CRLF at the end of the header field value.
Convert all sequences of one or more WSP characters to a single SP
character. WSP characters here include those before and after a
line folding boundary.
Delete all WSP characters at the end of each unfolded header field
value.
Delete any WSP characters remaining before and after the colon
separating the header field name from the header field value. The
colon separator MUST be retained.
3.4.3. The "simple" Body Canonicalization Algorithm
The "simple" body canonicalization algorithm ignores all empty lines
at the end of the message body. An empty line is a line of zero
length after removal of the line terminator. If there is no body or
no trailing CRLF on the message body, a CRLF is added. It makes no
other changes to the message body. In more formal terms, the
"simple" body canonicalization algorithm converts "*CRLF" at the end
of the body to a single "CRLF".
Note that a completely empty or missing body is canonicalized as a
single "CRLF"; that is, the canonicalized length will be 2 octets.
The SHA-1 value (in base64) for an empty body (canonicalized to a
"CRLF") is:
uoq1oCgLlTqpdDX/iUbLy7J1Wic=
The SHA-256 value is:
frcCV1k9oG9oKj3dpUqdJg1PxRT2RSN/XKdLCPjaYaY=
3.4.4. The "relaxed" Body Canonicalization Algorithm
The "relaxed" body canonicalization algorithm MUST apply the
following steps (1) and (2) in order:
(1) Reduce whitespace:
Ignore all whitespace at the end of lines. Implementations
MUST NOT remove the CRLF at the end of the line.
Reduce all sequences of WSP within a line to a single SP
character.
(2) Ignore all empty lines at the end of the message body. "Empty
line" is defined in Section 3.4.3. If the body is non-empty but
does not end with a CRLF, a CRLF is added. (For email, this is
only possible when using extensions to SMTP or non-SMTP transport
mechanisms.)
The SHA-1 value (in base64) for an empty body (canonicalized to a
null input) is:
2jmj7l5rSw0yVb/vlWAYkK/YBwk=
The SHA-256 value is:
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
3.4.5. Canonicalization Examples (INFORMATIVE)
In the following examples, actual whitespace is used only for
clarity. The actual input and output text is designated using
bracketed descriptors: "<SP>" for a space character, "<HTAB>" for a
tab character, and "<CRLF>" for a carriage-return/line-feed sequence.
For example, "X <SP> Y" and "X<SP>Y" represent the same three
characters.
Example 1: A message reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
<CRLF>
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
<CRLF>
<CRLF>
when canonicalized using relaxed canonicalization for both header and
body results in a header reading:
a:X <CRLF>
b:Y <SP> Z <CRLF>
and a body reading:
<SP> C <CRLF>
D <SP> E <CRLF>
Example 2: The same message canonicalized using simple
canonicalization for both header and body results in a header
reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
Example 3: When processed using relaxed header canonicalization and
simple body canonicalization, the canonicalized version has a header
of:
a:X <CRLF>
b:Y <SP> Z <CRLF>
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
Okay, let's try translating these examples into C strings:
Example 1: A message reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
<CRLF>
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
<CRLF>
<CRLF>
Translation:
char *message = "A: X\r\nB : Y\t\r\n\tZ \r\n\r\n C \r\nD \t E\r\n\r\n\r\n";
when canonicalized using relaxed canonicalization for both header and
body results in a header reading:
a:X <CRLF>
b:Y <SP> Z <CRLF>
Translation:
char *headers = "a:X\r\nb:Y Z\r\n";
and a body reading:
<SP> C <CRLF>
D <SP> E <CRLF>
Translation:
char *body = " C\r\nD E\r\n";
Example 2: The same message canonicalized using simple
canonicalization for both header and body results in a header
reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
Translation:
char *headers = "A: X\r\nB : Y\t\r\n\tZ \r\n";
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
Translation:
char *body = " C \r\nD \t E\r\n";
Example 3: When processed using relaxed header canonicalization and
simple body canonicalization, the canonicalized version has a header
of:
a:X <CRLF>
b:Y <SP> Z <CRLF>
Translation:
char *headers = "a:X\r\nb:Y Z\r\n";
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
Translation:
char *body = " C \r\nD \t E\r\n";
Related
I have a requirement where I need to read a text file, extract some data, and send the extracted data to another system, but I am unable to do it.
Input file:
1BoraBora Island
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
3BR 209078 BoraBora 6798989 99999
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Output should be:
1BoraBora Island
0000000000000000000000
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
I need to extract only the rows that have "BR" starting at the 3rd character.
Please guide me on how to achieve this in text format only.
Assuming that the input is text/plain, you can use a DataWeave script and the substring() function to extract a given position from the input:
%dw 2.0
import * from dw::core::Strings
output text/plain
var lines=payload splitBy "\n" // separate text into an array of lines
---
lines[0] ++"\n" ++ lines[1] ++"\n"
++ (lines[2 to -1] // use the range selector to get the remaining lines
filter (substring($,2,4)=="BR") // filter lines that have "BR" at the right position
reduce ($$++"\n"++$) // concatenate the remaining lines again into a single text file
)
Output:
1BoraBora Island
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1 BR 67854 JAIHIND 789 000Y247 9898983
2 BR CR9 BoraBora 123 QK J12Y64 00010520
Since you are working with text, you can also use a regex with the scan function to select all lines that match your condition, then joinBy a newline character:
%dw 2.0
output text/plain
---
flatten(payload scan /(?<=^|\n).{2}BR.*/)
joinBy "\n"
(?<=^|\n).{2}BR.* Regex breakdown:
(?<= starts a positive lookbehind, meaning the rest of the pattern matches only if it follows the pattern specified inside the lookbehind
(?<=^|\n) is a positive lookbehind matching either the start of the string (^) or a newline (\n)
.{2}BR.* matches any two characters, followed by the literal BR, then any number of characters thereafter
I have a scenario like below:
I have a .dat file where the header field names come in as in the example below:
2_a 2_b 2_c 2_d 2_e - Header
a b c d e - Data
f g h I j - Trailer
Next time
1_a 1_b 1_c 1_d 1_e -Header
c d e f g -data
b d f j k - trailer
So my header record number is changing dynamically. Is there any way I can achieve this so that I just supply a value and it gets used as the prefix, i.e. if I input the value 3 the header record becomes 3_a 3_b and so on?
After that my data will come and then the trailer... Please suggest the process, as I am new to PowerShell.
If you want to create a line like
2_a 2_b 2_c 2_d 2_e
or
1_a 1_b 1_c 1_d 1_e
dynamically, you could use the string format operator, -f, like this
$index = 2
$header = "{0}_a {0}_b {0}_c {0}_d {0}_e" -f $index
this will create the first header and save it to a variable. Change the $index variable to create another string with some other number instead.
See this link for more info on its usage.
I have a problem: I've got an escaped string, for example "\\u0026", and I need to transform it into the Unicode char '\u0026'.
Tricks like
string_concat('\\', S, "\\u0026"), write(S).
didn't help, because it will remove the \, not only the escape. So basically my problem is how to remove escape chars from the string.
EDIT: Oh, I've just noticed that Stack Overflow also plays with the escape \.
write_canonical/1 gives me "\\u0026", how to transform that into a single '&' char?
In ISO Prolog a char is usually considered an atom of length 1.
Atoms and chars are enclosed in single quotes, or written without
quotes if possible. Here are some examples:
?- X = abc. /* an atom, but not a char */
X = abc
?- X = a. /* an atom and also a char */
X = a
?- X = '\u0061'.
X = a
The \u notation is SWI-Prolog specific, and not found in the ISO
Prolog. In SWI-Prolog there is a data type string again not found
in the ISO Prolog, and always enclosed in double quotes. Here are
some examples:
?- X = "abc". /* a string */
X = "abc"
?- X = "a". /* again a string */
X = "a"
?- X = "\u0061".
X = "a"
If you have a string at hand of length 1, you can convert it to a char
via the predicate atom_string/2. This is a SWI-Prolog specific predicate,
not in ISO Prolog:
?- atom_string(X, "\u0061").
X = a
?- atom_string(X, "\u0026").
X = &
A recommendation: start by learning the ISO Prolog atom predicates first; there are quite a number of them. Then learn the SWI-Prolog atom and string predicates.
You don't have to learn that many new SWI-Prolog predicates, since in SWI-Prolog most of the ISO Prolog predicates also accept strings. Here is an example of the ISO Prolog predicate atom_codes/2 used with a string in the first argument:
?- atom_codes("\u0061\u0026", L).
L = [97, 38].
?- L = [0'\u0061, 0'\u0026].
L = [97, 38].
?- L = [0x61, 0x26].
L = [97, 38].
P.S.: The 0' notation is defined in ISO Prolog; it is neither a char, an atom, nor a string, but an integer. Its value is the code of the char given after the 0'. I have combined it with the SWI-Prolog \u notation.
P.P.S.: The 0' notation in connection with the \u notation is of course redundant; in ISO Prolog one can directly use the hex prefix 0x for integer values.
The thing is that "\\u0026" is already what you are searching for because it represents \u0026.
I am trying to write a dictionary containing UTF-8 strings to a CSV file. I'm following the instructions from here. However, despite meticulously encoding and decoding these UTF-8 strings, I am getting UnicodeEncodeErrors involving the 'ascii' codec.
I have a list of dictionaries which contain strings and ints as values related to changes to Wikipedia articles. The list below corresponds to this change, for example:
edgelist = [{'articleName': 'Barack Obama', 'editorName': 'Schonbrunn', 'revID': '121844749', 'bytesAdded': '183'},
{'articleName': 'Barack Obama', 'editorName': 'Eep\xc2\xb2', 'revID': '121862749', 'bytesAdded': '107'}]
The problem is edgelist[1]['editorName']. It has type str, and edgelist[1]['editorName'].decode('utf-8') is u'Eep\xb2'.
The code I am attempting is:
import codecs
import csv

_ENCODING = 'utf-8'

def dictToCSV(edgelist, output_file):
    with codecs.open(output_file, 'wb', encoding=_ENCODING) as f:
        w = csv.DictWriter(f, sorted(edgelist[0].keys()))
        w.writeheader()
        for d in edgelist:
            for k, v in d.items():
                if type(v) == int:
                    d[k] = str(v).encode(_ENCODING)
            w.writerow({k: v.decode(_ENCODING) for k, v in d.items()})
This returns:
dictToCSV(edgelist,'test2.csv')
File "csv_to_charts.py", line 129, in dictToCSV
w.writerow({k:v.decode(_ENCODING,'ignore') for k,v in d.items()})
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 3: ordinal not in range(128)
Other permutations, such as swapping decode for encode, or dropping the call entirely, in the final problematic line also return errors:
w.writerow({k:v.encode(_ENCODING) for k,v in d.items()}) returns UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)
w.writerow({k:v for k,v in d.items()}) returns UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 56: ordinal not in range(128)
Following this, I changed with codecs.open(output_file,'wb',encoding=_ENCODING) as f: to with open(output_file,'wb') as f: and still receive the same error.
Excluding the list element(s) or the keys containing this problematic string, the script works fine otherwise.
I just edited your code as follows and the csv was written successfully.
from django.utils.encoding import smart_str
import csv
def dictToCSV(edgelist, output_file):
    f = open(output_file, 'wb')
    w = csv.DictWriter(f, fieldnames=sorted(edgelist[0].keys()))
    w.writeheader()
    for d in edgelist:
        w.writerow({k: smart_str(v) for k, v in d.items()})
    f.close()
Copy the Django code and customize it to your needs.
A strict interpretation of ASCII encoding only allows ordinals 0-127. Any value outside that range is not ASCII by definition. Since both \xc2 & \xb2 have ordinals higher than 127, they cannot be interpreted as ASCII.
I'm not a Python user, but the RFC for CSV mentions ASCII as a common usage and defines an optional 'charset' parameter for the MIME type; I wonder if the writer you're using might also have an 'encoding' setting?
Your strings are already in UTF-8, and DictWriter doesn't work with codecs.open. Following that example:
# coding: utf-8
import csv

edgelist = [
    {'articleName': 'Barack Obama', 'editorName': 'Schonbrunn', 'revID': '121844749', 'bytesAdded': '183'},
    {'articleName': 'Barack Obama', 'editorName': 'Eep\xc2\xb2', 'revID': '121862749', 'bytesAdded': '107'}]

with open('out.csv', 'wb') as f:
    f.write(u'\ufeff'.encode('utf8'))  # BOM (optional... Excel needs it to open a UTF-8 file properly)
    w = csv.DictWriter(f, sorted(edgelist[0].keys()))
    w.writeheader()
    for d in edgelist:
        w.writerow(d)
Output:
articleName,bytesAdded,editorName,revID
Barack Obama,183,Schonbrunn,121844749
Barack Obama,107,Eep²,121862749
Note, you can use 'editorName': 'Eep²' directly instead of 'editorName': 'Eep\xc2\xb2'. The byte string will be UTF-8-encoded, given the # coding: utf-8 declaration, if you save the source file in UTF-8.
I have a data file like the following:
----------------------------
a b c d e .............
A B C D E .............
----------------------------
But I want it to be in the following format:
----------------------------
a A
b B
c C
d D
e E
...
...
----------------------------
What is the quickest way to do the transformation in Vim or Perl?
Basically :.s/ /SpaceCtrl+vEnter/gEnterjma:.s/ /Ctrl+vEnter/gEnterCtrl+v'axgg$p'adG will do the trick. :)
OK, let's break that down:
:.s/ /SpaceCtrl+vEnter/gEnter: On the current line (.), substitute (s) each space (/ /) with a space followed by a carriage return (SpaceCtrl+vEnter), in all positions (g). The cursor should now be on the last letter's line (e in the example).
j: Go one line down (to A B C D E).
ma: Set mark a to the current position... because we want to refer to this position later.
:.s/ /Ctrl+vEnter/gEnter: Do the same substitution as above, but without the Space. The cursor should now be on the last letter's line (E in the example).
Ctrl+v'a: Select from the current cursor position (E) to mark a (that we set in step 3 above), using the block select.
x: Cut the selection (into the " register).
gg: Move the cursor to the first line.
$: Move the cursor to the end of the line.
p: Paste the previously cut text after the cursor position.
'a: Move the cursor to the a mark (set in step 3).
dG: Delete everything (the empty lines left at the bottom) from the cursor position to the end of the file.
P.S. I was hoping to learn about a "built-in" solution, but until such time...
Simple re-map of the columns:
use strict;
use warnings;
my @a = map [ split ], <>;                       # split each line on whitespace and store in array
for (0 .. $#{$a[0]}) {                           # for each such array element
    printf "%s %s\n", $a[0]->[$_], $a[1]->[$_];  # print elements in order
}
Usage:
perl script.pl input.txt
Assuming that the cursor is on the first of the two lines, I would use
the command
:s/ /\r/g|+&&|'[-;1,g/^/''+m.|-j