Jsoup parse with unreadable characters - encoding

When parsing a response from the flashscore server with Jsoup, I get unreadable characters.
Jsoup code:
document = Jsoup.connect(URL + LABEL + SEASON + 1 + END)
        .userAgent(USER_AGENT)
        .header("x-fsign", FSIGN)
        .get();
Server response:
<html>
<head></head>
<body>
SA÷1¬~ZA÷ИТАЛИЯ: Серия В¬ZEE÷6oug4RRc¬ZB÷98¬ZY÷Италия¬ZC÷GbNgKxPB¬ZD÷p¬ZE÷K28bJgeL
How to work with it?

Set the correct charset explicitly when parsing (see the question "JSoup character encoding issue"):
document = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url);
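Why the charset argument matters can be shown in a few lines: the same bytes decode to different text depending on the charset that is assumed. The sketch below is in Python rather than Java, and the sample string is copied from the response in the question. Which charset the server really uses is not stated there, so UTF-8 is an assumption made only for this illustration.

```python
# Sample text copied from the server response in the question.
text = "ZA÷ИТАЛИЯ: Серия В¬"

# Assumption for this sketch: the server sends UTF-8 bytes.
raw = text.encode("utf-8")

# Decoding with the matching charset restores the text exactly.
good = raw.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 never raises an error, but each
# byte becomes its own character, so multi-byte sequences turn into noise.
bad = raw.decode("iso-8859-1")

print(good == text, bad == text)
print(len(text), len(bad))
```

Whatever charset is passed to the parser has to match what the server actually sends; a mismatch does not fail, it just produces different characters.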


Deploying an Azure function from Business Central

I am trying to deploy an Azure function using the REST API and a zip archive of the solution.
It works properly in Postman.
I found advice on how to upload mp3 files and developed a solution for my task based on it.
But when I try to create the request payload in AL code for Business Central (the file has already been uploaded to InStr):
CR := 13;
LF := 10;
NewLine += '' + CR + LF;
httpHeader.Clear();
TempBlob.CreateOutStream(PayloadOutStream);
PayloadOutStream.WriteText('--boundary' + NewLine);
PayloadOutStream.WriteText(StrSubstNo('Content-Disposition: form-data; name="file"; filename="%1"', filename) + NewLine);
PayloadOutStream.WriteText('Content-Type: application/zip' + NewLine);
PayloadOutStream.WriteText(NewLine);
CopyStream(PayloadOutStream, InStr);
PayloadOutStream.WriteText(NewLine);
PayloadOutStream.WriteText('--boundary');
PayloadOutStream.WriteText(NewLine);
TempBlob.CreateInStream(PayloadInStream);
Content.WriteFrom(PayloadInStream);
Content.GetHeaders(httpHeader);
if httpHeader.Contains('Content-Type') then httpHeader.Remove('Content-Type');
httpHeader.Add('Content-Type', 'multipart/form-data;boundary=boundary');
httpRequest := CreateHttpRequestMessage(Content, 'Post', RequestURI);
Client.Clear();
Client.DefaultRequestHeaders.Add('Authorization', StrSubstNo('Bearer %1', token));
if Client.Send(httpRequest, httpResponse) then begin
httpResponse.Content().ReadAs(responseText);
Message(responseText);
end
else
Error(RequestErrorMsg);
I received an error in the response message from the deployment process like this:
{"Message":"An error has occurred.","ExceptionMessage":"Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory.","ExceptionType":"System.IO.InvalidDataException","StackTrace":" at System.IO.Compression.ZipArchive.ReadCentralDirectory()\r\n at System.IO.Compression.ZipArchive.get_Entries()\r\n at Kudu.Core.Infrastructure.ZipArchiveExtensions.Extract(ZipArchive archive, String directoryName, ITracer tracer, Boolean doNotPreserveFileTime) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Infrastructure\\ZipArchiveExtensions.cs:line 114\r\n at Kudu.Services.Deployment.PushDeploymentController.<>c__DisplayClass21_0.<LocalZipFetch>b__1() in C:\\Kudu Files\\Private\\src\\master\\Kudu.Services\\Deployment\\PushDeploymentController.cs:line 746\r\n at System.Threading.Tasks.Task.InnerInvoke()\r\n at System.Threading.Tasks.Task.Execute()......
I believe something is wrong with how I build the payload. Could you advise me on how to build the request body for my case?
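For comparison, here is a minimal Python sketch of the multipart/form-data body that the AL code is trying to build. It is not a Business Central solution. It only illustrates two properties such a body needs: the zip bytes must be copied in unmodified (as binary, not as text), and the final delimiter is the boundary followed by two extra hyphens (`--boundary--`). The helper `build_multipart`, the file name `function.zip`, and the archive content are made up for illustration.

```python
import io
import zipfile

BOUNDARY = "boundary"  # same boundary string as in the AL code
CRLF = b"\r\n"


def build_multipart(filename: str, payload: bytes) -> bytes:
    """Hypothetical helper: wrap raw bytes in a one-part multipart body."""
    head = (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/zip\r\n"
        "\r\n"
    ).encode("ascii")
    # Closing delimiter: note the trailing "--" after the boundary.
    tail = CRLF + f"--{BOUNDARY}--".encode("ascii") + CRLF
    return head + payload + tail


# Build a small zip archive in memory to stand in for the real package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("host.json", "{}")
zip_bytes = buf.getvalue()

body = build_multipart("function.zip", zip_bytes)

# Recover the part the way a receiver would and check the archive is intact.
start = body.index(b"\r\n\r\n") + 4
end = body.rindex(CRLF + f"--{BOUNDARY}--".encode("ascii"))
with zipfile.ZipFile(io.BytesIO(body[start:end])) as zf:
    print(zf.namelist())
```

If the bytes between the part headers and the closing delimiter differ from the original file, the archive's central directory may no longer match its contents, which would be consistent with the error reported above.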

node-soap: how to respond with a message (from the server) using SOAP version 1.2

I'm using the node-soap library.
My project works with SOAP version 1.2 on the server side.
The problem occurs when I send a response message to the client.
I debugged the node-soap library and saw that the SOAP namespace is hard-coded.
This is the code from node-soap's server.js file:
Server.prototype._envelope = function (body, includeTimestamp) {
    var defs = this.wsdl.definitions,
        ns = defs.$targetNamespace,
        encoding = '',
        alias = findPrefix(defs.xmlns, ns);
    var xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" " +
        encoding +
        this.wsdl.xmlnsInEnvelope + '>';
    var headers = ''; inc...
You can see that the SOAP namespace is declared as xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"
I have now changed this specific code to:
xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope/\"
and it works.
Is there a better way to fix this in the library (for example, an option)?
(Something like forceSoap12Headers: true in the client, which works excellently.)
Thanks for the help,
ariel

How To Produce HTML Table of Filehashes - with Relative Path Only?

I want to generate an HTML table that shows the file hash (SHA-1) of a bunch of files in a directory; I want the filenames to be relative to my current directory, not absolute.
I know how to do all the different bits separately, but I can't figure out how to chain them together.
Here's what I've got so far:
dir|get-filehash -Algorithm sha1
Which gives me this:
Algorithm Hash Path
--------- ---- ----
SHA1 DA39A3EE5E6B4B0D3255BFEF95601890AFD80709 C:\temp\test\empty.txt
SHA1 88A5B867C3D110207786E66523CD1E4A484DA697 C:\temp\test\hello.txt
Now I only want the hash and filename, so I can do this:
dir|get-filehash -Algorithm sha1|select-object hash, path
Which gives me:
Hash Path
---- ----
DA39A3EE5E6B4B0D3255BFEF95601890AFD80709 C:\temp\test\empty.txt
88A5B867C3D110207786E66523CD1E4A484DA697 C:\temp\test\hello.txt
So I can output this to an HTML file like this:
(dir|get-filehash -Algorithm sha1|select-object hash, path)|ConvertTo-html|add-content output.htm
[Ignore, for now, the fact that this only works properly if the output file doesn't already exist.]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/></colgroup>
<tr><th>Hash</th><th>Path</th></tr>
<tr><td>DA39A3EE5E6B4B0D3255BFEF95601890AFD80709</td><td>C:\temp\test\empty.txt</td></tr>
<tr><td>88A5B867C3D110207786E66523CD1E4A484DA697</td><td>C:\temp\test\hello.txt</td></tr>
</table>
So this gives me a HTML table; but the PATH values are absolute.
I know a simple way of getting a relative path using the 'Resolve-Path' cmdlet:
dir | Resolve-Path -Relative
.\empty.txt
.\hello.txt
But I can't get it to 'fit' in the rest of my script; I guess there might be a .NET function to do this in a different way? Or is there some fancy ninja use of brackets that lets me squeeze this call to a cmdlet inside the 'select-object' list?
I tried this, but it doesn't work:
# NOTE: this code does not work !
PS > dir|get-filehash|select-object hash, (path|Resolve-Path -relative)
I made some progress on this, but the solution still isn't very satisfactory, since I also need control over the headers (rather than just 'Name' and 'Value', which the hashtable provides).
I can get these headers working with 'format-table', but still not with 'convertto-html'.
Here's the code so far:
function get-relative($infile) {
    begin { $return_hash=@{} }
    process {
        $relative_path=($_|resolve-path -Relative)
        $filehash=($_| Get-FileHash -Algorithm sha1).hash
        $return_hash.add($relative_path, $filehash)
    }
    end { return $return_hash }
}
And to call it with a formatter:
$table_format = @{Expression={$_.Name} ; Label="Filename"}, @{Expression={$_.Value} ; Label="CHECKSUM(SHA1)"}
dir|get-relative|format-table $table_format
This gives:
Filename CHECKSUM(SHA1)
-------- --------------
.\goodbye.txt DA39A3EE5E6B4B0D3255BFEF95601890AFD80709
.\hello.txt 88A5B867C3D110207786E66523CD1E4A484DA697
But if I shove that through a 'convertto-html', weirdness ensues. (I guess this is because the output from 'format-table' is just a string by now....)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>ClassId2e4f51ef21dd47e99d3c952918aff9cd</th><th>pageHeaderEntry</th><th>pageFooterEntry</th><th>autosizeInfo</th><th>shapeInfo</th><th>groupingEntry</th></tr>
<tr><td>033ecb2bc07a4d43b5ef94ed5a35d280</td><td></td><td></td><td></td><td>Microsoft.PowerShell.Commands.Internal.Format.TableHeaderInfo</td><td></td></tr>
<tr><td>9e210fe47d09416682b841769c78b8a3</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>27c87ef9bbda4f709f6b4002fa4af63c</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>27c87ef9bbda4f709f6b4002fa4af63c</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>4ec4f0187cb04f4cb6973460dfe252df</td><td></td><td></td><td></td><td></td><td></td></tr>
<tr><td>cf522b78d86c486691226b40aa69e95c</td><td></td><td></td><td></td><td></td><td></td></tr>
</table>
</body></html>
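The same report can also be sketched outside PowerShell. The Python version below is only an illustration of the overall shape: hash each file, make the path relative, choose the column headers freely, and emit an HTML table. The function name `hash_table_html` and the throwaway demo directory are assumptions for this sketch, not part of the original script; the header labels are the ones used with 'format-table' above.

```python
import hashlib
import html
import os
import tempfile


def hash_table_html(directory: str) -> str:
    """Return an HTML table of SHA-1 hashes with paths relative to `directory`."""
    rows = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if not os.path.isfile(full):
            continue  # skip sub-directories in this simple sketch
        with open(full, "rb") as fh:
            digest = hashlib.sha1(fh.read()).hexdigest().upper()
        # Prefix with "." so the result looks like Resolve-Path -Relative.
        rel = os.path.join(".", os.path.relpath(full, directory))
        rows.append(f"<tr><td>{html.escape(rel)}</td><td>{digest}</td></tr>")
    header = "<tr><th>Filename</th><th>CHECKSUM(SHA1)</th></tr>"
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


# Demo on a throwaway directory containing one empty file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "empty.txt"), "wb").close()
    table = hash_table_html(tmp)
    print(table)
```

Building the rows from plain values, rather than from already formatted output, is what keeps the column headers under the script's control.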

XML parsing, illegal character in the end of the string

I'm getting a very strange error when trying to convert a string to XML in MS SQL Server:
Msg 9420, Level 16, State 1, Line 5
XML parsing: line 1, character 8071, illegal xml character
If I check the string in a text editor, I can see that its length is 8070. Why is it complaining about character 8071 if that character does not exist?
This is how I'm converting string to XML:
CAST(REPLACE(SUBSTRING(
REPLACE(REPLACE(REPLACE(ResponseData,'ä','a'),'ö','o'),'å','a'),
PATINDEX('%<?xml%',ResponseData), PATINDEX('%sonType>', ResponseData)+6),
'<?xml version="1.0" encoding="utf-16"?>',
'<?xml version="1.0" encoding="utf-8"?>')as XML) as ResponseData
Are any of the replacements causing the problem?
UPD: A further complication is that the ResponseData column stores the XML string together with some other data. Example:
Error from service: <Some error description>. Sent request: <?xml version="1.0" encoding="utf-16"?><Contents of the XML>
So I need to get that XML string from the column and then convert it to XML.
You could try changing the declared encoding from UTF-16 to ISO-8859-1, or to an encoding that matches your characters more precisely:
DECLARE @data varchar(max) = '<?xml version="1.0" encoding="utf-16"?><...>'
SELECT CAST(REPLACE(@data,
    '<?xml version="1.0" encoding="utf-16"?>',
    '<?xml version="1.0" encoding="iso-8859-1"?>') AS XML) ResponseData

Websocket implementation in Python 3

I'm trying to create a web front end for a Python 3-backed application. The application will require bi-directional streaming, which sounded like a good opportunity to look into websockets.
My first inclination was to use something already existing, and the example applications from mod-pywebsocket have proved valuable. Unfortunately, their API doesn't appear to lend itself easily to extension, and it is Python 2.
Looking around the blogosphere, many people have written their own websocket servers for earlier versions of the websocket protocol; most don't implement the security key hash, so they don't work.
Reading RFC 6455, I decided to take a stab at it myself and came up with the following:
#!/usr/bin/env python3
"""
A partial implementation of RFC 6455
http://tools.ietf.org/pdf/rfc6455.pdf
Brian Thorne 2012
"""
import socket
import threading
import time
import base64
import hashlib
def calculate_websocket_hash(key):
    magic_websocket_string = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    result_string = key + magic_websocket_string
    sha1_digest = hashlib.sha1(result_string).digest()
    response_data = base64.encodestring(sha1_digest)
    response_string = response_data.decode('utf8')
    return response_string
def is_bit_set(int_type, offset):
    mask = 1 << offset
    return not 0 == (int_type & mask)
def set_bit(int_type, offset):
    return int_type | (1 << offset)
def bytes_to_int(data):
    # note big-endian is the standard network byte order
    return int.from_bytes(data, byteorder='big')
def pack(data):
    """pack bytes for sending to client"""
    frame_head = bytearray(2)
    # set final fragment
    frame_head[0] = set_bit(frame_head[0], 7)
    # set opcode 1 = text
    frame_head[0] = set_bit(frame_head[0], 0)
    # payload length
    assert len(data) < 126, "haven't implemented that yet"
    frame_head[1] = len(data)
    # add data
    frame = frame_head + data.encode('utf-8')
    print(list(hex(b) for b in frame))
    return frame
def receive(s):
    """receive data from client"""
    # read the first two bytes
    frame_head = s.recv(2)
    # very first bit indicates if this is the final fragment
    print("final fragment: ", is_bit_set(frame_head[0], 7))
    # bits 4-7 are the opcode (0x01 -> text)
    print("opcode: ", frame_head[0] & 0x0f)
    # mask bit, from client will ALWAYS be 1
    assert is_bit_set(frame_head[1], 7)
    # length of payload
    # 7 bits, or 7 bits + 16 bits, or 7 bits + 64 bits
    payload_length = frame_head[1] & 0x7F
    if payload_length == 126:
        raw = s.recv(2)
        payload_length = bytes_to_int(raw)
    elif payload_length == 127:
        raw = s.recv(8)
        payload_length = bytes_to_int(raw)
    print('Payload is {} bytes'.format(payload_length))
    """masking key
    All frames sent from the client to the server are masked by a
    32-bit nounce value that is contained within the frame
    """
    masking_key = s.recv(4)
    print("mask: ", masking_key, bytes_to_int(masking_key))
    # finally get the payload data:
    masked_data_in = s.recv(payload_length)
    data = bytearray(payload_length)
    # The ith byte is the XOR of byte i of the data with
    # masking_key[i % 4]
    for i, b in enumerate(masked_data_in):
        data[i] = b ^ masking_key[i%4]
    return data
def handle(s):
    client_request = s.recv(4096)
    # get to the key
    for line in client_request.splitlines():
        if b'Sec-WebSocket-Key:' in line:
            key = line.split(b': ')[1]
            break
    response_string = calculate_websocket_hash(key)
    header = '''HTTP/1.1 101 Switching Protocols\r
Upgrade: websocket\r
Connection: Upgrade\r
Sec-WebSocket-Accept: {}\r
\r
'''.format(response_string)
    s.send(header.encode())
    # this works
    print(receive(s))
    # this doesn't
    s.send(pack('Hello'))
    s.close()
s = socket.socket( socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('', 9876))
s.listen(1)
while True:
    t,_ = s.accept()
    threading.Thread(target=handle, args = (t,)).start()
Using this basic test page (which works with mod-pywebsocket):
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Web Socket Example</title>
<meta charset="UTF-8">
</head>
<body>
<div id="serveroutput"></div>
<form id="form">
<input type="text" value="Hello World!" id="msg" />
<input type="submit" value="Send" onclick="sendMsg()" />
</form>
<script>
var form = document.getElementById('form');
var msg = document.getElementById('msg');
var output = document.getElementById('serveroutput');
var s = new WebSocket("ws://"+window.location.hostname+":9876");
s.onopen = function(e) {
console.log("opened");
out('Connected.');
}
s.onclose = function(e) {
console.log("closed");
out('Connection closed.');
}
s.onmessage = function(e) {
console.log("got: " + e.data);
out(e.data);
}
form.onsubmit = function(e) {
e.preventDefault();
msg.value = '';
window.scrollTop = window.scrollHeight;
}
function sendMsg() {
s.send(msg.value);
}
function out(text) {
var el = document.createElement('p');
el.innerHTML = text;
output.appendChild(el);
}
msg.focus();
</script>
</body>
</html>
This receives data and demasks it correctly, but I can't get the transmit path to work.
As a test to write "Hello" to the socket, the program above calculates the bytes to be written to the socket as:
['0x81', '0x5', '0x48', '0x65', '0x6c', '0x6c', '0x6f']
These match the hex values given in section 5.7 of the RFC. Unfortunately, the frame never shows up in Chrome's Developer Tools.
Any idea what I'm missing? Or a currently working Python3 websocket example?
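As a side note on the send path: the pack function above stops at 125 bytes. Below is a hedged sketch of how the length field could be extended, following the same 7-bit / 16-bit / 64-bit scheme that receive already parses. The name `pack_frame` is hypothetical, and unlike the original it measures the payload length in encoded bytes rather than characters. Frames sent from server to client are not masked, so no masking key is added.

```python
import struct


def pack_frame(text: str) -> bytes:
    """Build one unmasked, final text frame for sending to a client."""
    payload = text.encode("utf-8")
    head = bytearray([0x81])  # FIN bit set, opcode 1 (text)
    n = len(payload)  # length in bytes, not characters
    if n < 126:
        head.append(n)
    elif n < (1 << 16):
        head.append(126)
        head += struct.pack("!H", n)  # 16-bit big-endian length
    else:
        head.append(127)
        head += struct.pack("!Q", n)  # 64-bit big-endian length
    return bytes(head) + payload


frame = pack_frame("Hello")
print([hex(b) for b in frame])
```

For "Hello" this produces the same seven bytes as listed above, so it does not change the behavior described in the question; it only removes the length limit.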
When I try talking to your Python code from Safari 6.0.1 on Lion, I get
Unexpected LF in Value at ...
in the JavaScript console. I also get an IndexError exception from the Python code.
When I talk to your Python code from Chrome version 24.0.1290.1 dev on Lion, I don't get any JavaScript errors. In your JavaScript, the onopen() and onclose() handlers are called, but not onmessage(). The Python code doesn't throw any exceptions and appears to have received the message and sent its response, i.e. exactly the behavior you're seeing.
Since Safari didn't like the trailing LF in your header, I tried removing it, i.e.
header = '''HTTP/1.1 101 Switching Protocols\r
Upgrade: websocket\r
Connection: Upgrade\r
Sec-WebSocket-Accept: {}\r
'''.format(response_string)
When I make this change, Chrome is able to see your response message, i.e.
got: Hello
shows up in the JavaScript console.
Safari still doesn't work. Now it raises a different issue when I attempt to send a message.
websocket.html:36 INVALID_STATE_ERR: DOM Exception 11: An attempt was made to use an object that is not, or is no longer, usable.
None of the JavaScript websocket event handlers ever fire, and I'm still seeing the IndexError exception from Python.
In conclusion: your Python code wasn't working with Chrome because of an extra LF in your header response. There's still something else going on, because the code that works with Chrome doesn't work with Safari.
Update
I've worked out the underlying issue and now have the example working in Safari and Chrome.
base64.encodestring() always adds a trailing \n to its return value. This is the source of the LF that Safari was complaining about.
Call .strip() on the return value of calculate_websocket_hash; your original header template then works correctly in Safari and Chrome.
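A minimal sketch of an equivalent fix, assuming the rest of the server stays as posted: base64.b64encode returns the encoded digest without a trailing newline, so no .strip() on the result is needed. The key/accept pair used for the check is the worked example from RFC 6455; stripping the incoming key is an extra precaution added here, not part of the original code.

```python
import base64
import hashlib

MAGIC = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def calculate_websocket_hash(key: bytes) -> str:
    """Compute the Sec-WebSocket-Accept value for a client key."""
    # strip() drops any stray whitespace around the header value.
    digest = hashlib.sha1(key.strip() + MAGIC).digest()
    # b64encode, unlike encodestring, does not append "\n".
    return base64.b64encode(digest).decode("ascii")


accept = calculate_websocket_hash(b"dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

With this version the header template can keep its final blank line, because the accept value no longer carries its own line break.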