I am trying to fuzz an FTP server. After a few attempts (exactly 20), boofuzz crashes when sending the following fuzzed string:
'USER %\xfe\xf0%\x00\xff\r\n'
Boofuzz crashed with the following error message:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 6: ordinal not in range(128)
My fuzzing script contains the following lines:
s_initialize("user")
s_static("USER")
s_delim(" ", fuzzable=False)
s_string(u"user", encoding="utf-8")
s_static("\r\n")
How can I deal with that UnicodeDecodeError?
The string primitive is currently a bit... primitive. I would recommend filing an issue on the GitHub project, as you may have found a bug.
Edit: For a quick fix, I would switch to ASCII encoding to get the fuzzer running. If you post the stack trace, I'd be happy to look further.
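Concretely, something like this is what I mean; a minimal sketch based on the request definition from your question (I believe "ascii" is the string primitive's default encoding, so you could also simply drop the encoding argument):
# Quick-fix sketch: keep the string primitive on boofuzz's default "ascii"
# encoding instead of utf-8, so the fuzzed bytes are never decoded as UTF-8.
# Assumes the usual top-level s_* helpers exported by boofuzz.
from boofuzz import s_initialize, s_static, s_delim, s_string
s_initialize("user")
s_static("USER")
s_delim(" ", fuzzable=False)
s_string("user", encoding="ascii")
s_static("\r\n")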
I originally ran my .Rnw file with the LaTeX option:
\usepackage[utf8]{inputenc}
It produced an error:
"! Package inputenc Error: Unicode char \u8: not set up for use with LaTeX."
I switched to [utf8x], which generated a somewhat more helpful error message:
"! Package ucs Error: Unknown Unicode character 150 = U+0096,
(ucs) possibly declared in uni-0.def."
I tried replacing the 0096 character (http://www.charbase.com/0096-unicode-start-of-guarded-area) with \DeclareUnicodeCharacter{0096}{\"o} to make it easy to spot where the problem was, but with [utf8x] the error message stayed the same, and with [utf8] there was an additional error: "! Package inputenc Error: Cannot define Unicode char value < 00A0"
Thanks for any help!
I had the same issue with my bibliography. In my editor (TeXstudio), the character U+0096 is rendered as whitespace. For some unknown reason, the line pdflatex reports as containing the offending character is inaccurate.
I solved the problem by running a regular expression search for \x0096 and it found the offending character immediately. Deleting the character and replacing it with a true space fixed the issue.
Incidentally, I tried the \DeclareUnicodeCharacter{0096}{ } fix and it did nothing for me. This could be because the offending character was in the .bib file rather than the .tex file where I placed the command.
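If your editor hides the character, a small script can find it for you. This is just an illustration on top of my regex approach, assuming your .tex and .bib files are UTF-8 encoded (adjust the glob patterns to your project):
# Scan .tex and .bib files for the invisible U+0096 character and report
# the file, line and column where it occurs.
import glob

TARGET = "\u0096"  # START OF GUARDED AREA; renders as whitespace in many editors
for path in glob.glob("*.tex") + glob.glob("*.bib"):
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if TARGET in line:
                print(f"{path}:{lineno}:{line.index(TARGET) + 1}: contains U+0096")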
I do not think switching to [utf8x] is a workable approach.
Instead, carefully check your code, especially any part you copied from somewhere rather than typed yourself.
I ran into the same problem recently.
Here is how I solved it.
I removed code from the R Markdown document part by part to find which part caused the problem. Eventually I found that the following part was producing the error in my code:
### Platform:Affymetrix A-AFFY-2-Affymetrix GeneChip Arabidopsis Genome [ATH1-121501].
I remembered that I had copied this information from a webpage. So I deleted it and typed the same text by hand, and the document then compiled to PDF without any error.
To be clear, I compared the copied version with the version I typed myself.
This is just one example, I think. My point is that copying text from an unknown source into your code is always a potential problem.
I hope this helps you and anyone else who has been frustrated by this problem.
The following will demonstrate the error:
catalyst.pl Hello
cd Hello
echo "encoding utf8" >> hello.conf
script/hello_server.pl -r
Then navigate to http://localhost:3000/?q=P%E9rl in your browser and you'll get a 400 Bad Request.
It appears to be Catalyst's _handle_param_unicode_decoding() method that generates this error. Since the error is trivial to trigger, it shows up in the error logs, and Google has failed me in finding a fix. I can't stop users from entering query strings like that. How can I work around this?
URLs are supposed to be encoded using UTF-8. RFC 3986:
When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set, the data should first be encoded as octets according to the UTF-8 character encoding; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded.
P %E9 r l is not valid UTF-8: after percent-decoding, the byte 0xE9 would have to start a multi-byte sequence, but it is followed by the plain ASCII byte for "r".
I believe you were going for Pérl (é is U+00E9)? That would be
$ perl -Mutf8 -MURI::Escape -E'say uri_escape_utf8("Pérl")'
P%C3%A9rl
400 Bad Request is an appropriate error for providing a bad URL. If the user doesn't want to see this error, they should use a valid URL. You could override Catalyst's default error handling behaviour (e.g. to provide a more precise error page) using handle_unicode_encoding_exception().
So there is a method in Catalyst.pm that you can override in your subclass (Hello.pm in the example above) which controls how these errors look. If you want to suppress those types of errors, you can do so. Take a look at:
https://metacpan.org/source/JJNAPIORK/Catalyst-Runtime-5.90077/lib/Catalyst.pm#L3108
you can override that method if you like.
Alternatively, if you have a proposal for a code change or some sort of configuration option, you can branch off the Catalyst GitHub repo and send me a pull request with your ideas:
https://github.com/perl-catalyst/catalyst-runtime
These methods are currently considered somewhat private but I am considering making them fully public.
I have done a bit of research on this error and can't really get my head around what's going on. As far as I understand I am basically having problems because I am converting from one type of encoding to another.
import io

def write_table_to_file(table, connection):
    # Dump the given table into an in-memory text buffer via COPY.
    db_table = io.StringIO()
    cur = connection.cursor()
    #pdb.set_trace()
    cur.copy_to(db_table, table)
    cur.close()
    return db_table
This is the method that is giving me headaches. The error below is output when I run this method:
[u350932#config5290vm0 python3]$ python3 datamain.py
Traceback (most recent call last):
File "datamain.py", line 48, in <module>
sys.exit(main())
File "datamain.py", line 40, in main
t = write_table_to_file("cms_jobdef", con_tctmsv64)
File "datamain.py", line 19, in write_table_to_file
cur.copy_to(db_table, table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 40: ordinal not in range(128)
The client encoding on the database I'm retrieving the table from is:
tctmsv64=> SHOW CLIENT_ENCODING;
client_encoding
-----------------
sql_ascii
(1 row)
The database encoding is LATIN1
The client encoding on the database I am loading the data into is:
S104838=# SHOW CLIENT_ENCODING;
client_encoding
-----------------
WIN1252
(1 row)
The database encoding is UTF8
The threads I have found recommend changing the encoding:
To correct your function, you'll have to know what encoding the byte
string is in, and convert it to unicode using the decode() method,
and compare that result to the unicode string.
http://www.thecodingforums.com/threads/unicodedecodeerror-ascii-codec-cant-decode-byte-0xa0-in-position-10-ordinal-not-in-range-128.336691/
The problem is that when I try to use the decode methods, I get complaints that it's not a file type. I have had a look at the Python 3.4 documentation for io.StringIO(initial_value='', newline='\n'), but could not find anything about changing the encoding.
I also found this page, which outlines the problem, but I couldn't figure out what I needed to do to solve it:
https://wiki.python.org/moin/UnicodeDecodeError
Basically I'm quite confused as to what is going on and not sure how to fix it. Any help would be greatly appreciated.
Cheers
Python 3 changed file I/O behaviour around text encodings - massively for the better, IMO. You may find Processing Text Files in Python 3 informative.
It looks like psycopg2 is seeing that you passed a raw file object and is trying to encode the strings it's working with into byte sequences for writing to the file, with the assumption (since you didn't specify anything else) that you want to use the ascii encoding for the file.
I'd use an io.BytesIO object instead of StringIO, and specify the source encoding when you do a copy_from into the new database.
I'll be surprised if you don't have problems due to invalid, mixed, or otherwise borked text from your SQL_ASCII source database, though.
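Roughly what I have in mind is something like the following sketch; the connection names are placeholders, I haven't run it against your databases, and the LATIN1 value comes from your source database's encoding:
import io

def copy_table(table, src_conn, dst_conn):
    # src_conn / dst_conn are placeholder psycopg2 connections to the two databases.
    # Dump the table into a byte buffer so the driver does not try to decode
    # the COPY stream with the default (ascii) codec.
    buf = io.BytesIO()
    cur = src_conn.cursor()
    cur.copy_to(buf, table)
    cur.close()
    buf.seek(0)
    # Tell the destination session what encoding the incoming bytes are in,
    # so the UTF8 database converts them on the way in.
    dst_conn.set_client_encoding('LATIN1')
    cur = dst_conn.cursor()
    cur.copy_from(buf, table)
    cur.close()
    dst_conn.commit()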
First of all, thanks Craig for your response. It was very helpful in making me realise that I needed to find a good way of doing this, otherwise the data in my new database would be corrupt. Not something we want! After a bit more googling, this link was very useful:
https://docs.python.org/3/howto/unicode.html
I ended up using codecs.StreamRecoder and it works very well. Below is a snippet of my working code:
import io
import codecs

def write_table_to_file(table, connection):
    db_table = io.BytesIO()
    cur = connection.cursor()
    # Re-encode the COPY stream on the fly: latin-1 bytes in, utf-8 bytes out.
    cur.copy_to(codecs.StreamRecoder(db_table, codecs.getencoder('utf-8'), codecs.getdecoder('latin-1'),
                                     codecs.getreader('utf-8'), codecs.getwriter('utf-8')), table)
    cur.close()
    return db_table
Long story short I convert from latin-1 to utf-8 on the fly and it all works and my data looks good. Thanks again for the feedback Craig :)
A buddy and I are doing an assignment that involves writing bytes out to a file. Normally we'd use something like a FileOutputStream, but this assignment explicitly asks us to write the data (which is, in this case, in bytes) to standard out. Unfortunately it's not working as expected.
For example, writing this code:
System.out.write(144); // write the byte 0x90, which is 144 as an int
System.out.flush();
...actually writes to standard out the byte corresponding to 63 as an integer, rather than 144.
We didn't seem to come across this issue when using a FileOutputStream, but as I said earlier, the assignment wants us to write to standard out. Any ideas?
Thanks in advance.
I was also surprised when I saw the character 0x3F (63) in the Eclipse console instead of 0x90. As suggested by Miguel Prz, the problem is related to the console encoding.
By default it is set to cp1252, which seems to convert 0x90 to 0x3F. The best you can do is set the console encoding to ISO-8859-1 and redirect the output to a file.
The console tries to display the characters as well, but when I copy and paste them into a hex editor I don't get the correct result.
However, in this specific case I recommend launching your program from a Windows or Linux console and redirecting the output to a file. That way you avoid any encoding transformation and write the binary data directly to a file. ISO-8859-1 worked for 0x90 but can cause problems with other characters. Moreover, it makes little sense to display every byte value (0x00 to 0xFF) in standard output (the Eclipse console or anything else).
I'm having issues finding out what's wrong with the json string I receive from http://www.hier-bin-ich-koenig.de/json/events to be able to parse it. It doesn't validate, at least not with jsonlint, but I don't know where the issue is. So of course SBJson is unhappy too.
I also don't understand where that [Ô] is coming from. I'd love to know if it's from the content or the code that's converting the content into json. Being able to find where the validation error is would be great.
The exact error sent by the tokeniser is:
JSONValue failed. Error is: Illegal start of token [Ô]
Your page includes a UTF-16 BOM (byte order mark), followed by a UTF-8 encoded document. You should drop the BOM entirely. It is not recommended for UTF-8 encoding.
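If you want to confirm this yourself, looking at the first few bytes of the response makes the stray BOM obvious. A rough diagnostic sketch follows (just a way to inspect the raw bytes, not something SBJson needs; the URL is the one from your question):
# Fetch the feed and inspect the leading bytes: a UTF-16 BOM shows up as
# b'\xff\xfe' or b'\xfe\xff' in front of the actual UTF-8 JSON document.
import urllib.request

raw = urllib.request.urlopen("http://www.hier-bin-ich-koenig.de/json/events").read()
print(raw[:4])
print(raw.lstrip(b"\xff\xfe\xef\xbb\xbf")[:20])  # with BOM bytes stripped, JSON should start with [ or {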
I had the same problem when I was parsing a JSON string generated by a PHP page. I resolved it with Notepad++:
1. Open the PHP file.
2. Menu -> Encoding -> Encode in UTF-8 without BOM.
3. Save.
That's it.