Dart encodeFull issue

Dart encodeFull issue - flutter

When i encode an uri from integer to string. Dart put this %E2%80%8B value before number. What is the reason? Why the dart put this value before number? I can't send parameters from url like that.
const number = 15;
String uri = Uri.encodeFull("https://google.com/${number}");
print(uri);
i got this output https://google.com/15%E2%80%8B why ?? check the link there is a space end of uri. When it work like this i can't call the api because its not a integer value.
İs it a bug or something?

%E2%80%8B represents a sequence of hexadecimal bytes E2, 80, 8B. That sequence corresponds to the UTF-8 representation of the Unicode "zero width space" character (U+200B).
I suspect that you copied and pasted your string from somewhere and inadvertently included an invisible Unicode character. Deleting and retyping the last portion of your "https://google.com/${number}" string literal should fix the problem.

Please add more info, I Have tested your code it's behaving normally.
var number=1234;
var k=Uri.encodeFull("https://google.com/${number}");
Output: https://google.com/1234

Related

how to fix Base64 mutated encoding?

I've got a string, that looks like a Base64 ASCII encoded string:
#2aHR0cDovL2RhdGEwMi1jZG4uZGF0YWxvY2sucnUvZmkybG0vZTAxNGNmZTZhMzE1ZjgyODgyZWUxMmJjOTY5MzQ1MDkvN2ZfVGVzdC5uYS5iZXJlbWVubm9zdC5zMDMuZTAyLldFQi1ETFJpcC4yNUt1em1pY2g\/\/b2xvbG8=uYTEuMTYuMTEuMjIubXA0
If I decode it without any edit, it seems like a mess, but if I remove 2 chars from the very begining (#2), it decodes into a mostly correct string:
http://data02-cdn.datalock.ru/fi2lm/e014cfe6a315f82882ee12bc96934509/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich?
but is still not complete. This URL should be like:
http://data02-cdn.datalock.ru/fi2lm/f03143c36c778262bd9906da5d545f85/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich.a1.16.11.22.mp4
If I remove some more characters from initial string (#2aHR0cDovL2RhdGEwMi1jZG4uZ), I get a corrupted text with correct ending of decoded URL:
][KLKLMMLMYYLLML
LK\K\[Y[LPQ\R^ZXololo.a1.16.11.22.mp4
Is it a regular problem of base64 encoding or maybe there is some sort of mutation in encoded string and it can be solved?
In my experiments i used base64decode.org

The slashes should not be backslashed and there should not be an equals sign in the middle of the stream.
If I remove the backslashed slashes and the equals sign, I get
http://data02-cdn.datalock.ru/fi2lm/e014cfe6a315f82882ee12bc96934509/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich.16.11.22.184
I don't think there's anything "regular" about this corruption. Lucky they didn't inject valid base64 or a lot of spurious junk.

Search for string in base64 encoded document?

I have some data that is stored in a base64 stream. I need to search for a certain string, let's say "White Rabbit". Rather than decoding all the records and searching for "White Rabbit" I thought I might be able to encode the string and search for that.
I don't know much about base64 and encoding, but I did notice that ANYTHING I encode has an = or an == at the end of it. There are no equal signs anywhere else in any encoded string.
So, what does this mean for my search? Can I just remove the equal signs?

The = serves as padding. A base64 string will only end with one or two = if they are required to pad the string out to the proper length.
Reference: https://en.wikipedia.org/wiki/Base64#Padding
As you mentioned, the process would be to encode your search string and then perform a normal search for it after removing any = characters..

Ogg metadata - Vorbis Comment end

I want to implement a class to read vorbis comments. I know that a field will start with a field name, followed by an equal sign and the value. But how does it end? Documentation makes me think that a semicolon will end the field but I checked an ogg file with a hex editor and I cannot see any.
This is how I think it should look like in a file :
TITLE=MY SUPER TITLE;
The field name is title, followed by the equals sign and then the value is MY SUPER TITLE. And finally the semicolon to end the field.
But instead inside my file, the fields look like this :
TITLE=MY SUPER TITLE....
It's almost as above but there is no semicolon. The .'s are characters that cannot be displayed. I thought okay, it seems like the dots represent a value that will say "this is the end of the field!!" but they are almost always different. I noticed that there are always exactly 4 dots. The first dot has always a different value. The other free have usually a value of 0. But not always...
My question now, how does a field end? How do I read this comment?
Also, yeah I know that there are libraries and that I should use them instead of reinventing the wheel over and over again. I will use libraries later but first I want to know how to do it myself. Educational purpose only.

Each field is preceded by a little-endian 32-bit integer that indicates the number of bytes to read. You then convert the bytes to a string via UTF8.
See NVorbis' implementation (LoadComments(...)) for details.

Unicode characters in document info dictionary keys

How do I create document info dictionary keys containing unicode characters (typically swedish characters, for instance C3A4 U+00E4 ä). I would like to use the PdfStamper to enter my own metadata in the document info dictionary, but I can't get it to accept the swedish characters.
Entering custom metadata using Acrobat works fine and looking at the PDF in a text editor I can see that the characters get encoded like for instance #C3#A4 for the character mentioned above. So is there a way to achieve this programmatically using iText PdfStamper???
regards
Mattias
PS. There is no problem having unicode characters in the info dictionary values, but the keys are a different story.

Please take a look at the NameObject example, and give it a try. You'll see that iText automatically escapes special characters in names.
iText follows the ISO-32000-1 specification that stats (7.3.5, Name Objects):
Beginning with PDF 1.2 a name object is an atomic symbol uniquely
defined by a sequence of any characters (8-bit values) except null
(character code 0). Uniquely defined means that any two name objects
made up of the same sequence of characters denote the same object.
Atomic means that a name has no internal structure; although it is
defined by a sequence of characters, those characters are not
considered elements of the name.
not part of the name but is a prefix indicating that what follows is a
sequence of characters representing the name in the PDF file and shall
follow these rules:
a) A NUMBER SIGN (23h) (#) in a name shall be written by using its
2-digit hexadecimal code (23), preceded by the NUMBER SIGN.
b) Any character in a name that is a regular character (other than
NUMBER SIGN) shall be written as itself or by using its 2-digit
hexadecimal code, preceded by the NUMBER SIGN.
c) Any character that is not a regular character shall be written
using its 2-digit hexadecimal code, preceded by the NUMBER SIGN only.
NOTE 1: There is not a unique encoding of names into the PDF file
because regular characters may be coded in either of two ways.
White space used as part of a name shall always be coded using the
2-digit hexadecimal notation and no white space may intervene between
the SOLIDUS and the encoded name.
Regular characters that are outside the range EXCLAMATION MARK(21h)
(!) to TILDE (7Eh) (~) should be written using the hexadecimal
notation.
The token SOLIDUS (a slash followed by no regular characters)
introduces a unique valid name defined by the empty sequence of
characters.
NOTE 2 The examples shown in Table 4 and containing # are not valid
literal names in PDF 1.0 or 1.1.
I'm not copy/pasting table 4, but I don't see any example that uses characters that consist of two bytes. Can you share a PDF that contains a name with a two-byte character that behaves in the way you desire? The PDF specification explicitly says that characters in the context of names are 8-bit values. You seem to be talking about 16-bit values...
Additional note: in the current implementation of iText, we only look at 8 bits:
c = (char)(chars[k] & 0xff);
We deliberately throw away all the higher bits when characters with more than 8 bits are passed.
Actually, I think I have answered your question. Initially, I thought you were asking to add this character: http://www.fileformat.info/info/unicode/char/c3a4/index.htm
As it turns out, you only need "\u00e4" (ä). I've made a small code sample that demonstrates how one would add a custom entry to the DID containing this character: ChangeInfoDictionary.
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
Map<String, String> info = reader.getInfo();
info.put("Special Character: \u00e4", "\u00e4");
stamper.setMoreInfo(info);
stamper.close();
reader.close();
}
Granted, when you open the PDF in a PDF viewer, you don't necessarily see "Special Character: ä" as the key value, but that's a problem of the PDF viewer. When you open the PDF in a text editor, you clearly see:
/Special#20Character:#20#e4(ä)
Which means that iText has correctly escaped the special character.
However: as you pointed out in your comment, the character doesn't show up in Adobe Reader. Based on a PDF I created using Acrobat, I found a workaround by using the following code:
StringBuffer buf = new StringBuffer();
buf.append((char) 0xc3);
buf.append((char) 0xa4);
info.put(buf.toString(), "\u00e4");
Now the character is shown correctly. In other words: it's a matter of encoding...

Just wanted to share a little experiment in C# illustrating one rather effortless way of getting the special characters into the document info dictionary keys.
string inputString = "My key with åäö";
byte[] inputBytes = Encoding.UTF8.GetBytes(inputString);
string convertedString = Encoding.UTF7.GetString(inputBytes);
info.Add(convertedString, "My value with åäö");
(info is the Dictionary used for adding metadata) Then just use the PdfStamper to get the info into the PDF. The metadata is stored correctly in the PDF and can be interpreted by Adobe Reader.

String to Unicode in C#

I want to use Unicode in my code. My Unicode value is 0100 and I am adding my Unicode string \u with my value. When I use string myVal="\u0100", it's working, but when I use like below, it's not working. The value is looking like "\\u1000";. How do I resolve this?
I want to use it like the below one, because the Unicode value may vary sometimes.
string uStr=#"\u" + "0100";

There are a couple of problems here. One is that #"\u" is actually the literal string "\u" (can also be represented as "\u").
The other issue is that you cannot construct a string in the way you describe because "\u" is not a valid string by itself. The compiler is expecting a value to follow "\u" (like "\u0100") to determine what the encoded value is supposed to be.
You need to keep in mind that strings in .NET are immutable, which means that when you look at what is going on behind the scenes with your concatenated example (`#"\u"+"0100"), this is what is actually happening:
Create the string "\u"
Create the string "0100"
Create the string "\u0100"
So you have three strings in memory. In order for that to happen all of the strings must be valid.
The first option that comes to mind for handling those values is to actually parse them as integers, and then convert them to characters. Something like:
var unicodeValue = (char)int.Parse("0100",System.Globalization.NumberStyles.AllowHexSpecifier);
Which will give you the single Unicode character value. From there, you can add it to a string, or convert it to a string using ToString().