mPDF outputs the PDF as symbols instead of characters

I want to display the PDF in the browser. I use mPDF to generate the PDF, and it generates correctly. When I use
$mpdf->Output('binders/' . $filename . '.pdf', 'I');
it outputs the following symbols.
When I use F instead of I, the PDF is saved to the folder with the correct characters.
Can someone help me, please?
%PDF-1.4 %���� 3 0 obj <> /Contents 4 0 R>> endobj 4 0 obj <> stream x��[�o�����j#���d���ٙ��)dYvlĎ�]��ئ-��^�(EpE�,Ăd!J�� �,5��O������۽;^h�ݛ�����y�͜���S��.wN}�&�}��R2�"�vU\;��_��~|���_)q�V���ִ�m]E�-��ŚX�#�mh6:���R�]�tq������#�wk9w��C�)D؞.���#�Su<��K[�)<&�-]lYL�J�]]E<�:|���z��.�e٤9hH#�b&� �]��f�5s���uE�?7]Z9���5k��`6�j�j�!��Zѧ��j���L���Zƪ=�EH�vp�����O�0���g|� �����F��� ���e�R�br�]�H"k⁽�^�#v����}�[{І#���,�8A;�X����u8��E���D�o�3�YqQd�Ls:p�)�����ϸ��U��s���n�3_�ç��G�/�����J��Q|����;]�$���/�+R��j��V�]�X��k��Jj��v�]X��d������su�L�je"3����k��=�����F��;Q���H����\���D�$�4�V���;��ƈ���{nJ��36���NיFl��nt�U���Kݜ>��[Lv�L;�

The string is the raw binary PDF, and its presence in the browser means the Content-Type: application/pdf header is not sent correctly, or it is overridden by your code or setup, most likely with text/plain or text/html.
Try to figure out the following:
Are you resetting the Content-Type header in PHP code somewhere after calling the mPDF Output method?
Is your server forcing a different Content-Type somewhere in your setup?
Does your browser support displaying the application/pdf Content-Type directly?
Does the D Output mode offer the file for downloading, or does it display the same characters?
If the D mode displays the same characters, then the Content-Type header is most likely being overridden somewhere in your app or setup after calling the mPDF Output method. A quick check is sketched below.
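As a starting point, here is a minimal sketch of that check, assuming a Composer-installed mPDF 7+ (adjust the class name and destination constant for older versions, where 'I' is passed as a plain string):

<?php
// Minimal sketch, not your exact setup: make sure nothing has been sent
// before mPDF writes its own Content-Type: application/pdf header.
require __DIR__ . '/vendor/autoload.php';

$filename = 'example'; // placeholder

$mpdf = new \Mpdf\Mpdf();
$mpdf->WriteHTML('<h1>Hello</h1>');

if (headers_sent($file, $line)) {
    // Anything already output (whitespace, a BOM, debug prints) prevents the header.
    exit("Headers already sent in $file on line $line");
}

// Discard any stray buffered output so the PDF bytes are the only response body.
while (ob_get_level() > 0) {
    ob_end_clean();
}

$mpdf->Output($filename . '.pdf', \Mpdf\Output\Destination::INLINE); // same as 'I'

If headers_sent() reports a file and line, that is where stray output (often a blank line or a UTF-8 BOM before <?php) is breaking the Content-Type header.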

Related

iText PdfReader not able to read PDF

I am not able to read a PDF file using the iText PdfReader. The PDF appears to be valid if I open it directly.
URL Of PDF: http://www.fundslibrary.co.uk/FundsLibrary.DataRetrieval/Documents.aspx?type=fund_class_kiid&id=f096b13b-3d0e-4580-8d3d-87cf4d002650&user=fidelitydocumentreport
The PDF in question is encrypted.
According to the PDF specification,
Encryption applies to all strings and streams in the document's PDF file, with the following exceptions:
The values for the ID entry in the trailer
Any strings in an Encrypt dictionary
Any strings that are inside streams such as content streams and compressed object streams, which themselves are encrypted
Later on, there is information about special cases in which the document-level metadata stream is not encrypted either, or in which only attachments are encrypted.
The Cross-Reference Stream Dictionary of the PDF looks like this:
<<
/Root 101 0 R
/Info 63 0 R
/XRef(stream)
/Encrypt 103 0 R
/ID[<D034DE62220E1CBC2642AC517F0FE9C7><D034DE62220E1CBC2642AC517F0FE9C7>]
/Type/XRef
/W[1 3 2]
/Index[0 107]
/Size 107
/Length 642
>>
As you can see, there is a non-encrypted string here, (stream), which is neither the value of the ID entry, nor in an Encrypt dictionary, nor inside a stream. Furthermore, the aforementioned special cases do not apply here either.
Thus, this file violates the PDF specification here. Therefore, this file is not a valid PDF.
Furthermore, according to the PDF specification
The last line of the file shall contain only the end-of-file marker, %%EOF.
The file at hand, however, does not end like this: its last line contains something other than the end-of-file marker (which is on the line before), namely a 0x06 and a 0x0c byte.
The file, therefore, violates the PDF specification here, too.
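If you want to verify the trailing-bytes issue yourself, here is a small sketch in plain Java (no iText needed; the file path is just a placeholder) that dumps the last few bytes of the file:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EofCheck {
    public static void main(String[] args) throws IOException {
        // Placeholder path; point it at the downloaded PDF.
        byte[] data = Files.readAllBytes(Paths.get("document.pdf"));

        // A conforming file ends with "%%EOF" followed by at most an end-of-line marker.
        int start = Math.max(0, data.length - 16);
        StringBuilder tail = new StringBuilder();
        for (int i = start; i < data.length; i++) {
            tail.append(String.format("%02X ", data[i] & 0xFF));
        }
        System.out.println("Last bytes: " + tail);
    }
}

For the file in question you should see the hex codes of %%EOF followed by the stray 06 and 0C bytes.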

Cannot retrieve attachments with blank spaces in the filename with JavaMail

I have problems downloading attachments with JavaMail when they have a blank space in the file name and no extension.
This is due to the Content-Type of the BodyPart. For the files example.pdf and example I get a Content-Type of APPLICATION/PDF; name=example.pdf and APPLICATION/OCTET-STREAM; name=example, respectively, while for the file example 2 I only get APPLICATION/OCTET-STREAM;. This makes it impossible for me to retrieve the file with JavaMail.
This is very strange to me; does anybody know why, or is there some workaround?
Thanks
I don't know what "problems downloading attachments" means exactly. Is there an exception thrown?
If the message is properly formatted, spaces in the filename shouldn't make any difference.
If the message isn't properly formatted (e.g., the "name" parameter isn't quoted when it contains spaces), you may need to set some properties to have JavaMail work around that bug in the sending program. See the javadocs for the javax.mail.internet package for details.
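For example, the javax.mail.internet package documents System properties that relax strict MIME parameter parsing; a rough sketch (check the exact property names against the JavaMail version you are using) might look like this:

import java.util.Properties;
import javax.mail.Session;

public class LenientMailSession {
    public static Session create() {
        // Sketch only: relax strict MIME parameter parsing so an unquoted
        // "name=example 2" does not break filename handling. Verify these
        // property names in the javax.mail.internet package javadocs.
        System.setProperty("mail.mime.parameters.strict", "false");
        System.setProperty("mail.mime.decodefilename", "true");

        Properties props = new Properties();
        props.put("mail.store.protocol", "imaps");
        return Session.getInstance(props);
    }
}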

How to correctly reconstruct a string from a string that was decoded with the wrong charset

Edit: Added some new information to make the question clearer.
In MATLAB versions before 2012b, the function urlread returns a string constructed with the wrong charset if the web content's charset is not UTF-8. (This has been improved somewhat in MATLAB 2012b.)
For example:
% a Chinese website whose content is encoded in gb2312
url = 'http://www.cnbeta.com/articles/213618.htm';
html = urlread(url)
Because MATLAB decodes the HTML using UTF-8 instead of gb2312,
you will see that the Chinese characters in html do not display correctly.
If I read a Chinese website that is UTF-8 encoded, then everything works fine:
% a Chinese website whose content is encoded in UTF-8
url = 'http://www.baidu.com/';
html = urlread(url)
So is there any way to reconstruct the string correctly from html?
I have tried the following, but it did not work:
>> bytes = unicode2native(html,'utf8');
>> str = native2unicode(bytes,'gb2312')
However, I do know there is a way to fix urlread's problem:
type edit urlread.m in the console and then replace the code near line 108 (in MATLAB 2011b):
output = native2unicode(typecast(byteArrayOutputStream.toByteArray','uint8'),'UTF-8');
by:
output = native2unicode(typecast(byteArrayOutputStream.toByteArray','uint8'),'gb2312');
Save the file, and urlread now works for websites encoded in gb2312.
Actually, this solution points out why urlread sometimes does not work: urlread always decodes the response using the UTF-8 charset, even if the content is not encoded in UTF-8.
It seems that you already have the solution: just create a function, say urlread_gb, that can read gb2312.
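Note that the unicode2native/native2unicode round trip in the question generally cannot work, because decoding gb2312 bytes as UTF-8 is lossy: invalid byte sequences are already replaced before you ever see the string. A rough sketch of such a urlread_gb (fetching the raw bytes through Java, much like urlread.m does internally, and decoding them as gb2312; it reads byte by byte, so it is slow on large pages) could look like this:

function html = urlread_gb(url)
% URLREAD_GB  Sketch: fetch a page as raw bytes and decode them as gb2312,
% bypassing urlread's hard-coded UTF-8 decoding.
u    = java.net.URL(url);
in   = u.openStream();
baos = java.io.ByteArrayOutputStream();
b = in.read();
while b ~= -1
    baos.write(b);
    b = in.read();
end
in.close();
bytes = typecast(baos.toByteArray', 'uint8');
html  = native2unicode(bytes, 'gb2312');
end

Usage would then simply be html = urlread_gb('http://www.cnbeta.com/articles/213618.htm');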

Encoding problems in ASP when using English and Chinese characters

I am having problems with encoding Chinese in an ASP site. The file formats are:
translations.txt - UTF-8 (to store my translations)
test.asp - UTF-8 - (to render the page)
test.asp is reading translations.txt that contains the following data:
Help|ZH|帮助
Home|ZH|首页
test.asp splits on the pipe delimiter and, if the user has a cookie containing ZH, displays this translation; otherwise it just reverts back to the Key value.
Now, I have tried the following things, which have not worked:
Add a meta tag
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
Set the Response.CharSet = "UTF-8"
Set the Response.ContentType = "text/html"
Set the Session.CodePage (and Response) to both 65001 (UTF-8)
I have confirmed that the text in translations.txt is definitely in UTF-8 and has no byte order mark
The browser is picking up that the page is Unicode UTF-8, but the page is displaying gobbledegook.
The Scripting.OpenTextFile(<file>,<create>,<iomode>,<encoding>) method returns the same incorrect text regardless of the Encoding parameter.
Here is a sample of what I want to be displayed in China (ZH):
首页
帮助
But the following is displayed:
首页
帮助
This occurs in all tested browsers: Google Chrome, IE 7/8, and Firefox 4. The font definitely has a Chinese branch of glyphs. Also, I do have East Asian languages installed.
--
I have tried pasting the original value into the HTML, which did work (but note this is a hard-coded value).
首页
首页
However, this is odd.
首页 --(in hex)--> E9 A6 96 E9 A1 --(as chars)--> 首页
Any ideas what I am missing?
In order to read the UTF-8 file, you'll probably need to use the ADODB.Stream object. I don't claim to be an expert on character encoding, but this test worked for me:
test.txt (saved as UTF-8 without BOM):
首页
帮助
test.vbs
Option Explicit
Const adTypeText = 2
Const adReadLine = -2
Dim stream : Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = adTypeText
stream.Charset = "UTF-8"
stream.LoadFromFile "test.txt"
Do Until stream.EOS
    WScript.Echo stream.ReadText(adReadLine)
Loop
stream.Close
Whatever part of the process is reading the translations.txt file does not seem to understand that the file is in UTF-8. It looks like it is reading it in as some other encoding. You should specify encoding in whatever process is opening and reading that file. This will be different from the encoding of your web page.
Inserting the byte order mark at the beginning of that file may also be a solution.
Scripting.FileSystemObject's OpenTextFile does not understand UTF-8 at all. It can only read the system's ANSI code page or UTF-16 ("Unicode" in Windows parlance). As you can see from the number of bytes used for some character sets, UTF-8 is quite inefficient, so I would recommend Unicode for this sort of data.
You should save the file as Unicode and then open it with:
Dim fso : Set fso = CreateObject("Scripting.FileSystemObject")
Dim stream : Set stream = fso.OpenTextFile(yourFilePath, 1, False, -1)
Just use the following at the top of your page:
Response.CodePage=65001
Response.CharSet="UTF-8"

How to get the correct Content-Length for a POST request

I am using a Perl script to POST to a Google App Engine application. I post a text file containing some XML using the -F option.
http://www.cpan.org/authors/id/E/EL/ELIJAH/bget-1.1
There is a version 1.2; I tested it as well and get the same issue. The POST looks something like this:
Host: foo.appspot.com
User-Agent: lwp-request/1.38
Content-Type: text/plain
Content-Length: 202
<XML>
<BLAH>Hello World</BLAH>
</XML>
I have modified the example so the 202 isn't right; don't worry about that. On to the problem. The Content-Length matches the number of bytes in the file; however, unless I manually increase the Content-Length, not all of the file is sent and a few bytes get truncated. The number of bytes truncated is not the same for files of different sizes. I used the -r option on the script and I can see what it is sending, and it is sending all of the file, but Google App Engine's self.request.body shows that not everything is received. I think the solution is to get the right number for Content-Length, and apparently it isn't as simple as the number of bytes in the file, or the Perl script is mangling it somehow.
Update:
Thanks to Erickson for the right answer. I used printf to append characters to the end of the file, and the data was always truncated by exactly the number of lines in the file. I suppose I could figure out what is being added by iterating through every character on the server side, but it's not worth it. This wasn't even answered over on the Google Groups set up for App Engine!
Is the number of extra bytes you need equal to the number of lines in the file? I ask because it's possible that carriage returns are somehow being introduced but not counted.
I've run into similar problems before.
I assume you're using the length() function to determine the size of the file? If so, it's likely the data that you're posting is UTF-8 encoded instead of ASCII.
To get the correct count you may need to add a "use bytes;" pragma to the top of your script, or wrap the length call in a block:
my $size;
do { use bytes; $size = length($file_data); };
From the perlfunc man page:
"Note the characters: if the EXPR is in Unicode, you will get the number of characters, not the number of bytes."
How are you getting the number of bytes? By looking at the size of the file on the filesystem?
You can use -s to get the size of the file.
Or, if you want to do more, you may use File::stat.