PHP: remove an Arabic/Farsi (Persian) substring from a string

I fetched content from another website (in Persian) with "Simple HTML DOM" and stored the content of a div in a variable. Here is my code:
$html = file_get_html('./test.html');
$tmp = $html->find('a div.min_price_space')->plaintext;
My first question: how can I detect the character encoding of this string?
To detect the encoding I used the code below, which does not work:
echo mb_detect_encoding($tmp);
Here is a sample string in my language (Persian): "کمترین قیمت رزرو شبی ۲۲۸,۰۰۰ تومان". I want to remove "تومان" from this string, so I used the code below:
$result = str_replace('تومان','',$tmp);
When I run my PHP file, IE shows only "?" instead of my string. If I add header('Content-Type: text/html; charset=utf-8'); to the PHP file, the string is displayed with the right characters, but the specified substring is still not removed.
Do you have any idea how to fix this?

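I found my problem.
In Visual Studio 2013, go to Tools > Options > Environment and check the option "Save documents as Unicode when data cannot be saved in codepage".
After that, everything works perfectly. Presumably the PHP file had been saved in a non-Unicode codepage, so the Persian literal passed to str_replace did not match the UTF-8 text from the page.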
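A minimal sketch, in Python rather than PHP, of why the replacement can silently do nothing when the script file is not saved as Unicode. It assumes the scraped page is UTF-8 and that the editor saved the script in a codepage without Persian letters (cp1252 here is only an illustrative guess); the helper name remove_bytes is made up for this example.

```python
# Sample string and the word to remove, taken from the question.
text = "کمترین قیمت رزرو شبی ۲۲۸,۰۰۰ تومان"
needle = "تومان"

# Bytes as they would arrive from the scraped page (assumed to be UTF-8).
page_bytes = text.encode("utf-8")

# The same literal as stored in a script file saved as UTF-8 ...
good_needle = needle.encode("utf-8")
# ... and as stored in a codepage that has no Persian letters (assumption: cp1252).
lossy_needle = needle.encode("cp1252", errors="replace")


def remove_bytes(haystack: bytes, needle_bytes: bytes) -> bytes:
    """Byte-wise removal, similar in spirit to PHP's str_replace on raw strings."""
    return haystack.replace(needle_bytes, b"")


# With the lossy literal nothing matches, so the page text comes back unchanged.
unchanged = remove_bytes(page_bytes, lossy_needle)

# With a UTF-8 literal the word is removed as expected.
cleaned = remove_bytes(page_bytes, good_needle).decode("utf-8").rstrip()
```

If the two sides cannot be brought to the same encoding by re-saving the file, converting both to one encoding before comparing is another option.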

Apache POI docx: HTML as an altChunk

Good morning
I would like to add HTML as an altChunk to a DOCX file using Apache POI. To do that I followed this stackoverflow answer
How to add an altChunk element to a XWPFDocument using Apache POI
Everything works perfectly except for a problem with the special characters of my language (Italian).
My case is as follows: I have an external HTML file. To import it I use the following code:
byte[] inputBytes = Files.readAllBytes(Paths.get("testo.html"));
String xhtml = new String(inputBytes, StandardCharsets.UTF_8);
Then I generate the docx using the code provided in the Stack Overflow answer.
If I unzip the .docx, the file "chunk1.html" is correctly present under the "word" folder.
If I open it, the special characters are shown correctly, for example:
L'attività in oggetto è:
but when I open the document in Word I see this:
L'attivitÃ  in oggetto Ã¨:
Is there some Microsoft configuration that I missed?
Do I need to specify the character set when I create the chunk?
Microsoft Word seems to assume ANSI as the default character encoding for HTML chunks. That's annoying, as the rest of the world now defaults to Unicode (UTF-8).
So we need to set the charset of the HTML explicitly. In the template for the chunk's HTML, do:
...
private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
    super(part);
    this.html = "<!DOCTYPE html><html><head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\"><style></style><title>HTML import</title></head><body></body>";
    this.id = id;
}
...
I would recommend this rather than using ANSI encoding for the HTML chunks.
I have also edited this into my answer to How to add an altChunk element to a XWPFDocument using Apache POI.
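The symptom in the question is what you would get if UTF-8 bytes were decoded as a single-byte ANSI codepage. A small Python sketch of that effect, assuming Windows-1252 is the ANSI codepage in play; the wrap_chunk helper is hypothetical and only mirrors the idea of declaring the charset in the chunk's head.

```python
# Sentence from the question, containing two accented Italian letters.
original = "L'attività in oggetto è:"

# The chunk file is written as UTF-8 ...
utf8_bytes = original.encode("utf-8")
# ... but a reader that assumes Windows-1252 decodes each byte separately.
as_seen_in_word = utf8_bytes.decode("cp1252")


def wrap_chunk(body_html: str) -> str:
    """Hypothetical helper: wrap a fragment in a document that declares UTF-8."""
    return (
        "<!DOCTYPE html><html><head>"
        '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
        "<title>HTML import</title></head>"
        "<body>" + body_html + "</body></html>"
    )


chunk = wrap_chunk("<p>" + original + "</p>")
print(as_seen_in_word)
```

The garbling is reversible (encode as cp1252, decode as UTF-8), which is a quick way to confirm that a charset mismatch, and not data loss, is the cause.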

Unity 2019 - linebreak \n not working for UI text elements

I am having some difficulty getting linebreaks to work for my Unity UI elements. (Unity 2019.2.17f1 Personal)
What I'm doing is:
string twoLinesOfText = LanguagePack.getTextByID(ID);
result:
twoLinesOfText = "Text line 1\nText line 2"
Expected output:
Text line 1
Text line 2
Reality:
Text line 1\nText line 2
I have tried using "\n", "\\n" and "\r\n". None of these give the intended result.
I assign the text to the component using
UITextComponent.GetComponent<Text>().text = twoLinesOfText;
Could this direct assignment be the problem? Do I need to push my string through a toString() or parse it somehow for the \n to be recognised?
Workaround:
I have a workaround. By using an XML file for my LanguagePack, and inserting (enter) linebreaks in the base file, I feed the linebreaks into my Unity UI elements. Obviously this is not ideal.
Reading back the strings in Debug.Log does not show which linebreak code was ultimately used: it just breaks the string according to the (enter) linebreaks in the XML file.
You can't import it through the language pack. What you should do is:
string line1 = LanguagePack.getTextByID(ID1);
string line2 = LanguagePack.getTextByID(ID2);
string twoLinesOfText = line1 + "\n" + line2;
UITextComponent.GetComponent<Text>().text = twoLinesOfText;
I ran into this problem myself. A little investigation showed that what I thought was \n in the string had been converted to \\n, so it showed up in the text box as \n.
Converting it to just \n while debugging got me the multiline text I wanted.
Now to investigate where in my data chain it got converted :-)
OK, investigation complete. A file had been saved on my PC by a Visual Basic program using the File.WriteAllLines function, and one of the lines contained a couple of instances of \n. Looking at the file in Notepad shows that the line was written correctly. The problem came when I used File.ReadAllLines in my Unity program, as it converted those \n instances to \\n. As far as I can tell this is not documented behaviour; in fact, reading the MS docs, you might think it would have split that line into multiple lines, which it doesn't.
I checked my VB program, and File.ReadAllLines does not behave this way there. It probably has something to do with the environment: VB does not use \n as an escape sequence, C# does. I fixed the problem by tagging a replace onto the string, e.g. string.Replace("\\n", "\n"). It's entirely possible that writing a string from C# with File.WriteAllLines could also mess with \n.
Geez, this was hard to write, as the editor here messes with \\n and converts it to \n, so I end up having to use \\\n
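One reading of the answer above is that the loaded text held the two characters backslash and n rather than an actual line feed, which a UI text field then shows verbatim. A minimal Python sketch of that distinction and of the replace-based fix; the variable names are made up for illustration.

```python
# A raw string: contains a backslash followed by the letter n, not a line feed.
raw = r"Text line 1\nText line 2"

# Same idea as string.Replace("\\n", "\n") in C#:
# turn the two-character sequence into a real newline character.
fixed = raw.replace("\\n", "\n")

print(len(raw.splitlines()))   # the raw text is still a single line
print(fixed)
```

Checking the length of the string, or looking for the backslash character directly, is a quick way to tell which of the two forms you actually have.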
For people who encounter this issue: you could try an HTML-like syntax and see whether it works.
E.g.:
Using <br> for a newline instead of \n

Writing CR+LF into Open XML from a Database

I'm trying to take some data stored in a database and populate a Word template's Content Controls with it using the Open XML SDK. The data contains paragraphs and so there are carriage return and line feed characters in it. The data is stored in the database as nvarchar.
When I open the generated document, the CR+LF combination shows up as a question mark with a box around it (I'm not sure of the name of this character). There are actually two sequences back to back, so CR+LF CR+LF produces two strange characters.
If I unzip the .docx, take the Custom XML part and do a hex dump, I can clearly see 0d0a 0d0a so the CR+LF is there. Word is just printing it weird.
I've tried enforcing UTF-8 encoding in my XmlWriter's settings, but that didn't seem to help:
Dim docStream As New MemoryStream
Dim settings As XmlWriterSettings = New XmlWriterSettings()
settings.Encoding = New UTF8Encoding(False)
Dim docWriter As XmlWriter = XmlTextWriter.Create(docStream, settings)
Does anyone know how I can get Word to render these characters correctly when written to a .docx through the Open XML SDK?
To bind to a Word 2013 rich text control, your XML element has to contain a complete docx. See [MS-DOCX]:
the data stored in the XML element will be an escaped string comprised of a flattened WordprocessingML document representing the formatted data in the structured document tag range.
Earlier versions couldn't bind a rich text control.
Things should work though (with CR/LF, not w:br), if you bind to a plain text control, with multiline set to true.

convert html to text or formatted text from an iphone 4 notes backup sqlite file

I wanted to restore some of the lost notes that I obtained by using an iTunes backup (of an iphone 4) and opening up the notes.sqlite file. When I query the table that contains the notes text:
select zcontent from znotebody
I get the text in HTML format. How can I convert those entries to something more readable? It doesn't have to be perfect, just good enough to read. Here is an example of a note:
Meds fir odd<div>Trazadone</div><div>Effexor (& Cd)</div><div>Buspirone</div><div>Clonodine</div><div>Nortriptyline</div><div>Risperdal</div><div>Straterra </div>
Here is the actual note from above:
Meds fir odd
Trazadone
(Effexor & Cd)
Buspirone
Nortriptyline
Risperdal
Straterra<space here>
If you just want to retrieve the note text, I would try this:
select "" + zcontent + ""
from znotebody
Then save the result to a file and open it in a browser.
You should look at some NSString categories to escape HTML tags in your text.
This link should help solve the issue: Objective C HTML escape/unescape
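If a script is acceptable, here is a rough Python sketch that reads the note bodies and turns the HTML into plain text using only the standard library. The table and column names come from the question; the class and function names are made up, and treating div, p and br as line breaks is an assumption about how the notes are marked up.

```python
import sqlite3
from html.parser import HTMLParser


class NoteTextExtractor(HTMLParser):
    """Collect text, starting a new line at each <div>, <p> or <br>."""

    BLOCK_TAGS = {"div", "p", "br"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def note_to_text(html_body: str) -> str:
    """Convert one note body from HTML to readable plain text."""
    parser = NoteTextExtractor()
    parser.feed(html_body)
    parser.close()
    return "".join(parser.parts).strip()


def read_notes(conn: sqlite3.Connection) -> list:
    """Run the query from the question and convert every row."""
    rows = conn.execute("select zcontent from znotebody").fetchall()
    return [note_to_text(row[0]) for row in rows if row[0] is not None]
```

Usage would be along the lines of read_notes(sqlite3.connect("notes.sqlite")), working on a copy of the backup file rather than the original.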

Zend Framework Form not rendering special characters like (ä, ö, ü etc) - makes the form element value empty

I am trying to get Zend Form working for me. I am using the same form for inserting and editing a particular database object. The object has a name, and I can easily create a new object with the name "Ülo". It is saved correctly in the database, and when I fetch it to display in a report it shows correctly as "Ülo". The problem is with forms. When I open the edit form, the name element is empty. All other elements show correctly, but if I change them to contain "ü" they are displayed empty too. The same thing happens with form element labels: when I set a label to contain "ü", the label is no longer shown.
For example, if I have $name->setLabel('Nameü: '); the label is not shown, but when I change it back to $name->setLabel('Name: '); it shows correctly.
Likewise, with $bcrForm->name->setValue('Ülo'); the value is not shown, but when I change it to $bcrForm->name->setValue('Alo'); it shows correctly.
How can I fix it to display correctly? It seems like it is some kind of form rendering issue.
This is what solved it for me:
Make sure this setting is in both /etc/php5/apache2/php.ini and /etc/php5/cli/php.ini:
default_charset = utf-8
Did you check the encoding? Try adding this to the head:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Make sure that both your script and view files are encoded in UTF-8 and that your connection to your DB is set to UTF-8 too.
If you are using MySQL, you can force it to return UTF-8 data by opening a DB connection and running: SET NAMES 'utf8'
or, using mysqli, with: mysqli_set_charset($link, 'utf8'); (where $link is your open connection)
I would check:
view charset
database charset (backend)
Zend_Db_Adapter charset
file charset
The view escape method is set to expect utf8 chars and may strip anything else (ie. singlebyte strange chars) :)
It should be as simple as setting the escape flag to false on the element's label decorator.
$name->addDecorator('Label', array('escape'=>false));
Or see setEscape(). http://framework.zend.com/manual/1.12/en/zend.form.standardDecorators.html
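The empty label and value fit the behaviour of an escape routine that expects UTF-8 and gives back an empty string for anything else (PHP's htmlspecialchars does this for invalid input unless told otherwise). A Python analogy, not Zend code; the function name is made up, and ISO-8859-1 is only an assumed encoding for the script files.

```python
import html


def escape_expecting_utf8(raw: bytes) -> str:
    """Escape HTML special characters, returning '' if the bytes are not valid UTF-8."""
    try:
        return html.escape(raw.decode("utf-8"))
    except UnicodeDecodeError:
        # Analogous to an escaper that rejects invalid byte sequences outright.
        return ""


# The same label as saved in a UTF-8 file and in an ISO-8859-1 file (assumption).
label_utf8 = "Nameü: ".encode("utf-8")
label_latin1 = "Nameü: ".encode("latin-1")

print(repr(escape_expecting_utf8(label_utf8)))
print(repr(escape_expecting_utf8(label_latin1)))
```

This is why checking the file encoding first is usually preferable to switching escaping off: with the files saved as UTF-8 the escaper has valid input and nothing needs to be disabled.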