Read ANSI file (Persian language) from the internet and show it in a textbox - encoding

I created a text file in ANSI format and wrote Persian words in it. Now I read it with this code:
System.Net.WebClient wc = new System.Net.WebClient();
string textBoxNewsRight2Left = wc.DownloadString("http://dl.rosesoftware.ir/RoseSoftware%20List/Settings/News.Settings.txt");
MessageBox.Show(textBoxNewsRight2Left);
but I see only ???????????????????? characters!
I changed the file format to UTF-8, which fixed the problem with the Persian words, but now I have another problem, this time with English words!
Once again I used this code with the UTF-8 file:
System.Net.WebClient wc = new System.Net.WebClient();
string textBoxNewsRight2Left = wc.DownloadString("http://dl.rosesoftware.ir/RoseSoftware%20List/Settings/News.Settings.txt");
MessageBox.Show(textBoxNewsRight2Left);
Now I see "ÿþr" and I don't see the words from my file!
Now I set:
wc.Encoding = Encoding.Unicode;
Now I see the words from my file, without "ÿþr".
Suppose the original file contains hello; after reading it in C#, I also see hello.
Now I test whether textBoxNewsRight2Left equals "hello":
if (textBoxNewsRight2Left == "hello")
{
MessageBox.Show("Equal");
}
else
{
MessageBox.Show("Not Equal");
}
Now I see the message "Not Equal", even though the text I see is hello!
What is the problem, and how can I fix it?
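The symptoms suggest a likely cause, though this is an interpretation and is not confirmed in the thread. "ÿþ" is how the bytes FF FE appear when decoded with a single-byte code page, and FF FE is the UTF-16 little-endian byte-order mark (BOM). That means the file was most likely saved as "Unicode" (UTF-16) rather than UTF-8, which would also explain why Encoding.Unicode (UTF-16 LE in .NET) works. "Not Equal" then points to invisible characters in the string, such as a trailing line break or a leftover BOM (U+FEFF). The Python sketch below reproduces this with hypothetical file content "hello\r\n". In the C# code, the corresponding fix would be to trim before comparing, e.g. textBoxNewsRight2Left.Trim('\uFEFF', '\r', '\n').

```python
import codecs

# Hypothetical file content: "hello" plus a trailing line break, saved as
# UTF-16 little-endian with a byte-order mark (Notepad's "Unicode").
raw = codecs.BOM_UTF16_LE + "hello\r\n".encode("utf-16-le")

# A single-byte code page turns the BOM (FF FE) into "ÿþ" and leaves a NUL
# after every ASCII letter.
as_latin1 = raw.decode("latin-1")
print(as_latin1[:2] == "ÿþ")  # True

# The "utf-16" codec detects the BOM and removes it...
as_utf16 = raw.decode("utf-16")
# ...but the trailing "\r\n" still makes the comparison fail.
print(as_utf16 == "hello")          # False
print(as_utf16.strip() == "hello")  # True

# A codec with a fixed byte order keeps the BOM as the character U+FEFF,
# which str.strip() does not remove because it is not whitespace.
as_utf16le = raw.decode("utf-16-le")
print(as_utf16le.strip() == "hello")                   # False
print(as_utf16le.lstrip("\ufeff").strip() == "hello")  # True
```

The comparison only succeeds once both the BOM and the line break are gone, which matches a string that looks like "hello" but is not equal to it.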

Related

Norwegian characters shown as '?' in .ics file (C#)

I have been trying to send a calendar event (.ics) as an email attachment, but the summary and description show Norwegian characters such as 'ø' as '?'.
Please help; I am new to calendar events in ASP.NET MVC.
System.Text.StringBuilder str = new StringBuilder();
str.AppendLine("BEGIN:VCALENDAR");
str.AppendLine("PRODID:-//Schedule a Meeting");
str.AppendLine("VERSION:2.0");
str.AppendLine("METHOD:PUBLISH");
str.AppendLine("BEGIN:VEVENT");
str.AppendLine(string.Format("DTSTART:{0:yyyyMMddTHHmmssZ}",model.Startdate));
str.AppendLine(string.Format("DTSTAMP:{0:yyyyMMddTHHmmssZ}", DateTime.UtcNow));
str.AppendLine(string.Format("DTEND:{0:yyyyMMddTHHmmssZ}", model.EndDate));
str.AppendLine("LOCATION: " + model.Location);
str.AppendLine(string.Format("UID:{0}", Guid.NewGuid()));
str.AppendLine(string.Format("DESCRIPTION:{0}", model.desc));
str.AppendLine(string.Format("SUMMARY:{0}", model.Name));
str.AppendLine(string.Format("ORGANIZER:MAILTO:{0}", model.Email));
str.AppendLine("BEGIN:VALARM");
str.AppendLine("TRIGGER:-PT15M");
str.AppendLine("ACTION:DISPLAY");
str.AppendLine("DESCRIPTION:Reminder");
str.AppendLine("END:VALARM");
str.AppendLine("END:VEVENT");
str.AppendLine("END:VCALENDAR");
byte[] byteArray = Encoding.ASCII.GetBytes(str.ToString());
MemoryStream stream = new MemoryStream(byteArray);
Attachment attach = new Attachment(stream, "Invitation.ics");
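The problem here is that you're losing the special characters when using ASCII encoding. ASCII only covers characters 0-127, and Encoding.ASCII replaces everything else (such as 'ø') with '?'. Use another encoding, e.g. UTF-8, which is a variable-length multi-byte encoding that can represent every Unicode character.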
The following link shows how to specify the encoding used in the .ics file:
https://theeventscalendar.com/support/forums/topic/ical-text-encoding/
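To make the ASCII-versus-UTF-8 difference concrete, here is a small Python sketch with a hypothetical SUMMARY value. Python's "replace" error handler reproduces the '?' symptom, and encoding as UTF-8 keeps the text intact. In the C# code, the change the answer suggests amounts to calling Encoding.UTF8.GetBytes(str.ToString()) instead of Encoding.ASCII.GetBytes.

```python
# Hypothetical SUMMARY text containing Norwegian letters.
summary = "Møte på kontoret"

# ASCII only covers code points 0-127; with errors="replace" every character
# outside that range becomes "?", the same symptom as Encoding.ASCII.GetBytes.
ascii_bytes = summary.encode("ascii", errors="replace")
print(ascii_bytes)  # b'M?te p? kontoret'

# UTF-8 can encode every character; "ø" and "å" take two bytes each.
utf8_bytes = summary.encode("utf-8")
print(len(summary), len(utf8_bytes))          # 16 18
print(utf8_bytes.decode("utf-8") == summary)  # True
```

Because UTF-8 is lossless, the receiving calendar client can recover the original letters, provided it decodes the attachment as UTF-8.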

How to create virtual XML for ZUGFeRD Invoices

I am trying to create a PDF/A-3b file that contains an embedded XML file, so that it conforms to ZUGFeRD. I use Perl and PDFLib for this purpose. The available PDFLib documentation covers only Java and PHP. Creating the PDF works fine, but the XML part is my problem.
So how can I create a PVF (virtual file) from the XML and attach it to my PDF?
This is what PDFLib recommends in Java:
// Place XML stream in a virtual PVF file
String pvf_name = "/pvf/ZUGFeRD-invoice.xml";
byte[] xml_bytes = xml_string.getBytes("UTF-8");
p.create_pvf(pvf_name, xml_bytes, "");
// Create file attachment (asset) from PVF file
int xml_asset = p.load_asset("Attachment", pvf_name,
"mimetype=text/xml description={ZUGFeRD invoice in XML format} "
+ "relationship=Alternative documentattachment=true");
// Associate file attachment with the document
p.end_document("associatedfiles={" + xml_asset + "}");
So I took the example and adapted it to Perl:
my $xmldata = read_file($xmlfile, binmode => ':utf8'); # I'm using an example XML file at the moment
my $pvf_xml = "/pvf/ZUGFeRD-invoice.xml";
PDF_create_pvf($pdf, $pvf_xml, $xmldata, ""); # no OOP, so I need to call it this way (works with all other PDF functions)
my $xml_invoice = PDF_load_asset("Attachment", $pvf_xml, "mimetype=text/xml "
."description={Rechnungsdaten im Zugferd-Xml-Format} "
."relationship=Alternative documentattachment=true");
PDF_end_document($pdf, "associatedfiles={".$xml_invoice."}");
In the PHP examples, it's also not necessary to convert the XML to a byte array after reading it. I also tried unpack, but that doesn't seem to be the problem.
When I run my script, I just get:
Usage: load_asset(type, filename, optlist); at signatur_test.pl line 41.
I think the problem is that the PVF file $pvf_xml isn't created on the line before.
Has anyone done this before and knows how to solve it?
Argh, I was just missing the PDF handle in the load_asset call:
my $xml_invoice = PDF_load_asset($pdf, "Attachment", $pvf_xml, "mimetype=text/xml "
."description={Rechnungsdaten im Zugferd-Xml-Format} "
."relationship=Alternative documentattachment=true");
This way it works.

How do I process Russian text in Eclipse?

I need to sort a Russian text file, but when I read the strings and print them, they all appear garbled or as boxes. It looks like my Eclipse has no Russian support. I downloaded the language packs plug-in, but I can't figure out how to install it.
Help required, please.
FileInputStream fstream = new FileInputStream("c:\\textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
ArrayList<String> allLines = new ArrayList<String>();
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
allLines.add(strLine);
System.out.println(strLine);
}
How can you be sure it's an Eclipse problem? It could be:
The encoding of the text file
The method you used to read the text file (e.g. InputStreamReader uses the platform default charset unless you explicitly pass one to the constructor)
The method you used to store the text file in memory
The method you used to print the text
The method you used to view the printed text
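The first two points can be illustrated with a short Python sketch. It assumes the file was saved in Windows-1251, a common legacy encoding for Russian text on Windows. This is an assumption; if the file is actually UTF-8, that is the charset to pass. Reading the file with the wrong charset does not raise an error by default. It silently produces garbage, and it only works once the file's real encoding is specified. In the question's Java code, the equivalent would be to pass the charset to the reader, e.g. new InputStreamReader(fstream, "windows-1251").

```python
import os
import tempfile

# Hypothetical sample: Russian words saved in Windows-1251 (cp1251).
words = ["яблоко", "банан", "арбуз"]
path = os.path.join(tempfile.mkdtemp(), "textfile.txt")
with open(path, "w", encoding="cp1251", newline="\n") as f:
    f.write("\n".join(words) + "\n")

# Reading with the wrong charset (UTF-8 here) yields replacement characters.
with open(path, encoding="utf-8", errors="replace") as f:
    garbled = f.read()
print("\ufffd" in garbled)  # True

# Reading with the charset the file was written in gives the real text.
with open(path, encoding="cp1251") as f:
    lines = [line.rstrip("\n") for line in f]

# Plain code-point order is close to Russian alphabetical order (except ё).
sorted_lines = sorted(lines)  # ['арбуз', 'банан', 'яблоко']
```

Even when the text is decoded correctly, the last two points still apply. The console used to view the output must also use an encoding and font that can display Cyrillic.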

Sed and awk application

I've read a little about sed and awk, and understand that both are text manipulators.
I plan to use one of them to make similar changes across large sets of files (source code in languages such as JavaScript, Python, etc.).
For now, I'm primarily editing function definitions (the parameters passed) and variable names, but the more I can do, the better.
Has anyone attempted something similar, and if so, are there any obvious pitfalls to look out for? And which of sed and awk would be more suitable for such a task (or maybe something else entirely)?
Input
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
Output
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
Here's a GNU awk script (GNU awk is required for the gensub() function) that will transform your sample input file into your desired output file:
$ cat tst.awk
BEGIN{ sym = "[[:alnum:]_]+" }
{
$0 = gensub("^(" sym ")[(](" sym ")[)](.*)","\\1(ParameterOne)\\3","")
$0 = gensub("^(var )(" sym ")(.*)","\\1PartOfSomething.\\2\\3","")
$0 = gensub("^a(rray.*)","sA\\1","")
$0 = gensub("^(" sym " =.*)","var \\1","")
print
}
$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}
$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
BUT think about how your real input could vary from that - you could have more/less/different spacing between symbols. You could have assignments starting on one line and finishing on the next. You could have comments that contain similar-looking lines to the code that you don't want changed. You could have multiple statements on one line. etc., etc.
You can address each issue one at a time, but that could take you a lot longer than just updating your files, and chances are you still won't get it completely right.
If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format, then you might be able to do what you want with a scripting language, but your best bets are either:
change the files by hand if there are fewer than, say, 10,000 of them, or
get a hold of a parser (e.g. the compiler) for the language your files are written in and modify that to spit out your updated code.
As soon as it gets even slightly more complicated, you will switch to a scripting language anyway. So why not start with Python in the first place?
Walking directories:
walking along and processing files in directory in python
Replacing text in a file:
replacing text in a file with Python
Python regex howto:
http://docs.python.org/dev/howto/regex.html
I also recommend installing Eclipse + PyDev, as this will make debugging a lot easier.
Here is an example of a simple automatic replacer:
import itertools
import os
import re

# A raw string cannot end with a single backslash, so the trailing "\" is dropped.
folder = r"C:\Workspaces\Test"
# Files with these extensions (or with no extension at all) are left untouched.
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', '']
# (old, new) pairs, replaced literally in this order.
substitutions = [("Test.Alpha.", "test.alpha."),
                 ("Test.Beta.", "test.beta."),
                 ("Test.Gamma.", "test.gamma.")]


def context(s, text):
    """Return each literal occurrence of text in s with 5 characters of context on each side."""
    return [s[max(m.start() - 5, 0):m.end() + 5]
            for m in re.finditer(re.escape(text), s)]


for root, dirs, files in os.walk(folder):
    for name in files:
        base, ext = os.path.splitext(name)
        file_path = os.path.join(root, name)
        if ext in skip_extensions:
            print("skipping", file_path)
            continue
        print("processing", file_path)
        with open(file_path) as f:
            s = f.read()
        # Snippets around each match before replacing...
        before = [context(s, old) for old, new in substitutions]
        for old, new in substitutions:
            s = s.replace(old, new)
        # ...and around each replacement afterwards, printed side by side.
        after = [context(s, new) for old, new in substitutions]
        for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
            print(b, "-->", a)
        with open(file_path, "w") as f:
            f.write(s)
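One pitfall is worth checking in any replacer like this. re.finditer interprets its first argument as a regular expression, while str.replace matches literally. If search strings such as "Test.Alpha." are passed to re.finditer unescaped, each "." matches any character, so the before/after preview can report snippets that the replacement never touched. The sample text below is hypothetical.

```python
import re

text = "Test.Alpha.run(); TestXAlphaY.run();"

# Unescaped, "." matches any character, so an unrelated identifier matches too.
regex_hits = [m.group() for m in re.finditer("Test.Alpha.", text)]

# re.escape() makes every character literal, which agrees with str.replace().
literal_hits = [m.group() for m in re.finditer(re.escape("Test.Alpha."), text)]

print(regex_hits)    # ['Test.Alpha.', 'TestXAlphaY']
print(literal_hits)  # ['Test.Alpha.']
```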

How to set receivedDataEncoding for big5 chinese?

I have trouble with the data received from a Big5-encoded Chinese web page.
I looked for sample code, but I could not find what I need for Big5; what I found looks like this:
if ([encodingName isEqualToString:@"euc-jp"]) {
    receivedDataEncoding = NSJapaneseEUCStringEncoding;
} else {
    receivedDataEncoding = NSUTF8StringEncoding;
}
What should replace NSJapaneseEUCStringEncoding for Big5 Chinese encoding?
Thanks in advance for any answers.
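You can use the kCFStringEncodingBig5_E constant, which is declared in
CoreFoundation/CFStringEncodingExt.h. Because it is a CFStringEncoding rather than an NSStringEncoding, convert it with CFStringConvertEncodingToNSStringEncoding() before assigning it to receivedDataEncoding.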
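The idea behind the question's if/else can be sketched in Python. The idea is to decode with the charset the server declares and fall back to UTF-8 otherwise. Here "big5" is one of Python's standard codec names. decode_body is a hypothetical helper, and the page content is made up. The sketch also shows why a UTF-8 fallback garbles a Big5 page.

```python
from typing import Optional


def decode_body(data: bytes, charset: Optional[str]) -> str:
    """Decode a response body with its declared charset, else fall back to UTF-8.

    decode_body is a hypothetical helper mirroring the question's if/else.
    """
    if charset:
        try:
            return data.decode(charset)
        except LookupError:
            # No Python codec is registered under this charset name.
            pass
    return data.decode("utf-8", errors="replace")


# Hypothetical Big5-encoded page content.
body = "中文".encode("big5")

print(decode_body(body, "big5") == "中文")  # True
print("\ufffd" in decode_body(body, None))  # True: these Big5 bytes are not valid UTF-8
```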