How to split data in a text file in C# - c#-3.0

.This is how my data looks: name: abcsurname: abctel:1234 and I want it to look like:name: abc and surname: abc should go in the next line.
public void SeparateData()
{
//read file
StreamReader sr = new StreamReader("myTextFile.txt");
//string to hold line
string myline;
myline = sr.ReadLine();
while ((myline = sr.ReadLine()) != null)
{
string[] lines = Regex.Split(myline, " ");
foreach (string s in lines)
{
using (StreamWriter sw = new StreamWriter("myTextFile.txt"))
sw.WriteLine(lines);
}
}
}

Well, you don't need regex to do a string split on a space, but I don't that is going to get you want you want either way. I am assuming that "abc" stands in for actual values of varying length?
I think you just need to pull out your regex book and split up the string and rewrite it.
e.g. '(name: )(\w*?)(surname: )(\w*?)(tel: )(\d*)' and then just re-assemble and re-write the line using the capturing groups.
I think what you have will give you:
name:
abcsurname:
abctel:
1234

Related

MalformedInputException: Input length = 1 while reading text file with Files.readAllLines(Path.get("file").get(0);

Why am I getting this error? I'm trying to extract information from a bank statement PDF and tally different bills for the month. I write the data from a PDF to a text file so I can get specific data from the file (e.g. ASPEN HOME IMPRO, then iterate down to what the dollar amount is, then read that text line to a string)
When the Files.readAllLines(Path.get("bankData").get(0) code is run, I get the error. Any thoughts why? Encoding issue?
Here is the code:
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\wmsai\\Desktop\\BankStatement.pdf");
PDFTextStripper stripper = new PDFTextStripper();
BufferedWriter bw = new BufferedWriter(new FileWriter("bankData"));
BufferedReader br = new BufferedReader(new FileReader("bankData"));
String pdfText = stripper.getText(Loader.loadPDF(file)).toUpperCase();
bw.write(pdfText);
bw.flush();
bw.close();
LineNumberReader lineNum = new LineNumberReader(new FileReader("bankData"));
String aspenHomeImpro = "PAYMENT: ACH: ASPEN HOME IMPRO";
String line;
while ((line = lineNum.readLine()) != null) {
if (line.contains(aspenHomeImpro)) {
int lineNumber = lineNum.getLineNumber();
int newLineNumber = lineNumber + 4;
String aspenData = Files.readAllLines(Paths.get("bankData")).get(0); //This is the code with the error
System.out.println(newLineNumber);
break;
} else if (!line.contains(aspenHomeImpro)) {
continue;
}
}
}
So I figured it out. I had to check the properties of the text file in question (I'm using Eclipse) to figure out what the actual encoding of the text file was.
Then, when creating the file in the program, encode the text file to UTF-8 so that Files.readAllLines could read and grab the data I wanted to get.

Insert multiple lines of text into a Rich Text content control with OpenXML

I'm having difficulty getting a content control to follow multi-line formatting. It seems to interpret everything I'm giving it literally. I am new to OpenXML and I feel like I must be missing something simple.
I am converting my multi-line string using this function.
private static void parseTextForOpenXML(Run run, string text)
{
string[] newLineArray = { Environment.NewLine, "<br/>", "<br />", "\r\n" };
string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
bool first = true;
foreach (string line in textArray)
{
if (!first)
{
run.Append(new Break());
}
first = false;
Text txt = new Text { Text = line };
run.Append(txt);
}
}
I insert it into the control with this
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
element.Descendants<Text>().First().Text = text;
element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
return doc;
}
I call it with something like...
doc.InsertText("Primary", primaryRun.InnerText);
Although I've tried InnerXML and OuterXML as well. The results look something like
Example AttnExample CompanyExample AddressNew York, NY 12345 or
<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>Example Attn</w:t><w:br /><w:t>Example Company</w:t><w:br /><w:t>Example Address</w:t><w:br /><w:t>New York, NY 12345</w:t></w:r>
The method works fine for simple text insertion. It's just when I need it to interpret the XML that it doesn't work for me.
I feel like I must be super close to getting what I need, but my fiddling is getting me nowhere. Any thoughts? Thank you.
I believe the way I was trying to do it was doomed to fail. Setting the Text attribute of an element is always going to be interpreted as text to be displayed it seems. I ended up having to take a slightly different tack. I created a new insert method.
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, Paragraph paragraph)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
OpenXmlElement cc = element.Descendants<Text>().First().Parent;
cc.RemoveAllChildren();
cc.Append(paragraph);
return doc;
}
It starts the same, and gets the Content Control by searching for it's Tag. But then I get it's parent, remove the Content Control elements that were there and just replace them with a paragraph element.
It's not exactly what I had envisioned, but it seems to work for my needs.

Java convert Unicode to original language

I have array such this
String[] array ={"062C06450644","062C06280644"}
I want to translate this unicode to the original language using this code:
StringBuilder stringBuilder = new StringBuilder();
for(String arr:array)
{
for(int i=0;i<arr.length();i+=4)
{
String s = arr.substring(i,i+4);
stringBuilder.append((char) Integer.parseInt(s,16));
}
System.out.println("sb:"+stringBuilder);
}
The result:
sb: جمل
sb: جملجبل
The problem:
this code append the 2nd word to 1st and so on.
The question:
How i translate each word without append each word to the word before.
I want the result to be such
sb:جمل
sb:جبل
You need a clean StringBuilder for each word. Just swap the first two rows:
for(String arr:array) {
StringBuilder stringBuilder = new StringBuilder();
for(int i=0;i<arr.length();i+=4) {
String s = arr.substring(i,i+4);
stringBuilder.append((char) Integer.parseInt(s,16));
}
System.out.println("sb:"+stringBuilder);
}
Alternatively, you can clear the StringBuilderat the end of the outer loop with stringBuilder.setLength(0).

' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1

I am trying to read a xml file from the web and parse it out using XDocument. It normally works fine but sometimes it gives me this error for day:
**' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1**
I have tried some solutions from Google but they aren't working for VS 2010 Express Windows Phone 7.
There is a solution which replace the 0x1F character to string.empty but my code return a stream which doesn't have replace method.
s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);
Here is my code:
void webClient_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
using (var reader = new StreamReader(e.Result))
{
int[] counter = { 1 };
string s = reader.ReadToEnd();
Stream str = e.Result;
// s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);
// byte[] str = Convert.FromBase64String(s);
// Stream memStream = new MemoryStream(str);
str.Position = 0;
XDocument xdoc = XDocument.Load(str);
var data = from query in xdoc.Descendants("user")
select new mobion
{
index = counter[0]++,
avlink = (string)query.Element("user_info").Element("avlink"),
nickname = (string)query.Element("user_info").Element("nickname"),
track = (string)query.Element("track"),
artist = (string)query.Element("artist"),
};
listBox.ItemsSource = data;
}
}
XML file:
http://music.mobion.vn/api/v1/music/userstop?devid=
0x1f is a Windows control character. It is not valid XML. Your best bet is to replace it.
Instead of using reader.ReadToEnd() (which by the way - for a large file - can use up a lot of memory.. though you can definitely use it) why not try something like:
string input;
while ((input = sr.ReadLine()) != null)
{
string = string + input.Replace((char)(0x1F), ' ');
}
you can re-convert into a stream if you'd like, to then use as you please.
byte[] byteArray = Encoding.ASCII.GetBytes( input );
MemoryStream stream = new MemoryStream( byteArray );
Or else you could keep doing readToEnd() and then clean that string of illegal characters, and convert back to a stream.
Here's a good resource for cleaning illegal characters in your xml - chances are, youll have others as well...
https://seattlesoftware.wordpress.com/tag/hexadecimal-value-0x-is-an-invalid-character/
What could be happening is that the content is compressed in which case you need to decompress it.
With HttpHandler you can do this the following way:
var client = new HttpClient(new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.GZip
| DecompressionMethods.Deflate
});
With the "old" WebClient you have to derive your own class to achieve the similar effect:
class MyWebClient : WebClient
{
protected override WebRequest GetWebRequest(Uri address)
{
HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
return request;
}
}
Above taken from here
To use the two you would do something like this:
HttpClient
using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
{
using (var stream = client.GetStreamAsync(url))
{
using (var sr = new StreamReader(stream.Result))
{
using (var reader = XmlReader.Create(sr))
{
var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
foreach (var item in feed.Items)
{
Console.WriteLine(item.Title.Text);
}
}
}
}
}
WebClient
using (var stream = new MyWebClient().OpenRead("http://myrss.url"))
{
using (var sr = new StreamReader(stream))
{
using (var reader = XmlReader.Create(sr))
{
var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
foreach (var item in feed.Items)
{
Console.WriteLine(item.Title.Text);
}
}
}
}
This way you also recieve the benefit of not having to .ReadToEnd() since you are working with the stream instead.
Consider using System.Web.HttpUtility.HtmlDecode if you're decoding content read from the web.
If you are having issues replacing the character
For me there were some issues if you try to replace using the string instead of the char. I suggest trying some testing values using both to see what they turn up. Also how you reference it has some effect.
var a = x.IndexOf('\u001f'); // 513
var b = x.IndexOf(Convert.ToString((byte)0x1F)); // -1
x = x.Replace(Convert.ToChar((byte)0x1F), ' '); // Works
x = x.Replace(Convert.ToString((byte)0x1F), " "); // Fails
I blagged this
I had the same issue and found that the problem was a  embedded in the xml.
The solution was:
s = s.Replace("", " ")
I'd guess it's probably an encoding issue but without seeing the XML I can't say for sure.
In terms of your plan to simply replace the character but not being able to, because you have a stream rather than a text, simply read the stream into a string and then remove the characters you don't want.
Works for me.........
string.Replace(Chr(31), "")
I used XmlSerializer to parse XML and faced the same exception.
The problem is that the XML string contains HTML codes of invalid characters
This method removes all invalid HTML codes from string (based on this thread - https://forums.asp.net/t/1483793.aspx?Need+a+method+that+removes+illegal+XML+characters+from+a+String):
public static string RemoveInvalidXmlSubstrs(string xmlStr)
{
string pattern = "&#((\\d+)|(x\\S+));";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(xmlStr))
{
xmlStr = regex.Replace(xmlStr, new MatchEvaluator(m =>
{
string s = m.Value;
string unicodeNumStr = s.Substring(2, s.Length - 3);
int unicodeNum = unicodeNumStr.StartsWith("x") ?
Convert.ToInt32(unicodeNumStr.Substring(1), 16)
: Convert.ToInt32(unicodeNumStr);
//according to https://www.w3.org/TR/xml/#charsets
if ((unicodeNum == 0x9 || unicodeNum == 0xA || unicodeNum == 0xD) ||
((unicodeNum >= 0x20) && (unicodeNum <= 0xD7FF)) ||
((unicodeNum >= 0xE000) && (unicodeNum <= 0xFFFD)) ||
((unicodeNum >= 0x10000) && (unicodeNum <= 0x10FFFF)))
{
return s;
}
else
{
return String.Empty;
}
})
);
}
return xmlStr;
}
Nobody can answer if you don't show relevant info - I mean the Xml content.
As a general advice I would put a breakpoint after ReadToEnd() call. Now you can do a couple of things:
Reveal Xml content to this forum.
Test it using VS Xml visualizer.
Copy-paste the string into a txt file and investigate it offline.

how to check each elements of string array contains data or not in c#

i have created web application and using textbox and it can contains multiple line of data becoz i have set its textmode property is multiline.
my problem is that i want to check each line contain data or not so i using count variable which count how many line contain data.
string[] data;
int cntindex;
data = txt_invoicenumber.Text.ToString().Split("\n".ToCharArray());
cntindex = data.Length;
for (j = 0; j < cntindex; j++)
{
if (data[j]!="")
{
inv_count++;
}
}
Its not working.
Please help me.
I guess this is because new line is \r\n so there is a '\r' also on empty lines.
Change the if statement to:
if (data[j].Trim().Length != 0)
Firstly, You don't need to ToString() the .Text property as it is already a string.
try this
string[] lines = txt_invoicenumber.Text.Split(Environment.NewLine);
int lineCount = 0;
foreach(string line in lines)
{
if(!string.IsNullOrEmpty(line))
{
lineCount ++;
this.ProcessLine(line);
}
}
var lb = new String[] { "\r\n" };
var lines = txt_invoicenumber.Text.Split(lb, StringSplitOptions.None).Length;
This will count empty lines too. If you don't want to count empty lines, use the StringSplitOptions.RemoveEmptyEntries value.
Don't count 100% on "\r\n" if you have little control over your environment though.
This is the answer I came up with.
String[] lines = TextBox1.Text.Split(new Char[] { '\r', '\n' },
StringSplitOptions.RemoveEmptyEntries);
Int32 validLineCount = lines.Length;