MalformedInputException: Input length = 1 while reading text file with Files.readAllLines(Paths.get("file")).get(0) - encoding

Why am I getting this error? I'm trying to extract information from a bank statement PDF and tally the different bills for the month. I write the text of the PDF to a text file so I can pull specific data from it (e.g. find ASPEN HOME IMPRO, move down to the line with the dollar amount, then read that line into a string).
When the Files.readAllLines(Paths.get("bankData")).get(0) call runs, I get the error. Any thoughts why? Is it an encoding issue?
Here is the code:
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\wmsai\\Desktop\\BankStatement.pdf");
PDFTextStripper stripper = new PDFTextStripper();
BufferedWriter bw = new BufferedWriter(new FileWriter("bankData"));
BufferedReader br = new BufferedReader(new FileReader("bankData"));
String pdfText = stripper.getText(Loader.loadPDF(file)).toUpperCase();
bw.write(pdfText);
bw.flush();
bw.close();
LineNumberReader lineNum = new LineNumberReader(new FileReader("bankData"));
String aspenHomeImpro = "PAYMENT: ACH: ASPEN HOME IMPRO";
String line;
while ((line = lineNum.readLine()) != null) {
if (line.contains(aspenHomeImpro)) {
int lineNumber = lineNum.getLineNumber();
int newLineNumber = lineNumber + 4;
String aspenData = Files.readAllLines(Paths.get("bankData")).get(0); //This is the code with the error
System.out.println(newLineNumber);
break;
} else if (!line.contains(aspenHomeImpro)) {
continue;
}
}
}

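So I figured it out. I checked the properties of the text file in question (I'm using Eclipse) to find out what its actual encoding was. FileWriter writes with the platform default charset (which is not necessarily UTF-8 on Java versions before 18), while Files.readAllLines(Path) decodes the file as UTF-8, so any byte sequence that is not valid UTF-8 triggers the MalformedInputException.
The fix was to write the text file as UTF-8 when creating it in the program, so that Files.readAllLines could read it and return the data I wanted.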
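A minimal sketch of that fix, assuming Java 8 or later. The file name bankData comes from the question; the class name Utf8RoundTrip, the helper writeAndRead and the sample text are made up for illustration. The point is only that the writer and the reader name the same charset explicitly:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class Utf8RoundTrip {
    // Hypothetical helper: writes the text as UTF-8, then reads it back
    // with the same charset so the decoder never sees foreign bytes.
    static List<String> writeAndRead(Path path, String text) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            bw.write(text);
        }
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("bankData");
        // Sample text with a non-ASCII character (the euro sign).
        List<String> lines = writeAndRead(path, "PAYMENT: ACH: ASPEN HOME IMPRO\n\u20AC 120.00\n");
        System.out.println(lines.get(0));
    }
}
```

If the file cannot be rewritten, the other direction should also work: pass the file's actual charset as the second argument of Files.readAllLines.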

Related

Solution to upload image file via WCF service?

I have been searching for the last 3-4 days, downloading, running and fixing issues with the demo projects available online, and none of them has worked so far.
I need to upload an image using a WCF web service. On the client side I would like to upload it through a form (multipart/form-data), together with a file description.
Is there a working solution with a proper answer? My mind is really stack-overflowed from trying different solutions. With the one I started with, I can upload a text file, but the file is created with some extra content in it, as shown here. I need to upload an image file.
------------cH2ae0GI3KM7GI3Ij5ae0ei4Ij5Ij5
Content-Disposition: form-data; name=\"Filename\"
testing file gets upload...
When I upload an image file, the resulting file is empty.
Initial code (one implementation): this is the method that produces the .txt file shown above; for an image the result is blank (or corrupt, I can't tell).
private string uplaodFile(Stream stream)
{
StreamReader sr = new StreamReader(stream);
int length = sr.ReadToEnd().Length;
byte[] buffer = new byte[length];
stream.Read(buffer, 0, length);
FileStream f = new FileStream(Path.Combine(HostingEnvironment.MapPath("~/Upload"), "test.png"), FileMode.OpenOrCreate);
f.Write(buffer, 0, buffer.Length);
f.Close();
stream.Close();
return "Recieved the image on server";
}
Another attempt:
public Stream FileUpload(string fileName, Stream stream)
{
string FilePath = Path.Combine(HostingEnvironment.MapPath("~/Upload"), fileName);
int length = 0;
using (FileStream writer = new FileStream(FilePath, FileMode.Create))
{
int readCount;
var buffer = new byte[8192];
while ((readCount = stream.Read(buffer, 0, buffer.Length)) != 0)
{
writer.Write(buffer, 0, readCount);
length += readCount;
}
}
return returnJson(new { resp_code = 302, resp_message = "occurred." });
}

iText7: Error at file pointer when merging two pdfs

We are in the last steps of evaluating iText7. We use iText 7.1.0 and html2pdf 2.0.0.
What we do: we send a json_encoded collection with pdf-data (which includes html for header, body and footer) to our Java app. There we iterate over the collection, create a byteArrayOutputStream for each pdf-data element and merge them together. We then send the results to a script which echoes it to e.g. a browser. Although the pdf is displayed correctly, we encounter errors while creating it:
com.itextpdf.io.IOException: Error at file pointer 226,416.
...
Caused by: com.itextpdf.io.IOException: xref subsection not found.
... 73 common frames omitted
If we create only one part of the collection, no error is thrown.
Iterate over collection and merge:
@RequestMapping(value = "/pdf", method = RequestMethod.POST, produces = MediaType.APPLICATION_PDF_VALUE)
public byte[] index(@RequestBody PDFDataModelCollection elements, Model model) throws IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
try (PdfDocument resultDoc = new PdfDocument(writer)) {
for (PDFDataModel pdfDataModel : elements.getElements()) {
PdfReader reader = new PdfReader(new ByteArrayInputStream(creationService.createDatasheet(pdfDataModel)));
try (PdfDocument sourceDoc = new PdfDocument(reader)) {
int n = sourceDoc.getNumberOfPages(); //<-- IOException on second iteration
for (int i = 1; i <= n; i++) {
PdfPage page = sourceDoc.getPage(i).copyTo(resultDoc);
resultDoc.addPage(page);
}
}
}
}
return byteArrayOutputStream.toByteArray(); //outputs the final pdf
}
Creation of part:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
//Initialize PDF document
PdfDocument pdfDoc = new PdfDocument(writer);
try (
Document document = new Document(pdfDoc)
) {
//header, footer, etc
//body
for (IElement element : HtmlConverter.convertToElements(pdfDataModel.getBody(), this.props)) {
document.add((IBlockElement) element);
}
footer.writeTotalNumberOnPages(pdfDoc);
}
return byteArrayOutputStream.toByteArray();
}
We are grateful for any suggestion.
In createDatasheet you appear to re-use some byteArrayOutputStream without clearing it first.
In the first iteration, therefore, everything works as desired, at the end of createDatasheet you have a single PDF file in it.
In the second iteration, though, you have two PDF files in that byteArrayOutputStream, one after the other. This concatenation does not form a valid single PDF.
Thus, byteArrayOutputStream.toByteArray() returns something broken.
To fix this, either make the byteArrayOutputStream local to createDatasheet and create a new instance every time or alternatively reset byteArrayOutputStream at the start of createDatasheet:
public byte[] createDatasheet(PDFDataModel pdfDataModel) throws IOException {
byteArrayOutputStream.reset();
PdfWriter writer = new PdfWriter(byteArrayOutputStream);
[...]
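The effect described above can be reproduced without iText. The following is only a sketch using plain strings in place of PDF bytes; the class and method names (SharedBufferDemo, createReusingBuffer, createWithLocalBuffer) are hypothetical and not part of the code in the question:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SharedBufferDemo {
    // Shared buffer, standing in for the field the answer suspects in createDatasheet.
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream();

    // Problematic variant: each call appends to whatever earlier calls left behind.
    byte[] createReusingBuffer(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        shared.write(bytes, 0, bytes.length);
        return shared.toByteArray();
    }

    // Fixed variant: a fresh local buffer per call, so results never mix.
    byte[] createWithLocalBuffer(String content) {
        ByteArrayOutputStream local = new ByteArrayOutputStream();
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        local.write(bytes, 0, bytes.length);
        return local.toByteArray();
    }

    public static void main(String[] args) {
        SharedBufferDemo demo = new SharedBufferDemo();
        demo.createReusingBuffer("first");
        // Second result still contains the first one: 5 + 6 = 11 bytes.
        System.out.println(demo.createReusingBuffer("second").length);
        demo.createWithLocalBuffer("first");
        // With a local buffer only the second content is returned: 6 bytes.
        System.out.println(demo.createWithLocalBuffer("second").length);
    }
}
```

A local buffer also avoids surprises if the method is ever called from more than one thread, which a shared field with reset() would not.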

iText not returning text contents of a PDF after first page

I am trying to use the iText library with C# to capture the text portion of PDF files.
I created a PDF by exporting from Excel 2013 and then copied a sample from the web showing how to use iText (and added the library reference to the project).
It reads the first page perfectly, but the output is garbled after that: it keeps part of the first page and merges it with the next page. The commented-out lines are from my attempts to solve the problem; the string "thePage" is recreated inside the for loop.
Here is the code. I can email the PDF to whoever can help with this issue.
Thanks in advance.
public static string ExtractTextFromPdf(string path)
{
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
//string[] theLines;
//theLines = new string[COLUMNS];
//string thePage;
for (int i = 1; i <= reader.NumberOfPages; i++)
{
string thePage = "";
thePage = PdfTextExtractor.GetTextFromPage(reader, i, its);
string [] theLines = thePage.Split('\n');
foreach (var theLine in theLines)
{
text.AppendLine(theLine);
}
// text.AppendLine(" ");
// Array.Clear(theLines, 0, theLines.Length);
// thePage = "";
}
return text.ToString();
}
}
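A strategy object collects text data and does not know whether a new page has started. Because the same strategy instance ("its") is passed to GetTextFromPage for every page, the text collected from earlier pages is still in it when the next page is extracted.
Thus, use a new strategy object for each page, i.e. create it inside the for loop.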

Creating a PDF back page using Itext

I am currently using iText (Java) to create a PDF document.
This PDF document is meant to be a business card to be printed.
This is part of my code:
public void write() throws DocumentException, FileNotFoundException, IOException {
String extension = ".pdf";
String file = "testerPDF";
String filename = file + extension;
Document doku = new Document(PageSize.A4, 10,20,50,150);
PdfWriter writer = PdfWriter.getInstance(doku, new FileOutputStream(new File(filename)));
doku.open();
Phrase textblock = new Phrase();
// now iterate through the vector
for(int i = 0; i < vector.size(); i++) {
Chunk chunk=new Chunk((String)vector.get(i));
textblock.add(chunk);
}
doku.add(new Paragraph(textblock));
doku.close();
writer.close();
}
So now I need a way to also write a back page for this PDF, so that the printed document looks like a business card on paper.

How to update a text file which always contains a single line?

I have the following code to read a line from a text file.
In the UpdateFile() method I need to delete the existing one line and update it with a new line.
Can anybody please provide any ideas?
Thank you.
FileInfo JFile = new FileInfo(@"C:\test.txt");
using (FileStream JStream = JFile.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
int n = GetNUmber(JStream);
n = n + 1;
UpdateFile(JStream);
}
private int GetNUmber(FileStream jstream)
{
StreamReader sr = new StreamReader(jstream);
string line = sr.ReadToEnd().Trim();
int result;
if (string.IsNullOrEmpty(line))
{
return 0;
}
else
{
int.TryParse(line, out result);
return result;
}
}
private int UpdateFile(FileStream jstream)
{
jstream.Seek(0, SeekOrigin.Begin);
StreamWriter writer = new StreamWriter(jstream);
writer.WriteLine(n);
}
I think the code below will do the job:
StreamWriter writer = new StreamWriter("file path", false); //false means do not append
writer.Write("your new line");
writer.Close();
If you're just writing a single line, there's no need for streams or buffers or any of that. Just write it directly.
using System;
using System.IO;
using System.Linq;
File.WriteAllText(@"C:\test.txt", "hello world");
var line = File.ReadLines(@"c:\temp\hello.txt").ToList()[0];
var number = Convert.ToInt32(line);
number++;
File.WriteAllText(@"c:\temp\hello.txt", number.ToString());
Handle the possible failure cases: the file may not exist, it may have no lines, and the conversion to an integer may fail.
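For comparison, the same read, increment and overwrite cycle with those failure cases handled might look like this in Java. This is a sketch rather than a translation of the C# answers: the class name CounterFile, the method increment and the file name counter.txt are hypothetical, and treating unreadable content as 0 is an assumption that mirrors the TryParse behaviour in the question's GetNUmber.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CounterFile {
    // Reads the number stored in the file (0 if the file is missing, empty
    // or not numeric), adds one, overwrites the file and returns the new value.
    static int increment(Path path) throws IOException {
        int current = 0;
        if (Files.exists(path)) {
            String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
            try {
                current = Integer.parseInt(text);
            } catch (NumberFormatException e) {
                current = 0; // assumption: unreadable content counts as 0
            }
        }
        int next = current + 1;
        // Files.write creates the file or truncates an existing one by default,
        // so the old line is replaced rather than appended to.
        Files.write(path, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
        return next;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("counter.txt"); // hypothetical file name
        System.out.println(increment(path));
    }
}
```

Unlike the FileShare.None stream in the question, this version does not hold an exclusive lock between the read and the write, so it only suits a single writer.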