ItextSharp with PowerShell Merging Tiff and PDF to 1 large PDF - powershell

I'm trying to write a PowerShell script that loops through a CSV file looking for TIFF and PDF files, using the iTextSharp DLL. The desired end result is one large PDF containing every image and every page of each PDF.
My plan is to create two functions to accomplish this: one for images and the other for PDFs. The image function works properly, but the PDF function throws an error: Exception calling ".ctor" with "1" argument(s): " not found as file or resource."
Any thoughts on fixing the add-pdf function?
Current script is below.
[System.Reflection.Assembly]::LoadFrom("C:\Temp\itextsharp\itextsharp.dll")
[System.Reflection.Assembly]::LoadWithPartialName("System.Drawing")
$doc = New-Object itextsharp.text.document
#output PDF with all combined tiff and pdfs
$stream = [IO.File]::OpenWrite("C:\temp\itext\test.pdf")
$writer = [itextsharp.text.pdf.PdfWriter]::GetInstance($doc, $stream)
#$pdfCopy =New-Object iTextSharp.text.pdf.PdfCopy($doc, $stream)
$doc.Open()
$doc.SetMargins(0, 0, 0, 0)
#get the size of image and change pdf
function add-picture( $file2use){
$pic = New-Object System.Drawing.Bitmap($file2use )
$rect = New-Object iTextSharp.text.Rectangle($pic.Width, $pic.Height)
## Set the next page size to those dimensions and add a new page
$doc.SetPageSize( $rect )
$doc.NewPage()
#add image jpg
$img = [iTextSharp.text.Image]::GetInstance($file2use )
$doc.Add($img);
$pic.dispose()
}
function add-pdf( $newPDF){
$pdf2Merge = [System.IO.Path]::Combine("",$newPDF)
$pdfCopy = New-Object iTextSharp.text.pdf.PdfCopy($doc, $stream);
$reader = New-Object iTextSharp.text.pdf.PdfReader($pdf2Merge);
$pageCount = $reader.NumberOfPages;
for ($i = 1; $i -lt $pageCount ; $i++) {
$pdfCopy.AddPage(
$pdfCopy.GetImportedPage($reader, $i ))
# ^^^^^
# your page number here
}
#$pdfCopy.FreeReader($reader);
}
add-picture -file2use "C:\Temp\itext\3-26-04 (1).JPG"
add-picture -file2use "C:\Temp\itext\CCITT_1.TIF"
add-picture -file2use "C:\Temp\itext\CCITT_2.TIF"
add-pdf -file2use "C:\Temp\itext\test2.pdf"
## Cleanup
#$doc.Close()
$stream.Close()

I'm not too good with PowerShell, but it looks like you are, so you should be able to adapt this C# code very easily. The code in this post is adapted from some code I wrote earlier here.
First off, I really don't recommend keeping global iText abstraction objects around and binding various things to them over and over; that's just asking for trouble.
Instead, for images I'd recommend a simple function that takes a supplied image file and returns a byte array representing that image added to a PDF. Instead of a byte array you could also write the PDF to a temporary file and return that path instead.
private static byte[] ImageToPdf(string imagePath) {
//Get the size of the current image
iTextSharp.text.Rectangle pageSize = null;
using (var srcImage = new Bitmap(imagePath)) {
pageSize = new iTextSharp.text.Rectangle(0, 0, srcImage.Width, srcImage.Height);
}
//Simple image to PDF
using (var m = new MemoryStream()) {
using (var d = new Document(pageSize, 0, 0, 0, 0)) {
using (var w = PdfWriter.GetInstance(d, m)) {
d.Open();
d.Add(iTextSharp.text.Image.GetInstance(imagePath));
d.Close();
}
}
//Grab the bytes before closing out the stream
return m.ToArray();
}
}
Then just create a new Document and bind a PdfSmartCopy object to it. You can then enumerate your files; if you have an image, convert it to a PDF first, then use the PdfSmartCopy method AddDocument() to add that entire document to the final output.
The code below just loops through a single folder, grabbing images first and then PDFs, but you should be able to adapt it pretty easily, hopefully.
//Folder that contains our sample files
var sourceFolder = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "MergeTest");
//Final file that we're going to emit
var finalFile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
//Create our final file, standard iText setup here
using (var fs = new FileStream(finalFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
//Use a PdfSmartCopy object to merge things
using (var copy = new PdfSmartCopy(doc, fs)) {
//Open the document for writing
doc.Open();
//Loop through each image in our test folder
foreach (var img in System.IO.Directory.EnumerateFiles(sourceFolder, "*.jpg")) {
//Convert the image to a byte array
var imageAsPdf = ImageToPdf(img);
//Bind a reader to that PDF
using( var r = new PdfReader(imageAsPdf) ){
//Add that entire document to our final PDF
copy.AddDocument(r);
}
}
//Loop through each PDF in our test folder
foreach (var pdf in System.IO.Directory.EnumerateFiles(sourceFolder, "*.pdf")) {
//Bind a reader to that PDF
using (var r = new PdfReader(pdf)) {
//Add that entire document to our final PDF
copy.AddDocument(r);
}
}
//Close the document now that everything has been added
doc.Close();
}
}
}

Related

Word found unreadable content in xxx.docx after split a docx using openxml

I have a full.docx that contains two math questions; the docx embeds some pictures and MathType equations (OLE objects). I split the document according to this and got two files (first.docx and second.docx). first.docx works fine, but second.docx pops up a warning dialog when I try to open it:
"Word found unreadable content in second.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."
After I click "Yes", the document opens and the content is correct. I want to know what is wrong with second.docx. I have checked it with the "Open XML SDK 2.5 Productivity Tool" but found no cause. Any help is much appreciated. Thanks.
The three files have been uploaded here.
Here is some code:
byte[] templateBytes = System.IO.File.ReadAllBytes(TEMPLATE_YANG_FILE);
using (MemoryStream templateStream = new MemoryStream())
{
templateStream.Write(templateBytes, 0, (int)templateBytes.Length);
string guidStr = Guid.NewGuid().ToString();
using (WordprocessingDocument document = WordprocessingDocument.Open(templateStream, true))
{
document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = document.MainDocumentPart;
mainPart.Document = new Document();
Body bd = new Body();
foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph clonedParagrph in lst)
{
bd.AppendChild<DocumentFormat.OpenXml.Wordprocessing.Paragraph>(clonedParagrph);
clonedParagrph.Descendants<Blip>().ToList().ForEach(blip =>
{
var newRelation = document.CopyImage(blip.Embed, this.wordDocument);
blip.Embed = newRelation;
});
clonedParagrph.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
{
var newRelation = document.CopyImage(imageData.RelationshipId, this.wordDocument);
imageData.RelationshipId = newRelation;
});
}
mainPart.Document.Body = bd;
mainPart.Document.Save();
}
string subDocFile = System.IO.Path.Combine(this.outDir, guidStr + ".docx");
this.subWordFileLst.Add(subDocFile);
File.WriteAllBytes(subDocFile, templateStream.ToArray());
}
lst contains Paragraph objects cloned from the original docx using:
(DocumentFormat.OpenXml.Wordprocessing.Paragraph)p.Clone();
Using the Productivity Tool, I found that oleobjectx.bin was not copied, so I added the code below after copying the Blip and ImageData elements:
clonedParagrph.Descendants<OleObject>().ToList().ForEach(ole =>
{
var newRelation = document.CopyOleObject(ole.Id, this.wordDocument);
ole.Id = newRelation;
});
That solved the issue.

Take all text files in a folder and combine them into 1

I'm trying to merge all my text files into one file.
The problem I am having is that the file names are based on data previously captured in my app. Maybe I don't know how to define the path to where the text files are. I keep getting an error, but the path to the files is correct.
What am I missing?
string filesread = System.AppDomain.CurrentDomain.BaseDirectory + @"\data\Customers\" + CustComboB.SelectedItem + @"\";
Directory.GetFiles(filesread);
using (var output = File.Create("allfiles.txt"))
{
foreach (var file in new[] { filesread })
{
using (var input = File.OpenRead(file))
{
input.CopyTo(output);
}
}
}
System.Diagnostics.Process.Start("allfiles.txt");
My error:
System.IO.DirectoryNotFoundException
HResult=0x80070003
Message=Could not find a part of the path 'C:\Users\simeo\source\repos\UpMarker\UpMarker\bin\Debug\data\Customers\13Dec2018\'.
I can't post a picture, but let me try to give some more details about my form.
I select a combobox item; this item is a directory. I then have a listbox that displays the files in that directory, and a button that combines the files. Thanks.
I finally got it working.
string path = @"data\Customers\" + CustComboB.SelectedItem;
string topath = @"data\Customers\";
string files = "*.txt";
string[] txtFiles;
// Only the .txt files in the selected customer's folder
txtFiles = Directory.GetFiles(path, files);
using (StreamWriter writer = new StreamWriter(topath + @"\allfiles.txt"))
{
for (int i = 0; i < txtFiles.Length; i++)
{
using (StreamReader reader = File.OpenText(txtFiles[i]))
{
writer.Write(reader.ReadToEnd());
}
}
}
// Open the merged file only after the writer has been closed and flushed
System.Diagnostics.Process.Start(topath + @"\allfiles.txt");
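For what it's worth, the same merge can also be done by streaming bytes, which is closer to the CopyTo attempt in the question and avoids loading each file into a string. This is only a sketch: the TextMerger class and MergeTextFiles method are names made up for illustration, and it assumes plain concatenation with no separator between files is what you want.

```csharp
using System;
using System.IO;

public static class TextMerger
{
    // Hypothetical helper: concatenates every *.txt file in sourceDir into outputPath.
    public static void MergeTextFiles(string sourceDir, string outputPath)
    {
        // List the inputs before creating the output file
        string[] inputs = Directory.GetFiles(sourceDir, "*.txt");
        string fullOutput = Path.GetFullPath(outputPath);

        using (FileStream output = File.Create(fullOutput))
        {
            foreach (string file in inputs)
            {
                // Skip the output file itself if it already existed in the source folder
                if (string.Equals(Path.GetFullPath(file), fullOutput, StringComparison.OrdinalIgnoreCase))
                    continue;

                using (FileStream input = File.OpenRead(file))
                {
                    input.CopyTo(output);
                }
            }
        }
    }
}
```

With the variables from the snippet above, a call would look something like TextMerger.MergeTextFiles(path, Path.Combine(topath, "allfiles.txt")).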

Solution to upload image file via WCF service?

I have been searching for the last 3-4 days, downloading, running, and fixing issues with the demo projects available online; none of them has worked so far.
I need to upload an image using a WCF web service. On the client side, I'd like to upload it through a form (multipart/form-data), including a file description.
Is there a working solution? My mind has really overflowed from trying different solutions. With the one I started with, I can upload a text file, but the file is created with some extra content in it. I need to upload an image file.
------------cH2ae0GI3KM7GI3Ij5ae0ei4Ij5Ij5
Content-Disposition: form-data; name=\"Filename\"
testing file gets upload...
When I upload an image file, the resulting image file is empty.
Initial code (one implementation): this is the method that produces the .txt file shown above; for an image, the file comes out blank (or perhaps corrupt, I don't know).
private string uplaodFile(Stream stream)
{
StreamReader sr = new StreamReader(stream);
int length = sr.ReadToEnd().Length;
byte[] buffer = new byte[length];
stream.Read(buffer, 0, length);
FileStream f = new FileStream(Path.Combine(HostingEnvironment.MapPath("~/Upload"), "test.png"), FileMode.OpenOrCreate);
f.Write(buffer, 0, buffer.Length);
f.Close();
stream.Close();
return "Recieved the image on server";
}
Another:
public Stream FileUpload(string fileName, Stream stream)
{
string FilePath = Path.Combine(HostingEnvironment.MapPath("~/Upload"), fileName);
int length = 0;
using (FileStream writer = new FileStream(FilePath, FileMode.Create))
{
int readCount;
var buffer = new byte[8192];
while ((readCount = stream.Read(buffer, 0, buffer.Length)) != 0)
{
writer.Write(buffer, 0, readCount);
length += readCount;
}
}
return returnJson(new { resp_code = 302, resp_message = "occurred." });
}

iText not returning text contents of a PDF after first page

I am trying to use the iText library with C# to capture the text portion of PDF files.
I created a PDF from Excel 2013 (exported) and then copied a sample from the web showing how to use iText (I added the library reference to the project).
It reads the first page perfectly, but it gets garbled info after that. It keeps part of the first page and merges that info with the next page. The commented lines are from when I was trying to solve the problem; the string "thePage" is recreated inside the for loop.
Here is the code. I can email the pdf to whoever can help with this issue.
Thanks in advance
public static string ExtractTextFromPdf(string path)
{
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
//string[] theLines;
//theLines = new string[COLUMNS];
//string thePage;
for (int i = 1; i <= reader.NumberOfPages; i++)
{
string thePage = "";
thePage = PdfTextExtractor.GetTextFromPage(reader, i, its);
string [] theLines = thePage.Split('\n');
foreach (var theLine in theLines)
{
text.AppendLine(theLine);
}
// text.AppendLine(" ");
// Array.Clear(theLines, 0, theLines.Length);
// thePage = "";
}
return text.ToString();
}
}
A strategy object collects text data and does not know if a new page has started or not.
Thus, use a new strategy object for each page.
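A minimal sketch of that change, assuming iTextSharp 5.x and the same ExtractTextFromPdf signature as in the question (the PdfText class name is just a placeholder). The only real difference is that the strategy is created inside the loop:

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfText
{
    public static string ExtractTextFromPdf(string path)
    {
        var text = new StringBuilder();
        using (var reader = new PdfReader(path))
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page, so no text collected from earlier pages carries over
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        return text.ToString();
    }
}
```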

Replace the text in pdf document using itextSharp

I want to replace a particular piece of text in a PDF document. I am currently using the iTextSharp library to work with PDF documents.
I extracted the bytes from the PDF document, replaced the matching bytes, and then wrote the document out again from those bytes, but it is not working. In the example below I am trying to replace the string 1234 with 5678.
Any advice on how to do this would be helpful.
PdfReader reader = new PdfReader(opf.FileNames[i]);
byte[] pdfbytes = reader.GetPageContent(1);
PdfString oldstring = new PdfString("1234");
PdfString newstring = new PdfString("5678");
byte[] byte1022 = oldstring.GetOriginalBytes();
byte[] byte1067 = newstring.GetOriginalBytes();
int position = 0;
for (int j = 0; j <pdfbytes.Length ; j++)
{
if (pdfbytes[j] == byte1022[0])
{
if (pdfbytes[j+1] == byte1022[1])
{
if (pdfbytes[j+2] == byte1022[2])
{
if (pdfbytes[j+3] == byte1022[3])
{
position = j;
break;
}
}
}
}
}
pdfbytes[position] = byte1067[0];
pdfbytes[position + 1] = byte1067[1];
pdfbytes[position + 2] = byte1067[2];
pdfbytes[position + 3] = byte1067[3];
File.WriteAllBytes(opf.FileNames[i].Replace(".pdf","j.pdf"), pdfbytes);
What makes you think 1234 is part of the page's content stream and not of a form XObject? Your code is never going to work in general if you don't parse all the resources of a page.
Also: I see GetPageContent(), but I don't see you using SetPageContent() anywhere. How are the changes ever going to be stored in the PdfReader object?
Moreover, I don't see you using PdfStamper to write the altered PdfReader contents to a file.
Finally: I'm too shy to quote the words of Leonard Rosenthol, Adobe's PDF Architect, but ask him, and he'll tell you personally that you shouldn't do what you're trying to do. PDF is NOT a format for editing. Read the intro of chapter 6 of the book I wrote on iText: http://www.manning.com/lowagie2/samplechapter6.pdf
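Purely to illustrate the two missing steps named above (SetPageContent() and PdfStamper), and not as a recommendation, a rough sketch might look like the following. It assumes iTextSharp 5.x; PageContentRewriter, RewriteFirstPage, and the edit callback are hypothetical names. It only touches the content stream of page 1, so it still misses text that lives in form XObjects or is encoded differently from what you search for.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

public static class PageContentRewriter
{
    // Hypothetical helper: 'edit' receives the raw content stream of page 1
    // and returns the altered bytes.
    public static void RewriteFirstPage(string src, string dest, Func<byte[], byte[]> edit)
    {
        using (var reader = new PdfReader(src))
        {
            byte[] content = reader.GetPageContent(1);

            // Store the altered bytes back in the PdfReader object
            reader.SetPageContent(1, edit(content));

            // PdfStamper writes the altered reader contents to the destination when it is closed
            using (var fs = new FileStream(dest, FileMode.Create, FileAccess.Write))
            using (var stamper = new PdfStamper(reader, fs))
            {
            }
        }
    }
}
```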