C# OpenXML: Word text replacement and page break - openxml

I am a new member and I really like this site because it always helps me.
My problem is this:
I want to replace text in a Word document using OpenXML and add a page break,
and then I want to write the replaced text on the second page.
Here is my code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(@"d:\a.docx", true))
{
    using (StreamReader reader = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        text = reader.ReadToEnd();
    }
    Regex regexText = new Regex("#db#");
    text = regexText.Replace(text, textBox4.Text.Trim());
    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
    {
        sw.Write(text);
    }
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    Run r = new Run();
    Paragraph para = new Paragraph(new Run(new Break() { Type = BreakValues.Page }));
    using (StreamWriter sw1 = new StreamWriter(mainPart.GetStream(FileMode.Create)))
    {
        sw1.Write(text);
    }
    mainPart.Document.Body.InsertAfter(para, mainPart.Document.Body.LastChild);
    mainPart.Document.Save();
}
}

I suggest inserting the page break into a.docx in advance. Then use a MergeField to mark the place where the replacement text should go.
Here is the example
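Separately from the MergeField suggestion, the replacement and the page break can also be done through the Open XML SDK object model instead of rewriting the part's raw XML as a string. The following is only a minimal sketch under some assumptions: the class and method names (DocxPlaceholder, ReplaceAndAppend) are made up for illustration, the placeholder is assumed to sit inside a single run (Word often splits text across several runs, in which case this will not find it), and "write the replaced text on the second page" is read as "append the text after a page break".

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxPlaceholder
{
    // Hypothetical helper: replaces the placeholder in every text node,
    // then appends a page break and a paragraph holding the replacement.
    public static void ReplaceAndAppend(string path, string placeholder, string replacement)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Assumption: the placeholder is not split across several runs.
            foreach (Text t in body.Descendants<Text>().Where(x => x.Text.Contains(placeholder)).ToList())
            {
                t.Text = t.Text.Replace(placeholder, replacement);
            }

            var pageBreak = new Paragraph(new Run(new Break { Type = BreakValues.Page }));
            var secondPage = new Paragraph(new Run(new Text(replacement)));

            // The last child of the body is usually the section properties
            // element, which must stay last, so insert before it if present.
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
            {
                body.InsertBefore(pageBreak, sectPr);
                body.InsertBefore(secondPage, sectPr);
            }
            else
            {
                body.AppendChild(pageBreak);
                body.AppendChild(secondPage);
            }

            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

It could be called as DocxPlaceholder.ReplaceAndAppend(@"d:\a.docx", "#db#", textBox4.Text.Trim()). Working on the object model avoids mixing stream writes with DOM edits on the same part, which is where the code in the question gets hard to reason about.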

Related

How to remove the extra page at the end of a Word document that is created during mail merge

I have written a piece of code to create a Word document by mail merge using Syncfusion (Assembly Syncfusion.DocIO.Portable, Version=17.1200.0.50), Angular 7+ and .NET Core. Please see the code below.
private MemoryStream MergePaymentPlanInstalmentsScheduleToPdf(List<PaymentPlanInstalmentReportModel>
    PaymentPlanDetails, byte[] templateFileBytes)
{
    if (templateFileBytes == null || templateFileBytes.Length == 0)
    {
        return null;
    }
    var templateStream = new MemoryStream(templateFileBytes);
    var pdfStream = new MemoryStream();
    WordDocument mergeDocument = null;
    using (mergeDocument = new WordDocument(templateStream, FormatType.Docx))
    {
        if (mergeDocument != null)
        {
            var mergeList = new List<PaymentPlanInstalmentScheduleMailMergeModel>();
            var obj = new PaymentPlanInstalmentScheduleMailMergeModel();
            obj.Applicants = 0;
            if (PaymentPlanDetails != null && PaymentPlanDetails.Any()) {
                var applicantCount = PaymentPlanDetails.GroupBy(a => a.StudentID)
                    .Select(s => new
                    {
                        StudentID = s.Key,
                        Count = s.Select(a => a.StudentID).Distinct().Count()
                    });
                obj.Applicants = applicantCount?.Count() > 0 ? applicantCount.Count() : 0;
            }
            mergeList.Add(obj);
            var reportDataSource = new MailMergeDataTable("Report", mergeList);
            var tableDataSource = new MailMergeDataTable("PaymentPlanDetails", PaymentPlanDetails);
            List<DictionaryEntry> commands = new List<DictionaryEntry>();
            commands.Add(new DictionaryEntry("Report", ""));
            commands.Add(new DictionaryEntry("PaymentPlanDetails", ""));
            MailMergeDataSet ds = new MailMergeDataSet();
            ds.Add(reportDataSource);
            ds.Add(tableDataSource);
            mergeDocument.MailMerge.ExecuteNestedGroup(ds, commands);
            mergeDocument.UpdateDocumentFields();
            using (var converter = new DocIORenderer())
            {
                using (var pdfDocument = converter.ConvertToPDF(mergeDocument))
                {
                    pdfDocument.Save(pdfStream);
                    pdfDocument.Close();
                }
            }
            mergeDocument.Close();
        }
    }
    return pdfStream;
}
Once the document is generated, I notice there is a blank page (with the footer) at the end. I searched for a solution on the internet over and over again, but I was not able to find a solution. According to experts, I have done the initial checks such as making sure that the initial word template file has no page breaks, etc.
I am wondering if there is something that I can do from my code to remove any extra page breaks or anything like that, which can cause this.
Any other suggested solution, including modifications to the MS Word document itself, would also be appreciated.
Please refer to the documentation link below, which shows how to remove an empty page at the end of a Word document using the Syncfusion Word library (Essential DocIO).
https://www.syncfusion.com/kb/10724/how-to-remove-empty-page-at-end-of-word-document
In your sample application, apply the code snippet from that article before converting the Word document to PDF.
Note: I work for Syncfusion.

Is it possible to merge several PDFs using iText 7?

I have several datasheets for products. Each is a separate file. What I want to do is to use iText to generate a summary / recommended set of actions, based on answers to a webform, and then append to that all the relevant datasheets. This way, I only need to open one new tab in the browser to print all information, rather than opening one for the summary, and one for each datasheet that is needed.
So, is it possible to do this using iText?
Yes, you can merge PDFs using iText 7. E.g. look at the iText 7 Jump-Start tutorial sample C06E04_88th_Oscar_Combine, the pivotal code is:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();
(C06E04_88th_Oscar_Combine method createPdf)
Depending on your use case, you might want to use the PdfDenseMerger with its helper class PageVerticalAnalyzer instead of the PdfMerger here. It attempts to put content from multiple source pages onto a single target page and corresponds to the iText 5 PdfVeryDenseMergeTool from this answer. Due to the nature of PDF files this only works for PDFs without headers, footers, and similar artifacts.
I found a solution that works quite well.
public byte[] Combine(IEnumerable<byte[]> pdfs)
{
    using (var writerMemoryStream = new MemoryStream())
    {
        using (var writer = new PdfWriter(writerMemoryStream))
        {
            using (var mergedDocument = new PdfDocument(writer))
            {
                var merger = new PdfMerger(mergedDocument);
                foreach (var pdfBytes in pdfs)
                {
                    using (var copyFromMemoryStream = new MemoryStream(pdfBytes))
                    {
                        using (var reader = new PdfReader(copyFromMemoryStream))
                        {
                            using (var copyFromDocument = new PdfDocument(reader))
                            {
                                merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
                            }
                        }
                    }
                }
            }
        }
        return writerMemoryStream.ToArray();
    }
}
Usage:
DirectoryInfo d = new DirectoryInfo(INPUT_FOLDER);
var pdfList = new List<byte[]> { };
foreach (var file in d.GetFiles("*.pdf"))
{
    pdfList.Add(File.ReadAllBytes(file.FullName));
}
File.WriteAllBytes(OUTPUT_FOLDER + "\\merged.pdf", Combine(pdfList));
Author:
https://www.nikouusitalo.com/blog/combining-pdf-documents-using-itext7-and-c/
If you want to merge two byte arrays and return a single byte array as PDF/A:
public static byte[] mergePDF(byte[] first, byte[] second) throws IOException {
    // Initialize PDF writer
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    PdfWriter writer = new PdfWriter(arrayOutputStream);
    // Initialize PDF document
    PdfADocument pdf = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_1B, new PdfOutputIntent("Custom", "",
            "https://www.color.org", "sRGB IEC61966-2.1", new FileInputStream("sRGB_CS_profile.icm")));
    PdfMerger merger = new PdfMerger(pdf);
    // Add pages from the first document
    PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(first)));
    merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
    // Add pages from the second PDF document
    PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(second)));
    merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
    firstSourcePdf.close();
    secondSourcePdf.close();
    // Closing the document finishes the PDF and, by default, also closes its writer,
    // so the writer is not closed separately beforehand.
    pdf.close();
    return arrayOutputStream.toByteArray();
}
The question doesn't specify the language, so I'm adding an answer using C#; this works for me. I'm creating three separate but related PDFs then combining them into one.
After creating the three separate PDF docs and adding data to them, I combine them this way:
PdfDocument pdfCombined = new PdfDocument(new PdfWriter(destCombined));
PdfMerger merger = new PdfMerger(pdfCombined);
PdfDocument pdfReaderExecSumm = new PdfDocument(new PdfReader(destExecSumm));
merger.Merge(pdfReaderExecSumm, 1, pdfReaderExecSumm.GetNumberOfPages());
PdfDocument pdfReaderPhrases = new PdfDocument(new PdfReader(destPhrases));
merger.Merge(pdfReaderPhrases, 1, pdfReaderPhrases.GetNumberOfPages());
PdfDocument pdfReaderUncommonWords = new PdfDocument(new PdfReader(destUncommonWords));
merger.Merge(pdfReaderUncommonWords, 1, pdfReaderUncommonWords.GetNumberOfPages());
pdfCombined.Close();
So the combined PDF is a PdfDocument backed by a PdfWriter, the pieces being merged are PdfDocuments backed by PdfReaders, and the PdfMerger is the glue that binds them together.
Here is the minimum C# code needed to merge file1.pdf and file2.pdf into a new merged.pdf:
var path = @"C:\Temp\";
var src0 = System.IO.Path.Combine(path, "merged.pdf");
var wtr0 = new PdfWriter(src0);
var pdf0 = new PdfDocument(wtr0);
var src1 = System.IO.Path.Combine(path, "file1.pdf");
var fi1 = new FileInfo(src1);
var rdr1= new PdfReader(fi1);
var pdf1 = new PdfDocument(rdr1);
var src2 = System.IO.Path.Combine(path, "file2.pdf");
var fi2 = new FileInfo(src2);
var rdr2 = new PdfReader(fi2);
var pdf2 = new PdfDocument(rdr2);
var merger = new PdfMerger(pdf0);
merger.Merge(pdf1, 1, pdf1.GetNumberOfPages());
merger.Merge(pdf2, 1, pdf2.GetNumberOfPages());
merger.Close();
pdf0.Close();
Here is a VB.NET solution using open source iText7 that can merge multiple PDF files to an output file.
Imports System.IO
Imports iText.Kernel.Pdf
Imports iText.Kernel.Utils
Public Function Merge_PDF_Files(ByVal input_files As List(Of String), ByVal output_file As String) As Boolean
    Dim Input_Document As PdfDocument = Nothing
    Dim Output_Document As PdfDocument = Nothing
    Dim Merger As PdfMerger
    Try
        'Create the output document, backed by a writer for the output file'
        Output_Document = New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfWriter(output_file))
        'Create the merger that copies pages into the output document'
        Merger = New PdfMerger(Output_Document)
        'Merge each input PDF file to the output document'
        For Each file As String In input_files
            Input_Document = New PdfDocument(New PdfReader(file))
            Merger.Merge(Input_Document, 1, Input_Document.GetNumberOfPages())
            Input_Document.Close()
        Next
        Output_Document.Close()
        Return True
    Catch ex As Exception
        'catch Exception if needed'
        If Input_Document IsNot Nothing Then Input_Document.Close()
        If Output_Document IsNot Nothing Then Output_Document.Close()
        'Fully qualified because the loop variable above is also named file'
        System.IO.File.Delete(output_file)
        Return False
    End Try
End Function
USAGE EXAMPLE:
Dim success As Boolean = False
Dim input_files_list As New List(Of String)
input_files_list.Add("c:\input_PDF1.pdf")
input_files_list.Add("c:\input_PDF2.pdf")
input_files_list.Add("c:\input_PDF3.pdf")
success = Merge_PDF_Files(input_files_list, "c:\output_PDF.pdf")
'Optional: handling errors'
If success Then
    'Files merged'
Else
    'Error merging files'
End If

iText not returning text contents of a PDF after first page

I am trying to use the iText library with C# to extract the text from PDF files.
I created a PDF by exporting from Excel 2013 and then copied a sample from the web showing how to use iText (after adding the library reference to the project).
It reads the first page perfectly, but the output is garbled after that: it keeps part of the first page and merges it with the text of the next page. The commented-out lines are from my attempts to solve the problem; the string "thePage" is recreated inside the for loop.
Here is the code. I can email the PDF to anyone who can help with this issue.
Thanks in advance.
public static string ExtractTextFromPdf(string path)
{
    ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        //string[] theLines;
        //theLines = new string[COLUMNS];
        //string thePage;
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            string thePage = "";
            thePage = PdfTextExtractor.GetTextFromPage(reader, i, its);
            string[] theLines = thePage.Split('\n');
            foreach (var theLine in theLines)
            {
                text.AppendLine(theLine);
            }
            // text.AppendLine(" ");
            // Array.Clear(theLines, 0, theLines.Length);
            // thePage = "";
        }
        return text.ToString();
    }
}
A strategy object collects text data and does not know if a new page has started or not.
Thus, use a new strategy object for each page.
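Applied to the code from the question, that advice might look like the sketch below. It assumes the same iTextSharp 5 classes the question already uses (PdfReader, PdfTextExtractor, LocationTextExtractionStrategy); the wrapping class name PdfText is made up for illustration. The only real change is that the strategy is created inside the loop.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfText
{
    public static string ExtractTextFromPdf(string path)
    {
        using (PdfReader reader = new PdfReader(path))
        {
            StringBuilder text = new StringBuilder();
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // A fresh strategy per page, so text collected on earlier
                // pages is not carried over into this page's result.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
            }
            return text.ToString();
        }
    }
}
```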

Bolding with Rich Text Values in iTextSharp

Is it possible to bold a single word within a sentence with iTextSharp? I'm working with large paragraphs of text coming from xml, and I am trying to bold several individual words without having to break the string into individual phrases.
Eg:
document.Add(new Paragraph("this is <b>bold</b> text"));
should output...
this is bold text
As @kuujinbo pointed out, there is the XMLWorker object, which is where most of the new HTML parsing work is being done. But if you've just got simple commands like bold or italic, you can use the native iTextSharp.text.html.simpleparser.HTMLWorker class. You could wrap it into a helper method such as:
private Paragraph CreateSimpleHtmlParagraph(String text) {
    //Our return object
    Paragraph p = new Paragraph();
    //ParseToList requires a TextReader instead of just text
    using (StringReader sr = new StringReader(text)) {
        //Parse and get a collection of elements
        List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, null);
        foreach (IElement e in elements) {
            //Add those elements to the paragraph
            p.Add(e);
        }
    }
    //Return the paragraph
    return p;
}
Then instead of this:
document.Add(new Paragraph("this is <b>bold</b> text"));
You could use this:
document.Add(CreateSimpleHtmlParagraph("this is <b>bold</b> text"));
document.Add(CreateSimpleHtmlParagraph("this is <i>italic</i> text"));
document.Add(CreateSimpleHtmlParagraph("this is <b><i>bold and italic</i></b> text"));
I know that this is an old question, but I could not get the other examples here to work for me. Adding the text in Chunks with different fonts did work, though.
//define a bold font to be used
Font boldFont = FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 12);
//create a phrase and add Chunks to it
var phrase2 = new Phrase();
phrase2.Add(new Chunk("this is "));
phrase2.Add(new Chunk("bold", boldFont));
phrase2.Add(new Chunk(" text"));
document.Add(phrase2);
Not sure how complex your XML is, but try XMLWorker. Here's a working example with an ASP.NET HTTP handler:
<%@ WebHandler Language="C#" Class="boldText" %>
using System;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using iTextSharp.tool.xml;
public class boldText : IHttpHandler {
    public void ProcessRequest (HttpContext context) {
        HttpResponse Response = context.Response;
        Response.ContentType = "application/pdf";
        StringReader xmlSnippet = new StringReader(
            "<p>This is <b>bold</b> text</p>"
        );
        using (Document document = new Document()) {
            PdfWriter writer = PdfWriter.GetInstance(
                document, Response.OutputStream
            );
            document.Open();
            XMLWorkerHelper.GetInstance().ParseXHtml(
                writer, document, xmlSnippet
            );
        }
    }
    public bool IsReusable { get { return false; } }
}
You may have to pre-process your XML before sending it to XMLWorker (notice the snippet is a bit different from yours). Support for parsing HTML/XML was released relatively recently, so your mileage may vary.
Here is another XMLWorker example that uses a different overload of ParseXHtml and returns a Phrase instead of writing it directly to the document.
private static Phrase CreateSimpleHtmlParagraph(String text)
{
    var p = new Phrase();
    var mh = new MyElementHandler();
    using (TextReader sr = new StringReader("<html><body><p>" + text + "</p></body></html>"))
    {
        XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
    }
    foreach (var element in mh.elements)
    {
        foreach (var chunk in element.Chunks)
        {
            p.Add(chunk);
        }
    }
    return p;
}
private class MyElementHandler : IElementHandler
{
    public List<IElement> elements = new List<IElement>();
    public void Add(IWritable w)
    {
        if (w is iTextSharp.tool.xml.pipeline.WritableElement)
        {
            elements.AddRange(((iTextSharp.tool.xml.pipeline.WritableElement)w).Elements());
        }
    }
}

How to update a text file which always contains a single line?

I have the following code to read a line from a text file.
In the UpdateFile() method I need to delete the existing one line and update it with a new line.
Can anybody please provide any ideas?
Thank you.
FileInfo JFile = new FileInfo(@"C:\test.txt");
using (FileStream JStream = JFile.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    int n = GetNUmber(JStream);
    n = n + 1;
    UpdateFile(JStream);
}
private int GetNUmber(FileStream jstream)
{
    StreamReader sr = new StreamReader(jstream);
    string line = sr.ReadToEnd().Trim();
    int result;
    if (string.IsNullOrEmpty(line))
    {
        return 0;
    }
    else
    {
        int.TryParse(line, out result);
        return result;
    }
}
private int UpdateFile(FileStream jstream)
{
    jstream.Seek(0, SeekOrigin.Begin);
    StreamWriter writer = new StreamWriter(jstream);
    writer.WriteLine(n);
}
I think the code below does what you need:
StreamWriter writer = new StreamWriter("file path", false); //false means do not append
writer.Write("your new line");
writer.Close();
If you're just writing a single line, there's no need for streams or buffers or any of that. Just write it directly.
using System.IO;
File.WriteAllText(@"C:\test.txt", "hello world");
var line = File.ReadLines(@"c:\temp\hello.txt").ToList()[0];
var number = Convert.ToInt32(line);
number++;
File.WriteAllText(@"c:\temp\hello.txt", number.ToString());
Remember to handle the possible failure cases: the file does not exist, the file has no lines, the line cannot be converted to an integer, and so on.
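If the file has to stay locked between the read and the write, as in the question's FileShare.None approach, both steps can share one FileStream. The following is a minimal sketch, not a definitive implementation: the class and method names (CounterFile, Increment) are made up for illustration, and an empty or unparsable file is treated as 0, as in the question's GetNUmber method. The key step is truncating the stream with SetLength(0) before writing, so no stale bytes from the old line remain.

```csharp
using System;
using System.IO;
using System.Text;

public static class CounterFile
{
    // Reads the integer stored in the file (0 if empty or unparsable),
    // increments it, and overwrites the file with the new value.
    public static int Increment(string path)
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
        {
            int n;
            // leaveOpen: true keeps the stream usable after the reader is disposed.
            using (var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true))
            {
                int.TryParse(reader.ReadToEnd().Trim(), out n);
            }

            n++;

            // Truncate and rewind so that old content cannot survive
            // behind a shorter new value.
            stream.SetLength(0);
            stream.Seek(0, SeekOrigin.Begin);
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, true))
            {
                writer.WriteLine(n);
            }
            return n;
        }
    }
}
```

A call such as int next = CounterFile.Increment(@"C:\test.txt"); then replaces the separate GetNUmber and UpdateFile steps. Compared with File.WriteAllText, the file stays exclusively open for the whole read-modify-write cycle.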