CsvReader not loading lines starting with # - csvreader

I'm trying to load a text file (.csv) into a SQL Server database table. Each line in the file is supposed to be loaded into a single column in the table. I find that lines starting with "#" are skipped, with no error. For example, the first two of the following four lines are loaded fine, but the last two are not. Does anybody know why?
ThisLineShouldBeLoaded
This one as well
#ThisIsATestLine
#This is another test line
Here's the segment of my code:
var sqlConn = connection.StoreConnection as SqlConnection;
sqlConn.Open();
CsvReader reader = new CsvReader(new StreamReader(f), false);
using (var bulkCopy = new SqlBulkCopy(sqlConn))
{
bulkCopy.DestinationTableName = "dbo.TestTable";
try
{
reader.SkipEmptyLines = true;
bulkCopy.BulkCopyTimeout = 300;
bulkCopy.WriteToServer(reader);
reader.Dispose();
reader = null;
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
System.Diagnostics.Debug.WriteLine(ex.Message);
throw;
}
}

# is the default comment character for CsvReader. You can change the comment character by changing the Comment property of the Configuration object. You can disable comment processing altogether by setting the AllowComments property to false, e.g.:
reader.Configuration.AllowComments=false;
SqlBulkCopy doesn't deal with CSV files at all; it sends any data that's passed to WriteToServer to the database. It doesn't care where the data came from or what it contains, as long as the column mappings match.
Update
Assuming LumenWorks.Framework.IO.Csv refers to this project, the comment character can be specified in the constructor. One could set it to something that wouldn't appear in a normal file, perhaps even the NUL character, the default char value:
CsvReader reader = new CsvReader(new StreamReader(f), false, comment: default);
or
CsvReader reader = new CsvReader(new StreamReader(f), false, comment: '\0');

Related

How to merge two ppt by poi

I want to merge multiple PPTs. I use POI and have most of it working, but there are still some problems: some elements are not generated. I tested several groups of PPTs.
Case 1: If there is only one slide in the PPT, the result is correct. If there are multiple slides, an exception is thrown.
Below is the exception stack:
java.lang.ClassCastException: org.apache.poi.ooxml.POIXMLDocumentPart cannot be cast to org.apache.poi.xslf.usermodel.XSLFPictureData
at org.apache.poi.xslf.usermodel.XSLFSheet.importBlip(XSLFSheet.java:649)
at org.apache.poi.xslf.usermodel.XSLFPictureShape.copy(XSLFPictureShape.java:378)
at org.apache.poi.xslf.usermodel.XSLFSheet.wipeAndReinitialize(XSLFSheet.java:454)
at org.apache.poi.xslf.usermodel.XSLFSheet.importContent(XSLFSheet.java:433)
at org.apache.poi.xslf.usermodel.XSLFSlide.importContent(XSLFSlide.java:294)
at com.office.MergingMultiplePresentations.main(MergingMultiplePresentations.java:38)
Case 2: I tested another PPT, and when I opened the result, it prompted "there is a problem with the content, you can try to repair it". When I clicked repair, some slides of the PPT were deleted. Is there something that hasn't been copied?
Here is my code:
XMLSlideShow ppt = new XMLSlideShow();
//taking the two presentations that are to be merged
String path = "E:\\prj\\test\\";
String file1 = "1.pptx";
String file2 = "2.pptx";
String[] inputs = {file1,file2};
for(String arg : inputs){
FileInputStream inputstream = new FileInputStream(path+arg);
XMLSlideShow src = new XMLSlideShow(inputstream);
for(XSLFSlide srcSlide : src.getSlides()) {
try {
XSLFSlideLayout srcLayout = srcSlide.getSlideLayout();
XSLFSlideMaster srcMaster = srcSlide.getSlideMaster();
XSLFSlide slide = ppt.createSlide();
XSLFSlideLayout layout = slide.getSlideLayout();
XSLFSlideMaster master = slide.getSlideMaster();
layout.importContent(srcLayout);
master.importContent(srcMaster);
slide.importContent(srcSlide);
}
catch (Exception e){
e.printStackTrace();
}
}
}
String file3 = "3.pptx";
//creating the file object
FileOutputStream out = new FileOutputStream(path+file3);
// saving the changes to a file
ppt.write(out);
out.close();
Merging presentations with POI is a bit cumbersome because you have to take care of the layouts and masters yourself. It's easier to use Aspose.Slides for Java for this. The following code example shows you how to merge presentations using that library. Slide layouts and slide masters will be merged automatically.
String file1 = "1.pptx";
String file2 = "2.pptx";
String[] inputs = {file1, file2};
// Prepare a new empty presentation.
Presentation ppt = new Presentation();
ppt.getSlides().removeAt(0); // removes the first empty slide
ppt.getSlideSize().setSize(SlideSizeType.Widescreen, SlideSizeScaleType.Maximize);
// Merge the input presentations.
for (String file : inputs) {
Presentation source = new Presentation(file);
for (ISlide slide : source.getSlides()) {
ppt.getSlides().addClone(slide);
}
source.dispose();
}
ppt.save("3.pptx", SaveFormat.Pptx);
ppt.dispose();
This is a paid product, but you can get a temporary license to try it out.
Alternatively, you could use Aspose.Slides Cloud SDK for Java. This product provides a REST-based API that allows you to make 150 free API calls per month for API learning and presentation processing. The following code example shows you how to do the same using Aspose.Slides Cloud:
SlidesApi slidesApi = new SlidesApi("my_client_id", "my_client_secret");
String file1 = "1.pptx";
String file2 = "2.pptx";
String outFile = "3.pptx";
// Prepare a new empty presentation.
slidesApi.createPresentation(outFile, null, null, null, null, null);
slidesApi.deleteSlide(outFile, 1, null, null, null); // removes the first empty slide
SlideProperties slideProperties = new SlideProperties();
slideProperties.setSizeType(SlideProperties.SizeTypeEnum.WIDESCREEN);
slideProperties.setScaleType(SlideProperties.ScaleTypeEnum.MAXIMIZE);
slidesApi.setSlideProperties(outFile, slideProperties, null, null, null);
// Merge the input presentations.
PresentationsMergeRequest mergeRequest = new PresentationsMergeRequest();
mergeRequest.setPresentationPaths(Arrays.asList(file1, file2));
slidesApi.merge(outFile, mergeRequest, null, null, null);
Sometimes it is necessary to merge presentations without any code. For such cases, you can use the free Aspose Online Merger.
I work as a Support Developer at Aspose.

Word OpenXml Word Found Unreadable Content

We are trying to manipulate a word document to remove a paragraph based on certain conditions. But the word file produced always ends up being corrupted when we try to open it with the error:
Word found unreadable content
The below code corrupts the file but if we remove the line:
Document document = mdp.Document;
Then the file is saved and opens without issue. Is there an obvious issue that I am missing?
var readAllBytes = File.ReadAllBytes(@"C:\Original.docx");
using (var stream = new MemoryStream(readAllBytes))
{
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
}
}
File.WriteAllBytes(@"C:\New.docx", readAllBytes);
UPDATE:
using (WordprocessingDocument wpd = WordprocessingDocument.Open(@"C:\Original.docx", true))
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
document.Save();
}
Running the code above on a physical file we can still open Original.docx without the error so it seems limited to modifying a stream.
Here's a method that reads a document into a MemoryStream:
public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
byte[] buffer = File.ReadAllBytes(path);
var destStream = new MemoryStream(buffer.Length);
destStream.Write(buffer, 0, buffer.Length);
destStream.Seek(0, SeekOrigin.Begin);
return destStream;
}
Note how the MemoryStream is instantiated. I am passing the capacity rather than the buffer (as in your own code). Why is that?
When using MemoryStream() or MemoryStream(int), you are creating a resizable MemoryStream instance, which you will want in case you make changes to your document. When using MemoryStream(byte[]) (as in your code), the MemoryStream instance is not resizable, which is a problem unless you make no changes to your document or your changes only ever make it shrink in size.
Now, to read a Word document into a MemoryStream, manipulate that Word document in memory, and end up with a consistent MemoryStream, you will have to do the following:
// Get a MemoryStream.
// In this example, the MemoryStream is created by reading a file stored
// in the file system. Depending on the Stream you "receive", it makes
// sense to copy the Stream to a MemoryStream before processing.
MemoryStream stream = ReadAllBytesToMemoryStream(@"C:\Original.docx");
// Open the Word document on the MemoryStream.
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
MainDocumentPart mdp = wpd.MainDocumentPart;
Document document = mdp.Document;
// Manipulate document ...
}
// After having closed the WordprocessingDocument (by leaving the using statement),
// you can use the MemoryStream for whatever comes next, e.g., to write it to a
// file stored in the file system.
// ToArray() returns only the bytes written; GetBuffer() would also include unused capacity.
File.WriteAllBytes(@"C:\New.docx", stream.ToArray());
Note that you will have to reset the stream.Position property by calling stream.Seek(0, SeekOrigin.Begin) whenever your next action depends on that MemoryStream.Position property (e.g., CopyTo, CopyToAsync). Right after having left the using statement, the stream's position will be equal to its length.

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
Forms.IDataObject dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dataObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

merge word documents to a single document

I used the code in the link mentioned below to merge Word files into a single file:
http://devpinoy.org/blogs/keithrull/archive/2007/06/09/updated-how-to-merge-multiple-microsoft-word-documents.aspx
However, looking at the output file, I realized that it did not copy the header image in the first document. How do we merge documents while preserving format and content?
I suggest using GroupDocs.Merger Cloud to merge multiple Word documents into a single Word document; it keeps the formatting and contents of the source documents. It is a platform-independent REST API solution that does not depend on any third-party tool or software.
Sample C# code:
var configuration = new GroupDocs.Merger.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance_Document = new GroupDocs.Merger.Cloud.Sdk.Api.DocumentApi(configuration);
var apiInstance_File = new GroupDocs.Merger.Cloud.Sdk.Api.FileApi(configuration);
var pathToSourceFiles = @"C:/Temp/input/";
var remoteFolder = "Temp/";
var joinItem_list = new List<JoinItem>();
try
{
DirectoryInfo dir = new DirectoryInfo(pathToSourceFiles);
System.IO.FileInfo[] files = dir.GetFiles();
foreach (System.IO.FileInfo file in files)
{
var request_upload = new GroupDocs.Merger.Cloud.Sdk.Model.Requests.UploadFileRequest(remoteFolder + file.Name, File.Open(file.FullName, FileMode.Open));
var response_upload = apiInstance_File.UploadFile(request_upload);
var item = new JoinItem
{
FileInfo = new GroupDocs.Merger.Cloud.Sdk.Model.FileInfo
{ FilePath = remoteFolder + file.Name }
};
joinItem_list.Add(item);
}
var options = new JoinOptions
{
JoinItems = joinItem_list,
OutputPath = remoteFolder + "Merged_Document.docx"
};
var request = new JoinRequest(options);
var response = apiInstance_Document.Join(request);
Console.WriteLine("Output file path: " + response.Path);
}
catch (Exception e)
{
Console.WriteLine("Exception while Merging Documents: " + e.Message);
}
That code is inserting a page break after each file.
Since sections control headers, if a second or subsequent document has a header, you'll probably be wanting to keep the original section properties, and insert those after your first document.
If you look at your original document as a docx, you'll probably see that your section is a document level section properties element.
The easiest way around your problem may be to create a second section properties element inside the last paragraph (which contains the header information). Then this should just stay there when the documents are merged (ie other paragraphs added after it).
That's the theory. See also http://www.pcreview.co.uk/forums/thread-898133.php
But I haven't tried it; it assumes InsertFile behaves as I expect it should.

how to maintain the spaces between the characters?

I am using the following code:
String keyword=request.getParameter("keyword");
keyword = keyword.toLowerCase();
keyword.replaceAll("  "," "); //first double space and then single space
keyword = keyword.trim();
System.out.println(keyword);
I am giving the input as "t  s" (two spaces between the characters),
but I am getting:
[3/12/10 12:07:10:431 IST] 0000002c SystemOut O t  s // here i am getting the two spaces
How can I reduce the two spaces to a single space?
Use the following program:
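import java.io.BufferedReader;
import java.io.InputStreamReader;

public class whitespaces {
public static void main(String []args){
try{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String str = br.readLine();
// Backslashes must be doubled in a Java string literal so the regex engine sees \b and \s.
System.out.println( str.replaceAll("\\b\\s{2,}\\b", " "));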
}catch(Exception e){
e.printStackTrace();
}
}
}
thanks,
murali
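A note on why the snippet in the question still prints two spaces: Java strings are immutable, so replaceAll returns a new string and leaves keyword untouched unless the result is assigned back. Below is a minimal sketch of the same steps with the result kept. The class and method names (CollapseSpaces, normalize) are made up for illustration, and it assumes that collapsing any run of two or more whitespace characters is what is wanted.

```java
public class CollapseSpaces {

    // Lower-cases the keyword, collapses every run of two or more
    // whitespace characters into a single space, and trims the ends.
    static String normalize(String keyword) {
        // replaceAll does not modify the original String; it returns a
        // new one, so the result has to be returned or assigned.
        return keyword.toLowerCase().replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        // Brackets make leading/trailing whitespace visible in the output.
        System.out.println("[" + normalize("T  s") + "]");
    }
}
```

In the original code, the equivalent change would be to write keyword = keyword.replaceAll(...) instead of calling replaceAll on its own line.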
If your database always has only one space, you could use some keypress event to automatically ignore any occurrences of multiple spaces (by replacing double spaces with a single space in the search string or something).
Stack Overflow has solved the same (or at least a similar) problem regarding spaces in tags, by not having them. Instead, if you want to denote a space in a tag on SO, use - (dash). You could run a query to replace all spaces with - in your database (even though it would probably take quite some time to run, you'll only have to do it once). If you want to display them as spaces on the page, just do a replace when you render.
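The dash idea can be sketched roughly like this in Java. The names TagSpaces, toStored and toDisplay are hypothetical. The sketch assumes tags never contain a literal dash of their own, since otherwise the conversion back to spaces would not round-trip.

```java
public class TagSpaces {

    // Form used for storage and lookup: trimmed, with every run of
    // whitespace replaced by a single dash.
    static String toStored(String tag) {
        return tag.trim().replaceAll("\\s+", "-");
    }

    // Form used when rendering: dashes shown as spaces again.
    // Assumes the stored tag had no dashes other than the inserted ones.
    static String toDisplay(String stored) {
        return stored.replace('-', ' ');
    }

    public static void main(String[] args) {
        String stored = toStored(" java   regex ");
        System.out.println(stored);
        System.out.println(toDisplay(stored));
    }
}
```

Because runs of whitespace are collapsed before storing, a search string with one space and one with several spaces both map to the same stored value.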