iTextSharp: problems reading PDFs with one column (page 1) and two columns (page 2)

My code below gets confused when it reads a PDF file that has only one column on the first page and more than one column on the other pages.
Can someone tell me what I'm doing wrong?
Here is my code:
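using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader pdfreader = new PdfReader(pathNmArq);
int lastPage = pdfreader.NumberOfPages;
for (int page = 1; page <= lastPage; page++)
{
    // Extraction strategies accumulate text, so create a fresh one for each page
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    string extractText = PdfTextExtractor.GetTextFromPage(pdfreader, page, strategy);
    extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
    // ...
}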

You use the SimpleTextExtractionStrategy. This strategy assumes that the text drawing instructions in the PDF are sorted in reading order. In your PDF that does not seem to be the case.
If you cannot count on the PDF drawing its text in reading order and you only want to use the text extraction strategies that ship with iText, you have to know the areas that make up each column. If a page contains multiple columns, use a RegionTextRenderFilter to restrict extraction to one column at a time and combine it with a LocationTextExtractionStrategy (in iText 5 you wrap the strategy and the filter in a FilteredTextRenderListener).
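To make the idea concrete, here is a library-independent Java sketch of the two steps: keep only the text pieces whose start point lies inside one column's rectangle (the job RegionTextRenderFilter does), then order the survivors top-to-bottom and left-to-right (roughly what LocationTextExtractionStrategy does). `TextChunk` and `ColumnExtractor` are hypothetical classes, not iText's API. The column rectangles are assumed to be known in advance, and only chunks with exactly the same baseline are treated as one line, whereas a real extractor uses a tolerance. PDF coordinates grow upward, so a larger y means higher on the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a positioned piece of text on a page (not an iText class)
final class TextChunk {
    final String text;
    final float x; // left edge, in PDF units
    final float y; // baseline, in PDF units (larger = higher on the page)

    TextChunk(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ColumnExtractor {
    // Keeps chunks whose start point lies inside [left, right] x [bottom, top],
    // then joins them in reading order: top-to-bottom, then left-to-right.
    static String extractColumn(List<TextChunk> chunks,
                                float left, float bottom, float right, float top) {
        List<TextChunk> inColumn = new ArrayList<>();
        for (TextChunk c : chunks) {
            if (c.x >= left && c.x <= right && c.y >= bottom && c.y <= top) {
                inColumn.add(c);
            }
        }
        // Higher baselines first; within the same baseline, smaller x first
        inColumn.sort(Comparator.comparingDouble((TextChunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        Float currentY = null;
        for (TextChunk c : inColumn) {
            if (currentY != null) {
                // Same baseline: same line, so separate with a space; otherwise start a new line
                sb.append(c.y == currentY ? " " : "\n");
            }
            sb.append(c.text);
            currentY = c.y;
        }
        return sb.toString();
    }
}
```

Calling `extractColumn` once per column rectangle, left column first, yields the text in reading order even when the page content interleaves the two columns.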
PS: What exactly is your intention in that
extractText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
line?
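To see why that line is questionable, here is a Java translation of what it does: encode the string in the legacy code page, transcode those bytes to UTF-8, and decode them again. ISO-8859-1 stands in for `Encoding.Default` here (an assumption; on a Western Windows machine it would typically be Windows-1252). Because each step decodes exactly what the previous step encoded, characters the legacy code page can represent come back unchanged, and all other characters are lost: in this Java version they become `?`, while .NET may substitute a similar-looking "best fit" character depending on the encoder's fallback settings. Either way, the line cannot fix anything; it can only lose information.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Java equivalent of
    // Encoding.UTF8.GetString(Encoding.Convert(Default, UTF8, Default.GetBytes(text)))
    // with ISO-8859-1 assumed as the "Default" code page.
    static String roundTrip(String text) {
        Charset legacy = StandardCharsets.ISO_8859_1;
        // 1. Default.GetBytes(text): unmappable characters become '?'
        byte[] legacyBytes = text.getBytes(legacy);
        // 2. Encoding.Convert(Default, UTF8, ...): decode as legacy, re-encode as UTF-8
        byte[] utf8Bytes = new String(legacyBytes, legacy).getBytes(StandardCharsets.UTF_8);
        // 3. UTF8.GetString(...): decode the UTF-8 bytes back into a string
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }
}
```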

Related

Exporting output results from a model into the input of a different model

I'm trying to build a model of a factory using the Personal Learning Edition of AnyLogic. Since this edition limits the number of blocks per model, building the whole factory in a single model is proving impossible. To get around this, I want to split the factory's main processes into separate models, which means I'll have to feed the output of process A into the input of process B.
My question is: how can I export a time stamped output of a model into the input of a different model?
Thank you in advance.
You have two options.
Option 1: Through an Excel file (or a text file)
Link an Excel file to your model using the Excel File object from the Connectivity palette.
Then you can read the data with code similar to the following:
// Assumes the Excel File object is named excelFile and row 1 holds column headers
int excelRow = 2;
String sheetName = "Sheet1!";
// Build the cell name from the current row on every check; a name computed once before the loop never changes
while (excelFile.cellExists(sheetName + "A" + excelRow)) {
    int a = (int) excelFile.getCellNumericValue(sheetName + "A" + excelRow);
    int b = (int) excelFile.getCellNumericValue(sheetName + "B" + excelRow);
    int c = (int) excelFile.getCellNumericValue(sheetName + "C" + excelRow);
    boolean d = excelFile.getCellBooleanValue(sheetName + "D" + excelRow);
    // ... use a, b, c and d here
    excelRow++; // Increase the row that we will look up in the Excel file
}
It is just a while loop that moves from one Excel row to the next as long as the row exists; inside the loop, do whatever is needed with the data.
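The file route also works in the other direction: model A can write a time-stamped record for every item it outputs, and model B can read those records as its input. The sketch below uses a plain CSV file and only standard Java so it runs anywhere; the file layout (`time,itemId`) and the class name are assumptions, not AnyLogic features. Inside AnyLogic you would call `write` when model A finishes and `read` when model B starts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-off format between two models: one "time,itemId" line per item
final class TimestampedHandoff {
    // Model A side: write each (time, itemId) pair as a CSV line after a header row
    static void write(Path file, double[] times, int[] itemIds) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("time,itemId");
        for (int i = 0; i < times.length; i++) {
            lines.add(times[i] + "," + itemIds[i]);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Model B side: read the records back, skipping the header, as {time, itemId} pairs
    static List<double[]> read(Path file) throws IOException {
        List<double[]> rows = new ArrayList<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String line : lines.subList(1, lines.size())) {
            String[] parts = line.split(",");
            rows.add(new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])});
        }
        return rows;
    }
}
```

In model B, each time value could then be used to schedule when the corresponding item enters the process, for example from an event; how that is wired up depends on your model.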
Option 2: AnyLogic Internal DB
Import your Excel sheet into the AnyLogic database and then loop over the entries in the table with a for loop:
List<Tuple> rows = selectFrom(db_table).list();
for (Tuple row : rows) {
traceln(
row.get( db_table.db_column )
);
}

Read data from a website with Matlab

Can someone show me how to read the data from this website: http://www.amlbook.com/data/zip/features.train
I used to copy and paste the data into an array in the MATLAB editor, but this time the amount of data is huge...
% urlread returns the file contents as a char array (newer releases recommend webread)
block = urlread('http://www.amlbook.com/data/zip/features.train');
readData = textscan(block,'%f%f%f','delimiter', char(9));
train1 = readData{1};
train2 = readData{2};
train3 = readData{3};
clear readData
This imports three 7291-by-1 double arrays, one for each of the three columns in the file.
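For comparison, the parsing step looks like this in Java: split the downloaded text into lines, split each line on whitespace, and collect every field into its column's array, which is what `textscan` with three `%f` conversions returns. The download itself is omitted, and the class name and the assumption that every non-empty line holds exactly the requested number of values are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnParser {
    // Splits whitespace-separated text into columns, similar to textscan(block, '%f%f%f').
    // Assumes every non-empty line holds at least numColumns numbers.
    static double[][] parseColumns(String block, int numColumns) {
        List<double[]> rows = new ArrayList<>();
        for (String line : block.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines, e.g. after a trailing newline
            }
            String[] fields = trimmed.split("\\s+");
            double[] row = new double[numColumns];
            for (int c = 0; c < numColumns; c++) {
                row[c] = Double.parseDouble(fields[c]);
            }
            rows.add(row);
        }
        // Transpose rows into one array per column (train1, train2, train3 in MATLAB)
        double[][] columns = new double[numColumns][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < numColumns; c++) {
                columns[c][r] = rows.get(r)[c];
            }
        }
        return columns;
    }
}
```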

Crystal Report 2008 - Memory Full

I am developing a C#/ASP.NET web project in VS 2010 that uses Crystal Reports 2008 (version 12.3.0.601). The project calls the report and exports it as PDF. Whenever I change something in the report design, a "Memory Full" error shows up when the page is refreshed. Sometimes the error does not appear, but at other times I spend hours trying to get rid of it.
I have searched many sites related to the title but had no luck with a solution.
Has anyone ever encountered such error before?
System.Runtime.InteropServices.COMException (0x80041004): Memory full. Failed to export the report. Not enough memory for operation.
Thanks for your help.
For exporting to PDF, I recommend installing a PDF printer driver, e.g. http://www.cutepdf.com/products/cutepdf/writer.asp
Then you can "print" the report to PDF without problems.
Hope it helps!
I had a similar problem. My report contains some text and images. I have read that Crystal Reports first converts JPG, PNG, etc. images to BMPs before it shows the report, and that converting other image types to BMP consumes a lot of memory. First I tried converting the JPG images in my database to BMP images, but then my database became bigger and bigger. Finally I found the solution (thanks to this answer).
Instead of exporting all pages to a single PDF file, I split the export into several PDF files of 100 pages each, zipped them, and let the user download the zip file:
Dim exportOpts As ExportOptions = New ExportOptions()
Dim pdfRtfWordOpts As PdfRtfWordFormatOptions = ExportOptions.CreatePdfRtfWordFormatOptions()
Dim destinationOpts As DiskFileDestinationOptions = ExportOptions.CreateDiskFileDestinationOptions()
Dim intPageCount As Integer = crReportDocument.FormatEngine.GetLastPageNumber(New CrystalDecisions.Shared.ReportPageRequestContext)
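Dim pagecount As Integer
' Number of 100-page chunks, rounded up so no empty chunk is created when the page count is a multiple of 100
pagecount = (intPageCount + 99) \ 100
Dim sonsayfa As Integer ' last page of the current chunk
Dim ilksayfa As Integer ' first page of the current chunk
Dim Anadosyaadi As String ' base name of the output folder and zip file
Dim foldername As String ' physical path of the output folder
Dim foldernameMap As String ' URL of the output folder, relative to the current page
' HH (24-hour clock) avoids name clashes between morning and afternoon exports
Anadosyaadi = Now.ToString("yyyy-MM-dd-HH-mm-ss")
foldername = "C:\inetpub\wwwroot\" + Anadosyaadi
foldernameMap = "./" + Anadosyaadi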
If Not Directory.Exists(foldername) Then
Directory.CreateDirectory(foldername)
End If
For li_count As Integer = 1 To pagecount
ilksayfa = (li_count - 1) * 100 + 1
pdfRtfWordOpts.FirstPageNumber = ilksayfa
sonsayfa = li_count * 100
If sonsayfa > intPageCount Then sonsayfa = intPageCount
pdfRtfWordOpts.LastPageNumber = sonsayfa
pdfRtfWordOpts.UsePageRange = True
exportOpts.ExportFormatOptions = pdfRtfWordOpts
exportOpts.ExportFormatType = ExportFormatType.PortableDocFormat
destinationOpts.DiskFileName = foldername + "\" + li_count.ToString + ".pdf"
exportOpts.ExportDestinationOptions = destinationOpts
exportOpts.ExportDestinationType = ExportDestinationType.DiskFile
crReportDocument.Export(exportOpts)
Next
Using zip As ZipFile = New ZipFile
zip.AddDirectory(foldername)
zip.Save(foldername + "\" + Anadosyaadi + ".zip")
End Using
Response.Redirect(foldernameMap + "/" + Anadosyaadi + ".zip")
PS1: I used Ionic.Zip to zip the files. You should add
Imports Ionic.Zip
at the top of your source file.
PS2: Right after restarting the server, without doing anything else, I can export 500 pages of PDF (pages 1 to 500). I think Crystal Reports keeps holding some memory at that point. If I then try to export the next 500 pages (pages 501 to 1000), I get the "Memory full" error, but I can export 300 pages (501 to 800). Then the error appears again, and I can export pages 801 to 900, and so on. That's why I chose to split the export into chunks of 100 pages; you may be able to use a different chunk size.
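The page-range arithmetic behind that loop can be checked in isolation. This Java sketch splits pages 1 to pageCount into chunks of a given size (100 in the code above). Computing the chunk count with ceiling division matters, because a naive `Int(pageCount / 100) + 1` yields three chunks for a 200-page report, the last of which would start at page 201. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PageChunks {
    // Returns {firstPage, lastPage} pairs covering pages 1..pageCount in chunks of chunkSize.
    // Ceiling division avoids an empty trailing chunk when pageCount is a multiple of chunkSize.
    static List<int[]> chunks(int pageCount, int chunkSize) {
        List<int[]> result = new ArrayList<>();
        int chunkCount = (pageCount + chunkSize - 1) / chunkSize;
        for (int i = 1; i <= chunkCount; i++) {
            int first = (i - 1) * chunkSize + 1;
            // The last chunk may be shorter than chunkSize
            int last = Math.min(i * chunkSize, pageCount);
            result.add(new int[] {first, last});
        }
        return result;
    }
}
```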

My report is empty

I'm using VB.NET 2005. I have created an .xsd dataset with one table and an .rpt report.
The dataset is PedidosDataImpresion.xsd with the table DatosPedido.
The .rpt file is pedidoimpreso.rpt.
This is my code:
Dim DatosImpresion As New PedidosDataImpresion
Dim TablaPrincipal As DataTable = DatosImpresion.DatosPedido
I = 0
TablaPrincipal.Rows.Add(1)
TablaPrincipal.Rows(I).Item("razonsocial") = Me.lblfacturacion.Text
TablaPrincipal.Rows(I).Item("calle") = Me.LblDireccion.Text
TablaPrincipal.Rows(I).Item("colonia") = Me.LblColonia.Text
TablaPrincipal.Rows(I).Item("ciudad") = Me.LblCiudad.Text
TablaPrincipal.Rows(I).Item("estado") = Me.LblEdo.Text
TablaPrincipal.Rows(I).Item("cp") = Me.LblEdo.Text
TablaPrincipal.Rows(I).Item("rfc") = Me.LblRfc.Text
PedidoImpreso.SetDataSource(DatosImpresion)
PedidoImpreso.PrintOptions.PrinterName = Impresora
PedidoImpreso.PrintToPrinter(Copias, True, 1, 1)
I want to print directly to the printer without showing a ReportViewer first,
but the report comes out with no data. Can you help me?
You have to use a third-party tool to do it.
What you can do is:
Make an HTML table
Fill it with the data
Print the table on page load (using jQuery)
If you want the report format, you will need the report viewer to render it.
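The three steps could look roughly like this in Java: build an HTML table, fill it with the row data (escaping the cell text), and ask the browser to print when the page loads. The plain `window.print()` call in the `onload` attribute stands in for the jQuery-based printing mentioned above (an assumption), and the class and method names are hypothetical.

```java
import java.util.List;

final class HtmlReport {
    // Minimal HTML escaping for text placed inside table cells ('&' must be replaced first)
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds a page whose body prints itself once it has loaded
    static String build(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><body onload=\"window.print()\"><table border=\"1\"><tr>");
        for (String h : headers) {
            sb.append("<th>").append(escape(h)).append("</th>");
        }
        sb.append("</tr>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>");
        }
        sb.append("</table></body></html>");
        return sb.toString();
    }
}
```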

How to quickly add many columns to GWT DataGrid

I am currently trying to create a DataGrid that takes an entity with a list of values as a row. Each value in the list is in its own column in the DataGrid. The entities' lists of values may have different sizes, so the DataGrid will have a variable number of columns. I have noticed that when I create the DataGrid and add each of the columns in a loop, the time it takes to add the columns does not grow linearly.
Here is the code I used to test how fast the columns are added:
DataGrid<String> table = new DataGrid<String>();
table.setPageSize(25);
int NUM_COLUMNS = 40;
for (int i = 0; i < NUM_COLUMNS; i++) {
GWT.log("Adding column "+i);
TextColumn<String> nameColumn = new TextColumn<String>() {
public String getValue(String object) {
return object;
}
};
table.addColumn(nameColumn, "Column " + i);
table.setColumnWidth(nameColumn, 100, Unit.PX);
}
ArrayList<String> data = new ArrayList<String>();
for (int i = 0; i < 10; i++) {
data.add("row "+i);
}
table.setRowCount(data.size(), true);
table.setRowData(0, data);
table.setWidth("100");
This took about 48 seconds, give or take 1 second, every time I ran it. Fewer than 10 columns loaded fairly quickly, but as the number of columns grew, the load time seemed to grow exponentially.
Is there another way to add columns to the DataGrid that would be quicker? Thanks in advance.
One question you might want to ask yourself is whether there's a better way to do it. A table with 40 columns seems inefficient (IMO). In my experience, you're going to see a significant performance loss when loading more than ~15 columns in a DataGrid, and FlexTable isn't any better.
I've worked with DataGrid quite a bit and haven't seen any of the behavior you're talking about, though in my case they typically only have 10 or fewer columns with several thousand rows. (Data is of course paged and not being jammed in all at once.)
One thing I've noticed that does speed it up is pre-rendering. Are you adding the table to the DOM before adding all these columns, or are you adding the columns first? Lots of time can be spent waiting for the DOM to update. If you're already adding the table to the page after everything has been rendered, you're probably looking at the best speed you'll get, since there's no built-in function for adding multiple columns at once.
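A toy cost model illustrates why the order can matter so much. Assume, purely for illustration, that every column added to an attached grid re-renders all of its columns. Adding n columns one at a time then costs 1 + 2 + ... + n = n(n + 1)/2 column renders (820 for 40 columns, i.e. quadratic growth), while adding them to a detached grid and rendering once costs only n. `ToyGrid` is invented for this sketch and is not how GWT's DataGrid is actually implemented.

```java
final class ToyGrid {
    private int columns = 0;
    private boolean attached = false;
    private long columnRenders = 0; // total number of column renders performed

    // While attached, every new column triggers a re-render of all existing columns
    void addColumn() {
        columns++;
        if (attached) {
            render();
        }
    }

    // Attaching renders the current columns once
    void attach() {
        attached = true;
        render();
    }

    private void render() {
        columnRenders += columns;
    }

    long getColumnRenders() {
        return columnRenders;
    }
}
```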