I'm supposed to be testing different methods of storing PDF files in a Postgres database using JDBC. Currently I'm trying BYTEA. Storing files works without problems, but retrieval is extremely slow.
I am working with a couple of files of around 3 MB each. Storing them takes around 3 seconds in total, so that's alright. But when I try to retrieve them, it takes around 2 minutes between the output of how many files are in the DB and the program actually starting to create the files. Once it starts, it only takes around 5 seconds to finish. Why is Postgres taking so long for the "SELECT file..." query? The query takes equally long when I run it in pgAdmin, and leaving the file size out of the query doesn't change anything.
As far as I understand, the DB uses TOAST to split my files up, and when I want to retrieve them it has to piece them back together first. But since splitting them (when uploading) only takes a couple of seconds, putting them back together shouldn't take that long, right?
Here are some code snippets:
public void saveToDB(File[] files) throws SQLException, IOException {
    try (PreparedStatement s = con.prepareStatement("INSERT INTO fileTable (filename, file) VALUES (?, ?)")) {
        for (File f : files) {
            System.out.println(f.getName() + " (" + f.length() / 1024 + "KB)");
            s.setString(1, f.getName());
            try (FileInputStream in = new FileInputStream(f)) {
                s.setBinaryStream(2, in, f.length());
                s.executeUpdate();
            }
        }
        con.commit();
    }
}
public void getFromDB(File dir) throws SQLException, IOException {
    dir.mkdirs();
    try (Statement s = con.createStatement(); ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM fileTable")) {
        rs.next();
        System.out.println("Files: " + rs.getInt(1));
    }
    try (Statement s = con.createStatement(); ResultSet rs = s.executeQuery("SELECT length(file), filename, file FROM fileTable")) {
        while (rs.next()) {
            String filename = rs.getString(2);
            System.out.println(filename + " (" + rs.getInt(1) / 1024 + "KB)");
            File f = new File(dir, filename);
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(rs.getBytes(3));
                out.flush();
            }
        }
    }
}
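For reference, here is a minimal sketch of a cursor-based variant of the retrieval. As far as I know, the PostgreSQL JDBC driver buffers the entire result set before the first rs.next() returns unless auto-commit is off and a fetch size is set, which could explain a long pause before the first file appears. The sketch reuses con, dir, and fileTable from above; the fetch size of 10 is an arbitrary choice:
con.setAutoCommit(false);               // required for the driver to use a cursor
try (Statement s = con.createStatement()) {
    s.setFetchSize(10);                 // fetch rows in batches instead of all at once
    try (ResultSet rs = s.executeQuery("SELECT filename, file FROM fileTable")) {
        while (rs.next()) {
            File f = new File(dir, rs.getString(1));
            try (InputStream in = rs.getBinaryStream(2);
                 FileOutputStream out = new FileOutputStream(f)) {
                in.transferTo(out);     // Java 9+; copies the column value straight to the file
            }
        }
    }
}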
I have about 10 tables that I load into a DataSet using a single DataAdapter, in sequence. During the load I use only one DataAdapter, replacing the table name and SELECT statement as required and successively filling the tables in the DataSet. Everything is done inside two nested "using" statements that dispose of the connection and DataAdapter objects, as shown below.
using (OleDbConnection conn = new OleDbConnection (Db.DbConnGet ())) {
    using (var da = new OleDbDataAdapter (sql, conn)) {
        tablename = "Table1";
        da.SelectCommand.CommandText = $"Select * from {tablename}";
        try {
            da.Fill (hsdset, tablename);
        } catch (Exception ex) {
            ...
        }

        tablename = "Table2";
        da.SelectCommand.CommandText = $"Select * from {tablename}";
        try {
            da.Fill (hsdset, tablename);
        } catch (Exception ex) {
            ...
        }
    }
}
As you can see, the DataAdapter is disposed of once the loading is done, and I pass the DataSet around my application as necessary for reading data.
But now I need to update or extend the data in the DataSet and get it back into the database. Updating the DataTables inside the DataSet is not a problem; there are many examples on the net. I then create a new Connection and DataAdapter to do the update with a table in the existing, modified, strongly-typed DataSet, as follows.
using (OleDbConnection conn = new OleDbConnection (Db.DbConnGet ())) {
    using (var da = new OleDbDataAdapter ("", conn)) {
        // this is required; I don't know if it is used by Update
        da.SelectCommand.CommandText = $"Select * from " + tablename;
        try {
            // build special update commands from the table->db differences
            var cbuilder = new OleDbCommandBuilder (da);
            da.Update (dset, "Layers");
        } catch (Exception ex) {
            ...
        }
    }
}
My first question is: does the Update operation actually use the original SELECT statement to retrieve info from the database? If not, why is it required? I thought the DataSet kept track of modified rows, new rows, deleted rows, and so on, so updating could be done without reading the whole data table again. Or does it read only the records that are marked as modified in the DataTable?
My second question is: what is the best (or normal) way of working with DataSets and DataAdapters in this situation? Is it best practice to keep the original DataAdapters around for later use, or is it fine to create new ones as I did above? (Does the original DataAdapter keep any state from the load that a newly created DataAdapter would not have?) Thank you.
I was trying to remove some content from a PDF using PDFSweep. Below is part of my code; I am using the CompositeCleanupStrategy and adding a RegexBasedCleanupStrategy to the strategy for each keyword:
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
for (int i = 0; i < keywordlist.size(); i++) {
    String kvalue = keywordlist.get(i);
    Loger.getLogger().info("keyword " + i + "=" + kvalue);
    strategy.add(new RegexBasedCleanupStrategy(kvalue).setRedactionColor(ColorConstants.GRAY));
}

try {
    PdfWriter writer = new PdfWriter(dest);
    writer.setCompressionLevel(0);
    PdfDocument pdf = new PdfDocument(new PdfReader(src), writer);
    // sweep
    PdfAutoSweep pdfAutoSweep = new PdfAutoSweep(strategy);
    pdfAutoSweep.cleanUp(pdf);
    // close the document
    pdf.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
When the strategy is small, say only one or two keywords, the cleanup works fine. However, with 243 entries in the keywordlist and a PDF of about 70 MB, I get the following error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.toLowerCase(String.java:2590)
at java.lang.String.toLowerCase(String.java:2670)
at com.itextpdf.io.font.PdfEncodings.convertToString(PdfEncodings.java:287)
at com.itextpdf.kernel.pdf.PdfString.toUnicodeString(PdfString.java:163)
at com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo.getUnscaledBaselineWithOffset(TextRenderInfo.java:425)
at com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo.getBaseline(TextRenderInfo.java:213)
at com.itextpdf.kernel.pdf.canvas.parser.listener.CharacterRenderInfo.<init>(CharacterRenderInfo.java:112)
at com.itextpdf.kernel.pdf.canvas.parser.listener.RegexBasedLocationExtractionStrategy.toCRI(RegexBasedLocationExtractionStrategy.java:156)
at com.itextpdf.kernel.pdf.canvas.parser.listener.RegexBasedLocationExtractionStrategy.eventOccurred(RegexBasedLocationExtractionStrategy.java:135)
at com.itextpdf.pdfcleanup.autosweep.CompositeCleanupStrategy.eventOccurred(CompositeCleanupStrategy.java:115)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.eventOccurred(PdfCanvasProcessor.java:534)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayPdfString(PdfCanvasProcessor.java:549)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$4700(PdfCanvasProcessor.java:108)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$ShowTextArrayOperator.invoke(PdfCanvasProcessor.java:617)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:452)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:281)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:302)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
at com.itextpdf.pdfcleanup.autosweep.PdfAutoSweep.getPdfCleanUpLocations(PdfAutoSweep.java:130)
at com.itextpdf.pdfcleanup.autosweep.PdfAutoSweep.cleanUp(PdfAutoSweep.java:186)
(Full disclosure: original author of RegexBasedCleanupStrategy here)
RegexBasedCleanupStrategy is not meant to be used like this.
You are creating roughly 200 instances of this class, each of which will go over the document, chunk by chunk, to see whether it can match the PDF against its regular expression.
To do this, each instance stores all the chunks in the document, sorts them, and then loops over them.
So you are duplicating the document some 200 times in memory.
That is your bottleneck.
My suggestion: build a better regular expression.
You can obviously match keywords a, b, c, etc. with the regex
(a)|(b)|(c)
This copies the document in memory only once and then attempts to match the aggregate regex against it.
It has both performance and memory-footprint benefits.
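For example, a minimal sketch along those lines, reusing the keywordlist from the question; Pattern.quote (from java.util.regex) is used on the assumption that the keywords are literal text rather than regex fragments:
// Build one alternation of all keywords so the page content is parsed only once
StringBuilder combined = new StringBuilder();
for (int i = 0; i < keywordlist.size(); i++) {
    if (i > 0) {
        combined.append("|");
    }
    combined.append("(").append(Pattern.quote(keywordlist.get(i))).append(")");
}

CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy(combined.toString())
        .setRedactionColor(ColorConstants.GRAY));
A single RegexBasedCleanupStrategy then scans the document once instead of once per keyword.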
I'm trying to create a report for my scenario. I want to execute some validations, append each result to a string, and then write that string to a TXT file (for each validation I add the result and run the next one, until the last item), something like this:
it ("Perform the loop to search for different strings", function()
{
browser.waitForAngularEnabled(false);
browser.get("http://WebSite.es");
//strings[] contains 57 strings inside the json file
for (var i = 0; i == jsonfile.strings.length ; ++i)
{
var valuetoInput = json.Strings[i];
var writeInFile;
browser.wait;
httpGet("http://website.es/search/offers/list/"+valuetoInput+"?page=1&pages=3&limit=20").then(function(result) {
writeInFile = writeInFile + "Validation for String: "+ json.Strings[i] + " Results is: " + expect(result.statusCode).toBe(200) + "\n";
});
if (i == jsonfile.strings.length)
{
console.log("Executions finished");
var fs = require('fs');
var outputFilename = "Output.txt";
fs.writeFile(outputFilename, "Validation of Get requests with each string:\n " + writeInFile, function(err) {
if(err)
{
console.log(err);
}
else {
console.log("File saved to " + outputFilename);
}
});
}
};
});
But when I check my file, I only get the first row written the way I want and nothing else. Could you please let me know what I am doing wrong?
*The validation works properly on screen for each of the strings in my file used as a data base
**I'm a newbie with Protractor
Thank you a lot!!
writeFile documentation
Asynchronously writes data to a file, replacing the file if it already exists
You are overwriting the file every time, which is why it only has 1 line.
The easiest way would probably (in my opinion) be appendFile. It writes to a file without overwriting existing data and will also create the file if it doesn't exist in the first place.
You could also re-read that log file, store its contents in a variable, and re-write the file with the old AND new data included. Or you could create a writeStream, etc.
There are quite a few ways to go about it, and plenty of other answers on SO specifically about those functions can provide more info.
Node.js Write a line into a .txt file
Node.js read and write file lines
Final note: if you are using Jasmine, you can also create a custom Jasmine reporter. Reporters have methods that expose exactly what you want (pass/fail status, actual vs. expected values, etc.), and they are fairly easy to set up with Protractor.
Hello, I am looking for the fastest, but still fairly high-level, way to work with a large data collection.
My task consists of two parts: read a lot of large files into memory, and then do some statistical calculations (the easiest way to work with the data in this task is a random-access array).
My first approach was to use java.io.ByteArrayOutputStream, because it can resize its internal storage.
def packTo(buf: java.io.ByteArrayOutputStream, f: File) = {
  try {
    val fs = new java.io.FileInputStream(f)
    IOUtils.copy(fs, buf)
  } catch {
    case e: java.io.FileNotFoundException =>
  }
}

val buf = new java.io.ByteArrayOutputStream()
files foreach { f: File => packTo(buf, f) }
println(buf.size())

for (i <- 0 to buf.size()) {
  for (j <- 0 to buf.size()) {
    for (k <- 0 to buf.size()) {
      // println("i " + i + " " + buf[i]);
      // Calculate something amazing using buf[i] buf[j] buf[k]
    }
  }
}
println("amazing = " + ???)
But ByteArrayOutputStream can't give me its contents as a byte[], only a copy of them, and I cannot afford to hold two copies of the data in memory.
Have you tried scala-io? It should be as simple as Resource.fromFile(f).byteArray with it.
Scala's built-in library already provides a nice API to do this:
io.Source.fromFile("/file/path").mkString.getBytes
However, it's often not a good idea to load a whole file into memory as a byte array. Make sure the largest possible file can still fit into your JVM memory.
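If the extra copy made by toByteArray is the main concern, one alternative sketch is to pre-size a single array from the file lengths and read each file into it directly. This is plain Java, but the same JDK calls can be used from Scala, and it assumes the combined data fits in one byte[] (under 2 GB):
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PackFiles {
    // Concatenate all files into one pre-sized array, avoiding the extra copy
    // that ByteArrayOutputStream.toByteArray() makes.
    static byte[] pack(File[] files) throws IOException {
        long total = 0;
        for (File f : files) {
            total += f.length();
        }
        if (total > Integer.MAX_VALUE) {
            throw new IOException("data does not fit into a single byte[]");
        }
        byte[] buf = new byte[(int) total];
        int offset = 0;
        for (File f : files) {
            try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
                in.readFully(buf, offset, (int) f.length());
                offset += (int) f.length();
            }
        }
        return buf;
    }
}
If one array per file is enough, java.nio.file.Files.readAllBytes (Java 7+) is the simpler per-file option along the lines of the answers above.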
I began upgrading our layers to Roll Up 7 while we were still developing in another environment with TFS turned on. We were at, say, version 1850, and by the time I finished we were at 1900. So the goal is to merge those 50 check-ins into the completed RU7 environment. Each check-in can contain many different objects, and each object is stored in TFS as an XPO somewhere.
My code is 90% of the way there, but the issue arises when copying the files out of the temp directory. When I look in the temp directory, the files aren't there, yet somehow they can still be accessed.
static void Job33(Args _args)
{
    #File
    SysVersionControlSystem     sysVersionControlSystem = versioncontrol.parmSysVersionControlSystem();
    SysVersionControlTmpItem    contents;
    SysVersionControlTmpChange  change;
    SysVersionControlTmpChange  changes;
    int                         i;
    SysVersionControlTmpItem    contentsAddition;
    SysVersionControlTmpItem    contentsItem;
    str                         writePath;
    Set                         permissionSet = new Set(Types::Class);
    str                         fileName;
    int                         n;
    ;

    change = versioncontrol.getChangesHistory();

    // BP deviation documented
    changes.setTmp();
    changes.checkRecord(false);
    changes.setTmpData(change);

    while select changes
        order by changes.ChangeNumber asc
        where changes.ChangeNumber > 1850
    {
        writePath = #'C:\TEMP\' + int2str(changes.ChangeNumber) + #'\';
        contentsAddition = versioncontrol.getChangeNumberContents(changes.ChangeNumber);
        n = 0;

        while select contentsAddition
        {
            // HOW DOES THIS LINE ACCESS THE FILE BUT MY METHOD CAN NOT??
            contentsAddition.viewFile();
            //?????????????

            // Write to appropriate directory
            if (!WinAPI::pathExists(writePath))
                WinAPI::createDirectory(writePath);

            n++;
            fileName = int2str(changes.ChangeNumber) + '_' + int2str(n) + '.xpo';

            if (WinAPI::fileExists(contentsAddition.fileName(), false))
            {
                // Write to appropriate directory
                if (!WinAPI::pathExists(writePath))
                    WinAPI::createDirectory(writePath);

                WinAPI::copyFile(contentsAddition.fileName(), writePath + fileName, true);
                info(strfmt("|%1|%2|", contentsAddition.fileName(), writePath + fileName));
            }
        }
        info(strfmt("%1", changes.ChangeNumber));
    }
}
Buried in Classes\SysVersionControlFilebasedBackEndTfs there is a .NET assembly that is used. I was able to use it to extract what I needed, mixed in with the code above. After I used this, my code from above started working, strangely enough.
Somehow there was a file lock on the folder that I copied TO that wouldn't let me delete it until I closed AX. No big deal, but it suggests there is a tfsProxy.close() method or something similar I should have called.
Microsoft.Dynamics.Morphx.TeamFoundationServer.Proxy tfsProxy = new Microsoft.Dynamics.Morphx.TeamFoundationServer.Proxy();
;
tfsProxy.DownloadFile(contentsAddition.InternalFilename, changes.ChangeNumber, writePath + fileName);
So you are trying to just get the objects that were changed so you can import them into the new RU7 environment? Why not do this within Visual Studio directly? You can pull the XPOs from there based on the history of changesets since you started the RU7 upgrade.
Also, you should use branching for this. It would have been easy to just branch the new code in that way. Something you should look into for the future.