Lucene.net read past EOF error during IndexWriter creation - lucene.net

I'm trying to implement Lucene.net in my C# application.
At this point I'm still at the very start: creating an index.
I use the following code:
var directory = new Lucene.Net.Store.SimpleFSDirectory(new System.IO.DirectoryInfo("d:\\tmp\\lucene-index\\"));
var analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
var writer = new Lucene.Net.Index.IndexWriter(directory, analyzer, true, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED);
I get an IOException on the writer initialization line.
The error message is "Read past EOF" and it occurs in the IndexInput class in the ReadInt() method.
The code does produce some files in the lucene-index directory (segments.gen and write.lock) but both are 0 bytes.
I tried to Google this problem but I can't find any good information about it.
Is there a Lucene.Net expert here who can help me?

Here's some code that I've used before. I think the problem that you are experiencing is with the SimpleFSDirectory.
var writer = new IndexWriter("SomePath", new StandardAnalyzer());
writer.SetMaxBufferedDocs(100);
writer.SetRAMBufferSizeMB(256);
// add your document here
writer.AddDocument( ... );
writer.Flush();
// the Optimize method is optional and is used by lucene to combine multiple index files
writer.Optimize();
writer.Close();
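As a side note, the question mentions a zero-byte segments.gen in the index directory, and a zero-byte segments file is exactly what makes ReadInt() hit end-of-file. One pragmatic workaround, assuming the index can simply be rebuilt from scratch, is to delete stale zero-byte index files before opening the writer with create=true. A minimal sketch in plain Java (the helper name is hypothetical):

```java
import java.io.File;
import java.io.IOException;

public class CleanStaleIndex {
    // A zero-byte segments.gen left behind by a previous failed run makes
    // Lucene's ReadInt() fail with "Read past EOF". If the index can be
    // rebuilt anyway, remove stale zero-byte files before creating the writer.
    static int removeZeroByteFiles(File indexDir) {
        int removed = 0;
        File[] files = indexDir.listFiles();
        if (files == null) return 0;  // not a directory, or an I/O error
        for (File f : files) {
            if (f.isFile() && f.length() == 0 && f.delete()) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "lucene-index-demo");
        dir.mkdirs();
        new File(dir, "segments.gen").createNewFile();  // simulate a truncated file
        System.out.println(removeZeroByteFiles(dir));   // 1
    }
}
```

This is only a cleanup step; the IndexWriter creation itself stays as in the answer above.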

Related

DxlImporter inside a loop throws error " DXL importer operation failed"

I have a Java agent which loops through a view and gets the attachment from each document. The attachment is nothing but a .dxl file containing the document's XML data. I extract the file to a temp directory and try to import the extracted .dxl as soon as it is extracted.
The problem is that it only imports the first document's attachment in the loop, and then throws the following error in the Java debug console:
NotesException: DXL importer operation failed
at lotus.domino.local.DxlImporter.importDxl(Unknown Source)
at JavaAgent.NotesMain(Unknown Source)
at lotus.domino.AgentBase.runNotes(Unknown Source)
at lotus.domino.NotesThread.run(Unknown Source)
My Java agent code is:
public class JavaAgent extends AgentBase {
    static DxlImporter importer = null;
    public void NotesMain() {
        try {
            Session session = getSession();
            AgentContext agentContext = session.getAgentContext();
            // (Your code goes here)
            // Get current database
            Database db = agentContext.getCurrentDatabase();
            View v = db.getView("DXLProcessing_mails");
            DocumentCollection dxl_tranfered_mail = v.getAllDocumentsByKey("dxl_tranfered_mail");
            Document dxlDoc = dxl_tranfered_mail.getFirstDocument();
            while (dxlDoc != null) {
                RichTextItem rt = (RichTextItem) dxlDoc.getFirstItem("body");
                Vector allObjects = rt.getEmbeddedObjects();
                System.out.println("File name is " + allObjects.get(0));
                EmbeddedObject eo = dxlDoc.getAttachment(allObjects.get(0).toString());
                if (eo.getFileSize() > 0) {
                    eo.extractFile(System.getProperty("java.io.tmpdir") + eo.getName());
                    System.out.println("Extracted File to " + System.getProperty("java.io.tmpdir") + eo.getName());
                    String filePath = System.getProperty("java.io.tmpdir") + eo.getName();
                    Stream stream = session.createStream();
                    if (stream.open(filePath) & (stream.getBytes() > 0)) {
                        System.out.println("In If" + System.getProperty("java.io.tmpdir"));
                        importer = session.createDxlImporter();
                        importer.setDocumentImportOption(DxlImporter.DXLIMPORTOPTION_CREATE);
                        System.out.println("Break Point");
                        importer.importDxl(stream, db);
                        System.out.println("Imported Sucessfully");
                    } else {
                        System.out.println("In else" + stream.getBytes());
                    }
                }
                dxlDoc = dxl_tranfered_mail.getNextDocument();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The code executes until it prints "Break Point" and then throws the error, although the attachment does get imported the first time.
On the other hand, if I hard-code the filePath to a specific .dxl file on the file system, it imports the DXL as a document into the database with no errors.
I am wondering whether the stream that is passed is not fully consumed before the next loop iteration executes.
Any kind of suggestion will be helpful.
I can't see any part where your while loop would move on from the first document.
Usually you would have something like:
Document nextDoc = dxl_tranfered_mail.getNextDocument(dxlDoc);
dxlDoc.recycle();
dxlDoc = nextDoc;
near the end of the loop to advance it to the next document. As your code currently stands, it looks like it would never advance and would always stay on the first document.
If you do not know about the need to 'recycle' Domino objects, I suggest you search for some blog posts and articles that explain the need to do so.
It is a little complicated, but basically the Java objects are just a 'wrapper' for the objects in the C API.
Whenever you create a Domino object (such as a Document, View, DocumentCollection, etc.) a memory handle is allocated in the underlying 'C' layer. This needs to be released (or recycled); it will eventually be released when the session is recycled, but when you are processing in a loop it is much more important to recycle explicitly, as you can easily exhaust the available memory handles and cause a crash.
Also, it's possible you may need to close (and recycle) each Stream after you are finished importing each file.
Lastly, double check that the extracted file that is causing an exception is definitely a valid DXL file, it could simply be that some of the attachments are not valid DXL and will always throw an exception.
You could put a try/catch within the loop to handle that scenario (and report the problem files), which will allow the agent to continue without halting.
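To make the advance-and-recycle pattern concrete, here is a self-contained Java sketch. Doc and DocCollection are hypothetical stand-in stubs, not the lotus.domino API; the point is the loop shape: fetch the next document before recycling the current one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins mimicking lotus.domino's DocumentCollection/Document cursor.
class Doc {
    final String name;
    Doc(String name) { this.name = name; }
    void recycle() { /* in real Domino this frees the native C handle */ }
}

class DocCollection {
    private final List<Doc> docs;
    private int idx = 0;
    DocCollection(List<Doc> docs) { this.docs = docs; }
    Doc getFirstDocument() { idx = 0; return idx < docs.size() ? docs.get(idx++) : null; }
    Doc getNextDocument(Doc current) { return idx < docs.size() ? docs.get(idx++) : null; }
}

public class RecycleLoop {
    static List<String> processAll(DocCollection coll) {
        List<String> seen = new ArrayList<>();
        Doc doc = coll.getFirstDocument();
        while (doc != null) {
            seen.add(doc.name);                    // ... process the document ...
            Doc next = coll.getNextDocument(doc);  // fetch the NEXT doc first
            doc.recycle();                         // then release the current one
            doc = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        DocCollection coll = new DocCollection(
            Arrays.asList(new Doc("a"), new Doc("b"), new Doc("c")));
        System.out.println(processAll(coll));  // all three processed, not just the first
    }
}
```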

Rebuild failed using PDF compression

I'm trying to use the methods described by kuujinbo here:
PDF Compression with iTextSharp
This is my code, and it results in this error:
"Rebuild failed: trailer not found.; Original message: PDF startxref not found."
PdfReader reader = new PdfReader(output.ToArray());
ReduceResolution(reader, 9);
// Save altered PDF. then you can pass the btye array to a database, etc
using (MemoryStream ms = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, ms))
{
}
document.Close();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", string.Format("attachment;filename=Produktark-{0}.pdf", myItem.Key));
Response.BinaryWrite(output.ToArray());
}
What might I be missing?
An exception stating Rebuild failed: ...; Original message: ... is thrown by iText only during PdfReader initialization, i.e. in your case in the line
PdfReader reader = new PdfReader(output.ToArray());
and it indicates that the read data, i.e. output.ToArray(), does not constitute a valid PDF. You should write output.ToArray() to some file, too, and inspect it.
If you wonder why the message indicates that some Rebuild failed: you actually don't get the initial error but a follow-up one. The PDF digesting code has multiple blocks like this:
try {
    read some part of the PDF;
} catch (Exception) {
    try {
        try to repair that part of the PDF and read it;
    } catch (Exception) {
        throw "Rebuild failed: ...; Original message: ...";
    }
}
In your case the part of interest was the cross reference table/stream and the issue was that the PDF startxref (a statement containing the offset of the cross reference start in the document) was not found.
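Along those lines, a cheap pre-check before constructing the PdfReader is to look for the startxref marker near the end of the byte array. This is only a heuristic sketch in plain Java, not a PDF parser, and all names are hypothetical; it just mirrors the fact stated above that startxref sits near the end of a well-formed document:

```java
import java.nio.charset.StandardCharsets;

public class StartxrefCheck {
    // Heuristic: a well-formed PDF ends with "startxref\n<offset>\n%%EOF",
    // so scan the last ~1 KB of the bytes for the "startxref" keyword.
    static boolean hasStartxref(byte[] pdf) {
        int window = Math.min(pdf.length, 1024);
        String tail = new String(pdf, pdf.length - window, window,
                                 StandardCharsets.ISO_8859_1);
        return tail.contains("startxref");
    }

    public static void main(String[] args) {
        byte[] ok = "%PDF-1.4 ... startxref\n123\n%%EOF"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] truncated = "%PDF-1.4 ... stream data"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hasStartxref(ok));        // true
        System.out.println(hasStartxref(truncated)); // false
    }
}
```

If this check fails on output.ToArray(), the bytes handed to PdfReader were never a complete PDF, which is what the suggestion to write them to a file and inspect them would also reveal.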
When I receive this error message, it is caused by not closing the PdfStamper that I am using to edit the form fields.
stamper.Close();
This must be called before closing the PDF, or it will throw the specified error.

Entity framework extended throws DynamicProxy exception

When trying to do bulk updates using EntityFramework.Extended I get one of two exceptions.
Looking at the example I tried:
context.ProcessJobs.Where(job => true).Update(job => new ProcessJob
{
Status = ProcessJobStatus.Processing,
StatusTime = DateTime.Now,
LogString = "Processing"
});
I got the following exception:
'EntityFramework.Reflection.DynamicProxy' does not contain a definition for 'InternalQuery'
...
System.Core.dll!System.Dynamic.UpdateDelegates.UpdateAndExecute1(System.Runtime.CompilerServices.CallSite site, object arg0) + 0x153 bytes
EntityFramework.Extended.dll!EntityFramework.Extensions.ObjectQueryExtensions.ToObjectQuery(System.Linq.IQueryable query) + 0x2db bytes
EntityFramework.Extended.dll!EntityFramework.Extensions.BatchExtensions.Update(System.Linq.IQueryable source, System.Linq.Expressions.Expression> updateExpression) + 0xe9 bytes
EntityFramework.Extended.dll!EntityFramework.Extensions.BatchExtensions.Update(System.Linq.IQueryable source, System.Linq.Expressions.Expression> updateExpression) + 0xe9 bytes
Based on a GitHub issue, I tried:
var c = ((IObjectContextAdapter) context).ObjectContext.CreateObjectSet<ProcessJob>();
c.Update(job => new ProcessJob
{
Status = ProcessJobStatus.Processing,
StatusTime = DateTime.Now,
LogString = "Processing"
});
which results in the following exception (probably the same error as reported here):
'EntityFramework.Reflection.DynamicProxy' does not contain a definition for 'EnsureMetadata'
...
EntityFramework.Extended.dll!EntityFramework.Mapping.ReflectionMappingProvider.FindMappingFragment(System.Collections.Generic.IEnumerable itemCollection, System.Data.Entity.Core.Metadata.Edm.EntitySet entitySet) + 0xc1e bytes
EntityFramework.Extended.dll!EntityFramework.Mapping.ReflectionMappingProvider.CreateEntityMap(System.Data.Entity.Core.Objects.ObjectQuery query) + 0x401 bytes
EntityFramework.Extended.dll!EntityFramework.Mapping.ReflectionMappingProvider.GetEntityMap(System.Data.Entity.Core.Objects.ObjectQuery query) + 0x58 bytes
EntityFramework.Extended.dll!EntityFramework.Mapping.MappingResolver.GetEntityMap(System.Data.Entity.Core.Objects.ObjectQuery query) + 0x9f bytes
EntityFramework.Extended.dll!EntityFramework.Extensions.BatchExtensions.Update(System.Linq.IQueryable source, System.Linq.Expressions.Expression> updateExpression) + 0x1c8 bytes
I tried the latest version for EF5, and I upgraded to EF6 to see if the latest version works, but I get the same problem. We use Code First.
I am not sure how to proceed. I've started trying to understand how the EntityFramework.Extensions code works, but I am wondering whether I will have to fall back to using a stored procedure or raw SQL, neither of which is ideal for our setup.
Does anyone know what these problems are, or have any ideas about how to work out what is going on?
It turns out that you can ignore this error. I had the CLR runtime exceptions debug option turned on. I followed through the source code, then downloaded it and started debugging.
It seems that the exception thrown initially is expected, and the library retries with some other options. Unfortunately I didn't have time to look into the exact problem because I ran into another one, but that's the subject of a different question.

Error inserting document into mongodb from scala

I am trying to insert into a MongoDB database from Scala. The code below doesn't create a db or collection; I tried using the default test db too. How do I perform CRUD operations?
object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") // to print to console
    }
  }
}
At first glance things look OK in your code, although you do have that stray def addMongo(): Unit = { nested at the top. I'll defer to the other answer on tracking down the error there. Two items of note:
1) save() and insert() are complementary operations; you only need one. insert() will always attempt to create a new document, while save() will create one if the _id field isn't set, and update the document with that _id if it is set.
2) Mongo clients do not wait for an answer to a write operation by default. It is very possible and likely that an error is occurring within MongoDB, causing your write to fail. The getLastError() command returns the result of the last write operation on the current connection. Because MongoDB's Java driver uses connection pools, you have to tell it to lock you onto a single connection for the duration of any operation you want to run 'safely' (i.e. check the result of). This is the easiest way from the Java driver (shown as Scala-flavored sample code):
mongo.requestStart() // lock this thread onto a single connection
coll.insert(obj) // attempt the insert
db.getLastError().throwOnError() // throws an exception if the last write on this connection failed
mongo.requestDone() // release the connection lock
Take a look at this excellent writeup on MongoDB's Write Durability, which focuses specifically on the Java Driver.
You may also want to take a look at the Scala driver I maintain (Casbah) which wraps the Java driver and provides more scala functionality.
We provide among other things an execute-around-method version of the safe write concept in safely() which makes things a lot easier for testing for writes' success.
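To make point 1) concrete, here is a tiny stand-in store in plain Java (not the MongoDB driver; all names are hypothetical) illustrating why calling both insert() and save() on the same document, as the question's code does, is redundant:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in illustration of insert-vs-save semantics: insert() always creates
// a new record; save() inserts when no _id is known and updates it otherwise.
class TinyStore {
    private final Map<Integer, Map<String, Object>> docs = new HashMap<>();
    private int nextId = 1;

    int insert(Map<String, Object> doc) {           // always creates a new document
        int id = nextId++;
        docs.put(id, new HashMap<>(doc));
        return id;
    }

    int save(Map<String, Object> doc, Integer id) { // upsert keyed on _id
        if (id == null) return insert(doc);
        docs.put(id, new HashMap<>(doc));           // update in place
        return id;
    }

    int size() { return docs.size(); }
}

public class InsertVsSave {
    public static void main(String[] args) {
        TinyStore store = new TinyStore();
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Mongo");
        int id = store.insert(doc);       // creates document 1
        store.save(doc, id);              // updates document 1, adds nothing new
        System.out.println(store.size()); // 1: the second call was redundant
    }
}
```

In the real Java driver the same effect occurs because insert() populates the _id on the object, so a subsequent save() just rewrites the same document.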
You just missed the addMongo call in main. The fix is trivial:
object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") // to print to console
    }
    addMongo() // call it!
  }
}

Invalid attempt to call FieldCount when reader is closed

The error above occurs when I try to do a dataReader.Read on the data received from the database. I know there are two rows in there, so it isn't because no data actually exists.
Could the CommandBehavior.CloseConnection be causing the problem? I was told you had to use it right after an ExecuteReader. Is this correct?
try
{
    _connection.Open();
    using (_connection)
    {
        SqlCommand command = new SqlCommand("SELECT * FROM Structure", _connection);
        SqlDataReader dataReader = command.ExecuteReader(CommandBehavior.CloseConnection);
        if (dataReader == null) return null;
        var newData = new List<Structure>();
        while (dataReader.Read())
        {
            var entity = new Structure
            {
                Id = (int)dataReader["StructureID"],
                Path = (string)dataReader["Path"],
                PathLevel = (string)dataReader["PathLevel"],
                Description = (string)dataReader["Description"]
            };
            newData.Add(entity);
        }
        dataReader.Close();
        return newData;
    }
}
catch (SqlException ex)
{
    AddError(new ErrorModel("An SqlException error has occured whilst trying to return descendants", ErrorHelper.ErrorTypes.Critical, ex));
    return null;
}
catch (Exception ex)
{
    AddError(new ErrorModel("An error has occured whilst trying to return descendants", ErrorHelper.ErrorTypes.Critical, ex));
    return null;
}
finally
{
    _connection.Close();
}
Thanks in advance for any help.
Clare
When you use a using block in C#, the connection is automatically closed after the block's closing }. That's why the reader reports itself as closed (and FieldCount fails) when you try to read from it afterwards: the connection is already gone. Read your data before the using block ends, or open and close the connection manually instead of wrapping it in a using.
Your code, as displayed, is fine. I've taken it into a test project, and it works. It's not immediately clear why you get this message with the code shown above. Here are some debugging tips/suggestions; I hope they're valuable for you.
Create a breakpoint on the while (dataReader.Read()). Before it enters its codeblock, enter this in your Immediate or Watch Window: dataReader.HasRows. That should evaluate to true.
While stopped on that Read(), open your Locals window to inspect all the properties of dataReader. Ensure that the FieldCount is what you expect from your SELECT statement.
When stepping into this Read() iteration, does a Structure object get created at all? What's the value of dataReader["StructureID"] and the others in the Immediate Window?
It's not the CommandBehavior.CloseConnection causing the problem. That simply tells the connection to also close itself when you close the datareader.
When I got that error, it happened to be a command timeout problem (I was reading some large binary data). As a first attempt, I increased the command timeout (not the connection timeout!) and the problem was solved.
Note: while attempting to find the problem, I tried listening to the (Sql)Connection's StateChanged event, but it turned out that the connection never falls into a "broken" state.
Same problem here. I tested all the above solutions:
increase command timeout
close the connection after read
Here's the code
1 objCmd.Connection.Open()
2 objCmd.CommandTimeout = 3000
3 Dim objReader As OleDbDataReader = objCmd.ExecuteReader()
4 repeater.DataSource = objReader
5 CType(repeater, Control).DataBind()
6 objReader.Close()
7 objCmd.Connection.Dispose()
Moreover, at line 4 objReader has Closed = False
I got this exception while using the VS.NET debugger and trying to examine some IQueryable results. That was a bad decision, because the IQueryable resulted in a large table scan. Stopping and restarting the debugger, and not trying to preview that particular IQueryable, was the workaround.