Can't delete document on Lucene.Net

I am trying to delete a document, but nothing I try removes it. Specific to my setup: I am using a RAMDirectory as the directory, with Lucene.Net 3.0.3. My example is below.
public void DeleteIndex(IndexWriter writer, IndexSearcher searcher)
{
    var boolQuery = new BooleanQuery();
    boolQuery.Add(new TermQuery(new Term("Id", "2")), Occur.MUST);
    boolQuery.Add(new TermQuery(new Term("Type", "Product")), Occur.MUST);
    writer.DeleteDocuments(boolQuery);
    writer.Optimize(true);
    //writer.Flush(true, true, true); // even this line doesn't help
    writer.Commit();
    var result = searcher.Search(boolQuery, 1); // I can access deleted doc in search results
}

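After writer.Commit(), you need to reopen your searcher. An IndexSearcher only sees the index as it was when its reader was opened, so it keeps returning the deleted document until it is given a fresh reader.
IndexReader newReader = YOURIndexReader.Reopen(true);
searcher = new IndexSearcher(newReader);
...
The code above is only a sketch, not working code; I'm sure you can continue from here.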
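
For readers who want something closer to runnable, here is a minimal sketch of the commit-then-reopen pattern described above, assuming Lucene.Net 3.0.3 and an in-memory RAMDirectory. The class name, the sample document, and the choice of NOT_ANALYZED fields (so the exact-match TermQuery terms line up with what is indexed) are illustrative assumptions, not taken from the question.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; assumes Lucene.Net 3.0.3.
public static class DeleteAndReopenDemo
{
    public static void Main()
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Index one document. NOT_ANALYZED keeps "Product" as a single,
            // case-preserved term so the TermQuery below can match it exactly.
            var doc = new Document();
            doc.Add(new Field("Id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("Type", "Product", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();

            // Open a read-only reader and a searcher on the committed index.
            IndexReader reader = IndexReader.Open(directory, true);
            var searcher = new IndexSearcher(reader);

            var query = new BooleanQuery();
            query.Add(new TermQuery(new Term("Id", "2")), Occur.MUST);
            query.Add(new TermQuery(new Term("Type", "Product")), Occur.MUST);

            writer.DeleteDocuments(query);
            writer.Commit();

            // This searcher still uses the reader opened before the delete,
            // so it reports the index as it was at that point in time.
            Console.WriteLine("Hits before reopen: " + searcher.Search(query, 1).TotalHits);

            // Reopen returns a new reader only if the index has changed.
            IndexReader newReader = reader.Reopen();
            if (newReader != reader)
            {
                searcher.Dispose();
                reader.Dispose();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }

            Console.WriteLine("Hits after reopen: " + searcher.Search(query, 1).TotalHits);

            searcher.Dispose();
            reader.Dispose();
        }
    }
}
```

The comparison `newReader != reader` matters because Reopen hands back the same instance when nothing has changed, in which case the old reader must not be disposed.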

Related

iTextSharp and Hyphenation

In earlier versions of iTextSharp, I have incorporated hyphenation in the following way (example is for German hyphenation):
HyphenationAuto autoDE = new HyphenationAuto("de", "DR", 3, 3);
BaseFont.AddToResourceSearch(RuntimePath + "itext-hyph-xml.dll");
chunk = new Chunk(text).SetHyphenation(autoDE);
In recent versions of iText, this is no longer possible as the function
BaseFont.AddToResourceSearch()
has been removed from iText. Now how to replace this statement?
According to the 2nd edition of the iText in Action manual, the statement apparently does not need to be replaced at all. When I simply remove it, however, no hyphenation takes place (and no errors occur). I have also taken a newer version of
itext-hyph-xml.dll
and re-referenced it. Same result: no hyphenation. This file resides in the same path as iTextSharp.dll, and I have included the path in the CLASSPATH environment variable. Nothing helps. I'm stuck, please help.
Calling iTextSharp.text.io.StreamUtil.AddToResourceSearch() works for me:
var content = @"
Allein ist besser als mit Schlechten im Verein: mit Guten im Verein, ist besser als allein.
";
var table = new PdfPTable(1);
// make sure .dll is in correct /bin directory
StreamUtil.AddToResourceSearch("itext-hyph-xml.dll");
using (var stream = new MemoryStream())
{
    using (var document = new Document(PageSize.A8.Rotate()))
    {
        PdfWriter.GetInstance(document, stream);
        document.Open();
        var chunk = new Chunk(content)
            .SetHyphenation(new HyphenationAuto("de", "DR", 3, 3));
        table.AddCell(new Phrase(chunk));
        document.Add(table);
    }
    File.WriteAllBytes(OUT_FILE, stream.ToArray());
}
Tested with iTextSharp 5.5.11 and itext-hyph-xml 2.0.0.0.

Having Difficulty Using MongoDb C# Driver's Sample()

I am trying to get some random items from the database using the Sample() method. I updated to the latest version of the driver and copied the using statements from the linked example. However, something isn't working, and I am hoping it's some simple mistake on my part. All the relevant information is in the image below:
Edit:
Greg, I read the aggregation docs here and the raw db method doc here, but I still don't get it. The last two lines are my attempts; I have never used aggregation before:
var mongoCollection = GetMongoCollection<BsonDocument>(collectionName);
long[] locationIds = new long[] { 1, 2, 3, 4, 5 };
var locationsArray = new BsonArray();
foreach (long l in locationIds) { locationsArray.Add(l); };
FilterDefinition<BsonDocument> sampleFilter = Builders<BsonDocument>.Filter.In("LocationId", locationsArray);
var findSomething = mongoCollection.Find(sampleFilter); //this works, the two attempts below don't.
var aggregateSomething = mongoCollection.Aggregate(sampleFilter).Sample(25);
var aggregateSomething2 = mongoCollection.Aggregate().Match(sampleFilter).Sample(25);
Sample is only available from an aggregation. You need to start with Aggregate, not Find. I believe it's also available in LINQ.
UPDATE:
It looks like we don't have a method for it specifically. However, you can use the AppendStage method. Note that Aggregate() does not take the filter as an argument; apply it with Match first.
mongoCollection.Aggregate()
    .Match(sampleFilter)
    .AppendStage<BsonDocument>("{ $sample: { size: 3 } }");

Lucene Duplicated results / spaces in text search

I'm using Lucene 2.9.4.1, and everything works fine if I search for something that occurs just once in the same line.
However, if Lucene finds the string I'm looking for more than once in the same line, I get duplicated results (two or more).
I'm using the following BooleanQuery:
booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
The second issue is searching for something that contains spaces, like "hello world": it never works.
Can anyone advise me on these two problems, please?
Thank you so much in advance,
Best regards,
Well, I just found the answer that solved both of my issues =)
I was using this:
BooleanQuery booleanQuery = new BooleanQuery();
PropertyInfo[] propertyInfos = typeof(T).GetProperties();
foreach (PropertyInfo propertyInfo in propertyInfos)
{
    booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
}
And now I use this:
var booleanQuery = new BooleanQuery();
textSearch = QueryParser.Escape(textSearch.Trim().ToLower());
string[] properties = typeof(T).GetProperties().Select(x => x.Name).ToArray();
Analyzer analyzer = new StandardAnalyzer(global::Lucene.Net.Util.Version.LUCENE_29);
MultiFieldQueryParser titleParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, properties, analyzer);
Query titleQuery = titleParser.Parse(textSearch);
booleanQuery.Add(titleQuery, BooleanClause.Occur.SHOULD);
It seems that Analyzer and MultiFieldQueryParser are the solution to my problems: no more duplicated results, I can search for something with spaces, and the performance has improved significantly (faster results) =)

Lucene.Net - weird behaviour in different servers

I was writing a search for one of our sites: (SITE A)
BooleanQuery booleanQuery = new BooleanQuery();
foreach (var field in fields)
{
    QueryParser qp = new QueryParser(field, new StandardAnalyzer());
    Query query = qp.Parse(search.ToLower() + "*");
    if (field.Contains("Title")) { query.SetBoost((float)1.8); }
    booleanQuery.Add(query, BooleanClause.Occur.SHOULD);
}
// CODE DIFFERENCE IS HERE
Query query2 = new TermQuery(new Term("StateProperties.IsActive", "True"));
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
// END CODE DIFFERENCE
Lucene.Net.Search.TopScoreDocCollector collector = Lucene.Net.Search.TopScoreDocCollector.create(21, true);
searcher.Search(booleanQuery, collector);
hits = collector.TopDocs().scoreDocs;
This was working as expected.
Since we own a few sites, and they all use the same skeleton,
I uploaded the search to another site (SITE B),
but the search stopped returning results.
After playing around a bit with the code, I managed to make it work like so (showing only the rewritten lines of code):
QueryParser qp2 = new QueryParser("StateProperties.IsActive", new StandardAnalyzer());
Query query2 = qp2.Parse("True");
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
Does anyone know why this is happening?
I have checked the Lucene DLL version, and it's the same version on both sites (2.9.2.2).
Is the code I have written for SITE A wrong? Is the SITE B code wrong?
Is this my fault at all? Can the production server influence something like this?
Don't the two sites have individual indexes on disk? If they were indexed differently, they would also return different results. One thing that comes to mind is case sensitivity: a TermQuery looks for an EXACT match, whereas the parser tokenizes and filters the search term according to the analyzer (and so probably searches for "true" instead of "True").
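
To see the difference described above, one can print both queries and compare the terms they actually search for. This is only a sketch: it uses the QueryParser constructor that takes a Version argument (rather than the parameterless StandardAnalyzer from the question), and the class name is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical demo class comparing an exact TermQuery with a parsed query.
public static class TermVersusParsedQueryDemo
{
    public static void Main()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

        // TermQuery bypasses the analyzer: the term is used exactly as given.
        Query exact = new TermQuery(new Term("StateProperties.IsActive", "True"));

        // QueryParser runs the text through the analyzer first; StandardAnalyzer
        // includes a lowercase filter, so the resulting term may differ in case.
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "StateProperties.IsActive", analyzer);
        Query parsed = parser.Parse("True");

        // Printing the queries shows which term each one will look up.
        Console.WriteLine(exact);
        Console.WriteLine(parsed);
    }
}
```

If the two printed terms differ only in case, the index on each site decides which one matches: a field indexed without analysis stores "True" as-is, while a field run through StandardAnalyzer stores the lowercased form.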

merge word documents to a single document

I used the code at the link below to merge Word files into a single file:
http://devpinoy.org/blogs/keithrull/archive/2007/06/09/updated-how-to-merge-multiple-microsoft-word-documents.aspx
However, looking at the output file, I realized that it did not copy the header image from the first document. How do we merge documents while preserving format and content?
I suggest using GroupDocs.Merger Cloud to merge multiple Word documents into a single Word document; it keeps the formatting and contents of the source documents. It is a platform-independent REST API solution that does not depend on any third-party tool or software.
Sample C# code:
var configuration = new GroupDocs.Merger.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance_Document = new GroupDocs.Merger.Cloud.Sdk.Api.DocumentApi(configuration);
var apiInstance_File = new GroupDocs.Merger.Cloud.Sdk.Api.FileApi(configuration);
var pathToSourceFiles = @"C:/Temp/input/";
var remoteFolder = "Temp/";
var joinItem_list = new List<JoinItem>();
try
{
    DirectoryInfo dir = new DirectoryInfo(pathToSourceFiles);
    System.IO.FileInfo[] files = dir.GetFiles();
    foreach (System.IO.FileInfo file in files)
    {
        var request_upload = new GroupDocs.Merger.Cloud.Sdk.Model.Requests.UploadFileRequest(remoteFolder + file.Name, File.Open(file.FullName, FileMode.Open));
        var response_upload = apiInstance_File.UploadFile(request_upload);
        var item = new JoinItem
        {
            FileInfo = new GroupDocs.Merger.Cloud.Sdk.Model.FileInfo
            { FilePath = remoteFolder + file.Name }
        };
        joinItem_list.Add(item);
    }
    var options = new JoinOptions
    {
        JoinItems = joinItem_list,
        OutputPath = remoteFolder + "Merged_Document.docx"
    };
    var request = new JoinRequest(options);
    var response = apiInstance_Document.Join(request);
    Console.WriteLine("Output file path: " + response.Path);
}
catch (Exception e)
{
    Console.WriteLine("Exception while Merging Documents: " + e.Message);
}
That code is inserting a page break after each file.
Since sections control headers, if a second or subsequent document has a header, you'll probably want to keep the original section properties and insert the other documents after your first one.
If you look at your original document as a docx, you'll probably see that its section is defined by a document-level section properties element.
The easiest way around your problem may be to create a second section properties element inside the last paragraph (which contains the header information). This should then stay in place when the documents are merged (i.e. when other paragraphs are added after it).
That's the theory. See also http://www.pcreview.co.uk/forums/thread-898133.php
But I haven't tried it; it assumes InsertFile behaves as I expect it should.