Special character indexing - special-characters

I am creating a Lucene 3.0.3 index using StandardAnalyzer.
When I search the index with a query like C, C# or C++, it returns the same results for all three terms. As far as I know, the analyzer ignores special characters while creating the index and does not index them.
I need to be able to differentiate between "C", "C#" and "C++".
Is there an existing analyzer that will resolve this issue?
Any suggestion will be appreciated!

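I guess that happens because StandardAnalyzer tokenizes text with StandardTokenizer (followed by StandardFilter, LowerCaseFilter and StopFilter), and StandardTokenizer drops special characters such as # and +, so all three queries end up as the same term c.
You could create your own Analyzer implementation.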

See http://www.gossamer-threads.com/lists/lucene/java-user/91747?do=post_view_threaded#91747
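A custom analyzer mostly comes down to deciding which characters count as part of a token. The sketch below is plain JDK code, not Lucene API: ProgrammingTermTokenizer is a hypothetical class showing a rule that treats letters, digits, # and + as token characters and lowercases the result, so C, C# and C++ become three distinct terms. In Lucene 3.0.x a rule like this would typically live in the isTokenChar(char) method of a CharTokenizer subclass wrapped in your own Analyzer; check the 3.0.3 Javadoc for the exact signatures before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK illustration (NOT Lucene API): a hypothetical tokenization rule
// that keeps '#' and '+' inside tokens so "C", "C#" and "C++" stay distinct.
final class ProgrammingTermTokenizer {

    // Mirrors what a custom isTokenChar(char) override might return.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '#' || c == '+';
    }

    // Splits on every non-token character and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Skills: C, C# and C++."));
    }
}
```

Two things to keep in mind: the same analyzer has to be used at query time, otherwise the query text is still reduced to c; and + is an operator in the classic QueryParser syntax, so a query for C++ probably needs escaping (for example with QueryParser.escape). This rule also keeps strings like a+b together as one token, which may or may not be what you want.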

Related

Reading CSV file with Spring batch and map to Domain objects based on the first field and then insert them in DB accordingly [duplicate]

How can I implement pattern matching in Spring Batch? I am using org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper.
I learned that I can only use ? or * to create my pattern.
My requirement is as follows:
I have a fixed-length record file, and in each record the two characters at positions 35 and 36 give the record type.
For example, in the record below, "05" is the record type at positions 35 and 36; the total record length is 400.
0000001131444444444444445589868444050MarketsABNAKKAAAAKKKA05568551456...........
I tried to write a regular expression, but it does not work; I learned that only two special characters can be used, * and ?.
In that case, the only way I can write it is like this:
??????????????????????????????????05?????????????..................
but it does not seem like a good solution.
Please suggest how I can implement this. Thanks a lot in advance for your help.

The PatternMatchingCompositeLineMapper uses an instance of org.springframework.batch.support.PatternMatcher to do the matching. It's important to note that PatternMatcher does not use true regular expressions. It uses something closer to ant patterns (the code is actually lifted from AntPathMatcher in Spring Core).
That being said, you have the following options:
1. Use a pattern like the one you describe (there is no shorthand for repeating ? a given number of times, as there is in regular expressions).
2. Create your own composite LineMapper implementation that uses regular expressions to do the mapping.
For the record, if you choose option 2, contributing it back would be appreciated!
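Both approaches can be sketched in plain Java. RecordTypeDispatcher below is hypothetical, not Spring Batch API. antStylePattern builds the hand-written pattern from the first option programmatically (34 ? characters, then the record type, then *), assuming ? matches exactly one character as in Ant patterns. register and mapLine show the kind of dispatch a regex-based composite LineMapper might perform, where .{34}05.* means "any 34 characters, then 05". In a real implementation, each Function would be replaced by the delegate LineMapper (or tokenizer and FieldSetMapper) for that record type. String.repeat needs Java 11 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical plain-JDK sketch, NOT Spring Batch API.
final class RecordTypeDispatcher<T> {

    // First option: build "???...?05*" (offset '?' chars, record type, '*')
    // instead of typing the '?' characters by hand.
    static String antStylePattern(int offset, String recordType) {
        return "?".repeat(offset) + recordType + "*";
    }

    // Second option: real regular expressions, checked in registration order.
    private final Map<Pattern, Function<String, T>> mappers = new LinkedHashMap<>();

    RecordTypeDispatcher<T> register(String regex, Function<String, T> mapper) {
        mappers.put(Pattern.compile(regex), mapper);
        return this;
    }

    // Applies the mapper of the first pattern that matches the whole line.
    T mapLine(String line) {
        for (Map.Entry<Pattern, Function<String, T>> entry : mappers.entrySet()) {
            if (entry.getKey().matcher(line).matches()) {
                return entry.getValue().apply(line);
            }
        }
        throw new IllegalArgumentException("No mapper matches line: " + line);
    }

    public static void main(String[] args) {
        RecordTypeDispatcher<String> dispatcher = new RecordTypeDispatcher<String>()
                // ".{34}05.*": any 34 characters, then "05" at positions 35-36.
                .register(".{34}05.*", rec -> "type-05 record")
                .register(".{34}06.*", rec -> "type-06 record");
        String line = "0".repeat(34) + "05" + "MarketsABN";
        System.out.println(dispatcher.mapLine(line));
        System.out.println(antStylePattern(34, "05"));
    }
}
```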

Get Entire List<String> from map in Drools

I need to get an entire List from a HashMap in Drools. I have inserted the HashMap as a fact into working memory. In the when part, I need to get the list from the HashMap. I am using Drools version 7.
The map contains String keys and List<String> values.
Please help.
Thanks

Most Drools rules use MVEL syntax in the when part. According to the MVEL Language Guide:
user["foobar"]
is the equivalent of the Java code:
user.get("foobar");
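In Java terms, map["foobar"] therefore yields the whole value stored under that key, which for this fact is the entire List<String>. The plain-Java sketch below (MapFactLookup is a hypothetical helper, not Drools API) shows that lookup on a Map<String, List<String>>. Inside a rule, the equivalent binding might look something like Map( $list : this["foobar"] ), but that DRL line is an untested assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT Drools API: what map["foobar"] / map.get("foobar")
// returns for a Map<String, List<String>> fact, i.e. the whole list.
final class MapFactLookup {

    static List<String> listFor(Map<String, List<String>> fact, String key) {
        // get(...) returns null for a missing key; fall back to an empty list.
        List<String> values = fact.get(key);
        return values != null ? values : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> fact = new HashMap<>();
        fact.put("foobar", Arrays.asList("a", "b", "c"));
        System.out.println(listFor(fact, "foobar")); // the entire list
        System.out.println(listFor(fact, "missing")); // empty list
    }
}
```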

How to extend IErrorParser in eclipse to define own syntax checking?

My intention is to have my own naming rules in the Eclipse editor for C programming.
Ex: a function name should start with the file name and contain a maximum of 20 characters, e.g. FILENAME_MaxOf20Char().
When a rule is violated, the editor should show a warning.
To do this, I tried to extend org.eclipse.cdt.core.IErrorParser, but it parses the compiler output.

IErrorParser is not the right extension point to use for this.
You want to use the Code Analysis (CodAn) framework and write a custom checker. See this page for documentation.
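Whatever framework hosts it, the naming rule itself is plain string logic. The plain-JDK sketch below covers that logic only; FunctionNameRule is hypothetical, and it assumes that the required prefix is the file's base name in upper case followed by _, and that the 20-character limit applies to the part after that prefix. A CodAn checker would visit the function definitions in the translation unit (in CDT this usually means extending a base class such as AbstractIndexAstChecker) and report a problem for each name where this check fails; that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-JDK sketch of the naming rule only (no CDT/CodAn API).
// Assumed rule: name = <FILE BASE NAME IN UPPER CASE> + "_" + suffix,
// where the suffix is 1..20 characters long.
final class FunctionNameRule {

    static final int MAX_SUFFIX_LENGTH = 20;

    // Returns human-readable violations; an empty list means the name is fine.
    static List<String> check(String fileName, String functionName) {
        List<String> problems = new ArrayList<>();
        String base = fileName.contains(".")
                ? fileName.substring(0, fileName.lastIndexOf('.'))
                : fileName;
        String prefix = base.toUpperCase(Locale.ROOT) + "_";
        if (!functionName.startsWith(prefix)) {
            problems.add("'" + functionName + "' should start with '" + prefix + "'");
            return problems;
        }
        int suffixLength = functionName.length() - prefix.length();
        if (suffixLength == 0 || suffixLength > MAX_SUFFIX_LENGTH) {
            problems.add("'" + functionName + "' should have 1.." + MAX_SUFFIX_LENGTH
                    + " characters after '" + prefix + "'");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("uart.c", "UART_Init"));
        System.out.println(check("uart.c", "Init"));
        System.out.println(check("uart.c", "UART_ThisSuffixIsDefinitelyTooLong"));
    }
}
```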

MongoDB the difference between db.getCollection.find and db.tablename.find?

What is the difference between:
db.getCollection('booking').find()
and
db.booking.find()
Are they exactly the same, or when should I use which one?
db.getCollection('booking').find({_id:"0J0DR"})
db.booking.find({_id:"0J0DR"})

Yes, they are exactly the same and you can use either.
The first form, db.getCollection(collectionName).find(), comes in handy when your collection name contains special characters that would otherwise make the other syntax unusable.
Example:
Suppose your collection has a name that begins with _, matches a database shell method, or contains a space; then you can use db.getCollection("booking trips").find() or db["booking trips"].find(), whereas db.booking trips.find() is impossible.
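The rule of thumb above can be written down as a small check. The sketch below is a rough heuristic for illustration only: ShellCollectionSyntax is hypothetical, and its list of db methods is a small sample, not the shell's full method list. A name that is a plain identifier, does not start with _ and does not clash with a db method can be written as db.<name>; anything else goes through db.getCollection(...).

```java
import java.util.Set;
import java.util.regex.Pattern;

// Rough heuristic (an assumption, not an official MongoDB rule): decide
// whether db.<name> works in the shell or db.getCollection("<name>") is needed.
final class ShellCollectionSyntax {

    // Identifier-like name that does not start with '_' or a digit.
    private static final Pattern PLAIN_NAME = Pattern.compile("[A-Za-z$][A-Za-z0-9_$]*");

    // Small illustrative sample of db methods; the real list is much longer.
    private static final Set<String> SAMPLE_DB_METHODS =
            Set.of("getCollection", "getName", "stats", "version", "help");

    static boolean needsGetCollection(String name) {
        return !PLAIN_NAME.matcher(name).matches() || SAMPLE_DB_METHODS.contains(name);
    }

    // Builds the shell expression for an unfiltered find on that collection.
    static String shellFind(String name) {
        return needsGetCollection(name)
                ? "db.getCollection(\"" + name + "\").find()"
                : "db." + name + ".find()";
    }

    public static void main(String[] args) {
        System.out.println(shellFind("booking"));
        System.out.println(shellFind("booking trips"));
    }
}
```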

I prefer using db.collection() to either as it will work on nonexistent collections, which is particularly useful when for example creating the first user in a users collection that doesn't yet exist.
db.collection('users').findOneAndUpdate(...) // Won't throw even if the collection doesn't exist yet

In addition to the previous answers: in the shell they might be exactly the same, but in a real IDE (like PyCharm), db.getCollection(collectionName) gives you back the whole document even without the find() method.

Wildcard at the Beginning of a searchterm -Lucene

As far as I know, Lucene(.NET) doesn't support a wildcard at the beginning of a search term
(see http://lucene.apache.org/java/2_0_0/queryparsersyntax.html):
"Note: You cannot use a * or ? symbol as the first character of a search."
For example:
*myword
Maybe that's because it's quite difficult to search "everything" before the search term.
Despite that, we are looking for a way to use the wildcard at the beginning.
Does anyone know if this is possible?
One thought was a*searchterm, b*searchterm, ..., z*searchterm
... but that seems a bit random to me.
Thanks in advance

Your question is tagged with Lucene.NET, so I assume you mean the .NET version rather than the Java version.
Yes, you can have wildcards at the beginning of a search term via:
var queryParser = new QueryParser(LuceneVersion, "content", new StandardAnalyzer(LuceneVersion));
queryParser.SetAllowLeadingWildcard(true);
but you need to be aware of the performance consequences. Find more detailed source code in this blog.
Since Lucene.NET is a port of the Java version, I suspect you could use the same approach for the Java version. I didn't verify this, though.
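To see where the performance cost comes from, it helps to remember that Lucene's term dictionary is sorted. The plain-JDK sketch below (not Lucene API; LeadingWildcardDemo is hypothetical) uses a TreeSet as a stand-in: a trailing wildcard like word* can jump straight to the matching range of terms, while *word has to test every term. It also shows a different technique from the one above: additionally indexing each term reversed, so that *word becomes the prefix search drow* on the reversed terms (Solr's ReversedWildcardFilterFactory is built on this idea).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustration only (plain JDK, NOT Lucene): a TreeSet stands in for the
// sorted term dictionary.
final class LeadingWildcardDemo {

    // Trailing wildcard "word*": seek to the first candidate, stop after the range.
    static List<String> prefixMatches(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // past the contiguous matching range
            out.add(t);
        }
        return out;
    }

    // Leading wildcard "*word": nothing to seek to, so every term is checked.
    static List<String> suffixMatchesByScan(NavigableSet<String> terms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) out.add(t);
        }
        return out;
    }

    // Reversed-term trick: "*word" becomes the prefix search "drow*".
    static List<String> suffixMatchesViaReversed(NavigableSet<String> reversedTerms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String r : prefixMatches(reversedTerms, reverse(suffix))) {
            out.add(reverse(r));
        }
        return out;
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("keyword", "myword", "password", "wordy"));
        NavigableSet<String> reversed = new TreeSet<>();
        for (String t : terms) reversed.add(reverse(t));
        System.out.println(suffixMatchesByScan(terms, "word"));
        System.out.println(suffixMatchesViaReversed(reversed, "word"));
    }
}
```

The trade-off of the reversed-term approach is a larger index, since every term is stored twice.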
