I have a query implemented in my Elasticsearch repository using the @Query annotation. One of the fields in the query is a String, and I found that when I execute the query with a parameter that contains double quotes, the query fails to parse.
It appears that the ElasticsearchStringQuery class (line 88) simply substitutes the parameter value into the string query without escaping any quotes in the parameter. Am I doing something wrong, or is this a known limitation? If it is known, it seems odd that the caller must escape the quotes in a parameter before calling the finder method on the repository. This litters my code with double-quote stripping or escaping, and it leaks the query format externally (i.e. that double quotes delimit values in the query and therefore can't be used in parameters).
An example query:
#Query("{ \"filtered\" : { \"query\" : { \"bool\" : { \"must\" : [ { \"match\" : { \"programTitle\" : { \"query\" : \"?0\", \"type\" : \"boolean\", \"operator\" : \"AND\", \"fuzziness\" : \"0\" } } }, { \"match\" : { \"title\" : { \"query\" : \"?2\", \"type\" : \"boolean\", \"operator\" : \"AND\",\"fuzziness\" : \"0\" } } } ] } }, \"filter\" : { \"term\" : { \"programAirDate\" : \"?1\" } } }}")
The query is being executed with the test case:
repository.save(piece);
List<Piece> actual = repository.findByProgramTitleAndProgramAirDateAndTitle(
        piece.getProgramTitle(),
        airDate.format(DateTimeFormatter.ISO_INSTANT),
        piece.getTitle(),
        new PageRequest(0, 1));
If piece.getTitle() contains double quotes such as with Franz Joseph Haydn: String Quartet No. 53 "The Lark", the query fails with the error:
at [Source: [B#1696066a; line: 1, column: 410]]; }{[7x1Q86yNTdy0a0zHarqYVA][metapub][4]: SearchParseException[[metapub][4]: from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query_binary":"eyAgImZpbHRlcmVkIiA6IHsgICAgInF1ZXJ5IiA6IHsgICAgICAiYm9vbCIgOiB7ICAgICAgICAibXVzdCIgOiBbIHsgICAgICAgICAgIm1hdGNoIiA6IHsgICAgICAgICAgICAicHJvZ3JhbVRpdGxlIiA6IHsgICAgICAgICAgICAgICJxdWVyeSIgOiAiQmVzdCBQcm9ncmFtIEV2ZXIiLCAgICAgICAgICAgICAgInR5cGUiIDogImJvb2xlYW4iLCAgICAgICAgICAgICAgIm9wZXJhdG9yIiA6ICJBTkQiLCAgICAgICAgICAgICAgImZ1enppbmVzcyIgOiAiMCIgICAgICAgICAgICB9ICAgICAgICAgIH0gICAgICAgIH0sIHsgICAgICAgICAgIm1hdGNoIiA6IHsgICAgICAgICAgICAidGl0bGUiIDogeyAgICAgICAgICAgICAgInF1ZXJ5IiA6ICJGcmFueiBKb3NlcGggSGF5ZG46IFN0cmluZyBRdWFydGV0IE5vLiA1MyAiVGhlIExhcmsiIiwgICAgICAgICAgICAgICJ0eXBlIiA6ICJib29sZWFuIiwgICAgICAgICAgICAgICJvcGVyYXRvciIgOiAiQU5EIiwgICAgICAgICAgICAgICJmdXp6aW5lc3MiIDogIjAiICAgICAgICAgICAgfSAgICAgICAgICB9ICAgICAgICB9IF0gICAgICB9ICAgIH0sICAgICJmaWx0ZXIiIDogeyAgICAgICJ0ZXJtIiA6IHsgICAgICAgICJwcm9ncmFtQWlyRGF0ZSIgOiAiMjAxNi0wNC0wMVQxMzowMDowMFoiICAgICAgfSAgICB9ICB9fQ=="}]]];
nested: QueryParsingException[[metapub] Failed to parse]; nested: JsonParseException[Unexpected character ('T' (code 84)): was expecting comma to separate OBJECT entries
Decoding the base64 query, I can see that the unescaped double quotes are causing the issue:
... "title" : { "query" : "Franz Joseph Haydn: String Quartet No. 53 "The Lark"", "type" : "boolean", "operator" : "AND", ...
I can make things work by removing all the double quotes (for example, piece.getTitle().replaceAll("\"", "")), but that seems like something the framework should handle. When using a JDBC repository, the user doesn't have to remove or replace special characters, because the driver escapes them automatically through parameter binding.
So what is the best way to handle this?
Below is the structure of a Mongo entry I have that needs to come up in a search:
{
    "_id" : BinData(0, "lAAXAUCdQp2sBJH7gbGaEku2Lt07G90MXQ7OsfMmRw+UGQ=="),
    "subdata" : {
        "abcd" : "qrst",
        "qwer" : "asdf",
        "abc:def" : "some value"
    },
    "abc" : "zxcv"
}
I am trying to search for and update the value of the field named abc:def from 'some value' to 'true value' using the following:
db.content.find({ "abc": "zxcv" }).forEach(
function(ae) {
ae.subdata.abc:def = "true value";
db.content.save(ae);
})
I see the error: "The selection cannot be executed as it contains following errors:
Error at line 3, position 16: :."
Any way to escape the ':' in abc:def?
If I do the same for ae.subdata.qwer, the update is made, but not for ae.subdata.abc:def.
The colon in the field name cannot be changed at the moment.
Please suggest.
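For what it's worth, JavaScript bracket notation should sidestep the parse error, since the field name then appears as a quoted string rather than as part of a dotted identifier. A minimal sketch against the collection above:
db.content.find({ "abc": "zxcv" }).forEach(
    function(ae) {
        // quoted key instead of dot access, so the ':' no longer breaks parsing
        ae.subdata["abc:def"] = "true value";
        db.content.save(ae);
    })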
I'm trying to find all lists in which the person's number has '-' or '.' in it. I already tried this answer, but it's not working for elements inside the array.
But when I try to find by the entire string, without the regex notation, the document is found.
For example:
db.getCollection('list').find({"persons.number": "123456789"}) //works!
db.getCollection('list').find({"persons.number": /3/}) //not work...
db.getCollection('list').find({"persons.number": /.*3.*/}) //not work
db.getCollection('list').find({"persons.number": /.*..*/}) //not work
db.getCollection('list').find({"persons.number": /.*[-\.]+.*/}) //not work
If I try to find the document by some attribute outside of the array (an attribute of the list itself, for example), then /3/, /.*3.*/ and /.*[-\.]+.*/ all work.
Document format:
{
    "_id" : ObjectId("5af3037ee8006c4a04e84b2f"),
    "id" : 1,
    "persons" : [
        {
            "id" : 1,
            "number" : "123.123.123-22"
        },
        {
            "id" : 2,
            "number" : "123.456.789-11"
        }
    ]
}
So, what are the options?
I'm using MongoDB on Azure. Executing db.version() in the console returns 3.2.0.
The regex .*[-\.]+.* should work. Try this:
db.getCollection('list').find({"persons.number": /.*[-\.]+.*/})
For searching multiple patterns, i.e. an OR of patterns, the regex format is a little different:
(pattern1)|(pattern2)
I tried the mongo query below on my local instance running 3.4.6, but it should work on 3.2.x as well:
db.list.aggregate([
    { "$unwind": { "path": "$persons" } },                              // one output document per array element
    { "$project": { "_id": 1, "persons.number": 1 } },                  // keep only the fields we match on
    { "$match": { "persons.number": { "$regex": /.*(\.)|(\-).*/ } } },  // matches numbers containing '.' or '-'
    { "$group": { "_id": "$_id" } }                                     // collapse back to one document per list
])
I am using Pig to read Avro files and normalize/transform the data before writing it back out. The Avro files have records of the form:
{
    "type" : "record",
    "name" : "KeyValuePair",
    "namespace" : "org.apache.avro.mapreduce",
    "doc" : "A key/value pair",
    "fields" : [ {
        "name" : "key",
        "type" : "string",
        "doc" : "The key"
    }, {
        "name" : "value",
        "type" : {
            "type" : "map",
            "values" : "bytes"
        },
        "doc" : "The value"
    } ]
}
I have used the avro-tools command-line utility in conjunction with jq to dump the first matching record to JSON:
$ java -jar avro-tools-1.8.1.jar tojson part-m-00000.avro | ./jq --compact-output 'select(.value.pf_v != null)' | head -n 1 | ./jq .
{
    "key": "some-record-uuid",
    "value": {
        "pf_v": "v1\u0003Basic\u0001slcvdr1rw\u001a\u0004v2\u0003DayWatch\u0001slcva2omi\u001a\u0004v3\u0003Performance\u0001slc1vs1v1w1p1g1i\u0004v4\u0003Fundamentals\u0001snlj1erwi\u001a\u0004v5\u0003My Portfolio\u0001svr1dews1b2b3k1k2\u001a\u0004v0\u00035"
    }
}
I run the following Pig commands:
REGISTER avro-1.8.1.jar
REGISTER json-simple-1.1.1.jar
REGISTER piggybank-0.15.0.jar
REGISTER jackson-core-2.8.6.jar
REGISTER jackson-databind-2.8.6.jar
DEFINE AvroLoader org.apache.pig.piggybank.storage.avro.AvroStorage();
AllRecords = LOAD 'part-m-00000.avro'
    USING AvroLoader()
    AS (key: chararray, value: map[]);
Records = FILTER AllRecords BY value#'pf_v' is not null;
SmallRecords = LIMIT Records 10;
DUMP SmallRecords;
The corresponding record in the output of the last command above is as follows:
...
(some-record-uuid,[pf_v#v03v1Basicslcviv2DayWatchslcva2omiv3Performanceslc1vs1v1w1p1g1i])
...
As you can see, the Unicode control characters have been removed from the pf_v value. Those characters are actually used as delimiters within the values, so I need them in order to fully parse the records into their desired normalized state. They are clearly present in the encoded .avro file (as demonstrated by dumping the file to JSON). Is anybody aware of a way to get AvroStorage to preserve them when loading records?
Thank you!
Update:
I have also performed the same operation using Avro's Python DataFileReader:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Scan the container file until the record of interest is found.
reader = DataFileReader(open("part-m-00000.avro", "rb"), DatumReader())
for rec in reader:
    if 'some-record-uuid' in rec['key']:
        print rec
        print '--------------------------------------------'
        break
reader.close()
This prints a dict in which the control characters show up as hex escapes rather than being stripped (which is preferable to removing them entirely):
{u'value': {u'pf_v': 'v0\x033\x04v1\x03Basic\x01slcvi\x1a\x04v2\x03DayWatch\x01slcva2omi\x1a\x04v3\x03Performance\x01slc1vs1v1w1p1g1i\x1a'}, u'key': u'some-record-uuid'}
This is the use case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these parameters are found within product:
"delivery" : {
"maximum_delivery_days" : 30,
"average_delivery_days" : 10,
"source" : 1,
"filling_rate" : 85,
"stock" : 0
}
but other parameters exist as well.
An example of such a query, deciding whether or not to include a product, could be:
"$or" : [
{
"delivery.stock" : 1
},
{
"$or" : [
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 60
}
},
{
"delivery.filling_rate" : {
"$gt" : 90
}
}
]
},
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 40
}
},
{
"delivery.filling_rate" : {
"$gt" : 80
}
}
]
},
{
"$and" : [
{
"delivery.delivery_days" : {
"$lt" : 25
}
},
{
"delivery.filling_rate" : {
"$gt" : 70
}
}
]
}
]
}
]
Now, to make this configurable, I need to be able to handle boolean logic, parameters and values.
So I got the idea, since such a query is itself JSON, to store it in Mongo and have my Java app retrieve it.
The next step is using it in a filter (e.g. find, or whatever) and working on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
    "name": "query1",
    "query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
    "name" : "query1",
    "query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN store it using Robomongo, just not always. Obviously I am doing something wrong, but I have no idea what it is.
If it fails and I create a brand new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", the changes never go through. Never. Not even sometimes.
I can, however, create a new object and discard the previous one, so there is a workaround:
db.queries.update(
    { "name": "query1" },
    { "$set": {
        ... update goes here ...
    } }
)
Doing this results in:
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 52,
        "errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
    }
})
This seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important information you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you end up with attribute names that contain mongo operator keywords (such as $or, $ne, $gt). The mongo documentation actually references this exact scenario (emphasis added):
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust third-party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document so as not to interfere with reserved operator keywords. You can use JSON.stringify(my_obj) to encode your partial query into a string, and then parse/decode it when you retrieve it later on with JSON.parse(escaped_query_string_from_db).
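A minimal sketch of that approach in the shell, assuming a products collection to run the decoded query against (the query object is abbreviated):
// Encode the query as a string before storing it, so no stored field name starts with '$':
var q = { "$or": [ { "delivery.stock": 1 } ] };
db.queries.insert({ "name": "query1", "query": JSON.stringify(q) });
// Decode it again before running it:
var doc = db.queries.findOne({ "name": "query1" });
db.products.find(JSON.parse(doc.query));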
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you would have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names, and those rules do not allow such special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is that Robomongo transforms your query into a string and properly escapes the special characters as it sends it to MongoDB.
This also explains why your attempts to update never work: you thought you created a document, but what was stored is a string object, so your update conditions are probably not matching any docs.
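One quick way to check that theory in the shell (a sketch, reusing the queries collection from the question):
// If Robomongo stringified the query on insert, a type check makes it obvious:
var doc = db.queries.findOne({ "name": "query1" });
print(typeof doc.query);   // "string" if it was escaped/stringified, "object" if stored as a document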
I see two problems with your approach.
In the following query:
db.queries.insert({
    "name" : "query1",
    "query": { the thing printed above starting with "$or"... }
})
valid JSON expects key/value pairs; here, in "query", you are storing an object without a key. You have two options: either store the query as text, or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
So your final document should appear as:
db.queries.insert({
    "name" : "query1",
    "query": 'the thing printed above starting with "$or"... '
})
Now try it; it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers from both @bigdatakid and @lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or, and instead of using a . inside a name I used a #, both of which I replace in my Java code.
This way I can still easily test the queries outside of my program. In my Java program I just change the names back and use the query, using just two lines of code. It simply works now. Thanks guys for the suggestions you made.
// Swap the placeholder names back before parsing the stored query:
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);
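For illustration, the stored document then looks something like this (abbreviated; the _$ prefix and # are the placeholder conventions described above):
db.queries.insert({
    "name": "query1",
    "query": { "_$or": [ { "delivery#stock": 1 } ] }   // renamed keys that Mongo accepts for storage
})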
I'm using MongoDB with Sails.
db.post.find({ body: { "$regex": /^.*twitter.*$/i } })
This query is supposed to find only posts which contain 'twitter' in their body-field.
For some reason it won't find a match if an escaped character such as a newline (\n) is present in the field.
Is this known? What can I do about it?
Here are two examples to showcase the problem:
This document is returned by the find-command above:
{
    "_id" : ObjectId("54e0d7eac2280519ac14cfda"),
    "title" : "Cool Post",
    "body" : "Twitter Twitter twitter twittor"
}
This document is not returned by the find-command above:
{
    "_id" : ObjectId("54e0d7eac2280519ac14cfdb"),
    "title" : "Cool Post with Linebreaks",
    "body" : "This is a cool post with Twitter mentioned into it \n Foo bar"
}
Is this known?
No, it's not an issue with the MongoDB search engine.
What can I do about it?
You could change your regular expression to:
var regex = new RegExp("twitter", "i");
db.posts.find({ body: regex });
The problem with your code is the decimal point (.). According to the docs:
. (the decimal point) matches any single character except the newline character.
This is why the string with a newline character is not taken to be a match.
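Alternatively, MongoDB's $regex supports an "s" (dot-all) option that lets the dot match newline characters, so the original anchored pattern can be kept. A minimal sketch (note the pattern must be passed as a string when $options is used):
// "s" lets '.' match newlines; "i" keeps the match case-insensitive
db.post.find({ body: { "$regex": "^.*twitter.*$", "$options": "si" } })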