MongoDB search with colon in field name - mongodb

Below is the structure of a Mongo entry I have that needs to come up in a search:
{
    "_id" : BinData(0, "lAAXAUCdQp2sBJH7gbGaEku2Lt07G90MXQ7OsfMmRw+UGQ=="),
    "subdata" : {
        "abcd" : "qrst",
        "qwer" : "asdf",
        "abc:def" : "some value"
    },
    "abc" : "zxcv"
}
I am trying to search for and update the value of the field named abc:def from 'some value' to 'true value' using the following:
db.content.find({ "abc": "zxcv" }).forEach(
    function(ae) {
        ae.subdata.abc:def = "true value";
        db.content.save(ae);
    })
I see error "The selection cannot be executed as it contains following errors:
Error at line 3, position 16: :."
Any way to escape the ':' in abc:def?
If I do the same for ae.subdata.qwer the update is made, but not for ae.subdata.abc:def.
The colon in the field name cannot be changed at the moment.
Please suggest a way around this.
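For what it's worth, the mongo shell is JavaScript, and a key containing a colon can be addressed with bracket notation instead of dot notation; nothing needs to be escaped. A minimal sketch of the idea on a plain object:

```javascript
// Dot notation (ae.subdata.abc:def) cannot express a key containing ":",
// but bracket notation can -- the key is just a string.
var ae = { subdata: { "abcd": "qrst", "qwer": "asdf", "abc:def": "some value" } };
ae.subdata["abc:def"] = "true value";
console.log(ae.subdata["abc:def"]); // true value
```

Applied to the loop in the question, the assignment inside the forEach callback would become ae.subdata["abc:def"] = "true value".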

Related

Pig's AvroStorage LOAD removes unicode chars from input

I am using pig to read avro files and normalize/transform the data before writing back out. The avro files have records of the form:
{
  "type" : "record",
  "name" : "KeyValuePair",
  "namespace" : "org.apache.avro.mapreduce",
  "doc" : "A key/value pair",
  "fields" : [ {
    "name" : "key",
    "type" : "string",
    "doc" : "The key"
  }, {
    "name" : "value",
    "type" : {
      "type" : "map",
      "values" : "bytes"
    },
    "doc" : "The value"
  } ]
}
I have used the AvroTools command-line utility in conjunction with jq to dump the first record to JSON:
$ java -jar avro-tools-1.8.1.jar tojson part-m-00000.avro | ./jq --compact-output 'select(.value.pf_v != null)' | head -n 1 | ./jq .
{
  "key": "some-record-uuid",
  "value": {
    "pf_v": "v1\u0003Basic\u0001slcvdr1rw\u001a\u0004v2\u0003DayWatch\u0001slcva2omi\u001a\u0004v3\u0003Performance\u0001slc1vs1v1w1p1g1i\u0004v4\u0003Fundamentals\u0001snlj1erwi\u001a\u0004v5\u0003My Portfolio\u0001svr1dews1b2b3k1k2\u001a\u0004v0\u00035"
  }
}
I run the following pig commands:
REGISTER avro-1.8.1.jar
REGISTER json-simple-1.1.1.jar
REGISTER piggybank-0.15.0.jar
REGISTER jackson-core-2.8.6.jar
REGISTER jackson-databind-2.8.6.jar
DEFINE AvroLoader org.apache.pig.piggybank.storage.avro.AvroStorage();
AllRecords = LOAD 'part-m-00000.avro'
    USING AvroLoader()
    AS (key: chararray, value: map[]);
Records = FILTER AllRecords BY value#'pf_v' is not null;
SmallRecords = LIMIT Records 10;
DUMP SmallRecords;
The corresponding record for the last command above is as follows:
...
(some-record-uuid,[pf_v#v03v1Basicslcviv2DayWatchslcva2omiv3Performanceslc1vs1v1w1p1g1i])
...
As you can see, the unicode chars have been removed from the pf_v value. These characters are used as delimiters within the values, so I need them in order to fully parse the records into their desired normalized state. The unicode characters are clearly present in the encoded .avro file (as demonstrated by dumping the file to JSON). Is anybody aware of a way to get AvroStorage to preserve the unicode chars when loading records?
Thank you!
Update:
I have also performed the same operation using Avro's python DataFileReader:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

reader = DataFileReader(open("part-m-00000.avro", "rb"), DatumReader())
for rec in reader:
    if 'some-record-uuid' in rec['key']:
        print rec
        print '--------------------------------------------'
        break
reader.close()
This prints a dict with what looks like hex chars substituted for the unicode chars (which is preferable to removing them entirely):
{u'value': {u'pf_v': 'v0\x033\x04v1\x03Basic\x01slcvi\x1a\x04v2\x03DayWatch\x01slcva2omi\x1a\x04v3\x03Performance\x01slc1vs1v1w1p1g1i\x1a'}, u'key': u'some-record-uuid'}
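One reading aid (not a fix): Python's repr writes control characters as \xNN while the JSON dump writes them as \uNNNN, but these name the same code points, so the Python reader is in fact preserving the delimiters intact rather than substituting anything:

```javascript
// "\u0003" (the JSON escape) and "\x03" (the Python repr) are the same
// character: code point 3, the ETX control character used as a delimiter.
var fromJson = "\u0003";
var fromPython = "\x03";
console.log(fromJson === fromPython); // true
console.log(fromJson.charCodeAt(0));  // 3
```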

Handling double quotes in parameter with @Query annotation

I have a query implemented in my Elasticsearch repository using the @Query annotation. One of the fields in the query is a String, and I found that when I attempt to execute the query with a parameter containing double quotes, the query fails to parse.
It appears that the ElasticsearchStringQuery class (line 88) simply substitutes the parameter value into the string query without escaping any quotes in the parameter. Am I doing something wrong, or is this a known limitation? If it is known, it seems odd that the caller must escape the quotes in the parameter before executing the finder method on the repository. This litters my code with double-quote stripping or escaping, and it leaks the query format externally (i.e. that double quotes are used to define the query and therefore can't be used in parameters).
An example query:
@Query("{ \"filtered\" : { \"query\" : { \"bool\" : { \"must\" : [ { \"match\" : { \"programTitle\" : { \"query\" : \"?0\", \"type\" : \"boolean\", \"operator\" : \"AND\", \"fuzziness\" : \"0\" } } }, { \"match\" : { \"title\" : { \"query\" : \"?2\", \"type\" : \"boolean\", \"operator\" : \"AND\", \"fuzziness\" : \"0\" } } } ] } }, \"filter\" : { \"term\" : { \"programAirDate\" : \"?1\" } } }}")
The query is being executed with the test case:
repository.save(piece);
List<Piece> actual = repository.findByProgramTitleAndProgramAirDateAndTitle(
        piece.getProgramTitle(),
        airDate.format(DateTimeFormatter.ISO_INSTANT),
        piece.getTitle(), new PageRequest(0, 1));
If piece.getTitle() contains double quotes such as with Franz Joseph Haydn: String Quartet No. 53 "The Lark", the query fails with the error:
at [Source: [B#1696066a; line: 1, column: 410]]; }{[7x1Q86yNTdy0a0zHarqYVA][metapub][4]: SearchParseException[[metapub][4]: from[0],size[1]: Parse Failure [Failed to parse source [{"from":0,"size":1,"query_binary":"eyAgImZpbHRlcmVkIiA6IHsgICAgInF1ZXJ5IiA6IHsgICAgICAiYm9vbCIgOiB7ICAgICAgICAibXVzdCIgOiBbIHsgICAgICAgICAgIm1hdGNoIiA6IHsgICAgICAgICAgICAicHJvZ3JhbVRpdGxlIiA6IHsgICAgICAgICAgICAgICJxdWVyeSIgOiAiQmVzdCBQcm9ncmFtIEV2ZXIiLCAgICAgICAgICAgICAgInR5cGUiIDogImJvb2xlYW4iLCAgICAgICAgICAgICAgIm9wZXJhdG9yIiA6ICJBTkQiLCAgICAgICAgICAgICAgImZ1enppbmVzcyIgOiAiMCIgICAgICAgICAgICB9ICAgICAgICAgIH0gICAgICAgIH0sIHsgICAgICAgICAgIm1hdGNoIiA6IHsgICAgICAgICAgICAidGl0bGUiIDogeyAgICAgICAgICAgICAgInF1ZXJ5IiA6ICJGcmFueiBKb3NlcGggSGF5ZG46IFN0cmluZyBRdWFydGV0IE5vLiA1MyAiVGhlIExhcmsiIiwgICAgICAgICAgICAgICJ0eXBlIiA6ICJib29sZWFuIiwgICAgICAgICAgICAgICJvcGVyYXRvciIgOiAiQU5EIiwgICAgICAgICAgICAgICJmdXp6aW5lc3MiIDogIjAiICAgICAgICAgICAgfSAgICAgICAgICB9ICAgICAgICB9IF0gICAgICB9ICAgIH0sICAgICJmaWx0ZXIiIDogeyAgICAgICJ0ZXJtIiA6IHsgICAgICAgICJwcm9ncmFtQWlyRGF0ZSIgOiAiMjAxNi0wNC0wMVQxMzowMDowMFoiICAgICAgfSAgICB9ICB9fQ=="}]]];
nested: QueryParsingException[[metapub] Failed to parse]; nested: JsonParseException[Unexpected character ('T' (code 84)): was expecting comma to separate OBJECT entries
Decoding the base64 query I can see that the double quotes are causing the issue:
... "title" : { "query" : "Franz Joseph Haydn: String Quartet No. 53 "The Lark"", "type" : "boolean", "operator" : "AND", ...
I can make things work by removing all the double quotes (for example, piece.getTitle().replaceAll("\"", "")), but that seems like something the framework should handle. When using a JDBC repository, the user doesn't have to remove or replace special characters because the JDBC driver does it automatically.
So what is the best way to handle this?
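Pending a framework-level fix, the caller-side escaping can at least be centralized in one helper. A sketch (the helper name is mine, shown in JavaScript for illustration; the same two replacements translate directly to Java): escape backslashes first, then double quotes, before the value is spliced into the JSON query string:

```javascript
// Hypothetical helper: make a raw value safe to embed inside a JSON string
// literal by escaping backslashes and double quotes (in that order).
function escapeJsonStringValue(s) {
    return s.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

var title = 'String Quartet No. 53 "The Lark"';
console.log(escapeJsonStringValue(title)); // String Quartet No. 53 \"The Lark\"
```

The ordering matters: escaping quotes first and backslashes second would double-escape the backslash that the quote escape just introduced.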

mongodb find element within a hash within a hash

I am attempting to build a query to run from Mongo client that will allow access to the following element of a hash within a hash within a hash.
Here is the structure of the data:
{
    "_id" : ObjectId("BSONID"),
    "e1" : "value",
    "e2" : "value",
    "e3" : "value",
    "updated_at" : ISODate("2015-08-31T21:04:37.669Z"),
    "created_at" : ISODate("2015-01-05T07:20:17.833Z"),
    "e4" : 62,
    "e5" : {
        "sube1" : {
            "26444745" : {
                "subsube1" : "value",
                "subsube2" : "value",
                "subsube3" : "value I am looking for",
                "subsube4" : "value",
                "subsube5" : "value"
            },
            "40937803" : {
                "subsube1" : "value",
                "subsube2" : "value",
                "subsube3" : "value I am looking for",
                "subsube4" : "value",
                "subsube5" : "value"
            },
            "YCPGF5SRTJV2TVVF" : {
                "subsube1" : "value",
                "subsube2" : "value",
                "subsube3" : "value I am looking for",
                "subsube4" : "value",
                "subsube5" : "value"
            }
        }
    }
}
So I have tried dotted notation, based on a suggestion for "diving" into a wildcard-named hash, using db.my_collection.find({"e5.sube1.subsube4": "value I am looking for"}), which keeps coming back with an empty result set. I have also tried the find with a regex match instead of an exact value, using /value I am lo/, and still get an empty result set. I know there is at least one document which has the "value I am looking for".
Any ideas - note I am restricted to using the Mongo shell client.
Thanks.
So, since this cannot be handled as a JavaScript/mongo shell array, I will go to plan B: write some code, be it Perl or Ruby, to pull the result set into an array of hashes and walk each document/sub-document.
Thanks Mario for the help.
You have two issues:
1. You're missing one level.
2. You are checking subsube4 instead of subsube3.
Depending on which subdocument of sube1 you want to check, you should do
db.my_collection.find({"e5.sube1.26444745.subsube3": "value I am looking for"})
or
db.my_collection.find({"e5.sube1.40937803.subsube3": "value I am looking for"})
or
db.my_collection.find({"e5.sube1.YCPGF5SRTJV2TVVF.subsube3": "value I am looking for"})
You could use the $or operator if you want to look in any one of the three.
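For example, the three fixed-key lookups can be folded into a single $or query (a sketch using the collection and field names from the question):

```javascript
// One query matching the target value under any of the three known keys.
var query = {
    $or: [
        { "e5.sube1.26444745.subsube3": "value I am looking for" },
        { "e5.sube1.40937803.subsube3": "value I am looking for" },
        { "e5.sube1.YCPGF5SRTJV2TVVF.subsube3": "value I am looking for" }
    ]
};
// In the shell: db.my_collection.find(query)
console.log(query.$or.length); // 3
```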
If you don't know the keys of your documents, that's an issue with your schema design: you should use arrays instead of objects. Similar case: How to query a dynamic key - mongodb schema design
EDIT
Since you explain that you also need the count of "value I am looking for" occurrences, we can run a map-reduce. You can run these commands in the shell.
Define the map function:
var iurMapFunction = function() {
    for (var key in this.e5.sube1) {
        if (this.e5.sube1[key].subsube3 == "value I am looking for") {
            var value = {
                count: 1,
                subkey: key
            };
            emit(key, value);
        }
    }
};
Define the reduce function:
var iurReduceFunction = function(keys, countObjVals) {
    var reducedVal = {
        count: 0
    };
    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
    }
    return reducedVal;
};
Run the mapReduce command:
db.my_collection.mapReduce(iurMapFunction,
    iurReduceFunction, {
        out: {
            replace: "map_reduce_result"
        }
    }
);
Find your counts
db.map_reduce_result.find()
This should give you, for each dynamic key in your object, the number of times it had an embedded field subsube3 with the value "value I am looking for".

Mongo query with regex fails when backslash\newline are present

I'm using MongoDB with Sails.
db.post.find({ body: { "$regex": /^.*twitter.*$/i } })
This query is supposed to find only posts which contain 'twitter' in their body-field.
For some reason it won't find a match if a backslash is present in the field (like a newline).
Is this known? What can I do about it?
Here are two examples to showcase the problem:
This document is returned by the find-command above:
{
"_id" : ObjectId("54e0d7eac2280519ac14cfda"),
"title" : "Cool Post",
"body" : "Twitter Twitter twitter twittor"
}
This document is not returned by the find-command above:
{
"_id" : ObjectId("54e0d7eac2280519ac14cfdb"),
"title" : "Cool Post with Linebreaks",
"body" : "This is a cool post with Twitter mentioned into it \n Foo bar"
}
Is this known?
No, it's not an issue with the MongoDB search engine.
What can I do about it?
You could change your Regular Expression to:
var regex = new RegExp("twitter","i");
db.posts.find( { body: regex});
The problem with your code is the dot (.) in your pattern. According to the documentation:
.
(The decimal point) matches any single character except the newline
character.
which is why the string with a newline character is not taken to be a match.
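The behavior is easy to reproduce with plain JavaScript regexes, since the shell's /.../ literals follow the same rules; dropping the greedy anchored pattern is enough to match the multi-line body:

```javascript
var body = "This is a cool post with Twitter mentioned into it \n Foo bar";

// "." does not match "\n", and without the "m" flag "$" only matches at the
// very end of the string, so the anchored pattern misses the multi-line body:
console.log(/^.*twitter.*$/i.test(body)); // false

// An unanchored pattern matches regardless of newlines:
console.log(/twitter/i.test(body));       // true
```

In the shell that is simply db.post.find({ body: /twitter/i }).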

MongoDB unable to eliminate certain fields from mongodb query result

I have got records in my collection as shown below
{
    "_id" : ObjectId("53722c39e4b04a53021cf3c6"),
    "symbol" : "AIA",
    "tbq" : 1356,
    "tsq" : 0,
    "tquan" : 6831336,
    "tvol" : 17331.78,
    "bquantity" : 1356,
    "squantity" : 0
}
{
    "_id" : ObjectId("53722c38e4b04a53021cf3c1"),
    "symbol" : "SAA",
    "tbq" : 0,
    "tsq" : 9200,
    "tquan" : 6036143,
    "tvol" : 50207.43,
    "bquantity" : 0,
    "squantity" : 9200
}
I am displaying the results sorted by bquantity, and at the same time I want to display only certain fields in the result (symbol, bquantity, squantity) and ignore the rest.
I tried the query below, but it still displays all the fields.
How can I eliminate those fields from the result?
db.stock.find().sort(
    {"bquantity": -1},
    {
        symbol: 1,
        bquantity: 1,
        squantity: 1,
        _id: 0,
        tbq: 0,
        tsq: 0,
        tquan: 0,
        tvol: 0
    }
)
The field filter is a parameter to the find function, not to the sort function, i.e.:
db.stock.find({}, { _id: 0, tbq: 0, tsq: 0, tquan: 0, tvol: 0 }).sort({ "bquantity": -1 })
The empty hash passed as the first parameter to find is required as an 'empty query' that matches all documents in stock.
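The same result can also be written as an inclusion projection, which avoids having to list every field you want to exclude (note that _id is the only field that may be mixed into an inclusion list):

```javascript
// Keep only the wanted fields; _id is included by default and must be
// excluded explicitly.
var projection = { symbol: 1, bquantity: 1, squantity: 1, _id: 0 };
// In the shell: db.stock.find({}, projection).sort({ bquantity: -1 })
console.log(Object.keys(projection).length); // 4
```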