I have a collection full of documents like the following:
{
"_id" : ObjectId("2819738917238dgd21873"),
"mailIntid" : 10000000,
"mailCreated" : "2019-02-08",
"mailLastModified" : null,
"mailReceived" : "2019-02-08",
"mailSend" : "2019-02-08",
"hadAttachment" : false,
"subject" : "nieuwe vacature ",
"bodyPreview" : "Test Body",
"importance" : "normal",
"isDeliveryRequested" : null,
"isReadReceiptRequested" : false,
"isRead" : false,
"isDraft" : false,
"inferenceClassification" : "focused",
"bodyContentType" : "html",
"senderName" : "Jobs",
"senderEmail" : "noreply#test.nl",
"fromName" : "Jobs",
"fromEmail" : "noreply#mailing.test.nl",
"flagStatus" : "notFlagged",
"urls" : "https://test.nl/request-details?id=1337",
"insertDate" : "2019-02-09",
"modifiedDate" : "2019-05-05",
"parseComplete" : true,
"rawMailUrl" : [
{
"url" : "https://test.nl/request-details?id=1337",
"parsed" : false
}
]
}
I created an index with the following code:
db.testAB.createIndex(
{"rawMailUrl.parsed": 1},
{partialFilterExpression: { "rawMailUrl.parsed": false}}
)
But whenever I use the following query, it's not using the index from above.
db.testAB.find({"rawMailUrl.parsed": false})
Any ideas? Am I doing something wrong when creating an index on an array field with a true/false filter expression?
You defined an index with partialFilterExpression: { "rawMailUrl.parsed": false}, i.e. every entry in your index has the same value, false.
An index with a low number of distinct values is a poor index: it does not improve the access path. Thus the index is not used; it is just a waste of disk space.
In one of my collections, deleted records are marked with trash: true. Can I index only the records that are not marked as trash? What indexing strategy should I use here?
{
"_id" : ObjectId("5ad5eb0b9590e413debd81d6"),
"voucher" : ObjectId("5a731de8ffd6f75b1471742f"),
"brand" : ObjectId("58e3b1e1b473f74462d6c9b2"),
"code" : "6DZQ34KD",
"purchaseDate" : ISODate("2018-04-17T12:39:39.287Z"),
"owner" : ObjectId("5ac3460b8dd26664ab57cb66"),
"isInWishlist" : true,
"expiryDate" : ISODate("2018-04-30T20:59:59.000Z"),
"isUsed" : false,
"trash" : true,
"updatedAt" : ISODate("2018-04-17T12:39:39.800Z"),
"createdAt" : ISODate("2018-04-17T12:39:39.288Z"),
"__v" : 0
}
You can create a partial index.
E.g. to index documents with "trash": false by brand:
db.collection.createIndex(
{ brand: 1},
{ partialFilterExpression: { trash: false } }
)
Please note that documents without a trash field will also be excluded from the index.
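The membership rule can be modelled in a few lines of plain JavaScript. This is only a sketch of the rule for this particular filter with scalar values, not how MongoDB implements it; isInPartialIndex and the sample documents are made up for illustration.

```javascript
// Hypothetical model of the filter { trash: false }:
// a document gets an index entry only if its trash field equals false.
function isInPartialIndex(doc) {
  return doc.trash === false;
}

// Made-up sample documents, not taken from the question.
const sampleDocs = [
  { code: "A", trash: false }, // indexed
  { code: "B", trash: true },  // not indexed: marked as trash
  { code: "C" }                // not indexed: no trash field at all
];

const indexedCodes = sampleDocs.filter(isInPartialIndex).map((doc) => doc.code);
console.log(indexedCodes); // [ 'A' ]
```

If documents without a trash field should count as not deleted, they would need the field set to false explicitly to be covered by this index.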
I have this array of documents, and I would like to move the fields inside "table" up to the same level as mastil_antena and the other fields. How can I do that with aggregate?
I tried the aggregate $project stage, but I can't get the result I want.
Example of Data
[ {
"mastil_antena" : "1",
"nro_platf" : "1",
"antmarcmast" : "ANDREW",
"antmodelmast" : "HWXXX6516DSA3M",
"retmarcmast" : "Ericsson",
"retmodelmast" : "ATM200-A20",
"distmast" : "1.50",
"altncramast" : "41.30",
"ORIENTMAG" : "73.00",
"incelecmast" : "RET",
"incmecmast" : "1.00",
"Feedertypemast" : "Fibra Optica",
"longjumpmast" : "5.00",
"longfo" : "100",
"calibrecablefuerza" : "10 mm",
"longcablefuerza" : "65.00",
"modelorruantena" : "32B66A",
"tiltmecfoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114934746000.jpg",
"tiltmecfoto_fh" : "2017-10-18T05:51:22Z",
"az0foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115012727000.jpg",
"az0foto_fh" : "2017-10-18T05:55:21Z",
"azneg60foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115016199000.jpg",
"azneg60foto_fh" : "2017-10-18T05:55:36Z",
"azpos60foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115020147000.jpg",
"azpos60foto_fh" : "2017-10-18T05:55:49Z",
"etiqantenafoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114920853000.jpg",
"etiqantenafoto_fh" : "2017-10-18T05:56:01Z",
"tiltelectfoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114914236000.jpg",
"tiltelectfoto_fh" : "2017-10-18T05:56:13Z",
"idcablefoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114900279000.jpg",
"idcablefoto_fh" : "2017-10-18T05:56:38Z",
"rrutmafoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114947279000.jpg",
"rrutmafoto_fh" : "2017-10-18T05:56:49Z",
"etiquetarrufoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114954648000.jpg",
"etiquetarrufoto_fh" : "2017-10-18T05:57:02Z",
"rrutmafoto1" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114959738000.jpg",
"rrutmafoto1_fh" : "2017-10-18T05:57:12Z",
"etiquetarrufoto1" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115005545000.jpg",
"etiquetarrufoto1_fh" : "2017-10-18T05:57:27Z",
"botontorre4" : "sstelcel3",
"table" : { /* put all variables one level up */
"tecmast" : "LTE",
"frecmast" : "2100",
"secmast" : "1",
"untitled440" : "Salir"
},
"comentmast" : "",
"longfeedmast" : "",
"numtmasmast" : "",
"otra_marca_antena" : "",
"otro_modelo_antena" : ""
}]
Starting from MongoDB version 3.4 you can use $addFields to do this. Note that the values must be field paths prefixed with "$"; without the prefix they are stored as literal strings.
// Replace 'products' with the name of your collection.
// Assumption: each element of the example array is stored as its own document.
db.getCollection('products').aggregate(
[
{ // 1. copy the properties of the subdocument "table" to the top level
$addFields: {
"tecmast" : "$table.tecmast",
"frecmast" : "$table.frecmast",
"secmast" : "$table.secmast",
"untitled440" : "$table.untitled440"
}
},
{
// 2. (optional) remove the now redundant "table" property
$project: {"table" : 0}
}
]
)
Step 1: use $addFields to copy the properties inside table to the top level of each document.
Step 2: (optional) remove the property "table" from each document.
I hope this helps!!!
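For reference, the intended effect of the two steps on a single document can be sketched in plain JavaScript. This is only an illustration of the result, not MongoDB code; liftTable is a hypothetical helper and the sample document is a shortened version of the one in the question.

```javascript
// Hypothetical helper: copy the keys of doc.table to the top level, then drop "table".
// If a key exists both at the top level and inside table, the value from table wins.
function liftTable(doc) {
  const { table, ...rest } = doc;
  return { ...rest, ...(table || {}) };
}

// Shortened version of the document from the question.
const doc = {
  mastil_antena: "1",
  botontorre4: "sstelcel3",
  table: { tecmast: "LTE", frecmast: "2100", secmast: "1", untitled440: "Salir" },
  comentmast: ""
};

const flat = liftTable(doc);
console.log(flat.tecmast);    // LTE
console.log("table" in flat); // false
```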
I have a MongoDB collection in which some documents share the same key (beatmapset_id):
{
"_id" : "fgq7SK7LnLbcFzgSb",
"beatmapset_id" : "59049",
"version" : "Hard",
"artist" : "UNDEAD CORPORATION",
"title" : "Yoru Naku Usagi wa Yume wo Miru",
"creator" : "Strawberry"
}
{
"_id" : "u7M8wibPwxuStNLSM",
"beatmapset_id" : "59049",
"version" : "jom's Taiko Muzukashii",
"artist" : "UNDEAD CORPORATION",
"title" : "Yoru Naku Usagi wa Yume wo Miru",
"creator" : "Strawberry"
}
{
"_id" : "ChRwdj7kbPL7oYgty",
"beatmapset_id" : "59049",
"version" : "jom's Taiko Oni",
"artist" : "UNDEAD CORPORATION",
"title" : "Yoru Naku Usagi wa Yume wo Miru",
"creator" : "Strawberry"
}
How do I publish only unique documents? Something like this:
Meteor.publish("beatmaps", function() {
return Beatmaps.find({ beatmapset_id: 1 }, { unique: true });
});
Right now I solve this with 2 collections: one with all documents, and one from which I removed the duplicates with this code:
db.beatmaps.ensureIndex({ beatmapset_id: 1 }, { unique: true, dropDups: true });
But it's slow, and whenever the first collection is updated I need to update the second one as well. I'm sure it can be done in a more consistent way.
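One way to state the requirement is "keep the first document seen for each beatmapset_id". A minimal sketch of that rule in plain JavaScript follows. uniqueBy is a hypothetical helper that works on an in-memory array; how to wire the result into a Meteor publication is left open here.

```javascript
// Hypothetical helper: keep only the first document for each distinct value of `key`.
function uniqueBy(docs, key) {
  const seen = new Map();
  for (const doc of docs) {
    if (!seen.has(doc[key])) {
      seen.set(doc[key], doc);
    }
  }
  return Array.from(seen.values());
}

// Shortened versions of the three documents from the question.
const beatmaps = [
  { _id: "fgq7SK7LnLbcFzgSb", beatmapset_id: "59049", version: "Hard" },
  { _id: "u7M8wibPwxuStNLSM", beatmapset_id: "59049", version: "jom's Taiko Muzukashii" },
  { _id: "ChRwdj7kbPL7oYgty", beatmapset_id: "59049", version: "jom's Taiko Oni" }
];

const unique = uniqueBy(beatmaps, "beatmapset_id");
console.log(unique.length); // 1
console.log(unique[0]._id); // fgq7SK7LnLbcFzgSb
```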
Hi, I am very new to the HBase database. I downloaded some Twitter data and stored it in MongoDB. Now I need to move that data into HBase to speed up Hadoop processing, but I am not able to create its schema. Here is the Twitter data in JSON format:
{
"_id" : ObjectId("512b71e6e4b02a4322d1c0b0"),
"id" : NumberLong("306044618179506176"),
"source" : "Facebook",
"user" : {
"name" : "Dada Bhagwan",
"location" : "India",
"url" : "http://www.dadabhagwan.org",
"id" : 191724440,
"protected" : false,
"timeZone" : null,
"description" : "Founder of Akram Vignan - Practical Spiritual Science of Self Realization",
"screenName" : "dadabhagwan",
"geoEnabled" : false,
"profileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"profileImageUrlHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"profileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"miniProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"followersCount" : 499,
"profileBackgroundColor" : "EEE4C1",
"profileTextColor" : "333333",
"profileLinkColor" : "990000",
"lang" : "en",
"profileSidebarFillColor" : "FCF9EC",
"profileSidebarBorderColor" : "CBC09A",
"profileUseBackgroundImage" : true,
"showAllInlineMedia" : false,
"friendsCount" : 1,
"favouritesCount" : 0,
"profileBackgroundImageUrl" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageURL" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageUrlHttps" : "https://si0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBannerURL" : null,
"profileBannerRetinaURL" : null,
"profileBannerIPadURL" : null,
"profileBannerIPadRetinaURL" : null,
"miniProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"utcOffset" : -1,
"contributorsEnabled" : false,
"status" : null,
"createdAt" : NumberLong("1284700143000"),
"profileBannerMobileURL" : null,
"profileBannerMobileRetinaURL" : null,
"profileBackgroundTiled" : false,
"statusesCount" : 1713,
"verified" : false,
"translator" : false,
"listedCount" : 6,
"followRequestSent" : false,
"descriptionURLEntities" : [ ],
"urlentity" : {
"url" : "http://www.dadabhagwan.org",
"start" : 0,
"end" : 26,
"expandedURL" : "http://www.dadabhagwan.org",
"displayURL" : "http://www.dadabhagwan.org"
},
"rateLimitStatus" : null,
"accessLevel" : 0
},
"contributors" : [ ],
"geoLocation" : null,
"place" : null,
"favorited" : false,
"retweet" : false,
"retweetedStatus" : null,
"retweetCount" : 0,
"userMentionEntities" : [ ],
"retweetedByMe" : false,
"currentUserRetweetId" : -1,
"possiblySensitive" : false,
"urlentities" : [
{
"url" : "http://t.co/gR1GohGjaj",
"start" : 113,
"end" : 135,
"expandedURL" : "http://fb.me/2j2HKHJrM",
"displayURL" : "fb.me/2j2HKHJrM"
}
],
"hashtagEntities" : [ ],
"mediaEntities" : [ ],
"truncated" : false,
"inReplyToStatusId" : -1,
"text" : "Spiritual Quote of the Day :\n\n‘I am Chandubhai’ is an illusion itself and from that are \nkarmas charged. When... http://t.co/gR1GohGjaj",
"inReplyToUserId" : -1,
"inReplyToScreenName" : null,
"createdAt" : NumberLong("1361801697000"),
"rateLimitStatus" : null,
"accessLevel" : 0
}
How should I divide this data into columns and column families? I thought of making one "twitter" column family that contains source, geoLocation, place, retweet, etc., and another "user" column family that contains name, location, etc. (the user's data), i.e. a new column family for each inner-level sub-document.
Is this approach correct? And how would I differentiate the urlentity of the "user" column family from that of the "twitter" column family?
And how should I handle keys that contain a list of sub-documents (e.g. urlentities)?
There are many ways to model this in HBase, ranging from storing everything in a single column to having a different table for each sub-entity, with several other tables for "indexing".
Generally speaking, you model the data in HBase based on your read and write access patterns. For example, column families are stored in different files on disk, so a reason to divide data into two column families is if there are a lot of cases where you need data from one and not the other.
There's a good presentation about HBase schema design by Ian Varley from HBaseCon 2012; both the slides and the video are available online.
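As an illustration of the "one column family per sub-document" idea from the question, the sketch below flattens a nested document into family:qualifier cells in plain JavaScript. It does not call any HBase API. The helper names flatten and toCells, the family names "twitter" and "user", and the dotted qualifier scheme are all assumptions made for this example, and the sample tweet is heavily shortened.

```javascript
// Flatten nested objects and arrays into dotted keys,
// e.g. "urlentity.url" or "urlentities.0.url". Scalars and null are stored as-is.
function flatten(value, prefix, out) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? prefix + "." + key : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Hypothetical layout: everything under "user" goes to the "user" family,
// all remaining fields go to the "twitter" family.
function toCells(tweet) {
  const { user, ...rest } = tweet;
  const cells = {};
  for (const [qualifier, value] of Object.entries(flatten(rest, "", {}))) {
    cells["twitter:" + qualifier] = value;
  }
  for (const [qualifier, value] of Object.entries(flatten(user || {}, "", {}))) {
    cells["user:" + qualifier] = value;
  }
  return cells;
}

const tweet = {
  source: "Facebook",
  user: { name: "Dada Bhagwan", urlentity: { url: "http://www.dadabhagwan.org" } },
  urlentities: [{ url: "http://t.co/gR1GohGjaj" }]
};

const cells = toCells(tweet);
console.log(cells["user:urlentity.url"]);        // http://www.dadabhagwan.org
console.log(cells["twitter:urlentities.0.url"]); // http://t.co/gR1GohGjaj
```

With this naming, the user's urlentity and the tweet's urlentities never collide because the family prefix differs, and list elements are told apart by their position in the qualifier. Whether two families are worth it still depends on the access patterns described above.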
Can anyone tell me why this query would return this record?
query:
db.sales.findOne({"qualified": true, created_at: {$gte:1334534400}, campaign: {$exists:1}})
record:
{
"_id" : ObjectId("4f8b6ef8da45bb3e600001fe"),
"grand_total" : 9.99,
"order_id" : "T003974723",
"items" : [
{
"price" : 9.99,
"id" : "23958754",
"name" : "UO Daisy Sunglasses",
"qty" : 1
}
],
"created_at" : 1334537675,
"visitor_id" : "1332652531389",
"unique_session_id" : "i-0f7e196e-110800",
"session_id" : "i-0f7e196e-110800",
"qualified" : true,
"needler_id" : 357,
"ip_address" : "76.103.214.29",
"in_session" : true
}
I THOUGHT I was asking mongo to give me back a record that had 'campaign' as a property. However, as you can see, this record has no property called 'campaign'.
Which version of MongoDB are you using? Apparently using $exists: 1 (instead of $exists: true) is only supported from v1.9 on:
https://jira.mongodb.org/browse/SERVER-2322
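For comparison, the intended meaning of the query can be written as a plain JavaScript predicate. This is a sketch of the expected semantics, not of MongoDB's matcher; matchesQuery is a hypothetical name, the record is shortened to the relevant fields, and the campaign value "spring" is made up.

```javascript
// Hypothetical predicate mirroring the three conditions of the query.
function matchesQuery(doc) {
  return doc.qualified === true &&
    doc.created_at >= 1334534400 &&
    Object.prototype.hasOwnProperty.call(doc, "campaign");
}

// Shortened version of the record from the question: no campaign field.
const record = { created_at: 1334537675, qualified: true, needler_id: 357 };

console.log(matchesQuery(record));                           // false
console.log(matchesQuery({ ...record, campaign: "spring" })); // true
```

On an older server, writing the condition as campaign: {$exists: true} is the form the answer above points to.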