MongoDB time series performance on just a secondary index - mongodb

I am building a mongoose model to store survey response data. There are, however, different types of surveys with different response rates. One type of survey gets frequent answers (perhaps every few seconds), and its data is normally queried in chunks of time, e.g. from a startDate to an endDate of the responses. Other surveys, however, only get maybe 20 responses a month, and sometimes I want to fetch all the data for such a survey based on the survey_id alone, without any constraint on the time field.
So my question is: do secondary indexes on time series collections work as well as they would on a non-time-series collection?
My model looks like this:
const responseSchema = mongoose.Schema(
  {
    metaData: {
      type: new mongoose.Schema({
        survey_id: { type: mongoose.Schema.Types.ObjectId, ref: "survey", required: true },
      }),
      required: true,
    },
    createdAt: Date,
    answers: { type: Map, of: mongoose.Mixed },
  },
  {
    timeseries: {
      timeField: "createdAt",
      metaField: "metaData",
      granularity: "seconds",
    },
  }
);
responseSchema.plugin(ts);
responseSchema.index({ "metaData.survey_id": 1, "createdAt": 1 });
I would expect normal queries that use the createdAt field as a filter to work well, but what if I only query by survey_id and don't use the time field at all? Will that still perform well, or do I get performance degradation by not filtering on the time field of a time series collection?
Queries on this collection will always be based on the survey_id.
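For reference, a minimal sketch of the two query shapes I have in mind (assuming the compiled model is called Response; the surveyId, startDate and endDate variables are placeholders):
// Time-bounded query: filters on both the metaField and the timeField
Response.find({
  "metaData.survey_id": surveyId,
  createdAt: { $gte: startDate, $lte: endDate },
});

// Metadata-only query: no constraint on the time field at all
Response.find({ "metaData.survey_id": surveyId });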

Related

mongoose indexing? grouping?

I'm kinda new to mongoose, and I'm not sure if I'm using the right terms.
What I'm building is a community site (like Reddit), and I have a schema like the one below:
const postSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true,
  },
  title: {
    type: String,
    required: true,
  },
  userId: {
    type: mongoose.Schema.Types.ObjectId,
    required: true,
    ref: 'User',
  },
  board: {
    type: String,
    required: true,
    enum: ['board1', 'board2'],
  },
  created_at: {
    type: Date,
    default: Date.now,
  },
  updated_at: {
    type: Date,
  },
})
There are many kinds of 'board', and I'm not sure whether it can be 'indexed'.
The purpose is to fetch posts faster.
For example, in SQL (assuming the board column is indexed):
select * from post where board = 'board1';
I'm confused about the terms and need some direction.
Short answer:
You need to create an index on the field board:
db.post.createIndex(
  { board: 1 },
  { name: "board index" }
)
Long answer:
Indexing in MongoDB trades memory for running time.
Let's take an example: say you have all the words in English in your DB, and while reading a book you occasionally need to look up a word to check its meaning.
How would you do that? With a dictionary. You would sort the words alphabetically, and then you could easily find every word you wanted.
Indexing applies the same concept. When you create an index on the field board, MongoDB takes all its values, sorts them and saves them in a separate structure (with a reference from each entry back to the full document in your collection).
Now when you run select * from post where board = 'board1', it first uses that sorted structure of boards, finds the entries equal to board1 and then, via the references, returns the full documents that belong to them. You can continue reading here.
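If you are defining the model with Mongoose (as in the schema above), the same index can also be declared on the schema itself; a minimal sketch, where the query is just an illustration of the SQL example above:
// Declare the index on the schema; Mongoose creates it on the collection at startup
postSchema.index({ board: 1 });

const Post = mongoose.model('Post', postSchema);

// Equivalent of: select * from post where board = 'board1'
Post.find({ board: 'board1' }).then((posts) => console.log(posts.length));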

Which approach is better?

I want to create a collection for users' ratings, and I'm torn between two schema structures.
First schema:
var Rating = new mongoose.Schema({
  userID: {
    type: String,
    minlength: 1,
    required: true,
    trim: true
  },
  ratings: [{
    rate: {
      type: Number
    }
  }]
});
Second schema:
var Rating = new mongoose.Schema({
  userID: {
    type: String,
    required: true,
  },
  rating: {
    type: Number,
    required: true
  },
});
With the first schema, every new rating gets pushed into the ratings array of a single document per user; with the second, every rating is inserted as a separate document, so there are multiple documents with the same userID, each containing one rating.
I would like to know which approach is recommended: growing the array or adding a new document each time the user receives a rating.
It depends on the details of your project (there is no single schema that is universally best).
The first structure is closer to the MongoDB ideology, but do not forget about the document size limit (16 MB, unless you use GridFS). This structure is better if you do not have a large amount of information (items in the ratings field). Because all ratings live in one document, your indexes stay optimally small (one user - one document).
The second schema is better when you have a large number of ratings (relative to the document size limit).
You can also use two collections: one for aggregated data (the final results after calculations, something like a cache) and another for the detailed information. As mentioned before, the best solution depends on the details of the project.
I recommend reading the article 6 Rules of Thumb for MongoDB Schema Design.
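To make the difference concrete, here is a minimal sketch of what recording a new rating looks like in each approach, assuming Rating here is the compiled model (mongoose.model('Rating', ...)) and userId / newRate are placeholders:
// First schema: one document per user, push the new rating into the embedded array
Rating.updateOne(
  { userID: userId },
  { $push: { ratings: { rate: newRate } } },
  { upsert: true }
);

// Second schema: one new document per rating
Rating.create({ userID: userId, rating: newRate });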

Best way to store and organize data in MongoDB

I have users in MongoDB, and each user has an interface allowing them to set their current state of hunger as a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options at any time. For example, one use case would be "in the morning, record how my hunger is", and the user can put "not hungry" and "full". They can record their hunger at any time of day, as many times as they want.
Should I store the data as single entries and then group the data by date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected, along with a date?
It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, and it's usually better to double your disk space than to double your queries.
If you're only going to query by date, then group all users/states by date. If you're only going to query by user, then group all dates/states by user. If you're going to query by both, you could simply maintain two collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{
  date: '1494288000',
  time-of-day: [
    {
      am: [
        { user: asdfas, hunger-state: [hungry, full] },
        { user: juhags, hunger-state: [full] }
      ],
      pm: [
        { user: asdfas, hunger-state: [hungry, full] },
        { user: juhags, hunger-state: [full] }
      ]
    }
  ]
}
It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  hunger: ['HUNGRY', 'FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  hunger: 'HUNGRY',
}
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.099Z',
  hunger: 'FAMISHED',
}
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  isHungry: true,
  hunger: 'FAMISHED',
}
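If you do store one document per recorded state, as in the examples above, grouping them by day for the UI later is a simple aggregation; a sketch, assuming the collection is called hungerStates and timestamp is stored as a real Date:
db.hungerStates.aggregate([
  { $match: { user_id: '5358e4249611f4a65e3068ab' } },
  {
    $group: {
      // bucket all of a user's entries by calendar day
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$timestamp' } },
      states: { $push: '$hunger' }
    }
  },
  { $sort: { _id: 1 } }
])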

Sort populated record in sails waterline

I created a Sails application with two models, Publication and Worksheet, which have a one-to-one relationship. Sails-postgresql is the adapter, and I'm using the Waterline ORM to query the database. When I try to load publication data along with the worksheet and then sort the records on a field of the Worksheet using sort(), I get an error.
My model is:
Publication.js
module.exports = {
  attributes: {
    id: {
      type: 'integer',
      unique: true
    },
    worksheetId: {
      type: 'integer',
      model: 'worksheet'
    },
    status: {
      type: 'string',
      defaultsTo: 'active',
      in: ['active', 'disabled'],
    }
  }
}
Worksheet.js
module.exports = {
  attributes: {
    id: {
      type: 'integer',
      unique: true
    },
    name: 'string',
    orderWeight: {
      type: 'integer',
      defaultsTo: 0
    }
  }
}
So now I want to load all the publications where status is "active" and populate the worksheet in the data.
So I'm executing the query:
Publication.find({
  where: {
    status: 'active'
  }
})
  .populate('worksheetId')
  .limit(1)
  .exec(function (error, publications) {
  });
And I'm getting data like:
{
  id: 1,
  status: "active",
  worksheetId: {
    id: 1,
    name: "test",
    orderWeight: 10
  }
}
So far it's all working fine. Now I want to increase the limit to 10 and sort the data by "orderWeight", which lives in the populated data. Initially I sorted the whole result set by publication id, and that query worked.
Publication.find({
  where: {
    status: 'active'
  }
})
  .populate('worksheetId')
  .sort('id ASC')
  .limit(10)
  .exec(function (error, publications) {
  });
So I fired a similar query to sort the data on "orderWeight":
Publication.find({
  where: {
    status: 'active'
  }
})
  .populate('worksheetId')
  .sort('worksheetId.orderWeight ASC')
  .limit(10)
  .exec(function (error, publications) {
  });
And this query gives me an error saying that worksheetId.orderWeight is not a column on the publication table. So I want the sort to apply to the populated data, not to the publication table.
Please let me know how I can get my expected result.
Apart from the sort() method, I also want to run a find against the populated data, to get those publications where the worksheet name matches a certain key.
Basically, what you're trying to do is query an association's attribute. This has been on the Waterline roadmap since 2014, but it's still not supported, so you'll have to figure out a workaround.
One option is to query the Worksheet model and populate the Publication, since Sails doesn't let you query across models without using raw queries (i.e. .sort('worksheetId.orderWeight ASC') doesn't work). Unfortunately, you might have to move the active flag to the Worksheet. For example:
Worksheet.find({
  status: 'active'
})
  .populate('publication') // you should also add publication to Worksheet.js
  .sort('orderWeight ASC')
  .limit(10)
  .exec(function (error, worksheets) {
    // each worksheet now carries its populated publication
  });
Alternatively, you could combine Worksheet and Publication into one model, since they're one-to-one. Probably not ideal, but Sails.js and Waterline make it very difficult to work with relational data - I'd estimate that half of the queries in the project I'm working on are raw queries, due to Sails' poor support for Postgres. The framework is pretty biased towards MongoDB, although it claims to "just work" with any of the "supported" DBs.
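If neither restructuring option fits, the raw-query escape hatch mentioned above would look roughly like this; a sketch, assuming the classic Model.query() method exposed by the SQL adapters (such as sails-postgresql) and Waterline's default table/column names - adjust to your actual schema:
var sql =
  'SELECT p.* FROM publication p ' +
  'JOIN worksheet w ON w.id = p."worksheetId" ' +
  'WHERE p.status = $1 ' +
  'ORDER BY w."orderWeight" ASC ' +
  'LIMIT 10';

// Raw results come back as plain rows, not Waterline records
Publication.query(sql, ['active'], function (err, rawResult) {
  if (err) { return console.error(err); }
  console.log(rawResult.rows);
});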

Is there a MongoDB maximum bson size work around?

The document I am working with is extremely large. It collects user input from an extremely long survey (like SurveyMonkey) and stores the answers in a MongoDB database.
I am, unsurprisingly, getting the following error:
Error: Document exceeds maximal allowed bson size of 16777216 bytes
If I cannot change the fields in my document, is there anything I can do? Is there some way to compress the document, for example by removing whitespace?
Edit
Here is the structure of the document
Schema({
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },
  last_modified: { type: Date, default: Date.now },
  data: { type: Schema.Types.Mixed, required: true }
});
An example of the data field:
{
  id: 65,
  question: {
    test: "some questions",
    answers: [2, 5, 6]
  }
  // there could be thousands of these question objects
}
One thing you can do is build your own MongoDB :-). MongoDB is open source, and the limitation on document size is rather arbitrary, meant to enforce better schema design. You could just modify this line and build it yourself. Be careful with this.
The most straightforward idea is to store each small question in its own document, with a field that references its parent.
Another idea is to limit the number of elements embedded in the parent. Let's say your limit is N elements; then the parent looks like this:
{
  _id: ObjectId(),
  id: { type: Number, required: true },
  created: { type: Date, default: Date.now },        // you can store it only for the first element
  last_modified: { type: Date, default: Date.now },  // the same here
  data: [{
    id: 65,
    question: {
      test: "some questions",
      answers: [2, 5, 6]
    }
  },
  // ... up to N of such things
  ]
}
This way, by tuning the number N, you can make sure each document stays within the 16 MB BSON limit. In order to read the whole survey you can run db.coll.find({ id: <the id you need> }) and then combine the whole survey at the application level. Also, do not forget to ensureIndex on id.
Try different things, do a benchmark on your data and see what works for you.
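A minimal sketch of how appending answers to such N-element buckets could work; the count field, the surveyId variable and the bucket size are assumptions for illustration, not part of the original schema:
var BUCKET_SIZE = 500; // N: tune it so a full bucket stays well under 16 MB

// Only a bucket that still has room matches the filter; when every bucket for
// this survey is full, the upsert creates a fresh one with the same id.
db.coll.updateOne(
  { id: surveyId, count: { $lt: BUCKET_SIZE } },
  {
    $push: { data: { id: 65, question: { test: "some questions", answers: [2, 5, 6] } } },
    $inc: { count: 1 },
    $setOnInsert: { created: new Date() }
  },
  { upsert: true }
);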
You should be using GridFS. It allows you to store large data by splitting it into chunks. Here's the link: http://docs.mongodb.org/manual/reference/gridfs/
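For reference, storing an oversized survey blob through the Node driver's GridFSBucket API might look roughly like this; a sketch, where the connection string, database name, bucket name and the surveyDoc variable are placeholders:
const { MongoClient, GridFSBucket } = require('mongodb');

async function saveSurvey(surveyDoc) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('surveys'), { bucketName: 'responses' });

  // Serialize the oversized document and stream it into GridFS chunks
  const payload = Buffer.from(JSON.stringify(surveyDoc));
  await new Promise((resolve, reject) => {
    bucket.openUploadStream(String(surveyDoc.id))
      .on('finish', resolve)
      .on('error', reject)
      .end(payload);
  });

  await client.close();
}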