Is there any way to force a schema to be respected? - mongodb

First, I'd like to say that I really love NoSQL & MongoDB but I've got some major concerns with its schema-less aspect.
Let's say I have 2 tables: Employees and Movies.
And... I have a very stupid data layer / framework that sometimes likes to save objects in the wrong tables.
So one day, a Movie gets saved in the Employees table. Like this:
> use mongoTests;
switched to db mongoTests
> db.employees.insert({ name : "Max Power", sex : "Male" });
> db.employees.find();
{ "_id" : ObjectId("4fb25ce6420141116081ae57"), "name" : "Max Power", "sex" : "Male" }
> db.employees.insert({ title : "Fight Club", actors : [{ name : "Brad Pitt" }, { name : "Edward Norton" }]});
> db.employees.find();
{ "_id" : ObjectId("4fb25ce6420141116081ae57"), "name" : "Max Power", "sex" : "Male" }
{ "_id" : ObjectId("4fb25db834a31eb59101235b"), "title" : "Fight Club", "actors" : [ { "name" : "Brad Pitt" }, { "name" : "Edward Norton" } ] }
This is VERY wrong.
Now switch the context and think about Movies and CreditCards (for whatever reason, in this context credit cards would be stored in clear text inside the DB). That would be SUPER wrong:
The code would probably explode because it's trying to use one object structure and receives a totally unknown structure.
Even worse, the code actually works and the webstore visitors
actually see credit card information in the "Rent a movie" list.
Is there anything built in that would prevent such a threat from ever happening? Like some way to "force" a schema to be respected for only some tables?
Or is there any way to force MongoDB to make a schema mandatory? (Can't create new fields in a table, etc)
EDIT: For those who think I'm trolling, I'm really not; this is an important question for me and my team because it's a big part of the decision whether or not we're going to use NoSQL.
Thanks and have a nice day.

The schema-less aspect is one of the major positives.
A DB with a schema doesn't fully remove this kind of issue - e.g. there could be a bug in a system that uses an RDBMS that puts the wrong data in the wrong field/table.
IMHO, the bigger concern would be, how did that kind of bug make it through dev, testing and out into production?!
Having said that, you could set up a process that checks the "schema" of documents within a collection (e.g. look at newly added documents, check whether they have fields you would expect to see in there) - then flag up for investigation. There is such a tool (node.js) here (I think, I've never used it):
http://dhendo.github.com/node-mongodb-schema-validator/
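If you'd rather roll your own check in the mongo shell, a minimal sketch might look like this (the collection and field names are taken from the question; the list of expected fields is an assumption you would adapt to your own data):

// Fields we expect to see in the employees collection (assumption for illustration)
var expectedFields = { "_id": true, "name": true, "sex": true };

// Look at the most recently inserted documents and flag anything with unexpected fields
db.employees.find().sort({ _id: -1 }).limit(100).forEach(function (doc) {
    for (var key in doc) {
        if (!expectedFields[key]) {
            print("Suspicious document " + doc._id + ": unexpected field '" + key + "'");
        }
    }
});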
Edit:
For those finding this question in the future, so the link in my comment doesn't go overlooked, there's a JIRA item for this kind of thing here:
http://jira.mongodb.org/browse/SERVER-3536
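For readers finding this later: MongoDB has since shipped the document validation feature tracked by that ticket. On a server version that supports $jsonSchema validators, forcing the employees collection to only ever contain employee-shaped documents could look roughly like this (a sketch using the fields from the question; adapt it to your real schema and check the docs for your version):

db.createCollection("employees", {
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: ["name", "sex"],
            additionalProperties: false,     // reject documents with extra fields, e.g. a stray Movie
            properties: {
                _id:  { bsonType: "objectId" },
                name: { bsonType: "string" },
                sex:  { bsonType: "string" }
            }
        }
    },
    validationAction: "error"                // refuse the write instead of just warning
});

With that in place, db.employees.insert({ title : "Fight Club" }) fails with a "Document failed validation" error instead of silently polluting the collection.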

Related

RKI_COVID19/FeatureServer - how to use that?

I am looking for a way to use the REST API to get more information out of a COVID19 map.
I noticed that ArcGIS provides plenty of topics and tutorials for developers.
I just don't know which tutorials help me understand the FeatureServer.
I had two questions: can I query the table below with the REST API?
Like finding out what fields are in it, and what data.
If I have access to a serviceItemId - can I do anything useful with it?
"tables" : [
{
"id" : 0,
"name" : "RKI_COVID19",
"parentLayerId" : -1,
"defaultVisibility" : true,
"subLayerIds" : null,
"minScale" : 0,
"maxScale" : 0
}
]
Now I know how the query works.
https://developers.arcgis.com/rest/services-reference/query-related-records-feature-service-.htm
I found answers to what I was looking for.
Try this to access all data (where=1=1) in formatted JSON (pjson):
https://services7.arcgis.com/mOBPykOjAyBO2ZKk/arcgis/rest/services/RKI_COVID19/FeatureServer/0/query?where=1%3D1&outFields=*&f=pjson
Or have a look at this notebook that implements the data retrieval from there:
https://github.com/starschema/COVID-19-data/blob/master/notebooks/RKI_GER_COVID19_DASHBOARD.ipynb
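If you want to pull that data from code rather than the browser, here is a small sketch (assuming a runtime with the Fetch API, e.g. a modern browser or Node 18+; it uses f=json instead of pjson since the result is parsed anyway):

const url = "https://services7.arcgis.com/mOBPykOjAyBO2ZKk/arcgis/rest/services/RKI_COVID19/FeatureServer/0/query"
          + "?where=1%3D1&outFields=*&f=json";

fetch(url)
    .then(function (res) { return res.json(); })
    .then(function (data) {
        // each returned feature carries its field values in "attributes"
        data.features.forEach(function (f) { console.log(f.attributes); });
    });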

Find and Delete documents in MongoDB using a Windows GUI tool

Okay, I'm a SQL Server based DBA, but there's a biz-critical app that uses MongoDB as a core component, and the problem is that it's grown too large. (The "large-ness" isn't critical yet, I'm trying to be proactive!)
Specifically, it has a "message log" collection which is over 400 GB, where the date-stamp is actually stored as a 2-element array [Int64, Int32], the 0th element being some measure of time (and the 1st element is just always '0').
So for example, a document:
{
  "_id" : ObjectId("55ef63618c782e0afcf346cf"),
  "CertNumber" : null,
  "MachineName" : "WORKERBEE1",
  "DateTime" : [
    NumberLong(635773487051900000),
    0
  ],
  "Message" : "Waited 00:00:30.0013381 to get queue lock: Specs to verify",
  "ScopeStart" : false
}
And just because 2 is better than 1, another example document:
{
  "_id" : ObjectId("55ef63618c782e0afcf323be"),
  "CertNumber" : null,
  "MachineName" : "WORKERBEE2",
  "DateTime" : [
    NumberLong(635773487056430453),
    0
  ],
  "Message" : "Waited 00:00:30.0012345 to get queue lock: Specs to verify",
  "ScopeStart" : false
}
I need to figure out two things:
What the heck does that "DateTime" really mean? It's not Unix Epoch time (neither seconds nor milliseconds); and even if I strip off the trailing 0's, it represents (in millis) 6/20/2171, so, unless we're building a time machine here, it makes no sense. If I strip off the last 6 digits, it means 2/23/1990, but even that doesn't seem likely, as this application has only existed since the early 2000's. (AFAIK)
Assuming we figure out #1, can we use some kind of command to remove (delete) all documents in the collection that are older than, say, 1/1/2016?
Again, I'm a SQL guy, so try to explain using analogs in that vein, e.g. "this is like your WHERE clause" and such.
PS: Yes, I read thru questions such as Find objects between two dates MongoDB and How do I convert a property in MongoDB from text to date type? , but so far nothing has jumped out at me.
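For what it's worth, the MongoDB analog of DELETE FROM ... WHERE is a remove with a query document, so once #1 is answered, #2 would look roughly like this (a sketch; the collection name messageLog is a placeholder, and the cutoff value cannot be known until the meaning of DateTime is figured out):

// "DateTime.0" addresses the first element of the array by position, so this is roughly:
// DELETE FROM messageLog WHERE DateTime[0] < cutoff
var cutoff = NumberLong(0);   // placeholder: replace with the value that represents 1/1/2016
db.messageLog.remove({ "DateTime.0": { "$lt": cutoff } });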

MongoDB get last 10 activities

In my social network I want to get the feed for member A. Member A is following, let's say, 20 categories/members.
When a category/member (followed by member A) does an activity, it is inserted into a collection called recent_activity:
{
  "content_id": "6", // content id member A is following
  "content_type_id": "6", // content type (category, other member)
  "social_network_id": "2", // the action the category did (add/like/follow)
  "member_id": "51758", // Member A
  "date_added": ISODate("2014-03-23T11:37:03.0Z"),
  "platform_id": NumberInt(2),
  "_id": ObjectId("532ec75f6b1f76fa2d8b457b"),
  "_type": {
    "0": "Altibbi_Mongo_RecentActivity"
  }
}
When member A logs into the system, I want to get the last 10 activities for those categories/members.
My problem:
How do I get only 10 activities across all categories/members?
Is it better to do it in one query or in a for loop?
For this use case, I'd suggest inverting the logic and keeping a separate object with the last 10 activities for member A that is kept up-to-date all the time. While that solution is more write-heavy, it makes reading trivially simple and it can be extended very easily. I'd like to blatantly advertise a blog post I wrote a while ago about news feeds with MongoDB which outlines this approach.
This 'fan-out' approach might seem overly complex at first, but when you think about importance filtering / ranking (a la facebook), push messages for particularly important events (facebook, twitter) or regular digest emails (practically all), you will get one location in your code to perform all this logic.
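A sketch of what the write side of that fan-out could look like (the feeds collection name and document shape are assumptions, not something from the question): every time a followed category/member does something, push the activity into the follower's feed document and let $slice keep only the newest 10.

db.feeds.update(
    { "member_id": "51758" },                       // member A's feed document
    { "$push": {
        "activities": {
            "$each": [ { "content_id": "6", "social_network_id": "2", "date_added": new Date() } ],
            "$slice": -10                           // keep only the 10 most recent activities
        }
    } },
    { "upsert": true }
);

// reading the feed at login is then a single lookup:
db.feeds.findOne({ "member_id": "51758" }).activities;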
I think I commented that I'm not really seeing the selection criteria. So if you are "outside" of a single collection, then you have problems. But if your indicated fields are the things you want to "filter" by, then just do this:
db.collection.find({
    "social_network_id": "2",
    "content_type_id": "6",
    "content_id": "6",
    "member_id": { "$ne": "51758" }
})
.sort({ "$natural": -1 })
.limit(10);
So what does that do? You match the various conditions in the data to do the "category match" (if I understood what was meant), then you make sure you are not matching entries by the same member.
The last parts do the "natural" sort. This is important because the ObjectId is monotonic, which is math speak for "ever increasing". This means the "newest" entries are always the "highest" value. So descending order is "latest" to "oldest".
And the very final part is a basic "limit". So just return the last 10 entries.
As long as you can "filter" within the same collection in whatever way you want, then this should be fine.

Multilanguage Content in Rails

I'm about to start a new project and need some advice.
For example, if I have a model named "Page" that has "Posts" - how can I store more than one language when I create a new post, and show only posts in one language when I click a - let's say - flag icon at the top?
I have read a lot about i18n, but as I understand it, that is the way to go if I want to translate static messages like errors etc.?
Hope somebody could explain a good strategy to do this in a clean way.
Thanks!
Like you said, localization and internationalization (abbreviated l10n and i18n, respectively) typically refer to the localization of the software product itself, rather than the content.
There are different strategies for managing content in multiple languages, and it does depend a lot on what you want to achieve. Suppose you operate a multilingual blog, but some content is not relevant to an international audience, so you don't want to supply an English version (assuming you're not a native English speaker, but I guess the point is clear).
Now it seems to make sense to simply not display that blog post in the English version of the blog. Hence, I'd suggest something like this:
Post {
  "_id" : ObjectId('...'),
  "PostGroupId" : ObjectId('...'),
  "Title" : "A Blog Post Title",
  "Text" : "<h1>Lorem ipsum</h1> lots of text",
  "Language" : "en",
  "Published" : // and so on...
}
You can now easily query for all or specific posts in a given language: db.Posts.find({"Language" : "en"}).sort({"Published" : -1});
Depending on your needs, you might want to add a grouping object for the posts to associate translations of posts to each other explicitly, using denormalized data:
PostGroup
{
  "_id" : ObjectId('...'),
  // ...
  "Posts" : [ { "lang" : "en", "id" : ObjectId('...') },
              { "lang" : "de", "id" : ObjectId('...') } ]
  // -- or simpler --
  "AvailableLanguages" : ["en", "it", "fr"]
}
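Usage-wise, with the "Posts" array variant you can get from any post to all of its translations in two steps (a sketch; the PostGroups collection name and the postId variable are assumptions):

// find the group that contains the post, then load every language version of it
var group = db.PostGroups.findOne({ "Posts.id": postId });
var ids = group.Posts.map(function (p) { return p.id; });
db.Posts.find({ "_id": { "$in": ids } });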

Why did this line delete everything in my MongoDB database?

Ok, so I'm trying to roll out a small update to my site. One update includes querying upon a field that may or may not exist. This doesn't work as I want, so I decided to just make it so that the field always exists in my database. I used this line at the MongoDB shell:
> db.entries.update({Published: null},{$set: {Published: true}},false,true);
Now, I'm not fully understanding how this caused every entry object where Published is null to be deleted. I mean, they literally were deleted. I tried looking up some IDs, and .findOne will return null for them.
How does that line work? I thought it would take every entry where Published is null (doesn't exist), and set Published to true.
Reading about operator behavior is better than guessing it. Searching for null is different from performing a check for existence.
MongoDB has a dedicated $exists operator:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24exists
To be honest, I'm not sure why it now works with changes, or at least, why it deleted everything with that command.
My ending command looked like this:
db.entries.update({Published: {$exists: false},$atomic: true},{$set:{"Published":true}},false,true);
I thought it would take every entry where Published is null (doesn't exist), and set Published to true.
OK, so these are two different things.
Published is null:
{ Published : null, post : 'blah' }
Published does not exist:
{ post : 'blahblah' }
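To see the difference in query terms: an equality match on null also matches documents where the field is missing entirely, while $exists tells the two cases apart (a quick sketch against the two documents above, using the entries collection from the question):

// matches BOTH documents: the explicit null and the one without the field
db.entries.find({ "Published": null });

// matches ONLY the document where the field is missing
db.entries.find({ "Published": { "$exists": false } });

// matches ONLY the document where the field is present but set to null (BSON null is type 10)
db.entries.find({ "Published": { "$type": 10 } });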
You may want to post this question over at the MongoDB user group (developers check it very often) at http://groups.google.com/group/mongodb-user
Updates do not delete documents. In fact, the update you ran does what you intended, for example, if you wanted y to always have a value:
> db.foo.insert({x:1})
> db.foo.insert({x:2})
> db.foo.insert({y:null})
> db.foo.insert({y:1})
> db.foo.update({y:null},{$set : {y:true}}, false, true)
> db.foo.find()
{ "_id" : ObjectId("4db02aabbe5a5418fb65d24c"), "y" : true }
{ "_id" : ObjectId("4db02aafbe5a5418fb65d24d"), "y" : 1 }
{ "_id" : ObjectId("4db02aa1be5a5418fb65d24a"), "x" : 1, "y" : true }
{ "_id" : ObjectId("4db02aa4be5a5418fb65d24b"), "x" : 2, "y" : true }
There must have been another operation that did the delete. There might be a record of it in the logs (or there might not... it depends how long it took). It's impossible to tell from the info here what caused the deletions, but the update isn't the culprit here.