Generate a JSON schema from an existing MongoDB collection - mongodb

I have a MongoDB collection that contains a lot of documents. They are all roughly in the same format, though some of them are missing some properties while others are missing other properties. So for example:
[
{
"_id": "SKU14221",
"title": "Some Product",
"description": "Product Description",
"salesPrice": 19.99,
"specialPrice": 17.99,
"marketPrice": 22.99,
"puchasePrice": 12,
"currency": "USD",
"color": "red"
},
{
"_id": "SKU14222",
"title": "Another Product",
"description": "Product Description",
"salesPrice": 29.99,
"currency": "USD",
"size": "40"
}
]
I would like to automatically generate a schema from the collection. Ideally it would note which properties are present in all the documents and mark those as required. Detecting unique columns would also be nice, though not strictly necessary. In any event, I will be modifying the schema after it is generated.
I've noticed that there are tools that can do this for JSON. But short of downloading the entire collection as JSON, is it possible to do this using the MongoDB console or a CLI tool directly from the collection?

You could try this tool out. It appears to do exactly what you want.
Extract (and visualize) schema from Mongo database, including foreign
keys. Output is simple json file or html with dagre/d3.js diagram
(depending on command line options).
https://www.npmjs.com/package/extract-mongo-schema
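If a dedicated tool is not an option, the core of what the question asks for (collect the keys, record their types, and mark a key as required only when every document has it) can be sketched in a few lines of Python. This is a minimal illustration and not how extract-mongo-schema works internally; the function name `infer_schema` is made up for this example, it only looks at top-level keys, and it falls back to "string" for types it does not recognise (such as ObjectId or dates), which is an assumption you may want to change.

```python
from collections import Counter

# Map Python types (as decoded from JSON/BSON) to JSON Schema type names.
_JSON_TYPES = {
    bool: "boolean",
    int: "integer",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
    type(None): "null",
}


def infer_schema(documents):
    """Infer a flat JSON Schema from an iterable of dict-like documents."""
    seen = Counter()  # how many documents contain each key
    types = {}        # key -> set of JSON Schema type names observed
    total = 0
    for doc in documents:
        total += 1
        for key, value in doc.items():
            seen[key] += 1
            # Unknown types fall back to "string" (an assumption of this sketch).
            types.setdefault(key, set()).add(_JSON_TYPES.get(type(value), "string"))

    properties = {}
    for key, names in types.items():
        ordered = sorted(names)
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    return {
        "type": "object",
        "properties": properties,
        # A key is required only if every sampled document contained it.
        "required": sorted(k for k, n in seen.items() if n == total),
    }


sample = [
    {"_id": "SKU14221", "title": "Some Product", "salesPrice": 19.99, "color": "red"},
    {"_id": "SKU14222", "title": "Another Product", "salesPrice": 29.99, "size": "40"},
]
schema = infer_schema(sample)
```

With a driver such as pymongo, the same function could be fed a cursor directly, for example `infer_schema(db.products.find().limit(1000))`, where `db.products` is a placeholder for your own collection. That avoids exporting the whole collection to JSON first.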

Related

mLab REST API - Query by string

I recently set up a new database/collection with a simple Json document on mLab. I enabled the data API and am experimenting with filtering the results via the query parameter. The complete document is too large to host within my app, unfortunately.
Here is what the document looks like:
{
"_id": {
"$oid": "5c59f496hv7ec06f4f560f4c"
},
"songs": [
{
"title": "title 1",
"artist": "musician 1",
"album": "fake album",
"minsec": "2:04",
"songid": "11100"
},
{
"title": "fake title",
"artist": "musician 1",
"album": "album 2",
"minsec": "2:57",
"songid": "11102"
},
{
"title": "title 3",
"artist": "musician 2",
"album": "album 3",
"minsec": "3:06",
"songid": "11078"
},
{
"title": "title 4",
"artist": "fake musician",
"album": "album 4",
"minsec": "2:28",
"songid": "11103"
}
]
}
I would love to be able to search the document with a string that would return any object from the array that has a value that includes that string. For example, searching for 'fake' which would return the first, second and fourth object with the following url:
https://api.mlab.com/api/1/databases/<my-db>/collections/<my-collection>?q=fake&apiKey=<my-apikey>
It appears that mLab's data API only processes queries ("q=") written in JSON notation, and even knowing that I still can't figure out how to return anything other than an empty array or a
"Could not parse JSON parameter, please double-check syntax and encoding"
error.
Thanks for any help you guys can offer!
Update: I inserted each song object as an individual document, rather than a single large document, and can filter them using specific queries. I am still unsure how to implement a more wildcard-style filter that returns results whose values contain the query string.
For anyone seeking assistance with mlab related issues, I would suggest reaching out to their very helpful support team.
Final update: I figured out how to use a regex to filter my results satisfactorily:
{$regex: '(?i).*<string>.*'}
If you are using the mLab data API, pass the regex inside the JSON query to filter the results:
https://api.mlab.com/api/1/databases/<my-db>/collections/<my-collection>?apiKey=<my-apikey>&q={"<key>":{"$regex":"(?i).*<value>.*"}}
The (?i) prefix turns case sensitivity off.
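The "Could not parse JSON parameter" error is often down to quoting or URL encoding, so it can help to let a library build the query string. The following is a small sketch using only the Python standard library; the helper names `regex_query` and `build_mlab_url` are invented for this example, and escaping the search text with `re.escape` is an extra precaution that the original post does not mention.

```python
import json
import re
from urllib.parse import urlencode


def regex_query(field, text):
    """Build a case-insensitive 'contains' query for one field."""
    # re.escape keeps characters such as '.' or '*' in the search text literal.
    return {field: {"$regex": "(?i).*" + re.escape(text) + ".*"}}


def build_mlab_url(database, collection, api_key, field, text):
    """Return a data API URL with the JSON query properly URL-encoded."""
    base = "https://api.mlab.com/api/1/databases/{}/collections/{}".format(
        database, collection
    )
    params = {"q": json.dumps(regex_query(field, text)), "apiKey": api_key}
    return base + "?" + urlencode(params)


# Placeholder database, collection, and key values for illustration only.
url = build_mlab_url("my-db", "my-collection", "my-apikey", "title", "fake")
```

Because `json.dumps` always emits double-quoted keys and values, the `q` parameter is valid JSON before it is encoded.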

MongoDB - how to properly model relations

Let's assume we have the following collections:
Users
{
"id": MongoId,
"username": "jsloth",
"first_name": "John",
"last_name": "Sloth",
"display_name": "John Sloth"
}
Places
{
"id": MongoId,
"name": "Conference Room",
"description": "Some longer description of this place"
}
Meetings
{
"id": MongoId,
"name": "Very important meeting",
"place": <?>,
"timestamp": "1506493396",
"created_by": <?>
}
Later on, we want to return (e.g. from a REST web service) a list of upcoming events like this:
[
{
"id": MongoId(Meetings),
"name": "Very important meeting",
"created_by": {
"id": MongoId(Users),
"display_name": "John Sloth",
},
"place": {
"id": MongoId(Places),
"name": "Conference Room",
}
},
...
]
It's important to return the basic information that needs to be displayed on the main page of the web UI, so that no additional calls are needed to render the table. That's why each entry contains the display_name of the user who created it and the name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in the DB (the question-mark values in the Meeting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional calls to db have to be made in order to construct the response
2) Denormalize data:
"place": {
"id": MongoId(Places),
"name": "Conference room",
}
(+) no need for additional calls (response can be constructed based on one document)
(-) data must be updated each time related documents are modified
What is the proper way of dealing with such a scenario?
If I use option 1), how should I query the other documents? Asking for each related document separately seems like overkill. How about getting the last 20 meetings, collecting the IDs of the related documents, and then performing a query like db.users.find({_id: { $in: <id list> }})?
If I go for option 2), how should I keep the data in sync?
Thanks in advance for any advice!
You can keep the DB model you already have and still issue only a single query, because MongoDB introduced the $lookup aggregation stage in version 3.2. It is similar to a join in an RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a reference to other collections, just store the document ID.
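As a rough sketch of what such an aggregation could look like for the Meetings example, the pipeline below is written as plain Python data so it can be passed to a driver. The collection names `users` and `places`, the use of `_id` as the key field, and the helper name `upcoming_meetings_pipeline` are assumptions made for this illustration; the stages themselves ($sort, $limit, $lookup, $unwind, $project) are standard aggregation stages.

```python
def upcoming_meetings_pipeline(limit=20):
    """Build an aggregation pipeline that joins users and places onto meetings."""
    return [
        {"$sort": {"timestamp": 1}},
        {"$limit": limit},
        # Replace the stored user ID with the matching user document.
        {"$lookup": {
            "from": "users",
            "localField": "created_by",
            "foreignField": "_id",
            "as": "created_by",
        }},
        # $lookup always produces an array; unwind it to a single sub-document.
        {"$unwind": "$created_by"},
        {"$lookup": {
            "from": "places",
            "localField": "place",
            "foreignField": "_id",
            "as": "place",
        }},
        {"$unwind": "$place"},
        # Keep only the fields the main page needs.
        {"$project": {
            "name": 1,
            "created_by._id": 1,
            "created_by.display_name": 1,
            "place._id": 1,
            "place.name": 1,
        }},
    ]


pipeline = upcoming_meetings_pipeline(limit=20)
```

With pymongo this would be run as something like `db.meetings.aggregate(pipeline)`, where `db.meetings` is again a placeholder. Note that `$unwind` drops meetings whose user or place cannot be found, which may or may not be what you want.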

Only query field if value has been passed from API in MongoDB

I have created an API in Mule and have a number of queryParameters specified in the RAML file that I would like to use to query my MongoDB stores collection.
A snippet of the collection looks like this:
{
"stores": [{
"storeId": 1234,
"storeName": "Shop Around Ltd",
"opens": "09:00:00.000Z",
"closes": "17:00:00.000Z",
"departments": ["clothing", "computers", "toys", "kitchen and home"],
"address": {
"street": "street1",
"city": "New York",
"state": "New York",
"zipCode": "10002"
}
}]
}
My problem is that the query parameters used to query the collection may be different each time. How can I write a dynamic query for MongoDB that only filters on the fields that have been passed in with values, instead of writing a separate query for each combination of query parameters? For example, one request may query on zip code only, so all the other fields will be blank, and the next may query on the departments and on whether the store will be open at 11am.
The query will be called from Mule.
Thanks
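One common way to approach this is to build the filter document programmatically and add a condition only for parameters that actually carry a value. The sketch below shows the idea in Python rather than Mule, purely to illustrate the logic. It assumes each store is saved as its own document (not wrapped in a "stores" array), and the parameter names `city`, `department`, and `openAt` are hypothetical stand-ins for whatever the RAML file defines.

```python
# Hypothetical mapping from query parameter names to document field paths.
FIELD_PATHS = {
    "storeId": "storeId",
    "storeName": "storeName",
    "city": "address.city",
    "state": "address.state",
    "zipCode": "address.zipCode",
    "department": "departments",  # equality on an array field matches any element
}


def build_store_filter(params):
    """Return a MongoDB filter containing only the parameters that were supplied."""
    query = {}
    for name, path in FIELD_PATHS.items():
        value = params.get(name)
        if value not in (None, ""):
            query[path] = value

    # "Open at" needs a range check on two fields. Comparing the time strings
    # works only because they all share the same fixed format.
    open_at = params.get("openAt")
    if open_at:
        query["opens"] = {"$lte": open_at}
        query["closes"] = {"$gt": open_at}
    return query


by_zip = build_store_filter({"zipCode": "10002", "city": ""})
by_dept = build_store_filter({"department": "toys", "openAt": "11:00:00.000Z"})
```

An empty parameter map produces an empty filter, which matches every document, so you may want to reject that case explicitly.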

Xcode: model adding entities

I am experimenting with an API that returns some fields that start with an underscore, like _id. I am not able to map this field in the .xcdatamodel, because an attribute name must begin with a letter.
I've also tried to map this field as "id" and to provide a key/value pair like id : _id in the "User Info" section, but without success.
Do you have a solution for this problem? As far as I know, there are many APIs that have fields starting with an underscore.
Other non underscore fields are mapped without problems.
{
"__v": 0,
"_avRateDelay": 5,
"_avRateRecommend": 5,
"_avRateStaff": 5,
"_id": "530f733df222bf594b190e0a10",
"_reviews": 1,
"active": 1,
"address": {
"city": "Little Rock",
"country": "USA",
"other": "",
"state": "AZ",
"street": "2701 E Roosevelt Rd",
"zip": "72206"
},
"location": {
"lat": 34.721175,
"lng": -92.24168600000002
},
"name": "Certainteed 69"
}
Don't use id or _id in Objective-C. id is a reserved word. Since many servers like to use that I recommend that you write mapping code so that it is mapped from the server id to something like identifier.
Since you need to write code to parse the fields anyway there is no hardship to look for that key and change it. You can even store the mapping in the NSEntityDescription and set up code to look for other mappings and change them. That way you can change other server styled values like created_at to their Objective-C counterparts like createdAt.
The key/values are editable directly in the model editor and then accessible via the -entity property on the NSManagedObject.

mongodb best practice: nesting

Is this example of nesting generally accepted as good or bad practice (and why)?
A collection called users:
user
  basic
    name : value
    url : value
  contact
    email
      primary : value
      secondary : value
    address
      en-gb
        address : value
        city : value
        state : value
        postalcode : value
        country : value
      es
        address : value
        city : value
        state : value
        postalcode : value
        country : value
Edit: From the answers in this post I've updated the schema applying the following rules (the data is slightly different from above):
Nest, but only one level deep
Remove unnecessary keys
Make use of arrays to make objects more flexible
{
"_id": ObjectId("4d67965255541fa164000001"),
"name": {
"0": {
"name": "Joe Bloggs",
"il8n": "en"
}
},
"type": "musician",
"url": {
"0": {
"name": "joebloggs",
"il8n": "en"
}
},
"tags": {
"0": {
"name": "guitar",
"points": 3,
"il8n": "en"
}
},
"email": {
"0": {
"address": "joe.bloggs@example.com",
"name": "default",
"primary": 1,
"il8n": "en"
}
},
"updates": {
"0": {
"type": "news",
"il8n": "en"
}
},
"address": {
"0": {
"address": "1 Some street",
"city": "Somecity",
"state": "Somestate",
"postalcode": "SOM STR",
"country": "UK",
"lat": 49.4257641,
"lng": -0.0698241,
"primary": 1,
"il8n": "en"
}
},
"phone": {
"0": {
"number": "+44 (0)123 4567 890",
"name": "Home",
"primary": 1,
"il8n": "en"
},
"1": {
"number": "+44 (0)098 7654 321",
"name": "Mobile",
"il8n": "en"
}
}
}
Thanks!
In my opinion the above schema is not "generally accepted", but it looks fine. I would suggest some changes that will make the document easier to query in the future:
User
Name
Url
Emails {email, emailType(primary, secondary)}
Addresses{address, city, state, postalcode, country, language}
Nesting is generally fine, but nesting two or three levels deep can cause additional trouble when querying and updating.
I hope these suggestions help you make the right choice of schema design.
You may want to take a look at schema design in MongoDB, and specifically the advice on embedding vs. references.
Embedding is preferred as "Data is then colocated on disk; client-server turnarounds to the database are eliminated". If the parent object is in RAM, then access to the nested objects will always be fast.
In my experience, I've never found any "best practices" for what a MongoDB record actually looks like. The question to really answer is, "Does this MongoDB schema allow me to do what I need to do?"
For example, if you had a list of addresses and needed to update one of them, it'd be a pain since you'd need to iterate through all of them or know which position a particular address was located. You're safe from that since there is a key-value for each address.
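To make the difference concrete, here is a sketch of the two update shapes, written as pymongo-style filter and update documents. The helper names are invented for this example. The keyed layout addresses a sub-document by its fixed key using dot notation, while the array layout relies on MongoDB's positional `$` operator, which updates the first array element matched by the filter and so needs some identifying field to query on.

```python
def keyed_address_update(user_id, key, field, value):
    """Addresses stored under fixed keys, e.g. {"address": {"0": {...}}}."""
    return (
        {"_id": user_id},
        {"$set": {"address.{}.{}".format(key, field): value}},
    )


def array_address_update(user_id, match_field, match_value, field, value):
    """Addresses stored as an array, e.g. {"address": [{...}, {...}]}."""
    return (
        # The array field must appear in the filter for "$" to resolve.
        {"_id": user_id, "address.{}".format(match_field): match_value},
        {"$set": {"address.$.{}".format(field): value}},
    )


keyed = keyed_address_update("u1", "0", "city", "Newcity")
positional = array_address_update("u1", "postalcode", "SOM STR", "city", "Newcity")
```

Either pair could be passed to a driver call such as `db.users.update_one(*keyed)`, with `db.users` standing in for the real collection.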
However, I'd say nix the basic and contact keys. What do these really give you? If you index name, it'd be basic.name rather than just name. AFAIK, there are some performance impacts to long vs. short key names.
Keep it simple enough to do what you need to do. Try something out and iterate on it...you won't get it right the first time, but the nice thing about mongo is that it's relatively easy to rework your schema as you go.
That is acceptable practice. There are some problems with nesting an array inside of an array. See SERVER-831 for one example. However, you don't seem to be using arrays in your collection at all.
Conversely, if you were to break this up into multiple collections, you would have to deal with a lack of transactions and the resulting race conditions in your data access code.