Making unique reproducible ids based on strings (for mongoose model) - mongodb

I pull data (a list of properties) from a city's website and it contains info like address, property owner, and longitude and latitude.
I need to derive a unique id from this data, because the properties will be stored in a database. The id needs to be reproducible: I will check periodically for new properties, and the only way to cross-compare is to regenerate the id on the spot.
Is this a common problem when pulling data and checking for anything new? I am using MongoDB through mongoose, and I am not sure a non-relational database is the best solution for my problem, but I know I can make it work.
The current plan is to just combine longitude and latitude and call that the unique id. The city's data unfortunately doesn't have any unchanging unique id so I have to hack something together for comparing their data to what I have saved in the database.
Any ideas for a better, less hacky method? Does mongoose have a different way to deal with this?
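One common way to get a reproducible id is to normalize the stable fields and hash them. The sketch below is only an illustration in Python, not a mongoose feature: the function names, the choice of address plus coordinates as the key, and the 6-decimal rounding are all assumptions. The owner is left out on purpose, since an owner can change while the property stays the same.

```python
import hashlib
import unicodedata


def normalize(value: str) -> str:
    """Collapse Unicode form, case, and whitespace so trivial edits do not change the id."""
    text = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(text.split())


def property_id(address: str, latitude: float, longitude: float) -> str:
    """Return a deterministic hex id for one scraped property record (hypothetical helper)."""
    # Fixed-precision formatting hides tiny floating-point noise in the feed.
    # Six decimal places is an assumed precision, not something the source specifies.
    parts = [normalize(address), f"{latitude:.6f}", f"{longitude:.6f}"]
    # Join with the ASCII unit separator, which is unlikely to occur in address text.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = property_id("123 Main St", 40.0, -75.0)
again = property_id("  123  MAIN st ", 40.0, -75.0)
other = property_id("125 Main St", 40.0, -75.0)
print(first == again, first == other)
```

The resulting string could be stored as the document's _id, since MongoDB keeps a unique index on _id, so saving the same property twice would be rejected or could be turned into an upsert.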

Related

Firestore Structure for public/private fields

I'm after some advice on a Firestore DB structure. I have an app that has a Firestore db and allows a single user (under the one UID) to create a profile for each member of their family (each profile is a document within the collection). In each of the documents, there are the personal details of the family member (as fields. For example, field1 = firstname, field2 = last name, field3 = phone number and so on). This works well but there is one other detail I need to attribute to each and every field within each profile. I need to be able to set a private or public flag against each individual field (for example: firstname has public flag, last name has private flag, Phone number has private flag and so on..). It would be nice if each field could have nested fields underneath (such as a "private" bool field) but that's not how Firestore works. It seems to be Collection/Document/Collection/Document/and so on...
If I didn't need the private/public flag, I would not have an issue. My data would fit the Firestore structure perfectly.
Does anyone have any suggestions on how I might best achieve this outcome?
Cheers and thanks in advance...
[Image: Family Profiles current structure without flags]
You can use the structure above. With this structure you can fetch private data and public data separately whenever you need. But I have to tell you: if you want to show only the first name to other users in your app, you can use queries to control what is shown to users. Also, always use unique ids to store data rather than hardcoded names such as JaneDoe or JoeDoe. Otherwise you may face problems in the future when fetching data from Firestore.
If you have questions feel free to ask
Take a look at the official Firebase documentation. The information there will help you understand which solution is most suitable for working with data structures on this service. As for your question, it depends on your use case; it would be useful if you could give us more context about why your implementation needs to work the way you describe.
Also, since your concerns are about managing the privacy of your data, check this document too.
I hope this information will help you
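For what it's worth, Firestore documents can hold map-type fields, so a per-field flag can be modelled as a small nested map. The sketch below uses plain Python dicts to show the shape only; the field names and the public_view helper are hypothetical, and it does not call any Firebase API.

```python
from typing import Any, Dict

# Hypothetical profile document: every field is a map holding the value and its flag.
profile: Dict[str, Dict[str, Any]] = {
    "firstName": {"value": "Jane", "public": True},
    "lastName": {"value": "Doe", "public": False},
    "phone": {"value": "555-0100", "public": False},
}


def public_view(doc: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return only the values whose flag marks them as public."""
    return {name: field["value"] for name, field in doc.items() if field.get("public")}


print(public_view(profile))
```

Keep in mind that Firestore reads return whole documents, so a filter like this would have to run in trusted code. If other users read the database directly, splitting public and private fields into separate documents, as suggested above, is the safer layout.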

To relate one record to another in MongoDB, is it ok to use a slug?

Let's say we have two models like this:
User:
- _id
- name
- email
Company:
- _id
- name
- slug
Now let's say I need to connect a user to the company. A user can have one company assigned. To do this, I can add a new field called companyID in the user model. But I'm not sending the _id field to the front end. All the requests that come to the API will have the slug only. There are two ways I can do this:
1) Add slug to relate the company: If I do this, I can take the slug sent from a request and directly query for the company.
2) Add the _id of the company: If I do this, I need to first use the slug to query for the company and then use the _id returned to query for the required data.
May I please know which way is the best? Is there any extra benefit when using the _id of a record for the relationship?
Agree with the 2nd approach. There are several issues to consider when deciding on which field to use as a join key (this is true of all DBs, not just Mongo):
The field must be unique. I'm not sure exactly what the 'slug' field in your schema represents, but if there is any chance this could be duplicated, then don't use it.
The field must not change. Strictly speaking, you can change a key field but the only way to safely do so is to simultaneously change it in all the child tables atomically. This is a difficult thing to do reliably because a) you have to know which tables are using the field (maybe some other developer added another table that you're not aware of) b) If you do it one at a time, you'll introduce race conditions c) If any of the updates fail, you'll have inconsistent data and corrupted parent-child links. Some SQL DBs have a cascading-update feature to solve this problem, but Mongo does not. It's a hard enough problem that you really, really don't want to change a key field if you don't have to.
The field must be indexed. Strictly speaking this isn't true, but if you're going to join on it, then you will be running a lot of queries on it, so you'll need to index it.
For these reasons, it's almost always recommended to use a key field that serves solely as a key field, with no actual information stored in it. Plenty of people have been burned using things like Social Security Numbers, drivers licenses, etc. as key fields, either because there can be duplicates (e.g. SSNs can be duplicated if people are using fake numbers, or if they don't have one), or the numbers can change (e.g. drivers licenses).
Plus, by doing so, you can format the key field to optimize for speed of unique generation and indexing. For example, if you use SSNs, you need to check the SSN against the rest of the DB to ensure it's unique. That takes time if you have millions of records. Similarly for slugs, which are text fields that need to be hashed and checked against an index. OTOH, MongoDB's default ObjectId works much like a UUID: it is built from a timestamp, a random value, and a counter, so generating one needs no lookup against existing records (the algorithm gives a high statistical likelihood of uniqueness).
The bottom line is that there are very good reasons not to use a "real" field as your key if you can help it. Fortunately for you, MongoDB already gives you a great key field which satisfies all the above criteria, the _id field. Therefore, you should use it. Even if slug is not a "real" field and you generate it the exact same way as an _id field, why bother? Why does a record have to have 2 unique identifiers?
The second issue in your situation is that you don't expose the company's _id field to the user. Intuitively, it seems like that should be a valuable piece of information that shouldn't be given out willy-nilly. But the truth is, it has no informational value by itself, because, as stated above, a key should have no actual information. The place to implement security is in the query, ensuring that the user doing the query has permission to access the record / specific fields that she's asking for. Hiding the key is classic security by obscurity, which doesn't actually improve security.
The only time to hide your primary key is if you're using a poorly thought-out key that does contain useful information. For example, an invoice Id that increments by 1 for each invoice can be used by someone to figure out how many orders you get in a day. Auto-increment Ids can also be easily guessed (if my invoice is #5, can I snoop on invoice #6?). Fortunately, Mongo's ObjectIds work much like UUIDs, so there's very little information leaking out beyond the creation timestamp embedded in the id (except maybe for timing attacks on its generation algorithm? And if you're worried about that, you need far more in-depth security considerations than this post :-).
Look at it another way: if a slug reliably points to a specific company and user, then how is it more secure than just using the _id?
That said, there are some instances where exposing a secondary key (like slugs) is helpful, none of which have to do with security. For example, if in the future you need to migrate DB platforms and need to re-generate keys because the new platform can't use your old ones; or if users will be manually typing in identifiers, then it's helpful to give them something easier to remember like slugs. But even in those situations, you can use the slug as a handy identifier for users to use, but in your DB, you should still use the company ID to do the actual join (like in your option #2). Check out this discussion about the pros/cons of exposing _ids to users:
https://softwareengineering.stackexchange.com/questions/218306/why-not-expose-a-primary-key
So my recommendation would be to go ahead and give the user the company Id (along with the slug if you want a human-readable format e.g. for URLs, although mongo _ids can be used in a URL). They can send it back to you to get the user, and you can (after appropriate permission checks) do the join and send back the user data. If you don't want to expose the company Id, then I'd recommend your option #2, which is essentially the same thing except you're adding an additional query to first get the company Id. IMHO, that's a waste of cycles for no real improvement in security, but if there are other considerations, then it's still acceptable. And both of those options are better than using the slug as a primary key.
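To make option #2 concrete, here is a minimal in-memory sketch in Python. Plain lists stand in for the two collections; the sample records and both function names are hypothetical, and a real application would run the equivalent two queries against MongoDB.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the Company and User collections.
companies: List[Dict[str, str]] = [
    {"_id": "c1", "name": "Acme Corp", "slug": "acme"},
    {"_id": "c2", "name": "Globex", "slug": "globex"},
]
users: List[Dict[str, str]] = [
    {"_id": "u1", "name": "Ann", "email": "ann@example.com", "companyID": "c1"},
    {"_id": "u2", "name": "Bob", "email": "bob@example.com", "companyID": "c2"},
    {"_id": "u3", "name": "Cy", "email": "cy@example.com", "companyID": "c1"},
]


def find_company_by_slug(slug: str) -> Optional[Dict[str, str]]:
    """Step 1: resolve the public slug to the company record."""
    return next((c for c in companies if c["slug"] == slug), None)


def users_of_company(slug: str) -> List[Dict[str, str]]:
    """Step 2: join on the company's _id, never on the slug itself."""
    company = find_company_by_slug(slug)
    if company is None:
        return []
    return [u for u in users if u["companyID"] == company["_id"]]


print([u["name"] for u in users_of_company("acme")])
```

Because the users only store companyID, renaming a company's slug touches one record and leaves every user link intact, which is the "key must not change" point made earlier.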
The second approach is the best, that is, add the _id of the company.
Using _id is the best practice for querying any kind of information; even complex queries can be solved using _id, as it is a unique ObjectId created by MongoDB. In Mongoose, population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query.

Sails/ Waterline: Ids of associations in query result not accessible without populate

I am fairly familiar with Sails and Waterline.
Situation:
We have a Model Playlist with a many-to-many association to the Model Song on sails-mongo.
When we query for all playlists, we do not want to sideload all associated songs, we just need the ids of the associated songs so that we can load them lazily later.
When we do a populate (with Ember blueprints: populateEach()) we of course get the ids, but it takes around 1s for loading all the Playlists.
Without populate it is just around 50ms.
Complication:
After getting the results with query.exec WITHOUT populate, the ids of the associated songs are not included and not sent back to the requester.
But I can log the ids of associated records to the console via iterating over Object.keys(matchingRecord).
I can set them to a new property of the matchingRecord, all without populating.
However, I cannot explicitly set them to their property name.
Using the original property name is required for the interaction with the Ember frontend.
I tried to mess around with Object.defineProperty - no success. I guess the set/get functions are overwritten to prevent that.
Questions:
How can I make those id arrays visible, or prevent them from being hidden/removed?
Are there any other ideas from your side to maneuver around this issue?
Thank you,
Manuel

Should I use ObjectID or uid(implemented by myself) to identify user?

I am new to MongoDB and databases.
Should I implement a function to generate my own uid, or use the built-in ObjectId?
Which is better?
You should leave ObjectID generation to the clients/drivers. This makes sure that generated IDs are unique across time, server and process. Using the standard ObjectID also means that methods implemented by drivers (such as getTimestamp()) work.
However, if you are thinking of using your own type of ID for the _id field (i.e., not the standard ObjectID type), then that can be a viable choice. For example, if you want to store information about a Twitter user, then using the user's Twitter ID as the _id value makes perfect sense. Personally, I try to rely on the ObjectID type as little as I have to, as collections will often already have a field that uniquely identifies each document.
This depends on three things:
Its source
Where and how are you using the user ID?
Personal opinion.
My personal opinion is that the ObjectID is good enough. However, getting back to the first and second points:
If this ID comes from, or is to be used in, another database such as an SQL database, you might find an incrementing ID a good idea, although SQL and other technologies can fully support the ObjectID in its hexadecimal form.
If this ID is something that will be used much like an account number (think of your account number for car insurance when you phone them up), you might find an ObjectID too difficult for your users to remember or recite; in that case a more human-friendly ID might be more appropriate.
So it really depends on how this ID is being used.
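As a small illustration of what driver helpers such as getTimestamp() rely on: the first 4 bytes of a standard ObjectID are the creation time in seconds since the Unix epoch. The sketch below decodes that part from the 24-character hex form in plain Python. The function name is hypothetical and the sample id is only an example value; a custom uid would not carry this information unless you put it there yourself.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id: str) -> datetime:
    """Decode the creation time embedded in a 24-character ObjectID hex string."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectID is 12 bytes, i.e. 24 hex characters")
    # The first 8 hex characters (4 bytes) are a big-endian count of seconds since the epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("507f1f77bcf86cd799439011")
print(created.isoformat())
```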

Saving JSON data on iPhone

I'm trying to find the best way to save data obtained from JSON.
The website which hosts the data is: "JSON data".
Since I will be using the data in places where I won't have a connection to the internet, I want to save this data on the iPhone itself, with an ability to update when I do have an internet connection.
I'll want to display this data in a table view, and I'll need to be able to filter/search this data. This search will either be on the City, or on the store ID ("no:" in the data). Clicking the row will show a detail view of the store.
I was thinking of storing the data in an SQL table. I'm however unsure of the best way to update the data, and I don't know how to filter the data on two different columns(City/ID)?
Also, if you know a better approach I'd love to hear it!
Your data appears to be a table of addresses, with some sort of "detail" records associated with each address. This is a classic master/detail database. Normally you'd have one table with a record for each address, and assign some sort of unique ID to each address. Then have a second table that's keyed by unique ID to contain all of the detail records.
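The master/detail layout described above, together with the filtering and updating asked about in the question, can be sketched with SQLite. The example uses Python's sqlite3 module only to show the SQL; on the iPhone the same statements would run through whatever SQLite wrapper the app uses. Table names, column names, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store (
        store_no INTEGER PRIMARY KEY,   -- the "no:" value from the JSON feed
        city     TEXT NOT NULL,
        address  TEXT NOT NULL
    );
    CREATE INDEX idx_store_city ON store(city);
    CREATE TABLE store_detail (
        store_no INTEGER NOT NULL REFERENCES store(store_no),
        field    TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (store_no, field)
    );
    """
)


def upsert_store(store_no: int, city: str, address: str) -> None:
    """Insert a store, or overwrite the existing row when the feed is re-downloaded."""
    conn.execute(
        "INSERT OR REPLACE INTO store (store_no, city, address) VALUES (?, ?, ?)",
        (store_no, city, address),
    )


def search(term: str) -> list:
    """Match either a city prefix or an exact store number."""
    store_no = int(term) if term.isdigit() else None
    return conn.execute(
        "SELECT store_no, city, address FROM store "
        "WHERE city LIKE ? OR store_no = ? ORDER BY store_no",
        (term + "%", store_no),
    ).fetchall()


# Hypothetical sample rows; the second call for store 12 simulates an update.
upsert_store(12, "Boston", "1 Old Rd")
upsert_store(12, "Boston", "2 New Rd")
upsert_store(34, "Austin", "9 Elm St")
print(search("bos"), search("34"))
```

Keying every row by the store number is what makes the refresh simple: re-importing the feed overwrites matching rows instead of duplicating them.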