Using a string vs an id as a foreign key in MongoDB

I have a collection users whose documents will belong to a company (and each company can have many users). Because I set a unique index on the company name, can I use the name as the foreign key inside the user document, or is it recommended to use the id instead?

If the name is unique and guaranteed never to change, then you can use it, no problem. That said, I have seen cases where names turned out to be not-so-unique and not-so-immutable (damn requirement changes). So, just to be extra safe, use the id.
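To make the risk concrete, here is a minimal sketch using plain Python dicts as stand-ins for the two collections (no MongoDB driver involved). The field names company_id and company_name and the sample data are assumptions for illustration only. It shows what happens to each kind of reference when a company gets renamed.

```python
# In-memory stand-ins for the two collections; a real app would use a MongoDB driver.
companies = {
    1: {"_id": 1, "name": "Acme"},
}
users = [
    # Hypothetical user document carrying both kinds of reference for comparison.
    {"_id": 10, "username": "alice", "company_id": 1, "company_name": "Acme"},
]


def company_by_id(user):
    """Resolve the user's company through the id reference."""
    return companies.get(user["company_id"])


def company_by_name(user):
    """Resolve the user's company through the name reference."""
    for company in companies.values():
        if company["name"] == user["company_name"]:
            return company
    return None


# A requirement change: the company is renamed.
companies[1]["name"] = "Acme Corp"

by_id = company_by_id(users[0])      # still resolves to the renamed company
by_name = company_by_name(users[0])  # dangling: no company is called "Acme" any more
```

With the name as the key, every user document would have to be updated together with the rename; with the id, nothing else needs to change.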

Related

Deciding primary key for DynamoDB

I have 3 fields to store in DynamoDB: identity-1, identity-2, score.
identity-1 and identity-2 are always unique in the table, i.e. no two entries can have same identity-1 or identity-2.
We want to allow entries to either have one of identity-1 or identity-2 or have both. Example:
identity-1 | identity-2 | score
a1         | b1         | s1
a2         | -          | s2
-          | b3         | s3
Access patterns are as follows:
Query identity-2 from identity-1
Query score from identity-1
Query score from identity-2
How do I define primary key in such case?
This is a "many:1" problem and there are a few ways to tackle it with DynamoDB. The simple answer here is to leverage Global Secondary Indexes (GSIs). For every "identity" you want to do a direct lookup from, you'd create a GSI.
GSI-1 would include Identity-1 as the hash key and you'd include Identity-2 and any other identities as a non-key attribute to include. You'd create a GSI for each identity you wanted to query directly on. You could also include the score as a non-key attribute if you wanted to directly look up score from any identity without having to resolve to the primary key (which we'll talk about).
The thing to consider with GSIs, though, is that they consume extra storage and throughput. If you create a GSI which includes all your attributes for every identity, you'd be paying for an additional copy of your table for each identity.
The other issue, so far, is that you haven't chosen a Primary Key for your table. You'll need a field to be your primary key and if none of your identities is non-nullable, you'll need a field which will be. It's often convenient to just call it what it is, so we'll call it pk.
You've got a few choices for pk here. One is to define pk as a composite of your identities. For example: item.pk = item["identity-1"] || item["identity-2"]. Then you could query the table for identity == pk and, if you don't find anything, look up the index for the given identity. This works fine for your simple example, but once you want to do more complex things (such as many different identity types), you might find it to be a bit of a headache.
From past experience, though, my recommendation would be to adjust your approach slightly and have a "users" table and a "scores" table. "users" would have a pk of a GUID that is unique per user and shared by all of that user's identities (call it "user_id"); you could then create a GSI on that table for every identity, mapping back to user_id. "scores" would use "user_id" as its pk as well, with no need for an index. Your application would always resolve to a "user_id" when a user logged in or was otherwise identified. You can then look up the score without needing to track which identity was used, and you can look up all the associated identities or other user information without creating a very "fat" index from every identity to every other identity.
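The users/scores layout can be sketched in memory, with plain dicts playing the role of the two tables and of one GSI per identity. This is not DynamoDB API code: the helper names (add_user, resolve, score_for, identity_2_for) are hypothetical, and the sample rows are the ones from the question.

```python
import uuid

users = {}          # "users" table: user_id -> identities
scores = {}         # "scores" table: user_id -> score
by_identity_1 = {}  # stands in for a GSI keyed on identity-1
by_identity_2 = {}  # stands in for a GSI keyed on identity-2


def add_user(score, identity_1=None, identity_2=None):
    """Create a user with one or both identities and store its score."""
    if identity_1 is None and identity_2 is None:
        raise ValueError("at least one identity is required")
    user_id = str(uuid.uuid4())
    users[user_id] = {"identity-1": identity_1, "identity-2": identity_2}
    scores[user_id] = score
    # Only users that actually have the identity appear in that index.
    if identity_1 is not None:
        by_identity_1[identity_1] = user_id
    if identity_2 is not None:
        by_identity_2[identity_2] = user_id
    return user_id


def resolve(identity_1=None, identity_2=None):
    """Map whichever identity is known to the user_id, or None."""
    if identity_1 is not None:
        return by_identity_1.get(identity_1)
    return by_identity_2.get(identity_2)


def score_for(**identity):
    """Look up the score from either identity."""
    user_id = resolve(**identity)
    return None if user_id is None else scores[user_id]


def identity_2_for(identity_1):
    """Look up identity-2 from identity-1."""
    user_id = by_identity_1.get(identity_1)
    return None if user_id is None else users[user_id]["identity-2"]


add_user("s1", identity_1="a1", identity_2="b1")
add_user("s2", identity_1="a2")
add_user("s3", identity_2="b3")
```

All three access patterns from the question go through user_id, so adding a third identity type would only mean one more index, not a rework of the key.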

Attribute creation from one single table

I have one table.
I have to make attributes only from the fields on that table.
I have to use these attributes on one report.
My question: all the attributes I have made show up as keys. Is this fine? If not, how do I resolve this issue?
Keys are like primary and foreign keys in an RDBMS. They define the joins.
So long as you do not have other tables involved in the design, this is fine.
Ideally, attributes are made only for dimensions. For example, you could make an attribute called Issue with the forms (Issue id, Issue desc, Issue date), with Issue id as the ID form that drives the join with the other tables.
Not all attributes should be keys. Each key indicates that the tool is interpreting that attribute as a primary key. Set a proper relationship (parent-child) between the attributes and you will see keys only for the child attribute(s).

Does a weak entity need a partial key?

Does a weak entity need a partial key? Or can you just use its parent key as its primary key.
For example, Order and OrderItem: Order has a PK OrderPK, whilst OrderItem has no partial key.
Is this considered bad practice?
The OrderItem table should have an OrderID field that makes a FK reference back to the Orders table. This assures each item is for a valid order.
Then there is usually another field which distinguishes each item and which is used together with the OrderID field to form the primary key for the item.
This could be an intrinsic value (or set of values) that is unique for each item within an order. SKU or PartNum might be just such a value, assuming that multiple occurrences of the same item would be merged into one entry. To find this value, just ask yourself what minimum amount of data you would need to uniquely identify one item from another within the same order. However, such a value may not exist. A disadvantage of this scheme is that you could be using dynamic data for a key field. The SKU of a particular item could well change some time in the future.
Or there could be a sequential value (1, 2, 3,...) for each item in an order. A disadvantage with this scheme is the sequential values cannot be system generated. Each sequence is unique for each order and this must be generated by trigger or application code.
Or there could be a system-generated sequential value unique to all the items for all the orders and this field could be the lone primary key. Per-order sequential values could still be generated by row_number functions in queries, but this means a particular item could have different values in different queries. That may or may not be a problem.
At this point, only you know enough about your system to choose the best option. But it is generally necessary for users to be able to select one specific item of one specific order, so some sort of key definition for each item is usually necessary.
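As an illustration of the per-order sequential option, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (orders, order_items, line_no, sku, quantity) and the add_item helper are assumptions made for the example. The line number is the partial key, generated in application code, and the primary key is the pair (order_id, line_no).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY
    );
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        line_no  INTEGER NOT NULL,   -- partial key: unique only within one order
        sku      TEXT    NOT NULL,
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")


def add_item(conn, order_id, sku, quantity):
    """Insert an item, assigning the next line number for this order in application code."""
    (next_no,) = conn.execute(
        "SELECT COALESCE(MAX(line_no), 0) + 1 FROM order_items WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO order_items (order_id, line_no, sku, quantity) VALUES (?, ?, ?, ?)",
        (order_id, next_no, sku, quantity),
    )
    return next_no


conn.execute("INSERT INTO orders (order_id) VALUES (1)")
conn.execute("INSERT INTO orders (order_id) VALUES (2)")
first = add_item(conn, 1, "SKU-A", 2)   # line 1 of order 1
second = add_item(conn, 1, "SKU-B", 1)  # line 2 of order 1
other = add_item(conn, 2, "SKU-A", 5)   # numbering restarts for order 2
```

The MAX + 1 lookup is only safe for a single writer; with concurrent writers it would need a transaction or lock around the read and the insert.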

Are non-Long primary keys possible?

Is it possible to define a non-Long primary key?
Motivation: I have a set of XML files to convert to rdb. String attributes are used as unique keys.
Not possible.
From docs:
What should you do when you need to specify the id yourself?
Nothing. You shouldn’t do that. The id property is supposed to be generated and managed by db only. If you need to specify some external unique identifier, like, for instance, Amazon’s ASIN, just add an appropriate field to your entity and specify it as unique on SORM instantiation.
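The pattern the docs describe (a database-generated id plus a separate unique field for the external identifier) is not specific to SORM. As a rough, framework-independent sketch, here it is with Python's built-in sqlite3 module rather than SORM itself; the product table, its columns, and the sample identifier are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated and managed by the database
        asin  TEXT NOT NULL UNIQUE,               -- external string identifier
        title TEXT NOT NULL
    )
""")


def insert_product(conn, asin, title):
    """Insert a row without specifying the id; return the generated id."""
    cur = conn.execute(
        "INSERT INTO product (asin, title) VALUES (?, ?)", (asin, title)
    )
    return cur.lastrowid


def find_by_asin(conn, asin):
    """Look a row up by its external identifier instead of its id."""
    return conn.execute(
        "SELECT id, asin, title FROM product WHERE asin = ?", (asin,)
    ).fetchone()


generated_id = insert_product(conn, "EXAMPLE-ASIN-1", "Sample item")
row = find_by_asin(conn, "EXAMPLE-ASIN-1")

# The unique constraint rejects a second row with the same external identifier.
try:
    insert_product(conn, "EXAMPLE-ASIN-1", "Duplicate")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

For the XML import, the string attributes would go into the unique column and all lookups would use that column, while the numeric id stays an internal detail.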

Query to database with 'primary key' on GoogleAppEngine?

I've made a guestbook application using Google App Engine (GAE) with Python, and the client runs on iPhone.
It lets users post messages to the board under a nickname.
The entity has 3 fields:
nickname
date
message
I'm about to add another feature that lets a user post a reply (or comment) to a message.
To do this, I think the guestbook entity needs a 'primary key', so that I can store which message a reply belongs to.
With those three fields alone, I can't fetch just one message from the database.
I'm a newbie to databases. Does the database save some kind of index automatically, or does the user have to do it?
And whether or not the database does it automatically, how can I get just one entity by its key?
I'd also like some general advice on how to build a reply feature. Thanks for reading.
Every entity has a key. If you don't assign a key_name when you create the entity, part of the key is an automatically-assigned numeric ID. Properties other than long text fields are automatically indexed unless you specify otherwise.
To get an entity if you know the key, you simply do db.get(key). For the replies, you probably want to use a db.ReferenceProperty in the reply entity to point to the parent message; this will automatically create a backreference query in the message to get replies.
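To show the shape of the reply feature without the App Engine SDK, here is a small in-memory sketch. It is not the Datastore API: the put, get, and replies_to helpers and the parent_key field are hypothetical stand-ins for storing an entity, fetching it by key, and following the back-reference from a message to its replies.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

_next_id = itertools.count(1)  # mimics automatically assigned numeric ids
_store = {}                    # key -> entity, a stand-in for the datastore


@dataclass
class Message:
    nickname: str
    message: str
    date: datetime = field(default_factory=datetime.now)
    key: Optional[int] = None


@dataclass
class Reply:
    parent_key: int  # reference to the message being replied to
    nickname: str
    message: str
    key: Optional[int] = None


def put(entity):
    """Store an entity, assigning a key on first save, and return the key."""
    if entity.key is None:
        entity.key = next(_next_id)
    _store[entity.key] = entity
    return entity.key


def get(key):
    """Fetch exactly one entity by its key."""
    return _store.get(key)


def replies_to(message_key):
    """Back-reference: all replies that point at the given message."""
    return [
        e for e in _store.values()
        if isinstance(e, Reply) and e.parent_key == message_key
    ]


msg_key = put(Message(nickname="guest", message="Hello"))
put(Reply(parent_key=msg_key, nickname="host", message="Welcome"))
put(Reply(parent_key=msg_key, nickname="guest2", message="Hi there"))
```

The iPhone client would only need to send the message key along with a reply; the server stores it on the reply and uses it to list the replies for that message.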
Each entity has a key. It contains information such as the kind of entity it is, its namespace, its parent entities, and, most importantly, a unique identifier (optionally user-specifiable).
You can get the key of an entity using the key method that all entities have.
message.key()
A key can be converted to and from a URL-safe string.
message_key = str(message.key())
message = Message.get(message_key)
If the key has a user-specified unique identifier (key name), you can access it like this
message.key().name()
Alternatively, if a key name was not specified, an id will be automatically assigned.
message.key().id()
To assign a key name to an entity, you must specify it when creating the entity; you cannot add, remove, or change the key name afterwards.
message = Message(key_name='someusefulstring', content='etc')
message.put()
You will then be able to fetch the message from the datastore using the key name
message = Message.get_by_key_name('someusefulstring')
Use a db.ReferenceProperty to store a reference to another entity (it can be of any kind).
It's a good idea to use a key name whenever possible, because fetching from the datastore by key name is much faster: it doesn't involve a query.