Best Practices - Should meta data and functional defining data be intermixed?


Consider the case of a simple news article web application that has a DB table column of "Status" that is accessible by a radio button set of:
Status - [x] Publish [ ] Draft [ ] Archive
...where "Publish" shows an article publicly and "Draft" and "Archive" do not. Functionally "Draft" and "Archive" do the same thing but carry additional meta data meanings. The two functional states of "show" and "hide" along with the meta data of "publish", "draft" and "archive" are intermixed in the same column of "status".
Is this a good practice? While this is a very simple case, larger cases might reveal flaws with such a practice (or not...).

Functional states are about behavior - they do not need to be modeled in your database. If your business logic only cares about "showing" articles with a status of "Published" - there's no reason to double the complexity of your data with a Show column.
At the point that you decide your business logic needs additional data to make the decision whether to show or hide an article (perhaps an IsApproved flag), then you can store that data.
Looking at it from a different angle - if you were to add another column of "Show", then what would an article with a status of "Draft" and "Show" = 1 do? According to your business rules, that is an invalid state.
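As a minimal sketch of that idea (in Python, with hypothetical names such as `Status` and `is_visible` that are not from the question), the show/hide behavior can be derived from the single stored status instead of being stored alongside it:

```python
from enum import Enum


class Status(Enum):
    # The only value stored per article (assumed names, mirroring the radio buttons).
    PUBLISH = "publish"
    DRAFT = "draft"
    ARCHIVE = "archive"


def is_visible(status: Status) -> bool:
    """Derive the functional state (show/hide) from the stored status."""
    return status is Status.PUBLISH


articles = [
    {"title": "Launch day", "status": Status.PUBLISH},
    {"title": "Unfinished interview", "status": Status.DRAFT},
    {"title": "Last year's news", "status": Status.ARCHIVE},
]

# Only published articles are shown; there is no separate "show" flag to contradict the status.
public_titles = [a["title"] for a in articles if is_visible(a["status"])]
print(public_titles)
```

Because visibility is computed, a combination such as "Draft" with "Show" = 1 simply cannot be represented.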

In this instance, I would say that this is the appropriate functionality.
We've all seen WTFs in the media where someone accidentally hit show[x] and draft[x] at the same time.
The way it is now, it is impossible to accidentally show a draft. This is important in newspapers as reporters are notorious for stuff like:
John Doe, of StackOverflow said, "---I can't remember what that ugly f*cker said - Check the tape and fill in later"
Which probably shouldn't be printed.

Related

How to build complex relationships in CoreData correctly?

I am dealing with CoreData, for training, I decided to create a small application for recording user income and expenses. CoreData tutorials all contain To-Do-List examples, and I haven't found any good examples that would help me.
I want to apologize for grammatical errors in the text. Unfortunately, English is not my native language, so in some places I used a translator. If something is not clear, I will definitely try to explain it again.
When I began thinking about how to implement the application, I assumed the most convenient way would be to save all user operations and do the calculations in the application where they are needed. I am keeping this abstract because it seems to have little to do with the question; if you need more detail, I can describe the complete idea.
So, I'm going to save the user model, which will have the following data:
User operations (Operation type) - all operations are saved; each operation includes the category it was performed for and the amount in currency.
User-selected categories (Category type) - categories used for expenses or income when adding an operation.
Wallets (Wallet type) - the user's wallets; each has just a name and a balance.
Budget units (BudgetUnit type) - the user's budgets; each contains a category and a budget for it. For example: Products - 10.000 $
When I started building dependencies in CoreData, I got a little strange behavior.
That is, the user has a relationship to the same category model as the Budget Unit and the Operation. Something tells me that it won't work that way.
I want the user's categories to be independent: the user selects them, I display them on the main screen, and each operation has its own category model.
In the picture above, the category model is used 3 times, the same model.
This is roughly how I represent the data structure that I would like to see. Different models have their own category model, independently of the others.
I think it could be implemented using 3 different models with the same values, but it seems to me that this approach is considered wrong.
So how do you properly implement the data model so that everything works as expected? I would be grateful for any help!
--- EDIT ---
As a solution to the problem, I can create multiple entities as Category (example below)
But I don't know if this is good practice

I looked into several other open source projects and saw a solution to the problem.
I hope this helps someone in the future.
There is no need to save the categories on the user. You can simply store the categories in the application and give each one an ID and an IsSelected parameter; selecting a category changes that parameter, so you immediately know which ones to display.
For budgets and operations (transactions), we only need to save the category ID to display the correct one.
For example:
Thanks to @JoakimDanielson and @Moose for helping. It gave me a different view of the subject.
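The shape of that solution can be sketched outside Core Data. The following is a plain-Python illustration, not Core Data code; the class names follow the question, while the field names (`is_selected`, `category_id`, `limit`) and the sample categories are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass
class Category:
    id: int
    name: str
    is_selected: bool = False  # flipped when the user picks the category


@dataclass
class Operation:
    amount: float
    category_id: int  # only the ID is stored, not a copy of the category


@dataclass
class BudgetUnit:
    limit: float
    category_id: int


# Categories live in one place in the application (hypothetical sample data).
CATEGORIES = {
    1: Category(1, "Products"),
    2: Category(2, "Transport"),
    3: Category(3, "Salary"),
}


def select_category(category_id: int) -> None:
    CATEGORIES[category_id].is_selected = True


def selected_categories() -> list:
    """Categories to display on the main screen."""
    return [c for c in CATEGORIES.values() if c.is_selected]


def category_of(item) -> Category:
    """Resolve the category of an operation or budget unit by its stored ID."""
    return CATEGORIES[item.category_id]


select_category(1)
operation = Operation(amount=-25.0, category_id=1)
budget = BudgetUnit(limit=10000.0, category_id=1)
print(category_of(operation).name, [c.name for c in selected_categories()])
```

Both the operation and the budget resolve to the same category object, so there is a single definition of each category and nothing is duplicated.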

Watson Assistant: Can I define Intent using Entities in the Examples?

How do I create an #Intent which looks something like this:
How much is a #ProductType?
Whereas the #ProductType is a simple Entity which consists of:
Soft Drinks: Coke, Pepsi, Sprite, Fanta
Fruits: Apple, Banana, Watermelon
I tried adding an Intent with the above settings, but it doesn't seem to work. Is this natively supported in IBM Watson? Or do I need to handle it manually in the Dialog, using Conditions and the like? Please kindly advise.

The training is based on regular language and typical sentences or phrases. So #ProductType is not what you want in the phrase, but any of the fruits or drinks.
By defining the entities, Watson Assistant later learns the connection between them and the intents, and how to identify both.
To get started, you define the intents and entities. Both can be imported from lists. Then you add the dialog which references the different types.

This blog should give insight into all the ways to train an entity and how it is used within intents.
https://medium.com/ibm-watson/all-about-entities-dictionaries-and-patterns-with-watson-assistant-part-1-5ef7254df76b
There are a number of possible pipelines you can choose from.
1. Indirect references: this is the preferred method.
Use natural language in your intent training data. "I want to buy a pear"
Watson will automatically see the other values you have related to pear and use those as intent training as well. This will be the fastest and simplest way to manage your data.
2. Direct references: this should only be used if absolutely necessary
Directly reference the entity in your intent data. "I want to buy an #pear"
Nothing is done in the UI to tell you this works, but it does. This tells Watson the entity is a very important term and will increase its weight, as well as reference all synonyms with high weight. It takes more effort, because you have to go through your entire workspace and relabel everything this way, which is why it is not recommended unless absolutely necessary. By doing this, you also tell Watson to ignore fruits that appear without the # symbol as entities, which is not ideal.
3. Contextual entities. This is highlighting them like in your screenshot.
Note the UI has been updated so there is now an annotation mode instead of just highlighting. This builds a model around the entity, and is good for things like names or locations, but not necessary for a small list of items like crayons in a box, or fruit in a store. This will ignore all of the dictionary values you've created and only look at the model. It should be used, as the blog above describes, when the use case is ideal.

What @data_henrik answered was partially correct. But Watson Assistant doesn't seem to "automatically" learn the preferred #Entity just from plain-text Examples entered into the #Intent. That step is required, but one more step is needed.
After keying in good plain-text Examples for the #Intent, we still need to right-click the text that represents a possible #Entity value and then choose the correct #Entity name from the dropdown list that appears, which teaches Watson the association.
Only then does Watson start to understand that this #Intent uses that #Entity, I suppose.
Thank you @data_henrik, I appreciate your hint.

Practical usage of noSQL [closed]

I’m starting a new web project and have to decide what database to use. I know, the question is very long but please bear with me on this.
I am very familiar with relational databases and have used frameworks like hibernate to get my data from the DB into Objects. But I have no experience with noSQL DBs. I am aware of the concepts of Document, Key-Value, etc. types.
While I do my research one question pops out every time and I don’t know how someone would handle this in noSQL DBs like MongoDB or any other Document-Typed noSQL DB where consistency takes top priority.
For example: let’s assume that we are creating a small shopping management system where customers can buy and sell stuff.
We have:
CUSTOMERs
ORDERs
PRODUCTs
A single CUSTOMER can have multiple ORDERs and an ORDER can have multiple PRODUCTs.
In a traditional RDBMS I would of course have 3 tables.
In the first version of our application, the front end for the customer should display his/her personal data, ORDERs and all the PRODUCTs he or she bought per order. Also which products are available for sale. So I guess in noSQL I would model the CUSTOMER class like this:
{
  "id": 993784,
  "firstname": "John",
  "lastname": "Doe",
  "orders": [
    {
      "id": 3234,
      "quantity": 4,
      "products": [
        {
          "id": 378234,
          "type": "TV",
          "resolution": "1920x1080",
          "screenSize": 37,
          "price": 999
        }
      ]
    }
  ],
  "products": [
    {
      "id": 7932,
      "type": "car",
      "sold": false,
      "horsepower": 90
    }
  ]
}
But later I want to extend my application to have 3 different UIs instead of only the first one:
The CUSTOMER Dashboard where a customer can view all his/her orders.
The PRODUCT Dashboard where a customer can add or remove products in his/her store.
The SOLD Dashboard where a customer can view all sold PRODUCTs ready for shipping.
One very important thing to consider (the reason why I even bother asking this question): I want to be flexible with the classes like PRODUCT because products can have different properties. For Example: A TV has screen size and resolution while a car has horsepower and other properties. And if a user adds a new product, he or she should be able to dynamically add those properties depending on what he/she knows about it.
Now to some practical use cases of two fictional users Jane and John:
Let's say Jane buys from John. Does that mean I have to create the PRODUCTs twice? Once as a child of Jane's ORDER and once more to stay in the "products" property of John?
Later Jane wants to view all products that are available from any user. Do I have to load every user and query the "products" property to generate a list of all products?
In version 2 of the application I want to enable John to view all outgoing orders (not orders he made but orders from other users who bought stuff from him) instead of viewing all sold products. How would this be done in noSQL? Would I now need to create an "outgoing" array of orders and duplicate them? (An outgoing order of Jane is an incoming order of John.)
Some of you may say that noSQL is not right for this use case but isn’t that very common? Especially when we do not know what the future brings? If it does not fit for this use case, what use case would it fit into? Only baby applications (I guess not)? Wasn’t noSQL designed for more complex and flexible data?
Thank you very much for your advice and opinions!
EDIT 1:
Because this question was put on hold for being imprecise:
I made a very clear and simple example. So my question is not about the use of noSQL in general but about how to handle this specific example. How would an experienced noSQL user handle this use case? How should this data be modeled? A recommendation to simply not use noSQL at all for this use case is also a valid answer to me.
I simply want to know how to use a noSQL database but still be able to manage entities and avoid redundancy.
For example: Are MongoDB's DBRefs/Manual refs a good way to achieve this? Performance issues because of multiple queries? What else to think about? I guess these questions can probably be answered quite well.

There probably isn't the one right answer to your question. But I'll make a start.
While it is technically possible in NoSQL to store some business entity together with all entities that are transitively linked with it (like Customer, Order, Product), it isn't always clever to do so. The traditional reasons for separating entities, namely redundancies and therefore update and delete anomalies, don't just go away because a different platform is used.
So if you stored the product description with every customer who buys or sells this product, you will get update anomalies. If you have to change the screen size from 37 to 35, you'll have to find all customer records containing this product, which can be quite cumbersome.
Also, building up such a deep nested structure favors one direction of evaluating those structures over all other directions. If you put all orders and products into the customer document, this is very fine for getting a comprehensive view for a customer: whatever she bought throughout her lifetime. But if you want to query your database by orders (which orders need to be fulfilled tonight?) or products (who ordered product 1234?) you'll have to load tons of data that are of no interest to this query.
Similar questions arise from storing all orders with a customer. Old orders will sometimes still be of interest, so they may not be deleted. But do you want to load lots of orders every time you load the customer?
This doesn't mean not to make use of the complex structuring made possible by a document store. As a rule of thumb, I would suggest: As long as the nested information belongs to the same business entity, put it into one document. If, e.g., the product description has some hierarchic structure, like nested sections consisting of text, pics, and videos, they may all go into one document. But entities with a totally different life cycle, like customers, orders, and suppliers, should be kept separate. Another indicator is references: A product will frequently be referenced as a whole, e.g. when it is ordered by a customer or ordered from a supplier. But the different parts of the product description may possibly never be referenced from the outside.
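To make the rule of thumb concrete, here is a rough sketch using plain Python dictionaries to stand in for three separate document collections. The field names `buyer_id`, `seller_id` and `product_ids` are assumptions introduced for the illustration; the IDs and product attributes are taken from the question's example:

```python
# Each entity with its own life cycle gets its own collection (dict keyed by id).
customers = {
    1: {"id": 1, "firstname": "Jane"},
    2: {"id": 2, "firstname": "John"},
}

# Product documents stay flexible: each may carry its own attributes.
products = {
    378234: {"id": 378234, "seller_id": 2, "type": "TV",
             "resolution": "1920x1080", "screenSize": 37, "price": 999, "sold": True},
    7932: {"id": 7932, "seller_id": 2, "type": "car", "horsepower": 90, "sold": False},
}

# Orders reference customers and products by id instead of embedding copies.
orders = {
    3234: {"id": 3234, "buyer_id": 1, "quantity": 4, "product_ids": [378234]},
}


def orders_for_product(product_id):
    """Who ordered product X? No customer documents need to be loaded."""
    return [o for o in orders.values() if product_id in o["product_ids"]]


def outgoing_orders(seller_id):
    """Orders that contain at least one product sold by the given customer."""
    return [
        o for o in orders.values()
        if any(products[pid]["seller_id"] == seller_id for pid in o["product_ids"])
    ]


def products_for_sale():
    """All products still available, regardless of who sells them."""
    return [p for p in products.values() if not p["sold"]]


# Changing the screen size touches exactly one document.
products[378234]["screenSize"] = 35
print([o["id"] for o in outgoing_orders(2)], [p["id"] for p in products_for_sale()])
```

In a real document store each lookup would be a query against the corresponding collection, typically backed by an index on the referenced id; the trade-off is extra queries in exchange for having no duplicated product data to keep in sync.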
This rule of thumb wasn't completely precise, and it's not supposed to be. One person's business entity is another person's dumb attribute. Imagine the color of a car: For the car owner, it's just a piece of information describing a car. For the manufacturer, it's a business entity, having an availability, a price, one or more suppliers, a way of handling it, etc.
Your question also touches the aspect of dynamically adding attributes. This is often praised as one of the goodies of NoSQL, but it's no free lunch. Let's assume, as you mentioned, that the user may add attributes. That's technically possible, but how will these attributes be processed by the system? There won't be a specific view, nor specific business rules, for those attributes. So the best the system can do is offer some generic mechanism for displaying those attributes that were defined at runtime and never reflected in the program code.
This doesn't mean the feature is useless. Imagine your product description may be complex, as described above. You might build a generic mechanism to display (and edit) descriptions made up of sections, texts, images, etc., and afterwards the users may enter descriptions of unlimited width and depth. But in contrast, imagine your user will add a tiny delivery date attribute to the order. Unless the system knows specifically how to interpret this date, it will just be a dumb piece of information without any effect.
Now imagine that not the user but the developer adds new attributes. She has the opportunity to enhance the code at the same time, e.g. building some functionality around delivery dates. But this means that, although the database doesn't require it on its own, a new release of the software needs to be rolled out to make use of the new information.
The absence of a database schema even makes the programmer's task more complicated. When a relational table has a certain column, you may be sure that each of its records has this column. If you want to make sure that it has a meaningful value, make it not null, and you may be sure that each record contains a value of the correct data type. Nothing like that is guaranteed by schemaless databases. So, when reading a record, defensive programming is needed to find out which parts are present, and whether they have the expected content. The same holds for database maintenance via administrative tools. Adding an attribute and initializing it with a default value is a 2-liner in SQL, or a couple of mouse clicks in pgAdmin. For a schemaless database, you will write a short program of your own to achieve this.
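A small Python sketch of what that defensive reading and the hand-written backfill might look like. The `deliveryDate` field name and its ISO date format are assumptions chosen to match the delivery date example above, and the documents are plain dictionaries rather than records from any particular database:

```python
from datetime import date
from typing import Optional


def read_delivery_date(order: dict) -> Optional[date]:
    """Return the delivery date if it is present and well formed, else None."""
    raw = order.get("deliveryDate")   # the attribute may be missing entirely
    if not isinstance(raw, str):      # or be of an unexpected type
        return None
    try:
        return date.fromisoformat(raw)
    except ValueError:                # or be a string that is not a date
        return None


def backfill_default(documents: list, field: str, default) -> int:
    """Add a field with a default value to every document that lacks it."""
    changed = 0
    for doc in documents:
        if field not in doc:
            doc[field] = default
            changed += 1
    return changed


orders = [
    {"id": 1, "deliveryDate": "2024-03-01"},
    {"id": 2},                            # written before the attribute existed
    {"id": 3, "deliveryDate": "soon"},    # free text entered by a user
]

print([read_delivery_date(o) for o in orders])
print(backfill_default(orders, "deliveryDate", None))
```

The relational equivalent of `backfill_default` is the two-line `ALTER TABLE ... ADD COLUMN` plus `UPDATE` mentioned above, and the type check in `read_delivery_date` is what a typed, non-null column would have guaranteed.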
This doesn't mean that I dislike NoSQL databases. But I think the "schemaless" characteristic is sometimes overestimated, and I wouldn't make it the main, or only, reason to employ such a database.

More RESTful naming

The more I search Google and SO the more RESTful naming seems like more of a black art than a standard. I would like to lay out a scenario I have and my current line of thinking and ask those of you with REST experience to weigh in.
I have a "Packet" object that you could think of as a physical folder or binder. Inside this packet are one or more forms. My system represents these as a resource called PacketForms. Each form record has a SortIndex column that defines what order the forms are displayed/printed in.
When I display the list of forms in the application, there are up/down arrows that allow the user to change the form's SortIndex. So, now I'm ready to implement this action.
My first thought was to have an operation specifically for promoting/demoting a form in the sort order. If I go with this approach, based on what I've seen here, it seems that I should think about the sort index itself as a resource. So, I could express my intention in a query string like so, right?
PUT /PacketFormSortIndex/5?action=Promote
But I've also thought why not just update the PacketForm itself and let the back-end look for changes in the SortIndex. Rather than a promote/demote approach, if a SortIndex is changed I will swap it with the form that currently has that index. So, if someone updates the PacketForm with SortIndex=3 to have a value of SortIndex=2, the system would update both records to accomplish the swap.
Personally, I like the atomic nature of the first approach. It has a very specific, clear purpose and the code on the back end is cleaner. But if I propagate that logic across my system I worry a bit about "resource sprawl."
So, I guess I have a two-part question here. Which, if either, of these approaches feels more "RESTy" to you? If it is the first, is it appropriate to use the query string in the manner I proposed, or is there a more RESTful way to organize that URL?
For something that is being used so widely, I'm really struggling with the wide variety of information I've been finding so your perspective is much appreciated.

Whether you go for promote/demote or the other way, you are only talking about the grammar of your URLs. Behind the scenes, the backend will have to do the work and check which other resources are affected by the order change.
That said, creating a PacketFormSortIndex doesn't seem very useful. What would be the difference between applying a demote/promote action on the Packet and on the PacketFormSortIndex? To me they seem semantically the same, so there is no justification for a separate entity.
And finally, I'd go for any of the following alternatives:
1) PUT /packet/1 I'd send only the fields being updated: {"index": 3}, and the magic would happen behind the scenes. To be RESTy, though, you should respond with an array of the resources that were updated:
[ { "id": 1, "index": 3}, {"id":4, "index":4}]
2) The bulk way, where the logic for determining which resources are affected lives in the frontend:
PUT /packet/_bulk and send [ { "id": 1, "index": 3}, {"id":4, "index":4}].
For me, if backend performance is not an issue here, which I guess it is not, the best solution is 1.
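A minimal sketch of what the backend for option 1 might do, written in Python with an in-memory dictionary standing in for the database. The function name `move_form` and the data layout are hypothetical; the swap rule is the one described in the question:

```python
def move_form(forms: dict, form_id: int, new_index: int) -> list:
    """Set a form's index, swapping with whichever form currently holds it.

    Returns the list of resources that were changed, so the API can
    report every affected record in its response.
    """
    moved = forms[form_id]
    old_index = moved["index"]
    if new_index == old_index:
        return []

    updated = [moved]
    for other in forms.values():
        if other is not moved and other["index"] == new_index:
            other["index"] = old_index  # the displaced form takes the old slot
            updated.append(other)
            break

    moved["index"] = new_index
    return updated


# Form 1 sits at index 4 and form 4 at index 3 before the request.
forms = {
    1: {"id": 1, "index": 4},
    4: {"id": 4, "index": 3},
}

# Equivalent of: PUT /packet/1 with body {"index": 3}
response_body = move_form(forms, 1, 3)
print(response_body)
```

With the sample data, the response body lists both records, matching the array shown in option 1. In a real backend the two updates would need to happen in one transaction so that no two forms ever share an index.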

Benefits of RESTful URL

What are the benefits of the URL
http://www.example.com/app/servlet/cat1/cat2/item
over the URL
http://www.example.com/app/servlet?catid=12345
Could there be any problems with using the first URL? We ask because we initially used the first URL and then changed to the second. This is in the context of a website with a large amount of constantly changing content, where the number of categories is unbounded.

In relation to a RESTful application, you should not care about the URL template. The "better" one is the one that is easier for the application to generate.
In relation to indexing and SEO, sorry, but it is unlikely that the search engines are going to understand your hypermedia API to be able to index it.
To get a better understanding in regards to the URLs, have a look at:
Is That REST API Really RPC? Roy Fielding Seems to Think So
Richardson Maturity Model

One difference is that the second URL doesn't name the categories, so the client code, and indeed human users, need to look up some category-name-to-number mapping page first, store those mappings, use them all the time, and refresh the list when previously unknown categories are encountered, etc. Given the first URL you necessarily know the categories even if the item page doesn't mention them (but the site may still need a list of categories somewhere anyway).
Another difference is that the first format encodes two levels of categorisation, whereas the second hides the number of levels. That might make things easier or harder depending on how variable you want the depth to be (now or later) and whether someone inappropriately couples code to 2-level depth (for example, by parsing the URLs with a regexp capturing the categories using two subgroups). Of course, the same problem could exist if they couple themselves to the current depth of categories listed in an id->category-path mapping page anyway...

In terms of SEO, if this is something you want indexed by search engines the first is better assuming the category names are descriptive of the content under them. Most engines favor URLs that match the search query. However, if category names can change you likely need to maintain 301 redirects when they do.

The first form will be better indexed by search engines, and is more cache friendly. The latter is both an advantage (you can decrease the load on your server) and a disadvantage (you aren't necessarily aware of people re-visiting your page, and page changes may not propagate immediately to the users: a little care must be taken to achieve this).
The first form also requires (somewhat) heavier processing to get the desired item from the URL.
If you can control the URL syntax, I'd suggest something like:
http://www.example.com/app/servlet/cat1/cat2/item/12345
or better yet, through URL rewrite,
http://www.example.com/cat1/cat2/item/12345
where 12345 is the resource ID. Then when you access the data (which you would have done anyway), you are able to do so quickly, and you just verify that the record does match cat1, cat2 and item. Experiment with page cache settings and be sure to send out ETag (maybe based on ID?) and Last-Modified headers, as well as checking the If-Modified-Since and If-None-Match request headers.
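As a rough Python sketch of that flow: look the record up by the trailing ID, check that the rest of the path agrees with it, and answer conditional requests from an ETag. The `ITEMS` store, the `slug` and `version` fields, and the function names are all assumptions made for the example. Here the ETag combines the ID with a hypothetical version counter, so that the tag changes when the record does:

```python
import hashlib
from typing import Optional

# Hypothetical in-memory store standing in for the database.
ITEMS = {
    12345: {"id": 12345, "categories": ["cat1", "cat2"], "slug": "item", "version": 7},
}


def resolve(path: str) -> Optional[dict]:
    """Resolve /<category>/.../<slug>/<id>; None if path and record disagree."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        return None
    *categories, slug, raw_id = parts  # any category depth is accepted
    if not (raw_id.isascii() and raw_id.isdigit()):
        return None
    record = ITEMS.get(int(raw_id))    # fast lookup by primary key
    if record is None:
        return None
    # The readable part of the URL must match what the record says.
    if record["categories"] != categories or record["slug"] != slug:
        return None
    return record


def etag_for(record: dict) -> str:
    raw = f'{record["id"]}-{record["version"]}'.encode()
    return '"' + hashlib.sha1(raw).hexdigest() + '"'


def status_for(record: dict, if_none_match: Optional[str]) -> int:
    """304 when the client's cached copy is current, otherwise 200."""
    return 304 if if_none_match == etag_for(record) else 200


record = resolve("/cat1/cat2/item/12345")
print(record is not None, status_for(record, etag_for(record)), status_for(record, None))
```

A mismatching path could also be answered with a 301 redirect to the canonical URL instead of being rejected, which fits the advice elsewhere in this thread about category names that change.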

What we have here is not a matter of "better" indexing but of relevancy.
And so the 1st URL will mark your page as more relevant to the subject (assuming a correlation between page/category name and subject matter).
For example: let's say we both want to rank for "Red Nike shoes" and (for simplicity's sake) that we both got the same "score" on all SEO factors except for the URL.
In the 1st case the URL can be http://www.example.com/app/servlet/shoes/nike/red-nike
and in the second http://www.example.com/app/servlet?itemid=12345.
Just by looking at both strings you can intuitively sense which one is more relevant...
The 1st one tells you up-front "Heck yes, I'm all about Red Nike Shoes" while the 2nd one kinda mumbles "Red Nike Shoes? Did you mean item code 12345?"
Also, having part of the keyword (KW) in the URL will help you get more relevancy, and it can help you win "long-tail" goals without much work. (Just having the KW in the URL can sometimes be enough.)
But the issue goes even deeper.
The second type of URL includes parameters, and those can (and 99.9% will) lead to duplicated content issues. When using parameters you'll have to deal with questions like:
What happens for a non-existent catid?
Is there parameter verification? (And how foolproof is it?)
etc.
So why choose the second version? Because sometimes you just don't have a choice... :)
