RDBMS - Normalization

I've been asked a question in an online assessment test; the question goes like this:
"In normalization, the tables are linked using?"
a) Relationships
b) Records
c) Triggers
d) Transactions
I haven't heard of any "linking" in normalization, so I just wanted to confirm what the possible answer could be.

Linking refers to getting information from across multiple tables (joining). Joins are driven by the relationships between tables (the primary/foreign keys defined in the schema), so the only valid answer is a) Relationships.
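For example (table and column names made up), two normalized tables are linked by a foreign-key relationship and recombined with a join:

-- Two normalized tables; the foreign key is the "link".
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    total       NUMERIC NOT NULL
);

-- The relationship lets a join reassemble the information that
-- normalization split across the two tables.
SELECT c.name, o.order_id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;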


Key value oriented database vs document oriented database

I have recently started learning NoSQL databases and I came across key-value oriented databases and document-oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case then why do we define them as separate types? Otherwise, how are they saved in the file system?
To get started, it is better to pin down the least wrong vocabulary. What used to be called NoSQL is too broad in scope; often there is no feature-wise intersection between two databases that are both dubbed NoSQL, except for the fact that they somehow deal with "data". What program does not deal with data?!

In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that an RDBMS is something like SQL Server, some kind of Oracle database, MySQL, or PostgreSQL. It is fuzzy whether that includes SQLite, which is already an indicator that "relational database" is not the perfect term for the concept behind it. Even more so, what people usually call NoSQL never forbids relations. Even on top of "key-value" stores, one can build relations. In a Resource Description Framework database, the equivalents of SQL rows are called tuples, triples, quads and, more generally and more simply, relations. Another example of a relational database is one powered by Datalog. So "RDBMS" and "relational database" are not good words to describe the intended concepts, and when someone uses them, they only speak to the narrow view that person has of the various paradigms that exist in the data(base) world.
In my opinion, it is better to speak of "SQL databases", meaning databases that support a subset or superset of the SQL language as defined by the ISO standard.
Then the NoSQL wording makes sense: databases that do not provide support for the SQL language. In particular, that excludes Cassandra and Neo4j from the SQL databases: they can be programmed with languages (respectively CQL and Cypher / GQL) whose surface syntax looks like SQL, but which do not have the semantics of SQL (neither a superset nor a subset of it). That leaves Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to draw a line.
"Key-value store" is also fuzzy. memcached, Redis, FoundationDB, WiredTiger, dbm, Tokyo Cabinet et al. are very different from each other and are used in very different use-cases.
Sorry, "document-oriented database" is not precise enough either. Historically, there were two main so-called document databases: Elasticsearch and MongoDB. Those, yet again, are very different pieces of software and, when used properly, do not solve the same problems.
You might have guessed it already: your question shows a lack of research and, as phrased (even setting aside the yak-shaving about database vocabulary), is too broad.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interfaces, deployment strategies, internal structures, and intended use-cases are very different.
Otherwise, how are they saved in the file system?
That question alone is too broad. You need to ask a specific question: at least explain your understanding of how one or more databases work, and then ask about where you want to go / what you want to understand, i.e. "how to go from point-A understanding (given) to point-B understanding (question)". In your question, point A is absent and point B is fuzzy or too broad.
Moar:
First, make sure you have a solid understanding of an SQL database, at the very least the SQL language (then dive into indices and, last, fine-tuning). Without SQL knowledge, you are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want to "benchmark" databases, first set up a situation (real or imaginary), i.e. a project that you know well that requires a database, and try to fit several databases to the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
- What guarantees do you need? Question all the properties of ACID: Atomicity, Consistency, Isolation, Durability. Also look into BASE. You do not necessarily need ACID or BASE, but they are a well-documented starting point for figuring out where you want / need to go.
- What is the size of the data?
- What is the shape of the data? Are the types well defined? Are they polymorphic (heterogeneous shapes)?
- Workload: write-once then read-only, mostly reads, mostly writes, or a mix of both? Also answer how fast or slow writes and reads can afford to be.
- Querying: what do queries look like: recursive / deep, column- or row-oriented, or neighbourhood queries (like GraphQL, and SQL without recursive queries, do)? Again, what is the expected response time?
Do not forget to at least review deployment and scaling strategies before committing to a particular solution.
For my part, I picked FoundationDB because it is the most versatile in those regards, even if at the moment it requires some code on top to be a drop-in replacement for all PostgreSQL features.

Relational database, multiple master detail

I am designing a relational database for storing application information.
Each application can contain participants of different types: legal persons and physical persons.
The forms for legal persons and physical persons are different and contain different fields.
I want to have a common pk for participants therefore I created three tables:
participants - here I store common information;
participants_phys_dtls - here I store fields of physical person;
participants_lgl_dtls - here I store fields of legal person;
The downside of this approach is that the structure is more complicated, and to get participant
information I have to join the three tables using LEFT JOINs.
An alternative solution is to unify these three tables into one (participants).
The downside of that solution is that the table is big and ambiguous.
Please advise me which solution to choose and why, or some other better solution for this problem.
You should go for the first approach you mentioned in your question, the one that splits the data into three tables. This keeps you away from the various anomalies (update, delete, insert, etc.).
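A rough sketch of that three-table layout (the detail columns below are placeholders; only the table names come from your question) could look like this:

-- Common information plus a type discriminator.
CREATE TABLE participants (
    participant_id   INTEGER PRIMARY KEY,
    application_id   INTEGER NOT NULL,
    participant_type CHAR(1) NOT NULL CHECK (participant_type IN ('P', 'L'))
);

-- Fields specific to physical persons (columns are examples only).
CREATE TABLE participants_phys_dtls (
    participant_id INTEGER PRIMARY KEY REFERENCES participants (participant_id),
    first_name     TEXT NOT NULL,
    last_name      TEXT NOT NULL,
    birth_date     DATE
);

-- Fields specific to legal persons (columns are examples only).
CREATE TABLE participants_lgl_dtls (
    participant_id  INTEGER PRIMARY KEY REFERENCES participants (participant_id),
    legal_name      TEXT NOT NULL,
    registration_no TEXT
);

-- Reading a participant of either kind is still a single query with LEFT JOINs.
SELECT p.participant_id,
       ph.first_name, ph.last_name,
       lg.legal_name
FROM participants p
LEFT JOIN participants_phys_dtls ph ON ph.participant_id = p.participant_id
LEFT JOIN participants_lgl_dtls  lg ON lg.participant_id = p.participant_id;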
This is a good read about normalization. Go through the link and you will see how normalization works and how to normalize a DB design. Hope it helps :)

Documentation for node.js sequelize index lookup

I'm having trouble finding information about how to look up records by an index using sequelize/postgres for node.js.
The only documentation of indexes appears to be here: http://sequelizejs.com/documentation#migrations-functions
To illustrate what I'm asking, let's take a simple model where there are Persons, Projects, and Tasks. Each person references a number of assigned tasks, and each project has a number of assigned tasks. Each task has a back-reference to its project and person. We'll assume that each person has only one task per project.
Let's say I have a person and a project, and I need to find whether there is an associated task. I've tried implementing this through an index on Task over person/project.
I've found through searches that you can also create indexes through the slightly unintuitive syntax:
global.db.sequelize.getQueryInterface().addIndex('Tasks',
  ['ProjectId', 'PersonId'],
  {
    indexName: 'IndexName',
    indicesType: 'UNIQUE'
  }
);
This seems to work, and the index is created. However, I can't find a reference anywhere in the docs or even on the internet about how to use this index to find the task.
Any suggestions?
You have a fundamental misunderstanding of how an RDBMS is supposed to work.
It is supposed to pick the best indexes for each query based upon the pattern of database access required. This is performed by the "planner" in the RDBMS.
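For example, assuming the composite index from your snippet exists, you just write the query as normal and ask the planner what it did; the values below are made up:

-- Hypothetical lookup against the Tasks table from the question.
-- No special syntax is needed to "use" the index: the planner will
-- consider the unique index on ("ProjectId", "PersonId") on its own.
EXPLAIN ANALYZE
SELECT *
FROM "Tasks"
WHERE "ProjectId" = 42
  AND "PersonId"  = 7;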
Some terms you will find useful to search against as you use PostgreSQL:
- Primary Key
- Foreign Key
- Constraint (both the above are these)
- EXPLAIN ANALYSE (or ANALYZE depending on your dialect of English)
- http://explain.depesz.com/ - a useful site to colour the above explains
- pg_dump / pg_restore - make sure you can use these tools to backup your database
Finally, make yourself a good hot cup of tea or coffee and sit down and at least skim through the PostgreSQL manuals. At least it will give you an idea of where to find further information.
Good Luck!
True, I'm coming from Caché's database structure, which very few people actually use.
I think the best answer to the question is that you just do the lookup as normal, and PostgreSQL takes care of the rest. Good to know!

Which NoSQL databases support text array columns (and indexes on these columns) like the PostgreSQL text[] type?

I need to move data from a PostgreSQL database to a NoSQL database. In the process we are evaluating different NoSQL databases, and Cassandra came up as a possibility, but from the documentation it seems like Cassandra doesn't support having a text array as a column type. Is this correct? Which NoSQL databases support this type of column, and support indexes on it?
For example to store this and have an index on a column with this type of data:
City:['Washington','Washington DC']
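For reference (names simplified), the PostgreSQL side we are migrating from looks roughly like this:

-- A text[] column with a GIN index, which is the capability we need the
-- NoSQL store to match.
CREATE TABLE places (
    id   SERIAL PRIMARY KEY,
    city TEXT[] NOT NULL
);

-- GIN indexes support containment queries on array columns.
CREATE INDEX places_city_gin ON places USING GIN (city);

-- Find rows whose city array contains 'Washington DC'.
SELECT * FROM places WHERE city @> ARRAY['Washington DC'];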
Thanks in advance!
Not exactly an answer to your question (not enough reputation to comment (?!?)), but understanding that your problem is scale, and that you are coming from PostgreSQL, have you tried Postgres-XC yet? That may be a much easier transition than to NoSQL. NoSQL databases, as I assume you know, have very different performance characteristics and nuances that might actually do more harm than good. Postgres-XC is a multi-master, write-scalable fork of PostgreSQL that sits somewhere between 9.1 and 9.2 from a feature standpoint, and it is an active project; 9.2 conformance was slated for this month or last, if I recall correctly.

It's relatively easy to set up for what it is: you build two GTMs, one as a primary and one as a failover, and give them enough memory. Then you can scale horizontally by adding pairs of coordinators and data nodes, one coordinator and one data node per server. Your application tier can talk to any of the coordinators, transactions are shipped to the appropriate coordinators, and you can specify the distribution of your data per table: either replicated for small reference tables or distributed for large ones. If you design your queries well, you can get massive performance improvements because your queries can be shipped and executed simultaneously on multiple coordinator/data-node pairs.
I know you are looking for NoSQL, but I mention this because we too had a vertical vs horizontal scale problem and in the end I found it was easier to build NoSQL capability into a relational system than it was to build relational capability into a NoSQL system. And of course it all depends on your data, sometimes NoSQL is absolutely the best choice. Sometimes it can be a major headache too, for example some NoSQL databases have problems with filesystem growth so whereas you thought you bought horizontal scalability you wound up eating your SAN out of house and home.
Anyway, hope that helps! I would have left it as a comment but stackoverflow has that strange reputation thing going on.
I also forgot to mention: with Postgres-XC you can specify which columns to distribute on and by what kind of algorithm. I typically distribute by hash and make sure of two things: first, that the hash can be generated application-side, so that I don't have to do joins on tables that are gadzillions of rows; and second, that the hash keeps the distribution across servers even while also keeping related information together on the same server, so as to increase the shippability of queries. That is, if you have a customer table and a customer orders table, distribute both on a hash of some unique customer information that is in both tables, and make sure you can generate that hash application-side. I hope that makes sense; I'm not sure I did a good job explaining. If you would like further clarification, please let me know: the docs are a bit scattered on XC right now, so a lot of what I related is OJT.
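To make the distribution idea concrete, the DDL on XC looks roughly like this (table names invented, and my recollection of the exact syntax may be slightly off):

-- Small reference table: replicated to every node.
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,
    name         TEXT NOT NULL
) DISTRIBUTE BY REPLICATION;

-- Large tables: distributed by a hash of the same customer key,
-- so a customer's rows and their orders land on the same node and
-- joins between them can be shipped to that node.
CREATE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    name        TEXT NOT NULL
) DISTRIBUTE BY HASH (customer_id);

CREATE TABLE customer_orders (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    total       NUMERIC
) DISTRIBUTE BY HASH (customer_id);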

NoSQL databases alternatives/patterns/equivalences to Relational DB

From what I saw in this video...
http://www.youtube.com/watch?v=LhnGarRsKnA
pretty much all the traditional RDBMS operations (JOIN, GROUP BY, HAVING, etc.) can be done in NoSQL databases through a combination of MapReduce and denormalization techniques.
Is there any article/document that has all these equivalences clearly described? Something like... "the equivalent of a JOIN in a NoSQL database would be..." and so on.
I just can't find this kind of documentation anywhere :(
There's a reason you can't find that type of documentation. For a start, unlike SQL, there is no such thing as a standard NoSQL database. Have you tried searching for specific NoSQL data stores?
Also, trying to convert relational operations to non-relational systems will just get you into trouble. Instead, you need to look at what you are trying to do with those relational operations. For example, is GROUP BY there to sort a list into categories, or to handle hierarchical objects when you don't have multi-value fields? Is a JOIN assembling a single object stored in multiple tables, or calculating a set intersection?
One of the biggest strengths of SQL is that any data can be represented in a standard way and any query can be run on that data. It won't necessarily be the best fit for that data, but it is standard and there is a single correct answer for almost any question. NoSQL is mostly about being able to optimize your data store for what you actually need by sacrificing the generality of SQL. That may be performance, handling a large dataset, handling inconsistent data, or just simpler code. In short you need to understand your requirements and the tradeoffs involved in optimizing for them rather than just choosing SQL by default.
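As a tiny illustration of that trade-off (the schema here is invented), the same one-to-many data can stay normalized and be joined at read time, or be denormalized so reads need no join at all, which is essentially the answer most NoSQL designs give to "how do I JOIN?":

-- Normalized: the relationship is resolved with a join at read time.
SELECT u.name, p.title
FROM users u
JOIN posts p ON p.user_id = u.user_id;

-- Denormalized: the author's name is copied onto every post row,
-- trading storage and update cost for join-free reads.
SELECT author_name, title
FROM posts_denorm;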
Your best option is to pick a datastore that fits what you need (a good comparison of high level features is at
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis ) and look for some examples of how systems are designed using that data store. With luck you will find something close to what you are working on.
This article has some good info on NoSQL vs SQL:
http://www.develop.com/mongoDB
Be very careful... you probably don't want to join a 1-trillion-row table with something. Think about patterns for partitioning your data. For example, with playOrm you can partition your data and use S-SQL (Scalable SQL), so you can do all the joins you want within a partition; in many OLTP apps that is exactly what you need. If your customers are businesses, each business lives in its own partition and you can join all the tables related to that one business as much as you like.
Here is a list of patterns to help you gain ground in the NoSQL world (it's a work in progress):
https://github.com/deanhiller/playorm/wiki/Patterns-Page