Postgres or MongoDB?

I have to make a website. I have the choice between Postgres and MongoDB.
Case 1: Postgres
Each page would have its own table, and each table would have only one row, for that one page (no page is structured like another).
I have a timeline page with media (albums and videos).
So I have multiple media items (pictures and videos), and I also display them on a picture-albums page and a videos page.
Therefore I have a media table linked to an albums table (many-to-many), plus a type column determining whether an item is a picture or a video (sketched below).
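For concreteness, a rough sketch of what the Case 1 schema could look like (table names, types, and the psycopg2 wiring are all assumptions):

```python
# A rough sketch of the Case 1 schema; names and types are illustrative only.
import psycopg2

conn = psycopg2.connect("dbname=mysite")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE media (
        id   serial PRIMARY KEY,
        type text NOT NULL CHECK (type IN ('picture', 'video')),
        url  text NOT NULL
    );
    CREATE TABLE albums (
        id   serial PRIMARY KEY,
        name text NOT NULL
    );
    CREATE TABLE album_media (  -- the many-to-many link
        album_id integer REFERENCES albums (id),
        media_id integer REFERENCES media (id),
        PRIMARY KEY (album_id, media_id)
    );
""")
# One call returns all media for the timeline page, pictures and videos together.
cur.execute("SELECT id, type, url FROM media ORDER BY id DESC")
conn.commit()
```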
Case 2: MongoDB
I'm completely new to NoSQL and I don't know how to store the data.
Problems that I see:
Only one row per table; that bothers me.
In the media table, an album could end up containing videos, which I'd like to avoid. But if I split that table into a pictures table and a videos table, how can I make a single call to get all the media for the timeline page?
That's why I think it's better for me to build the website with MongoDB.
What is the best solution, Postgres or MongoDB? What do I need to know if it's MongoDB? Or maybe I'm missing something about Postgres.

It will depend on time: if you don't have time to learn another technology, the answer is to go straight ahead with the one you know and solve the issues with it.
If scalability is more important, then you'll have to take a deeper look at your architecture and know very well how to scale PostgreSQL.
PostgreSQL can handle JSON columns for unstructured data; I use it and it's great. I would have a single table with the unstructured data in a column named page_structure, so you'd have one single big indexed table instead of a lot of one-row tables.
It's relatively easy to query just what you want, so there's no need for separate tables for images and videos; to be more specific, you'd need to provide some schema.
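A minimal sketch of that idea, assuming psycopg2; the pages layout and all names are illustrative only:

```python
# One big indexed table instead of many one-row tables; the unstructured
# part of each page lives in the page_structure jsonb column.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=mysite")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE pages (
        id   serial PRIMARY KEY,
        slug text UNIQUE NOT NULL,
        page_structure jsonb NOT NULL
    )
""")

# Each page stores whatever structure it needs.
cur.execute(
    "INSERT INTO pages (slug, page_structure) VALUES (%s, %s)",
    ("timeline", Json({"title": "Timeline",
                       "medias": [{"type": "picture", "url": "/p/1.jpg"},
                                  {"type": "video",   "url": "/v/1.mp4"}]})),
)

# Pull just the media list for the timeline page in a single call.
cur.execute("SELECT page_structure -> 'medias' FROM pages WHERE slug = %s",
            ("timeline",))
print(cur.fetchone()[0])
conn.commit()
```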

I think you are coming to the right conclusion in considering a NoSQL database, because you are not sure about the columns in a table for a page, and that's the reason you are creating different tables for different pages. I would still advise keeping the columns somewhat consistent across records. Anyway, with MongoDB you can have different records (called documents in MongoDB) with different columns, based on the attributes of your page, within a single collection (the equivalent of a table in SQL). You can have separate pictures and videos collections if you want, and wire them to your pages collection with a foreign-key-like field such as page_id. Or you can query the pages collection to get all the attributes, including arrays containing the IDs of all videos or pictures, and then use those IDs to retrieve the corresponding videos and pictures for a particular page, as illustrated below,
Collections
Pages [{id, name, ..., video_ids: [video1_id, video2_id, ...], pic_ids: [pic1_id, pic2_id, pic78_id, ...]}, ...]
Videos [{video1_id, content,... }, {video2_id, content,...}]
Pictures [{pic1_id, content,... }, {pic2_id, content,...}]
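A minimal sketch of that wiring with pymongo; the collection and field names just follow the illustration above and are not a fixed schema:

```python
# One document per page; arrays of IDs wire the page to its media.
from pymongo import MongoClient

db = MongoClient()["mysite"]  # hypothetical database name

video_ids = db.videos.insert_many(
    [{"content": "video 1"}, {"content": "video 2"}]).inserted_ids
pic_ids = db.pictures.insert_many(
    [{"content": "pic 1"}, {"content": "pic 2"}]).inserted_ids

db.pages.insert_one({"name": "timeline",
                     "video_ids": video_ids,
                     "pic_ids": pic_ids})

# Fetch the page, then its media, the IDs acting like foreign keys.
page = db.pages.find_one({"name": "timeline"})
videos = list(db.videos.find({"_id": {"$in": page["video_ids"]}}))
pictures = list(db.pictures.find({"_id": {"$in": page["pic_ids"]}}))
```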

I suggest you follow the Clean Architecture approach. Personally, I believe that you MUST keep your application logic and your data-access functions apart so they can work separately. Your code must not rely on your database. I'd rather code in such a way that I could migrate my data to any database I like and it would still work.
Think about when your project gets big and you want to try caching to solve a problem: if your data-access functions are not separated from your business-logic code, you cannot easily achieve that.
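A minimal sketch of that separation (every name here is made up; the point is that the business logic only ever sees the interface):

```python
# Data access hidden behind an interface; render_page never learns
# which database, or cache, sits behind it.
from abc import ABC, abstractmethod

class PageRepository(ABC):
    @abstractmethod
    def get_page(self, slug: str) -> dict: ...

class PostgresPageRepository(PageRepository):
    def __init__(self, conn):  # e.g. a psycopg2 connection
        self.conn = conn

    def get_page(self, slug: str) -> dict:
        with self.conn.cursor() as cur:
            cur.execute("SELECT page_structure FROM pages WHERE slug = %s", (slug,))
            return cur.fetchone()[0]

class CachingPageRepository(PageRepository):
    """Caching bolted on later, with zero changes to the business logic."""
    def __init__(self, inner: PageRepository):
        self.inner, self.cache = inner, {}

    def get_page(self, slug: str) -> dict:
        if slug not in self.cache:
            self.cache[slug] = self.inner.get_page(slug)
        return self.cache[slug]

def render_page(repo: PageRepository, slug: str) -> str:
    # Business logic: depends only on the interface, not on the database.
    return "<h1>%s</h1>" % repo.get_page(slug)["title"]
```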
I agree with @espino316 about using the technology you are already familiar with.
And also with @Actung that you should consider learning a database like MongoDB, but on some training projects first, because there are many projects where NoSQL is the best way to go.
Just consider that you might find that out two years AFTER you deployed your website. Or the opposite: you go for MongoDB and realize the best way to go was Postgres or, I don't know, MySQL, etc.
I think the best way to go is to make the migration easy for yourself.
all the best <3

Related

PostgreSQL: JSON column or one-to-many table for config options

We currently have a table which stores information about users. Some of the columns hold information such as user ID, name etc., but many other columns (booleans, integers and varchars etc) hold configuration options for each user.
Over time this has resulted in the table becoming quite wide, and I think the time has come to migrate it to something new, so I want to move all the "option"-related columns out into a separate data structure.
The typical way of doing this, from my experience, would be to have a new table which would simply hold option_id and option_name, and a second new table which would contain user_id, option_id and option_value, for example (sketched below).
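For illustration, a minimal sketch of those two tables; the types are assumptions, and users (id) presumes the existing users table:

```python
# The two-table option layout described above, EAV-style.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE options (
        option_id   serial PRIMARY KEY,
        option_name text UNIQUE NOT NULL
    );
    CREATE TABLE user_options (
        user_id      integer NOT NULL REFERENCES users (id),
        option_id    integer NOT NULL REFERENCES options (option_id),
        -- text flattens booleans/integers into strings: one trade-off of EAV
        option_value text NOT NULL,
        PRIMARY KEY (user_id, option_id)
    );
""")
# Reading one user's options back is a single join:
cur.execute("""
    SELECT o.option_name, uo.option_value
    FROM user_options uo JOIN options o USING (option_id)
    WHERE uo.user_id = %s
""", (42,))
conn.commit()
```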
However, a colleague suggested using the new jsonb column type as an alternative, but I don't know if I like the idea of storing relational data in a non-relational way. From a Java point of view, it's pretty much the same as far as I can tell - it'll just be turned into a POJO and then cached on the object.
I should mention that the number of users will be quite low, only going into the thousands, and the number of columns could and will go into the hundreds.
Does anyone have advice on the best way forward here?
Technically, you have already de-normalized your database structure by adding columns to a table that are irrelevant to some of the entities stored therein.
Using JSON is just another way to de-normalize, cramming a bunch of values into a single row-column field. The excellent binary support for JSON in Postgres (the jsonb data type) then lets you index elements within those JSON documents, as a way to quickly access those embedded values. This is quite screwy from a relational point of view, but is handy for some situations.
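For illustration, the jsonb route might look like this; the prefs column name is made up, and a users table with an id column is presumed:

```python
# The jsonb alternative, with a GIN index over the documents.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("ALTER TABLE users ADD COLUMN prefs jsonb NOT NULL DEFAULT '{}'")
# The GIN index is what makes lookups inside the documents fast.
cur.execute("CREATE INDEX users_prefs_idx ON users USING gin (prefs)")

cur.execute("UPDATE users SET prefs = %s WHERE id = %s",
            (Json({"dark_mode": True, "page_size": 50}), 42))
# Containment (@>) queries can use that index.
cur.execute("SELECT id FROM users WHERE prefs @> %s", (Json({"dark_mode": True}),))
conn.commit()
```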
Either approach is commonly done for this kind of problem, and is not necessarily bad. In general, de-normalizing is often a pay-now-or-pay-later kind of solution. But for something like user preferences, there may not be a pay-later penalty, as there often is with most business-oriented problem domains.
Nevertheless, you should consider a normalized database structure.
By the way, this kind of table-structure question might be better asked on the sister site, http://DBA.StackExchange.com/.
I suggest searching Stack Overflow, that DBA site, and the wider Internet for discussions of database design for storing user preferences. Like this.

What is the best way to store data where one column has values that repeat ranging anywhere from 1-300+ times?

I've used web scraping to grab approximately 10,000 movies and all their associated review-page URLs, and the next step for me is to grab every single one of those reviews so that I can get the overall positive/negative reviews using sentiment analysis.
I'm writing all this in Python and am using the Pandas library for pre-processing and structuring all the data. I already have around 36,000 rows containing the name of the movie in one column and the URL in the other, with the movie name repeated over and over again; with the average reviews per page being 20, I'm looking at roughly 720,000 rows when all is said and done.
This is for the final project of the college course I'm taking, and throughout my schooling I've come to fear data redundancy in databases. I will eventually be writing all of this to a PostgreSQL database so users can query any movie to get back the prediction, and I'm having a hard time overlooking the fact that these movie titles are being repeated so often.
I was wondering if there was a better way to go about this (which could also hopefully save me some processing time), any help would be greatly appreciated!
I feel like this is more of a direct question than a code issue, but if necessary I can provide any relevant code.
If the movie name is all the information you have about each movie, there is no redundancy (in the relational sense), since the name is the unique identifier.
You could save some space by having a separate movie table that contains an artificial numeric ID and the name, and referencing that ID from the main table, but that will make your queries more complicated and seems unnecessary for a table as small as this.
What I would be more concerned about is whether the movie name is a good identifier at all: what if two movies have the same name? In this age of remakes, that is not a rarity.
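For completeness, a minimal sketch of that separate-table idea; the year column is an assumption, added as one way to disambiguate remakes that share a name:

```python
# Surrogate-key layout: movies get an artificial ID, reviews reference it.
import psycopg2

conn = psycopg2.connect("dbname=reviews")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE movies (
        movie_id serial PRIMARY KEY,
        name     text    NOT NULL,
        year     integer NOT NULL,
        UNIQUE (name, year)
    );
    CREATE TABLE reviews (
        review_id serial PRIMARY KEY,
        movie_id  integer NOT NULL REFERENCES movies (movie_id),
        url       text NOT NULL
    );
""")
# The trade-off: listing reviews by title now needs a join.
cur.execute("""
    SELECT m.name, m.year, r.url
    FROM reviews r JOIN movies m USING (movie_id)
    WHERE m.name = %s
""", ("Ben-Hur",))
conn.commit()
```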

JSON or relational tables for complex user profiles

I am trying to design a Postgres database for holding a variety of information about users, and I see two obvious ways to go about it, specifically for the various many-to-many relations:
Store the basic user data in a user_info table. In separate tables, store the many-to-many relations like what schools someone attended, places they worked at, and so on. There will be a large number of such tables (it is easy to add things like what places someone visited, what books they've read, etc., and I expect this to grow into a rather large list of tables).
In the main user_info table, store a JSON blob (properly organized of course) with all this additional info.
Which of these two options should I choose? Naturally, read performance is more important. I know that JSON is generally slower than ordinary relational tables, but I am unsure whether looking up info from a lot of different tables (as in option 1) will be slower than getting a single JSON blob and displaying it in the browser. As a further note, the JSONB format in Postgres actually has good indexing options.
Update:
Following some comments that a graph DB is what should be used: I should clarify that the question is not about the choice of technology (RDBMS vs. graph DB), but about the choice of data type given the technology (RDBMS).
NoSQL is great for when you don't know what data you're going to store or how it will be used, or when it fits well with the list/hash model. Relational databases are great for when you have a lot of certainty about the data, how it will be used, and when it fits the relational model. I would suggest a hybrid approach, especially given PostgreSQL 9.2's JSON performance improvements.
Make traditional relationships for things you know are solid.
Make use of JSON for data that you want to capture but aren't sure you need.
For simple lists, make use of PostgreSQL arrays or JSON rather than join tables.
Abstract this all behind model classes.
As you gain more knowledge about the data, change how it's stored.
For example, make tables for People, Schools, Work and Places, and join tables between them. Fields like People.name and Places.address are normal columns. For things like "list of a person's pets", store them as an array of TEXT or a JSON field until you feel you need a Pets table. Any extra information you don't immediately know what to do with, like "how big is a school's endowment", goes into a JSON metadata column.
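A minimal sketch of that layout (every name here is illustrative):

```python
# Hybrid layout: solid facts as columns, simple lists as arrays,
# uncertain extras as JSON.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=profiles")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE people (
        id       serial PRIMARY KEY,
        name     text NOT NULL,        -- solid: a normal column
        pets     text[] DEFAULT '{}',  -- simple list: an array, no join table yet
        metadata jsonb DEFAULT '{}'    -- not sure yet: capture it in JSON
    );
    CREATE TABLE schools (
        id       serial PRIMARY KEY,
        name     text NOT NULL,
        metadata jsonb DEFAULT '{}'    -- e.g. endowment size, until it matters
    );
    CREATE TABLE attended (            -- solid many-to-many: a join table
        person_id integer REFERENCES people (id),
        school_id integer REFERENCES schools (id),
        PRIMARY KEY (person_id, school_id)
    );
""")
cur.execute(
    "INSERT INTO people (name, pets, metadata) VALUES (%s, %s, %s)",
    ("Ada", ["cat", "dog"], Json({"favorite_color": "green"})),
)
conn.commit()
```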
Using model classes allows you to refactor your database without worrying about every piece of code that touches the database. Just be sure that all code which makes assumptions about the table structure goes into model methods.
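And a sliver of the model-class idea, assuming the schema sketched above:

```python
# Only the model knows that pets is currently an array column;
# callers just write person.pets.
class Person:
    def __init__(self, conn, person_id):
        self.conn, self.id = conn, person_id

    @property
    def pets(self):
        # If pets ever moves to its own table, only this method changes.
        with self.conn.cursor() as cur:
            cur.execute("SELECT pets FROM people WHERE id = %s", (self.id,))
            return cur.fetchone()[0]
```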

Database design: Postgres or EAV to hold semi-structured data

I was given the task to decide whether our stack of technologies is adequate to complete the project we have at hand or should we change it (and to which technologies exactly).
The problem is that I'm just a SQL Server DBA and I have a few days to come up with a solution...
This is what our client wants:
They want a web application to centralize pharmaceutical researches separated into topics, or projects, in their jargon. These researches are sent as csv files and they are somewhat structured as follows:
Project (just a name for the project)
Segment (could be behavioral, toxicology, etc. There is a finite set of about 10 segments. Each csv file holds a segment)
Mandatory fixed fields (a small set of fields that are always present, like Date, subjects IDs, etc. These will be the PKs).
Dynamic fields (could be anything here, but always as a key/pair value and shouldn't be more than 200 fields)
Whatever files (images, PDFs, etc.) that are associated with the project.
At the moment, they just want to store these files and retrieve them through a simple search mechanism.
They don't want to crunch the numbers at this point.
98% of the files have a couple of thousand lines, but there are 2% with a couple of million rows (and around 200 fields).
This is what we are developing so far:
The back-end is SQL Server 2008 R2. I've designed EAVs for each segment (before anything, please keep in mind that this is not our first EAV design; it worked well before, with less data) and the mid-tier/front-end is PHP 5.3 with the Laravel 4 framework and Bootstrap.
The issue we are experiencing is that PHP chokes on the big files. It can't insert into SQL in a timely fashion when there are more than 100k rows, because there's a lot of pivoting involved and, on top of that, PHP needs to fetch back all the field IDs before it can start inserting.
I'll explain: this is necessary because the client wants some sort of control over field names. We created a repository of all possible fields to try to minimize ambiguity problems; fields named, for instance, "Blood Pressure", "BP", "BloodPressure" or "Blood-Pressure" should all be stored under the same name in the database. So, to minimize the issue, the user has to insert his csv fields into another table first (we called it the properties table). This doesn't completely solve the problem, but as he inserts the fields he sees possible matches already in place: when the user types in "blood", a panel shows all the fields already used containing that word. If he thinks it's the same thing, he has to change the csv header to that field. Anyway, all this is to explain that it's not a simple EAV structure and there's a lot of back and forth of IDs.
This issue is giving us second thoughts about our technology-stack choice, but we have limitations on our possible choices: I have only worked with relational DBs so far (actually, only SQL Server) and the other guys know only PHP. I guess an MS full stack is out of the question.
It seems to me that a non-SQL approach would be best. I read a lot about MongoDB, but honestly I think it would be a super steep learning curve for us, and if they want to start crunching the numbers or even to have some reporting capabilities,
I guess Mongo wouldn't be up to that. I'm reading about PostgreSQL, which is relational, and its famous hstore type. So here is where my questions start:
Would you guys think that Postgres would be a better fit than SQL Server for this project?
Would we be able to convert the csv files into JSON objects, or whatever, to be stored in hstore fields and be somewhat queryable? (See the sketch below, after these questions.)
Are there any issues with Postgres sitting on a Windows box? I don't think our client has Linux admins. Nor do we, for that matter...
Is its licensing free for commercial applications?
Or should we stick with what we have and try to sort the problem out with staging tables, bulk inserts, or some other technique that relies on the back-end to do the heavy lifting?
Sorry for the long post and thanks for your input guys, I appreciate all answers as I'm pulling my hair out here :)
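On the csv-to-JSON question above, here is a rough sketch of what that conversion could look like, targeting a jsonb column (every name below is an assumption; hstore would work similarly, though its values are text-only):

```python
# Mandatory fixed fields become real columns (and the PK);
# the dynamic key/value fields land in a jsonb column.
import csv
import psycopg2
from psycopg2.extras import Json

FIXED = {"Date", "SubjectID"}  # the small mandatory set; assumed headers

conn = psycopg2.connect("dbname=research")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS results (
        project    text NOT NULL,
        segment    text NOT NULL,
        date       date NOT NULL,
        subject_id text NOT NULL,
        dynamic    jsonb NOT NULL,
        PRIMARY KEY (project, segment, date, subject_id)
    )
""")

with open("toxicology.csv", newline="") as f:
    for row in csv.DictReader(f):
        dynamic = {k: v for k, v in row.items() if k not in FIXED}
        cur.execute(
            "INSERT INTO results VALUES (%s, %s, %s, %s, %s)",
            ("ProjectX", "toxicology", row["Date"], row["SubjectID"],
             Json(dynamic)),
        )
conn.commit()

# The dynamic fields stay queryable, e.g. every row recording a blood pressure:
cur.execute("SELECT subject_id FROM results WHERE dynamic ? %s",
            ("Blood Pressure",))
```

Row-at-a-time inserts would still crawl on the couple-of-million-row files; for those, Postgres's COPY (e.g. via psycopg2's copy_expert) is the sort of back-end heavy lifting the last question alludes to.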

Querying a MySQL database from an iOS application

I have a MySQL database containing 1000 records of people (Name, Age, Address, Country). I want to make an application that:
Randomly displays ONE of the records in the database.
Randomly displays ONE of the records according to a specific 'Country'.
My usual approach to this is: parse ALL the data to the iPhone using JSON, then manipulate the data there. Because of the volume of data the user needs to download (1000 records), and because I only need ONE record to be displayed, it seems wasteful to download all 1000 records and then sort the data. What other approach can I use? (I'm not sure, but is it possible to query the database directly?)
It would be very unsafe to query the DB directly, and the usual approach you mentioned would be a very bad idea. The general approach for this scenario, safe and optimized, is to write web-service end-points (in PHP, ASP.NET, etc.) which can optionally take some parameters, like the specific country, as arguments and return the data in JSON format so you can display it. This is how I'd approach such a problem. Here's a tutorial on writing some basic web services with PHP.
http://davidwalsh.name/web-service-php-mysql-xml-json
Hope that helps.
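To make the shape of such an end-point concrete, here is a hedged sketch in Python/Flask with PyMySQL rather than PHP; every table, column, and credential below is made up:

```python
# A JSON web-service end-point that returns ONE random person,
# optionally filtered by country.
from flask import Flask, jsonify, request
import pymysql

app = Flask(__name__)

def get_conn():
    return pymysql.connect(host="localhost", user="app", password="secret",
                           database="people_db",
                           cursorclass=pymysql.cursors.DictCursor)

@app.route("/random-person")
def random_person():
    country = request.args.get("country")  # optional ?country=... filter
    sql = "SELECT Name, Age, Address, Country FROM people"
    args = ()
    if country:
        sql += " WHERE Country = %s"
        args = (country,)
    # ORDER BY RAND() is fine at 1000 rows; the phone downloads ONE record.
    sql += " ORDER BY RAND() LIMIT 1"
    conn = get_conn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return jsonify(cur.fetchone())
    finally:
        conn.close()
```

The iPhone app then requests /random-person or /random-person?country=France and downloads a single JSON record instead of all 1000.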