How to design a schema for a website similar to monster.com? [closed] - database-schema

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am developing a website similar to a job seeker's site, but with a third party involved as well, so there are three parties: Job Seeker, Employer and Job/Recruitment_Agency. The third party arranges the transaction between the first two.
What approach should I follow? What would the schema look like? Can someone also point me to some good links or sample schemas? Thanks.

Assuming you want to work with relational databases:
Classic schema (like an ORM-generated one):
Table 1: Job_Seeker
Table 2: Employer
Table 3: Recruitment_Agency
Table 4: Job_Seeker__Employer__Recruitment_Agency
Foreign keys in Table 4:
* job_seeker_id
* employer_id
* recruitment_agency_id
And add indexes over those three foreign keys for better performance (a sketch follows below).
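A minimal sketch of that layout, here via Python's sqlite3 (table and column names follow the answer above; the extra "name" columns and the types are assumptions):
import sqlite3

# Illustrative only: three entity tables plus the link table with its FK indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job_Seeker         (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employer           (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Recruitment_Agency (id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE Job_Seeker__Employer__Recruitment_Agency (
    id                    INTEGER PRIMARY KEY,
    job_seeker_id         INTEGER REFERENCES Job_Seeker(id),
    employer_id           INTEGER REFERENCES Employer(id),
    recruitment_agency_id INTEGER REFERENCES Recruitment_Agency(id)
);

CREATE INDEX idx_link_seeker   ON Job_Seeker__Employer__Recruitment_Agency(job_seeker_id);
CREATE INDEX idx_link_employer ON Job_Seeker__Employer__Recruitment_Agency(employer_id);
CREATE INDEX idx_link_agency   ON Job_Seeker__Employer__Recruitment_Agency(recruitment_agency_id);
""")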
This way you can immediately query what you want, starting from any table. For example, you can use Table 4 to drive a join between Table 1 and Table 2 while adding conditions on Table 3.
For example, selecting all the records coming from agency #4 (roughly, in SQL):
SELECT *
FROM Job_Seeker__Employer__Recruitment_Agency t4
JOIN Job_Seeker t1 ON t4.job_seeker_id = t1.id
JOIN Employer t2 ON t4.employer_id = t2.id
WHERE t4.recruitment_agency_id = 4;
Which keeps the query down to a couple of straightforward joins.
Then you could add other intermediate tables if you only need to join two tables, but that depends entirely on what you are doing.
Obviously, in the long run joins are not that fast and are memory consuming, and they can jeopardize scalability (especially if you join multiple tables with many columns). So if your website is supposed to scale a lot, you could denormalize your schema (i.e. avoid referencing other rows and instead copy their data into the row you search).
The extreme option for denormalization and scalability would be a schema-free database (say, a document-oriented one like MongoDB for simplicity's sake). Then the documents would look like this:
{
"_id":1,
"job_seeker":{"name":"John Doe","address","New York"},
"employer" : {"name":"Mario Rossi","address","Rome"},
"agency" : {"name":"My agency","address","London"}
}
Obviously, non-relational databases are fine for you only if you don't have really unusual queries (which would still be possible with Map/Reduce, but tedious to write).
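For instance, the agency query above becomes a single lookup with no joins against such documents (a pymongo sketch; the collection name "placements" and filtering by agency name instead of an id are assumptions):
from pymongo import MongoClient

# Assumes a local mongod and a "placements" collection holding documents like the one above.
placements = MongoClient().jobsite.placements

# Everything about a placement lives in one document, so no join is needed.
for doc in placements.find({"agency.name": "My agency"}):
    print(doc["job_seeker"]["name"], "->", doc["employer"]["name"])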
For a general reference I especially like this article.
For a deeper knowledge of relational databases, this is a must-have book (AFAIK it's the reference text for the first database course at Stanford).

You write tests describing what the application should do. Do that for each user story ("As a job seeker, I want to register so that I can be contacted by a recruitment agency"). Prioritize the user stories. Each story results in a few small changes to the database schema.

Employer
- Id
- Profile
JobSeeker
- Id
- Educational info
- Professional info
Job/Recruitment_Agency
- Id
- Join date
Transactions
- Id
- Customer ID: the ID of the customers involved in the transaction
- Completed
- Amount (credit)
- Receipt ID: the ID returned from the payment gateway
- Created date: the date the transaction was created
- Payment method: the payment method used for the transaction
- Detail: detailed information about the transaction (optional)
- Employer_id
- JobSeeker_id
- Job/Recruitment_Agency_id
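A possible reading of that Transactions table as DDL, again via sqlite3 (the column types, and anything not in the list above, are assumptions):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    id             INTEGER PRIMARY KEY,
    customer_id    INTEGER,                        -- ID of the customers in the transaction
    completed      INTEGER NOT NULL DEFAULT 0,     -- 0/1 flag
    amount         NUMERIC NOT NULL,               -- credit
    receipt_id     TEXT,                           -- ID returned from the payment gateway
    created_date   TEXT DEFAULT CURRENT_TIMESTAMP, -- created date of the transaction
    payment_method TEXT,
    detail         TEXT,                           -- optional detail information
    employer_id    INTEGER,                        -- REFERENCES Employer(id)
    job_seeker_id  INTEGER,                        -- REFERENCES JobSeeker(id)
    agency_id      INTEGER                         -- REFERENCES Recruitment_Agency(id)
);
""")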

Related

Explain database design to me, and relational vs non relational designs [closed]

I'm naturally a front-end guy, but database design and back-end dev have piqued my interest over the last few days and I'm left very confused. I'd like to fully grasp these concepts so my head doesn't hurt anymore.
I'm used to dealing with an ORM like Active Record. When I imagine my user objects through Rails, I picture a person object being related to a row in the people table. OK, basic.
So, I read that non-relational databases like MongoDB aren't just cool because they're "fast with big data", but also because they apparently make development more natural with an OOP language (why?). Then I also read that most design patterns probably aren't truly relational. OK, this is where I get lost.
1) What are some fundamental examples of relational design and non-relational design?
2) similar to above, what are examples of structured data vs unstructured (is this just rephrasing above?)
So, given those things, I feel (in my immediate ignorance) that almost every type of project I've attempted to model has been relational. But maybe I'm just using semantics over technicality. For example, posts and comments. Relational to each other. Add users in there. It seems most apps these days have data that is always useful to reach through other data/objects. That's relational, isn't it?
How about something less typical?
Let's say I was building a workout tracker app. I have users, exercises, workouts, routines and log_entries.
I create a routine to group workouts. A workout is a group of exercises. I record my reps and weight through log entries for my exercises. Is this relational data or non-relational? Would Mongo be great for modeling this, or awful?
I hear things like statistics come into play. How does that affect the above example? What statistics are people usually speaking of?
Let's say I added tracking other things, like a users weight, height, body fat and so on. How does that affect things?
Thank you for taking the time to help me understand.
Edit: can anyone explain why it may be easier to develop for one over the other? Is using something like Mongo more agile, both because it "clicks" more once you "get it" and because you don't need to run migrations?
Another thing: when using an abstraction like an ORM, does it really matter? If so, when? To me the initial value would be the ease of querying data and modeling my objects. Whatever lets me do that more easily is what I'd be happy with. I truthfully do find myself scratching my head a lot when trying to model data.
OK, let me give it a stab...
(DISCLAIMER: I have very little practical experience with non-relational databases, so this will be a bit "relation-centric".)
Relational databases use "cookie-cutters" (tables) to produce large numbers of uniform "cookies" (rows in these tables). Non-relational databases give you greater latitude as to how to shape your "cookies", but you loose the power of set-based SQL. A little bit like static vs. dynamic typing.
1) What are some fundamental examples of relational design and non-relational design?
A tuple is an ordered list of attributes, a relation is a set of tuples of the same "shape", a table is a physical representation of a relation. If something fits this paradigm, it is "relational".
Financial transactions tend to be relational, while (say) text documents don't. There is a lot of gray area in between.
2) similar to above, what are examples of structured data vs unstructured (is this just rephrasing above?)
I don't know. How do you define "structured"?
it seems most apps these days have data that is always useful to reach through other data/objects. That's relational isn't it?
Sure. Relational databases don't have "pointers" per se, but foreign keys fulfill essentially the same role. You can always JOIN the data the way you see fit, although JOINs are usually done over FKs.
Let's say I was building a workout tracker app. I have users, exercises, workouts, routines and log_entries.
It looks like this would fit nicely in the relational model.
Let's say I added tracking other things, like a users weight, height, body fat and so on. How does that affect things?
You'll probably need additional columns and/or tables. This still looks relational to me.
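For what it's worth, a minimal relational sketch of that tracker, including the extra body-tracking columns (every name and column here is an assumption, purely for illustration):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT,
                        weight REAL, height REAL, body_fat REAL);   -- the extra tracking columns
CREATE TABLE routines  (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
CREATE TABLE workouts  (id INTEGER PRIMARY KEY, routine_id INTEGER REFERENCES routines(id));
CREATE TABLE exercises (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE log_entries (
    id          INTEGER PRIMARY KEY,
    workout_id  INTEGER REFERENCES workouts(id),
    exercise_id INTEGER REFERENCES exercises(id),
    reps        INTEGER,
    weight      REAL
);
""")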
What statistics are people usually speaking of?
Perhaps they are talking about index statistics, that help the cost-based query optimizer pick a better execution plan?
why it may be easier to develop for one over the other
Right tool for the job, remember? If you know the structure of your data in advance and it fits the relational model, use a relational database.
Also, relational databases tend to be extraordinarily good at managing huge amounts of data, accessed concurrently by many users (though this might be true for some non-relational databases as well).
Another thing, when using an abstraction like an ORM - does it really matter?
ORMs tend to make easy things easier and hard things harder. Again, it's a matter of balance. For me, being able to write a finely-tuned SQL that extracts exactly the fields I need trumps the tedium of writing boiler-plate CRUD, so I'm not a very big fan of ORMs. Other people might disagree.
I truthfully do find myself scratching my head a lot when trying to model data.
Well, it is a big topic. If you are willing to put the time and effort in it, start with the ERwin Methods Guide, and also take a look at: Use The Index, Luke!

Database choice for stock data [closed]

I'm wondering if NoSQL is an option for this scenario:
The input is hourly stock data (SKU, amount, price and some more specific fields) from several sources. Older versions just get dropped, so we won't get over 1 million data sets in the near future, and there won't be any business-intelligence queries like in a data warehouse. But there will be aggregations, at least the minimal price of a group of articles, which has to be updated if the article with the minimal price of a group is sold out. In addition to these frequent bulk writes there will be single decrements of an article's amount, which can happen at any time.
The database would be part of a service that needs to give fast responses to requests via REST, so there needs to be some kind of caching. There is no need for strong consistency, but durability is required.
Further wishlist:
should scale well for growing request load
inexpensive technologies in terms of money and complexity (no Oracle cluster)
no proprietary languages (no PL/SQL)
MongoDB with its aggregation framework seems promising. Can you think of alternatives? (I'm not set on NoSQL!)
I would start with Redis, and here is why:
"there needs to be some kind of caching" => and that is what Redis is best at. If for any reason you decide that you need "more", you can add "more", but still keep whatever you already developed in Redis as a cache for that "more"
One Redis is fast. Two Redises are faster. Three Redises are faster than two, etc.
Learning curve is quite flat, and fun => since set theory really is fun
Increments / decrements / min / max are native Redis operations (see the sketch after this list)
Redis integration with XYZ (you mentioned a need for a REST API) is all over google and github
Redis is honest <= actually one of my favorite features of Redis
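As a rough illustration of that increments/min point, here is how the decrement and min-per-group flow could look with redis-py (all key and field names are invented for this sketch):
import redis

r = redis.Redis()  # assumes a local Redis server

# Decrement the stock of an article atomically.
r.hincrby("article:sku123", "amount", -1)

# Keep a sorted set of prices per article group so the minimum is one call away.
r.zadd("group:shoes:prices", {"sku123": 49.90})
cheapest = r.zrange("group:shoes:prices", 0, 0, withscores=True)

# When an article sells out, remove it so the group minimum stays correct.
r.zrem("group:shoes:prices", "sku123")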
MongoDB would work at first, and so would ANY other major NoSQL store, but why!?
I would go with Redis, and if you decide later that you need "more", I would first look at "Redis + SQL db (Postgres/MySQL/etc.)"; it will give you the best of both worlds => "caching / speed" and "aggregation power", in case your aggregations need to go above and beyond min/max/incr/decr.
Whoever tells you PostgreSQL "is not fast enough for writing" does not know it.
Whoever tells you that MySQL "is not scalable enough" does not know it (e.g. Facebook runs on MySQL).
As I am already on a roll :) => whoever tells you MongoDB has "replica sets and sharding" does not wish you well, since replica sets and sharding only look sexy in the docs and the hype. Once you need to reshard / reorganize replica sets, you'll know the price of a wrong shard key selection and of magic chunk movements...
Again => Redis FTW!
Well, it seems to me that MongoDB is the best choice.
It has not only aggregation features but also map/reduce query capabilities for statistics-calculation purposes. It can be scaled via replica sets and sharding, and it has atomic updates for increments (a decrement is just a negative increment).
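As an example of those two features, the atomic decrement and a min-price-per-group aggregation could look roughly like this in pymongo (collection and field names are assumptions):
from pymongo import MongoClient

db = MongoClient().stockdb  # assumes a local mongod

# Atomic decrement of an article's amount.
db.articles.update_one({"sku": "sku123"}, {"$inc": {"amount": -1}})

# Minimal price per group, ignoring sold-out articles.
pipeline = [
    {"$match": {"amount": {"$gt": 0}}},
    {"$group": {"_id": "$group", "min_price": {"$min": "$price"}}},
]
for row in db.articles.aggregate(pipeline):
    print(row["_id"], row["min_price"])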
Alternatives:
CouchDB - not fast enough on reading
Redis - a key/value DB; you would need to program the article logic at the application level
MySQL - is not scalable enough
PostgreSQL - could be a good alternative if scaled using pgbouncer, but is not fast enough on writing

what is wrong with a relational DB and why would you switch to mongoDB [closed]

I was looking through their website and I can't understand the problem they are solving. What is the problem with relational DBs? How can data stored in JSON documents be any faster than data stored in an SQL database?
In a fully normalized relational DB, every insertion will often require several look-ups in other tables (and its own table) to maintain data integrity (FKs). This is generally a good thing, but takes time. It's also often the case that you need to update several rows in different tables at once, leading to even more look-ups and transactional overhead.
Querying the database will also often need to look at many different tables and merge them.
A MongoDB document, on the other hand, is a much simpler construct. Every collection is like a big un-normalized table where all fields are optional (but still indexable), so there is very little space overhead (compared to a relational DB with the same setup).
It offers flexibility and speed at the cost of complex querying and removing data integrity logic from the server to the client (database client, not end user client ;)).
Both have their uses, but the question that has normally been "do we need something different from a relational DB?" should nowadays be "do we need something more complex than a document DB?" imo, and the vast majority of projects will not.
I think if you're happy with a relational database for your task, you needn't switch to MongoDB. MongoDB is supposed to make scaling out simpler than an RDBMS does. For some tasks you can also benefit from MongoDB's flexible schema. It mostly makes sense to discuss using a particular database for a concrete task.

When to replace RDBMS/ORM with NoSQL [closed]

What kind of projects benefit from using a NoSQL database instead of an RDBMS wrapped by an ORM?
Examples:
Stack Overflow-like sites?
Social communities?
Forums?
Your question is very general. NoSQL describes a collection of database techniques that are very different from each other. Roughly, there are:
Key-value stores (Redis, Riak)
Triplestores (AllegroGraph)
Column-family stores (Bigtable, Cassandra)
Document-oriented stores (CouchDB, MongoDB)
Graph databases (Neo4j)
A project can benefit from the use of a document database during the development phase of the project, because you won't have to design complex entity-relation diagrams or write complex join queries. I've detailed other uses of document databases in this answer.
If your application needs to handle very large amounts of data, the development phase will likely be longer when you use a specialized NoSQL solution such as Cassandra. However, when your application goes into production, it will greatly benefit from the performance and scalability of Cassandra.
Very generally speaking, if an application has the following requirements:
scale horizontally
work with data model X
perform Y operations
the application will benefit from using a NoSQL solution that is geared towards storing data model X and perform Y operations on the data. If you need more specific answers regarding a certain type of NoSQL database, you'll need to update your question.
Benefits during development (e.g. easier to use than SQL, no licensing costs)?
Benefits in terms of performance (e.g. runs like hell with a million concurrent users)?
What type of NoSQL database?
Update
Key-value stores can only be queried by key in most cases. They're useful to store simple data, such as user sessions, simple profile data or precomputed values and output. Although it is possible to store more complex data in key-value pairs, it burdens the application with the responsibility of maintaining 'manual' indexes in order to perform more advanced queries.
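For instance, such a "manual" index in a plain key-value store could look like this (a redis-py sketch; all key names are made up):
import redis

r = redis.Redis()

# Store the object itself under its own key.
r.hset("user:42", mapping={"name": "Alice", "city": "Rome"})

# Maintain a manual index so "all users in Rome" stays answerable.
r.sadd("index:city:Rome", 42)

# "Query" by reading the index, then fetching each object.
user_ids = r.smembers("index:city:Rome")
users = [r.hgetall(f"user:{uid.decode()}") for uid in user_ids]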
Triplestores are for storing Resource Description Framework (RDF) metadata. I don't know anything about these stores except for what Wikipedia tells me, so you'll have to do some research on that.
Column-family stores are built for storing and processing very large amounts of data. They are used by Google's search engine and Facebook's inbox search. The data is queried by MapReduce functions. Although MapReduce functions may be hard to grasp in the beginning, the concept is quite simple. Here's an analogy which (hopefully) explains the concept:
Imagine you have multiple shoe-boxes filled with receipts, and you want to calculate your total expenses. You invite some of your friends over and assign a person to each shoe-box. Each person writes down the total of each receipt in his shoe-box. This process of selecting the required data is the Map part.
When a person has written down the totals of (some of) his receipts, he can sum up these totals. This is the Reduce part and can be repeated multiple times until all receipts have been handled. In the end, all of your friends come together and sum up their total sums, giving you your total expenses. That's the final Reduce step.
The advantage of this approach is that you can have any number of shoe-boxes and assign any number of people to a shoe-box and still end up with the same result. Each shoe-box can be seen as a server in the database's network. Each friend can be seen as a thread on the server. With MapReduce you can have your data distributed across many servers and have each server handle part of the query, optimizing the performance of your database.
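The same analogy in a few lines of Python, with each shoe-box as a list of receipt amounts (the numbers are made up):
from functools import reduce

# Each shoe-box is a list of receipt amounts; each "friend" maps a box to its total.
shoe_boxes = [[12.50, 3.00, 7.25], [40.00, 1.99], [5.00, 5.00, 5.00]]

# Map: one partial total per box (each could run on a different server).
partial_totals = [sum(box) for box in shoe_boxes]

# Reduce: combine the partial totals into the final answer.
total_expenses = reduce(lambda a, b: a + b, partial_totals)
print(total_expenses)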
Document-oriented stores are explained in this question, so I won't discuss them here.
Graph databases are for storing networks of highly connected objects, like the users on a social network for example. These databases are optimized for graph operations, such as finding the shortest path between two nodes, or finding all nodes within three hops from the current node. Such operations are quite expensive on RDBMS systems or other NoSQL databases, but very cheap on graph databases.
NoSQL refers to different design approaches, not only a different query language, and different stores have different features. E.g. column-oriented databases are used for large data warehouses, which might in turn be used for OLAP.
This is similar to my question; there you'll find a lot of resources.

Fastest and stable non-sql database? [closed]

What is the fastest and most stable non-SQL database to store big data and process thousands of requests during the day (it's for a traffic exchange service)? I've found Kdb+ and Berkeley DB. Are they good? Are there other options?
More details...
Each day the server processes > 100K visits. For each visit I need to read the corresponding stats from the DB, write a log entry to the DB and update the stats in the DB, i.e. 3 DB operations per visit. Traffic is continuously increasing, so the DB engine should be fast. On one side the DB will be managed by a daemon written in C, Erlang or another low-level language; on the other side it will be managed by PHP scripts.
The file system itself is faster and more stable than almost anything else. It stores big data seamlessly and efficiently. The API is very simple.
You can store and retrieve from the file system very, very efficiently.
Since your question is a little thin on "requirements" it's hard to say much more.
What about Redis?
http://code.google.com/p/redis/
I haven't tried it yet, but I did read about it and it seems to be fast and stable enough for data storage.
It also provides you with a decent anti-single-point-failure solution, as far as I understand.
Berkeley DB is tried, tested and hardened, and is at the heart of many mega-high-transaction-volume systems. One example is wireless carrier infrastructure that uses huge LDAP stores (OpenWave, for example) to process more than 2 BILLION transactions per day. These systems also commonly have something like Oracle in the mix for point-in-time recovery, but they use Berkeley DB as replicated caches.
Also, BDB is not limited to key value pairs in the simple sense of scalar values. You can store anything you want in the value, including arbitrary structures/records.
What's wrong with SQLite? Since you did explicitly state non-SQL: Berkeley DB is based on key/value pairs, which might not suffice for your needs if you wish to expand the datasets; even more so, how would you make those datasets relate to one another using key/value pairs?
On the other hand, Kdb+, judging by the FAQ on their website, is a relational database that can handle SQL via their programming language Q. Be aware that if the need to migrate appears, there could be hitches, such as incompatible dialects or queries that rely on vendor specifics, hence the potential to get locked into that database and not be able to migrate at all. Something to bear in mind for later on...
You need to be careful what you decide here and look at it from a long-term perspective: future upgrades, migration to another database, how easy it would be to scale up, etc.
One obvious entry in this category is Intersystems Caché. (Well, obvious to me...) Be aware, though, it's not cheap. (But I don't think Kdb+ is either.)
MongoDB is the fastest and best NoSQL database. Have a look at this performance benchmark.