If you have a data object with many features (100+), is it better to expand this data by adding columns or by storing it in row format?

So, if you can imagine a person was simultaneously enrolled in 100 different courses and had just received final grades for all courses, would it be better practice to store that information like so (wide columns):
personID  Math  Science  English
1         90    88       98
2         91    98       90
(and ...97 other columns)
Or like this:
personID  Grade  Subject
1         90     Math
1         88     Science
1         98     English
This brings the columns down significantly (from 100 to 3).

Creating 100+ columns on a single table is a bad idea, and every database limits the number of columns a table can have.
Instead, you can model it the following way:
StudentsMarkDetails
StudentsMarkDetailsID  personID  SubjectId  MarksObtained  ExamID
1                      1         3          64             1
2                      2         4          36             1
3                      1         4          36             2
SubjectMaster
SubjectId  SubjectName
1          Maths
2          Science
3          English
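To make the two-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the answer above; the column types, the foreign key, and the sample marks (taken from the question's first student) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema following the answer's layout; types and constraints are assumed.
conn.executescript("""
CREATE TABLE SubjectMaster (
    SubjectId   INTEGER PRIMARY KEY,
    SubjectName TEXT NOT NULL
);
CREATE TABLE StudentsMarkDetails (
    StudentsMarkDetailsID INTEGER PRIMARY KEY,
    personID      INTEGER NOT NULL,
    SubjectId     INTEGER NOT NULL REFERENCES SubjectMaster(SubjectId),
    MarksObtained INTEGER NOT NULL,
    ExamID        INTEGER NOT NULL
);
""")

conn.executemany(
    "INSERT INTO SubjectMaster VALUES (?, ?)",
    [(1, "Maths"), (2, "Science"), (3, "English")],
)
# Illustrative rows: person 1's grades from the question, one row per subject.
conn.executemany(
    "INSERT INTO StudentsMarkDetails VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 90, 1), (2, 1, 2, 88, 1), (3, 1, 3, 98, 1)],
)

# Subject names live in one place and are joined in when needed.
rows = conn.execute("""
SELECT d.personID, s.SubjectName, d.MarksObtained
FROM StudentsMarkDetails AS d
JOIN SubjectMaster AS s ON s.SubjectId = d.SubjectId
ORDER BY d.StudentsMarkDetailsID
""").fetchall()

for row in rows:
    print(row)
```

Adding a new subject is then an INSERT into SubjectMaster rather than a schema change.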

Definitely option two: a small, fixed number of columns.
There are many reasons. Here are a few:
maintenance: if you have 100 columns now, tomorrow you'll have 101. Adding a column requires a schema change, which is painful.
access: to get a value, you have to hard-code the column name. This is a form of maintenance problem in whatever accesses it, be it queries or app code.
queries: basic queries become unwieldy. Write the query that returns the average grade for a student and the problem will be immediately obvious.
Here are two solid rules I made for myself that I never break:
Prefer more rows over more columns.
Prefer more columns over more tables.
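The point about queries can be sketched quickly. With one row per person and subject, the average is a plain GROUP BY; with the wide layout the same query would have to name every subject column and divide by their count. The table name grades and the History row are hypothetical, and SQLite (via Python's sqlite3) is used only because it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Long layout from the question: one row per person and subject.
conn.execute("CREATE TABLE grades (personID INTEGER, subject TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [
        (1, "Math", 90), (1, "Science", 88), (1, "English", 98),
        (2, "Math", 91), (2, "Science", 98), (2, "English", 90),
    ],
)

AVERAGE_QUERY = """
SELECT personID, AVG(grade)
FROM grades
GROUP BY personID
ORDER BY personID
"""

averages = conn.execute(AVERAGE_QUERY).fetchall()
print(averages)

# A new course is just another row (hypothetical subject and grade);
# neither the schema nor the query has to change.
conn.execute("INSERT INTO grades VALUES (1, 'History', 84)")
averages_after = conn.execute(AVERAGE_QUERY).fetchall()
print(averages_after)
```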

It is better to store the data with one row per person and per course. Why? Here are some reasons:
Different people may have different courses.
Most databases have a limit on the number of columns in a table.
You may want to store additional information per score, such as the date/time it was entered.
It is easy to add new courses or to change existing scores, if necessary.


Is there any PostgreSQL magic to get results by range instead of rows?

I'm sorry for the title, I don't know how to clearly summarize the problem.
That's probably why I couldn't find an answer when searching by myself.
Feel free to improve it.
Anyways, let's say I have a query returning primary id's.
SELECT id FROM ...
Instead of having results presented with one row for each id like this:
id
-----
1
2
3
45
182
183
184
I would like to know if there's any access to some internal state based on the index that would return this:
ranges
---------
1-3
45
182-184
The whole point here is NOT to have a nice presentation; I can do that myself.
Besides, that would add post-processing after the query has run, and I want the opposite.
I'd like to know whether there is some kind of shortcut that would speed up the query by not having to return each row individually.
Maybe something related to extracting data directly from the indexes used in the WHERE clause.
I'm not aware of a generic SQL way to do that but I would love to know if there's some postgres feature for this.
If the answer is "no", it's ok. I just had to ask...
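For reference, a common way to collapse consecutive ids into ranges in SQL is the "gaps and islands" pattern: subtracting ROW_NUMBER() from id gives the same value for every id in a consecutive run, so grouping on that difference yields one row per range. This does not say whether PostgreSQL exposes anything at the index level, and the query still has to read every id; it only reduces the number of rows returned. The sketch below runs on SQLite through Python (window functions need SQLite 3.25 or newer); the table name t is hypothetical, and since the query only uses standard window functions it should carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(i,) for i in (1, 2, 3, 45, 182, 183, 184)],
)

# id - ROW_NUMBER() is constant within a run of consecutive ids,
# so it serves as a group key for each range.
ranges = conn.execute("""
SELECT MIN(id) AS range_start, MAX(id) AS range_end
FROM (
    SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM t
) AS numbered
GROUP BY grp
ORDER BY range_start
""").fetchall()

for start, end in ranges:
    print(start if start == end else f"{start}-{end}")
```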

reshape and merge in stata

I have three data sets:
First, education.dta. It contains individuals (students) and the education level they had achieved in each year from 1990 to 2000. It is originally in wide format, but I can easily reshape it to long. The wide layout looks like this:
id  educ_90  educ_91  ...  educ_00  cohort
1   0        1        ...  1        87
2   1        1        ...  2        75
3   0        0        ...  2        90
Second, graduate.dta. It records when individuals (students) finished high school. This data set does not span several years; it is only a "snapshot" of the individual at the time they finished high school, together with student characteristics such as background (for example, parents' occupation).
id  schoolid  county  cohort  ...
1   11        123     87
2   11        123     75
3   22        243     90
Third, teachers.dta. It contains information about all high school teachers, such as their education, whether they work full or part time, gender... This data set is in long format.
id  schoolid  county  year  education
22  11        123     2011  1
21  11        123     2001  1
23  22        243     2015  3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education.dta and graduate.dta. Then I loop so that each variable from graduate.dta takes the same value in every year, for example:
forv j=1990/2000 {
    gen county`j' = .
    replace county`j' = county
}
However, when I reshape to long afterwards, Stata reports that variable id does not uniquely identify the observations.
I have also tried reshaping education.dta to long first and then merging either 1:m or m:1, with education.dta as master and graduate.dta as the using data set.
However, Stata again reports that id is not unique. How do I deal with this?
In next step I want to merge the above with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain that I have exactly the format of your data. It would be helpful if you gave us a toy dataset to look at using dataex (and it could even help you figure out the problem yourself!).
But to start: because you are seeing that id is not unique, you need to figure out why an id might appear more than once in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful for exploring the data in this way.
Because you want your dataset in long format, I suggest reshaping education.dta to long first, then doing something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once), and then finally something like merge 1:1 schoolid year using "teachers.dta", and you will have your final dataset.
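The order of operations suggested here (check for duplicate ids, reshape to long, then do a many-to-one merge on id) can be sketched outside Stata as well. The following is a plain-Python illustration of that logic, not a replacement for reshape and merge; the toy rows, the helper name duplicated_ids, and the rule for turning the two-digit suffix into a year are all assumptions.

```python
from collections import Counter

# Hypothetical toy rows mirroring the layouts described in the question.
education_wide = [
    {"id": 1, "educ_90": 0, "educ_91": 1, "cohort": 87},
    {"id": 2, "educ_90": 1, "educ_91": 1, "cohort": 75},
]
graduate = [
    {"id": 1, "schoolid": 11, "county": 123},
    {"id": 2, "schoolid": 11, "county": 123},
]


def duplicated_ids(rows, key="id"):
    """Return the key values that occur more than once (cf. Stata's duplicates)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Step 1: reshape wide -> long, one row per id and year.
education_long = []
for row in education_wide:
    for col, value in row.items():
        if col.startswith("educ_"):
            yy = int(col[len("educ_"):])
            year = 1900 + yy if yy >= 90 else 2000 + yy  # assumed suffix rule
            education_long.append(
                {"id": row["id"], "year": year, "educ": value, "cohort": row["cohort"]}
            )

# Step 2: a many-to-one merge requires id to be unique in the using data.
dupes = duplicated_ids(graduate)
if dupes:
    raise ValueError(f"id not unique in graduate data: {dupes}")

# Step 3: m:1 merge on id; every long row picks up the student's snapshot values.
graduate_by_id = {row["id"]: row for row in graduate}
merged = [
    {**row, **graduate_by_id[row["id"]]}
    for row in education_long
    if row["id"] in graduate_by_id
]

for row in merged:
    print(row)
```

If step 2 raises, that is the same situation Stata is complaining about, and the duplicated ids it lists are where to start looking.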

Crystal reports crosstab customization

I will try to explain my problem as simply as I can.
I would like to have a crosstab in my report for the questions/answers of a questionnaire: rows for the questions, columns for the answers, and in every cell the number of people who gave that answer.
However, every question has its own group of answers, and I would like something like this:
                     Very      So an So   Don't Ask
How happy are you?   10        9          1
                     Love it   Nah        It's ok
Do you like rain?    1         3          1
The numbers represent how many people answered Very or Love it or... you get the point.
What I need to know is whether this is possible, and whether anyone can point me to some guidelines for doing it. Thank you in advance!
EDIT: (hope this helps) I have a FeedbackT table and an AnswerT table; AnswerT contains some answers that don't concern me right now. The AnswerT table is connected with the Answer2T table (I know, it's a mess), and that table is connected with ActualAnswerT, which contains the answers (very low, low, medium...), and with another table, QuestionQroupT, which contains some info about the group that the answers belong to.
ActualAnswerT contains as many rows as there are people who have taken a questionnaire. If 5 people answered a questionnaire, for Question 'A' I could have 3 'Low' and 2 'High' from AnswerGroup1, for Question 'B' I could have 1 'No', 3 'Yes' and 1 'I don't know' from AnswerGroup2, and so on.
If you use a union query that returns its columns in the order question, answer1, answer2, answer3, then you'd have a consistent virtual table that you can create a crosstab from.

Select value in table in tableau

I am quite new to Tableau, so have patience with me :)
I have two tables,
Table one (T1) contains all my data, with the first column being Year-Week, like 2014-01, 2014-02, and so on. Quick question regarding this: how do I make Tableau treat it as a date and not as a string?
T1 contains a lot of data that looks like this:
YearWeek  Spend  TV  Movies
2014-01   5000   42  12
2014-02   4800   41  32
2014-03   2000   24  14
....
2015-24   7000   45  65
I have another table (T2) that contains some values I want to combine with the T1 columns. T2 looks like this:
NAME      TV  Movies
Weight    2   5
Response  6   3
Ad        7   2
Version   1   0
I want to create a calculated field (TVNEW) that takes the TV values from T1, adds Response(TV) to them, and multiplies the result by Weight(TV).
So something like this:
(T1[TV]+T2[TV[Response]])*T2[TV[Weight]]
This looks like this for the rows:
(42+6)*2
(41+6)*2
(24+6)*2
...
(45+6)*2
So the calculation should take a specific value from T2, and do the calculation for each value in T1[TV]
Thanks in advance
The easy answer to your question will be: No, not natively.
What you want to do sounds like accessing a 2 dimensional array and that's not really the intention of Tableau. Additionally you have 2 completely independent tables without a common attribute to JOIN on. Tableau is just not meant to work that way.
I cannot think of a way to dynamically extract that value (I assume your example is just that, an example; and in your case you don't just use two values in the calculation, otherwise you could create 2 parameters that you can use in your calculated fields)
Looking at your tables, it seems you could transpose and join them so that they ideally look like this: (Edit: Comment says transposing is not an option)
Medium  Value  YearWeek  Spend
Movies  12     2014-01   5,000
Movies  32     2014-02   4,800
Movies  14     2014-03   2,000
Movies  65     2015-24   7,000
TV      42     2014-01   5,000
TV      41     2014-02   4,800
TV      24     2014-03   2,000
TV      45     2015-24   7,000
and
Medium  Weight  Response  Ad  Version
TV      2       6         7   1
Movies  5       3         2   0
Depending on the systems you work with you could already put it in one CSV or table so you wouldn't have to do a JOIN in Tableau.
You can create the first table natively in Tableau (from Version 9.0 onwards): open your data source, select the columns TV and Movies in the Data Source Preview, click on the small triangle and then on Pivot. (At this point you can also select the YearWeek column, click on the triangle and choose Split to create separate fields for Year and Week. You won't be able to assign the type date to it, but that shouldn't give you any disadvantages.)
For the second table I can think of two possibilities:
You have access to a tool that can transpose your table (Excel can do that, see: Convert matrix to 3-column table ('reverse pivot', 'unpivot', 'flatten', 'normalize')). Once you have done that, you can open it in Tableau and join the two tables on Medium.
You could create calculated fields depending on the medium:
Field: Weight
CASE [Medium]
WHEN 'TV' THEN 2
WHEN 'Movies' THEN 5
END
And accordingly for Response, Ad and Version.
Obviously that is only reasonable if you really just need a handful of values.
Once this is done it's only a matter of creating a calculated field with
([Value]+[Response])*[Weight]
And this will calculate all the values for your table
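To check that the pivot-and-join approach reproduces the numbers from the question, here is a small sketch of the same logic in plain Python rather than Tableau. It unpivots T1 into one row per YearWeek and Medium, looks up the transposed T2 values by Medium, and applies ([Value]+[Response])*[Weight]. The variable names and the choice to include only the first three weeks are assumptions for illustration.

```python
# First three rows of T1 from the question.
t1 = [
    {"YearWeek": "2014-01", "Spend": 5000, "TV": 42, "Movies": 12},
    {"YearWeek": "2014-02", "Spend": 4800, "TV": 41, "Movies": 32},
    {"YearWeek": "2014-03", "Spend": 2000, "TV": 24, "Movies": 14},
]

# T2 transposed: one entry per Medium, as in the second table above.
t2_by_medium = {
    "TV": {"Weight": 2, "Response": 6, "Ad": 7, "Version": 1},
    "Movies": {"Weight": 5, "Response": 3, "Ad": 2, "Version": 0},
}

# Pivot step: one row per (YearWeek, Medium) with the measure in "Value".
pivoted = [
    {"YearWeek": row["YearWeek"], "Spend": row["Spend"], "Medium": medium, "Value": row[medium]}
    for row in t1
    for medium in ("TV", "Movies")
]

# Join on Medium, then apply ([Value] + [Response]) * [Weight].
for row in pivoted:
    factors = t2_by_medium[row["Medium"]]
    row["New"] = (row["Value"] + factors["Response"]) * factors["Weight"]

tv_new = [row["New"] for row in pivoted if row["Medium"] == "TV"]
movies_new = [row["New"] for row in pivoted if row["Medium"] == "Movies"]
print(tv_new)      # (42+6)*2, (41+6)*2, (24+6)*2
print(movies_new)  # (12+3)*5, (32+3)*5, (14+3)*5
```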

Join two data sets by year in Tableau

I have two datasets which look like this:
# 2013_data.tsv
year state age
2013 CA 22,5
2013 OH 19,3
2013 IL 45,5
2013 TX 33
# 2012_data.tsv
year state age
2012 CA 23
2012 OH 21,5
2012 CA 44,3
2012 TX 34,4
I want to use year as a pager on the Tableau map.
How can I join these separate data sources?
You could blend on year, but if the year is always different in each data source then the blend will not match on anything and you will get no results.
I am guessing that each data source (tsv file) has the same format (same number of columns and the same column names). In that case you can extract each data source with Tableau Desktop and then add the data from each source to get a master extract (you are basically appending the data extracts). You will then have all the data in one extract, and from there it is simple to combine the years in one visualization.
Also, since this is SO, I will point out that you can do this programmatically with the extract API (see https://www.tableau.com/learn/tutorials/on-demand/extract-api-introduction).
Your best approach in this case is to put all the data into one table before using Tableau. (It looks like what you really want is a union rather than a join.)
Another approach is to put the two tables in the same database, or on two tabs of the same spreadsheet, and use custom SQL to UNION ALL them together. Or you can append multiple tables into a single Tableau extract as emh described.
If you are conceptually joining tables instead of unioning them, you could also use data blending.
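If you go the route of combining the files before Tableau sees them, the union is only a few lines in a scripting language. Below is a minimal Python sketch using the standard csv module; the file contents are inlined as strings so it runs as-is, and treating the comma in values like 22,5 as a decimal separator is an assumption about the data.

```python
import csv
import io

# Contents of the two files from the question, inlined for the example.
data_2013 = "year\tstate\tage\n2013\tCA\t22,5\n2013\tOH\t19,3\n2013\tIL\t45,5\n2013\tTX\t33\n"
data_2012 = "year\tstate\tage\n2012\tCA\t23\n2012\tOH\t21,5\n2012\tCA\t44,3\n2012\tTX\t34,4\n"


def read_tsv(text):
    """Parse tab-separated text into rows with a numeric year and age."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        row["year"] = int(row["year"])
        # Assumption: the comma is a decimal separator (22,5 means 22.5).
        row["age"] = float(row["age"].replace(",", "."))
    return rows


# Union: same columns in both sources, so the rows are simply appended.
combined = read_tsv(data_2012) + read_tsv(data_2013)

# Write a single tab-separated table that can be loaded as one data source.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["year", "state", "age"], delimiter="\t")
writer.writeheader()
writer.writerows(combined)
print(out.getvalue())
```

With real files, the inlined strings would be replaced by reading 2012_data.tsv and 2013_data.tsv, and the year column then works directly on the Pages shelf.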