Will SELECT always start with the first (oldest) record in a table? - PostgreSQL

I made a table like
   | record
---+--------
 1 | one
 2 | two
 3 | three
 4 | four
 5 | five
There isn't an ID column; those are just the row numbers I see beside each row in DBVisualizer. I added the rows in the order 1, 2, 3, 4, 5. Is this query
SELECT *
FROM sch.test_table
LIMIT 1;
always certain to return "one", i.e. to start with the "oldest" record? Or will that change in large data sets?

No. As per the SQL specification, the order is indeterminate when ORDER BY is not used. You're working with a set of data, and sets are not ordered. The size of the set does not matter either.
The PostgreSQL documentation says:
If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
Which means that the rows might come back in the expected order, or they might not - there are no guarantees.
The bottom line is that if you want deterministic results, you have to use ORDER BY.
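For example (a minimal sketch, not part of the original answer - the schema below is illustrative, not the asker's actual table), you could give the table a column that records insertion order and sort by it:

-- Store insertion order explicitly, e.g. with a sequence-backed id
-- (a created_at timestamp set at insert time would also work).
CREATE TABLE sch.test_table (
    id     serial PRIMARY KEY,  -- assigned in insertion order
    record text
);

-- Deterministically fetch the oldest row:
SELECT record
FROM sch.test_table
ORDER BY id
LIMIT 1;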

Related

SQL 3NF normalization

I have a table with 4 columns to display the bill for buying an item. Column 1 is the primary key, columns 2 and 3 are the price and the amount of the item, and column 4 is the sum, calculated by multiplying the values in columns 2 and 3. Do I need to delete column 4 to make sure there is no transitive functional dependency in the table?
+---------+-------+--------+------+
| bill_id | price | amount | sum  |
+---------+-------+--------+------+
|       1 |     2 |      5 |   10 |
|       2 |     3 |      5 |   15 |
+---------+-------+--------+------+
No, you don't need to. NFs are guidelines - very important and recommended, but just guidelines. While violation of 1NF is almost always a bad design decision, you may choose to violate 3NF and even 2NF provided you know what you are doing.
In this case, depending on the context, you may choose not to have a physical "sum" column and have the value calculated on the fly, although this could easily raise performance issues. But please don't even think of creating a new table with all possible combinations of quantity and unit price as a compound PK!
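As a hedged sketch of the calculate-on-the-fly option (assuming PostgreSQL-style SQL; the table and view names are illustrative, and the derived column is called "total" here to avoid clashing with the SUM aggregate):

-- The base table stores only the independent facts.
CREATE TABLE bill (
    bill_id int     PRIMARY KEY,
    price   numeric NOT NULL,
    amount  int     NOT NULL
);

-- The derived value is computed on read, so it can never
-- disagree with price and amount.
CREATE VIEW bill_with_total AS
SELECT bill_id, price, amount, price * amount AS total
FROM bill;

Some databases (PostgreSQL 12+, MySQL, SQL Server) also offer generated columns, which give you a stored but automatically maintained version of the same derived value.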
If you look at any ERP on the market you will see they store quantity, unit price, and total amount (and much more!) for every line of the quotes, orders, and invoices they handle.

Tableau - Return Name of Column with Max Value

I am new to Tableau visualization and need some help.
I have a set of shipping lanes, each of which has whole-number values based on the duration of the shipment.
Ex:
| Lane Name | 0 Day | 1 Day | 2 Day | 3 Day | 4 Day |
| SFO-LAX   |     0 |    30 |    60 |    10 |     0 |
| JFK-LAX   |     0 |    10 |    20 |    50 |    80 |
For each Lane Name, I want to return the column header based on the max value.
i.e. for SFO-LAX I would return '2 Day', for JFK-LAX I would return '4 Day', etc.
I then want to set this as a filter to only show 2 Day and 3 Day results in my Tableau data set.
Can someone help?
Two steps to this.
The first step is pretty easy: pivot your data. Read the Tableau help to learn how to PIVOT your data for analysis - i.e. make it look to Tableau like a longer, 3-column data set with Lane, Duration, and Value as the columns. Tableau's PIVOT feature will let you view your data in that format (which makes analysis much easier) without actually changing the format of your data files.
The second step is a bit trickier than you'd expect at first glance, and there are a few ways to accomplish it. The Tableau features that can be used for this are LOD calcs, table calcs, filters and possibly sets. These are some of the more powerful but complicated parts of Tableau, so worth your time to learn about, but expect to take a while to spin up on them.
The easiest solution is probably to use one of the RANK() functions - start with a quick table calc. Set your partitioning and addressing as desired so that the ranks are computed for the blocks of data you want - say, partitioning on Lane and addressing (computing) by Duration. Then, when you are happy with the ranks you see, move the rank calculation to the filter shelf and only display data where rank = 1.
This is a quick solution once you get the hang of it, but it can get slow for very large data sets, since the rank calculations are done on the client side, which requires fetching all the data that you end up not displaying. If performance becomes an issue, you might want to look at other solutions that do more of the calculation server side - possibly using LOD calcs, or analytic (aka windowing) queries invoked from custom SQL.
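To illustrate the server-side variant mentioned above (a sketch only; the lane_durations table and its columns are hypothetical stand-ins for the pivoted data):

-- Rank durations within each lane by value, then keep only the
-- top-ranked duration for each lane.
SELECT lane, duration, value
FROM (
    SELECT lane, duration, value,
           RANK() OVER (PARTITION BY lane ORDER BY value DESC) AS rnk
    FROM lane_durations
) ranked
WHERE rnk = 1;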

Fetching the frequency of a unique number within an array for a list of db queries in sqlalchemy/postgresql

I am having a hard time figuring out a solution to the question in the title. Here are the details.
Let's say we have Table X, which contains different products, each with a unique integer id and a unique code. We also have Table Y, which contains different variants of each product in Table X, with slight variations in product name, length, width, etc. So there is a one-to-many relationship between Table X and Table Y.
The frontend displays all these product variants and allows them to be chosen from a list, exporting all of the chosen product details to a PDF. However, selection works in an unusual way: when a user clicks on a product variant, all the variants that share the same unique code are selected, if they are on the same list. Basically, whenever a product variant is selected, the full product gets selected along with all its variants in the list. The user can, however, deselect each product variant individually as they please.
Table Z contains a log of all the product variant ids exported. It looks something like this:
id | exported_at   | exported_by | product_variant_ids
---+---------------+-------------+---------------------
 1 | Some datetime | Some user   | {1, 2, 3, 4}
 2 | Some datetime | Some user   | {3, 4}
 3 | Some datetime | Some user   | {6, 8, 9}
 4 | Some datetime | Some user   | {1, 6, 7}
 5 | Some datetime | Some user   | {3, 5, 7}
...and so on.
Here you will observe that:
1. Within a row, a product variant id can appear at most once in the array.
2. However, a product variant id can appear in multiple rows.
I want to calculate, for each unique product variant id present across all the rows, the number of rows in which it appears.
So, for example, in the dummy table above, product variant id 1 appears exactly twice in the whole table, in rows 1 and 4. Similarly, product variant id 3 appears in rows 1, 2, and 5.
I have tried to calculate it naively, something like this (the algorithm in words):
From the set of all unique ids present across the table, for each row, check whether each unique id in the set is present in that row, and increase that id's counter each time it is.
I don't know how to write a subquery for it, and hence I was not able to try out any queries on the postgres console. The naive approach becomes very slow after only 15-20 logs when implemented through an ORM like SQLAlchemy.
Can anyone point me in the right direction toward a more efficient solution?
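For reference, one common server-side way to express this kind of count in PostgreSQL is to unnest the array and aggregate; a minimal sketch, assuming a table named export_log with an integer-array column product_variant_ids (names are placeholders for the real schema):

-- One output row per array element, then count rows per variant id.
-- Since an id appears at most once per row, COUNT(*) equals the
-- number of rows containing that id.
SELECT variant_id, COUNT(*) AS row_count
FROM export_log,
     unnest(product_variant_ids) AS variant_id
GROUP BY variant_id
ORDER BY variant_id;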

How to count the frequency of column values across rows in a TypedPipe in Scalding?

I'm currently working on a MapReduce job using Scalding. I'm trying to threshold based on how many times I see a particular value among the rows in my TypedPipe. For example, if I had these rows in my TypedPipe:
Column 1 | Column 2
'hi'     | 'hey'
'hi'     | 'ho'
'hi'     | 'ho'
'bye'    | 'bye'
I would want to append to each row the frequency with which its column 1 and column 2 values appear across all rows, meaning the output would look like:
Column 1 | Column 2 | Column 1 Freq | Column 2 Freq
'hi'     | 'hey'    |             3 |             1
'hi'     | 'ho'     |             3 |             2
'hi'     | 'ho'     |             3 |             2
'bye'    | 'bye'    |             1 |             1
Currently, I'm doing that by grouping the typed pipe by each column, like so:
// Count how many times each key2 value occurs, then rename the key
// so it can be joined back to the input without a field-name clash.
val key2Freqs = input.groupBy('key2) {
  _.size('key2Freq)
}.rename('key2 -> 'key2Right)
 .project('key2Right, 'key2Freq)
Then joining the original input with key2Freqs like so:
.joinWithSmaller('key2 -> 'key2Right, key2Freqs, joiner = new LeftJoin)
However, this is really slow and seems pretty inefficient for what is essentially a simple task. It gets especially long because I have 6 different keys that I want these values for, and I am currently grouping and joining 6 separate times in my job. There must be a better way to do this, right?
If the number of distinct values in each column is small enough for them all to fit in memory, you could .map your columns into a Map[String, Int], and then .groupAll.sum to count them all in one go (I am using the "typed api" notation; I don't quite remember how exactly this is done in the fields api, but you get the idea). You'll need the MapMonoid from Algebird, or you can just write your own if you don't want to add a dependency for this one thing - it is not hard.
You'd then end up with a pipe containing a single entry: the resulting Map. Now you can take your original pipe, do .crossWithTiny to bring the map of counts into it, and then .map to extract the individual counts.
Otherwise, if you can't keep all that in memory, then what you are doing now seems like the only way... unless you are actually looking for an approximation of the "top hitters" rather than exact counts of the entire universe, in which case check out Algebird's SketchMap.

Cross tab summary fields don't restrict by column

So I'm working with Crystal Reports 10 and was looking at the Cross-Tab to try to get a nice, neat table of my information. I'm trying to make a report where, for each item (as a row), the columns are the different sizes it comes in, and the value of each cell is the quantity.
So something that looks like this:
         Small | Medium | Large
Item 1       1 |      5 |    10
Item 2       5 |     10 |    15
Using the Cross-Tab, though, the quantity field I have has to be totaled, averaged, etc., so I can't get the specific breakdown for each size in a nice table like that. Is there any way to tweak the Cross-Tab to do this, or is there another tool in Crystal Reports that would let me show the quantities per size in that organized fashion?
Thanks for any help you guys can give.
Update:
The Cross-Tab I have tried gives me something that looks like this:
         Small | Medium | Large
Item 1      16 |     16 |    16
Item 2      30 |     30 |    30
If I put the values in the details section as separate fields, I'm able to get the values to match up properly, but it's not the right format. It comes out like this:
Item 1 | Small  |  1
Item 1 | Medium |  5
Item 1 | Large  | 10
Create a Cross-Tab:
1. Add {table.size} to the Columns
2. Add {table.item} to the Rows
3. Add {table.quantity} to Summarized Fields