Use SQL to export Parent/Child rows into a flat file - tsql

With an Orders table:
(OrderID, date, customerID, status, etc)
and an OrderDetails table:
(ParentID, itemID, quantity, price, etc)
I would like to create a SQL Query that will export a CSV flat file with Order and OrderDetail rows interspersed. For instance, output might look like this (H and D indicate "Header" and "Detail" respectively.):
"H",2345,"6/1/09",856,"Shipped"
"D",2345,52,1,1.50
"D",2345,92,2,3.25
"D",2345,74,1,9.99
"H",2346,"6/1/09",474,"Shipped"
"D",2346,74,1,9.99
"D",2346,52,1,1.50
Not sure where to even start with this. Any ideas? TIA.

You'll want to take advantage of the fact that an order by clause placed after the last query of a union all applies to the entire combined result set. Therefore, if you order by the second column (the order ID) ascending and then by the first column (the 'H'/'D' marker) descending, you'll get each header row followed by its detail rows, because 'H' sorts after 'D'.
Also, make sure that the two queries return the same number of columns; otherwise the union all can't stack them. Each pair of columns must also have compatible data types: SQL Server converts both sides to the type with the higher precedence, so pairing Status (text) with Price (numeric) fails when it tries to convert 'Shipped' to a number. Since you're exporting to CSV anyway, the simplest fix is to cast the mismatched columns to varchar. If one query needs extra columns, pad the other with null, or with '' if you don't want the word null in your CSV.
select
'H',
OrderID,
cast(Date as varchar(20)),
CustomerID,
Status
from
Orders
union all
select
'D',
ParentID,
cast(ItemID as varchar(20)),
Quantity,
cast(Price as varchar(20))
from
OrderDetails
order by
2 asc, 1 desc
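As a rough end-to-end sketch of the approach above, the following runs the union all against an in-memory SQLite database and writes the rows as CSV. The use of Python, SQLite, and the helper name export_orders are assumptions made for illustration; the sample rows are taken from the question. SQLite is loosely typed, so the column types here do not need to line up, whereas a stricter engine may need casts.

```python
import csv
import io
import sqlite3


def export_orders(conn):
    """Return header/detail rows as CSV text, each header followed by its details."""
    rows = conn.execute(
        """
        SELECT 'H', OrderID, Date, CustomerID, Status FROM Orders
        UNION ALL
        SELECT 'D', ParentID, ItemID, Quantity, Price FROM OrderDetails
        ORDER BY 2 ASC, 1 DESC
        """
    ).fetchall()
    buf = io.StringIO()
    # QUOTE_NONNUMERIC puts quotes around strings and leaves numbers bare.
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
    return buf.getvalue()


# Hypothetical schema matching the column lists in the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (OrderID INTEGER, Date TEXT, CustomerID INTEGER, Status TEXT);
    CREATE TABLE OrderDetails (ParentID INTEGER, ItemID INTEGER, Quantity INTEGER, Price REAL);
    INSERT INTO Orders VALUES (2345, '6/1/09', 856, 'Shipped'), (2346, '6/1/09', 474, 'Shipped');
    INSERT INTO OrderDetails VALUES
        (2345, 52, 1, 1.50), (2345, 92, 2, 3.25), (2345, 74, 1, 9.99),
        (2346, 74, 1, 9.99), (2346, 52, 1, 1.50);
    """
)
csv_text = export_orders(conn)
print(csv_text)
```

Note that the order by only fixes the header-before-details grouping; the order of the detail rows within one order is not specified unless you add a further sort column.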


How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause so that I only see a total of 3 rows: one customer per country (1 German, 1 French, 1 UK). Is there a simple way to do that?
Normally a simple GROUP BY would suffice for this kind of problem. However, since you want ALL of the columns in the result, we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule, specify the column to sort on (ORDER BY) in every windowed or paged query so that the result is repeatable.
As no schema was supplied, I have used Name as the sort column for the window. Replace it with whatever column you prefer; the primary key is a good candidate if you have nothing else to go on.
SELECT * FROM
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Name) AS _rn
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
) AS ranked
WHERE _rn = 1
The PARTITION BY restarts the ROW_NUMBER count at 1 for each distinct Country value, so selecting only the rows whose row number (aliased as _rn) is 1 leaves one row per country.
The Country filter could have gone in the outer query instead, if you prefer. The row-number filter cannot move, though: window functions such as ROW_NUMBER() are only allowed in the SELECT list and the ORDER BY clause of a query, so to filter on the result we have to wrap the query in a subquery.
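To see the one-row-per-partition behaviour in action, here is a small sketch that runs the same query shape against an in-memory SQLite database (3.25 or later, for window function support). The engine, the table contents, and the outer ORDER BY Country (added only to make the printed order stable) are assumptions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only Name and Country matter for the query.
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
    INSERT INTO Customers (Name, Country) VALUES
        ('Schmidt', 'Germany'), ('Bauer', 'Germany'),
        ('Martin', 'France'), ('Dubois', 'France'),
        ('Smith', 'UK'), ('Lopez', 'Spain');
    """
)

one_per_country = conn.execute(
    """
    SELECT Name, Country FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Name) AS _rn
        FROM Customers
        WHERE Country IN ('Germany', 'France', 'UK')
    ) AS ranked
    WHERE _rn = 1
    ORDER BY Country
    """
).fetchall()
print(one_per_country)
```

Each country contributes the customer whose Name sorts first, and the Spanish row never appears because the inner WHERE removes it before numbering.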

Speed up query with multiple conditions

I have a SQL table of 5,000,000 rows with 5 columns, on which I need to run a select with 4 conditions (all on TEXT columns). The table has columns id, name, street, city, zip and my select looks like this
SELECT id FROM register WHERE name=%s AND zip=%s AND city=%s AND street=%s
The problem is that I need to speed up this query, because I need to run 80,000 of these queries and right now that takes half a day.
The %s placeholders imply that all four of your columns are varchar or text. If so, then the following index might help:
CREATE INDEX idx ON register (name, zip, city, street, id)
The first four parts of the index cover the WHERE clause, and the fifth part covers the id column which is needed for the SELECT clause.
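One way to check that such an index is actually picked up is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN, purely as a stand-in: the engine, the single sample row, and the ? placeholder style are assumptions, and on PostgreSQL you would read the output of EXPLAIN instead. The plan text should mention idx, and SQLite typically reports it as a covering index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE register (id INTEGER, name TEXT, street TEXT, city TEXT, zip TEXT)")
conn.execute("CREATE INDEX idx ON register (name, zip, city, street, id)")
# Hypothetical row, just so the lookup has something to find.
conn.execute("INSERT INTO register VALUES (1, 'Ann', 'Main St', 'Springfield', '12345')")

query = "SELECT id FROM register WHERE name=? AND zip=? AND city=? AND street=?"
params = ("Ann", "12345", "Springfield", "Main St")

# Each plan row has four columns; the last one is the human-readable detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
ids = [row[0] for row in conn.execute(query, params)]
print(plan)
print(ids)
```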

Sort in non-alphabetical order in postgresql

I'm automating a process at work where the output needs to be in a certain non-alphabetical order depending on a name (internal_product, type text) in addition to a number (type text). First I'm running a subquery where I collect information from four slightly different tables using joins. I then append the result with a union before the outer group by sums units and amounts. The pseudo-query is as follows:
select name, number, internal_product, sum(units), sum(amount) from (
select fields, sum(x)
from t1
join join-conditions
join join-conditions
group by name, number, internal_product
union
.....
select fields, sum(x)
from t5
join join-conditions
join join-conditions
group by name, number, internal_product
) as foo
group by name, number, internal_product
order by number, name;
I tried to change a column in a helper table used in one of the joins to an enum type since it is used in the outer group by (SO-thread) but the column type of course needs to be the same in the join-condition so the modified query was not valid. There are 30 product names so I would like to avoid using a CASE name as suggested by gbn and Guffa.
Are there other ways to apply a custom order in an ORDER BY?
It might be overkill or complicated for your case, but you could create a custom collation in postgres to sort the way you want. Have a look at the documentation.
https://www.postgresql.org/docs/11/collation.html

Postgres: Distinct but only for one column

I have a table in PostgreSQL with names (more than 1 million rows), but it also contains many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this in many steps to save some memory in my PHP script.
But how can I do that so that it only gives me a list with no duplicate names?
For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list every time I do the select, and the selection must be random.
I tried GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well or in an aggregate function, and I don't want them filtered in any way.
Does anyone know how to fetch many columns but do a distinct on only one column?
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This returns an arbitrary one of the rows for each name. To control which row is returned, add an ORDER BY:
select distinct on (name)
name, col1, col2
from names
order by name, col1
This returns, for each name, the first row when ordered by col1.
distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
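The "first row of each set" rule can be illustrated outside the database. The following is only a sketch of the semantics in plain Python, not of how PostgreSQL implements it; the function name distinct_on_name and the sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on_name(rows):
    """Emulate SELECT DISTINCT ON (name) ... ORDER BY name, col1 on a list of dicts."""
    ordered = sorted(rows, key=itemgetter("name", "col1"))
    # groupby needs input sorted by the grouping key, which the sort above guarantees.
    return [next(group) for _, group in groupby(ordered, key=itemgetter("name"))]


rows = [
    {"name": "Michael Fox", "col1": 2, "col2": "b"},
    {"name": "Michael Fox", "col1": 1, "col2": "a"},
    {"name": "Ann Lee", "col1": 5, "col2": "c"},
]
result = distinct_on_name(rows)
print(result)
```

The sort plays the role of ORDER BY name, col1, and taking the first element of each group plays the role of DISTINCT ON (name): one row per name, namely the one with the smallest col1.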
Anyone knows how to fetch many columns but do only a distinct on one column?
You want the DISTINCT ON clause.
You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:
SELECT DISTINCT ON (name) fields, id, name, metadata FROM the_table;
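This will return an unpredictable (but not "random") set of rows. If you want to make it predictable, add an ORDER BY per Clodaldo's answer. If you want to make it truly random, note that the ORDER BY must still start with name, so use ORDER BY name, random() to pick a random row for each name; to also shuffle which names come back, wrap the query in an outer SELECT that uses ORDER BY random() LIMIT 1000.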
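For a runnable illustration of "one random row per name, then a random sample of names", the sketch below uses SQLite, which has no DISTINCT ON. It therefore swaps in ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) to do the per-name pick; this is a substitute technique chosen for the demo, not the DISTINCT ON query itself. The table name names and the rows other than the two Michael Fox entries from the question are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT);
    INSERT INTO names (id, name, metadata) VALUES
        (1, 'Michael Fox', '2003-03-03,34,M,4545'),
        (2, 'Michael Fox', '1989-02-23,M,5633'),
        (3, 'Ann Lee', '1990-01-01,F,1111'),
        (4, 'Ann Lee', '1975-07-07,F,2222'),
        (5, 'Bo Chen', '2001-12-12,M,3333');
    """
)

sample = conn.execute(
    """
    SELECT id, name, metadata FROM (
        SELECT *,
               -- number the rows of each name in random order
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) AS rn
        FROM names
    ) AS picked
    WHERE rn = 1          -- keep one randomly chosen row per name
    ORDER BY random()     -- shuffle the surviving names
    LIMIT 1000
    """
).fetchall()
print(sample)
```

Every name appears at most once, while which duplicate survives and the order of the rows vary from run to run.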
To do a distinct on n columns:
select distinct on (col1, col2) col1, col2, col3, col4 from names
SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA
from SOMETABLE
GROUP BY NAME

Calculate Mode - "Highest frequency row" DB2

What would be the most efficient way to calculate the mode across tables with joins in DB2?
I am trying to get the value with the highest frequency (count) for a given column (ID, the candidate key for the joined table) on a given date.
The idea is to get the most common value from a table that has different values for some accounts (for the same ID and date). We need to make it unique for use in another table.
You can use common table expressions (CTEs), introduced by WITH, to break the logic down into logical steps. First we'll build the summary rows, then we'll assign a ranking to the rows within each group, then pick out the ones with the highest count of records.
Let's say we want to know which flavor of each item sells the most frequently on each date (perhaps assuming a record is quantity one).
WITH s as
(
SELECT itemID, saleDate, flavor, count(*) as tally
FROM sales
GROUP BY itemID, saleDate, flavor
), r as
(
SELECT itemID, saleDate, flavor, tally,
RANK() OVER (PARTITION BY itemID, saleDate ORDER BY tally desc) as pri
FROM s
)
SELECT itemID, saleDate, flavor, tally
FROM r
WHERE pri = 1
Here the names "s" and "r" refer to the result sets of their respective CTEs. These names can then be used like table names in later parts of the statement.
The pri column will have the RANK() of tally value on the summary row from the first section "s" within the window of itemID and saleDate. Tally is descending, because we want the largest value first, which will get a RANK() of 1. Then in the main SELECT we simply pick those summary records which were first in their partition.
Because RANK() (like DENSE_RANK()) gives tied rows the same rank, we could get back multiple flavors for an itemID and saleDate if they are tied for first place. This could be eliminated by replacing RANK() with ROW_NUMBER(), but that would arbitrarily pick one of the tied flavors as the winner, which may not be the correct answer for the problem at hand.
If we had a sales quantity column in the table, we could replace COUNT(*) with SUM(salesqty) and find what had sold the most units.
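The query above can be tried on a toy data set. This sketch runs it against an in-memory SQLite database (3.25 or later) rather than DB2, which is an assumption made so the example is self-contained; the sales rows are invented, and the final ORDER BY is added only to make the printed order stable. Item 2 has two flavors tied at one sale each, so it shows the tie behaviour of RANK().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (itemID INTEGER, saleDate TEXT, flavor TEXT);
    INSERT INTO sales VALUES
        (1, '2009-06-01', 'vanilla'), (1, '2009-06-01', 'vanilla'),
        (1, '2009-06-01', 'mint'),
        (2, '2009-06-01', 'mint'), (2, '2009-06-01', 'lemon');
    """
)

mode_rows = conn.execute(
    """
    WITH s AS (
        -- one summary row per item, date and flavor
        SELECT itemID, saleDate, flavor, COUNT(*) AS tally
        FROM sales
        GROUP BY itemID, saleDate, flavor
    ), r AS (
        -- rank the flavors within each item and date, most frequent first
        SELECT itemID, saleDate, flavor, tally,
               RANK() OVER (PARTITION BY itemID, saleDate ORDER BY tally DESC) AS pri
        FROM s
    )
    SELECT itemID, saleDate, flavor, tally
    FROM r
    WHERE pri = 1
    ORDER BY itemID, flavor
    """
).fetchall()
print(mode_rows)
```

Item 1 yields a single row for vanilla, while item 2 yields both lemon and mint because they share rank 1.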