How to create formatted output directly using a Databricks SQL query - databricks-sql

We are using a SELECT query to fetch data by joining tables in Databricks SQL. From the resulting dataset, we also need to create a header record (which contains static information) and a trailer record (containing details that depend on the join output).
An example is given as follows.
Let's assume there are two Databricks SQL tables, "class" and "student", joined by a common column "student_id". We are using the following query to obtain the marks of all students in each class:
SELECT
a.student_id
, a.student_name
, a.student_age
, b.class
, b.roll_no
, b.marks
FROM student AS a
INNER JOIN class AS b
ON a.student_id = b.student_id
From the join output, I need to create the following final output:
Header 202205 some_static_text
Student01 Tom 23 01 01 50
Student02 Dick 21 01 02 40
Student03 Harry 22 01 03 30
Trailer some_text 120
where the last field in the trailer record (120) is the sum of the last field (b.marks) in the SQL join output.
Is it possible to achieve the entire final output with a single SQL query and without using any other tool/script?
One thing to consider: our team has only SELECT permission on the Databricks tables.
Any help is appreciated.
Thanks.
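One possible approach (a hedged sketch only, not a verified solution): produce every line as a single text column, tag each line with a sort key, and UNION ALL the header, detail, and trailer queries together. The casts and the concat_ws spacing below are assumptions; adjust them to the real column types and the exact layout you need.
SELECT line
FROM (
    -- header: static text, always first
    SELECT 0 AS sort_key, 'Header 202205 some_static_text' AS line
    UNION ALL
    -- detail: one formatted line per joined row
    SELECT 1,
           concat_ws(' ', a.student_id, a.student_name,
                     CAST(a.student_age AS STRING), CAST(b.class AS STRING),
                     CAST(b.roll_no AS STRING), CAST(b.marks AS STRING))
    FROM student AS a
    INNER JOIN class AS b ON a.student_id = b.student_id
    UNION ALL
    -- trailer: static text plus the sum of marks from the same join
    SELECT 2,
           concat('Trailer some_text ', CAST(SUM(b.marks) AS STRING))
    FROM student AS a
    INNER JOIN class AS b ON a.student_id = b.student_id
) AS t
ORDER BY sort_key;
This only needs SELECT permission on the tables, since nothing is written back to them.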

Related

Can I add two columns of data from one column into a new table?

I'm looking to create a table using PostgreSQL, with two of the columns being sales from 2012 and sales from 2013. Currently, I have the following code:
create table sales as
select customer.customerid,
CONCAT(customer.firstname, ' ', customer.lastname) AS full_name,
invoice.invoicedate(where invoice.invoicedate > 'Jan 01 2012' AND invoice.invoicedate < 'Dec 31 2012') AS "2012_Sales",
invoice.invoicedate(where invoice.invoicedate > 'Jan 01 2013' AND invoice.invoicedate < 'Dec 31 2013') AS "2013_Sales" ,
invoice.total
FROM customer
INNER JOIN invoice
ON customer.customerid = invoice.customerid;
Basically I want one column to contain sales from 2012, and another column to contain sales from 2013. I'm trying to keep it within one query; otherwise I would just add columns to the table using ALTER TABLE.
Any help is greatly appreciated.
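The inline "(where ...)" syntax above isn't valid SQL; one common way to get a per-year total into separate columns is conditional aggregation. A hedged sketch only, assuming invoicedate is a date/timestamp column and invoice.total holds the sale amount (both assumptions taken from the query above):
CREATE TABLE sales AS
SELECT
    customer.customerid,
    CONCAT(customer.firstname, ' ', customer.lastname) AS full_name,
    -- sum only the invoices that fall in each year
    SUM(invoice.total) FILTER (WHERE invoice.invoicedate >= DATE '2012-01-01'
                                 AND invoice.invoicedate <  DATE '2013-01-01') AS "2012_Sales",
    SUM(invoice.total) FILTER (WHERE invoice.invoicedate >= DATE '2013-01-01'
                                 AND invoice.invoicedate <  DATE '2014-01-01') AS "2013_Sales"
FROM customer
INNER JOIN invoice ON customer.customerid = invoice.customerid
GROUP BY customer.customerid, customer.firstname, customer.lastname;
FILTER requires PostgreSQL 9.4 or later; on older versions the same idea works with SUM(CASE WHEN ... THEN invoice.total ELSE 0 END).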

PostgreSQL how do I COUNT with a condition?

Can someone please assist with a query I am working on for school, using a sample database from the PostgreSQL tutorial? Here is my query that gets me the raw data, which I can export to Excel and then put in a pivot table to get the needed counts. The goal is to make a query that does the counting so I don't have to do the manual extraction to Excel and the subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if I'm just exporting to Excel. Since we don't want to do that manual process, what I need is a way to count in the query how many times a given film (by film_id) was rented. The results should look something like this (just showing the first five here; the query need not limit itself to that):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here), but I can't get either to work. Please help.
Did you try this?
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If the same rental_id can appear more than once in your data, you may want to use COUNT(DISTINCT r.rental_id) instead.
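For example, a minimal variant of the query above with the distinct count and a column alias (the alias name is just illustrative):
SELECT
    i.film_id,
    COUNT(DISTINCT r.rental_id) AS rental_count
FROM
    rental AS r
    INNER JOIN inventory AS i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY i.film_id;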

Merge multiple tables having different columns

I have 4 tables, each with a different number of columns, as listed below.
tableA - 34
tableB - 47
tableC - 26
tableD - 16
Every table has a common column called id. Now I need to perform a union, but the problem is that since the column lists are of different lengths and are entirely different, I can't do a union directly.
Based on id alone I can get the details from every table, so how should I approach this?
What is the optimal way to solve this? I tried a full join, but that takes too much time.
Tried so far:
SELECT * FROM tableA
FULL JOIN tableB USING (id)
FULL JOIN tableC USING (id)
FULL JOIN tableD USING (id)
WHERE tableA.id = 123 OR
      tableB.id = 123 OR
      tableC.id = 123 OR
      tableD.id = 123
Snowflake does have a documented limitation on the use of set operators (such as UNION):
When using these operators:
Make sure that each query selects the same number of columns.
[...]
However, since the column names are well known, it is possible to come up with a superset of all unique column names required in the final result and project them explicitly from each query.
There's not enough information in the question about how many columns overlap (as few as 47 unique columns?) or whether they are all different (46 + 33 + 25 + 15 = 119 unique columns besides the shared id?). The answer to this determines the amount of effort required to write out each query, as it involves adapting a query from the following form:
SELECT * FROM t1
into an explicit form with dummy columns, defined with acceptable defaults that match the data types of those columns on the tables where they are present:
SELECT
present_col1,
NULL AS absent_col2,
0.0 AS absent_col3,
present_col4,
[...]
FROM
t1
You can also use some metaprogramming with stored procedures to "generate" such an altered query, by inspecting each independent result's column names using the Statement::getColumnCount(), Statement::getColumnName(), etc. APIs and forming a superset union version with default/empty values.
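As a hedged illustration only (the non-id column names below are made up; substitute the real superset of columns and defaults that match their data types), the combined query would follow this shape:
SELECT id, col_a, col_b, NULL AS col_c, NULL AS col_d
FROM tableA
WHERE id = 123
UNION ALL
SELECT id, NULL AS col_a, NULL AS col_b, col_c, col_d
FROM tableB
WHERE id = 123;
-- repeat the same UNION ALL pattern for tableC and tableD
Filtering on id inside each branch before the UNION ALL also avoids the cost of a full join across the whole tables.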

Querying a table using KDB

How can I query a table using kdb?
I created a table using the following code:
q)name:`Iain`Nathan`Ryan`Ross
q)number:98 42 126 98
q)table:([] name; number)
This creates a table:
name number
Iain 98
Nathan 42
Ryan 126
Ross 98
How can I query this table to return the rows where number is equal to 98, or where name is equal to Iain?
This is what I had been using
You can do this using a Q-SQL statement:
select from table where number=98
The documentation for this form of querying can be found at https://code.kx.com/q/ref/qsql/
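Filtering on the name column works the same way; note the backtick, since name holds symbols (a minimal sketch):
select from table where name=`Iain
Conditions can also be combined, e.g. select from table where number=98,name=`Iain.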

Export data from db2 with column names

I want to export data from Db2 tables to CSV format. I also need the first row to contain the column names.
I have had little success using the following command:
EXPORT TO "TEST.csv"
OF DEL
MODIFIED BY NOCHARDEL coldel: ,
SELECT col1,'COL1',x'0A',col2,'COL2',x'0A'
FROM TEST_TABLE;
But with this I get data like:
Row1 Value:COL1:
Row1 Value:COL2:
Row2 Value:COL1:
Row2 Value:COL2:
etc.
I also tried the following query:
EXPORT TO "TEST.csv"
OF DEL
MODIFIED BY NOCHARDEL
SELECT 'COL1',col1,'COL2',col2
FROM ADMIN_EXPORT;
But this repeats the column names alongside each row of data when opened in Excel.
Is there a way I can get data in the format below
COL1 COL2
value value
value value
when opened in Excel.
Thanks
After days of searching, I solved this problem this way:
EXPORT TO ...
SELECT 1 as id, 'COL1', 'COL2', 'COL3' FROM sysibm.sysdummy1
UNION ALL
(SELECT 2 as id, COL1, COL2, COL3 FROM myTable)
ORDER BY id
You can't select a constant string in Db2 without a table to select from, so you have to select from sysibm.sysdummy1.
To get the manually added column names into the first row, you have to add a pseudo-id and sort the UNION result by that id. Otherwise the header row can end up at the bottom of the resulting file.
Quite an old question, but I recently encountered a similar one and realized this can be achieved much more easily in the 11.5 release with the EXTERNAL TABLE feature; see the answer here:
https://stackoverflow.com/a/57584730/11946299
Example:
$ db2 "create external table '/home/db2v115/staff.csv'
using (delimiter ',' includeheader on) as select * from staff"
DB20000I The SQL command completed successfully.
$ head /home/db2v115/staff.csv | column -t -s ','
ID NAME DEPT JOB YEARS SALARY COMM
10 Sanders 20 Mgr 7 98357.50
20 Pernal 20 Sales 8 78171.25 612.45
30 Marenghi 38 Mgr 5 77506.75
40 O'Brien 38 Sales 6 78006.00 846.55
50 Hanes 15 Mgr 10 80659.80
60 Quigley 38 Sales 66808.30 650.25
70 Rothman 15 Sales 7 76502.83 1152.00
80 James 20 Clerk 43504.60 128.20
90 Koonitz 42 Sales 6 38001.75 1386.70
Insert the column names as the first row in your table, and use ORDER BY to make sure that the row with the column names comes out first.
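A hedged sketch of that approach (the sort_key column name is hypothetical, and the header values must fit the columns' data types, so non-character columns would need casting or a redefined table):
-- one-time: add a header row that sorts before the data
INSERT INTO TEST_TABLE (sort_key, col1, col2) VALUES (0, 'COL1', 'COL2');

EXPORT TO "TEST.csv"
OF DEL
MODIFIED BY NOCHARDEL
SELECT col1, col2
FROM TEST_TABLE
ORDER BY sort_key;
Both the UNION ALL approach shown above and the 11.5 EXTERNAL TABLE option with includeheader on avoid modifying the table at all.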