Merging two data sets by ID in SAS (Data sets inside Library) - merge

Okay, so I have two data sets: one is called Customer and the other CustomerOrder. They are linked by 'CustomerID'. I have both data sets in a SAS library referenced as 'NewData'. How would you write the code to merge these two tables, using the library reference and the CustomerID present in both sets?
Thank you!

This would be the merge in Proc Sql (creating a temporary table called tempCust):
proc sql noprint;
    create table work.tempCust as
    select *
    from NewData.CustomerOrder co, NewData.Customer cust
    where co.CustomerID = cust.CustomerID;
quit;
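If you prefer the DATA step MERGE that the question title refers to, here is a minimal sketch (assuming at most one of the two data sets has repeated CustomerID values, since a many-to-many merge behaves differently from the SQL join):
proc sort data=NewData.Customer out=work.cust;
    by CustomerID;
run;
proc sort data=NewData.CustomerOrder out=work.custOrd;
    by CustomerID;
run;

data work.tempCust;
    merge work.cust (in=inCust) work.custOrd (in=inOrd);
    by CustomerID;
    if inCust and inOrd;   /* keep only CustomerIDs present in both, like the join above */
run;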

Related

Is there a way to describe an external/spectrum table via redshift?

In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query describing how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular RDBMS, and for loading and exploring data views.
Interacting with Athena in this way is manual, and I would like to automate the process of creating regular RDBMS tables that have the same schema as those in Redshift Spectrum.
How can I do this through a query that can be run via psql? Or is there another way to get this via the aws-cli?
Redshift Spectrum does not support the SHOW CREATE TABLE syntax, but there are system tables that can deliver the same information. I have to say, it's not as useful as the ready-to-use SQL returned by Athena, though.
The tables are
svv_external_schemas - gives you information about glue database mapping and IAM roles bound to it
svv_external_tables - gives you the location information, and also data format and serdes used
svv_external_columns - gives you the column names, types and order information.
Using that data, you could reconstruct the table's DDL.
For example, to get the list of columns and their types in CREATE TABLE format, one can do:
select distinct
    listagg(columnname || ' ' || external_type, ',\n')
        within group (order by columnnum) over ()
from svv_external_columns
where tablename = '<YOUR_TABLE_NAME>'
  and schemaname = '<YOUR_SCHEMA_NAME>'
The query gives you output similar to:
col1 int,
col2 string,
...
*) I am using the listagg window function and not the aggregate function, as apparently the listagg aggregate function can only be used with user-defined tables. Bummer.
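To also pull the S3 location and storage format (needed for the LOCATION, ROW FORMAT and STORED AS parts of the DDL), a similar sketch against svv_external_tables (same placeholders as above; check the exact column list against your Redshift version):
select location, input_format, output_format, serialization_lib, serde_parameters
from svv_external_tables
where tablename = '<YOUR_TABLE_NAME>'
  and schemaname = '<YOUR_SCHEMA_NAME>';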
I had been doing something similar to @botchniaque's answer in the past, but recently stumbled across a solution in the AWS Labs' amazon-redshift-utils code package that seems to be more reliable than my hand-spun queries:
amazon-redshift-utils: v_generate_external_tbl_ddl
If you aren't able to create a view from the DDL listed in that package, you can run it manually by removing the CREATE statement from the start of the query. Assuming you can create it as a view, usage would be:
SELECT ddl
FROM admin.v_generate_external_tbl_ddl
WHERE schemaname = '<external_schema_name>'
-- Optionally include specific table references:
-- AND tablename IN ('<table_name_1>', '<table_name_2>', ..., '<table_name_n>')
ORDER BY tablename, seq
;
Redshift has since added SHOW EXTERNAL TABLE:
SHOW EXTERNAL TABLE external_schema.table_name [ PARTITION ]
SHOW EXTERNAL TABLE my_schema.my_table;
https://docs.aws.amazon.com/redshift/latest/dg/r_SHOW_EXTERNAL_TABLE.html

How to create CSV and save it in a variable for further processing in postgresql?

Facing kind of a mini challenge here today.
I want to create a CSV string from a column of a table in PostgreSQL, using a SQL query inside a stored function, and I want to be able to store it in another table as a single value (and do further processing on that table).
My database engine is PostgreSQL.
I have seen lots of examples using COPY TO and COPY FROM, but they either write to STDOUT or save to a file.
Copy (Select id From product limit 10) To STDOUT With CSV DELIMITER ',';
Source Data:
Product
id | Name
10 | Product1
21 | Product1
34 | Product1
45 | Product1
17 | Product1
Required/Target Data:
TempTable
value
10,21,34,45,17
Neither of the above meets my requirement. I want to be able to store the generated CSV in a column of another table.
Similar Code for SQL Server:
I used to do this in SQL Server using the following code.
CREATE FUNCTION [dbo].[CreateCSV] (@MyXML XML)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @listStr VARCHAR(MAX);
    SELECT
        @listStr =
            COALESCE(@listStr + ',', '') +
            c.value('@Value[1]', 'nvarchar(max)')
    FROM @MyXML.nodes('/row') AS T(c);
    RETURN @listStr;
END
In SQL Server, I would generate the CSV by calling the CreateCSV() function within a stored procedure. I am trying to replicate the process in PostgreSQL.
I must admit I am new to PostgreSQL, so I need your help with this.
Appreciate a helpful response.
Thanks
Steve
Thanks @a_horse_with_no_name.
Turns out I needed
SELECT string_agg(id,',') FROM (Select cast (id as varchar(100)) From product limit 10) AS tab;
Thanks for helping me with that. :)
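For completeness, wrapping that in a stored function that writes the CSV into another table might look like the sketch below; the function name create_csv and the target table temptable(value text) are placeholders, not names from the thread:
CREATE OR REPLACE FUNCTION create_csv() RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    csv text;
BEGIN
    -- build the comma-separated list of ids
    SELECT string_agg(id::text, ',')
      INTO csv
      FROM (SELECT id FROM product LIMIT 10) AS tab;

    -- store it as a single value in the target table
    INSERT INTO temptable (value) VALUES (csv);
END;
$$;

-- usage:
SELECT create_csv();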

Using SAS to insert records into DB2 database

To give some background, I am using
- Base SAS on the mainframe (executed by JCL) and
- DB2 as the database.
I have the list of keys to read from DB2 in a mainframe dataset. I understand that we can join a SAS dataset with a DB2 table to read it as follows.
%LET DSN=DSN;
%LET QLF=QUALIFIER;
PROC SQL;
    CONNECT TO DB2 (SSID=&DSN);
    CREATE TABLE STAFFTBL AS
        (SELECT * FROM SASDSET FLE,
            CONNECTION TO DB2
                (SELECT COL1, COL2, COL3
                 FROM &QLF..TABLE_NAME)
            AS DB2 (COL1, COL2, COL3)
         WHERE DB2.COL1 = FLE.COL1);
    DISCONNECT FROM DB2;
    %PUT &SQLXMSG;
QUIT;
Can someone suggest how to proceed if the list of values to be inserted is in a mainframe dataset?
We can read the mainframe dataset and get the values into a SAS dataset, but I am not sure how to use that SAS dataset to insert the values into DB2.
I know we can do it using COBOL, but I would like to learn whether it is possible using SAS.
Thanks!
Solution:
You have to assign a library in order to write to the DB; please refer to the SAS manual.
Your query above creates a local SAS dataset in the Work library (or wherever your default library is declared). This table is not connected to your backend DB2 database; it is simply a copy imported into SAS.
Consider establishing a live connection using an ODBC SAS library or, if not ODBC, the DB2 engine SAS has installed (SAS/ACCESS Interface to DB2). Once connected, all tables in the specified database appear as SAS datasets in that library, and they are not imported copies but live tables. Then run a PROC SQL INSERT or a PROC DATASETS APPEND to insert records into the table from SAS.
Below are generic examples, with and without a DSN, which you can modify according to your credentials and database driver type.
* WITH DSN;
libname DBdata odbc datasrc="DSN Name" user="username" password="password";
* WITH DRIVER (NON-DSN) - CHECK DRIVER INSTALLATION;
libname DBdata odbc complete="driver=DB2 Driver; Server=servername;
user=username; pwd=password; database=databasename;";
Append procedures:
* WITH SQL;
proc sql;
INSERT INTO DBdata.tableName (col1, col2, col3)
SELECT col1, col2, col3 FROM SASDATASET;
quit;
* WITH APPEND (ASSUMING COLUMNS MATCH TOGETHER);
proc datasets;
append base = DBdata.tableName
data = SASDATASET
force;
quit;
NOTE: Be very careful not to unintentionally add, modify, or delete any table in the SAS ODBC library: these datasets are live tables, so such changes will be reflected in the backend DB2 database. When finished with your work, do not delete the library (or all its tables will be cleaned out); simply unassign it from the environment:
libname DBdata clear;
Provided that you have the necessary write access, you should be able to do this via a PROC SQL insert into statement. Alternatively, if you can access the DB2 table via a library, it may be possible to use a DATA step with a MODIFY statement together with OUTPUT / REPLACE statements; a rough sketch follows.
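A rough sketch of that DATA step idea, assuming DBdata is the DB2 libref from the earlier answer, work.newRecs holds the rows to load, and col1 is the key (both data sets sorted by it). Whether MODIFY is supported against your DB2 engine depends on the SAS/ACCESS configuration, so treat this as illustrative only:
data DBdata.tableName;
    modify DBdata.tableName work.newRecs;
    by col1;
    if _iorc_ = %sysrc(_dsenmr) then do;
        output;          /* key not found in the DB2 table: add the record */
        _error_ = 0;     /* clear the error flag set by the missing match  */
    end;
    else replace;        /* key found: update the existing record in place */
run;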

Stored procedure get column information does not return anything?

I am using Entity Framework with a stored procedure in which I generate a query dynamically and execute it. The stored proc looks like:
Begin
    DECLARE @Query nvarchar(MAX);
    SET @Query = 'SELECT e.id, e.name, e.add, e.phno FROM employee e';
    EXEC sp_executesql @Query;
End
In the SQL code above you can see that I am executing the '@Query' variable, and that variable's value can change dynamically.
I am able to add my stored proc to my .edmx file. But when I go to the Model Browser, choose Add Function Import, and try Get Column Information, it does not show anything, even though executing the stored proc on the server returns all columns with values. Why am I not getting column information in the Model Browser?
The model browser isn't running the stored procedure to then gather the column information from its result - it's trying to grab the column information from the underlying procedure definition using the sys tables.
This procedure, because it's dynamic, will not have an underlying definition and therefore won't be importable into the EDMX like this.
Temporarily change your stored proc to
SELECT TOP 1 e.id, e.name, e.add, e.phno from employee /* ... rest of your proc commented out */
Then add it to the EF and create the function import (it will see the columns).
Then change the proc back to how you had it above.
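Another commonly used workaround (not from this thread, so verify it in your environment) is to keep the dynamic SQL and add a dummy branch that never executes but returns the same column shape, so the designer has static metadata to read; the column types in the CASTs below are guesses for the employee table:
Begin
    IF 1 = 0
    BEGIN
        -- never runs, but exposes a static result shape for Get Column Information
        SELECT CAST(NULL AS int)           AS id,
               CAST(NULL AS nvarchar(100)) AS name,
               CAST(NULL AS nvarchar(200)) AS [add],
               CAST(NULL AS nvarchar(20))  AS phno;
    END

    DECLARE @Query nvarchar(MAX);
    SET @Query = 'SELECT e.id, e.name, e.add, e.phno FROM employee e';
    EXEC sp_executesql @Query;
End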
Try adding SET NOCOUNT ON after BEGIN... that suppresses messages that might cause it to be "confused".

Getting the result columns of table valued functions in SQL Server 2008 R2

For a constants generator I would like to get the metadata of the result columns for all my table-valued functions (i.e. the names of the columns returned by each table-valued function). How can I get them? Do I have to parse the function's source code, or is there an interface providing this information?
Thanks for your help
Chris
I use the following query to get the TVFs:
SELECT udf.name AS Name, SCHEMA_NAME(udf.schema_id) AS [Schema]
FROM master.sys.databases AS dtb, sys.all_objects AS udf
WHERE dtb.name = DB_NAME()
AND (udf.type IN ('TF', 'FT'))
AND SCHEMA_NAME(udf.schema_id) <> 'sys'
This information is available in sys.columns
Returns a row for each column of an object that has columns, such as
views or tables. The following is a list of object types that have
columns:
Table-valued assembly functions (FT)
Inline table-valued SQL functions (IF)
Internal tables (IT)
System tables (S)
Table-valued SQL functions (TF)
User tables (U)
Views (V)
SELECT *
FROM sys.columns
WHERE object_id=object_id('dbo.YourTVF')
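Combining the two, here is a sketch that lists the result columns of every TVF in the current database by joining sys.all_objects to sys.columns and sys.types (adjust the type filter, e.g. drop 'IF' if you only want multi-statement TVFs):
SELECT SCHEMA_NAME(udf.schema_id) AS [Schema],
       udf.name                   AS FunctionName,
       c.column_id,
       c.name                     AS ColumnName,
       t.name                     AS TypeName,
       c.max_length,
       c.is_nullable
FROM sys.all_objects AS udf
JOIN sys.columns     AS c ON c.object_id = udf.object_id
JOIN sys.types       AS t ON t.user_type_id = c.user_type_id
WHERE udf.type IN ('TF', 'IF', 'FT')
  AND SCHEMA_NAME(udf.schema_id) <> 'sys'
ORDER BY [Schema], FunctionName, c.column_id;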