Split Cell using SSIS - tsql

I'm looking to use SSIS to transform the data held from a single source table. One of the cells has a string of characters. For example:
##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\
There's also another cell on the same row which contains a date.
Basically I want each character within that string to be transferred to a new table as a row on its own. The first two characters represent the date given in the other cell. The next two characters represent the following day, and so on. So as well as having each character on its own, I would also want to increment the date and store that too.
Any idea how I would go about doing this, or even whether SSIS is the correct tool to be using?
Many Thanks

I wonder if you'd be better off running this through a split-string function in SQL first? That way you'll be getting rows for each character alongside the date, and then you can just output it straight to a destination.
I've created a function to facilitate this:
CREATE FUNCTION [dbo].[udf_SplitStringIntoRows](@text varchar(max))
RETURNS @tbl TABLE ([value] char(1) NOT NULL)
AS
BEGIN
    WHILE len(@text) > 0
    BEGIN
        INSERT INTO @tbl
        SELECT left(@text,1)
        SET @text = RIGHT(@text,len(@text)-1)
    END
    RETURN
END
Then, to test the data, I created a quick temp table with your data in:
DECLARE @source as TABLE([value] varchar(max), [date] datetime)
INSERT INTO @source
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', getdate()
UNION
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', getdate()+1
UNION
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', getdate()+2
Then cross applied the function to this dataset:
SELECT d.[value], s.date
FROM @source s
CROSS APPLY dbo.[udf_SplitStringIntoRows](s.value) d
Which should give you the source dataset you require to further process in SSIS.
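To also get the incrementing date the question asks for (two characters per day, starting at the date stored on the row), one option is a variant of the function above that returns each character's position as well. This is only a sketch under those assumptions:
CREATE FUNCTION [dbo].[udf_SplitStringIntoRowsOrdinal](@text varchar(max))
RETURNS @tbl TABLE ([pos] int NOT NULL, [value] char(1) NOT NULL)
AS
BEGIN
    DECLARE @pos int = 1
    WHILE len(@text) > 0
    BEGIN
        -- record the character together with its 1-based position
        INSERT INTO @tbl SELECT @pos, LEFT(@text,1)
        SET @text = RIGHT(@text,len(@text)-1)
        SET @pos = @pos + 1
    END
    RETURN
END
Then the per-character date falls out of the position:
SELECT d.[value]
      ,DATEADD(day, (d.pos - 1) / 2, s.[date]) AS [date]  -- two characters per day
FROM @source s
CROSS APPLY dbo.[udf_SplitStringIntoRowsOrdinal](s.[value]) d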


How does SELECT INTO work with SAS

I'm new to SAS and I'm trying to port my code from Access VBA into SAS.
In Access I often use SELECT INTO, but it seems to me this statement doesn't exist in SAS.
I have two tables, and each day I get new data and want to update my table with the new lines. I need to check whether new lines appear, and if so, insert those lines into the old table.
I tried some code from Stack Overflow and other things I found through Google, but I couldn't find anything that works.
INSERT INTO OLD_TABLE T
VALUES (GRVID = VTGONR)
FROM NEW_TABLE V
WHERE not exists (SELECT V.VTGONR FROM NEW_TABLE V WHERE T.GRVID = V.VTGONR);
Not sure what the purpose of using the VALUES keyword is in your example. PROC SQL uses VALUES() to list static values. Like:
VALUES (100)
SAS just uses normal SQL syntax instead. See for example: https://www.techonthenet.com/sql/insert.php
To specify the observations to insert, just use SELECT. You can add a WHERE clause as part of the SELECT to limit the rows you insert. To tell INSERT which columns to insert into, list them inside () after the table name; otherwise it will expect the order of the columns listed in the SELECT statement to match the order of the columns in the target table.
insert into old_table(GRVID)
select VTGONR from new_table
where VTGONR not in (select GRVID from old_table)
;
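In SAS this statement runs inside PROC SQL; for completeness, a minimal sketch wrapping the statement above (same table and column names as in the question):
proc sql;
  insert into old_table(GRVID)
  select VTGONR from new_table
  where VTGONR not in (select GRVID from old_table);
quit;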

Process a row with unknown structure in a cursor

I am new to using cursors for looping through a set of rows, and so far I have always had prior knowledge of which columns I was about to read.
E.g.
DECLARE db_cursor CURSOR FOR
SELECT Column1, Column2
FROM MyTable
DECLARE @ColumnOne VARCHAR(50), @ColumnTwo VARCHAR(50)
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @ColumnOne, @ColumnTwo
...
But the tables I am about to read into my key/value table have no specific structure and I should be able to process them one row at a time. How, using a nested cursor, can I loop through all the columns of the fetched row and process them according to their type and name?
TSQL cursors are not really designed to read data from tables of unknown structure. The two possibilities I can think of to achieve something in that direction are:
First read the column names of an unknown table from the Information Schema Views (see System Information Schema Views (Transact-SQL)). Then use dynamic SQL to create the cursor.
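A minimal sketch of that first approach, assuming the table name is known at runtime (MyTable is a placeholder); note that the FETCH ... INTO variable list would have to be generated dynamically in the same way:
DECLARE @tableName sysname = N'MyTable';  -- placeholder table name
DECLARE @cols nvarchar(max);
-- build a comma-separated, quoted column list from the catalog
SELECT @cols = STUFF((
    SELECT N', ' + QUOTENAME(COLUMN_NAME)
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
    ORDER BY ORDINAL_POSITION
    FOR XML PATH('')), 1, 2, N'');
-- declare the cursor in dynamic SQL; GLOBAL so it survives the dynamic batch
DECLARE @sql nvarchar(max) =
    N'DECLARE db_cursor CURSOR GLOBAL FOR SELECT ' + @cols +
    N' FROM ' + QUOTENAME(@tableName) + N';';
EXEC sp_executesql @sql;
OPEN db_cursor;
-- FETCH NEXT FROM db_cursor INTO ... (the variable list must match the column count)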
If you simply want to get any columns as a large string value, you might also try a simple SELECT * FROM TABLE_NAME FOR XML AUTO and further process the retrieved data for your purposes (see FOR XML (SQL Server)).
SQL is not very good at dealing with sets generically. In most cases you must know column names, data types and much more in advance. But there is XQuery. You can transform any SELECT into XML rather easily and use XQuery's powerful abilities to deal with generic structures. I would not recommend this, but it might be worth a try:
CREATE PROCEDURE dbo.Get_EAV_FROM_SELECT
(
    @SELECT NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @tmptbl TABLE(TheContent XML);
    DECLARE @cmd NVARCHAR(MAX)= N'SELECT (' + @SELECT + N' FOR XML RAW, ELEMENTS XSINIL);';
    INSERT INTO @tmptbl EXEC(@cmd);
    SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
          ,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
          ,c.value('text()[1]','nvarchar(max)') AS ColumnValue
    FROM @tmptbl t
    CROSS APPLY t.TheContent.nodes('/row') A(r)
    CROSS APPLY A.r.nodes('*[position()>1]') B(c)
END;
GO
EXEC Get_EAV_FROM_SELECT @SELECT='SELECT TOP 10 o.object_id,o.* FROM sys.objects o';
GO
--Clean-Up for test purpose
DROP PROCEDURE Get_EAV_FROM_SELECT;
The idea in short
The SELECT is passed into the procedure as a string. Inside the SP we build a statement dynamically and create XML from it.
The very first column is taken to be the row's ID; if there is no suitable first column (as with sys.objects), we can write the SELECT to force one, as in the example call above.
The inner SELECT will read each row and return a classical EAV-list.

Select Different Table Based on Year

I am attempting to build a view to be used in Crystal Reports that allows us to look up GL codes. Unfortunately, our ERP creates a new SQL table each year and appends the last 2 digits of the year onto the table name.
Unless I can find a way to change which table it looks at based off the date I will need to manually change the view every year for each of the views I am creating. Any advice?
This Year: select * from GL000016
Next Year: select * from GL000017
Here is the MSSQL version:
DECLARE @SQLQuery AS NVARCHAR(500)
DECLARE @TableName AS NVARCHAR(100)
SET @TableName = 'GL0000' + RIGHT(CONVERT(CHAR(4), GETDATE(), 120),2)
SET @SQLQuery = 'SELECT * FROM ' + @TableName
EXECUTE sp_executesql @SQLQuery
You could also use a stored procedure depending on the environment. @TableName will hold the table name if that is all you need (i.e. SELECT @TableName).
You can use the T-SQL Year function.
Returns an integer that represents the year of the specified date.
https://msdn.microsoft.com/en-us/library/ms186313.aspx
So, running this year, the following will return 17 (the suffix for next year's table):
select (YEAR(GETDATE()) % 100) + 1
Not exactly possible to switch tables dynamically for Views
If you want to switch the table you are selecting from, you'll need to use IF statements or dynamic SQL. Considering that you want to do this in a view, neither of those is available to you. So from my perspective, your options are:
Switch to use a stored procedure and use dynamic sql or if statements
Switch to use a function that returns a table (again, dynamic sql or if statements)
A SQL job that periodically runs a stored procedure that uses dynamic SQL to re-create the view with the correct GL account table name.
If you have to use a view, then 3 is probably your option, but it comes with a maintenance and handover overhead: the next person working on this project might wonder why their view changes keep getting overwritten.
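A minimal sketch of option 3 (the procedure and view names are hypothetical; CREATE OR ALTER needs SQL Server 2016 SP1 or later, on older versions drop and re-create the view instead):
CREATE PROCEDURE dbo.RebuildGLView
AS
BEGIN
    -- derive the two-digit year suffix, e.g. '16' for 2016
    DECLARE @suffix char(2) = RIGHT(CONVERT(char(4), YEAR(GETDATE())), 2);
    DECLARE @sql nvarchar(max) =
        N'CREATE OR ALTER VIEW dbo.vwGL AS SELECT * FROM dbo.GL0000' + @suffix + N';';
    EXEC sp_executesql @sql;  -- run from a scheduled SQL Agent job around year end
END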
Create yourself a temporary table that matches the common structure of your GL0000XX tables.
You then have to use dynamic SQL to query your tables.
CREATE TABLE #GL ....;
DECLARE @year char(2) = YEAR(GETDATE()) % 100;
INSERT INTO #GL
EXEC('SELECT * FROM GL0000' + @year);

Transpose and stack a large csv file?

I have a csv file with about 1,500 fields and 5-6 million rows. It is a dataset with one row for each individual who has received public benefits at some point since ISO week 32 in 1991. Each field represents one week and holds a number relating to the specific benefit received in that particular week. If the individual has received no benefits, the field is left blank (''). In addition to the weekly values there are a number of other fields (ID, sex, birthdate, etc.)
The data set is updated quarterly with an added field for each week in the quarter, and an added row for each new individual.
This is a sample of the data:
y_9132,y_9133,y_9134,...,y_1443,id,sex,dateofbirth
891,891,891,...,110,1000456,1,'1978/01/16'
110,112,112,...,997,2000789,0,'1945/09/28'
I'm trying to convert the data to a tabular format so it can be analysed using PostgreSQL with a column store or similar (Amazon Redshift is a possibility).
The fields beginning with "y_" represent the year and week of the received public benefits. In a tabular format the field name should be converted to a row number or a date, starting with Monday in ISO week 32 in 1991 (1991/08/05).
The tabular dataset I'm trying to convert the csv-file to would look like this:
(Week is just a sequential number, starting with 1 for the date '1991/08/05')
week,benefit,ID
1,891,1000456
2,891,1000456
3,891,1000456
...
1211,110,1000456
1,110,2000789
2,112,2000789
3,112,2000789
...
1211,997,2000789
I have written a function in PostgreSQL that works, but it is very slow. The entire conversion takes 15h. I have tried using my laptop with an SSD and 8GB RAM. I also tried it on an Amazon RDS instance with 30GB memory. Still slow. The PostgreSQL function splits the csv in chunks. I've experimented a bit, and 100K rows per batch seems fastest (yeah, 15h fast).
To be clear, I'm not particularly looking for a solution using PostgreSQL. Anything will do. In fact, I'm not sure why I would even use a DB for this at all.
That said, here are my functions in PostgreSQL:
First function: I load part of the csv file into a table called part_grund. I only load the fields with the weekly data and the ID.
CREATE OR REPLACE FUNCTION DREAMLOAD_PART(OUT result text) AS
$BODY$
BEGIN
EXECUTE 'DROP TABLE IF EXISTS part_grund;
CREATE UNLOGGED TABLE part_grund
(id int, raw_data text[],rn int[]);
INSERT INTO part_grund
SELECT raw_data[1300]::int
,raw_data[1:1211]
,rn
FROM grund_no_headers
cross join
(
SELECT ARRAY(
WITH RECURSIVE t(n) AS
(
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 1211
)
SELECT n FROM t
) AS rn) AS rn;
CREATE INDEX idx_id on part_grund (id);';
END;
$BODY$
LANGUAGE plpgsql;
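As a side note, the recursive CTE above just builds the array [1,2,...,1211]; in PostgreSQL the same array can be produced more simply with generate_series, e.g.:
SELECT ARRAY(SELECT generate_series(1, 1211));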
Second function: Here, the data is transformed using the unnest function.
CREATE OR REPLACE FUNCTION DREAMLOAD(startint int, batch_size int, OUT result text) AS
$BODY$
DECLARE
i integer := startint;
e integer := startint + batch_size;
endint integer;
BEGIN
endint := (SELECT MAX(ID) FROM part_grund) + batch_size;
EXECUTE 'DROP TABLE IF EXISTS BENEFIT;
CREATE UNLOGGED TABLE BENEFIT (
ID integer
,benefit smallint
,Week smallint
);';
WHILE e <= endint LOOP
EXECUTE 'INSERT INTO BENEFIT
SELECT ID
,unnest(raw_data) AS benefit
,unnest(rn) AS week
FROM part_grund
WHERE ID between ' || i || ' and ' || e-1 ||';';
i=i+batch_size;
e=e+batch_size;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
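For reference, a minimal sketch of how the two functions are invoked, assuming IDs start at 1 and using the 100K batch size mentioned above:
SELECT dreamload_part();
SELECT dreamload(1, 100000);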
As I mentioned above, it works, but it is painfully slow. So suggestions for a faster way of doing this would be much appreciated.

T-SQL Parse text in select statement

I am currently working on a project to import data from one table to another. I am trying to parse a field that contains a FULLNAME into its parts LAST, FIRST, MI. The names are all in the format "LAST,FIRST MI". I have written a stored procedure that correctly parses and returns the results as necessary, but I am unsure how to incorporate the stored procedure into a single SELECT statement. For instance, currently I have:
SELECT FULLNAME From UserInfo
and what I would like to have is something like this:
SELECT Last, First, MI from UserInfo
Currently my stored procedure takes the form of ParseName(FULLNAME, Last as OUTPUT, First as OUTPUT, MI as OUTPUT). How can I call this procedure and have the output variables split into 3 different columns?
Replace your stored procedure with a table-valued function. You can then apply this function to all the rows.
Below is an example - just put in your logic for parsing the name:
create FUNCTION dbo.f_parseName(@inFullName varchar(255))
RETURNS
@tbl TABLE (lastName varchar(255), firstName varchar(255), middleName varchar(255))
as
BEGIN
    -- put your logic here
    insert into @tbl(lastName,firstName,middleName)
    select substring(@inFullName,0,10),substring(@inFullName,11,10), substring(@inFullName,21,10)
    return
end
apply the function
-- sample data
declare @fullNames table (fullName varchar(255))
insert into @fullNames (fullName) values
('111111111122222222223333333333')
,('AAAAAAAAAABBBBBBBBBBCCCCCCCCCC')
select
fn.fullName
,pn.lastName
,pn.firstName
,pn.middleName
from
@fullNames fn
cross apply dbo.f_parseName(fn.fullName) pn
You could put the results of your stored procedure in a (temporary) table, like this (I added the FULLNAME column to provide a join condition; you would have to adapt your stored procedure to return it):
CREATE TABLE #temp (
FULLNAME NVARCHAR(..)
,Last NVARCHAR(..)
,First NVARCHAR(..)
,MI NVARCHAR(..)
);
INSERT INTO #temp (FULLNAME, Last, First, MI)
EXECUTE MySproc;
If you want to be able to execute SELECT Last, First, MI from UserInfo structurally, you'd have to first add three columns to UserInfo for your name information, and then insert the parsed data that you got from your stored procedure.
EDIT
You mention that you use a SELECT ... INTO ... to put the data in a new table. I'm guessing that the new table does not have the FULLNAME column, and then you would be better off using a table valued function (as this answer suggests). If you keep the FULLNAME column however, you can use that to join the temp table with your new table to update the new table as follows:
UPDATE NUI
SET NUI.Last = T.Last, NUI.First = T.First, NUI.MI = T.MI
FROM NewUserInfo AS NUI
INNER JOIN #temp AS T ON NUI.FULLNAME = T.FULLNAME;
You could use this UPDATE method also with another join condition if you do not have the FULLNAME column in your new table, but make sure you run a good test beforehand to check if the join holds.
Hope this helps, good luck!
You could add computed columns to the table like this:
alter table UserInfo
add firstName as SUBSTRING(fullName, CHARINDEX(',',fullName,0)+2, LEN(fullName)-CHARINDEX(',',fullName,0)-CHARINDEX(' ', REVERSE(fullName),0)-1)
,lastName as SUBSTRING(fullName, 0, CHARINDEX(',',fullName,0))
,middleInitial as REVERSE(SUBSTRING(REVERSE(fullName),0,CHARINDEX(' ', REVERSE(fullName),0)))
But the best solution would be to do the other way around. Normalize the data with real columns for firstName, lastName and middleInitial and do a computed column for the fullName.
The expressions in the code above may need a little more work, as I am sure they can be written more effectively. I only made them work to show the idea.
After creating the computed columns you may do this:
select firstName
,lastName
,middleInitial
from UserInfo
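If you go the normalized route suggested above, a sketch (table name and column widths are hypothetical) with real name columns and fullName as the computed column, matching the "LAST,FIRST MI" format:
CREATE TABLE dbo.UserInfoNormalized (
    firstName varchar(255),
    lastName varchar(255),
    middleInitial varchar(10),
    -- computed column rebuilds the original "LAST,FIRST MI" format
    fullName AS lastName + ',' + firstName + ' ' + middleInitial
);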