Picking up the date from a very long string name column in Sybase - T-SQL

I am working in Sybase with this table, which has the columns 'ID' and 'File_Name':
Table1
IDS File_Name_Attached
123 ROSE1234_abcdefghi_03012014_04292014_190038.zip
456 ROSE1234_abcdefghi_08012014_04292014_190038.zip
All I need is to pick up the first date given in the file name.
Required:
IDS Dates
123 03012014
456 08012014

You can use SUBSTRING and PATINDEX to find the start index of the date:
CREATE TABLE #table1(IDS int, File_Name_attached NVARCHAR(100));
INSERT INTO #table1
VALUES (123, 'ROSE1234_abcdefghi_03012014_04292014_190038.zip'),
(456, 'ROSE1234_abcdefghi_08012014_04292014_190038.zip');
SELECT
    IDS,
    -- PATINDEX finds the first _dddddddd_ block; +1 skips the leading underscore,
    -- and the 8-character substring is the first date
    [DATES] = SUBSTRING(File_Name_attached,
                        PATINDEX('%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_%', File_Name_attached) + 1,
                        8)
FROM #table1;
Warning: I have no Sybase DB for testing, so if this doesn't work let me know.


Show first and last value in table

I have an Excel file with customers' purchasing details (sorted by date), for example:
customer_id  date    $_Total_purchase
A            1/2/23  5
A            1/3/23  20
A            1/4/23  10
I want to show one row per customer in a table, so the final table will be:
customer_id  date    purchase_counter  amount_of_last_purchase  amount_of_first_purchase
A            1/4/23  3                 10                       5
In my table, customer_id is a dimension.
For extracting the date, I use max(date) as a measure.
For purchase_counter I use count(customer_id).
For extracting 'amount_of_first_purchase', I use FirstSortedValue('$_Total_purchase', date).
How can I extract 'amount_of_last_purchase'? Is there maybe an aggregation function I can use?
Thanks in advance :)
The simple answer is that you can use -date in your expression and this will return the last record:
FirstSortedValue('$_Total_purchase', -date)
The above will work for the provided data example. When there is more than one customer, the Aggr function can help:
First: FirstSortedValue(aggr(sum($_Total_purchase), customer_id, date), date)
Last: FirstSortedValue(aggr(sum($_Total_purchase), customer_id, date), -date)
Another approach (if applicable to your case/data) is to flag the first and last records during the data load and use the flags in the measures.
An example script:
RawData:
Load * Inline [
customer_id, date, $_Total_purchase
A, 2/1/23, 5
A, 3/1/23, 20
A, 4/1/23, 10
B, 5/1/23, 35
B, 6/1/23, 40
B, 7/1/23, 50
];
Temp0:
Load
customer_id,
date,
// flag the first record
// if the current row is the beginning of the table then flag as isFirst = 1
// if the customer_id for the current row is different from the previously loaded
// customer_id then flag as isFirst = 1
if(RowNo() = 1 or customer_id <> peek(customer_id), 1, null()) as isFirst,
// getting the last is a bit more tricky
// similar logic - if the current and previous customer_id are different
// or it is the end of the table then get the current customer_id and date
// and combine their values. Values are separated with | ELSE write 0.
// for example: A|4/1/23 or B|7/1/23
if(customer_id <> peek(customer_id) and RowNo() <> 1, peek(customer_id) & '|' & peek(date),
if(RowNo() = NoOfRows('RawData'), customer_id & '|' & date, 0
)) as isLastTemp
Resident
RawData
;
// Get all the data from Temp0 for which isLastTemp is not equal to 0
// split isLastTemp by | -> first value is customer_id and second is date
// join the result back to the original table
join (RawData)
Load
SubField(isLastTemp, '|', 1) as customer_id,
SubField(isLastTemp, '|', 2) as date,
1 as isLast
Resident
Temp0
Where
isLastTemp <> 0
;
// join Temp0 to the original table
// but only grab the isFirst flag
join(RawData)
Load
customer_id,
date,
isFirst
Resident
Temp0
;
// this table is no longer needed
Drop Table Temp0;
Once the above script is reloaded, the RawData table will have two more columns: isFirst and isLast.
Then the expressions are simpler:
First: sum( {< isFirst = {1} >} $_Total_purchase)
Last: sum( {< isLast = {1} >} $_Total_purchase)
You can also do this with pandas:
import pandas as pd
# read the Excel file
df = pd.read_excel('customer_purchases.xlsx')
# first row of the file
first_value = df.head(1)
# last row of the file
last_value = df.tail(1)
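If you need one row per customer, as in the requested output, a small groupby sketch along these lines should also work (assuming the file is already sorted by date, as stated; the column names are taken from the question):
import pandas as pd

df = pd.read_excel('customer_purchases.xlsx')

# one row per customer: last date, number of purchases, last/first purchase amount
# relies on the rows already being sorted by date within each customer
summary = df.groupby('customer_id').agg(
    date=('date', 'last'),
    purchase_counter=('date', 'count'),
    amount_of_last_purchase=('$_Total_purchase', 'last'),
    amount_of_first_purchase=('$_Total_purchase', 'first'),
).reset_index()
print(summary)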

Split the string using String_Split() in SQL Server 2016

I need to use STRING_SPLIT in my stage table and import the results into another table.
Stage table:
DECLARE #stage TABLE(ID INT, Code VARCHAR(500))
INSERT INTO #stage
SELECT 1, '123_Potato_Orange_Fish'
UNION ALL
SELECT 2, '456_Tomato_Banana_Chicken'
UNION ALL
SELECT 3, '789_Onion_Mango_Lamb'
Final table:
DECLARE #Final TABLE
(
ID INT,
code VARCHAR(500),
Unit VARCHAR(100),
Vegetable VARCHAR(100),
Fruit VARCHAR(100),
Meat VARCHAR(100)
)
I am using an SSIS execute task to transform the stage table data and insert it into the final table. The Code column in the stage table is a string and '_' is used as the delimiter. I need to separate the string and display the final table as shown below:
ID code Unit Vegetable Fruit Meat
------------------------------------------------------------------
1 123_Potato_Orange_Fish 123 Potato Orange Fish
2 456_Tomato_Banana_Chicken 456 Tomato Banana Chicken
3 789_Onion_Mango_Lamb 789 Onion Mango Lamb
I am trying to use SQL Server 2016 built-in String_Split() function as shown here:
SELECT
ID,
Code, f.value AS Vegetable
FROM
#stage AS s
CROSS APPLY
(SELECT
value,
ROW_NUMBER() OVER(PARTITION BY s.ID ORDER BY s.ID) AS rn
FROM
String_Split(s.Code, '_')) AS f
WHERE
s.ID = 1 AND f.rn = 2
But it only splits one string at a time; as my stage data contains millions of records, I need to split all the strings in the Code column and store them in the respective columns.
Note: I don't want to use a temporary table.
Thanks
You can add a Derived Column and, assuming that the format is consistent with what you listed, use the TOKEN function to split the input based on the "_" delimiter and the position of each string. From there, you can map each of the outputs to the appropriate destination column. The three expressions below split your Code column based on the sample data in your question. Note that the output data type of TOKEN is DT_WSTR (Unicode). If you need non-Unicode data, you'll have to cast it back to DT_STR, which can also be done within the same Derived Column by adding (DT_STR,50,1252) (adjust length as necessary) before each expression.
TOKEN(Code,"_",1)
TOKEN(Code,"_",2)
TOKEN(Code,"_",3)
Like @userfl89's answer, here is another SSIS solution, using a script component:
Add the 4 output columns to your Output0. Make sure you select Code as an input column.
string[] col = Row.Code.ToString().Split('_');
Row.Unit = int.Parse(col[0]);
Row.Vegetable = col[1];
Row.Fruit = col[2];
Row.Meat = col[3];
Since the accepted answer uses TOKEN(), which is bound to SSIS, I want to provide a SQL Server solution too.
You are using v2016, that allows for OPENJSON. When you use this on a JSON-array you'll get a column [key] indicating the position in the array and a column [value] providing the actual content.
It is very easy to transform a CSV string into a JSON array. The rest is pivoting by conditional aggregation. Try it out:
DECLARE #stage TABLE(ID INT, Code VARCHAR(500))
INSERT INTO #stage
SELECT 1, '123_Potato_Orange_Fish'
UNION ALL
SELECT 2, '456_Tomato_Banana_Chicken'
UNION ALL
SELECT 3, '789_Onion_Mango_Lamb'
SELECT ID
,Code
,MAX(CASE WHEN [key]=0 THEN CAST([value] AS INT) END) AS Unit
,MAX(CASE WHEN [key]=1 THEN [value] END) AS Vegetable
,MAX(CASE WHEN [key]=2 THEN [value] END) AS Fruit
,MAX(CASE WHEN [key]=3 THEN [value] END) AS Meat
FROM #stage
CROSS APPLY OPENJSON('["' + REPLACE(Code,'_','","') + '"]') A
GROUP BY ID,Code

Cast a PostgreSQL column to stored type

I am creating a viewer for PostgreSQL. My SQL needs to sort on the type that is normal for that column. Take for example:
Table:
CREATE TABLE contacts (id serial primary key, name varchar)
SQL:
SELECT id::text FROM contacts ORDER BY id;
Gives:
1
10
100
2
Ok, so I change the SQL to:
SELECT id::text FROM contacts ORDER BY id::regtype;
Which results in:
1
2
10
100
Nice! But now I try:
SELECT name::text FROM contacts ORDER BY name::regtype;
Which results in:
invalid type name "my first string"
Google is no help. Any ideas? Thanks
Repeat: the error is not my problem. My problem is that I need to convert each column to text, but order by the normal type for that column.
regtype is an object identifier type and there is no reason to use it when you are not referring to system objects (types in this case).
You should cast the column to integer in the first query:
SELECT id::text
FROM contacts
ORDER BY id::integer;
You can use qualified column names in the order by clause. This will work with any sortable type of column.
SELECT id::text
FROM contacts
ORDER BY contacts.id;
So, I found two ways to accomplish this. The first is the solution @klin provided: query the table and then construct my own query based on the data. An untested psycopg2 example:
c = conn.cursor()
c.execute("SELECT * FROM contacts LIMIT 1")
sort_by_sql = ""
for col in c.description:
    if col.name == "my_sort_column":
        # 23 is the PostgreSQL type OID for int4
        if col.type_code == 23:
            sort_by_sql = "ORDER BY " + col.name + "::integer"
        else:
            sort_by_sql = "ORDER BY " + col.name + "::text"
c.execute("SELECT * FROM contacts " + sort_by_sql)
A more elegant way would be like this:
SELECT id::text AS _id, name::text AS _name FROM contacts ORDER BY id
This uses aliases so that ORDER BY id still refers to the original integer column rather than the text output column. The last option is more readable if nothing else.
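For the viewer use case, a rough psycopg2 sketch combining both ideas (hypothetical code; it builds the aliased select list from cursor.description so ORDER BY can still use the raw column):
c = conn.cursor()
c.execute("SELECT * FROM contacts LIMIT 1")
# cast every column to text for display, but alias it so the raw table column
# stays available to ORDER BY under its original name
select_list = ", ".join(col.name + "::text AS _" + col.name for col in c.description)
c.execute("SELECT " + select_list + " FROM contacts ORDER BY my_sort_column")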

SQL Server 2012 autofill one column from another

I have a table where a user inputs name, dob, etc. and I have a User_Name column that I want automatically populated from other columns.
For example input is: Name - John Doe, DOB - 01/01/1900
I want the User_Name column to be automatically populated with johndoe01011900 (I already have the algorithm to concatenate the column parts to achieve the desired result)
I just need to know how (SQL, trigger) to have the User_Name column filled once the user completes inputting ALL target columns. What if the user skips around and does not input the data in order? Of course, the columns that are needed are (not null).
This should do it:
you can use a calculated field with the following calculation:
LOWER(REPLACE(Name, ' ', '')) + CONVERT(VARCHAR(10), DateOfBirth, 112)
In the below sample I have used a temp table but this is the same for regular tables as well.
SAMPLE:
CREATE TABLE #temp(Name VARCHAR(100)
, DateOfBirth DATE
, CalcField AS LOWER(REPLACE(Name, ' ', ''))+CONVERT( VARCHAR(10), DateOfBirth, 112));
INSERT INTO #temp(Name
, DateOfBirth)
VALUES
('John Doe'
, '01/01/1900');
SELECT *
FROM #temp;
RESULT:
Name      DateOfBirth  CalcField
John Doe  1900-01-01   johndoe19000101

Converting columns to a PostgreSQL database

Suppose I have a list of columns in Python of equal lengths (in the example below, each column has 4 elements, but in my actual work each column has around 500+ elements):
col0 = [1, 12, 23, 41]  # also used as a primary key
col1 = ['asdas', 'asd', '1323', 'adge']
col2 = [True, False, True, True]
col3 = [312.12, 423.1, 243.56, 634.5]
and I have a postgresql table already defined, with columns: Col0 (integer, also primary key), Col1 (character varying), Col2 (boolean), Col3 (numeric)
I wrote the following code to connect to the postgresql database (which seemed to have worked fine):
import psycopg2
...
conn = psycopg2.connect("dbname='mydb' user='myuser' host='localhost' password='mypwd'")
cur = conn.cursor()
Now suppose I want to push the columns to the PostgreSQL table myt, where I want the rows to be populated as:
Col1 Col2 Col3 Col4
1 'asdas' true 312.12
12 'asd' false 423.1
...
I saw examples on SO such as this one where the example is for reading from a csv file:
for row in reader:
    cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (variable1, variable2))
(a) Can I adopt something similar for my case? Would this work:
for i in range(0, len(col0)):
    cur.execute("INSERT INTO myt (Col1, Col2, Col3, Col4) VALUES (%??, %s, %??, %f)", (col0[i], col1[i], col2[i], col3[i]))
(b) If yes, what is the type specifier for python integer, boolean, float types , when the corresponding postgresql types are integer, boolean and numeric?
(c) Also, what if I have 40 columns instead of 4? Do I have to write a long line like this:
"INSERT INTO myt (Col1, Col2, ..., Col40) VALUES (%d, %s, ..., %f)", (col0[i], ...))
a, b:
Yes, that will work. psycopg2 uses %s to represent all types; that's just the way it works (so don't use %??, %d, %f, etc.).
c:
"insert into {t} ({c}) values ({v})".format(
t=tablename,
c=','.join(columns_list),
v=','.join(['%'] * len(columns_list))
gets you closer to a universal insert expression, but you still need to loop...
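For completeness, here is a rough sketch of that loop using the data from the question (the executemany call and connection details are my assumption of how you might wire it together, not part of the original answer):
import psycopg2

conn = psycopg2.connect("dbname='mydb' user='myuser' host='localhost' password='mypwd'")
cur = conn.cursor()

col0 = [1, 12, 23, 41]
col1 = ['asdas', 'asd', '1323', 'adge']
col2 = [True, False, True, True]
col3 = [312.12, 423.1, 243.56, 634.5]

columns_list = ['Col0', 'Col1', 'Col2', 'Col3']
sql = "insert into myt ({c}) values ({v})".format(
    c=','.join(columns_list),
    v=','.join(['%s'] * len(columns_list)))

# %s is the placeholder for every type; psycopg2 adapts int, str, bool and float itself
cur.executemany(sql, list(zip(col0, col1, col2, col3)))
conn.commit()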