SSIS: converting non-Unicode to Unicode produces garbled text

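I have an SSIS package that copies data from an Oracle database to a SQL Server database.
I used a Data Conversion transformation to convert DT_STR to a Unicode string (DT_WSTR), but the Vietnamese characters come out garbled. This is an encoding problem, not a font problem: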
empnm |
----------------+
BUI TH? TH??NG |
CAO V?N TIN |
CHU TH? THOA |
LE NG?C HOA |
L??NG V?N VINH |
NGUY?N V?N THANH|
T? V?N PH??C |
Oracle: EMPNM VARCHAR2(50)
SQL Server: empnm NVARCHAR(50)
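I can't find anyone with the same issue. Can you please give me some advice? Thanks.
I already tried the suggestions in "SSIS Convert Between Unicode and Non-Unicode Error".
I expect the names to come through unchanged. In Oracle, DUMP shows they are stored correctly as UTF-8 (AL32UTF8):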
EMPNM |DUMP(EMPNM,1016) |
----------------+-----------------------------------------------------------------------------------------------+
BÙI THỊ THƯƠNG |Typ=1 Len=19 CharacterSet=AL32UTF8: 42,c3,99,49,20,54,48,e1,bb,8a,20,54,48,c6,af,c6,a0,4e,47 |
CAO VĂN TIẾN |Typ=1 Len=15 CharacterSet=AL32UTF8: 43,41,4f,20,56,c4,82,4e,20,54,49,e1,ba,be,4e |
CHU THỊ THOA |Typ=1 Len=14 CharacterSet=AL32UTF8: 43,48,55,20,54,48,e1,bb,8a,20,54,48,4f,41 |
LÊ NGỌC HOÀ |Typ=1 Len=15 CharacterSet=AL32UTF8: 4c,c3,8a,20,4e,47,e1,bb,8c,43,20,48,4f,c3,80 |
LƯƠNG VĂN VINH |Typ=1 Len=17 CharacterSet=AL32UTF8: 4c,c6,af,c6,a0,4e,47,20,56,c4,82,4e,20,56,49,4e,48 |
NGUYỄN VĂN THÀNH|Typ=1 Len=20 CharacterSet=AL32UTF8: 4e,47,55,59,e1,bb,84,4e,20,56,c4,82,4e,20,54,48,c3,80,4e,48|
TẠ VĂN PHƯỚC |Typ=1 Len=18 CharacterSet=AL32UTF8: 54,e1,ba,a0,20,56,c4,82,4e,20,50,48,c6,af,e1,bb,9a,43 |
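The DUMP output shows the data is intact in Oracle. Decoding the dumped bytes outside SSIS shows where the damage can happen.

The sketch below takes the first row's bytes from the dump and decodes them as UTF-8. It then forces the text through a single-byte code page. Code page 1252 is only an assumption here, standing in for whatever code page the DT_STR column actually uses.

The point it illustrates: once a non-Unicode step has replaced characters with "?", a later conversion to Unicode cannot bring them back.

```python
# Bytes of the first row, copied from the DUMP(EMPNM,1016) output above
# (CharacterSet=AL32UTF8, i.e. the column is stored as UTF-8).
raw = bytes.fromhex("42c39949205448e1bb8a205448c6afc6a04e47")

# Decoding as UTF-8 recovers the intended Vietnamese name.
correct = raw.decode("utf-8")

# Assumption: code page 1252 stands in for the DT_STR code page.
# Every character the code page cannot represent is replaced by "?".
lossy = correct.encode("cp1252", errors="replace").decode("cp1252")

print(correct)  # BÙI THỊ THƯƠNG
print(lossy)    # BÙI TH? TH??NG
```

The real SSIS output shows "BUI" rather than "BÙI". The actual conversion path therefore also drops some diacritics, for example through a best-fit mapping, which this sketch does not model.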

Related

pyspark to_timestamp() handling format of milliseconds SSS

My timestamps are being parsed incorrectly.
I am using the following function:
to_timestamp("col","yyyy-MM-dd'T'hh:mm:ss.SSS'Z'")
Data:
time | OUTPUT | IDEAL
2022-06-16T07:01:25.346Z | 2022-06-16T07:01:25.346+0000 | 2022-06-16T07:01:25.346+0000
2022-06-16T06:54:21.51Z | 2022-06-16T06:54:21.051+0000 | 2022-06-16T06:54:21.510+0000
2022-06-16T06:54:21.5Z | 2022-06-16T06:54:21.005+0000 | 2022-06-16T06:54:21.500+0000
So the fraction of a second in my data has one, two or three digits (S, SS or SSS). How can I normalize it to SSS correctly? Here, ".51" means 510 milliseconds, not 051.
Spark version: 3.2.1
Code:
import pyspark.sql.functions as F
test = spark.createDataFrame([(1,'2022-06-16T07:01:25.346Z'),(2,'2022-06-16T06:54:21.51Z'),(3,'2022-06-16T06:54:21.5Z')],['no','timing1'])
timeFmt = "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'"
test = test.withColumn("timing2", (F.to_timestamp(F.col('timing1'),format=timeFmt)))
test.select("timing1","timing2").show(truncate=False)
I also use v3.2.1, and it works for me if you don't pass a format at all. The strings are already in ISO 8601 format, which to_timestamp parses by default:
from pyspark.sql import functions as F
test = spark.createDataFrame([(1,'2022-06-16T07:01:25.346Z'),(2,'2022-06-16T06:54:21.51Z'),(3,'2022-06-16T06:54:21.5Z')],['no','timing1'])
new_df = test.withColumn('timing1_ts', F.to_timestamp('timing1'))
new_df.show(truncate=False)
new_df.dtypes
+---+------------------------+-----------------------+
|no |timing1 |timing1_ts |
+---+------------------------+-----------------------+
|1 |2022-06-16T07:01:25.346Z|2022-06-16 07:01:25.346|
|2 |2022-06-16T06:54:21.51Z |2022-06-16 06:54:21.51 |
|3 |2022-06-16T06:54:21.5Z |2022-06-16 06:54:21.5 |
+---+------------------------+-----------------------+
Out[9]: [('no', 'bigint'), ('timing1', 'string'), ('timing1_ts', 'timestamp')]
It turned out I had set the following, which makes Spark 3 fall back to the legacy (Spark 2.x) datetime parser:
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
After resetting this setting, parsing works as expected.
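With spark.sql.legacy.timeParserPolicy=LEGACY, Spark parses with java.text.SimpleDateFormat. In that parser S counts milliseconds, so ".51" is read as 51 ms and displayed as .051. The default Spark 3 parser instead treats the S digits as a fraction of a second.

The sketch below reproduces both readings in plain Python; it does not call Spark, and the function names are hypothetical. It reads the hour as 24-hour, the counterpart of the pattern letter HH. In the question's pattern, hh is the 12-hour clock-hour.

```python
import re
from datetime import datetime, timedelta, timezone

# ISO 8601 UTC timestamp with an optional 1-9 digit fraction, e.g. "2022-06-16T06:54:21.51Z".
ISO_Z = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_iso_z(text: str) -> datetime:
    """Parse the timestamp, reading the digits after the dot as a decimal fraction of a second."""
    match = ISO_Z.match(text)
    if match is None:
        raise ValueError(f"unexpected timestamp: {text!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    fraction = match.group(2) or ""
    # ".5" -> 500000 us, ".51" -> 510000 us: right-pad to 6 digits, drop anything beyond 6.
    micros = int(fraction[:6].ljust(6, "0"))
    return base + timedelta(microseconds=micros)


def legacy_style_millis(text: str) -> int:
    """Mimic the legacy misreading: the digits after the dot taken as a whole number of milliseconds."""
    match = ISO_Z.match(text)
    if match is None:
        raise ValueError(f"unexpected timestamp: {text!r}")
    return int(match.group(2) or "0")


for sample in ["2022-06-16T07:01:25.346Z", "2022-06-16T06:54:21.51Z", "2022-06-16T06:54:21.5Z"]:
    print(sample, parse_iso_z(sample).isoformat(timespec="milliseconds"), legacy_style_millis(sample))
```

The legacy_style_millis values (346, 51, 5) match the OUTPUT column in the question. The parse_iso_z values match the IDEAL column.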

Translating SUBSTRING_INDEX() in MySQL to PostgreSQL

There are a number of questions related to this, and the answer is to use split_part(). For example:
emulating MySQL's substring_index() in PGSQL
MySQL's SUBSTRING_INDEX equivalent in PostgreSQL
I'm not getting the same behavior, however. I'm trying to figure out how to get the following functionality in Postgres.
If you have a string that looks like:
+------------------------------------------+
| string |
+------------------------------------------+
| A_123, B_123, C_123, D_123, E_123, F_123 |
+------------------------------------------+
MySQL will return the following with the given statement:
mysql> select SUBSTRING_INDEX(string, ',', 4) AS test FROM tbl;
+----------------------------+
| test |
+----------------------------+
| A_123, B_123, C_123, D_123 |
+----------------------------+
PostgreSQL will return the following with the given statement:
select split_part(string, ',', 4) AS test FROM tbl;
+-------+
| test |
+-------+
| D_123 |
+-------+
Is there a similar built-in function, or do I need to implement one myself?
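As a_horse_with_no_name suggested in the comments, this gives the desired result. The [:4] slice with an omitted lower bound needs PostgreSQL 9.6 or later:
select array_to_string((regexp_split_to_array(string, '\s*,\s*'))[:4], ', ') as test from tbl;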
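The difference between the two functions is easier to see in a small model. The sketch below is a plain-Python model of MySQL's SUBSTRING_INDEX and PostgreSQL's split_part, not the real implementations.

It assumes a non-empty delimiter, and it assumes split_part is called with n >= 1.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of MySQL SUBSTRING_INDEX(s, delim, count); assumes delim is non-empty."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # Positive count: everything left of the count-th delimiter, counting from the left.
    # Negative count: everything right of the |count|-th delimiter, counting from the right.
    # A count larger than the number of fields returns the whole string.
    return delim.join(parts[:count] if count > 0 else parts[count:])


def split_part(s: str, delim: str, n: int) -> str:
    """Model of PostgreSQL split_part(s, delim, n) for n >= 1: only the n-th field."""
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""


s = "A_123, B_123, C_123, D_123, E_123, F_123"
print(substring_index(s, ",", 4))
print(split_part(s, ",", 4))
```

Splitting and re-joining on the same delimiter keeps the original spacing. The regexp_split_to_array answer instead normalizes every separator to ", ".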

Unable to insert data into postgresql table using flat file with copy command

My table structure is:
company=# \d address
Table "public.address"
Column | Type | Modifiers
----------+-----------------------+-----------
name | character varying(80) |
age | integer |
dob | date |
village | character varying(8) |
locality | character varying(80) |
district | character varying(80) |
state | character varying(80) |
pin | integer |
And I have the following data in a flat file (*.txt):
insert into address(name,age,dob,village,locality,district,state,pin)
values('David',43,'1972-10-23','Elchuru','Addanki','Prakasam','AP',544421);
insert into address(name,age,dob,village,locality,district,state,pin)
values('George',53,'1962-10-23','London','London','LN','LN',544421);
insert into address(name,age,dob,village,locality,district,state,pin)
values('David',28,'1982-10-23','Ongole','Ongole','Prakasam','AP',520421);
Now I am trying to load it into my table address using the following command in the psql shell:
copy address from 'C:/P Files/address_data.txt';
Error is:
company=# copy address from 'C:/P Files/address_data.txt';
ERROR: value too long for type character varying(80)
CONTEXT: COPY address, line 1, column name: "insert into address(name,age,dob,village,locality,district,state,pin) values('David',43,'1972-10-23'..."
What do I need to change in the above command?
You don't have a data file. You have a file of SQL commands, and COPY expects data rows.
Run the inserts with psql instead, for example \i 'C:/P Files/address_data.txt' at the psql prompt, or psql -f from the command line.
A data file would look more like this:
David,43,1972-10-23,Elchuru,Addanki,Prakasam,AP,544421
George,53,1962-10-23,London,London,LN,LN,544421
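David,28,1982-10-23,Ongole,Ongole,Prakasam,AP,520421
Because the fields are comma-separated, such a file has to be loaded in CSV format. COPY's default text format expects tab-separated columns:
copy address from 'C:/P Files/address_data.txt' with (format csv);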
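If the rows only exist as INSERT statements, they can be turned into such a data file with a small script. The sketch below is hypothetical and fragile.

It assumes each values(...) tuple is on one line, and that no string literal contains a comma or a quote. Anything more complex needs a real SQL parser, or simply running the inserts with psql.

```python
import csv
import io
import re

# Matches a single-line "values(...);" tuple. Assumes no commas or quotes inside string literals.
VALUES_RE = re.compile(r"values\((.*)\);", re.IGNORECASE)


def inserts_to_csv(sql_text: str) -> str:
    """Convert simple INSERT ... values(...) statements into CSV rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for match in VALUES_RE.finditer(sql_text):
        # Split on commas and strip the surrounding single quotes from string literals.
        fields = [field.strip().strip("'") for field in match.group(1).split(",")]
        writer.writerow(fields)
    return out.getvalue()


sql_text = """insert into address(name,age,dob,village,locality,district,state,pin)
values('David',43,'1972-10-23','Elchuru','Addanki','Prakasam','AP',544421);
insert into address(name,age,dob,village,locality,district,state,pin)
values('George',53,'1962-10-23','London','London','LN','LN',544421);
insert into address(name,age,dob,village,locality,district,state,pin)
values('David',28,'1982-10-23','Ongole','Ongole','Prakasam','AP',520421);
"""
print(inserts_to_csv(sql_text), end="")
```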

Postgresql - Split a string by hyphen and group by the second part of the string

I have data stored in the following format:
resource_name  | readiops | writeiops
---------------+----------+-----------
90832-00:29:3E | 3.21     | 4.00
90833-00:30:3E | 2.12     | 3.45
90834-00:31:3E | 2.33     | 2.78
90832-00:29:3E | 4.21     | 6.00
I want to split the resource_name column on "-" and group by the second part, so that the data above looks like this:
array_serial | ldev     | readiops  | writeiops
-------------+----------+-----------+-----------
90832        | 00:29:3E | 3.21,4.21 | 4.00,6.00
90833        | 00:30:3E | 2.12      | 3.45
90834        | 00:31:3E | 2.33      | 2.78
resource_name is split into array_serial and ldev.
I tried the query below, but it fails with an error:
SELECT
SUBSTRING(resource_name, 0, STRPOS(resource_name, ':')) AS array_serial,
SUBSTRING(resource_name,1, STRPOS(resource_name, ':')) AS ldev
FROM table
GROUP BY SUBSTRING(resource_name, 0, STRPOS(resource_name, ':'))
I am new to Postgres, so any help is appreciated.
Use split_part():
with my_table(resource_name, readiops, writeiops) as (
values
('90832-00:29:3E', 3.21, 4.00),
('90833-00:30:3E', 2.12, 3.45),
('90834-00:31:3E', 2.33, 2.78),
('90832-00:29:3E', 4.21, 6.00)
)
select
split_part(resource_name::text, '-', 1) as array_serial,
split_part(resource_name::text, '-', 2) as ldev,
string_agg(readiops::text, ',') as readiops,
string_agg(writeiops::text, ',') as writeiops
from my_table
group by 1, 2;
array_serial | ldev | readiops | writeiops
--------------+----------+-----------+-----------
90832 | 00:29:3E | 3.21,4.21 | 4.00,6.00
90833 | 00:30:3E | 2.12 | 3.45
90834 | 00:31:3E | 2.33 | 2.78
(3 rows)
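The query splits each resource_name at the hyphen, groups on both parts, and concatenates the IOPS values per group. The sketch below models those same steps in plain Python on the sample rows; it is an illustration, not a replacement for the SQL.

It formats the numbers with two decimals to match the numeric literals. It keeps everything after the first hyphen as ldev, which only differs from split_part(..., '-', 2) if a name contains more than one hyphen.

```python
rows = [
    ("90832-00:29:3E", 3.21, 4.00),
    ("90833-00:30:3E", 2.12, 3.45),
    ("90834-00:31:3E", 2.33, 2.78),
    ("90832-00:29:3E", 4.21, 6.00),
]


def split_and_group(rows):
    """Split resource_name at the first '-' and concatenate IOPS per (array_serial, ldev)."""
    groups = {}  # dicts keep insertion order, so groups come out in first-seen order
    for resource_name, readiops, writeiops in rows:
        array_serial, ldev = resource_name.split("-", 1)
        reads, writes = groups.setdefault((array_serial, ldev), ([], []))
        reads.append(f"{readiops:.2f}")
        writes.append(f"{writeiops:.2f}")
    return [(serial, ldev, ",".join(r), ",".join(w)) for (serial, ldev), (r, w) in groups.items()]


for row in split_and_group(rows):
    print(" | ".join(row))
```

In SQL, string_agg concatenates in an unspecified order unless the call includes an ORDER BY. This sketch simply follows input order.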

Accessing postgres data structure

I have a table in my Postgres database whose data column is structured strangely. Here is an example row:
id | 1
name | name
data | :type: information
| :url: url
| :platform:
| android: ''
| iphone: ''
created_at | 2016-07-29 11:39:44.938359
updated_at | 2016-08-22 12:24:32.734321
How do I change a nested value, for example data > platform > android?
After some more research I found this, which did the trick:
postgresql - replace all instances of a string within text field
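The data value looks like a YAML document stored as plain text; the ":type:" keys resemble serialized Ruby symbol keys. That is an assumption, not something the question confirms. Because the column is text, the linked approach edits it with string replacement.

The sketch below models a narrower text edit in Python. It rewrites only the value on the indented "android:" line, so other occurrences of the same text elsewhere in the field are left alone. The two-space indentation and the new value "x" are hypothetical.

```python
import re

# Hypothetical field contents; the real indentation of the nested keys is assumed.
data = (
    ":type: information\n"
    ":url: url\n"
    ":platform:\n"
    "  android: ''\n"
    "  iphone: ''\n"
)


def set_android(text: str, new_value: str) -> str:
    """Replace the value on the first indented 'android:' line, leaving every other line untouched."""
    return re.sub(
        r"(?m)^([ \t]+android:).*$",
        lambda m: f"{m.group(1)} '{new_value}'",
        text,
        count=1,
    )


print(set_android(data, "x"))
```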