PySpark code giving weird whitespace error - pyspark

When I run this query, the zip code picks up some hidden whitespace, which pushes its value over into the data-source column. I tried the trim function, and trim with a carriage return, but the whitespace is not removed. Any suggestions?
Note: a window partition was used to get the zip code.
select
distinct x,
first_value(zip_code)
over(partition by x order by date_time_closed desc) as zip_code
from table

Redshift: some trouble with nested JSON

I have the following JSON:
{"promptnum":4,"corpuscode":"B0014","prompttype":"video","skipped":false,"transcription":"1","deviceinfo":{"DEVICE_ID":"exynos980","DEVICE_MANUFACTURER":"samsung","DEVICE_SERIAL":"unknown","DEVICE_DESIGN":"a51x","DEVICE_MODEL":"SM-A5160","DEVICE_OS":"android","DEVICE_OS_VERSION":"10","DEVICE_CARRIER":"","DEVICE_BATTERY_LEVEL":"70.00%","DEVICE_BATTERY_STATE":"unplugged","Current App Version":"1.1.0","Current App Build":"6"}}
I want to get values from the 1st level and the 2nd level.
1st level: "promptnum":4,"corpuscode":"B0014","prompttype":"video","skipped":false,"transcription":"1","deviceinfo":...
2nd level:
"deviceinfo":{"DEVICE_ID":"exynos980","DEVICE_MANUFACTURER":"samsung","DEVICE_SERIAL":"unknown","DEVICE_DESIGN":"a51x","DEVICE_MODEL":"SM-A5160","DEVICE_OS":"android","DEVICE_OS_VERSION":"10","DEVICE_CARRIER":"","DEVICE_BATTERY_LEVEL":"70.00%","DEVICE_BATTERY_STATE":"unplugged","Current App Version":"1.1.0","Current App Build":"6"}
When I parse the 1st level with
SELECT d.*
FROM (
SELECT c.json_parse, c.json_parse.deviceinfo AS device_info
FROM (
SELECT JSON_PARSE(file_attr)
FROM public.dc_ac_files
) AS c) AS d
it works well.
But when I try to get values from the 2nd level with
SELECT d.*, l.DEVICE_ID
FROM (
SELECT c.json_parse, c.json_parse.deviceinfo AS device_info
FROM (
SELECT JSON_PARSE(file_attr)
FROM public.dc_ac_files
) AS c) AS d, d.device_info AS l
it doesn't work: no errors and no data.
As far as I know, this is the right way to parse nested JSON, but it doesn't work for me.
Can you help me?
Viktor, you have a couple of issues. First, the notation "AS d, d.device_info AS l" is used to unnest arrays in your SUPER data. You don't have any arrays to unnest, so this returns zero rows.
Second, Redshift defaults to lower case for all column names, so DEVICE_ID is being seen as device_id. You can enable case-sensitive column names by setting the enable_case_sensitive_identifier configuration option to true and quoting all column names that require upper-case characters: run "SET enable_case_sensitive_identifier TO true;" and change l.DEVICE_ID to l."DEVICE_ID".
You also have unneeded layers in your query.
Putting all of this together, you can run:
SELECT l, l.deviceinfo, l.deviceinfo."DEVICE_ID"
FROM (
SELECT JSON_PARSE(file_attr) AS l
FROM public.dc_ac_files
) AS c
You also don't need the SUPER data type to do this. It can be done with the JSON string parsing functions:
SELECT file_attr, json_extract_path_text(file_attr, 'deviceinfo') as deviceinfo, json_extract_path_text(file_attr, 'deviceinfo','DEVICE_ID') as device_id
FROM public.dc_ac_files
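To check the shape of the document outside Redshift, here is a minimal Python sketch (not Redshift code) that parses a shortened copy of the JSON from the question and reads one first-level and one second-level value. It also shows that JSON keys themselves are case-sensitive, which is why the upper-case DEVICE_ID key needs care. The variable names are made up for illustration.

```python
import json

# Shortened copy of the JSON from the question.
file_attr = (
    '{"promptnum":4,"corpuscode":"B0014",'
    '"deviceinfo":{"DEVICE_ID":"exynos980","DEVICE_OS":"android"}}'
)

doc = json.loads(file_attr)

# First level: plain keys on the top-level object.
corpuscode = doc["corpuscode"]

# Second level: deviceinfo is an object (not an array), so it is reached by
# path navigation, with no unnesting step.
device_id = doc["deviceinfo"]["DEVICE_ID"]

# Keys are case-sensitive: the lower-cased name is not found.
lowercase_lookup = doc["deviceinfo"].get("device_id")

print(corpuscode, device_id, lowercase_lookup)
```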

syntax error using redshift listagg function

select id, listagg(timestamp,',')
within group (order by timestamp) as timestamp
from activity group by contact_id order by contact_id limit 1;
This is the error I am getting:
syntax error at or near ","
LINE 1: select eloqua_contact_id, listagg(timestamp,',') within grou...
Is anything wrong with this query? When I remove the delimiter option I do not get an error and everything returns fine. How do I add commas to separate the listagg column?
I suspect the issue is the column name "timestamp", as that is a data type and a reserved word. If you enclose the column name in double quotes, it will not be interpreted as a data type. (Best guess.)
select contact_id, listagg("timestamp",',')
within group (order by "timestamp") as "timestamp"
from activity group by contact_id order by contact_id limit 1;
It is generally not a good idea to give your columns the same names as data types.

MySQL Sort Query with Special Character

I have to sort one column of mytable in ascending order, but the problem is that mytable contains some data with special characters. I still want to sort in ascending order so that it displays properly in the UI.
Can anyone help me with this?
I have tried using
ORDER BY Item DESC
But it gives me the ABC-type rows first and then the {ABC}-type rows, meaning the rows with special characters come last.
You can try this for your problem:
select * from mytable ORDER BY REGEXP_REPLACE(Item,'[^[:alnum:]'' '']', '') DESC
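The idea behind that ORDER BY is to compare the values with their special characters stripped out. A minimal Python sketch of the same idea, assuming made-up sample values and an ASCII-only notion of "alphanumeric" (narrower than what [:alnum:] may match in the database):

```python
import re

# Hypothetical sample values, some wrapped in special characters.
items = ["{ABC}", "ABD", "[ABA]", "ABB"]


def sort_key(item: str) -> str:
    # Drop everything except ASCII letters, digits and spaces before comparing.
    return re.sub(r"[^0-9A-Za-z ]", "", item)


# The original strings are kept; only the comparison uses the stripped form.
ascending = sorted(items, key=sort_key)
print(ascending)
```

Because only the sort key is stripped, the rows are still displayed with their braces and brackets intact.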

Trimming parts of a word but each word is different size

I have a table with values like this:
book;65
book;1000
table;66
restaurant;1202
park;2
park;44444
Is there a way using postgres sql to remove everything, regardless of the length of the word, that includes the semi-colon and everything after it?
I plan on doing a query that goes something like this after I figure this out:
select col1, modified_col_1
from table_1
--modified is without the semi-colon and everything after
You can use substring() and strpos() for this:
select col1, substring(col1, 1, strpos(col1, ';') - 1) as modified_col_1
from table_1
The above will give an error if there are values without a ; in them.
Another option would be to split the string into an array and then just pick the first element:
select (string_to_array(col1, ';'))[1]
from table_1
This will also work if no ; is present.
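The difference between the two approaches can be sketched in Python (an illustration only, not PostgreSQL code). The sample values and function names are made up, and Python's str.find returns -1 for a missing separator where strpos() returns 0, so the position-based version needs an explicit guard:

```python
# Hypothetical sample values; the last one has no separator.
values = ["book;65", "restaurant;1202", "no_separator"]


def first_part_by_split(value: str) -> str:
    # Splitting always yields at least one element, so taking the first
    # element is safe even when no ';' is present.
    return value.split(";")[0]


def first_part_by_position(value: str) -> str:
    # Position-based slicing has to handle the "separator not found" case.
    pos = value.find(";")
    if pos == -1:
        return value
    return value[:pos]


by_split = [first_part_by_split(v) for v in values]
by_position = [first_part_by_position(v) for v in values]
print(by_split, by_position)
```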

ltrim(rtrim(x)) leaves blanks on RTL content - does anyone know a workaround?

I have a table [Company] with a column [Address3] defined as varchar(50).
I cannot control the values entered into that table, but I need to extract the values without leading and trailing spaces. I run the following query:
SELECT DISTINCT RTRIM(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
The column contains both RTL and LTR values.
Most of the data is retrieved correctly, but SOME (not all) RTL values are returned with leading and/or trailing spaces.
I also tried the following query:
SELECT DISTINCT ltrim(rTRIM(ltrim(rTRIM([Address3])))) c, ltrim(rTRIM([Address3])) b, [Address3] a, rtrim(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
but it showed the same problem in all columns. Does anyone have any idea what could cause it?
The rows that return with extraneous spaces might have a kind of space or invisible character the trim functions don't know about. The documentation doesn't even mention what is considered "a blank" (pretty damn sloppy if you ask me). Try taking one of those rows and looking at the characters one by one to see what character they are.
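As an illustration of looking at the characters one by one, here is a minimal Python sketch, assuming the value has been pulled out of the database into a Python string. The sample value is hypothetical: it ends with a RIGHT-TO-LEFT MARK followed by a space, which is one kind of invisible character that can appear in RTL text.

```python
import unicodedata


def describe_chars(value: str):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in value
    ]


# Hypothetical sample: text, then U+200F (RIGHT-TO-LEFT MARK), then a space.
sample = "abc\u200f "

# strip() removes the trailing space, but the mark is not whitespace,
# so it survives trimming and stays at the end of the value.
stripped = sample.strip()

for ch, code_point, name in describe_chars(stripped):
    print(repr(ch), code_point, name)
```

Any character in the listing that is not an ordinary space is a candidate for the character the trim functions are leaving behind.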
Since you are using varchar, you can run this to get the ASCII code of the trailing bad characters:
--identify the bad character
SELECT
COUNT(*) AS CountOf
,'>'+RIGHT(LTRIM(RTRIM(Address3)),1)+'<' AS LastChar_Display
,ASCII(RIGHT(LTRIM(RTRIM(Address3)),1)) AS LastChar_ASCII
FROM Company
GROUP BY RIGHT(LTRIM(RTRIM(Address3)),1)
ORDER BY 3 ASC
Do a one-time fix of the data to remove the bogus character, where xxxx is the ASCII value identified in the previous select:
--only one bad character found in previous query
UPDATE Company
SET Address3=REPLACE(Address3,CHAR(xxxx),'')
--multiple different bad characters found by previous query
UPDATE Company
SET Address3=REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')
If you have bogus characters in your data, remove them from the data rather than each time you select it. You WILL have to add this REPLACE logic to all INSERTs and UPDATEs on this column to keep new data from containing the bogus characters.
If you can't alter the data, you can just select it this way:
SELECT
LTRIM(RTRIM(REPLACE(Address3,CHAR(xxxx),'')))
,LTRIM(RTRIM(REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')))
...