Convert comma separated non json string to json - postgresql

Below is the value of a string in a text column.
select col1 from tt_d_tab;
'A:10000000,B:50000000,C:1000000,D:10000000,E:10000000'
I'm trying to convert it into JSON of the below format.
'{"A": 10000000,"B": 50000000,"C": 1000000,"D": 10000000,"E": 10000000}'
Can someone help on this?

If you know that neither the keys nor values will have : or , characters in them, you can write
select json_object(regexp_split_to_array(col1,'[:,]')) from tt_d_tab;
This splits the string on every colon and comma, then interprets the result as key/value pairs.
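For the sample value this yields something like the following (json_object emits every value as a JSON string, so the numbers come out quoted; if you need bare numeric values as in your target format, see the regexp_replace approach below):
select json_object(regexp_split_to_array('A:10000000,B:50000000,C:1000000,D:10000000,E:10000000','[:,]'));
-- {"A" : "10000000", "B" : "50000000", "C" : "1000000", "D" : "10000000", "E" : "10000000"}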
If the string manipulation gets any more complicated, SQL may not be the ideal tool for the job, but it's still doable, either by this method or by converting the string into the form you need directly and then casting it to json with ::json.

If your keys are single capital letters, as in your example:
select concat('{',regexp_replace('A:10000000,B:50000000,C:1000000,D:10000000,E:10000000','([A-Z])','"\1"','g'),'}')::json json_field;
A more general case, with keys of any number of letters, capitalized or not:
select concat('{',regexp_replace('Ac:10000000,BT:50000000,Cs:1000000,D:10000000,E:10000000','([a-zA-Z]+)','"\1"','g'),'}')::json json_field;
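With the original single-letter sample, the first query returns JSON whose values are genuine numbers:
{"A":10000000,"B":50000000,"C":1000000,"D":10000000,"E":10000000}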

Related

Salesforce Spark delimiter issue

I have a Glue job in which I am reading a table from Salesforce using SOQL:
df = (
    spark.read.format("com.springml.spark.salesforce")
    .option("soql", sql)
    .option("queryAll", "true")
    .option("sfObject", sf_table)
    .option("bulk", bulk)
    .option("pkChunking", pkChunking)
    .option("version", "51.0")
    .option("timeout", "99999999")
    .option("username", login)
    .option("password", password)
    .load()
)
and whenever there is a combination of double-quotes and commas in the string it messes up my table schema, like so:
in source:

Column A | Column B           | Column C
000AB    | "text with, comma" | 123XX

read from SF in df:

Column A | Column B    | Column C
000AB    | "text with  | comma"
Is there any option to avoid such cases where this comma is treated as a delimiter? I tried various options but nothing worked. And SOQL doesn't accept REPLACE or SUBSTRING functions; as for its other text manipulation functions, well, basically there aren't any.
All the information I'm giving needs to be tested. I do not have the same environment, so it is difficult for me to try anything, but here is what I found.
When you check the official doc, you find that there is a field metadataConfig. The documentation of this field can be found here: https://resources.docs.salesforce.com/sfdc/pdf/bi_dev_guide_ext_data_format.pdf
On page 2, CSV format, it says:
If a field value contains a control character or a new line, the field value must be contained within double quotes (or your fieldsEscapedBy value). The default control characters (fieldsDelimitedBy, fieldsEnclosedBy, fieldsEscapedBy, or linesTerminatedBy) are comma and double quote. For example, "Director of Operations, Western Region".
which sounds a lot like your current problem.
By default the values are comma and double quote, so I do not understand why it is failing. But apparently your output keeps the double quotes, so maybe it only considers single quotes.
You should try to enforce the format and add this to your code:
.option("metadataConfig", '{"fieldsEnclosedBy": "\"", "fieldsDelimitedBy": ","}')
# Or something similar - i could'nt test, so you need to try by yourself
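For placement, it would slot into the reader chain from the question like this (equally untested; note the doubled backslash, so that the JSON string itself contains the escaped quote \"):
df = (
    spark.read.format("com.springml.spark.salesforce")
    .option("soql", sql)
    .option("sfObject", sf_table)
    # metadataConfig is passed as a JSON string
    .option("metadataConfig", '{"fieldsEnclosedBy": "\\"", "fieldsDelimitedBy": ","}')
    .option("username", login)
    .option("password", password)
    .load()
)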

How to delete space in character text?

I wrote a program that automatically pulls time-related information from the system. As defined in table T247, the month names are fixed at 10 characters in length, which makes for a bad image when shown on the report screen.
I print this way:
WRITE : 'Bugün', t_month_names-ltx, ' ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
I tried CONDENSE t_month_names-ltx NO-GAPS. to delete the spaces, but it was not enough.
After the WRITE, I was able to fix it statically by hard-coding the output position:
WRITE : 'Bugün', t_month_names-ltx.
WRITE : 14 'ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
But this is not correct usage. How do I achieve this dynamically?
You could use a temporary field of type STRING:
DATA l_month TYPE STRING.
l_month = t_month_names-ltx.
WRITE : 'Bugün', l_month.
WRITE : 14 'ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
You cannot delete trailing spaces from a TYPE C field, because it is of constant length; the unused length is always filled with spaces.
But after you have assembled your string, you can use CONDENSE without NO-GAPS to collapse any chain of more than one space within the string.
Add CONDENSE date. below the code you wrote and you should get the results you want.
Another option is to abandon CONCATENATE and use string templates (string literals within | symbols) for string assembly instead; these do not have the annoying habit of including the trailing spaces of TYPE C fields:
DATA long_char TYPE C LENGTH 128.
long_char = 'long character field'.
WRITE |this is a { long_char } inserted without spaces|.
Output:
this is a long character field inserted without spaces
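Applied to the code from the question, the month name can be embedded without its trailing blanks and the hard-coded position dropped. An untested sketch, assuming date is TYPE string and a release with string template support:
DATA l_line TYPE string.
" trailing blanks of the TYPE C month name are not copied into the template
l_line = |Bugün { t_month_names-ltx } ayının|.
WRITE l_line.
date = |{ gv_words-word }'nci günü, { gv_year }|.
TRANSLATE date TO LOWER CASE.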

How to remove # character from national data type in COBOL

I am facing an issue while converting Unicode data into national characters.
When I convert the Unicode data to national using the NATIONAL-OF function, junk characters like # are appended after the string.
E.g.
Ws-unicode pic X(200)
Ws-national pic N(600)
-- say the value in Ws-unicode, coming from the Java end, is これらの変更は.
move function national-of ( Ws-unicode ,1208 ) to Ws-national.
-- after converting, the value is like これらの変更は #.
I do not want the extra # characters added after conversion.
Please help me find a possible solution. I have tried to replace N'#' with space using an INSPECT clause; it worked well, but it failed in one specific scenario: if we have a # in the input from the user end, that genuine # is also converted to a space.
Below is a snippet of code I used to convert EBCDIC to UTF. Before I was capturing string lengths, I was also getting # symbols:
STRING
    FUNCTION DISPLAY-OF (
        FUNCTION NATIONAL-OF (
            WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
            WS-EBCDIC-CCSID
        )
        WS-UTF8-CCSID
    )
    DELIMITED BY SIZE
    INTO WS-UTF8-STRING
    WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
What this code does is string the UTF-8 representation of the EBCDIC string into another variable. The WITH POINTER clause captures the new length of the string + 1 (+ 1 because the pointer ends up at the position just after the string).
Using this method, you know exactly how long the second string is and can use it with that exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention: in my case, the # signs were actually EBCDIC low values when viewing the actual hex on the mainframe.
Use INSPECT with FUNCTION REVERSE and stop after the first occurrence of #.
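In other words: reverse the string, count the leading run of pad characters, and then work with only the real length. A hedged sketch (WS-REVERSED, WS-PAD-COUNT and WS-DATA-LEN are hypothetical working-storage fields; if, as noted above, the #s are really low values in hex, tally FOR LEADING LOW-VALUE instead of N'#'):
01  WS-REVERSED   PIC N(600).
01  WS-PAD-COUNT  PIC 9(4) COMP.
01  WS-DATA-LEN   PIC 9(4) COMP.

MOVE FUNCTION REVERSE (WS-NATIONAL) TO WS-REVERSED
MOVE 0 TO WS-PAD-COUNT
*> after REVERSE, the trailing pad run appears as leading characters
INSPECT WS-REVERSED TALLYING WS-PAD-COUNT FOR LEADING N'#'
COMPUTE WS-DATA-LEN = 600 - WS-PAD-COUNT
*> WS-NATIONAL(1:WS-DATA-LEN) now holds the text without the padding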

In PostgreSQL, how can I unwrap a json string to text?

Suppose I have a value of type json, say y. One may obtain such a value through, for example, obj->'key', or any function that returns values of type json.
This value, when cast to text, includes quotation marks i.e. "y" instead of y. In cases where using json types is unavoidable, this poses a problem, especially when we wish to compare the value with literal strings e.g.
select foo(x)='bar';
The API Brainstorm page suggests a from_json function that will intelligently unwrap JSON strings, but I doubt that is available yet. In the meantime, how can one convert JSON strings to text without the quotation marks?
Text:
To extract a value as text, use #>>:
SELECT to_json('foo'::text) #>> '{}';
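This returns foo, without the quotation marks.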
From: Postgres: How to convert a json string to text?
PostgreSQL doc page: https://www.postgresql.org/docs/11/functions-json.html
So it addresses your question specifically, but it doesn't work with other types, like integer or float, for example. The #> operator will not work for other types either.
Numbers:
Because JSON only has one numeric type, "number", and has no concept of int or float, there's no obvious way to cast a JSON type to a "correct" numeric type. It's best to know the schema of your JSON, extract the text and then cast to the correct type:
SELECT (('{"a":2.01}'::json)->'a'#>>'{}')::float
PostgreSQL does however have support for "arbitrary precision numbers" ("up to 131072 digits before the decimal point; up to 16383 digits after the decimal point") with its "numeric" type. JSON also supports 'e' notation for large numbers.
Try this to test them both out:
SELECT (('{"a":2e99999}'::json)->'a'#>>'{}')::numeric
The ->> operator unwraps quotation marks correctly. In order to take advantage of that operator, we wrap up our value inside an array, and then convert that to json.
CREATE OR REPLACE FUNCTION json2text(IN from_json JSON)
RETURNS TEXT AS $$
BEGIN
    RETURN to_json(ARRAY[from_json])->>0;
END; $$
LANGUAGE plpgsql;
For completeness, we provide a CAST that makes use of the function above.
CREATE CAST (json AS text) WITH FUNCTION json2text(json) AS ASSIGNMENT;
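A quick sanity check of the function (the sample value is purely illustrative):
SELECT json2text(to_json('St. Xavier''s Academy'::text));
-- St. Xavier's Academy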

hstore value with single quote

I asked a similar question here for: hstore value with space, and it was solved by user Clodoaldo Neto. Now I have come across the next case, with a string containing a single quote.
SELECT 'k=>"name", v=>"St. Xavier's Academy"'::hstore;
I tried using a dollar-quoted string constant, following http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS:
SELECT 'k=>"name", v=>$$St. Xavier's Academy$$'::hstore;
But I couldn't get it right.
How do I make a PostgreSQL hstore from strings containing a single quote?
It seems there are more such exceptions possible for this query. How do I address them all at once?
You can escape the embedded single quote the same way you'd escape any other single quote inside a string literal: double it.
SELECT 'k=>"name", v=>"St. Xavier''s Academy"'::hstore;
-- ------------------------------^^
Alternatively, you could dollar quote the whole string:
SELECT $$k=>"name", v=>"St. Xavier's Academy"$$::hstore;
Whatever interface you're using to talk to PostgreSQL should be taking care of these quoting and escaping issues. If you're using manual string wrangling to build your SQL then you should be using your driver's quoting and placeholder methods.
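For example, with psycopg2 in Python (a minimal sketch; the connection string is hypothetical, the two-key layout mirrors the question, and hstore(text[], text[]) is the extension's array constructor, so the hstore extension must be installed):
import psycopg2

# hypothetical connection; adjust the DSN for your environment
conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
# the driver escapes the embedded single quote for us
cur.execute(
    "SELECT hstore(ARRAY['k','v'], ARRAY[%s, %s])",
    ("name", "St. Xavier's Academy"),
)
print(cur.fetchone()[0])  # "k"=>"name", "v"=>"St. Xavier's Academy"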
hstore's internal parsing understands double quotes around keys and values:
Double-quote keys and values that include whitespace, commas, =s or >s.
Dollar quoting is, as you noted, for SQL string literals; hstore's parser doesn't know what it means.