I have to create an UNLOAD job for a DB2 table and save the unload in Unicode. That part is no problem.
Unfortunately, some of the column values contain the characters I use as separators.
For example, I would like to use the two-character combination #! as a separator, but I can't do that in Unicode.
Can someone tell me how to do this?
My statement currently looks like this:
DELIMITED COLDEL X'3B' CHARDEL X'24' DECPT X'2E'
UNICODE
thanks a lot for your help
The delimiter must be a single character; a two-character sequence such as #! is not allowed.
In this case the chosen solution was to find a single character that did not appear in the data.
When that is not possible, consider a non-delimited output format, or a different technique for getting the data to the external system (for example federation or another SQL-based interchange, or XML).
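One way to look for a single character that does not appear in the data is to scan a sample of the column values before writing the UNLOAD statement. The following is only a minimal sketch: the function names and the candidate list are hypothetical, and it assumes the values have already been fetched as strings. The hex helper simply mirrors the X'3B' notation used in the question for characters in the ASCII range.

```python
def unused_delimiters(values, candidates="|~^`\t\x1f"):
    """Return the candidate characters that never occur in any value."""
    seen = set()
    for value in values:
        seen.update(value)  # collect every character that appears in the data
    return [c for c in candidates if c not in seen]


def hex_literal(char):
    """Format a single ASCII character in the X'..' notation (illustrative helper)."""
    if len(char) != 1 or ord(char) > 0x7F:
        raise ValueError("expected exactly one ASCII character")
    return f"X'{ord(char):02X}'"


if __name__ == "__main__":
    sample = ["a;b", "price $5", "x|y"]  # made-up sample values
    for c in unused_delimiters(sample):
        print(repr(c), hex_literal(c))
```

If the list comes back empty for every reasonable candidate, that is the case where a non-delimited format or another transfer technique is the safer route.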
What is the easiest way to normalize a text field in a PostgreSQL table?
I am trying to find duplicates.
For example, I want to consider O'Reilly a duplicate of oreilly. La Salle should be a duplicate of la'salle as well.
In a nutshell, we want to
lowercase all text,
strip accents
strip punctuation marks such as these [.'-_] and
strip spaces
Can this all be done in one or two simple steps? Ideally using built-in PostgreSQL functions.
Cheers
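The following will give you what you want, using only functions that ship with Postgres. Note that unaccent is not in core; it comes from the unaccent contrib extension, which must be installed first with CREATE EXTENSION unaccent: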
regexp_replace (lower(unaccent(string_in)),'[^0-9a-z]','','g')
Or, if you do not want to keep digits either, just use:
regexp_replace (lower(unaccent(string_in)),'[^a-z]','','g')
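To sanity-check which strings should collapse to the same key, the same idea can be sketched outside the database. This is only an approximation: it strips accents via Unicode NFKD decomposition, which is not identical to the rule file used by unaccent, and the function name normalize_key is hypothetical.

```python
import re
import unicodedata


def normalize_key(text):
    """Lowercase, strip accents, and drop everything except a-z and 0-9."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove combining marks (the accents split off by NFKD).
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^0-9a-z]", "", without_marks.lower())


if __name__ == "__main__":
    for name in ["O'Reilly", "oreilly", "La Salle", "la'salle"]:
        print(name, "->", normalize_key(name))
```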
Before I begin my question and background information, I'd like to state that I realize that many people have asked a similar question, but none of the answers to their questions have applied to my situation.
Background info: I'm trying to properly format a very large CSV file so that I can import it into a table in my PostgreSQL database. This CSV file only contains two fields, and the delimiter is ;
Problems encountered/attempted solutions
Problem #1: The delimiter is a semicolon, and many of the values in one of the fields contain semicolons. PostgreSQL obviously doesn't like this.
Solution #1: I used sed to change the delimiter to a string of characters that I knew would only occur as a delimiter.
Problem #2: The delimiter can only be a single character.
Solution #2: I changed the delimiter to a Unicode character that I knew wouldn't occur as anything other than a delimiter.
Problem #3: The delimiter can only be a single-byte character.
Solution #3: I decided to backtrack. Rather than mess with the delimiter, I tried using sed to enclose all field values in double quotes, to avoid the problem of some values containing the delimiter character. More specifically, I tried the command found in the answer to this question - sed statement to change/modify CSV separators and delimiters
Problem #4: This resulted in many data errors. Whenever a delimiter appeared inside a value, the value was split there and each piece was wrapped in double quotes, so PostgreSQL tried to copy values that were far too long or were not individual values at all. This row is a perfect example of that -
"m[redacted]#[redacted].com";"mk,l.";"/'"
This row in particular made PostgreSQL think that it was copying 3 columns. Not to mention this row -
"[redacted]'";"of'";"all'";"your'";"[redacted]#[redacted].com";"[redacted]#[redacted].com:hapa[redacted]hoha"
Which made PostgreSQL attempt to copy the entire rest of the file into the second field as a single value.
Question
With all of that having been said, my final question is this - how can I enclose every value in the CSV file in double quotes in such a way that it will be properly imported into PostgreSQL?
Right now I'm backed against a wall and would appreciate any advice, even if it isn't a clear answer. I've tried everything I can think of. If an answer is even possible, I'd like one that can apply to CSV files that contain more than two fields, as I have many more CSV files to import after this one.
You state that only one of the two fields can contain semicolons. If so (that is, the other field never contains any), then the semicolon next to the clean field is the real delimiter: if the field with embedded semicolons comes first, the delimiter is the last semicolon on the line; otherwise it is the first.
I've never used sed, but regular expressions let you match the first or last occurrence of a character. You can therefore replace that single semicolon with a temporary character or pattern, place quotes around the fields, and finally change the temporary delimiter back.
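The approach described above can also be sketched without sed. The following is a minimal illustration, assuming exactly two fields per line and that only one of them may contain semicolons; the function name requote and its parameters are hypothetical. It splits each line at the first (or last) semicolon and lets the csv module do the quoting, which also doubles any double quotes embedded in the values.

```python
import csv
import io


def requote(lines, data_field_first=False, delimiter=";"):
    """Split each line at one delimiter and re-emit it with every field quoted.

    data_field_first=False: the first field is clean, so split at the FIRST delimiter.
    data_field_first=True:  the last field is clean, so split at the LAST delimiter.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        if data_field_first:
            left, _, right = line.rpartition(delimiter)
        else:
            left, _, right = line.partition(delimiter)
        writer.writerow([left, right])
    return out.getvalue()


if __name__ == "__main__":
    print(requote(["user;pa;ss", 'other;say "hi"']), end="")
```

This only covers the two-field case from the question. For files with more fields, the same trick works only if at most one field can contain the delimiter, since the other fields can then be counted off from either end of the line.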
I want to create a table in Amazon Redshift with PascalCase notation. How do I achieve this?
E.g.: I want the table name to be "EmployeeDetails", not "employeedetails", which is how it gets created by default.
Identifiers and names in Redshift are case-insensitive.
Standard and delimited identifiers are case-insensitive and are folded
to lower case. Identifiers must consist of only UTF-8 printable
characters.
Source
I recommend using snake_case, as @a_horse_with_no_name suggested. This is the standard way of doing it.
I have an XML document. Part of the file looks like this:
<attr>
  <attrlabl>COUNTY</attrlabl>
  <attrdef>County abbreviation</attrdef>
  <attrtype>Text</attrtype>
  <attwidth>1</attwidth>
  <atnumdec>0</atnumdec>
  <attrdomv>
    <edom>
      <edomv>C</edomv>
      <edomvd>Clackamas County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>M</edomv>
      <edomvd>Multnomah County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>W</edomv>
      <edomvd>Washington County</edomvd>
      <edomvds/>
    </edom>
  </attrdomv>
</attr>
From this XML file, I want to create a PostgreSQL table with columns of attrlabl, attrdef, attrtype, and attrdomv. I appreciate your suggestions!
While Erwin is right that this can be done with PostgreSQL tools, I would still suggest writing the custom translation yourself, for a few reasons.
The first is determining appropriate XML to PostgreSQL type conversions. You probably want to choose these yourself. But this example highlights a very different problem: what to do with nested data structures. You could, for example, store XML fragments. You could store text, JSON, or the like. You could create other tables and reference them with foreign keys.
In general I have almost always found the best approach is to simply manually create the tables. This substitutes human judgement for automated mappings and allows you to create better matches than a computer will.
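As one illustration of a hand-written translation, the nested attrdomv values can be flattened into rows for a parent table and a child table. This is only a sketch of one possible mapping, not the only reasonable one: the function name attr_to_rows and the row layouts are assumptions, and the sample is a shortened copy of the XML from the question.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<attr>
  <attrlabl>COUNTY</attrlabl>
  <attrdef>County abbreviation</attrdef>
  <attrtype>Text</attrtype>
  <attrdomv>
    <edom><edomv>C</edomv><edomvd>Clackamas County</edomvd><edomvds/></edom>
    <edom><edomv>M</edomv><edomvd>Multnomah County</edomvd><edomvds/></edom>
    <edom><edomv>W</edomv><edomvd>Washington County</edomvd><edomvds/></edom>
  </attrdomv>
</attr>"""


def attr_to_rows(attr):
    """Return one parent row and a list of child rows keyed by attrlabl."""
    label = attr.findtext("attrlabl")
    parent = (label, attr.findtext("attrdef"), attr.findtext("attrtype"))
    # One child row per <edom>: (parent key, domain value, description).
    children = [
        (label, edom.findtext("edomv"), edom.findtext("edomvd"))
        for edom in attr.findall("attrdomv/edom")
    ]
    return parent, children


if __name__ == "__main__":
    parent, children = attr_to_rows(ET.fromstring(SAMPLE))
    print(parent)
    for row in children:
        print(row)
```

The resulting tuples could then be loaded with ordinary INSERT statements into manually created tables, with the child table referencing the parent by attrlabl.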
I want to store Unicode characters in one of the columns of a PostgreSQL 8.4 database table. The data is non-English, for example Indic language text. I achieved this in Oracle XE by converting the text to Unicode and storing it in a column of data type nvarchar2.
I want to store Unicode characters of Indic languages (say Tamil or Hindi) in a table column in the same way. How can I achieve that, and what data type should I use?
Please guide me, thanks in advance
Just make sure the database is initialized with encoding UTF8. In 8.4 this setting applies to the whole database; later versions are more sophisticated. You might want to check the locale settings too. See the manual for details, particularly around matching with LIKE and text_pattern_ops.