We're migrating some old sql from Teradata to RS. I realized white spaces are being removed, why is it? and how may I preserver them?, thanks.
create table test99 ( a char(1), b char(1), c char(1));
insert into test99 values('*','*','*');
insert into test99 values('*',' ','*');
insert into test99 values('*','*','*');
select a||':'||b||':'||c from test99;
?column?
----------
*:*:*
*::* <------- white space removed
*:*:*
(3 rows)
CHAR is a fixed-length type. If you make a column col1 a CHAR(10) and do:
INSERT INTO col1 VALUES ('cat');
This will be stored as: 'cat ' to disk. All 10 bytes are used.
When you retrieve this data the 'cat ' is retrieved and trimmed so you just get 'cat' back. That trimming is done behind the scenes and it's the nature of the data type you are using.
It follows that when you store ' ' in that CHAR(10) column, what will be stored to disk will be ' ' and when that value is retrieved it will be trimmed and what you will have left is ''.
Instead use a VARCHAR(1). It will take slightly more disk space, but you'll preserve your space.
Related
The maximum size of limited character types (e.g. varchar(n)) in Postgres is 10485760.
description on max length of postgresql's varchar
Please download the file for testing and extract it in /tmp/2019q4, we only use pre.txt to import data with.
sample data
Enter you psql and create a database:
postgres=# create database edgar;
postgres=# \c edgar;
Create table according to the webpage:
fields in pre table definations
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
CREATE TABLE
Try to import data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error info:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
1.What size i set in the field is just 512 ,more less than 10485760.
2.the content in line 1005798 is not same as in error info:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now i drop the previous table ,convert the plabel field as text,re-create it:
edgar=# drop table pre;
DROP TABLE
Time: 22.763 ms
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
CREATE TABLE
Time: 81.895 ms
Import the same data with same copy command:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
COPY 275079
Time: 2964.898 ms (00:02.965)
edgar=#
No error info in psql console,let me check the raw data '/tmp/2019q4/pre.txt' ,which it contain 1043000 lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines,how much lines imported then?
edgar=# select count(*) from pre;
count
--------
275079
(1 row)
Why so less data imported without error info ?
The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available in text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip the header like FROM PROGRAM 'tail +2 /tmp/pre.txt' Alternatively, you could keep using CSV format, but choose a different quote character, one that never shows up in your data such as with (delimiter E'\t', format csv, header, quote E'\b')
I am working on a data upgrade issue.
A new column has been added to our Existing DB2 table.
In order to upgrade Client's DB2 table, I am unloading the data from the existing DB2 table with a constant value for the new column(lets say D of type SMALLINT) as shown below:
UNLOAD TABLESPACE XYZ.ABC
DELIMITED COLDEL X'2C' CHARDEL X'22'
PUNCHDDN SYSPUN01
UNLDDN SYSREC01 CCSID(367)
FROM TABLE DB2BG111.table_name
(
A POSITION(*) CHAR(2)
, B POSITION(*) SMALLINT
, C POSITION(*) CHAR(4)
, D (new column) CONSTANT X'0000'
)
While unloading the data, we are using following unload parameters:
X'2C': Column Delimiter
X'22': Character Delimiter
CCSID(367): EBCDIC to ASCII conversion
Problem tham I am facing is, DB2 is adding character delimiter X'22' after the value of the column D in the unload record.
Please note column B is an existing column and is declared as SMALLINT, DB2 not adding character del for this in the unload record.
This may be happening because new column D added here is not declared as a SMALLINT and hence it is not treated like an SMALLINT and DB2 is ending a char del for this in the unload record.
I am just looking for a way to get out of this situation, i do not want my new column D to be character delimited in the unload record.
Any suggestions to overcome this would be highly appreciated.
One option would be to use DSNTIAUL to perform the unload, and select the constant in the select list.
SELECT A, B, C, CAST (0 AS SMALLINT) AS D
FROM TABLE DB2BG111.table_name;
Best regards,
Patrick Bossman
I have a database with one column of the type nvarchar. If I write
INSERT INTO table VALUES ("玄真")
It shows ¿¿ in the table. What should I do?
I'm using SQL Developer.
Use single quotes, rather than double quotes, to create a text literal and for a NVARCHAR2/NCHAR text literal you need to prefix it with N
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value NVARCHAR2(20) );
INSERT INTO table_name VALUES (N'玄真');
Query 1:
SELECT * FROM table_name
Results:
| VALUE |
|-------|
| 玄真 |
First, using NVARCHAR might not even be necessary.
The 'N' character data types are for storing data that doesn't 'fit' in the database's defined character set. There's an auxiliary character set defined as the NCHAR Character set. It's kind of a band aid - once you create a database it can be difficult to change its character set. Moral of this story - take great care in defining the Character Set when creating your database and do not just accept the defaults.
Here's a scenario (LiveSQL) where we're storing a Chinese string in both NVARCHAR and VARCHAR2.
CREATE TABLE SO_CHINESE ( value1 NVARCHAR2(20), value2 varchar2(20 char));
INSERT INTO SO_CHINESE VALUES (N'玄真', '我很高興谷歌翻譯。' )
select * from SO_CHINESE;
Note that both the character sets are in the Unicode family. Note also I told my VARCHAR2 string to hold 20 characters. That's because some characters may require up to 4 bytes to be stored. Using a definition of (20) would give you only room to store 5 of those characters.
Let's look at the same scenario using SQL Developer and my local database.
And to confirm the character sets:
SQL> clear screen
SQL> set echo on
SQL> set sqlformat ansiconsole
SQL> select *
2 from database_properties
3 where PROPERTY_NAME in
4 ('NLS_CHARACTERSET',
5 'NLS_NCHAR_CHARACTERSET');
PROPERTY_NAME PROPERTY_VALUE DESCRIPTION
NLS_NCHAR_CHARACTERSET AL16UTF16 NCHAR Character set
NLS_CHARACTERSET AL32UTF8 Character set
First of all, you should to establish the Chinese character encoding on your Database, for example
UTF-8, Chinese_Hong_Kong_Stroke_90_BIN, Chinese_PRC_90_BIN, Chinese_Simplified_Pinyin_100_BIN ...
I show you an example with SQL Server 2008 (Management Studio) that incorporates all of this Collations, however, you can find the same characters encodings in other Databases (MySQL, SQLite, MongoDB, MariaDB...).
Create Database with Chinese_PRC_90_BIN, but you can choose other Coallition:
Select a Page (Left Header) Options > Collation > Choose the Collation
Create a Table with the same Collation:
Execute the Insert Statement
INSERT INTO ChineseTable VALUES ('玄真');
I want to get records from my local txt file to postgresql table.
I have created following table.
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
And, my local txt file contains following data.
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am just executing this,
copy player_info (Name,
City,
State,
DateOfTour,
pay_id,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is,
For my numeric column, If 3 spaces are there,
then I have to insert NULL as pay else whatever 5 digits numeric number.
pay is a column in my table whose datatype is numeric.
Is this correct or possible to do this ?
You cannot store strings in a numeric column, at all. 3 spaces is a string, so it cannot be stored in the column pay as that is defined as numeric.
A common approach to this conundrum is to create a staging table which uses less precise data types in the column definitions. Import the source data into the staging table. Then process that data so that it can be reliably added to the final table. e.g. in the staging table set a column called pay_str to NULL where pay_str = ' ' (or perhaps LIKE ' %')
I cannot get the like statement to work with space and trailing wildcard.
My query goes as follows:
select * from Table where Field like 'Desc_%'
The data is space-delimited such as, Desc Top, Desc Bottom and so on. The query works when I use the pattern 'Desc_%' but not when I use the pattern 'Desc %'. The field is nvarchar(255).
Any ideas?
EDIT
Turns out the data was tab-delimited and when I copied a value from the 2008 Management Studio it converted the tab to space. Dumb mistake. I did like the [ ] tip so I marked it the answer. Thanks everyone, I'll remember not to trust the copy from the grid results.
Use brackets '[' & ']' to set up a single-character class to match. In your case the SQL should look like this: "select * from Table where Field like 'Desc[ ]%'"
EDIT: add sample, link
CREATE TABLE #findtest (mytext varchar(200) )
insert #findtest VALUES ('Desc r')
insert #findtest VALUES ('Descr')
select * from #findtest where mytext like 'Desc[ ]%'
DROP TABLE #findtest
(1 row(s) affected)
(1 row(s) affected)
mytext
--------
Desc r
(1 row(s) affected)
See this article.
Since an underscore is a single character wildcard, and percent is the multi-char wildcard, they are the same ( "%" and "_%" ). It is as if you are asking for two consecutive wildcards. Not sure if I understand your question, but I am not surprised it does not behave the way you expect.
Consider explicitly stating that you want a space, using its ASCII value?
SELECT * FROM Table WHERE Field Like 'Desc' + CHAR(32) + '%'