Add columns to Postgres, source file changed

I get data files from one of our vendors. Each line is a continuous fixed-width string with most positions filled in. Plenty of sections are just space characters, reserved as filler for future columns. I have a parser that formats each file into a CSV so I can load it into Postgres. Today the vendor informed us that they are adding a column by splitting one of their filler fields into two columns: X and Filler.
For example index 0:5 is the name, 5:20 is filler and 20:X is other stuff. They are splitting 5:20 into 5:10 and 10:20 where 10:20 will still be a placeholder column.
NAME1 AUHDASFAAF!##!12312312541 -> NAME1, ,AUHDASFAAF,.....
Is now
NAME1AAAAA AUHDASFAAF!##!12312312541 -> NAME1,AAAAA, ,AUHDASFAAF,......
Modifying my parser to account for this change is the easy part. How do I alter my Postgres table to accept this new column from the CSV file? Ideally I don't want to recreate the table and reload all of the data.

Columns are in the order they are defined. When you add a new column (with ALTER TABLE ... ADD COLUMN) it goes at the end. There's no direct way to add a column in the middle. While insert values (...) is convenient, you should not rely on the order of columns in the table.
There are various workarounds, like dropping and recreating the table or dropping and re-adding columns. These are all pretty inconvenient, and you'll have to do it again when there's another change.
You should never make assumptions about the order of columns in the table either in an insert or select *. You can either spell out all the columns, or you can create a view which specifies the order of the columns.
You don't have to write the columns out by hand. Get them from information_schema.columns and edit their order as necessary for your queries or to set up your view.
select column_name
from information_schema.columns
where table_name = ?
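As a rough illustration of the advice above, here is a minimal Python sketch that appends the new column and then loads rows by naming the columns explicitly. It uses the standard-library sqlite3 module as a stand-in for Postgres so it can run anywhere; the table name vendor_data, the view name vendor_data_ordered, and the column names are all hypothetical. In Postgres itself, COPY also accepts a column list, so the same idea should carry over to loading the CSV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table matching the old file layout: name, filler, other.
cur.execute("CREATE TABLE vendor_data (name TEXT, filler TEXT, other TEXT)")
cur.execute(
    "INSERT INTO vendor_data (name, filler, other) VALUES (?, ?, ?)",
    ("NAME1", " ", "AUHDASFAAF"),
)

# The new column is appended at the end; existing rows get NULL for it.
cur.execute("ALTER TABLE vendor_data ADD COLUMN x TEXT")

# New-format rows: naming the columns makes their physical order irrelevant.
cur.execute(
    "INSERT INTO vendor_data (name, x, filler, other) VALUES (?, ?, ?, ?)",
    ("NAME1", "AAAAA", " ", "AUHDASFAAF"),
)

# A view presents the columns in the file's order for anyone who wants that.
cur.execute(
    "CREATE VIEW vendor_data_ordered AS "
    "SELECT name, x, filler, other FROM vendor_data"
)

cur.execute("SELECT * FROM vendor_data_ordered ORDER BY x")
view_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
```

The old row keeps its data and simply has no value for x, so nothing has to be reloaded.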

Related

Most efficient way to DECODE multiple columns -- DB2

I am fairly new to DB2 (and SQL in general) and I am having trouble finding an efficient method to DECODE columns.
Currently, the database has a number of tables, most of which store a significant number of their columns as numbers. These numbers correspond to rows in a table that holds the real values. We are talking about 9,500 different values (e.g. '502=yes' or '1413= Graduate Student').
Normally I would just use a WHERE clause and match where they are equal, but since there are 20-30 columns that need to be decoded per table, I can't really do this (that I know of).
Is there a way to effectively just display the corresponding value from the other table?
Example:
SELECT TEST_ID, DECODE(TEST_STATUS, 5111, 'Approved', 5112, 'In Progress') TEST_STATUS
FROM TEST_TABLE
The above works fine.......but I manually look up the numbers and review them to build the statements. As I mentioned, some tables have 20-30 columns that would need this AND some need DECODE statements that would be 12-15 conditions.
Is there anything that would allow me to do something simpler like:
SELECT TEST_ID, DECODE(TEST_STATUS = *TableWithCodeValues*) TEST_STATUS
FROM TEST_TABLE
EDIT: Also, to be more clear, I know I can do a ton of INNER JOINS, but I wasn't sure if there was a more efficient way than that.

From a logical point of view, I would consider splitting the lookup table into several domain/dimension tables. Not sure if that is possible to do for you, so I'll leave that part.
As mentioned in my comment I would stay away from using DECODE as described in your post. I would start by doing it as usual joins:
SELECT a.TEST_STATUS
, b.TEST_STATUS_DESCRIPTION
, a.ANOTHER_STATUS
, c.ANOTHER_STATUS_DESCRIPTION
, ...
FROM TEST_TABLE as a
JOIN TEST_STATUS_TABLE as b
ON a.TEST_STATUS = b.TEST_STATUS
JOIN ANOTHER_STATUS_TABLE as c
ON a.ANOTHER_STATUS = c.ANOTHER_STATUS
JOIN ...
If things are too slow there are a couple of things you can try:
Create a statistical view that can help determine cardinalities from the joins (may help the optimizer creating a better plan):
https://www.ibm.com/support/knowledgecenter/sl/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0021713.html
If your license permits it, you can experiment with Materialized Query Tables (MQT). Note that there is a penalty for modifications of the base tables, so if you have more of an OLTP workload, this is probably not a good idea:
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/index.html
A third option, if your lookup table is fairly static, is to cache the lookup table in the application. Read TEST_TABLE from the database and look up the descriptions in the application. A further improvement may be to add triggers that invalidate the cache when the lookup table is modified.
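The application-side cache could look roughly like the following Python sketch. It uses an in-memory sqlite3 database in place of DB2 so it is runnable as-is; the table layout, the sample codes, and the helper decode_row are assumptions made for illustration, and cache invalidation is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE test_table (
        test_id INTEGER, test_status INTEGER, another_status INTEGER
    );
    INSERT INTO lookup VALUES (5111, 'Approved'), (5112, 'In Progress'), (502, 'yes');
    INSERT INTO test_table VALUES (1, 5111, 502), (2, 5112, NULL);
""")

# Read the whole lookup table once and keep it in memory as {code: text}.
code_cache = dict(conn.execute("SELECT id, text FROM lookup"))


def decode_row(row, coded_positions):
    """Replace the codes at the given positions with their cached text.

    Unknown or NULL codes come back as None.
    """
    return tuple(
        code_cache.get(value) if i in coded_positions else value
        for i, value in enumerate(row)
    )


query = (
    "SELECT test_id, test_status, another_status "
    "FROM test_table ORDER BY test_id"
)
# Positions 1 and 2 hold codes; position 0 is the plain test_id.
decoded = [decode_row(row, {1, 2}) for row in conn.execute(query)]
```

With roughly 10k code/text pairs the dictionary stays small, and each decoded column costs one dictionary lookup instead of one join.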

If you don't want to do all these joins, you could create your own LOOKUP function.
create or replace function lookup(IN_ID INTEGER)
returns varchar(32)
deterministic reads sql data
begin atomic
declare OUT_TEXT varchar(32);--
set OUT_TEXT=(select text from test.lookup where id=IN_ID);--
return OUT_TEXT;--
end;
With a table TEST.LOOKUP like
create table test.lookup(id integer, text varchar(32))
containing some id/text pairs, this will return the text value corresponding to an id, or NULL if the id is not found.
With the roughly 10k id/text pairs you mention and an index on the ID field, this shouldn't be a performance issue, as that amount of data should easily be cached in the corresponding bufferpool.

Are INTO, FROM and JOIN the only ways to get a table?

I'm currently writing a script which will allow me to input a file (generally .sql) and generate a list of every table that's used in that file. The process is simple: it opens the input file, checks each line for a substring, and if that substring exists, outputs the line to the screen.
The substrings being checked are T-SQL keywords that indicate a selected table, such as INTO, FROM and JOIN. Not being a T-SQL wizard, those 3 keywords are the only ones I know of that are used to select a table in a query.
So my question is: in T-SQL, are INTO, FROM and JOIN the only ways to get a table, or are there others?

There are many ways to get a table. Here are some of them:
DELETE
FROM
INTO
JOIN
MERGE
OBJECT_ID (N'dbo.mytable', N'U') where U is the object type for table.
TABLE, e.g. ALTER TABLE, TRUNCATE TABLE, DROP TABLE
UPDATE
However, with your script you'll get not only real tables but possibly also views and temporary result sets such as common table expressions. Here are 2 examples:
-- Example 1
SELECT *
FROM dbo.myview
-- Example 2
WITH tmptable AS
(
SELECT *
FROM mytable
)
SELECT *
FROM tmptable
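A minimal Python sketch of such a scanner, under several assumptions: it only looks at the word after a handful of the keywords listed above, it treats any name introduced by WITH ... AS ( as a CTE and skips it, and it ignores comments, string literals, and dynamic SQL. The function name referenced_tables and the patterns are hypothetical, and views still cannot be told apart from tables this way.

```python
import re

# Keywords that are normally followed by a table-like name in T-SQL.
TABLE_KEYWORDS = r"(?:FROM|JOIN|INTO|UPDATE|MERGE|TABLE)"
# A possibly schema-qualified name, with or without [brackets].
NAME = r"((?:\[[^\]]+\]|\w+)(?:\.(?:\[[^\]]+\]|\w+))*)"
TABLE_REF = re.compile(rf"\b{TABLE_KEYWORDS}\s+{NAME}", re.IGNORECASE)
# Names defined as common table expressions: WITH name AS ( ... ), name2 AS (
CTE_DEF = re.compile(r"(?:\bWITH\s+|,\s*)(\w+)\s+AS\s*\(", re.IGNORECASE)


def referenced_tables(sql):
    """Return the names that follow a table keyword, skipping CTE names."""
    cte_names = {name.lower() for name in CTE_DEF.findall(sql)}
    found = []
    for name in TABLE_REF.findall(sql):
        if name.lower() not in cte_names and name not in found:
            found.append(name)
    return found


sample = """
WITH tmptable AS
(
    SELECT *
    FROM mytable
)
SELECT *
FROM tmptable
JOIN dbo.myview ON 1 = 1
"""
tables = referenced_tables(sample)
```

Here tmptable is dropped because it is defined in the WITH clause, while dbo.myview is still reported; telling it apart from a real table would need a lookup against the database catalog.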

How to add two or more columns into a third column in SQL Server 2008

For Example
Roll No, Name, Maths, English, Total(Maths+English)

You can use a computed column. Assuming columns Maths and English are a numeric type, you can do like so:
ALTER TABLE [MyTable] ADD Total AS Maths + English;
Once created, you access the column (read only, obviously) as you would any other column, i.e.
select English, Maths, Total from [MyTable];

SELECT [Roll No], Name, Maths, English, Maths+English AS TOTAL
FROM [YOUR TABLE]

You need to assign a formula to the Total column.
Open the table in design view, then click on the Total column (I assume you already have this column; if not, create it first, and then click on it in design view).
Then, in the column properties, expand 'Computed Column Specification'.
Then, in the Formula field, write:
(Maths+English)

Copy the field names of a selected query in MySQL Workbench

I am using MySQL Workbench (SQL Editor). I need to copy the list of columns in each query, as was possible in MySQL Query Browser.
For example
Select * From tb
I want to have the list of fields like this:
id,title,keyno,......

You mean you want to be able to get one or more columns for a specified table?
1st way
Run SHOW COLUMNS FROM your_table_name and, depending on what you want, add some basic filtering, for example only columns whose data type is int, whose default value is null, etc. E.g. SHOW COLUMNS FROM your_table_name WHERE `Type`='mediumint(8)' AND `Null`='YES'
2nd way
This way is a bit more flexible and powerful, as you can combine many tables and other properties kept in MySQL's internal INFORMATION_SCHEMA database, which has records of all columns, tables, etc. Use the query below as it is, setting TABLE_NAME to the table you want to find the columns for:
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='your_table_name';
To limit the matched columns to a specific database, add AND TABLE_SCHEMA='your_db_name' at the end of the query.
Also, to have the column names appear in a single row as a comma-separated list rather than in multiple rows, you can use GROUP_CONCAT(COLUMN_NAME) instead of only COLUMN_NAME (the default separator is a comma; GROUP_CONCAT(COLUMN_NAME SEPARATOR ',') spells it out).

To get all columns in a select statement, go to the SCHEMAS panel, right-click on the table whose column names you want, then select "Copy to Clipboard > Select All statement".

The accepted solution is fine, but it is limited to field names in tables. One way to handle arbitrary queries is to standardize your select clause so that a regex can strip out everything except the column aliases. I format my select clause as "1 row per element", so
Select 1 + 1 as Col1, 1 + 2 Col2 From Table
becomes
Select 1 + 1 as Col1
, 1 + 2 Col2
From Table
Then I use a simple regex on the "1 row per select element" version to replace "^.* " (excluding quotes) with nothing. The regex matches everything up to the final space in the line, so it assumes your column aliases don't contain spaces (so replace spaces with underscores). Or, if you don't like "1 row per element", then always use the "as" keyword to give the regex a handle it can grasp.
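For illustration, the same replacement can be sketched in Python with the standard re module. Stopping at the From line is an extra step assumed here, not part of the description above, and the variable names are made up.

```python
import re

query = """Select 1 + 1 as Col1
, 1 + 2 Col2
From Table"""

# Keep only the select-list lines, i.e. everything before the From line.
select_lines = []
for line in query.splitlines():
    if re.match(r"\s*from\b", line, re.IGNORECASE):
        break
    select_lines.append(line)

# "^.* " is greedy, so it removes everything up to the last space on the line,
# leaving just the alias. Trailing spaces are stripped first so they don't win.
aliases = [re.sub(r"^.* ", "", line.rstrip()) for line in select_lines]
column_list = ",".join(aliases)
```

As noted above, this only works when each select element sits on its own line and no alias contains a space.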

LTRIM(RTRIM(x)) leaves blanks on RTL content - does anyone know a workaround?

I have a table [Company] with a column [Address3] defined as varchar(50).
I cannot control the values entered into that table, but I need to extract the values without leading and trailing spaces. I run the following query:
SELECT DISTINCT RTRIM(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
The column contains both RTL and LTR values.
Most of the data is retrieved correctly, but SOME (not all) RTL values are returned with leading and/or trailing spaces.
I then tried the following query:
SELECT DISTINCT ltrim(rTRIM(ltrim(rTRIM([Address3])))) c, ltrim(rTRIM([Address3])) b, [Address3] a, rtrim(LTRIM([Address3])) Address3 FROM [Company] ORDER BY Address3
But it showed the same problem in all columns. Does anyone have any idea what could cause it?

The rows that return with extraneous spaces might have a kind of space or invisible character the trim functions don't know about. The documentation doesn't even mention what is considered "a blank" (pretty damn sloppy if you ask me). Try taking one of those rows and looking at the characters one by one to see what character they are.
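If the values can be pulled into a script, a character-by-character look might be done along these lines in Python. The sample value is invented (three Hebrew letters followed by a right-to-left mark), and describe_chars is a hypothetical helper; the real culprit in the table could be something else, such as a no-break space.

```python
import unicodedata


def describe_chars(value):
    """Return (code point, Unicode name) for every character in value."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in value
    ]


# Invented sample: three Hebrew letters followed by an invisible mark.
sample = "\u05e2\u05d9\u05e8\u200f"

# Mimic LTRIM/RTRIM by removing ordinary spaces only.
trimmed = sample.strip(" ")
report = describe_chars(trimmed)
last_char = report[-1]
```

Here the last entry is the invisible mark rather than a letter, which is the kind of leftover a trim on ordinary spaces would not remove.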

Since you are using varchar, you can run this to get the ASCII code of the last character of each trimmed value, which will expose any bad trailing characters:
--identify the bad character
SELECT
COUNT(*) AS CountOf
,'>'+RIGHT(LTRIM(RTRIM(Address3)),1)+'<' AS LastChar_Display
,ASCII(RIGHT(LTRIM(RTRIM(Address3)),1)) AS LastChar_ASCII
FROM Company
GROUP BY RIGHT(LTRIM(RTRIM(Address3)),1)
ORDER BY 3 ASC
Do a one-time fix of the data to remove the bogus character, where xxxx is the ASCII value identified in the previous select:
--only one bad character found in previous query
UPDATE Company
SET Address3=REPLACE(Address3,CHAR(xxxx),'')
--multiple different bad characters found by previous query
UPDATE Company
SET Address3=REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')
If you have bogus characters in your data, remove them from the data rather than stripping them each time you select it. You WILL have to add this REPLACE logic to all INSERTs and UPDATEs on this column to keep new data from bringing the bogus characters back.
If you can't alter the data, you can just select it this way:
SELECT
LTRIM(RTRIM(REPLACE(Address3,CHAR(xxxx),'')))
,LTRIM(RTRIM(REPLACE(REPLACE(Address3,CHAR(xxxx1),''),char(xxxx2),'')))
...