"Invalid column name" when "Specify database fields" in table output is unchecked in pentaho PDI V7 - pentaho-spoon

I'm trying to insert data into a SQL database. I have all the columns in the same order as the data flow, but I'm getting an "Invalid column name name_of_the_actual_data_column" error.

Column order won't matter, but exact column names will. Your SQL implementation probably isn't picky enough to require case-sensitive matches, but spaces and punctuation will matter. With Specify database fields unchecked, all field names MUST exist as columns in the target table.
I've found that a good way to troubleshoot SQL inserts is to put a Select values step before the Table output and make sure you're really only getting the columns you want to insert.
You can also right-click on the Table output step and choose Input fields... to see the column metadata being passed into the step.
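If you want to double-check on the database side, a quick way (assuming a SQL Server target, which the error message suggests, and a hypothetical schema/table name) is to list the exact column names of the target table and compare them character for character with the field names shown in Input fields...:

-- List the exact column names of the target table (schema and table name are hypothetical)
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'my_target_table'
ORDER BY ORDINAL_POSITION;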

Related

How to import data from csv to postgresql table that has an auto_increment first column or field

Hi, I am trying to import data into a PostgreSQL table from a CSV file, and I'd like to know how to exclude the first column, since it is an identity column that increments when data is inserted.
The error I get is shown here.
If you are using a script to add or modify data, just skip the column in the script (i.e. don't write it in the INSERT statement). If you do that, you should also modify your CSV to remove the corresponding column (the data, if any, and its separator, usually a comma), since the number of values no longer matches.
Looking at your screenshot, I suppose you are using pgAdmin or a similar GUI. In pgAdmin, if you click on the table and select the Import/Export Data... option, a window opens where you should select "Import" and then, in the Columns tab, exclude the "ID" (or any other auto-increment) column. This solution also requires you to remove the column from the CSV.
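If you prefer to load the file with SQL rather than the GUI, a minimal sketch (the table, column, and file names here are hypothetical) is to name only the columns that actually appear in the CSV, so the identity column keeps its auto-generated value:

-- "id" is the auto-increment column and is deliberately left out of the column list;
-- add HEADER true if the file has a header row
COPY mytable (name, amount, created_at)
FROM '/path/to/data.csv'
WITH (FORMAT csv);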

Basic questions about Cloud SQL

I'm trying to populate a Cloud SQL database from a Cloud Storage bucket, but I'm getting some errors. The csv has the headers (or column names) as the first row and does not have all the columns (some columns in the database can be null, so I'm loading only the data I need for now).
The database is in postgresql and this is the first database in GCP I'm trying to configure and I'm a little bit confused.
Does it matter if the csv file has the column names?
Does the order of the columns in the csv file matter? (I guess it does if not all columns are present in the csv.)
The PK of the table is a serial number, which I'm not including in the csv file. Do I need to include the PK as well? I mean, because it's a serial number it should be "auto assigned", right?
Sorry for the noob questions and thanks in advance :)
This is all covered by the COPY documentation.
It matters in that you will have to specify the HEADER option so that the first line is skipped:
[...] on input, the first line is ignored.
The order matters, and if the CSV file does not contain all the columns in the same order as the table, you have to specify them with COPY:
COPY mytable (col1, col2, col4, ...) FROM '/dir/afile' WITH (...);
Same as above: if you omit a table column from the column list, it will be filled with its default value; in this case that is the autogenerated number.
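Putting those pieces together, a minimal sketch (table, column, and file names are hypothetical) looks like:

-- Skip the header row and name only the columns present in the CSV;
-- the omitted serial PK is filled with its auto-generated default.
COPY mytable (col2, col3, col5)
FROM '/dir/afile.csv'
WITH (FORMAT csv, HEADER true);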

select all columns except two in q kdb historical database

In output I want to select all columns except two columns from a table in q/kdb historical database.
I tried running the query below, but it does not work on the HDB.
delete colid,coltime from table where date=.z.d-1
but it fails with the error below:
ERROR: 'par
(trying to update a physically partitioned table)
I referred to https://code.kx.com/wiki/Cookbook/ProgrammingIdioms#How_do_I_select_all_the_columns_of_a_table_except_one.3F but it didn't help.
How can we display all columns except for two in a kdb historical database?
The reason you are getting the 'par error is that the table is partitioned.
The error is documented here
trying to update a partitioned table
You cannot directly update or delete anything on a partitioned table (there is a separate db maintenance script for that).
The query you have used as a fix works because it first selects the data into memory (temporarily) and then deletes the columns:
delete colid,coltime from select from table where date=.z.d-1
You can try the following functional form:
c:cols[t] except `p
?[t;enlist(=;`date;2015.01.01) ;0b;c!c]
You could try a functional select:
?[table;enlist(=;`date;.z.d);0b;{x!x}cols[table]except`colid`coltime]
Here the last argument is a dictionary mapping output column names to the expressions that produce them, which tells the query what to extract. Instead of deleting the two columns you specified, this selects all but those two, which is more or less the same query.
To see what the functional form of a query is you can run something like:
parse"select colid,coltime from table where date=.z.d"
And it will output the arguments to the functional select.
You can read more on functional selects at code.kx.com.
Only select queries work on partitioned tables, which you worked around by first selecting the table into memory and then deleting the columns you did not want.
If you have a large number of columns and don't want to create a bulky select query you could use a functional select.
?[table;();0b;{x!x}((cols table) except `colid`coltime)]
This shows all columns except the unwanted subset. The column clause expects a dictionary, hence the function {x!x} is used to convert the list of column names into a dictionary. See more information here:
https://code.kx.com/q/ref/funsql/
As nyi mentioned, if you want to permanently delete columns from a historical database you can use the deletecol function in the dbmaint tools: https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md

How can I copy an IDENTITY field?

I'd like to update some parameters for a table, such as the dist and sort key. In order to do so, I've renamed the old version of the table and recreated it with the new parameters (these cannot be changed once a table has been created).
I need to preserve the id field from the old table, which is an IDENTITY field. If I try the following query however, I get an error:
insert into edw.my_table_new select * from edw.my_table_old;
ERROR: cannot set an identity column to a value [SQL State=0A000]
How can I keep the same id from the old table?
You can't INSERT data while setting IDENTITY column values, but you can load data from S3 using the COPY command.
First you will need to create a dump of the source table with UNLOAD.
Then simply use COPY with the EXPLICIT_IDS parameter, as described in Loading default column values:
If an IDENTITY column is included in the column list, the EXPLICIT_IDS option must also be specified in the COPY command, or the COPY command will fail. Similarly, if an IDENTITY column is omitted from the column list, and the EXPLICIT_IDS option is specified, the COPY operation will fail.
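A minimal sketch of that flow, using the table names from the question (the S3 path and IAM role are placeholders):

-- Dump the old table to S3
UNLOAD ('SELECT * FROM edw.my_table_old')
TO 's3://my-bucket/my_table_old_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|';

-- Load it into the new table, keeping the IDENTITY values from the files
COPY edw.my_table_new
FROM 's3://my-bucket/my_table_old_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|'
EXPLICIT_IDS;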
You can explicitly specify the columns, and ignore the identity column:
insert into existing_table (col1, col2) select col1, col2 from another_table;
Use ALTER TABLE APPEND twice, first time with IGNOREEXTRA and the second time with FILLTARGET.
If the target table contains columns that don't exist in the source table, include FILLTARGET. The command fills the extra columns in the source table with either the default column value or IDENTITY value, if one was defined, or NULL.
It moves the data from one table to another extremely quickly; it took me 4 seconds for a 1 GB table on a dc1.large node.
Appends rows to a target table by moving data from an existing source table.
...
ALTER TABLE APPEND is usually much faster than a similar CREATE TABLE AS or INSERT INTO operation because data is moved, not duplicated.
Faster and simpler than UNLOAD + COPY with EXPLICIT_IDS.
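Each of the two appends is a single statement of this form (table names taken from the question; which option you need depends on which side has the extra columns):

-- Data is moved, not duplicated; the source table is left empty afterwards.
-- Use IGNOREEXTRA when the source has columns the target lacks,
-- FILLTARGET when the target has columns the source lacks (e.g. the IDENTITY column).
ALTER TABLE edw.my_table_new APPEND FROM edw.my_table_old FILLTARGET;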

T SQL alias column name based on query from a table

I have to display a column name taken from a select query on a table, e.g. 'Column 1' is stored in table table1 and I want to display 'Column 1' as the column name. I cannot hardcode 'Column 1' as it might not be known when the code is developed.
Column 1    Column 2
a           b
Any idea?
Update:
It allows the user to define the column name.
It has to be generated via T-SQL.
Add the alias as a separate column, or use dynamic SQL.
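A minimal dynamic SQL sketch (the metadata table dbo.ColumnLabels, its columns, and the base table dbo.table1 are hypothetical):

-- Look up the user-defined display name from a hypothetical metadata table
DECLARE @displayName sysname;
SELECT @displayName = DisplayName
FROM dbo.ColumnLabels
WHERE TableName = N'table1' AND ColumnName = N'dbColumn1';

-- Build the query with the alias quoted, so the stored label cannot break the statement
DECLARE @sql nvarchar(max) =
    N'SELECT dbColumn1 AS ' + QUOTENAME(@displayName) + N' FROM dbo.table1;';

EXEC sys.sp_executesql @sql;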
This doesn't make sense, because you should know what the column is called: you don't normally store information about columns and build queries from your own metadata.
If you don't know the name of the column at design time, then use a resource file or similar in the presentation layer to hold the value that you will display as the label for column1. It should be noted, however, that you are going to have a very hard time writing any code if you don't know the names of the columns in your database, unless you are using SELECT * on everything.
Column names are fairly restricted and should be chosen with the SQL admin and application developer as the users of those names. A column name is not intended to be a descriptor for the end-user interface.
SELECT dbColumn1 AS [Customer Name] FROM tableMain
Even that is not good practice: once user input becomes part of the T-SQL you are opening yourself up to SQL injection attacks, and you lose control over your query. A better practice is to pass parameters.
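For values (as opposed to identifiers such as the alias, which can only be made safe with QUOTENAME inside dynamic SQL), a minimal sketch of passing a parameter looks like this (the CustomerName filter is illustrative):

-- The value is passed as a typed parameter, never concatenated into the statement
DECLARE @name nvarchar(100) = N'Acme';
EXEC sys.sp_executesql
    N'SELECT dbColumn1 AS [Customer Name] FROM tableMain WHERE CustomerName = @name;',
    N'@name nvarchar(100)',
    @name = @name;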