I have several CSV files with the same columns; however, the columns are in a different order in each file.
I would like to merge all of these CSV files via IMPORT.
Could you please help with this IMPORT statement? How can I make it match the column order of each file?
With Db2 on Unix/Windows, you can use the IMPORT command or the LOAD command. The INGEST command offers further options.
With IMPORT or LOAD, there are two ways to do it: either use "METHOD P" or specify the order of the target columns on the INSERT clause. There are two examples below.
The first example uses METHOD P for import.
There are three CSV files whose three columns appear in different orders, and a target table with three columns (a, b, c):
create table mytab(a integer not null, b integer not null, c integer not null)
DB20000I The SQL command completed successfully.
!cat 1a.csv
1,2,3
!cat 1b.csv
99,98,97
!cat 1c.csv
55,51,59
import from 1a.csv of del method p(1,2,3) insert into mytab
SQL3109N The utility is beginning to load data from file "1a.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1b.csv of del method p(3,2,1) insert into mytab
SQL3109N The utility is beginning to load data from file "1b.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1c.csv of del method p(2,1,3) insert into mytab
SQL3109N The utility is beginning to load data from file "1c.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
select * from mytab
A           B           C
----------- ----------- -----------
          1           2           3
         97          98          99
         51          55          59
3 record(s) selected.
The second example specifies the target columns on the INSERT clause, ordered to match the column order in each CSV file.
create table mynewtab(a integer not null, b integer not null, c integer not null)
DB20000I The SQL command completed successfully.
!cat 1a.csv
1,2,3
!cat 1b.csv
99,98,97
!cat 1c.csv
55,51,59
import from 1a.csv of del insert into mynewtab(a,b,c)
SQL3109N The utility is beginning to load data from file "1a.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1b.csv of del insert into mynewtab(c,b,a)
SQL3109N The utility is beginning to load data from file "1b.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1c.csv of del insert into mynewtab(b,a,c)
SQL3109N The utility is beginning to load data from file "1c.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
select * from mynewtab
A           B           C
----------- ----------- -----------
          1           2           3
         97          98          99
         51          55          59
3 record(s) selected.
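The same METHOD P clause also works with LOAD, for example: load from 1b.csv of del method p(3,2,1) insert into mytab.
With INGEST, named fields decouple the file order from the column order. A minimal, untested sketch for the same mytab and 1b.csv as above (the field names $f1..$f3 are placeholders):
ingest from file 1b.csv format delimited
($f1 integer external, $f2 integer external, $f3 integer external)
insert into mytab(a, b, c) values($f3, $f2, $f1)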
Related
I have a table where a record may contain null values in one or more columns. I want to delete every record that contains a null value. I'm wondering if there is a suggested way to do that in DolphinDB?
Try the DolphinDB function rowAnd to combine the per-column conditions.
The following script is for your reference. It outputs a row only when every column meets the condition (i.e., it drops the records that contain a NULL):
sym = take(`a`b`c, 110)                  // 110 symbols, no NULLs
id = 1..100 join take(int(),10)          // last 10 values are NULL
id2 = take(int(),10) join 1..100         // first 10 values are NULL
t = table(sym, id, id2)
t[each(isValid, t.values()).rowAnd()]    // keep rows where all columns are valid
The result contains only the 90 rows (the 11th through the 100th) in which both id and id2 are non-NULL.
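If a SQL-style filter reads more naturally to you, something along these lines should be equivalent (a sketch, assuming isValid can be applied per column in a WHERE clause):
select * from t where isValid(id) and isValid(id2)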
I am trying to get a result in my report which, I believe, requires a where clause, and it did not work for me with the select expert section.
I have 2 tables. Let's call them table 1 and table 2.
Table 1 contains unique records.
Table 2 contains multiple records for the same uniqueKey as table 1.
There are 3 fields in table 2 that play a role for each uniqueKey from table 1:
QTY_ORD
QTY_SHIPPED
ITEM_CANCEL
Let's assume that for item # 1 from table 1, there are 5 records in table 2. Each record has values for the 3 above-mentioned fields. I need to display the SUM of QTY_SHIPPED - QTY_ORD over all the records where ITEM_CANCEL = 0.
It could be that 3 of the records have ITEM_CANCEL = 1 (we can ignore these), but for the other 2 records where ITEM_CANCEL = 0, I need the SUM of QTY_SHIPPED - the SUM of QTY_ORD.
The current code I have is as follows:
if {current_order1.ITEM_CANCEL} = 0 then
sum({current_order1.QTY_ORD})-sum({current_order1.QTY_SHIPPED}) else
0
but this gives me the sum over ALL the records, including the ones where ITEM_CANCEL = 1.
If I use ITEM_CANCEL = 0 in the select expert, it removes ALL the results that have no value in table 2. I even tried the code without the SUM function, but that returned only 1 of the records in table 2 where ITEM_CANCEL = 0, not the total difference over the 2 records in table 2 that I require.
Any suggestions?
Start with a detail-level formula (no SUM):
if {current_order1.ITEM_CANCEL} = 0 then {current_order1.QTY_ORD} - {current_order1.QTY_SHIPPED} ELSE 0
Then, SUM that formula at whatever Group or Report levels you require.
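If you can push the calculation into the database instead, the equivalent SQL would look something like this (a sketch; table1 and the join key are placeholders, while current_order1 and the field names are taken from your formula):
SELECT t1.uniqueKey,
       SUM(CASE WHEN t2.ITEM_CANCEL = 0
                THEN t2.QTY_SHIPPED - t2.QTY_ORD
                ELSE 0 END) AS net_qty
FROM table1 t1
LEFT JOIN current_order1 t2 ON t2.uniqueKey = t1.uniqueKey
GROUP BY t1.uniqueKey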
I have a new problem with CLPPlus and the IMPORT command.
I am trying to import data into a table whose name contains a space, but it seems to fail:
SQL> IMPORT FROM '/home/i1058/outfile' INSERT INTO USER1."TABLE 1";
Invalid Syntax Error
SQL> IMPORT FROM '/home/i1058/outfile' INSERT INTO USER1.'TABLE 1';
Invalid Syntax Error
I have tried many things, but it always fails.
Of course, I have also tried a 'classic' LOAD with the CLP, and it works perfectly:
db2 'LOAD FROM "outfile" OF DEL MODIFIED BY CODEPAGE=1208 NOCHARDEL INSERT INTO "USER1"."TABLE 1"'
...
Number of rows read = 3
Number of rows skipped = 0
Number of rows loaded = 3
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 3
Any ideas?
Thanks and regards
Here's how it works on my 10.5 system:
SQL> create table "TEST TBL" (f1 int);
DB250000I: The command completed successfully.
SQL> IMPORT FROM '/tmp/dat' of del insert into "TEST TBL";
Total number of rows read : 6
Total number of rows skipped : 0
Total number of rows inserted : 6
Total number of rows updated : 0
Total number of rows rejected : 0
Total number of rows committed : 6
DB250000I: The command completed successfully.
SQL> IMPORT FROM '/tmp/dat' insert into "TEST TBL";
Invalid Syntax Error
It looks like the documentation has an error, in that it does not show the file type option (OF DEL) for the CLPPlus IMPORT command.
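So for the original table, adding the file type clause should work (an untested sketch using the poster's path and table name):
SQL> IMPORT FROM '/home/i1058/outfile' OF DEL INSERT INTO USER1."TABLE 1";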
I have a local file movies.dat formatted as movie_id:movie_title:genre. For example:
1:movie1:Comedy
2:movie2:Drama
3:movie3:Horror
...
I create an external table using the following command.
CREATE EXTERNAL TABLE movies(movie_id INT, movie_title String, genre String)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\:' -- need backslash!!
LOCATION '/exc103320/movies_copy'; -- name of the directory to copy the original file
Then, I load the data into the table with:
LOAD DATA LOCAL INPATH 'movies.dat' OVERWRITE INTO TABLE movies;
When I run SELECT * FROM movies LIMIT 3;
I see the first 3 rows.
When I run SELECT movie_id FROM movies LIMIT 3; I get the following error:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1420729875693_6595, Tracking URL = http://cshadoop1.utdallas.edu:8088/proxy/application_1420729875693_6595/
Kill Command = /usr/local/hadoop-2.4.1/bin/hadoop job -kill job_1420729875693_6595
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2015-03-29 17:14:54,820 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1420729875693_6595 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cshadoop1.utdallas.edu:8088/cluster/app/application_1420729875693_6595
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Any idea why this happens?
I believe you don't need the backslash in the "ROW FORMAT DELIMITED FIELDS TERMINATED BY" clause.
Try the DDL statement like this and see if it works:
CREATE EXTERNAL TABLE movies(movie_id INT, movie_title String, genre String)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ':'
LOCATION '/exc103320/movies_copy';
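Since the delimiter is baked into the table definition, applying the fix means dropping and re-creating the table, then reloading and re-running the failing query (a sketch, reusing the names from the question):
DROP TABLE movies;
-- re-create with the corrected DDL above, then:
LOAD DATA LOCAL INPATH 'movies.dat' OVERWRITE INTO TABLE movies;
SELECT movie_id FROM movies LIMIT 3;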
Suppose I fetch the valid rows from a table where marks_colm = '300' and get 100 rows.
For each fetched row, I'd like to:
create 3 new rows:
increase the max count of sequence_column by 1 and set marks = '350'
again increase the max count of sequence_column by 1 and set marks = '351'
again increase the max count of sequence_column by 1 and set marks = '352'
copy these three rows to an array,
and insert the whole array into the table.
Example
input row:
Name1 ... RollNo31.... sequence5 ... marks300
Output: 3 rows for each one of the input rows above:
Name1 ... RollNo31.... sequence6 ... marks350
Name1 ... RollNo31.... sequence7 ... marks351
Name1 ... RollNo31.... sequence8 ... marks352
How can I achieve this?
I believe you can achieve your goal using a multi-row insert. Note that, because multiple errors may be encountered when you insert multiple rows, you must use the GET DIAGNOSTICS statement to retrieve the details of any errors that occur; DSNTIAR will be insufficient.
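A minimal embedded-SQL sketch of the shape this takes (assuming Db2 for z/OS; student_marks and the host-variable arrays hva_name, hva_rollno, hva_seq, hva_marks are hypothetical names, with the arrays already filled with the generated rows):
EXEC SQL
  INSERT INTO student_marks (name, rollno, sequence_column, marks)
  VALUES (:hva_name, :hva_rollno, :hva_seq, :hva_marks)
  FOR :num_rows ROWS
  NOT ATOMIC CONTINUE ON SQLEXCEPTION;
EXEC SQL GET DIAGNOSTICS :num_errors = NUMBER;
-- then inspect each failure with GET DIAGNOSTICS CONDITION n ...
Alternatively, if your Db2 level supports it, a single set-based INSERT ... SELECT can generate all three rows per matching row without host arrays (again a sketch with the same hypothetical names):
INSERT INTO student_marks (name, rollno, sequence_column, marks)
SELECT s.name, s.rollno,
       (SELECT MAX(sequence_column) FROM student_marks)
         + ROW_NUMBER() OVER (ORDER BY s.rollno, n.k),
       CASE n.k WHEN 1 THEN '350' WHEN 2 THEN '351' ELSE '352' END
FROM student_marks s,
     (VALUES 1, 2, 3) AS n(k)
WHERE s.marks = '300';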