I have a new problem with CLPPlus and the IMPORT command.
I am trying to import data into a table whose name contains a space, but it fails:
SQL> IMPORT FROM '/home/i1058/outfile' INSERT INTO USER1."TABLE 1";
Invalid Syntax Error
SQL> IMPORT FROM '/home/i1058/outfile' INSERT INTO USER1.'TABLE 1';
Invalid Syntax Error
I have tried many things, but it always fails.
Of course, I have also tried a 'classic' LOAD with the CLP, and it works perfectly:
db2 'LOAD FROM "outfile" OF DEL MODIFIED BY CODEPAGE=1208 NOCHARDEL INSERT INTO "USER1"."TABLE 1"'
...
Number of rows read = 3
Number of rows skipped = 0
Number of rows loaded = 3
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 3
Any ideas?
Thanks and regards
Here's how it works on my 10.5 system:
SQL> create table "TEST TBL" (f1 int);
DB250000I: The command completed successfully.
SQL> IMPORT FROM '/tmp/dat' of del insert into "TEST TBL";
Total number of rows read : 6
Total number of rows skipped : 0
Total number of rows inserted : 6
Total number of rows updated : 0
Total number of rows rejected : 0
Total number of rows committed : 6
DB250000I: The command completed successfully.
SQL> IMPORT FROM '/tmp/dat' insert into "TEST TBL";
Invalid Syntax Error
It looks like the documentation has an error: it does not show the file type option for the CLPPlus IMPORT command.
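Applying that to the original statement, adding the file type option should let it go through in CLPPlus as well; a sketch based on the working example above (not run against the original system):
SQL> IMPORT FROM '/home/i1058/outfile' OF DEL INSERT INTO USER1."TABLE 1";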
I have thousands of lines of duplicate data in a PostgreSQL database. To find out which rows are duplicated, I am using this code:
SELECT "Date" FROM stockdata
group by "Date"
having count("Date")>1
This again produced thousands of dates that have more than one entry. How can I remove the duplicate rows so that just one entry of each duplicated item remains?
P.S. I cannot use a primary key when entering data.
Update
As per the comments: there is no primary key. Also, the Date should be unique, so there should not be two or more rows with the same date.
The df looks like this:
Date High Low Open Close Volume Adj Close
0 2017-04-03 893.489990 885.419983 888.000000 891.510010 3422300 891.510010
1 2017-04-04 908.539978 890.280029 891.500000 906.830017 4984700 906.830017
2 2017-04-05 923.719971 905.619995 910.820007 909.280029 7508400 909.280029
3 2017-04-06 917.190002 894.489990 913.799988 898.280029 6344100 898.280029
4 2017-04-07 900.090027 889.309998 899.650024 894.880005 3710900 894.880005
... ... ... ... ... ... ... ...
12595 2022-03-28 1097.880005 1053.599976 1065.099976 1091.839966 34168700 1091.839966
12596 2022-03-29 1114.770020 1073.109985 1107.989990 1099.569946 24538300 1099.569946
12597 2022-03-30 1113.949951 1084.000000 1091.170044 1093.989990 19955000 1093.989990
12598 2022-03-31 1103.140015 1076.640015 1094.569946 1077.599976 16265600 1077.599976
12599 2022-04-01 1094.750000 1066.640015 1081.150024 1076.352783 11449987 1076.352783
12600 rows × 7 columns
The data is repeated a few times in places. However, rows with the same date have the same data.
This is not actually stock data (I am using it as a troubleshooting example); it comes from a Yokogawa data logger: https://www.yokogawa.com/in/solutions/products-platforms/data-acquisition/data-logger/#Overview
There are redundancies in the system, and the earlier integrator simply dumped all the data into one database, so whenever a redundant logger comes online the database gets multiple entries. I need to remove them so we can actually use the data. I don't have access to their software.
Further Update:
Using this code as suggested in the comments:
delete from stockdata s
using
(SELECT "Date" , max(ctid) as max_ctid from stockdata group by "Date") t
where s.ctid<>t.max_ctid
and s."Date"=t."Date";
It was able to do the job, but going forward, is this a dangerous solution for production?
This should do the trick:
DELETE FROM
    stockdata a
        USING stockdata b
WHERE
    a.ctid < b.ctid            -- the table has no id column, so compare the ctid system column instead
    AND a."Date" = b."Date";   -- "Date" was created quoted, so it must be quoted here as well
But be careful, this will immediately delete all duplicates. There is no way to restore them.
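To address the "is this dangerous for production?" concern above, one option is to run the ctid-based delete inside an explicit transaction and sanity-check the row counts before committing; a minimal sketch against the stockdata table:
BEGIN;
-- before: how many rows vs. how many distinct dates (the gap is the number of duplicates)
SELECT count(*) AS total_rows, count(DISTINCT "Date") AS distinct_dates FROM stockdata;
DELETE FROM stockdata s
USING (SELECT "Date", max(ctid) AS max_ctid FROM stockdata GROUP BY "Date") t
WHERE s.ctid <> t.max_ctid
  AND s."Date" = t."Date";
-- after: total_rows should now equal distinct_dates
SELECT count(*) AS total_rows, count(DISTINCT "Date") AS distinct_dates FROM stockdata;
COMMIT;  -- or ROLLBACK; if the numbers look wrong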
I am trying to get a result in my report which, I believe, requires a WHERE clause, and it did not work for me with the Select Expert section.
I have 2 tables. Let's call them table 1 and table 2.
Table 1 contains unique records.
Table 2 contains multiple records for the same uniqueKey as table 1.
There are 3 fields in table 2 that play a role for each uniqueKey from table 1:
QTY_ORD
QTY_SHIPPED
ITEM_CANCEL
Let's assume that for item #1 from table 1, there are 5 records in table 2. Each record has a value for the 3 above-mentioned fields. I need to display the SUM of QTY_SHIPPED - QTY_ORD over all the records where ITEM_CANCEL = 0.
It could be that 3 of the records have ITEM_CANCEL = 1 (we can ignore these records), but for the other 2 records where ITEM_CANCEL = 0, I need the SUM of QTY_SHIPPED - the SUM of QTY_ORD.
The current code I have is as follows:
if {current_order1.ITEM_CANCEL} = 0 then
sum({current_order1.QTY_ORD})-sum({current_order1.QTY_SHIPPED}) else
0
But this gives me the sum of ALL the records, including the ones where ITEM_CANCEL = 1.
If I use ITEM_CANCEL = 0 in the Select Expert, it removes ALL the results that have no value in table 2. I even tried the code without the SUM function, but that returned only 1 of the records in table 2 where ITEM_CANCEL = 0, not the total difference of the 2 records in table 2 that I require.
Any suggestions on this?
Start with a detail-level formula (no SUM):
if {current_order1.ITEM_CANCEL} = 0 then {current_order1.QTY_ORD} - {current_order1.QTY_SHIPPED} ELSE 0
Then, SUM that formula at whatever Group or Report levels you require.
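For reference, the detail-then-sum logic above is equivalent to the following conditional aggregation in plain SQL (a sketch only; table1, table2 and the uniqueKey join column stand in for the question's real table and key names):
SELECT t1.uniqueKey,
       SUM(CASE WHEN t2.ITEM_CANCEL = 0
                THEN t2.QTY_ORD - t2.QTY_SHIPPED
                ELSE 0
           END) AS qty_difference
FROM table1 t1
JOIN table2 t2
  ON t2.uniqueKey = t1.uniqueKey
GROUP BY t1.uniqueKey;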
I'm trying to insert data with HiveContext like this:
/* table filedata
CREATE TABLE `filedata`(
`host_id` string,
`reportbatch` string,
`url` string,
`datatype` string,
`data` string,
`created_at` string,
`if_del` boolean)
*/
hiveContext.sql("insert into filedata (host_id, data) values (\"a1e1\", \"welcome\")")
That throws an error, so I then tried using a "select" instead:
hiveContext.sql("select \"a1e1\" as host_id, \"welcome\" as data").write.mode("append").saveAsTable("filedata")
/*
stack trace
java.lang.ArrayIndexOutOfBoundsException: 2
*/
It only works if I provide all the columns, like this:
hc.sql("select \"a1e1\" as host_id,
\"xx\" as reportbatch,
\"xx\" as url,
\"xx\" as datatype,
\"welcome\" as data,
\"2017\" as created_at,
1 as if_del").write.mode("append").saveAsTable("filedata")
Is there a way to insert specified columns? For example, only insert columns "host_id" and "data".
As far as I know, Hive does not support inserting values into only some columns.
From the documentation:
Each row listed in the VALUES clause is inserted into table tablename.
Values must be provided for every column in the table. The standard
SQL syntax that allows the user to insert values into only some
columns is not yet supported. To mimic the standard SQL, nulls can be
provided for columns the user does not wish to assign a value to.
So you should try this:
val data = sqlc.sql("select 'a1e1', null, null, null, 'welcome', null, null") // 7 positional values, one per column of filedata
data.write.mode("append").insertInto("filedata")
Reference here
You can do it if you are using a row-columnar file format such as ORC. Please see the working example below. The example uses the Hive shell, but it will work just as well with HiveContext.
hive> use default;
OK
Time taken: 1.735 seconds
hive> create table test_insert (a string, b string, c string, d int) stored as orc;
OK
Time taken: 0.132 seconds
hive> insert into test_insert (a,c) values('x','y');
Query ID = user_20171219190337_b293c372-5225-4084-94a1-dec1df9e930d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1507021764560_1375895)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.06 s
--------------------------------------------------------------------------------
Loading data to table default.test_insert
Table default.test_insert stats: [numFiles=1, numRows=1, totalSize=417, rawDataSize=254]
OK
Time taken: 6.828 seconds
hive> select * from test_insert;
OK
x NULL y NULL
Time taken: 0.142 seconds, Fetched: 1 row(s)
hive>
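Following the same pattern for the original filedata table, a rough sketch would be (assumption: the table is recreated with STORED AS ORC, since the DDL in the question does not specify a storage format):
-- assumption: filedata is (re)created as ORC so a partial-column INSERT is accepted
CREATE TABLE filedata (
  host_id     string,
  reportbatch string,
  url         string,
  datatype    string,
  data        string,
  created_at  string,
  if_del      boolean)
STORED AS ORC;
-- insert only host_id and data; the remaining columns default to NULL
INSERT INTO filedata (host_id, data) VALUES ('a1e1', 'welcome');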
I have several CSV files with the same columns; however, the columns are in a different order in each file.
I would like to merge all these CSV files via "import".
Please could you help with this import statement? How can I make this import statement match the column order?
With Db2 on Linux/Unix/Windows, you can use the IMPORT command or the LOAD command. Other approaches are possible with the INGEST command.
With IMPORT or LOAD, there are two ways to do it: either use "METHOD P", or specify the order of the target columns on the INSERT clause. There are two examples below.
The first example uses "METHOD P" for IMPORT:
there are three CSV files whose three columns are in a different order, and a target table with three columns (a, b, c):
create table mytab(a integer not null, b integer not null, c integer not null)
DB20000I The SQL command completed successfully.
!cat 1a.csv
1,2,3
!cat 1b.csv
99,98,97
!cat 1c.csv
55,51,59
import from 1a.csv of del method p(1,2,3) insert into mytab
SQL3109N The utility is beginning to load data from file "1a.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1b.csv of del method p(3,2,1) insert into mytab
SQL3109N The utility is beginning to load data from file "1b.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1c.csv of del method p(2,1,3) insert into mytab
SQL3109N The utility is beginning to load data from file "1c.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
select * from mytab
A B C
----------- ----------- -----------
1 2 3
97 98 99
51 55 59
3 record(s) selected.
The second example specifies the target columns on the INSERT clause so that they match the column order in each CSV file.
create table mynewtab(a integer not null, b integer not null, c integer not null)
DB20000I The SQL command completed successfully.
!cat 1a.csv
1,2,3
!cat 1b.csv
99,98,97
!cat 1c.csv
55,51,59
import from 1a.csv of del insert into mynewtab(a,b,c)
SQL3109N The utility is beginning to load data from file "1a.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1b.csv of del insert into mynewtab(c,b,a)
SQL3109N The utility is beginning to load data from file "1b.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
import from 1c.csv of del insert into mynewtab(b,a,c)
SQL3109N The utility is beginning to load data from file "1c.csv".
SQL3110N The utility has completed processing. "1" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "1".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "1" rows were processed from the input file. "1" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 1
Number of rows skipped = 0
Number of rows inserted = 1
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 1
select * from mynewtab
A B C
----------- ----------- -----------
1 2 3
97 98 99
51 55 59
3 record(s) selected.
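For completeness, the INGEST command mentioned at the start can also map file fields to table columns explicitly. A rough, untested sketch for the 1b.csv layout (values arriving in c,b,a order); the field names $c1..$c3 are placeholders:
INGEST FROM FILE 1b.csv
  FORMAT DELIMITED
  ( $c1 INTEGER EXTERNAL,
    $c2 INTEGER EXTERNAL,
    $c3 INTEGER EXTERNAL )
  INSERT INTO mytab(c, b, a) VALUES($c1, $c2, $c3);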
I have a local file movies.dat formatted as movie_id:movie_title:genre. For example:
1:movie1:Comedy
2:movie2:Drama
3:movie3:Horror
...
I create an external table using the following command.
CREATE EXTERNAL TABLE movies(movie_id INT, movie_title String, genre String)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\:' -- need backslash!!
LOCATION '/exc103320/movies_copy'; -- name of the directory to copy the original file
Then, I load the data to the table by
LOAD DATA LOCAL INPATH 'movies.dat' OVERWRITE INTO TABLE movies;
When I run SELECT * FROM movies LIMIT 3;
I see the first 3 rows.
When I run SELECT movie_id FROM movies LIMIT 3; I get the following error
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1420729875693_6595, Tracking URL = http://cshadoop1.utdallas.edu:8088/proxy/application_1420729875693_6595/
Kill Command = /usr/local/hadoop-2.4.1/bin/hadoop job -kill job_1420729875693_6595
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2015-03-29 17:14:54,820 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1420729875693_6595 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cshadoop1.utdallas.edu:8088/cluster/app/application_1420729875693_6595
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Any idea why this happens?
I believe you don't need the backslash in the "ROW FORMAT DELIMITED FIELDS TERMINATED BY" clause.
Try the DDL statement like this and see if it works:
CREATE EXTERNAL TABLE movies(movie_id INT, movie_title String, genre String)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ':'
LOCATION '/exc103320/movies_copy';
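After recreating the table, the load and the previously failing query from the question can be re-run to verify (same commands and paths as in the question):
LOAD DATA LOCAL INPATH 'movies.dat' OVERWRITE INTO TABLE movies;
-- this query already worked and should still return rows
SELECT * FROM movies LIMIT 3;
-- the projection that previously failed, to re-test
SELECT movie_id FROM movies LIMIT 3;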