Mainframe COBOL DB2 delete program

The detailed question is in my follow-up post -- please see the link below:
DB2/Cursor program working in COBOL

This should be a two-step process: first identify the records to be deleted, then delete those records in the second step. I don't think you'll be able to delete records from the same table you are fetching from; that's why it takes two steps.
Step 1:
-- DB2's CONCAT takes only two operands, so the || operator is used here;
-- CHAR() casts the timestamp so it can be concatenated with character columns
SELECT A.ClientId || A.PhoneNumber || CHAR(A.Timestamp)
FROM MyTable A
WHERE A.ClientId || A.PhoneNumber || CHAR(A.Timestamp)
  NOT IN (SELECT B.ClientId || B.PhoneNumber || CHAR(MIN(B.Timestamp))
          FROM MyTable B
          GROUP BY B.ClientId, B.PhoneNumber);
Step 2:
DELETE FROM MyTable A
WHERE A.ClientId || A.PhoneNumber || CHAR(A.Timestamp) IN (all the values you got from Step 1);
You can run Step 1, write all the values to a dataset, and use that dataset in Step 2. If you are running the SQL dynamically, a cut-and-paste will do; otherwise you'll have to reshape the dataset with SORT to build the DELETE statement out of it. That becomes another step between Steps 1 and 2.
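For illustration, a filled-in Step 2 might look like the sketch below; the two key strings are placeholders standing in for whatever Step 1 actually returns, and the timestamp portion assumes DB2's CHAR(timestamp) format:
DELETE FROM MyTable A
WHERE A.ClientId || A.PhoneNumber || CHAR(A.Timestamp) IN (
    'C0001555012342015-01-22-10.00.00.000000',  -- placeholder key from Step 1
    'C0002555098762015-01-22-11.30.00.000000'   -- placeholder key from Step 1
);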


Can we write and later read the same table with the latest data in Spark Glue in a single run?

1. Read table A from SRC
2. Perform CDC & write A to TGT
3. Read table B from SRC
4. Read table A from TGT
5. B_new = join of A & B
6. Write B_new to TGT
All the above steps happen in a single run.
First I load data into target table A, and later I use that target in the join. But when the join runs in step 5, table A doesn't have the latest data.
For example, if step 2 loads a record into target A, that record is not visible when I read the target at step 4. Yet when the Glue job ends, the record is present in target table A.
So in the end all the latest data is present in A but missing from B_new, because the join didn't see it.
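For reference, the six steps correspond roughly to the following SQL-style sketch; every table and column name here is an assumption, since the actual job presumably works with Glue DynamicFrames or Spark DataFrames:
INSERT OVERWRITE TABLE tgt.A            -- steps 1-2: read A from SRC, apply CDC (elided), write to TGT
SELECT * FROM src.A;

INSERT OVERWRITE TABLE tgt.B            -- step 6: write B_new to TGT
SELECT b.*, a.latest_col                -- step 5: B_new = join of A & B
FROM src.B AS b                         -- step 3: read B from SRC
JOIN tgt.A AS a ON a.key = b.key;       -- step 4: read A from TGT (stale data observed here)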

Are INTO, FROM and JOIN the only ways to get a table?

I'm currently writing a script that takes an input file (generally .sql) and generates a list of every table used in that file. The process is simple: it opens the input file, checks each line for a substring, and if that substring exists, outputs the line to the screen.
The substrings being checked are T-SQL keywords that indicate a referenced table, such as INTO, FROM and JOIN. Not being a T-SQL wizard, those three keywords are the only ones I know of that are used to reference a table in a query.
So my question is: in T-SQL, are INTO, FROM and JOIN the only ways to get a table, or are there others?
There are many ways to get a table; here are some of them:
DELETE
FROM
INTO
JOIN
MERGE
OBJECT_ID(N'dbo.mytable', N'U'), where U is the object type for a user table (see the sketch after this list).
TABLE, e.g. ALTER TABLE, TRUNCATE TABLE, DROP TABLE
UPDATE
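For instance, a minimal sketch of the OBJECT_ID form from the list above; dbo.mytable is a placeholder name:
-- Drop the table only if it exists as a user table (object type U)
IF OBJECT_ID(N'dbo.mytable', N'U') IS NOT NULL
    DROP TABLE dbo.mytable;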
However, your script will pick up not only real tables but possibly views and CTEs/temporary tables as well. Here are two examples:
-- Example 1
SELECT *
FROM dbo.myview
-- Example 2
WITH tmptable AS
(
SELECT *
FROM mytable
)
SELECT *
FROM tmptable

Redshift COPY and auto-increment does not work

I am using the Redshift COPY command to load JSON data from S3.
The table definition is as follows:
CREATE TABLE my_raw
(
id BIGINT IDENTITY(1,1),
...
...
) diststyle even;
The COPY command I am using is as follows:
COPY my_raw FROM 's3://dev-usage/my/2015-01-22/my-usage-1421928858909-15499f6cc977435b96e610298919db26' credentials 'aws_access_key_id=XXXX;aws_secret_access_key=XXX' json 's3://bemole-usage/schemas/json_schema' ;
I am expecting that any newly inserted id will always be greater than select max(id) from my_raw. In fact, that's clearly not the case.
If I issue the above COPY command twice, the first time the ids range from 1 to N even though the file only creates 114 records (that's a known issue with Redshift when it has multiple slices). The second time the ids are also between 1 and N, but the load takes free numbers that were not used in the first copy.
See below for a demo:
usagedb=# COPY my_raw FROM 's3://bemole-usage/my/2015-01-22/my-usage-1421930213881-b8afbe07ab34401592841af5f7ddb31c' credentials 'aws_access_key_id=XXXX;aws_secret_access_key=XXXX' json 's3://bemole-usage/schemas/json_schema' COMPUPDATE OFF;
INFO: Load into table 'my_raw' completed, 114 record(s) loaded successfully.
COPY
usagedb=#
usagedb=# select max(id) from my_raw;
max
------
4556
(1 row)
usagedb=# COPY my_raw FROM 's3://bemole-usage/my/2015-01-22/my-usage-1421930213881-b8afbe07ab34401592841af5f7ddb31c' credentials 'aws_access_key_id=XXXX;aws_secret_access_key=XXXX' json 's3://bemole-usage/schemas/my_json_schema' COMPUPDATE OFF;
INFO: Load into table 'my_raw' completed, 114 record(s) loaded successfully.
COPY
usagedb=# select max(id) from my_raw;
max
------
4556
(1 row)
Thanks in advance.
The only solution I found to guarantee sequential ids based on insertion order is to maintain a pair of tables. The first one is the stage table, into which the items are inserted by the COPY command. The stage table does not have an ID column at all.
Then I have another table that is an exact replica of the stage table except that it has an additional column for the ids. A job then takes care of filling the master table from the stage table using the ROW_NUMBER() function.
In practice, this means executing the following statement after each Redshift COPY is performed:
INSERT INTO master (id, result_code, ct_timestamp, ...)
SELECT
    #{startIncrement} + ROW_NUMBER() OVER (ORDER BY ct_timestamp) AS id,
    result_code, ...
FROM stage;
The ids are then guaranteed to be sequential/consecutive in the master table (see the sketch below).
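A minimal sketch of the two-table setup described above; the table layout and column names are assumptions, and the COALESCE(MAX(id), 0) subquery stands in for the #{startIncrement} variable:
-- Stage table: loaded by COPY, deliberately has no id column
CREATE TABLE stage (
    result_code  INT,
    ct_timestamp TIMESTAMP
);

-- Master table: same columns plus the sequential id
CREATE TABLE master (
    id           BIGINT,
    result_code  INT,
    ct_timestamp TIMESTAMP
);

-- After each COPY into stage, renumber starting from the current max id
INSERT INTO master (id, result_code, ct_timestamp)
SELECT (SELECT COALESCE(MAX(id), 0) FROM master)
       + ROW_NUMBER() OVER (ORDER BY ct_timestamp),
       result_code,
       ct_timestamp
FROM stage;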
I can't reproduce your problem; however, it's worth looking at how identity columns behave in conjunction with COPY. Here is a small summary:
Be aware that you can specify the columns (and their order) for a COPY command:
COPY my_table (col1, col2, col3) FROM s3://...
So if:
the EXPLICIT_IDS flag is NOT set,
no column list is given as shown above,
and your input data does not contain values for the IDENTITY column,
then the identity values in the table will be generated automatically and monotonically, just as we all want. Both modes are sketched after the doc excerpt below.
From the doc:
If an IDENTITY column is included in the column list, then EXPLICIT_IDS must also be specified; if an IDENTITY column is omitted, then EXPLICIT_IDS cannot be specified. If no column list is specified, the command behaves as if a complete, in-order column list was specified, with IDENTITY columns omitted if EXPLICIT_IDS was also not specified.
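The two modes side by side, as a hedged sketch; the bucket paths, IAM role, and column names are placeholders:
-- Default: IDENTITY column omitted from the column list, Redshift generates ids
COPY my_raw (result_code, ct_timestamp)
FROM 's3://mybucket/mydata'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRole'
JSON 'auto';

-- Explicit: IDENTITY column included, so EXPLICIT_IDS must be specified
COPY my_raw (id, result_code, ct_timestamp)
FROM 's3://mybucket/mydata-with-ids'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRole'
JSON 'auto'
EXPLICIT_IDS;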

Merge SQL to Exclude Duplicate Records So Merge 2nd time Doesn't Fail

I have three tables and only one that I directly control and am doing a MERGE between them. See my abbreviated but working example here (sqlfiddle example).
I am doing a MERGE from Table 1 and Table 2 into Table 3. Table 1 has duplicate data, which the MERGE (erroneously) handles on the first run (insert) but fails with this message on the second run (update):
The MERGE statement attempted to UPDATE or DELETE the same row more than once.
My question is: can the MERGE be written to either use an EXCEPT, such as
SELECT AdFull FROM [dbo].[Users] WHERE AdFull IS NOT NULL
EXCEPT
SELECT AdFull FROM [dbo].[Users]
WHERE AdFull IS NOT NULL
GROUP BY AdFull
HAVING COUNT(*) = 1
or a different join that only shows users that are not duplicated? Or even a way to select one specific row from among the duplicates?
Answered Questions
The MERGE works as an insert due to the nature of SQL Fiddle. But because of Fiddle's (AFAIK) stateless nature, one never sees the error in Fiddle on a second run: a merge never happens against existing data, only inserts.
Ignore rows: actually, I would eventually like to pick one individual row out of a duplicate set by divining it from a condition. The actual data table I am dealing with, away from the Fiddle example, has more columns, and it would be nice to select a specific row in a duplicate set based on a specific condition.
The example doesn't bear it out, but yes, the duplicates are due to the computed AdFull column. Think of a system adding a temp employee: that user gets a row. Then the temp employee gets hired on as full-time, keeps the AD account, but gets another row in the user table. Yes, I know it shouldn't happen, but that is how a duplicate comes about.
(Duplicate values in Table 3) Table 3 is a result table that can be cleaned of any duplicates to start this process afresh.
In your MERGE statement, can you do something similar to this?
MERGE INTO [dbo].Table3 AS T3
USING
(
    SELECT
        AdFull,
        MAX(StartedOn) AS StartedOn  -- alias needed so the column is usable below
    FROM [dbo].Table2 AS [ad]
    GROUP BY AdFull
) AS T2
ON (T2.AdFull = T3.AdFull)
WHEN MATCHED THEN UPDATE blah
WHEN NOT MATCHED THEN INSERT blah
Using the MAX aggregate with a GROUP BY should give you only the most recent information, i.e. from when the temp was hired on. Then if the AdFull matches, you can simply UPDATE Table3 with the most recent information, and if there is no match, INSERT a new row. If you need one specific row rather than an aggregate, see the sketch below.
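Since you mentioned wanting to select one specific row out of a duplicate set, a ROW_NUMBER() variant may fit better than MAX; this is a sketch under the assumption that StartedOn is the tie-breaking condition and that Table3 carries the same two columns:
MERGE INTO [dbo].Table3 AS T3
USING
(
    SELECT AdFull, StartedOn
    FROM (
        SELECT AdFull, StartedOn,
               ROW_NUMBER() OVER (PARTITION BY AdFull
                                  ORDER BY StartedOn DESC) AS rn
        FROM [dbo].Table2
    ) AS d
    WHERE d.rn = 1  -- keep exactly one row per AdFull
) AS T2
ON (T2.AdFull = T3.AdFull)
WHEN MATCHED THEN
    UPDATE SET T3.StartedOn = T2.StartedOn
WHEN NOT MATCHED THEN
    INSERT (AdFull, StartedOn) VALUES (T2.AdFull, T2.StartedOn);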
UPDATE: If I fail to mention that MERGE should be used with caution I will take flak from #AaronBertrand.

SQLite - a smart way to remove and add new objects

I have a table in my database, and I want each row to have a unique id, with the rows numbered sequentially.
For example: I have 10 rows, each with an id, starting at 0 and ending at 9. When I remove a row from the table, say row number 5, a "hole" appears. Afterwards I add more data, but the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every id so that I can access my table at arbitrary positions.
Is there a way in SQLite to do this, or do I have to manage removing and adding data manually?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
    -- Close the gap: shift every id above the deleted one down by one
    UPDATE table_name SET id = id - 1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
and you delete the row where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties: it takes longer the more rows you delete and the more rows remain in the table. On my computer, deleting 15 rows at the beginning of a 1000-row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you rethink your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in id order, just define the field with a PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using an ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
SQLite creates an index for the primary key field, so this query is fast.
I think you would be interested in reading about the LIMIT and OFFSET clauses; a small example follows.
The best source of information is the SQLite documentation.
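For instance, a minimal LIMIT/OFFSET paging sketch against the test table above; the page size of 10 is an assumption:
-- Fetch the third page of 10 rows, ordered by id (OFFSET is zero-based)
SELECT * FROM test ORDER BY id LIMIT 10 OFFSET 20;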
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing with.
If you want to reclaim deleted row ids, the VACUUM command or the auto_vacuum pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum