Sort and Join of GDG file and DB2 table - JCL

I have a GDG file and a DB2 table.
I want to compare the value in columns 45-52 of the GDG file with one of the columns in the DB2 table.
Is it possible to compare and join the file and the table in JCL?
If so, please give me some sample code.

Regarding "in JCL" please read this.
In answer to your question, yes it can be done. You must unload the DB2 data into a flat file using whatever utilities are available at your shop, SyncSort, DSNTEP4, etc. Then you can join that flat file with your GDG using your SORT utility (SyncSort JOINKEYS, DFSORT JOINKEYS).
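A minimal DFSORT sketch of that join (SyncSort's JOINKEYS syntax is essentially the same), assuming fixed-length 80-byte records in both files, that the unloaded DB2 key lands in columns 1-8 of the flat file, and placeholder dataset names throughout:

//JOINSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=MY.GDG.FILE(0),DISP=SHR
//SORTJNF2 DD DSN=MY.DB2.UNLOAD,DISP=SHR
//SORTOUT  DD DSN=MY.MATCHED.FILE,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5)),RECFM=FB,LRECL=88
//SYSIN    DD *
* MATCH COLUMNS 45-52 OF THE GDG AGAINST COLUMNS 1-8 OF THE UNLOAD
  JOINKEYS FILE=F1,FIELDS=(45,8,A)
  JOINKEYS FILE=F2,FIELDS=(1,8,A)
* KEEP THE FULL GDG RECORD PLUS BYTES 9-16 OF THE UNLOAD RECORD
  REFORMAT FIELDS=(F1:1,80,F2:9,8)
  SORT FIELDS=COPY
/*

By default, JOINKEYS writes only the paired (matched) records; add a JOIN UNPAIRED,F1 statement if you also want the GDG records that have no match.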

Related

Talend shuffles the order of the columns

I was trying to merge all the rows of a file into columns based on a certain sequence number. This has been achieved with tPivotToColumnsDelimited (this has to be done; it cannot be changed).
But after using that component, the column ordering changed.
Is there any way of reading a file according to one schema and writing it according to another schema in Talend? (Basically, shuffling the column ordering in a file.)
I tried setting a dynamic schema on the input and the output, but was not able to read and write the data properly.
Any help would be highly appreciated.
I solved the issue.
I simply added a column holding the index number read from the file; before using tPivotToColumnsDelimited, I used that column to sort the results dynamically and wrote them to a temporary file. With the help of tPivotToColumnsDelimited, the output now follows the input schema.

Dump a subset of records from an OpenEdge database table in the ".d" file format

I am looking for the easiest way to manually dump a subset of the records in an OpenEdge database table in the Progress ".d" file format.
The best way I can imagine is creating an extra test database with a schema identical to the source database, copying the subset of records over to the test database using FOR EACH and BUFFER-COPY statements, and then exporting the data from the test database using the Dump Data and Definitions > Table Contents (.d file)... menu option.
That seems like a lot of trouble. If you can identify the subset of records well enough to do the BUFFER-COPY, then you should also be able to:
OUTPUT TO VALUE( "table.d" ).
FOR EACH table NO-LOCK WHERE someCondition:
    EXPORT table.
END.
OUTPUT CLOSE.
This is, essentially, what the dictionary "dump data" .d file contains, minus a few lines of administrivia at the bottom that can be safely omitted for most purposes.
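And if you later need to load that subset back in (into the test database mentioned in the question, say), a minimal sketch of the mirror image:

INPUT FROM VALUE( "table.d" ).
REPEAT:
    CREATE table.
    IMPORT table.   /* EOF raises ENDKEY, which undoes the last (empty) CREATE and leaves the REPEAT */
END.
INPUT CLOSE.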

Redshift - Adding a column, do we have to change our previous CSVs to include it?

I currently have a Redshift table in our database that has 10 columns, and I want to add another. It's trivial to do this with an ALTER TABLE.
My question: when I do this, will all my old CSV files fail to insert into Redshift (via COPY from S3), given that they won't have this new column?
I was hoping the missing columns would just be loaded as NULL rather than the import failing, but I haven't seen any documentation on this.
Ideally, I wish I could specify the actual column names in the header row of the CSV, but I haven't seen anywhere whether that is possible.
FILLRECORD in the COPY command does exactly that: 'Allows data files to be loaded when contiguous columns are missing at the end of some of the records.'
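A minimal sketch of such a load, with placeholder table, bucket, and IAM role values:

COPY my_table
FROM 's3://my-bucket/path/old_file.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
IGNOREHEADER 1  -- skip the header row; COPY does not map columns by header names
FILLRECORD;     -- missing trailing columns are loaded as NULLs

Note that COPY also accepts an explicit column list (COPY my_table (col1, col2, ...) FROM ...), which is the closest you can get to naming the columns, but the list lives in the command rather than being read from the CSV header.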

How to select distinct values in a column in Talend

I am importing an Excel file in Talend.
I want to select all the distinct values in column "A" and then dump that data into the database. Is it possible to do that with Talend?
If not, what alternatives are available? Any help is appreciated.
Yes, you can do that easily with Talend Open Studio.
Create a new job that chains the Excel input into a tAggregateRow component and then into the database output. You can replace the tOracleOutput component with the component corresponding to your database.
Then parameterize the tAggregateRow component to group by ColumnA: the distinct values of ColumnA will be transferred to distinctColumnA in the output schema.
You can also get the number of occurrences by adding a count of ColumnB in the operations table.
Alternatively, using tUniqRow in Talend Open Studio 6.3 works very well, and you get to keep all your columns.

Tableau Extract API with multiple tables in a database

I am currently experimenting with the Tableau Extract API to generate TDEs from the tables I have in a PostgreSQL database. I was able to write code to generate a TDE from a single table, but I would like to do this for multiple joined tables. To be more specific, if I have two tables that are inner joined on some field, how would I generate the TDE for this?
I can see that if I am working with a small number of tables, I could use a SQL query with JOIN clauses to create one gigantic table, and generate the TDE from that table:
SELECT * INTO new_table_1
FROM table_1 INNER JOIN table_2
  ON table_1.id_1 = table_2.id_2;

SELECT * INTO new_table_2
FROM new_table_1 INNER JOIN table_3
  ON new_table_1.id_1 = table_3.id_3;
and then generate the TDE from new_table_2.
However, I have some tables that have over 40 different fields, so this could get messy.
Is this even a possibility with the current version of the API?
You can read from as many tables or other sources as you want. Or use a complex query with lots of joins, or create a view and read from that. Usually, creating a view is helpful when you have a complex query joining many tables.
The data extract API is totally agnostic about how or where you get the data to feed it -- the whole point is to allow you to grab data from unusual sources that don't have pre-built drivers for Tableau.
Since Tableau has a Postgres driver and can read from it directly, you don't need to write a program with the data extract API at all. You can define your extract with Tableau Desktop. If you need to schedule automated refreshes of the extract, you can use Tableau Server or its tabcmd command.
Many thanks for your replies. I am aware that I could use Tableau Desktop to define my extract; in fact, I have done this many times before. I am trying to create the extracts using the API because I need to create some calculated fields, which is near impossible in Tableau Desktop.
At this point, I am hesitant to use JOINs in the SQL query because the resulting table would be too complicated to comprehend (some of these tables also share field names).
When you say that I could read from multiple tables or sources, do you mean with the Tableau Extract API? I cannot find anything in the API that accommodates multiple sources. For example, I know that when I use multiple tables in Tableau Desktop, there are icons on the left-hand side that tell me the extract is composed of multiple tables. This just doesn't seem to be possible with the API, which leaves me stranded. Anyway, thank you again for your replies.
Going back to the topic, this is something that I tried a few days ago in my Python code:
import os
import dataextract as tde  # Tableau Data Extract API module

# Recreate the extract file from scratch if it already exists
try:
    tdefile = tde.Extract("extract.tde")
except:
    os.remove("extract.tde")
    tdefile = tde.Extract("extract.tde")

tableDef = tde.TableDefinition()
# Read each column in the table and set the column data types using tableDef.addColumn
# Some code goes here...

for eachTable in tableNames:
    tableAdd = tdefile.addTable(eachTable, tableDef)
    # Use a SQL query to retrieve bunch_of_rows from eachTable
    for some_row in bunch_of_rows:
        # Read each row in the table, and set the values in each column position of the row
        # Some code goes here...
        tableAdd.insert(some_row)
        some_row.close()

tdefile.close()
When I execute this code, I get an error saying that eachTable has to be called "Extract".
Of course, this code has its flaws, as nowhere in it is it specified how the tables are joined.
So I am a little thrown off here, because it doesn't seem like I can use multiple tables unless I use JOINs to generate one table that contains everything.
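For what it's worth, that error message is consistent with a .tde file holding exactly one table, which the Extract API requires to be named "Extract", so the addTable call would have to look like:

tableAdd = tdefile.addTable("Extract", tableDef)  # the single table in a .tde must be named "Extract"

which points back to the same conclusion: with this API, any joining has to happen before the rows are inserted, for example in the SQL query or a view.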