I'm completely new to Talend and currently I need to build a job that reads parameter values such as:
Database Name
Server Name
Host Name
Password
Username
from a properties file and passes those values into a tPostgresqlConnection component.
This is what I have tried so far:
tFileInputProperties -------> tContextLoad ------> tPostgresqlConnection
The problem is that the job picks up only one database connection instead of looping over all of the database connections I have defined in the properties file.
Can someone advise how I can create a job that loops the parameter values through the tPostgresqlConnection component?
I suggest you create a properties file per database connection, and in each file you define your connection parameters:
server=myServer
port=myPort
user=myUser
password=myPassword
...
(Make sure your parameters have the same keys across files.)
Then you can do something like this:
tFileList -- Iterate -- tFileInputProperties -- Main -- tContextLoad -- OnComponentOk -- tRunJob
The tContextLoad component will populate the context parameters (which need to exist in both your main job and the child job).
In the tRunJob, you encapsulate your job logic: connect to your database using the connection parameters (server, port, user, etc.) passed down from the parent job, do what you need to do, and close your connection at the end of the job.
Make sure to check the option "Transmit whole context" in your tRunJob component settings.
Of course you can do it all in a single job using OnSubjobOk links, but I find this more readable (and more easily maintainable).
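For example, in the child job the tPostgresqlConnection fields would simply reference the loaded context variables (a sketch only; the exact field labels vary slightly between Talend versions, and the context variable names are assumed to match the property keys shown above, plus a database key):

Host:      context.server
Port:      context.port
Database:  context.database
Username:  context.user
Password:  context.password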
Problem: I use a different database for every customer (each one has a separate sub-domain), extracted from the domain name. E.g. for customer1.abc.com, the database name "db_customer1" is used for PostgreSQL DB connections (e.g. at line 31 of https://github.com/pawank/zio-playground/blob/master/src/main/scala/dev/xymox/zio/playground/zhttp/ItemEndpoints.scala the new DB information is to be used).
zio-http is to be used for the implementation (e.g. https://github.com/pawank/zio-playground/blob/master/src/main/scala/dev/xymox/zio/playground/zhttp/ItemServer.scala).
How do I either pass a database name to every repository function so that it can be used by the DB library, or provide the ZLayer dynamically with the new DB name instead of the one taken from the config file? (In the example code at line 21 of https://github.com/pawank/zio-playground/blob/master/src/main/scala/dev/xymox/zio/playground/zhttp/ItemServer.scala the data source ZioQuillContext.dataSourceLayer is set statically.)
Any suggestions on how to provide the layer dependency within the HTTP methods?
Steps to run minimal reproducible example:
$ sbt
$ run
(Choose option 40, for example)
Repository: git@github.com:pawank/zio-playground.git
I am quite new to JMeter, so I am looking for the best approach to do this: I want to get a list of messageIDs from Database1, then check whether these messageID values are found in Database2, and then check the ErrorMessage column for these IDs against what I expect.
I have the JDBC Request working for extracting the list of messageIDs from Database1. JMeter returns the list to me, but now I'm stuck: I am not sure how to handle the Variable Names and Result Variable Names fields in the JDBC Request, and how to use them in the next Throughput Controller loop for the JDBC Request against Database2.
My JDBC request looks like this (PostgreSQL):
SELECT messageID FROM database1
ORDER BY created DESC
FETCH FIRST 20 ROWS ONLY
Variable names: messageid
Result variable names: resultDB1
Then I use the BeanShell Assertion to see whether the connection to the database is present, or whether the response is empty.
But now I have to connect to a different database, so I need to make a new Throughput Controller with a new JDBC configuration, request, etc. in there, but I don't know how to pass the messageid list on to this new request.
What I thought about was writing the list of results from Database1 to a file and then reading the values from that file for Database2, but that seems unnecessarily complicated; it feels like JMeter should already have a solution for this. Also, I am running my JMeter tests on a remote Linux server, so I don't want to complicate things further by creating new files and saving them somewhere.
You can convert your resultDB1 into a JMeter Property like:
props.put("resultDB1", vars.getObject("resultDB1"));
As per JMeter Documentation:
Properties are not the same as variables. Variables are local to a thread; properties are common to all threads
So basically JMeter Properties are a subset of Java Properties, which are global for the whole JVM.
Once done you will be able to access the value in other Thread Groups like:
ArrayList resultDB1 = (ArrayList)props.get("resultDB1");
ArrayList resultDB2 = (ArrayList)vars.getObject("resultDB2");
//your code to compare 2 result sets here
Also be aware that since JMeter 3.1 you should be using JSR223 test elements and the Groovy language for scripting, so consider migrating to a JSR223 Assertion at the next available opportunity.
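As a minimal sketch, a JSR223 Assertion (Groovy) in the second Thread Group could then compare the two result sets like this (assuming the Result Variable Names are resultDB1 and resultDB2, that resultDB1 was stored as a property as shown above, and that the column key returned by the PostgreSQL driver is messageid):

// Database1 results, saved as a property in the first Thread Group
def resultDB1 = (List) props.get("resultDB1")
// Database2 results, produced by the JDBC Request in this Thread Group
def resultDB2 = (List) vars.getObject("resultDB2")

// each entry is a Map of column name -> value
def idsFromDB1 = resultDB1.collect { it.get("messageid") as String }
def idsFromDB2 = resultDB2.collect { it.get("messageid") as String }

// fail the assertion if any messageID from Database1 is missing in Database2
def missing = idsFromDB1.findAll { !idsFromDB2.contains(it) }
if (!missing.isEmpty()) {
    AssertionResult.setFailure(true)
    AssertionResult.setFailureMessage("messageIDs missing in Database2: " + missing)
}
// the ErrorMessage column could be checked the same way, e.g. by reading
// it.get("errormessage") from the matching resultDB2 entries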
We are deploying Tableau for a bank.
We had created 6 test dashboards using dummy data on a staging database over a SQL connection, let's say with IP 10.10.10.10.
Now we need to use the same views we built on the dummy data against live data, but using a different connection, again a SQL engine, with IP let's say 20.20.20.20. All the variable names and other properties are the same; the only difference is that the live data will not have the calculated fields, which we can deploy in the live environment.
The challenge: the LIVE data, being a bank's, is highly confidential and cannot be used from outside the operations site; rather, we need to deploy from an ODC [a restricted environment]. Hence we simply cannot do a replace data source.
So we are planning to move the twbx files and data extracts for each of these views to the ODC via a shared folder. The process would then be as follows:
As the LIVE SQL database is different from the dummy SQL database, we will get an error
We will select Edit Data Connection
We will select the Tableau data extract for each sheet and dashboard
We will then select the Replace Data Source option and select the LIVE SQL database
We will extract the new data
The visualization should then work fine
Earlier we had just moved the TWBX files, hence it failed. Is there a different approach to this?
I did something similar.
For that, you must:
have the same schema in the Live database as in the dummy database
not change the name of any source table or column
create your viz
send it as a .twb file, which is an editable XML format
Now the hard part: open the .twb in Notepad and replace all the connection details with the new ones (see the sketch after this list)
save it and open it in Tableau
Tell me if it didn't work.
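For reference, the connection details inside the .twb look roughly like this (a sketch only; the exact element and attributes depend on your Tableau version and connector, and the dbname/username values here are placeholders):

<datasource ...>
  <connection class='sqlserver' server='10.10.10.10' dbname='DummyDB' username='report_user' ... />
</datasource>

Replace server (and, if needed, dbname and username) with the live values, e.g. server='20.20.20.20', then save and reopen the workbook in Tableau.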
One method would be to modify the hosts file on your local computer, pointing the production server name to the staging instance of the database. For example, let's say your production database is prod.url.com and you have a reporting staging DB server instance called reportstage.otherurl.com.
Open your hosts file and add an entry for prod.url.com, pointing it at reportstage.otherurl.com (see the example entry after this list).
Develop the report in Desktop, with the DB connection string pointing to prod.url.com.
When you publish the twb file to Server, no connection string changes are needed.
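A sketch of that hosts entry (the IP address here is hypothetical; a hosts file maps a name to an IP, so use the address that reportstage.otherurl.com actually resolves to):

# /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows)
192.0.2.45    prod.url.com    # actually the reportstage.otherurl.com instance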
Another, easier way is to publish the twb to Server with your staging connection string and then edit the connection string in the data source on Server:
Develop the twb file on your local computer against your staging database.
Publish the twb file to Server.
Go to the workbook on Server and instead of looking at the views, click on Data Sources.
Edit the data source(s) connection information. This allows you to edit the server name, port, username, or password.
I've used this second method quite a bit. We have an environment where we can't hit the production DB from outside the data center; our staging environment doesn't have that restriction. We develop against the stage DB, deploy, and then edit the server name in the data source.
I have a job that inserts data from SQL Server into MySQL. I have set the project settings as follows:
I have checked the check boxes for Use statistics (tStatCatcher), Use logs (tLogCatcher), and Use volumetrics (tFlowMeterCatcher).
I have selected 'On Databases' and put in the table names (stats_table, logs_table, flowmeter_table) as well. These tables were created beforehand; their schemas were determined using the tCreateTable component.
The problem is that when I run the job, data is inserted into stats_table but not into flowmeter_table.
My job is as follows:
tMSSqlInput --> tMap --> tMysqlOutput
I have not included tStatCatcher, tLogCatcher or tFlowMeterCatcher; the stats and logs for this job are taken from the project settings.
My question: why is no data entered in flowmeter_table? Should I include tStatCatcher, tLogCatcher and tFlowMeterCatcher explicitly in the job for it to work correctly?
I am using TOS
Thanks in advance
Rathi
Using the flow meter requires you to manually configure the flows you want to monitor.
On every flow you want to monitor, right-click the row > Parameters > Advanced settings > Monitor this connection.
Then you should be able to see data in your flowmeter table.
If you are using the project settings, you don't need to add the *Catcher components to your job.
You need to use the tStatCatcher, tLogCatcher and tFlowMeterCatcher components in the job directly.
The components already have their schemas defined, so you just need to add a tMap and redirect the data into the table you want.
Moreover, in order to use the tLogCatcher you need to put some tDie or tWarn components in your job.
I've just started using the luigi library. I am regularly scraping a website and inserting any new records into a Postgres database. As I'm trying to rewrite parts of my scripts to use luigi, it's not clear to me how the "marker table" is supposed to be used.
Workflow:
Scrape data
Query DB to check if new data differs from old data.
If so, store the new data in the same table.
However, using luigi's postgres.CopyToTable, if the table already exists, no new data will be inserted. I guess I should be using the inserted column in the table_updates table to figure out what new data should be inserted, but it's unclear to me what that process looks like and I can't find any clear examples online.
You don't have to worry about the marker table much: it's an internal table luigi uses to track which tasks have already been successfully executed. To do so, luigi uses the update_id property of your task. If you didn't declare one, luigi will use the task_id, as shown here. That task_id is a concatenation of the task family name and the first three parameters of your task.
The key here is to override the update_id property of your task and return a custom string that you know will be unique for each run of your task. Usually you should use the significant parameters of your task, something like:
@property
def update_id(self):
    return ":".join([self.param1, self.param2, self.param3])
By significant I mean parameters that change the output of your task. I imagine parameters like the website URL or ID, and the scraping date. Parameters like the hostname, port, username or password of your database will be the same for any of these tasks, so they shouldn't be considered significant.
Note that without details about your tables and the data you're trying to save, it's pretty hard to say exactly how you should build that update_id string, so please be careful.
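Putting it together, a minimal sketch of such a task (the class, parameter, table and column names are invented for illustration, and the import path assumes a luigi version where the class lives in luigi.contrib.postgres):

import luigi
from luigi.contrib import postgres

class ScrapedRecordsToPostgres(postgres.CopyToTable):
    # significant parameters: they change what data a given run produces
    site_url = luigi.Parameter()
    scrape_date = luigi.DateParameter()

    # connection settings: the same for every run, so not part of update_id
    host = "localhost"
    database = "scraping"
    user = "scraper"
    password = "secret"
    table = "records"

    columns = [("scraped_at", "TIMESTAMP"),
               ("url", "TEXT"),
               ("payload", "TEXT")]

    @property
    def update_id(self):
        # unique per (site, date): a new date means a new row in the marker
        # table, so the data is copied even though the target table exists
        return ":".join([self.site_url, str(self.scrape_date)])

    def rows(self):
        # yield one tuple per scraped record, matching `columns`
        yield (str(self.scrape_date), self.site_url, "example payload")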