I am working on the migration of a db2 process, which connects to several remotes servers, exports data into our local db, and then manipulates it (insert computed data, calculated times, etc) I have created some activities in DataConnect to replicate the export data from different datasources and load to local tables. The scripts that handles the data have to be done in DB2 Warechouse on Cloud (ex dashdb)
Currently, this scripts run automatically triggered by the first task (manual) However, having the new processes separated (2 services) it does not allow me to automate it. Furthermore, we have many activities in dataconnect, then it keeps switching between dc and db2...and you have to go from one console to the other.
Does anyone know of a Bluemix service which allow schedule or trigger jobs or events from services? Is there a way to use the API and programmatically do this?
Thanks
Well,
Bluemix offers Workload Scheduler. Data Connect allows to schedule activities.
Juan,
A couple of things come to mind for automation here. Do the databases that you are speaking of have IP line of sight to the Warehouse DB ? If so remote tables may be able to help depending on the source database. Once the data is visible you should be able to write a SQL process that manages the process all from Warehouse DB.
The other possibility is external tables as long as the data is visible on the head node. There are some other choices like s3 storage. I think the concept is if that you can push your data into S3 storage you can pull it into Warehouse DB. I think you should be able to coordinate this all from the Warehouse db side as long as the data is visible through remote tables and/or external tables.
Related
We have our web services and database set up on AWS a while back and application is now in production. For some reason, we need to terminate the old AWS and move everything under a newly created AWS account. Application and all the infrastructure are pretty straightforward. It is trickier for data though. The current database is still receiving lots of data on daily basis. So it is best to migrate the data after we turn off the old application and switch on new platform.
Both source RDS and target RDS are Postgres. We have about 40GB data to transfer. There are three approaches I could think of and they all have drawbacks.
Take a snapshot of the first RDS and restore it in second one. Problem is I don't need to transfer all the data from source to destination. Probably just records after 10/01 is enough. Also snapshot works best to restore in an empty rds that is just created. For our case, the new RDS will start receiving data already after the cutoff. Only after that, the data will be transferred from old account to new account otherwise we will lose data.
Dump data from tables in old RDS and backup in new RDS. This will have the same problem as #1. Also, if I dump data to local machine and then back up from local, the network speed is bottleneck.
Export table data to csv files and import to new RDS. The advantage is this method allows pick and choose and some data cleaning as well. But it takes forever to export a big fact table to local csv file. Another problem is, for some of the tables, I have surrogate row IDs which are serial (auto-incremental). The row IDs of exported csv may conflicting with existing data in new RDS tables.
I wonder if there is a better way to do it. Maybe some ETL tool AWS has which does point to point direct transfer without involving using local computer as the middle point.
In 2022 the simplest way to achieve this task is using AWS Database Migration Services (AWS DMS).
You can create a migration task, and set the original database as the source endpoint, and the new database as a destination endpoint.
Next create a task with "Full load, ongoing replication" settings.
More details here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
I have recently moved the data of RDS from one account to other using Bucardo (https://bucardo.org/). Please refer the following blogs
https://www.compose.com/articles/using-bucardo-5-3-to-migrate-a-live-postgresql-database/
https://bucardo.org/pipermail/bucardo-general/2017-February/002875.html
Though this has not mentioned exactly about migration between two RDS account, this could help setting things. We still need some intermediate point such as EC2 instance where we need to configure this Bucardo and migrate the data between accounts. If you are looking for more information, I am happy to help.
In short, we need to take a manual snapshot of the source db and restore it in the another account (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ShareSnapshot.html) and with Bucardo set up in the EC2 instance, we can start to sync the data using triggers and that will update the data in destination db as and then the new data comes in to the source DB.
I have defined several Activities in IBM Data Connect (on Bluemix) and would like to chain them together, e.g. one for copying a Cloudant DB to dashDB, another for refining the copied data - so forth and so on.
Can this be done? If yes - how.
Data Connect doesn't currently support a way of chaining your activities together. However, you could make use of the current scheduling capabilities to arrange the activities to run in sequence. As the only current trigger mechanism we have is time, their successful operation would require you to leave enough time for each one to execute before the next activity in the chain.
I will find out for you if we have the kind of feature you're after on our roadmap.
Regards,
Wesley -
IBM Bluemix Data Connect Engineering
You can also use the Data Connect API to do the orchestration. See the documentation here https://console.ng.bluemix.net/docs/services/dataworks1/index.html
Regards,
Hernando Borda
IBM Bluemix Data Connect Product Manager
I'm currently researching on how to replicate physical files (store contents) and the meta data (database) of Alfresco. This is of course a safety measure in case of server failure or whatsoever.
Currently i am running Alfresco's Database on PostgreSQL Engine, and by far, have learned PostgreSQL's WAL and Stream replication. Of which i believe, i can use in terms on replicating Alfresco's meta data (database) real time.
The next problem i face now, is as to how i can replicate alfresco's repository/physical files (store contents) in real time ?
i am currently looking at Alfresco's Built-in Replication Job. But as far as i have read, it is "scheduled" and not in real time. And, it needs another instance of Alfresco running on the "SLAVE" Server.
So my question is:
Does Alfresco's Built-in Replication Job cover both the Physical/Repository Files (store contents) and meta data (database) contents ?
or
what is/are the viable ways of replication Alfresco's Physical/Repository Files (store contents) and meta data (database) contents in real time ?
The replication service can be used to replicate objects from one Alfresco server to another at the object level, not the file system and database level. So, of course there are files and database records that are created when an object is replicated, but the those are by-products of the object being created in the replication target.
The replication service is really used to make it easier for objects in a particular path to be read by people in another office. When they read the object they get it locally. When they click "Edit" in Share they will be redirected back to the source Alfresco server.
Long story short, the replication service is in no way, shape, or form, to be used to replicate data for backup or disaster recovery.
If you are running on EC2 or a local filer that supports it, it should be enough to take volume snapshots.
Otherwise, you could use something like rsync scheduled with cron.
But this approach sounds risky. I'm not sure how you will ensure that your database is kept in sync with your file system, which is a requirement for your Alfresco repo to remain consistent.
I'm trying to set up an architecture with 2 databases, say preview and live, that have the exact same schemas. The use case is that edits can be made to the preview database and then pushed to the live database after they are vetted and approved. The production application would read from the live database.
What would be the most appropriate way to push all data from the preview database to the live database without bringing the live database down? Ideally the copy from preview to live would be an atomic transaction.
I've worked with this type of setup in MSSQL, but I'm fairly new to Postgres. So I'm open to hearing other ways to architect this (with Schemas perhaps?).
EDIT: The main reason to use separate databases is that I may need more than 1 target database (not just a single "live" database). I also may need to switch target databases on the fly without altering the source database schema.
I think what you're looking for is a "hot standby". This would be a separate instance of Postgresql, possibly on the same server but usually not, which is a near-real-time replica of the primary server.
In broad strokes, this is done by shipping the binary transaction logs from the primary server to the backup server, and then "replaying" them there. The exact mechanism for transmitting the logs may vary depending on your requirements.
Fortunately, the docs on this are excellent:
https://www.postgresql.org/docs/9.3/static/warm-standby.html
https://www.postgresql.org/docs/9.0/static/hot-standby.html
I want to write a C++ administrative app to simplify management of DBs I am in charge of. Currently, when I want to tell if there are users connected to multiple Firebird databases operated by 2 different instances of it, I have to connect to every single DB and check. That's ok, but I don't want to register every new database that is being created when i don't look, I want some way to list databases that are currently open or otherwise in use by the server. Current 2 uses of this functionality I can think of are:
Auto-inclusion in backup procedure
Application update, which require users to log off (one-look and I would be able to tell whom to kick or at least which department to call)
Firebird does not have an API to list all available databases. Technically Firebird simply doesn't know about the existence of a database until you actually connect to it.
You might be able to find all databases that are being connected to using the Trace API or the monitoring tables, but that does not exclude the possibility that other databases exist on your system.