Our raw data is in SQL Server and it keeps growing at a high rate. We need to load it incrementally into Redshift for analytics. Can you point me to a good practice for loading the data? How feasible is it to use SSIS to load directly into Redshift (without S3)?
It's going to be impractical to load from SQL Server to Redshift without landing the data on S3. You can try loading via SSH but, of course, SSH on Windows is not well supported.
http://docs.aws.amazon.com/redshift/latest/dg/loading-data-from-remote-hosts.html
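Since landing the data on S3 is the standard path, here is a minimal, hedged sketch of the usual incremental pattern: export a batch of changed rows from SQL Server, upload it to S3, COPY it into a Redshift staging table, then merge into the target. The bucket, table, credential, and IAM role names below are placeholders, not anything from the original question.

```python
# Hedged sketch of the S3-landing pattern for incremental loads into Redshift.
# All names (bucket, tables, IAM role, credentials) are placeholders.
import boto3
import psycopg2

def load_increment(csv_path: str):
    # 1. Land the extracted batch (e.g. rows changed since the last load) on S3.
    s3 = boto3.client("s3")
    s3.upload_file(csv_path, "my-staging-bucket", "orders/increment.csv")

    # 2. COPY from S3 into a staging table, then merge into the target table.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="loader", password="***",
    )
    with conn, conn.cursor() as cur:
        # TRUNCATE commits implicitly in Redshift; acceptable for a staging table.
        cur.execute("TRUNCATE stage_orders;")
        cur.execute("""
            COPY stage_orders
            FROM 's3://my-staging-bucket/orders/increment.csv'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS CSV;
        """)
        # Classic Redshift upsert: delete matching keys, then insert the new rows.
        cur.execute("DELETE FROM orders USING stage_orders WHERE orders.id = stage_orders.id;")
        cur.execute("INSERT INTO orders SELECT * FROM stage_orders;")
    conn.close()
```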
I did a presentation a while back on what we discovered when migrating from SQL Server to Redshift. One of the things we discovered was that SSIS was not very useful for interacting with AWS services.
http://blog.joeharris76.com/2013/09/migrating-from-sql-server-to-redshift.html
Finally, you could look into some of the commercial "replication" tools that automate the process of incrementally updating Redshift from an on-premise database. I hear good things about Attunity.
http://www.attunity.com/products/attunity-cloudbeam/amazon-redshift
I have successfully used the PostgreSQL Debezium connector for Kafka Connect. It hooks directly into the relational database's write-ahead log (WAL), which vastly improves performance compared to the normal JDBC connector, which continuously polls the database with a SQL query.
Is something similar possible with Redshift as the source, instead of PostgreSQL? I know there are major differences between Redshift and PostgreSQL, in that Redshift is columnar, cluster-based, lacks secondary indexes, and targets different use cases. I could not find definitive information on whether Redshift has anything similar to a write-ahead log or uses a completely different approach.
Is there a write-ahead-log-based approach to stream data changes from a Redshift table directly into Kafka, via Debezium or some other way, or is it not technically possible? If not, is there an alternative that achieves the same thing?
To answer your question in one line: no, it's not supported, and I doubt that AWS (or any modern data warehouse) will ever enable this feature.
Here are two strong reasons from my point of view:
Redshift itself gets its data from another database (like your Postgres), and its main purpose is reads, not writes (so there are few concurrent writes).
For analytical purposes we bring all the data into a data warehouse. From there it goes to a BI tool or ML-related work. I have never seen a setup where warehouse data flows back into another database in real or near-real time.
(You might already know this option.) If you still need to do this, you are getting the data into Redshift from some source, right? Use that same source for change data capture and send the data wherever you wanted the Redshift CDC to go.
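To make that workaround concrete, here is a minimal, hedged sketch of registering a Debezium PostgreSQL connector against the upstream source database through the Kafka Connect REST API. The host names, credentials, database, table, and topic prefix are placeholders, and the property names assume a recent Debezium 2.x release.

```python
# Hedged sketch: register a Debezium PostgreSQL connector with the Kafka Connect
# REST API so changes are captured at the upstream source instead of Redshift.
# Host names, credentials, database/table names, and topic prefix are placeholders.
import requests

connector_config = {
    "name": "orders-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",                  # built-in logical decoding plugin
        "database.hostname": "postgres.internal",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "***",
        "database.dbname": "orders_db",
        "topic.prefix": "orders_db",                # Debezium 2.x topic naming
        "table.include.list": "public.orders",
    },
}

# Kafka Connect exposes its REST API on port 8083 by default.
resp = requests.post("http://kafka-connect:8083/connectors", json=connector_config)
resp.raise_for_status()
print(resp.json())
```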
I am not very experienced with PostgreSQL and have not found a solution to this question yet.
I have a database stored in a cloud space (OneDrive). I can manage it with my Windows 10 installation of Postgres 12. I also have a laptop running Linux (Manjaro) with a PostgreSQL server installed.
I would like to understand whether I can access the same cloud data with both servers. Since I am a single user, access is never concurrent; I use only one server at a time.
I read that it is possible to share the same data here:
https://www.postgresql.org/docs/12/different-replication-solutions.html
However, I cannot find a detailed procedure. Any suggestions? Perhaps there is a better solution? Thanks.
Your question is pretty unclear; I'll try my best to straighten you out.
First, a database doesn't access data, it holds data. So your desire to have two database servers access the same data does not make a lot of sense.
There are three options that come to mind:
You want to hold only one copy of the data and access it from different points: in that case you don't need a server running on your laptop, only a database client that accesses the database in the cloud.
This is the normal way to operate (a connection sketch follows this list).
You want a database on your laptop to hold a copy of the data from the other database, but you want to be able to modify the copy, so that the two databases will diverge: use pg_dump to export the definitions and data from the cloud database and restore the dump locally.
You want the database on the laptop to be and remain a faithful copy of the cloud database: if your cloud provider allows replication connections, set up your laptop database as a streaming replication standby of the cloud database.
The consequence is that you only have read access to the local copy of the database.
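For the first option, here is a minimal sketch of what the laptop side looks like, assuming the cloud provider exposes an ordinary PostgreSQL connection. The host, database, and credentials are placeholders.

```python
# Hedged sketch of option 1: run only a client on the laptop and connect to the
# single cloud-hosted PostgreSQL server. All connection details are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cloud-postgres.example.com",
    port=5432,
    dbname="mydb",
    user="myuser",
    password="***",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")   # any query runs against the one shared copy
    print(cur.fetchone()[0])
conn.close()
```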
I just started learning about IoT and data streaming. Apologies if this question seems too obvious or generic.
I am working on a school project, which involves streaming data from hundreds (maybe thousands) of IoT sensors, storing that data in a database, then retrieving it for display on a web-based UI.
Things to note are:
fault-tolerance and the ability to accept incomplete data entries
the database has to have the ability to load and query data by stream
I've looked around on Google for some ideas on how to build an architecture that can support these requirements. Here's what I have in mind:
Sensor data is collected by FluentD and converted into a stream
Apache Spark manages a cluster of MongoDB servers
a. the MongoDB servers are connected to the same storage
b. Spark will handle fault-tolerance and load balancing between MongoDB servers
BigQuery will be used for handling queries from the UI/web application.
That is my current idea of an IoT streaming architecture (the original diagram is omitted).
The question now is whether this architecture is feasible, or whether it would work at all. I'm open to any ideas and suggestions.
Thanks in advance!
Note that you could stream your device data directly into BigQuery and avoid an intermediate buffering step.
See:
https://cloud.google.com/bigquery/streaming-data-into-bigquery
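As a rough illustration of that suggestion, here is a minimal, hedged sketch of streaming rows into BigQuery with the official Python client. The project, dataset, table, and field names are placeholders, not part of the original question.

```python
# Hedged sketch of streaming device readings straight into BigQuery using the
# google-cloud-bigquery client. Project, dataset, table, and fields are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-iot-project")
table_id = "my-iot-project.telemetry.sensor_readings"

rows = [
    {"sensor_id": "s-001", "ts": "2024-01-01T00:00:00Z", "temperature": 21.4},
    {"sensor_id": "s-002", "ts": "2024-01-01T00:00:01Z", "temperature": 19.8},
]

# insert_rows_json uses the streaming insert API; incomplete rows simply leave
# missing (nullable) columns as NULL, which matches the fault-tolerance goal.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Rows with errors:", errors)
```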
I am considering using MongoDB (it could be PostgreSQL or anything else) as a data warehouse. My concern is that twenty or more users could be running queries at a time, which could have serious implications for performance.
My question is: what is the best approach to handle this in a cloud-based and a non-cloud-based environment? Do cloud-based databases handle this automatically? If so, would the data stay consistent across all instances after a refresh of the data? In a non-cloud environment, would the best approach be to load-balance across all instances? Again, how would you ensure data integrity across instances?
Thanks in advance.
I think auto-sharding is what I am looking for:
http://docs.mongodb.org/v2.6/MongoDB-sharding-guide.pdf
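For reference, here is a minimal, hedged sketch of what enabling auto-sharding looks like from pymongo, assuming a running sharded cluster reached through a mongos router. The connection URI, database, collection, and shard key are placeholders.

```python
# Hedged sketch of enabling MongoDB auto-sharding for a warehouse-style
# collection via pymongo; all names assume a sharded cluster behind mongos.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-router:27017")

# Enable sharding on the database, then shard the collection on a hashed key
# so reads and writes spread evenly across shards (helping concurrent queries).
client.admin.command("enableSharding", "warehouse")
client.admin.command(
    "shardCollection",
    "warehouse.fact_sales",
    key={"_id": "hashed"},
)
```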
I have built an iPhone app that needs to pull data from a server. First I need to figure out what kind of server I will need. I'm a bit familiar with MySQL but was wondering if anyone had better suggestions for a backend. My app will have two tables that need to be populated by data residing on a server. Also, a user can submit data to this database, which in turn populates the previously mentioned tables.
If you have any tips or tutorials, code snippets, etc. it would be really appreciated!
Edit: I should mention this IS a remote database.
Well, if it's a remote server you're thinking about, then you should be looking into implementing some kind of service architecture, like web services over SOAP or REST.
If the data stays on the iPhone, then by all means use SQLite, as it's fast and lightweight.
I would recommend starting with SQLite and local data first. Then, when your main program logic, flow, and UI are complete, replace the SQLite calls with web service calls.
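As a rough illustration of the REST approach, here is a minimal, hedged sketch of a server-side endpoint the iPhone app could call over HTTP. Flask, the table name, and the fields are illustrative choices, not anything prescribed by the answer.

```python
# Hedged sketch of a tiny REST backend (Flask + SQLite) an iPhone app could call.
# Table and field names are placeholders.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "app.db"

def get_db():
    conn = sqlite3.connect(DB)
    conn.row_factory = sqlite3.Row
    return conn

# Create the placeholder table so the sketch is self-contained.
with get_db() as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")

@app.route("/items", methods=["GET"])
def list_items():
    with get_db() as conn:
        rows = conn.execute("SELECT id, name FROM items").fetchall()
    return jsonify([dict(r) for r in rows])

@app.route("/items", methods=["POST"])
def add_item():
    payload = request.get_json()
    with get_db() as conn:
        conn.execute("INSERT INTO items (name) VALUES (?)", (payload["name"],))
    return jsonify({"status": "ok"}), 201

if __name__ == "__main__":
    app.run()
```

The app itself would then talk to these HTTP endpoints (e.g. via URLSession) rather than connecting to the database directly.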
Your database sounds really simple with just two tables. Perhaps you don't even need a SQL database. Perhaps a Berkeley DB would be sufficient. It has atomic commit, transactions, rollback etc. However you store data as key-value pairs.
However, I am not sure you can use it as a server. You could consider accessing the data through something like remote objects, which is done very easily in Cocoa. You could build a Berkeley DB on your server that hosts the data as distributed objects that your iPhone app connects to.