Is there a way to persist HSQLDB data? - persistence

We have all of our unit tests written so that they create and populate tables in HSQL. I want the developers who use this to be able to write queries against this HSQL DB ( 1) by writing queries they can better understand the data model and the ones not as familiar with SQL can play with the data before writing the runtime statements and 2) since they don't have access to the test DB/security reasons). Is there a way to persist the results of the test data so that it may be examine and analyzed with a an sql client?
Right now I am jury rigging it by switching the data source to a different DB (like DB2/mysql, then connecting to that DB on my machine so I can play with persistant data), however it would be easier for me if HSQL supports persisting this than to explain how to do this to every new developer.
Just to be clear, I need an SQL client to interact with persistent data, so debugging and checking memory won't be clean. This has more to do with initial development and not debugging/maintenance/testing.

If you use an HSQLDB Server instance for your tests, the data will survive the test run.
If the server uses a jdbc:hsqldb:mem:aname (all-in-memory) url for its database, then the data will be available while the server is running. Alternatively the server can use a jdbc:hsqldb:file:filepath url and the data is persisted to files.
The latest HSQLDB docs explain the different options. Most of the observations also apply to older (1.8.x) versions. However, the latest version 2.0.1 supports starting a server and creating databases dynamically upon the first connection, which can simplify testing a lot.
http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#N13C3D

Related

Mirror SAP internal data to an external system

We would like to mirror data which is inside SAP to an external database.
Up to now there is a script which exports the data every night.
The customer wants this to happen more often. It should happen every hour.
The export is quite big, and we search for a better way to mirror data which is inside SAP to an external database.
Based on the tag, I assume that your external database is a PostgreSQL database. In this case, I don't think you will really find a pure SAP, database independent solution.
The standard solution for this sort of replication is the SAP SLT Server. It supports taking data out of your SAP system to either a SAP target or a non-SAP target. Currently it supports the following non-SAP targets:
DB2
SAP MaxDB
Microsoft SQL Server
Oracle
Sybase ASE
As you can see, PostgreSQL is not included in there (yet). In conclusion, I see the following possibilities:
Use SLT in combination with some other external DB that is supported.
Use a third party replication tool like for example SymmetricDS.
Depending on your source database, you might be able to use some database specific tools (e.g. SAP HANA Smart Data Integration).
Write some custom code for doing it. In my opinion, you should try to build a sort of log table in this case, to record (using maybe triggers) which rows were inserted / updated / deleted since the last replication. IMO, this should be really a last resort, as database replication is a fairly common topic and you should not reinvent the wheel.

Is it mandatory to run Database Designer for every schema in HP Vertica?

Constantly i have been hitting with Resource pool allocation error after creating several tables in new schema.
After running the Database Designer in vertica for particular schema with all tables the queries are running fine.
Kindly help me to understand the concept.
The Database Designer is optional; you don't have to use it at all. Using it helps you optimize your physical layout, and if you're having trouble with resource-pool allocation it sounds like you might benefit from that.
From the documentation:
The HP Vertica Database Designer:
Analyzes your logical schema, sample data, and, optionally, your sample queries.
Creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Can be used by anyone without specialized database knowledge.
Can be run and rerun any time for additional optimization without stopping the database.
Uses strategies to provide optimal query performance and data compression.
You can run DBD for just a particular query (optimizes whatever's needed to support that query) or for your entire database. It uses sample queries that you provide, so if your usage patterns change over time it can help to rerun it.

Data mining with postgres in production environment - is there a better way?

There is a web application which is running for a years and during its life time the application has gathered a lot of user data. Data is stored in relational DB (postgres). Not all of this data is needed to run application (to do the business). However form time to time business people ask me to provide reports of this data data. And this causes some problems:
sometimes these SQL queries are long running
quires are executed against production DB (not cool)
not so easy to deliver reports on weekly or monthly base
some parts of data is stored in way which is not suitable for such
querying (queries are inefficient)
My idea (note that I am a developer not the data mining specialist) how to improve this whole process of delivering reports is:
create separate DB which regularly is update with production data
optimize how data is stored
create a dashboard to present reports
Question: But is there a better way? Is there another DB which better fits for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a scheme more usable for your reporting.
There are a thousand ways to approach this issue but I think that the path of least resistance for you would be postgres replication. Check out this Postgres replication tutorial for a quick, proof-of-concept. (There are many hits when you Google for postgres replication and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays withing the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it would create a second database which would effectively become your "read-only" db which would be updated via the replication process. You would keep the schema the same but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database which serves the users and the replicated database would serve the stakeholders.
This is a wide topic, so please do your diligence and research it. But it's also something that can work for you and can be quickly turned around.
If you really want try Data Mining with PostgreSQL there are some tools which can be used.
The very simple way is KNIME. It is easy to install. It has full featured Data Mining tools. You can access your data directly from database, process and save it back to database.
Hardcore way is MADLib. It installs Data Mining functions in Python and C directly in Postgres so you can mine with SQL queries.
Both projects are stable enough to try it.
For reporting, we use non-transactional (read only) database. We don't care about normalization. If I were you, I would use another database for reporting. I will desing the tables following OLAP principals, (star schema, snow flake), and use an ETL tool to dump the data periodically (may be weekly) to the read only database to start creating reports.
Reports are used for decision support, so they don't have to be in realtime, and usually don't have to be current. In other words it is acceptable to create report up to last week or last month.

How to create multiple instance of sqlite database?

I am making an online app in which when I sync my data web then 25 to 30 local database queries in different tables are executed. So it will take around 25 to 30 sec because all database queries are execute in this manner, first check that data is present or not in local database if present then row is update otherwise insert. Now I want to ask that there are any way through which I can execute these all queries concurrently. If I can do this then I can save my 10 to 15 sec in every sync. So please gave a better solution to execute multiple queries.
Consider using High Performance database management system such as cubeSQL :
SQLabs has announced the release of cubeSQL a fully featured and high
performance relational database management system built on top of the
sqlite database engine. It is the ideal database server for both
developers who want to convert a single user database solution to a
multiuser project and for companies looking for an affordable, easy to
use and easy to maintain database management system. cubeSQL runs on
Windows, Mac, Linux and it can be embedded into any iOS and Cocoa
application.
cubeSQL is incredibly fast, has a small footprint, is highly reliable
and it offers some unique features. It can be easily accessed with any
JSON client, with PHP, with the native C SDK, with a Windows DLL and
with an highly optimized REAL Studio plugin.
It is not possible to run 2 or more than two queries at a single time cause when 1 query runs it locks the DataBase.
If all queries you want to execute that relates to the different table then in that case you can create the Separate Database File for every Table.

PostgreSql or SQL Server 2008 R2 should be use with .Net application using entity framework?

I have a database in PostgreSQL with millions of records and I have to develop a website that will use this database using Entity Framework (using dotnetConnect for PostgreSQL driver in case of PostgreSQL database).
Since SQL Server and .Net are both native to the Windows platform, should I migrate the database from PostgreSQL to SQL Server 2008 R2 for performance reasons?
I have read some blogs comparing the two RDBMS' but I am still confused about which system I should use.
There is no clear answer here, as its subjective, however this is what I would consider:
The overhead of learning a new DBMS and its tools.
The SQL dialects each RDBMS uses and if you are using that dialect currently.
The cost (monetary and time) required to migrate from PostgreSQL to another RDBMS
Do you or your client have an ongoing budget for the new RDBMS? If not, don't make the mistake of developing an application to use a RDBMS that will never see the light of day.
Personally if your current database is working well I wouldn't change. Why fix what isn't broke?
You need to find out if there is actually a problem, and if moving to SQL Server will fix it before doing any application changes.
Start by ignoring the fact you've got .net and using entity framework. Look at the queries that your web application is going to make, and try them directly against the database. See if its returning the information quick enough.
Only if, after you've tuned indexes etc. you can't make the answers come back in a time you're happy with should you decide the database is a problem. At that point it makes sense to try the same tests against a SQL Server database, but don't just assume SQL Server is going to be faster. You might find out that neither can do what you need, and you need to use faster disks or more memory etc.
The mechanism you're using to talk to a database (DotConnect or Microsoft drivers) will likely be a very minor performance consideration, considering the amount of information flowing (SQL statements in one direction and result sets in the other) is going to be almost identical for both technologies.