Can someone point me towards a source for schema definition for the Intergy database? It's a Medical Practice Management database by a company named Vitera (Intergy used to be owned by Sage). The database engine is Progress. Basically, what I'm looking for is the table names, the associated columns, and the Primary/Foreign Keys. I've gone to Vitera, and have been told that this info is proprietary. I've built a simple web app that peeks at the Progress catalog tables, and this has gotten me part of the way. But, it would be nice to get a little more detail. Thanks.
If you can get to the catalog tables you've got what you need. You probably just need a little understanding of the relationships between _file, _field and _index to complete the picture.
_file = table
_field = column
As Tim points out there are no explicit foreign keys and so forth.
Fields are related to tables by a common "recid". I'm not a SQL guy and won't pretend to be one but the 4GL version of the query relating them is:
for each _file no-lock where _tbl-type = "t":
  display _file-name.
  for each _field no-lock where _field._file-recid = recid( _file ):
    display _field-name.
  end.
  for each _index no-lock where _index._file-recid = recid( _file ):
    display _index-name.
    for each _index-field no-lock where _index-field._index-recid = recid( _index ):
      find _field no-lock where recid( _field ) = _index-field._field-recid.
      display _field-name.
    end.
  end.
end.
It should be fairly obvious how to convert that to SQL.
Progress uses a "Data Admin Tool" to manage the schema, and it'll dump a "df" file for you which'll have the db's schema structure.
It's not a SQL db though, so there's no "primary / foreign key" implemented in the db, so you'll have to infer any relationships between the tables.
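As a rough sketch of how a dumped "df" file could be turned into the table/column/index listing the question asks for, the Python below reads only the ADD TABLE, ADD FIELD, ADD INDEX and INDEX-FIELD lines. It assumes the usual quoted-name layout of a dump file; the function name parse_df and the sample table are made up for illustration, and every other attribute in a real dump is simply ignored.

```python
import re
from collections import defaultdict

# Patterns for the statement lines of a .df dump (layout is an assumption).
TABLE_RE = re.compile(r'^ADD TABLE "([^"]+)"')
FIELD_RE = re.compile(r'^ADD FIELD "([^"]+)" OF "([^"]+)" AS (\S+)')
INDEX_RE = re.compile(r'^ADD INDEX "([^"]+)" ON "([^"]+)"')
INDEX_FIELD_RE = re.compile(r'^\s+INDEX-FIELD "([^"]+)"')


def parse_df(text):
    """Return {table: {"fields": [(name, type)], "indexes": {name: [fields]}}}."""
    schema = defaultdict(lambda: {"fields": [], "indexes": {}})
    current_index = None  # (table, index) whose INDEX-FIELD lines are being read
    for line in text.splitlines():
        if m := TABLE_RE.match(line):
            schema[m.group(1)]  # touch the key so tables without fields are listed
            current_index = None
        elif m := FIELD_RE.match(line):
            name, table, dtype = m.groups()
            schema[table]["fields"].append((name, dtype))
            current_index = None
        elif m := INDEX_RE.match(line):
            name, table = m.groups()
            schema[table]["indexes"][name] = []
            current_index = (table, name)
        elif current_index and (m := INDEX_FIELD_RE.match(line)):
            table, name = current_index
            schema[table]["indexes"][name].append(m.group(1))
    return dict(schema)


# Hypothetical sample; a real Intergy dump will look different.
SAMPLE_DF = '''ADD TABLE "patient"
  DUMP-NAME "patient"

ADD FIELD "patient-id" OF "patient" AS integer
  ORDER 10

ADD FIELD "last-name" OF "patient" AS character
  ORDER 20

ADD INDEX "pk-patient" ON "patient"
  UNIQUE
  PRIMARY
  INDEX-FIELD "patient-id" ASCENDING
'''

schema = parse_df(SAMPLE_DF)
for table, info in schema.items():
    print(table, info["fields"], info["indexes"])
```

Indexes are the closest thing to key information in the dump, so any table-to-table relationships still have to be inferred from matching field names.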
Contact the customer that you are supporting and see if they are paying for ODBC access as the schema is something they get with a paid subscription. If not then I am guessing they may be in violation of their contract with their vendor.
I am new to SQL and I was wondering if there is any quick way of getting a global "view" of a new database (for example, if you are starting to use a database you know nothing about and just want a general idea of what the entire database looks like).
In other words, is there a way to:
Maybe get some graphical representation of the database? - a sort of diagram that shows the relation between all tables
Maybe do some sort of query that could return the no. of rows, no. columns (and ideally column names) of each table in the database?
Apologies if this is a really basic question, I am very new to SQL. I am currently using PostgreSQL and PgAdmin4. Thanks
In my case I use SQL for structured data and am considering Lucene for text search.
Yes MSSQL has FullText but Lucene offers some stuff I want.
For the purpose of the question any external search.
In SQL there is a main table with a PK.
In SQL there are a number of queries that use the main table and a number of other tables.
From the external search I will get a list of Main.PK values to filter by.
That list could be from 1 to 1 million.
The external search is the most expensive part of the search. The SQL part is very efficient. Passing the SQL PK to the external is not really a good option as I need various data from the SQL query. The only thing coming back from Lucene is the PK (term) and some times the score.
Is there a best practice?
Options I see are
where Main.PK in (PK values from external search)
populate the external search PK values in a #TEMP and join to that; since sometimes I need the score this seems best, as I can put the score in the #temp
In an ideal world there would be a join like this:
join exeternalvirtualtable as evt
on evt.PK = Main.PK
and syntax specific to the external search
I get that is asking a lot but is there anything like that in general?
Is there a syntax/API to make an external search look like a table (or view) to MSSQL?
Is there anything like that for MSSQL to Lucene?
This is kind of a start: OLE DB Providers and OPENROWSET.
Ideally there would be a .NET Framework Data Provider for Lucene that mapped some SQL syntax to Lucene.
The app is .NET in case there is a .NET specific solution.
The product RavenDB combines structured and unstructured (Lucene) search and is very fast even if Lucene returns a lot of rows, so there has to be a way to do this short of putting the PKs in a #temp.
Is there a syntax/API to make an external search look like a table (or view) to MSSQL?
You can use the IndexSearcher class of Lucene; it will give you a TopDocs object that contains the relevant documents (PKs in your case). Then you can populate a SQL table based on this result.
You will need something like this:
TopDocs topDocs = searcher.search(query, MAX_HITS);
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
    String pk = doc.get("PK");
    // Connection to database and executing insertion
}
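The "populate a table, then join" step might look roughly like the sketch below. It is in Python with the standard-library sqlite3 module standing in for MSSQL and its #temp tables, purely so it can run anywhere; the names main_tbl, hits and filter_by_hits are invented for the example, and the (pk, score) pairs stand in for what the external search returns.

```python
import sqlite3


def filter_by_hits(conn, hits):
    """Join main-table rows to (pk, score) pairs returned by an external search."""
    cur = conn.cursor()
    # Temporary table holding the external result, including the score.
    cur.execute("CREATE TEMP TABLE hits (pk INTEGER PRIMARY KEY, score REAL)")
    cur.executemany("INSERT INTO hits (pk, score) VALUES (?, ?)", hits)
    rows = cur.execute(
        "SELECT m.pk, m.title, h.score "
        "FROM main_tbl AS m JOIN hits AS h ON h.pk = m.pk "
        "ORDER BY h.score DESC"
    ).fetchall()
    cur.execute("DROP TABLE hits")
    return rows


# Tiny in-memory stand-in for the real main table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_tbl (pk INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO main_tbl VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)

result = filter_by_hits(conn, [(3, 0.9), (1, 0.4)])
print(result)
```

Sending the hits in one batch keeps the round trips down, which matters more as the list approaches the upper end of the range mentioned in the question.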
I want to view the schema of the data stored in a kvstore: what the keys and their types are, and what the values and their types are (Oracle NoSQL is a key-value store). As far as I know we can use the "show schema" command, but it only works if an Avro schema has been added to that particular store, and it only gives the value names and types; the key names and types are still the missing piece.
So is there any utility I can use to view the structure of the data, like the "describe" command in Oracle SQL?
You are right that 'kv->show schema' will show you the field names (columns) and their types when you have an Avro schema. When you don't register a schema, the database has no knowledge of what your value object looks like. In that case the client application maintains the schema of the value field (instead of the database).
About the keys: a) keys are always of string type, and b) you can view them from the datashell prompt if you do something like this: "kv-> get kv -keyonly -all".
I would also like to mention that in the upcoming R3 release we will be introducing a table data model, which will give you an experience much closer to a relational database (in terms of table definitions). You can take a look at a webinar we did on this subject: http://bit.ly/1lPazSZ.
Hope that helps,
Anuj
I have a bit of an "upsert" type of question... but, I want to throw it out there because it's a little bit different than any that I've read on stackoverflow.
Basic problem.
I'm working on moving from MySQL to PostgreSQL 9.1.5 (hosted on Heroku). As a part of that, I need to import multiple CSV files every day. Some of the data is sales information and is almost guaranteed to be new and need to be inserted. But other parts of the data are almost guaranteed to be the same. For example, the CSV files (note plural) will have POS (point of sale) information in them. This rarely changes (and most likely only via additions). Then there is product information. There are about 10,000 products (the vast majority will be unchanged, but it's possible to have both additions and updates).
The final item (but is important), is that I have a requirement to be able to provide an audit trail/information for any given item. For example, if I add a new POS record, I need to be able to trace that back to the file it was found in. If I change a UPC code or description of a product, then I need to be able to trace it back to the import (and file) where the change came from.
Solution that I'm contemplating.
Since the data is provided to me via CSV, I'm working around the idea that COPY will be the best/fastest way. The structure of the data in the files is not exactly what I have in the database (i.e. final destination). So, I'm copying them into tables in the staging schema that match the CSV (note: one schema per datasource). The tables in the staging schemas will have BEFORE INSERT row triggers. These triggers can decide what to do with the data (insert, update or ignore).
For the tables that are most likely to contain new data, then it will try to insert first. If the record is already there, then it will return NULL (and stop the insert into the staging table). For tables that rarely change, then it will query the table and see if the record is found. If it is, then I need a way to see if any of the fields are changed. (because remember, I need to show that the record was modified by import x from file y) I obviously can just boiler plate out the code and test each column. But, was looking for something a little more "eloquent" and more maintainable than that.
In a way, what I'm kind of doing is combining a importing system with an audit trail system. So, in researching audit trails, I reviewed the following wiki.postgresql.org article. It seems like the hstore might be a nice way of getting changes (and being able to easily ignore some columns in the table that aren't important - e.g. "last_modified")
I'm about 90% sure it will all work... I've created some testing tables etc and played around with it.
My question?
Is there a better, more maintainable way of accomplishing this task of finding the maybe 3 records out of 10K that require a change to the database? I could certainly write a Python script (or something else) that reads the file and tries to figure out what to do with each record, but that feels horribly inefficient and will lead to lots of round trips.
A few final things:
I don't have control over the input files. I would love it if they only sent me the deltas, but they don't and it's completely outside of my control or influence.
The system is growing and new data sources are likely to be added that will greatly increase the amount of data being processed (so, I'm trying to keep things efficient).
I know this is not a nice, simple SO question (like "how to sort a list in python") but I believe one of the great things about SO is that you can ask hard questions and people will share their thoughts about how they think the best way to solve it is.
I have lots of similar operations. What I do is COPY to temporary staging tables:
CREATE TEMP TABLE target_tmp AS
SELECT * FROM target_tbl LIMIT 0; -- only copy structure, no data
COPY target_tmp FROM '/path/to/target.csv' (FORMAT csv); -- without FORMAT csv, COPY expects tab-delimited text
For performance, run ANALYZE - temp. tables are not analyzed by autovacuum!
ANALYZE target_tmp;
Also for performance, maybe even create an index or two on the temp table, or add a primary key if the data allows for that.
ALTER TABLE target_tmp ADD CONSTRAINT target_tmp_pkey PRIMARY KEY (target_id);
You don't need the performance stuff for small imports.
Then use the full scope of SQL commands to digest the new data.
For instance, if the primary key of the target table is target_id ..
Maybe DELETE what isn't there any more?
DELETE FROM target_tbl t
WHERE  NOT EXISTS (
   SELECT 1 FROM target_tmp t1
   WHERE  t1.target_id = t.target_id
   );
Then UPDATE what's already there:
UPDATE target_tbl t
SET col1 = t1.col1
FROM target_tmp t1
WHERE t.target_id = t1.target_id
To avoid empty UPDATEs, simply add:
...
AND col1 IS DISTINCT FROM t1.col1; -- repeat for relevant columns
Or, if the whole row is relevant:
...
AND t IS DISTINCT FROM t1; -- check the whole row
Then INSERT what's new:
INSERT INTO target_tbl(target_id, col1)
SELECT t1.target_id, t1.col1
FROM target_tmp t1
LEFT JOIN target_tbl t USING (target_id)
WHERE t.target_id IS NULL;
Clean up if your session goes on (temp tables are dropped at end of session automatically):
DROP TABLE target_tmp;
Or use ON COMMIT DROP or similar with CREATE TEMP TABLE.
Code untested, but should work in any modern version of PostgreSQL except for typos.
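To get a feel for the whole delete / update / insert cycle without a PostgreSQL server, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an approximation, not the PostgreSQL code above: SQLite's NULL-safe "IS NOT" stands in for IS DISTINCT FROM, a correlated subquery stands in for UPDATE ... FROM, and the function name merge_staging and the sample rows are invented. The returned row counts are the kind of figure an import audit log could record per file.

```python
import sqlite3


def merge_staging(conn):
    """Apply target_tmp to target_tbl and return how many rows each step touched."""
    cur = conn.cursor()
    # 1. Remove rows that no longer appear in the staging data.
    deleted = cur.execute(
        "DELETE FROM target_tbl WHERE NOT EXISTS ("
        " SELECT 1 FROM target_tmp t1 WHERE t1.target_id = target_tbl.target_id)"
    ).rowcount
    # 2. Update only rows whose col1 really differs (NULL-safe comparison).
    updated = cur.execute(
        "UPDATE target_tbl SET col1 = ("
        " SELECT t1.col1 FROM target_tmp t1 WHERE t1.target_id = target_tbl.target_id) "
        "WHERE EXISTS ("
        " SELECT 1 FROM target_tmp t1"
        " WHERE t1.target_id = target_tbl.target_id"
        " AND t1.col1 IS NOT target_tbl.col1)"
    ).rowcount
    # 3. Insert rows that are new.
    inserted = cur.execute(
        "INSERT INTO target_tbl (target_id, col1) "
        "SELECT t1.target_id, t1.col1 FROM target_tmp t1 "
        "LEFT JOIN target_tbl t ON t.target_id = t1.target_id "
        "WHERE t.target_id IS NULL"
    ).rowcount
    conn.commit()
    return {"deleted": deleted, "updated": updated, "inserted": inserted}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_tbl (target_id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO target_tbl VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# Same structure as the target, no data; then load the "file" contents.
conn.execute("CREATE TEMP TABLE target_tmp AS SELECT * FROM target_tbl LIMIT 0")
conn.executemany("INSERT INTO target_tmp VALUES (?, ?)", [(1, "a"), (2, "B"), (4, "d")])

counts = merge_staging(conn)
final_rows = conn.execute("SELECT * FROM target_tbl ORDER BY target_id").fetchall()
print(counts, final_rows)
```

Row 1 is identical in both tables, so it is skipped by the update step; that is the "avoid empty UPDATEs" condition at work.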
Suppose that I have this RDBMS table (entity-attribute-value model):
col1: entityID
col2: attributeName
col3: value
and I want to use HBase due to scaling issues.
I know that the only way to access an HBase table is by its primary key (cursor): you can get a cursor for a specific key and iterate over the rows one by one.
The issue is that, in my case, I want to be able to iterate on all 3 columns.
For example:
for a given entityID I want to get all its attributes and values
for a given attributeName and value I want to get all the entityIDs
...
So one idea I had is to build one HBase table that will hold the data (table DATA, with entityID as primary index), and 2 "index" tables, one with attributeName as a primary key and the other with value.
Each index table will hold a list of pointers (entityIDs) into the DATA table.
Is it a reasonable approach, or is it an 'abuse' of HBase concepts?
In this blog the author says:
HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don’t worry - Lucene to the rescue! But that’s another post.)
Do you know how Lucene can help ?
-- Yonatan
Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking at it. Check out http://www.mail-archive.com/hbase-dev#hadoop.apache.org/msg04801.html.
In the meantime though, if your application data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema) you might like to check out the solution that Hypertable proposes for secondary index-type needs: http://markmail.org/message/rphm4q6cbar2ycgp
I recommend having two different flat tables: one for looking up attributes+values given entityID, and one for looking up the entityID given attributes+values.
Table 1 would look like this:
entityID1 {
  attribute1: value1;
  attribute2: value2;
  ...
}
and Table 2:
attribute1_value1 {
  entityID1;
}
attribute2_value2 {
  entityID1;
}
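The two-table layout can be tried out in plain Python before committing to it in HBase. The sketch below uses in-memory dicts as stand-ins for the two tables; the class name EavStore and its methods are made up for the example, and the "attribute_value" row key simply follows the naming shown in Table 2. The main point it illustrates is that every write has to touch both tables, including removing the stale Table 2 entry when a value changes.

```python
from collections import defaultdict


class EavStore:
    """In-memory stand-in for the two flat tables described above."""

    def __init__(self):
        self.by_entity = defaultdict(dict)     # Table 1: entityID -> {attribute: value}
        self.by_attr_value = defaultdict(set)  # Table 2: "attribute_value" -> {entityID}

    @staticmethod
    def _index_key(attribute, value):
        # Row key for Table 2, following the attribute1_value1 naming.
        return f"{attribute}_{value}"

    def put(self, entity_id, attribute, value):
        old = self.by_entity[entity_id].get(attribute)
        if old is not None:
            # Drop the stale index entry before writing the new one.
            self.by_attr_value[self._index_key(attribute, old)].discard(entity_id)
        self.by_entity[entity_id][attribute] = value
        self.by_attr_value[self._index_key(attribute, value)].add(entity_id)

    def attributes_of(self, entity_id):
        return dict(self.by_entity.get(entity_id, {}))

    def entities_with(self, attribute, value):
        return set(self.by_attr_value.get(self._index_key(attribute, value), set()))


store = EavStore()
store.put("e1", "color", "red")
store.put("e2", "color", "red")
store.put("e1", "color", "blue")  # overwrites e1's color and fixes up Table 2

print(store.attributes_of("e1"))
print(store.entities_with("color", "red"))
```

In a real store the two writes are not atomic, so the application has to tolerate or repair a Table 2 entry that briefly disagrees with Table 1.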