Is there a convenient, open-source method to generate a SAS XPORT Transport Format (xpt) file from a postgreSQL database for FDA submission?
I have checked the FDA specifications, available at http://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM312964.pdf
These state that 'SAS XPORT transport files can be converted to various other formats using commercially available off the shelf software', but no software packages other than SAS are suggested.
The specifications for an SAS XPORT file are available at http://support.sas.com/techsup/technote/ts140.html
I have checked OpenClinica (which is the EDC software we are using), PGAdmin3 and AM (which can import .xpt files, but I didn't find an export method)
Easy way? Not that I know of. I think one way or another it will take some development work.
My recommendation is to do it as follows:
Write a user-defined function/stored procedure for pulling the data you need for each section.
Write a user-defined function to pull this data from each section and arrange it into an XML file. TheXML functions are likely to come in handy for this.
Of course you could also put the xml conversion in an arbitrary front-end. However, in general, you will find that the design above forces you to push everything into set-based logic which is likely to be more powerful in your case.
If you don't mind using Python, my XPORT module can write xpt files. https://github.com/selik/xport
If you have trouble using it, write me a note and I'll try to help. https://github.com/selik/xport/issues
Related
Looking for good way to load FIXED-Width data into postgres tables. I do this is sas and python not postgres. I guess there is not a native method. The files are a few GB. The one way I have seen does not work on my file for some reason (possibly memory issues). There you load as one large column and then parse into tables. I can use psycopy2 but because of memory issues would rather not. Any ideas or tools that work. Does pgloader work well or are there native methods?
http://www.postgresonline.com/journal/index.php?/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html
Thanks
There's no convenient built-in method to ingest fixed-width tabular data in PostgreSQL. I suggest using a tool like Pentaho Kettle or Talend Studio to do the data-loading, as they're good at consuming many different file formats. I don't remember if pg_bulkload supports fixed-width, but suspect not.
Alternately, you can generally write a simple script with something like Python and the psycopg2 module, loading the fixed-width data row by row and sending that to PostgreSQL. psycopg2's support for the COPY command via copy_from makes this vastly more efficient. I didn't find a convenient fixed-width file reader for Python in a quick search but I'm sure they're out there. You can use whatever language you like anyway - Perl's DBI and DBD::Pg do just as well, and there are millions of fixed-width file reader modules for Perl.
The Python Pandas library has a function pandas.read_fwf which works great.
Data can be read in using python, then written to Postgres database.
I am trying to load text data into a postgresql database via COPY FROM. Data is definitely not clean CSV.
The input data isn't always consistent: sometimes there are excess fields (separator is part of a field's content) or there are nulls instead of 0's in integer fields.
The result is that PostgreSQL throws an error and stops loading.
Currently I am trying to massage the data into consistency via perl.
Is there a better strategy?
Can PostgreSQL be asked to be as tolerant as mysql or sqlite in that respect?
Thanks
PostgreSQL's COPY FROM isn't designed to handle bodgy data and is quite strict. There's little support for tolerance of dodgy data.
I thought there was little interest in adding any until I saw this proposed patch posted just a few days ago for possible inclusion in PostgreSQL 9.3. The patch has been resoundingly rejected, but shows that there's some interest in the idea; read the thread.
It's sometimes possible to COPY FROM into a staging TEMPORARY table that has all text fields with no constraints. Then you can massage the data using SQL from there. That'll only work if the SQL is at least well-formed and regular, though, and it doesn't sound like yours is.
If the data isn't clean, you need to pre-process it with a script in a suitable scripting language.
Have that script:
Connect to PostgreSQL and INSERT rows;
Connect to PostgreSQL and use the scripting language's Pg APIs to COPY rows in; or
Write out clean CSV that you can COPY FROM
Python's csv module can be handy for this. You can use any language you like; perl, python, php, Java, C, whatever.
If you were enthusiastic you could write it in PL/Perlu or PL/Pythonu, inserting the data as you read it and clean it up. I wouldn't bother.
I have a huge SQL script which i need to analyse. It would be really helpful if i could find a way which can generate a call tree; ie, to see which all procedures are called from a particular procedure. a perl based example is here, http://sqlblog.com/blogs/linchi_shea/archive/2009/10/23/find-the-complete-call-tree-for-a-stored-procedure.aspx
but i need a tool to analyse the text file (.sql file), not the procedure stored in the database. due to some reasons i will not be able to create the whole set of procedures in the database and use the above mentioned tool.
please respond if you have come across any ide/tool with this feature.
Probably not very helpful, as it violates your request for a "offline" sql file, text based parsing tool, but wanted to throw this redgate tool out there that I have used with great success in the past; RedGate Sql Dependency Tracker. It works very well and does a good job mapping out your objects and all their dependencies (definable as to what you want mapped). But it does require a database with all of the existing objects in place to work properly. :(
If you can't find one out there, I guess you could maybe do some script/macro text parsing if all the procedure calls are easily defined and predictable in the file. AutoHotKey is a great general purpose scripting tool/framework, and there are a few sql based scripts out there...just not one exactly like you are looking for that I have seen.
I would like to import an Excel .xls file workbook into Powerbuilder. The file has 2 sheets and these sheets must be imported into 2 differenct db tables.
Any assistance is kindly appreciated.
Thanks
John.
First thing, there's nothing automagic, along the lines of a one-line solution that you could get for other file formats. There's a manual method, there's a scripting approach, and you can probably merge the two as a third option.
For a manual method, you can go into Excel and export your data as something that will import into a DataWindow. You don't mention your PowerBuilder version, but the file format for importing from Excel that comes to mind is CSV, which was added in PB9.
For a scripting approach, you can use OLE (assuming Excel is installed on the client machine) and access data however you want with the scripting engine, moving it into PowerBuilder in whatever format you want.
To mix the methods, you could use OLE to export the file to a couple of CSVs, then dw.FileImport() the data in.
Good luck,
Terry.
Postscript: Sybase has examples of OLE access, and examples of using ODBC, a solution I had neglected before.
If you give names to the areas with the data in Excel and then setup ODBC connections that point to them, you can access them like a database table from within PowerBuilder.
Dear all ,
Can any one suggest me the postgres tool for linux which is used to find the
difference between the 2 given database
I tried with the apgdiff 2.3 but it gives the difference in terms of schema not the data
but I need both !
Thanks in advance !
Comparing data is not easy especially if your database is huge. I created Python program that can dump PostgreSQL data schema to file that can be easily compared via 3rd party diff programm: http://code.activestate.com/recipes/576557-dump-postgresql-db-schema-to-text/?in=user-186902
I think that this program can be extended by dumping all tables data into separate CSV files, similar to those used by PostgreSQL COPY command. Remember to add the same ORDER BY in SELECT ... queries. I have created tool that reads SELECT statements from file and saves results in separate files. This way I can manage which tables and fields I want to compare (not all fields can be used in ORDER BY, and not all are important for me). Such configuration can be easily created using "dump schema" utility.
Check out dbsolo DBSOLO. It does both object and data compares and can create a sync script based on the results. It's free to try and $99 to buy. My guess is the 99 bucks will be money well spent to avoid trying to come up with your own software to do this.
Data Compare
http://www.dbsolo.com/help/datacomp.html
Object Compare
http://www.dbsolo.com/help/compare.html
apgdiff https://www.apgdiff.com/
It's an opensource solution. I used it before for checking differences between differences in dumps. Quite useful
[EDIT]
It's for differenting by schema only