Is there a quicker way to export a NatTable than the ExportCommand? - nattable

Exporting the NatTable using the provided ExportCommand works fine. However, when the table is populated with large amounts of data, the export takes an incredibly long time, and sometimes does not complete at all if the amount of data is too large. Is there another way to export the data to a simple csv file, or something similar, that won't get bogged down?

What exporter do you use? The default in core creates an XML format. The one in the POI extension uses Apache POI. There is also a CSV exporter that was contributed to core; it is not released yet, but it is available in the SNAPSHOT builds.
If none of them is sufficient, you can implement your own exporter and register it via the ConfigRegistry.

Related

How to copy Tableau Data Extract logic?

Someone in my org created a Data Extract. There is an issue in one of the worksheets that uses it, and we suspect it's due to a mistake in how the Union was built.
But since it's a Data Extract, I can't see the UI for the data merge. Is there any way to take a current Data Extract and view the logic that creates it?
Download the extract from the server (I'm assuming you're using Tableau Server), then open that extract using Tableau Desktop. You should be able to see the details of it.
Before going too deep into extract details, note that extracts are not intended to be permanent systems of record for data - just an efficient way to work with query results for optimized reporting. So in general, you should always be able to throw away the extract and look at the original source - or recreate the extract on command. But life isn't always perfect so ...
If you use Tableau Desktop to look at your worksheet, and look at the data source icon at the top of the data pane in the left sidebar, do you see an icon for your data source that looks like two databases, one on top of (shadowing) the other? If so, you can right-click the data source icon and view its properties to see the source database table or file path. You can then even try disabling the extract to view the original source data.
If instead you see a single database icon, you have a "naked" extract where you've discarded the reference to the original source (unless it is stored in the catalog mentioned below).
If your organization purchased the Data Management Add-on for Tableau Server (strongly recommended), then if your data source is published to Tableau Server you can trace its history and origin by exploring the Tableau Catalog. That is especially valuable if the extract was built by a Tableau Prep Flow.
If instead, someone built the extract another way, say by writing a custom app using the Tableau Data Extract API, then the answer is to find that program.
One last point: in recent versions of Tableau, extracts are stored in an efficient relational database file format called Hyper. A Hyper extract can either be a single table (say, serializing the results of a query joining multiple tables) or contain multiple tables (say, caching individual tables and deferring the join until later).
That may not be relevant to your question, but could turn out to matter as you reverse engineer how the extract was created.

Is there a postgreSQL foreign table wrapper that will access zipped csv files?

I'm interested in storing long-term static data outside of the database, ideally in compressed files that are dynamically uncompressed when accessed. I am currently using the existing file_fdw for some purposes, but would really like to be able to compress the data.
We currently use 9.3.
There seems to be such a wrapper here. It requires Multicorn, so you'll have to install that first.
I have not tried it and don't know how well it works.
Alternatively, have you considered using compression at the storage level?
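If that wrapper doesn't fit, Multicorn also lets you write a small wrapper of your own in Python. Below is a rough, untested sketch assuming Multicorn's documented ForeignDataWrapper interface; the class name and the "filename" option are made up for this example and must match whatever you declare in CREATE FOREIGN TABLE ... OPTIONS.

```python
# Sketch of a Multicorn foreign data wrapper that reads a gzip-compressed csv.
# Assumes Multicorn is installed in the database; "filename" is an option name
# chosen for this example.
import csv
import gzip

from multicorn import ForeignDataWrapper


class GzipCsvFdw(ForeignDataWrapper):
    def __init__(self, options, columns):
        super(GzipCsvFdw, self).__init__(options, columns)
        self.filename = options["filename"]
        self.columns = columns

    def execute(self, quals, columns):
        # Decompress on the fly and yield one dict per row, keyed by column name.
        # Assumes the csv has no header row and its columns match the foreign
        # table's column order.
        with gzip.open(self.filename, "rt", newline="") as f:
            for row in csv.DictReader(f, fieldnames=list(self.columns)):
                yield row
```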

Using postgres to replace csv files (pandas to load data)

I have been saving files as .csv for over a year now and connecting those files to Tableau Desktop for visualization for some end-users (who use Tableau Reader to view the data).
I think I have settled on migrating to PostgreSQL, and I will be using the pandas to_sql method to fill it up.
I get 9 different files each day and I process each of them (I currently consolidate them into monthly files in .csv.bz2 format) by adding columns, calculations, replacing information, etc.
I create two massive csv files out of those processed files using pd.concat and pd.merge, and Tableau is connected to them. These files are literally overwritten every day when new data is added, which is time-consuming.
Is it okay to still do my file joins and concatenation with pandas and export the output data to Postgres? This will be my first time using a real database, and I am more comfortable with pandas than with learning SQL syntax and creating views or tables. I just want to avoid overwriting the same csv files over and over (and some other csv problems I run into).
Don't worry too much about normalization. A properly normalized database will usually be more efficient and easier to handle than a non-normalized one. On the other hand, if you have non-normalized csv data that you dump into a database, your import functions will be a lot more complicated if you do a proper normalization. I would recommend you take one step at a time. Start by just loading the processed csv files into Postgres. I am pretty sure all processing following that will be a lot easier and quicker than doing it with csv files (just make sure you set up the right indexes). When you get used to using the database, you can start to do more processing there.
Just remember, one thing a database is really good at is picking out the subset of data you want to work on. Try as much as possible to avoid pulling huge amounts of data out of the database when you only intend to work on a subset of it.
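As a concrete starting point, here is a minimal sketch of loading one processed file into Postgres with pandas' to_sql; the connection string, file name, and table name are placeholders, and SQLAlchemy is assumed to be installed.

```python
# Sketch: append a processed daily/monthly file to a PostgreSQL table instead
# of rewriting csv files. Connection string, file name, and table name are
# placeholders for illustration.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_csv("processed_month.csv.bz2", compression="bz2")

# Append the new rows to an existing table; Tableau can then connect directly
# to the Postgres table instead of the csv files.
df.to_sql("daily_data", engine, if_exists="append", index=False)
```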

Import Data to cassandra and create the Primary Key

I've got some csv data to import into Cassandra. This could work with the COPY command. The problem is that the csv doesn't provide a unique ID for the data, so I need to create a timeuuid on import.
Is it possible to do this via the COPY command, or do I need to write an external script for the import?
I would write a quick script to do it; the COPY command can really only handle small amounts of data anyway. Try the new Python driver. I find it quite fast to set up loading scripts with, especially if you need any sort of minor modification of the data before it is loaded.
If you have a really big set of data, bulk loading is still the way to go.
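A rough sketch of such a loading script, using the DataStax Python driver and uuid.uuid1() to generate a time-based UUID per row; the keyspace, table, and csv columns are made up for illustration.

```python
# Sketch: load a csv into Cassandra, generating a timeuuid for each row.
# Assumes the DataStax Python driver (pip install cassandra-driver);
# keyspace, table, and column names are hypothetical.
import csv
import uuid

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Prepared statement; the id column is a timeuuid in this example schema.
insert = session.prepare(
    "INSERT INTO measurements (id, sensor, value) VALUES (?, ?, ?)"
)

with open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        # uuid.uuid1() produces a version 1 (time-based) UUID, which
        # Cassandra accepts for timeuuid columns.
        session.execute(insert, (uuid.uuid1(), row["sensor"], float(row["value"])))

cluster.shutdown()
```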

Export data from postgreSQL in SAS XPORT (xpt) format

Is there a convenient, open-source method to generate a SAS XPORT Transport Format (xpt) file from a postgreSQL database for FDA submission?
I have checked the FDA specifications, available at http://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM312964.pdf
These state that 'SAS XPORT transport files can be converted to various other formats using commercially available off the shelf software', but no software packages other than SAS are suggested.
The specifications for an SAS XPORT file are available at http://support.sas.com/techsup/technote/ts140.html
I have checked OpenClinica (which is the EDC software we are using), PGAdmin3 and AM (which can import .xpt files, but I didn't find an export method)
Easy way? Not that I know of. I think one way or another it will take some development work.
My recommendation is to do it as follows:
Write a user-defined function/stored procedure for pulling the data you need for each section.
Write a user-defined function to pull this data from each section and arrange it into an XML file. The XML functions are likely to come in handy for this.
Of course you could also put the xml conversion in an arbitrary front-end. However, in general, you will find that the design above forces you to push everything into set-based logic which is likely to be more powerful in your case.
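As an illustration of that approach, here is a minimal sketch that uses psycopg2 and Postgres's built-in query_to_xml function to dump one section as XML; the connection details, section query, and output file name are hypothetical.

```python
# Sketch: pull one section's data from PostgreSQL as XML using query_to_xml.
# Assumes psycopg2 is installed; the connection string, section query, and
# output file name are placeholders for illustration.
import psycopg2

SECTION_QUERY = "SELECT subject_id, visit, value FROM demographics"  # hypothetical section

with psycopg2.connect("dbname=trials user=postgres") as conn:
    with conn.cursor() as cur:
        # query_to_xml(query, nulls, tableforest, targetns) is built into Postgres.
        cur.execute("SELECT query_to_xml(%s, true, false, '')", (SECTION_QUERY,))
        xml_doc = cur.fetchone()[0]

with open("demographics.xml", "w", encoding="utf-8") as f:
    f.write(xml_doc)
```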
If you don't mind using Python, my XPORT module can write xpt files. https://github.com/selik/xport
If you have trouble using it, write me a note and I'll try to help. https://github.com/selik/xport/issues
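For illustration, a rough sketch of combining the two: read a table from PostgreSQL with pandas and write it with the xport module. The table name, connection string, and dataset name are made up, and the writer API shown assumes the v3-style interface (xport.Dataset, xport.Library, xport.v56.dump) described in the project README; older versions of the module had a different API, so check the README for your installed version.

```python
# Sketch: export a PostgreSQL table to a SAS XPORT (.xpt) file.
# Assumes pandas, SQLAlchemy, and the xport module; the connection string,
# table name, and dataset name are placeholders. The xport API used here is
# the v3-style interface from the project README and may differ in other versions.
import pandas as pd
from sqlalchemy import create_engine

import xport
import xport.v56

engine = create_engine("postgresql://user:password@localhost:5432/mydb")
df = pd.read_sql("SELECT * FROM adverse_events", engine)  # hypothetical table

dataset = xport.Dataset(df, name="AE", label="Adverse events")
library = xport.Library({"AE": dataset})

with open("ae.xpt", "wb") as f:
    xport.v56.dump(library, f)
```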