OpenRefine flag changed rows - data-cleaning

I'm using OpenRefine to clean up an Excel data set. I have about 70 operations that I've been cutting and pasting onto different data sets. I maintain a record id and export to a new Excel sheet. Then I reload the sheet using the record id.
It works well, but I have to reload the entire database even if only a handful of records change. Is there an easy way to flag changed records so I only export/import the changed records to the application?
Can I easily add a flag to the 70 operations to tag only changed records?

There isn't currently a straightforward way to do this in OpenRefine, but it would make an interesting feature request. The best way to do this currently would probably be to work with CSV or TSV files and diff the before & after files to come up with a delta to load.
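For the diff step, here is a minimal pandas sketch; it assumes both exports share identical headers (including the record id column), and the file names are placeholders:

```python
# Hypothetical diff of a "before" and "after" export to find new/changed records.
# Assumes both files have identical headers (including the record id column);
# the file names are placeholders.
import pandas as pd

# Read everything as strings so identical values compare equal.
before = pd.read_csv("before.csv", dtype=str)
after = pd.read_csv("after.csv", dtype=str)

# Merge on every column; rows found only in "after" are new or changed.
merged = after.merge(before, how="left", indicator=True)
delta = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Load only these records back into the application.
delta.to_csv("delta.csv", index=False)
```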

Related

When you create a Freeform SQL report in MicroStrategy, is it possible to do automatic mapping?

When you finish the Freeform SQL query in MicroStrategy, the next step is to map the columns.
Is there any way to do this automatically, or at least generate the list of columns with their names?
Thanks!
Sadly, this isn't possible. You will have to map all columns manually.
While this functionality isn't possible with Freeform reporting specifically, MicroStrategy Data Import will let you create Data Import cubes. These cubes can be configured as live connections, meaning they execute against the selected data source every time they are used rather than being a typical snapshot cube. Data Imports from a database can be sourced from a database query, which effectively allows you to write your own SQL, with the end result being a report for which you did not have to map the columns manually.

Is there a way to export only the Firestore delta or changes (the differences) since the last export?

I am doing a daily export of my Firestore records to Google's bucket storage using this scheduled export code (cron): https://firebase.google.com/docs/firestore/solutions/schedule-export#deploy_the_app_and_cron_job
Is there a feature, code sample, or API available yet that exports only the delta (the changes made since the last export), or does it only do full exports every time?
There is not currently a way to do incremental backups. The export mechanism described in the documentation is not really a "backup" in the way that most people would think of that word. It's just an export, to be used to make it easy to recover from disaster, or make copies to bootstrap other databases for immediate use.
Here's an approach to get just the delta data out using a temporary collection /exportData.
A reminder that the necessary read, write, and delete calls for this still fall under standard billable usage.
As part of the scheduled script, run the following async sequence (a sketch is shown after the list):
Empty collection /exportData
Run a filtered query to get the delta data
Write the delta data you want to export to collection /exportData
Export just that collection /exportData
As per the docs "You can export all documents or just specific collections." https://firebase.google.com/docs/firestore/manage-data/export-import
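A rough Python sketch of that sequence is below. The source collection name, the lastModified timestamp field, the bucket, and the project id are all assumptions; the final step uses the Firestore Admin API's export_documents call with collection_ids limited to exportData.

```python
# Hypothetical sketch of the delta-export sequence above.
# Assumes documents carry a "lastModified" timestamp field; "myData",
# the bucket name, and the project id are placeholders.
from google.cloud import firestore, firestore_admin_v1

db = firestore.Client()
export_col = db.collection("exportData")

def run_delta_export(last_export_time, project_id):
    # 1. Empty the temporary collection /exportData.
    for doc in export_col.stream():
        doc.reference.delete()

    # 2. Run a filtered query to get only documents changed since the last export.
    changed = (
        db.collection("myData")
        .where("lastModified", ">", last_export_time)
        .stream()
    )

    # 3. Write the delta documents into /exportData.
    for doc in changed:
        export_col.document(doc.id).set(doc.to_dict())

    # 4. Export just the /exportData collection to Cloud Storage.
    admin = firestore_admin_v1.FirestoreAdminClient()
    admin.export_documents(
        request={
            "name": f"projects/{project_id}/databases/(default)",
            "collection_ids": ["exportData"],
            "output_uri_prefix": "gs://my-export-bucket",
        }
    )
```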

Is it possible to copy just highlighted numbers from Tableau?

I have a Tableau workbook that connects to a database and then has several sheets that reorganize the data into different tables and graphs that I need.
If I make a sheet that has 2 rows and 1 field for example, I can't highlight the numbers and just copy them without also copying the row names for each item.
Is there a way I can copy just the numbers, nothing else?
It does not appear to be possible :(
As can be seen from the following Tableau threads:
Copy data from Text tables to clipboard
Copy single cell from view data
various incarnations of your request have already been put to the development team but have yet to make it into Tableau. I also couldn't find anything in the user documentation that describes a workaround.
There's a way to do this using Python, and probably AutoHotkey, if that's of interest; both options are hackish.
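To illustrate the Python route (a workaround, not a Tableau feature): Tableau's Copy places tab-separated text, row headers included, on the clipboard, so a small script can keep only the value column and put the result back. The pyperclip package and the value-in-last-column layout are assumptions:

```python
# Hypothetical clipboard clean-up after copying from Tableau: keep only the
# last tab-separated field of each row (assumed to be the numeric value)
# and put the cleaned text back on the clipboard.
# Assumes the pyperclip package is installed (pip install pyperclip).
import pyperclip

raw = pyperclip.paste()
values = [line.split("\t")[-1] for line in raw.splitlines() if line.strip()]
pyperclip.copy("\n".join(values))
```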

Using Postgres to replace CSV files (pandas to load data)

I have been saving files as .csv for over a year now and connecting those files to Tableau Desktop for visualization for some end-users (who use Tableau Reader to view the data).
I think I have settled on migrating to PostgreSQL, and I will be using the pandas to_sql method to fill it up.
I get 9 different files each day and I process each of them (I currently consolidate them into monthly files in .csv.bz2 format) by adding columns, calculations, replacing information, etc.
I create two massive CSV files out of those processed files using pd.concat and pd.merge, and Tableau is connected to those. These files are literally overwritten every day when new data is added, which is time consuming.
Is it okay to still do my file joins and concatenation with pandas and export the output data to postgres? This will be my first time using a real database and I am more comfortable with pandas compared to learning SQL syntax and creating views or tables. I just want to avoid overwriting the same csv files over and over (and some other csv problems I run into).
Don't worry too much about normalization. A properly normalized database will usually be more efficient and easier to handle than a non-normalized one. On the other hand, if you dump non-normalized CSV data into a database, your import functions will be a lot more complicated if you also try to do a proper normalization.
I would recommend taking one step at a time. Start by just loading the processed CSV files into Postgres. I am pretty sure all the processing that follows will be a lot easier and quicker than doing it with CSV files (just make sure you set up the right indexes). Once you get used to working with the database, you can start doing more of the processing there.
Just remember, one thing a database is really good at is picking out the subset of data you want to work on. Try as much as possible to avoid pulling huge amounts of data out of the database when you only intend to work on a subset of it.
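A minimal sketch of that first step, loading the processed daily files into Postgres with pandas and SQLAlchemy; the connection string, table name, and file pattern are placeholders:

```python
# Hypothetical load of processed CSV files into Postgres with pandas + SQLAlchemy.
# The connection string, table name, and file pattern are placeholders.
import glob

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")

for path in sorted(glob.glob("processed/*.csv.bz2")):
    df = pd.read_csv(path, compression="bz2")
    # Append each processed file instead of rewriting one giant CSV every day.
    df.to_sql("daily_data", engine, if_exists="append", index=False)
```

Tableau Desktop can then connect to the daily_data table directly, and indexes can be added later on the columns you filter on, as suggested above.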

Exporting a single large CSV from MySQL Workbench to the client machine without viewing it in the GUI?

After going through similar questions on Stack Overflow, I am unable to find a method to export a large CSV file from a query made in MySQL Workbench (v5.2).
The query returns about 4 million rows with 8 columns (about 300 MB when exported as a CSV file).
Currently I load all the rows (so I have to see them in the GUI) and then use the export option, which makes my machine crash most of the time.
My constraints are:
I am not looking for a solution via bash terminal.
I need to export it to the client machine and not the database server.
Is this a drawback of MySQL Workbench?
How do I export all the rows to a single file without having to view them in the GUI?
There is a similar question I found, but the answers don't meet the constraints I have:
"Exporting query results in MySQL Workbench beyond 1000 records"
Thanks.
In order to export to CSV you first have to load all that data, which is a lot to have in a GUI; many controls are simply not made to carry that much data. So your best bet is to avoid the GUI as much as possible.
One way could be to run your query outputting to a text window (see Query menu). This is not CSV but at least should work. You can then try to copy out the text into a spreadsheet and convert it to CSV.
If that is too much work, try limiting your rows to ranges, say 1 million each, using a LIMIT clause on your query. Lower the size until you have one that MySQL Workbench can handle. You will get n CSV files that you have to concatenate later. A small application or (depending on your OS) a system tool should be able to strip the headers and concatenate the files into one.
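A short Python sketch of that last step, assuming the chunks are named chunk_01.csv, chunk_02.csv, ... (zero-padded so they sort correctly) and each carries the same header row:

```python
# Hypothetical concatenation of LIMIT-based CSV chunks into one file,
# keeping the header row only from the first chunk. File names are placeholders;
# zero-pad the chunk numbers so lexicographic sorting matches query order.
import glob

chunks = sorted(glob.glob("chunk_*.csv"))

with open("combined.csv", "w", newline="") as out:
    for i, path in enumerate(chunks):
        with open(path, newline="") as f:
            lines = f.readlines()
        # Skip the header line on every chunk except the first.
        out.writelines(lines if i == 0 else lines[1:])
```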