Import data to Cassandra and create the primary key on import

I've got some CSV data to import into Cassandra. This could work with the COPY command. The problem is that the CSV doesn't provide a unique ID for the data, so I need to create a timeuuid on import.
Is it possible to do this via the COPY command, or do I need to write an external script for the import?

I would write a quick script to do it; the COPY command can really only handle small amounts of data anyway. Try the new Python driver. I find it quick to set up loading scripts with, especially if you need any sort of minor modification of the data before it is loaded.
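For illustration, a minimal sketch of such a loading script with the Python driver; the keyspace, table and column names below are placeholders, not from the question:

    # Minimal sketch, assuming a keyspace "mykeyspace" and a table
    # "mytable (id timeuuid PRIMARY KEY, col_a text, col_b text)".
    import csv
    import uuid
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('mykeyspace')

    insert = session.prepare(
        "INSERT INTO mytable (id, col_a, col_b) VALUES (?, ?, ?)")

    with open('data.csv', newline='') as f:
        for row in csv.reader(f):
            # uuid.uuid1() is a version-1 (time-based) UUID, which is what
            # Cassandra's timeuuid type stores.
            session.execute(insert, (uuid.uuid1(), row[0], row[1]))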
If you have a really big set of data, bulk loading is still the way to go.

Related

Any way to import a CSV file to PostgreSQL without creating new tables from scratch?

I have two files and want to import them into PostgreSQL using pgAdmin 4. The only way I know is to create the tables and define the columns and everything else manually from scratch, but this will take a lot of time, and I have to hope I didn't miss anything. Is there a way to simply import the files so that PostgreSQL treats the headers as the columns and creates the tables for me, saving me tons of time? I'm new to all of this, so my question may be phrased poorly, but it's the best I can ask.
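One possible approach, as a hedged sketch rather than a pgAdmin-specific answer: pandas (used elsewhere in this thread) can take the header row as column names and create the table for you via to_sql. Connection string, file and table names below are placeholders:

    import pandas as pd
    from sqlalchemy import create_engine  # pip install sqlalchemy psycopg2-binary

    engine = create_engine('postgresql://user:password@localhost:5432/mydb')

    for path, table in [('file1.csv', 'file1'), ('file2.csv', 'file2')]:
        df = pd.read_csv(path)                      # header row becomes column names
        df.to_sql(table, engine, if_exists='replace', index=False)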

Working with PowerShell and file-based DB operations

I have a scenario where a CSV file lists a lot of files I need to run operations on. The script needs to handle being stopped or failing, and then continue from where it left off. In a database scenario this would be fairly simple: I would have an "updated" column and set it once the operation for that line has completed. I have looked into whether I could somehow update the CSV on the fly, but I don't think that is possible. I could start keeping multiple files, but that is not very elegant. Can anyone recommend some kind of simple file-based, DB-like framework where I could, from PowerShell, create a new database file (maybe JSON), read from it, and update it on the fly?
If your problem is really so complex that you actually need something like a local database solution, then consider going with SQLite, which was built for such scenarios.
In your case, since you process the CSV row by row, I assume storing the info for the current row only (line number, status, etc.) will be enough.
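To make the idea concrete, a minimal sketch of such a checkpoint table, shown here with Python's built-in sqlite3 module for brevity (the same schema can be driven from PowerShell, for example via the PSSQLite module). File, table and column names are illustrative:

    # Sketch of a resumable loop: each processed CSV line is recorded in
    # SQLite, so a restarted run can skip what is already done.
    import csv
    import sqlite3

    db = sqlite3.connect('progress.db')
    db.execute("CREATE TABLE IF NOT EXISTS progress ("
               "line_number INTEGER PRIMARY KEY, status TEXT)")

    done = {row[0] for row in
            db.execute("SELECT line_number FROM progress WHERE status = 'done'")}

    with open('files.csv', newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            if i in done:
                continue                   # resume where the last run stopped
            # ... do the per-file operation described in the question here ...
            db.execute("INSERT OR REPLACE INTO progress VALUES (?, 'done')", (i,))
            db.commit()                    # persist progress after every row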

Using postgres to replace csv files (pandas to load data)

I have been saving files as .csv for over a year now and connecting those files to Tableau Desktop for visualization for some end-users (who use Tableau Reader to view the data).
I think I have settled on migrating to PostgreSQL, and I will be using the pandas library's to_sql to fill it up.
I get 9 different files each day and I process each of them (I currently consolidate them into monthly files in .csv.bz2 format) by adding columns, calculations, replacing information, etc.
Out of those processed files I create two massive CSV files using pd.concat and pd.merge, and Tableau is connected to them. These files are literally overwritten every day when new data is added, which is time-consuming.
Is it okay to still do my file joins and concatenation with pandas and export the output data to Postgres? This will be my first time using a real database, and I am more comfortable with pandas than with learning SQL syntax and creating views or tables. I just want to avoid overwriting the same CSV files over and over (and some other CSV problems I run into).
Don't worry too much about normalization. A properly normalized database will usually be more efficient and easier to handle than a non-normalized one. On the other hand, if you have non-normalized CSV data that you dump into a database, your import functions will be a lot more complicated if you do a proper normalization. I would recommend you take one step at a time. Start by just loading the processed CSV files into Postgres. I am pretty sure all processing after that will be a lot easier and quicker than doing it with CSV files (just make sure you set up the right indexes). Once you get used to using the database, you can start doing more of the processing there.
Just remember, one thing a database is really good at is picking out the subset of data you want to work on. Try as much as possible to avoid pulling huge amounts of data out of the database when you only intend to work on a subset of it.
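As a rough sketch of that workflow; the table, column and connection names below are placeholders, not from the question:

    # Append each day's processed file instead of overwriting CSV outputs,
    # index the filter column, and query only the subset you need.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine('postgresql://user:password@localhost:5432/mydb')

    df = pd.read_csv('processed_daily_file.csv.bz2')   # pandas reads .bz2 directly
    df.to_sql('daily_data', engine, if_exists='append', index=False)

    with engine.begin() as conn:                        # right index = fast subset queries
        conn.execute(text("CREATE INDEX IF NOT EXISTS idx_daily_data_date "
                          "ON daily_data (report_date)"))

    subset = pd.read_sql("SELECT * FROM daily_data "
                         "WHERE report_date >= '2024-01-01'", engine)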

Import to PostgreSQL from CSV, filtering each line

I have the following question (even thorough research couldn't help me):
I want to import data from a (rather large) CSV/TXT file into a PostgreSQL DB, but I want to filter each line (before importing it) based on specific criteria.
What command/solution can I use?
On a side note: if I am not reading from a file but from a data stream, what is the relevant command/procedure?
Thank you all in advance and sorry if this has been in some answer/doc that I have missed!
Petros
To explain the staging table approach, which is what I use myself (a sketch follows the steps below):
1. Create a table (it could be a temporary table) matching your CSV structure.
2. Import into that table, doing no filtering.
3. Process and import your data into the real tables, using SQL to filter and transform.
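A minimal sketch of those three steps with psycopg2; the table names, column names and filter are placeholders:

    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect("dbname=mydb user=me")
    cur = conn.cursor()

    # 1. Staging table matching the CSV layout
    cur.execute("CREATE TEMP TABLE staging (col_a text, col_b int, col_c date)")

    # 2. Bulk-load the whole file, no filtering yet
    with open('data.csv') as f:
        cur.copy_expert("COPY staging FROM STDIN WITH (FORMAT csv, HEADER true)", f)

    # 3. Filter while moving rows into the real table
    cur.execute("INSERT INTO real_table (col_a, col_b, col_c) "
                "SELECT col_a, col_b, col_c FROM staging "
                "WHERE col_b > 100")       # placeholder filter criteria

    conn.commit()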
Now, in PostgreSQL, you could also use file_fdw to give you direct SQL access to CSV files. In general the staging table solution will usually be cleaner, but you can also do this by essentially letting PostgreSQL treat the file as a table, going through a foreign data wrapper.

Filemaker Pro Advanced - Scripting import from ODBC with variable target table

I have several tables I'm importing from ODBC using the import script step. Currently, I have an import script for each and every table. This is becoming unwieldy as I now have nearly 200 different tables.
I know I can calculate the SQL statement to say something like "Select * from " & $TableName. However, I can't figure out how to set the target table without specifying it in the script. Please, tell me I'm being dense and there is a good way to do this!
Thanks in advance for your assistance,
Nicole Willson
Integrated Research
Unfortunately, the target table of an import has to be hard-coded in FileMaker (up through version 12) if you're using the Import Records script step. I can think of a workaround, but it's rather convoluted and, if you're importing a large number of records, would probably significantly increase the time it takes to import them.
The workaround would be to not use the Import Records script step, but to script the creation of records and the population of data into fields yourself.
First of all, the success of this would depend on how you're using ODBC. As far as I can tell, it would only work if you're using ODBC to create shadow tables within FileMaker so that FileMaker can access the ODBC database via other script steps. I'm not an expert on FileMaker's other ODBC facilities, so I don't know whether this workaround would help in other cases.
So, if you have shadow tables for the remote ODBC database, you can use a script something like the following. The basic idea is to have two sets of layouts: one for the shadow tables that the information is coming from, and another for the FileMaker tables that the information needs to go to. Loop through this list, pulling information from the shadow table into variables (or something like the dictionary library I wrote, which you can find at https://github.com/chivalry/filemaker-dictionary). Then go to the layout linked to the target table, create a record and populate the fields.
This isn't a novice technique, however. In addition to using variables and loops, you're also going to have to use FileMaker's design functions to determine the source and destination of each field and Set Field By Name to put the data in the right place. But as far as I can tell, it's the only way to dynamically target tables for importing data.