FILESTREAM file manipulations - filestream

Our application stores user files on physical drives (albeit a large number of them). The files are organised and grouped into folders (one folder for each user). The application also manipulates the folders by creating subfolders within them and grouping files into those subfolders. The physical location of each file is stored in the SQL Server database.
As you can gauge, the application is tightly coupled to the physical filesystem. We are looking to migrate the file management operations to SQL Server FILESTREAM. However, as I understand it, FILESTREAM does not allow one to create a file hierarchy consisting of folders and subfolders, nor does it allow me to rename files.
Is this true of FILESTREAM? Is there any other option for using FILESTREAM to manage the physical files without me having to change my application logic considerably?

See here for a basic overview of the FILESTREAM feature.
When using FILESTREAM, you let the SQL Server engine control the placement and naming of the NTFS files. The physical paths are abstracted by the engine, and therefore clients cannot directly manipulate or rename them.
The idea is that if your scenario organizes the blobs into a hierarchy, that hierarchy needs to be implemented at the relational level.
Since opening FILESTREAM data has to happen through a special API (and involves the SQL Server engine), your existing client code would have to be changed to use this API rather than opening NTFS files directly.
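To give a rough idea of what "hierarchy at the relational level" can look like, here is a minimal T-SQL sketch (table and column names are hypothetical, and it assumes the database already has a FILESTREAM filegroup): the folder path and file name live in ordinary columns, so "renaming" or "moving" a file becomes a metadata UPDATE, while the blob itself sits in a FILESTREAM column whose on-disk placement is owned entirely by the engine.

-- Hypothetical table: hierarchy and file name are relational data,
-- the blob is a FILESTREAM column managed by SQL Server.
CREATE TABLE dbo.UserFiles
(
    FileId     UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    UserId     INT            NOT NULL,
    FolderPath NVARCHAR(400)  NOT NULL,   -- e.g. N'/reports/2014/'
    FileName   NVARCHAR(255)  NOT NULL,
    FileData   VARBINARY(MAX) FILESTREAM NULL
);

-- "Renaming" or "moving" a file is just an UPDATE of the metadata columns;
-- the NTFS file SQL Server created underneath is never touched by the client.
UPDATE dbo.UserFiles
SET    FolderPath = N'/archive/2014/',
       FileName   = N'report_final.pdf'
WHERE  FileId = '6F9619FF-8B86-D011-B42D-00C04FC964FF';  -- hypothetical key value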

Related

Is it possible to truncate sys_file_processedfile?

I am stuck with a sys_file_processedfile table with more than 200,000 entries. Is it possible to truncate the table and empty the folder /fileadmin/_processed_ without breaking anything?
Thanks!
It is possible.
In Admin Tools (the Install Tool), under Maintenance, there is a card named Remove Temporary Assets which you should use to do so.
TYPO3 stores processed files and cached images in a dedicated directory. This directory is likely to grow quickly.
With this action you can delete the files in this folder. Afterwards, you should also clear the cache database tables.
The File Abstraction Layer additionally stores a database record for every file it needs to process (e.g. image thumbnails). If you have modified some graphics settings (All Configuration [GFX]) and need all processed files to be regenerated, you can use this tool to remove the "processed" ones.
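If you prefer to do it by hand (which is roughly what that card does for the database side), the table can simply be truncated; the records are regenerated on demand. A hedged sketch, assuming direct access to the TYPO3 database:

-- Sanity check, then clear all processed-file records; TYPO3 recreates them as needed.
SELECT COUNT(*) FROM sys_file_processedfile;
TRUNCATE TABLE sys_file_processedfile;

The files under /fileadmin/_processed_/ can then be removed on the filesystem, and the caches cleared as described above.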

How Postgres achieves database-level isolation

Per the Postgres documentation,
Databases are physically separated and access control is managed at the connection level.
Are there any further details on how Postgres achieves this physical isolation? Are the files used to store data in the backend totally separate?
Yes. Each table is stored as a separate file (actually, multiple files). Different databases are in different directories. Indexes, etc. are also in one or more separate files.
However, there's a lot of shared state. Some system tables are shared between all databases. The write-ahead log (WAL) is also shared, as is the commit log (pg_clog). So you cannot just extract one database's files and attach it to another PostgreSQL instance. They're meaningless without some of the shared files.
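You can see this layout from SQL itself; a small sketch using built-in catalog views and functions (the table name is just an example):

-- Each database maps to a directory named after its OID under base/ in the data directory.
SELECT oid, datname FROM pg_database;

-- Path of a table's main data file, relative to the data directory
-- ('my_table' is a hypothetical table in the current database).
SELECT pg_relation_filepath('my_table');

-- Location of the data directory itself.
SHOW data_directory;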

Using Data Compare to copy one database over another

I've used the Data Compare tool to update the schema between the same DBs on different servers, but what if so many things have changed (including data) that I simply want to REPLACE the target database?
In the past I've just used T-SQL: taken a backup, then restored it onto the target with the REPLACE option and/or MOVE if the data and log files are on different drives. I'd rather have an easier way to do this.
You can use Schema Compare (also by Red Gate) to compare the schema of your source database to a blank target database (and update), then use Data Compare to compare the data in them (and update). This should leave you with the target the same as the source. However, it may well be easier to use the backup/restore method in that instance.
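For reference, the backup/restore route mentioned in the question looks roughly like this in T-SQL (database names, logical file names and paths are placeholders; check the real logical names with RESTORE FILELISTONLY):

-- Back up the source database.
BACKUP DATABASE SourceDb
TO DISK = N'D:\Backups\SourceDb.bak'
WITH INIT;

-- List the logical file names inside the backup (needed for the MOVE clauses).
RESTORE FILELISTONLY FROM DISK = N'D:\Backups\SourceDb.bak';

-- Overwrite the target, relocating data and log files to different drives.
RESTORE DATABASE TargetDb
FROM DISK = N'D:\Backups\SourceDb.bak'
WITH REPLACE,
     MOVE N'SourceDb'     TO N'E:\Data\TargetDb.mdf',
     MOVE N'SourceDb_log' TO N'F:\Logs\TargetDb.ldf';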

Tableau TDE or connect to files directly?

I have a personal license for Tableau. I am using it to connect to .csv and .xlsx files currently but am running into some issues.
1) The .csv files are massive (10+ gig)
2) The Excel files are starting to reach the 1 million row limit
3) I need to add certain columns to the .csv files sometimes (like a unique ID and a few formulas), which means that I need to open sections of them in Excel, modify what I need to, then save a new file
Would it be better to create an extract for each of these files and then connect the Tableau Workbook to the extract instead of the file? Currently I am connected directly to the files, extract data from there, and refresh every day.
I don't know about others, but I'm using exactly that guideline. I'll have some Workbooks that simply serve to extract data from some data source (be it SQL, xlsx, csv, mdb, or any other), and all analysis will be performed in other Workbooks that connect only to tdes.
The advantages are:
1) Whenever you need to update a data source, you only need to update it once (and replace the tde file) and all your workbooks will be up to date. If you connect to the same data source and extract to different tde files, you'll have to refresh all those different tde files (and worry about whether the extract in that specific Workbook was updated). And even if you extract to the same tde (which doesn't make much sense), it can be confusing (am I connected to the tde or to the file? Did the extract I made in the other workbook update this one too? Well, yes it did, but it can be confusing).
2) You don't have to worry about replacing a datasource, especially when it's a csv, xlsx or mdb file. You can keep many different versions of those files, and choose which one is the best one. For instance, I'll have table_v1.mdb, table_v2.mdb, ..., and a single table_v1.tde, which will be the extract of one of those mdb files. And I still have the previous versions in case I need them.
3) When you have a SQL connection, or anything that is not a file (csv, xlsx, mdb), extracts are very handy for basically the same reasons above, with (at least) one upside. You don't need to connect to a server every time you want to perform an analysis. That means you can do everything offline, and the person using Tableau doesn't need to have access to the SQL table (or any other source).
One good practice is always keeping a back-up when updating a tde (because, well, shit happens)
10 gig csv, wow. Yes, you should absolutely use a data extract, that would be much quicker. For that much data you could look at other connections such as MS Access or a SQL instance.
If your data have that many rows, I would try to set up a small MySQL instance on your local machine and keep the data there instead. You would be able to connect Tableau directly to the MySQL instance and would be able to easily edit the source data.
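If you go the MySQL route, a minimal sketch of loading one of those big CSVs looks like this (table layout, column names and the file path are assumptions; adjust them to the actual data):

-- Hypothetical staging table; the auto-increment key covers the "unique ID" need.
CREATE TABLE sales_raw (
    row_id     BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_date DATE,
    region     VARCHAR(50),
    amount     DECIMAL(12,2)
);

-- Bulk-load the CSV once; Tableau then connects to MySQL (or to an extract of it).
LOAD DATA LOCAL INFILE '/path/to/sales.csv'
INTO TABLE sales_raw
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(order_date, region, amount);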

Multiple scripts accessing common data file in parallel - possible?

I have some Perl scripts on a Unix-based server which access a common text file containing server IPs and login credentials, which are used to log in and perform routine operations on those servers. Currently, these scripts are run manually at different times.
I would like to know that if I cron these scripts to execute at the same time, will it cause any issues with accessing data from the text file (file locking?), since all scripts will essentially be accessing the data file at the same time?
Also, is there a better way to do it (without using a DB - since I can't, due to some server restrictions) ?
It depends on the kind of access.
There is no problem with reading the data file from multiple processes. If you want to update the data file while it may be being read, it's better to do it atomically (e.g. write a new version under a different name, then rename it).