In Power Query, when duplicating the source query should I duplicate the Transform File folder as well?

My apologies in advance if this question has already been asked; if so, I could not find it.
I have this huge database divided by country, where I need to import each country's database individually and then, in Power Query, append the queries as one.
When I imported the US files, Power Query automatically generated a Transform File folder with four helper queries.
Then I just duplicated the query US - Sales, named it UK - Sales, and pointed it to the UK sales folder.
The Transform File folder didn't duplicate, though.
Everything seems to be working just fine right now, but I'd like to know whether this could become a problem later, because I still have several countries to go. Should I manually import new queries as new connections instead of just duplicating them, or does it just not matter?
Many thanks!

The Transform File folder group contains the code that is called to transform a list of files; it is reusable code. You can see the Sample File, which serves as the template for the transform actions.
As long as the Sample File resolves to a file with the same structure as the files you are feeding into the command, you can use any query with any list of files.
One thing you do need to make sure of is that the Sample File is not removed from your data source. You may want to create a new dummy file just for that purpose, make sure it won't be deleted, and then point the Sample File query to pull just that file, as in the sketch below.
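For illustration, a minimal sketch of a Sample File query pinned to a fixed dummy file; the folder path and file name here are assumptions, not from the original post:

let
    Source = Folder.Files("C:\Data\US Sales"),
    // Select the dummy file by name instead of taking the first row,
    // so new or reordered files in the folder can't change the sample
    SampleFile = Source{[Name = "_sample.csv"]}[Content]
in
    SampleFile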

The Transform helper queries are special queries: you may edit them, but you cannot delete them and recreate your own manually. They are created automatically by Power Query when you combine a list of file contents, and they are inherently linked to the parent query.
That said, you cannot replicate them; you must use the Combine function provided by Power Query to create the helper queries.
You may, however, avoid duplicating the queries: replicate your steps in the parent query and union the file lists before combining the contents, so the same helper queries are reused (see the sketch below).
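A minimal sketch of that union, assuming the generated helper function is named Transform File (the folder paths are placeholders):

let
    USFiles = Folder.Files("C:\Data\US Sales"),
    UKFiles = Folder.Files("C:\Data\UK Sales"),
    // Union the file lists first, then run the one set of helper queries over all files
    AllFiles = Table.Combine({USFiles, UKFiles}),
    Combined = Table.AddColumn(AllFiles, "Data", each #"Transform File"([Content]))
in
    Combined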


How to quickly locate which sheets/dashboards contain a field?

I am creating a data dictionary, and I am supposed to track the location of every used field in a workbook. For example (Superstore sample data), I need to specify which sheets/dashboards contain the [sub-category] field.
My dataset has hundreds of measures/dimensions/calculated fields, so it is incredibly time-consuming to click into every single sheet/dashboard just to see whether a field is used there. Is there a quicker way to do this?
One robust, but not free, approach is to use Tableau's Data Catalog, which is part of the Tableau Server Data Management Add-On.
Another option is to build your own cross-reference. You could start with Chris Gerrard's Ruby libraries, described in this article: http://tableaufriction.blogspot.com/2018/09/documenting-dashboards-and-their.html

How to call a SAS dataset by its label, or where to check its name

I have a problem in dealing with SAS Enterprise Guide, which runs on my client's server.
I do not have access to the libraries, so in order to use the datasets, the only thing we can do is store them on the local C: drive of the computer and drag them into SAS.
We cannot create libraries because the server does not read local paths.
Once you drag a table into SAS, let's call it "mydata", the table is automatically renamed "mydata9865", with random numbers at the end, and "mydata" becomes its label.
If you right-click the table and go to Properties, you can't find the name of the table, just the label.
The only way I have found to check the real name of the dataset is to open the Query Builder and check the name in the code preview.
The problem is that I am dealing with tables of millions of records and the machine I am using is very slow, so whenever I want to open the Query Builder just to check the table's name, it can take up to an hour.
I am not a SAS expert, so I am sure there is a smarter way to do this. Is it possible, for instance, to use the table by calling it with its label?
data mydata2;
set mydata;
run;
instead of
set mydata9865?
Or is there some place where I can rapidly check the name of the table without going through the Query Builder?
I tried to google it but couldn't find anything; I hope someone will be able to help me!
Thank you in advance.
Hover the mouse pointer over a data node to see its attributes. The data set name is the File name: value.
For example:
In this example I had renamed the nodes created by two different queries to be the same (doable: yes; smart: maybe not). NOTE: a data node's Label: is not necessarily the same as its underlying data set's label metadata.
Regarding
use the table by calling it with its label?
Two nodes can have the same label, which is a situation that defeats this approach.
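If the label happens to be unique, though, you can look the real name up programmatically from dictionary.tables. A minimal sketch, assuming the table lives in the WORK library and carries the label 'mydata':

/* Look up the physical data set name from its label */
proc sql noprint;
    select memname into :realname
    from dictionary.tables
    where libname = 'WORK' and memlabel = 'mydata';
quit;

data mydata2;
    set work.&realname;  /* resolves to the generated name, e.g. mydata9865 */
run;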
Use the COPY task to upload your data explicitly. It sounds like you're not adding your data to the project properly, so SAS automatically assigns a name; that wouldn't happen if you explicitly imported or loaded your data.
Problem solved! I should have simply uploaded the data to the server with Tasks -> Data -> Upload Data Sets to Server, but I didn't know about this task, so I didn't know it was possible at all!
https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-sas-data-sets-from-C-drive-into-SAS-EG-not-possible/td-p/135184
Thank you everybody for your help!

Symfony: getting form values before and after form handling

Hello, I want to be able to compare values before and after form handling, so that I can process them before flush.
What I do is collect the old values in an array before handleRequest().
I then compare the new values to the old values in the array.
It works perfectly on simple variables, like strings, for instance.
However, I want this to work on uploaded files. I am able to get their full paths and names before handling the form, but when I read the values after checking whether the form is valid, I still get the same old value.
I tried both $entity->getVar() and $form->getData()->getVar(), and I get the same output.
Hello, I actually found a solution, although it is a departure from the strategy announced in my question, which I realize was somewhat truncated with regard to my objective. That objective was to compare the old file names with the new ones (the names actually include the full path) and unlink any old names that were no longer in the new name list. Basically, I wanted to clean up after a file was uploaded to replace another one, without the first one having been deleted, and to save the webmaster the hassle of sorting uniqid-named files that are still used by the web site from those that are useless.
The problem is that my upload functions, which are very similar to the file upload examples shown on the official documentation pages, only seemed to take effect at flush time.
So, since what I wanted to do with those files had nothing to do with database operations, I resorted to launching the second step after flush, which works fine.
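For what it's worth, a minimal sketch of that flow; the entity, getter, and variable names here are assumptions, not the actual code:

// Snapshot the file paths before the form touches the entity
$oldFiles = array($entity->getFilePath());

$form->handleRequest($request);

if ($form->isValid()) {
    // Upload callbacks run at flush time, so only now does the entity hold the new path
    $em->flush();
    $newFiles = array($entity->getFilePath());

    // Unlink every old file that is no longer referenced
    foreach (array_diff($oldFiles, $newFiles) as $obsolete) {
        if (is_file($obsolete)) {
            unlink($obsolete);
        }
    }
}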
However, I am intrigued by your solutions, as they are both strategies I hadn't thought of. Thank you for the suggestions.
I am not sure, though, that cloning the whole object would be as straightforward as comparing two arrays of file names.

Mongo schema: Todo-list with groups

I want to learn Mongo and decided to create a more complex todo application for learning purposes.
The basic idea is a task list where tasks are grouped in folders. Users may have different access to those folders (read, write), and tasks may be moved to other folders. Usually (especially for syncing), tasks will be requested by folder and not individually.
Basically, I thought about three approaches and would like to hear your opinion on them. Maybe I missed some points or am just thinking about it the wrong way.
A - List of References
Collections: User, Folder, Task
Folders contain references to Users
Folders contain references to Tasks
Problem
When updating a Task, a reference to its Folder is needed. Either that reference is stored within the Task (redundancy) or it must be passed with each API call.
B - Subdocuments
Collections: User, Folder
Folders contain references to Users
Tasks are subdocuments within Folders
Problem
There is no way to update a Task without knowing its Folder. Both need to be transmitted as well, but there is no redundancy compared to A.
C - References
Collections: User, Folder, Task
Folders contain references to Users
Tasks keep a reference to their Folder
Problem
Requesting a folder means searching a long list instead of following direct references (A) or just returning the folder (B).
If you don't need any metadata for the folder except its name, you could also go with:
Collections: User, Task
Task has field folder
User has arrays read_access and write_access
Then
You can get a list of all folders with
db.task.distinct("folder")
The folders a specific user can access are automatically retrieved when you retrieve the user document, so those are basically known at login.
You can get all tasks a user can read with
db.task.find( { folder: { $in: read_access } } )
with read_access being the respective array you got from your user's document. The same applies to write_access.
You can find all tasks within a folder with a simple find query for the folder name.
Renaming a folder can be achieved with one update query on each of the collections.
Creating a folder or moving a task to another folder can also be achieved in a simple manner (see the sketch below).
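As an illustration, a minimal sketch of those operations; the collection names match the queries above, the folder names and someTaskId are made up, and the arrayFilters form requires MongoDB 3.6+:

// Find all tasks within one folder
db.task.find( { folder: "inbox" } )

// Rename folder "inbox" to "archive" on the task collection...
db.task.updateMany( { folder: "inbox" }, { $set: { folder: "archive" } } )

// ...and inside the users' access arrays (repeat for write_access)
db.user.updateMany(
    { read_access: "inbox" },
    { $set: { "read_access.$[f]": "archive" } },
    { arrayFilters: [ { f: "inbox" } ] }
)

// Move a single task to another folder
db.task.updateOne( { _id: someTaskId }, { $set: { folder: "archive" } } )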
So without metadata for folders, that is what I would do. If you need metadata for folders, it can become a little more complicated, but basically you could manage it independently of the tasks and users above, using a folder collection that contains the metadata, with _id being the folder name referenced in user and task.
Edit:
Comparison of the different approaches
I stumbled over this link, which might be of interest to you. It contains a discussion of transitioning from a relational database model to Mongo. The difference is that in a relational database you usually aim for third normal form, where one of the goals is to avoid bias toward any form of access pattern, whereas in MongoDB you can model your data to best fit your access patterns (while keeping in mind not to introduce possible data anomalies through redundancy).
So with that in mind:
Your model A is the way you would do it in a relational database (each type of information in one table, referenced by id).
Model B is tailored for an access pattern where you always list a complete folder and tasks are only edited when the folder is opened (if you retrieve one folder, you have all the tasks without an additional query).
C is a different relational model than A and, I think, a little closer to third normal form (without knowing the exact tables).
My suggestion supports folder access less optimally than B but makes it easier to show and edit single tasks.
Problems that could come up with these schemas: since A and C are basically relational, you can run into foreign-key problems, because MongoDB does not enforce foreign-key constraints (e.g. in C you could delete a folder while tasks still reference it, and in A you could delete a task without deleting its reference in the folder). You could circumvent this by enforcing the constraints from the application. For B, the 16 MB document limit could become a problem, circumventable by allowing folders to split into multiple documents when they reach a certain task count.
So, a new conclusion: I think A and C might not show you the advantages of MongoDB (and might even be more work to build in MongoDB than in SQL), since they are what you would do in a traditional relational database, which is what MongoDB was not designed for (e.g. the missing join statement, no foreign-key constraints). In sum, B best matches your access pattern "Usually (especially for syncing) tasks will be requested by folder" while still allowing you to easily edit and move tasks once the folder is opened.

Executing list of .sql files in certain order

I have a directory containing a number of .sql files. I am able to execute them, but I want to execute them in a certain order. For example, if I have four files xy.sql, dy.sql, trim.sql and see.sql, I want to execute them in the order see.sql, dy.sql, trim.sql, xy.sql. What happens now is that I get a list of the files using a DirectoryInfo object; I then need to sort them into my order. I am using C# 3.5.
Thanks
It might be better to rename your files so that they sort into the correct order natively. This prevents having to maintain a separate "execution order" list somewhere.
Using a common prefix for the sql file names is a bit self-documenting as well, e.g.
exec1_see.sql
exec2_dy.sql
exec3_trim.sql and
exec4_xy.sql
It's unclear what the ordering algorithm for your filenames is; it doesn't appear to be entirely dictated by the filename (such as alphabetical ordering).
If there is an arbitrary order in which you must execute these scripts, I would recommend that you create an additional file which defines that order, and use it to drive the ordering.
If the list of filenames is known at compile time, you could just hard-code that order in your code. I'm assuming from your question, however, that the set of files is likely to change and new ones may be added. If that's the case, I refer you to my previous paragraph.
You can obtain a List<string> of all your .sql files, call the List<T>.Sort method (or a LINQ OrderBy with a custom key) to sort it, and then operate on the files in the newly sorted sequence.
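A minimal sketch of that approach, assuming the desired order lives in a hard-coded list; the directory path is a placeholder, and the Console.WriteLine stands in for actual execution:

// Compiles against .NET 3.5: sorts .sql files by an explicit execution order
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ScriptOrderer
{
    static void Main()
    {
        // The desired execution order from the question
        var order = new List<string> { "see.sql", "dy.sql", "trim.sql", "xy.sql" };

        var files = new DirectoryInfo(@"C:\scripts")       // placeholder path
            .GetFiles("*.sql")
            .OrderBy(f => order.IndexOf(f.Name))           // unknown files (-1) sort first
            .ToList();

        foreach (var file in files)
            Console.WriteLine("Executing " + file.FullName);  // replace with real execution
    }
}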
Number them, and use a tool like my SimpleScriptRunner or the Tarantino DB change tool.