Scrumwise can export its whole data as XML. This contains lots of data including Projects, Backlog, Sprints, Tasks, Team Members, etc. It can also export Tasks as CSV.
GreenHopper can import Projects in various formats (but not XML).
I'd like to transfer as much as possible between Scrumwise and GreenHopper. I'm thinking of extracting the Projects node from the XML, converting to JSON, and importing that. Right now GreenHopper rejects the data right from the start.
Is there a reference to the data schema used in GreenHopper? I'd like to transfer more than just the Project, but all its associated data.
Select all the backlog tasks and export. Scrumwise uses the semicolon as the delimiter. Convert by search and replace and JIRA will at least try to import. Import into a new empty project, expect it to take several attempts to get it right. You'll need to add columns for time remaining in seconds. Getting the statuses and resolution was the hardest part. JIRA gives you status options that will fail during import! Check the jira workflow for valid states. Also, an empty resolution means "unresolved", an unrecognized resolution defaults to "resolved". For tasks, you will need to export as XML, parse the XML for all tasks and add a parent ID column.
Related
My apologies in advance if this question has already been asked, if so I cannot find it.
So, I have this huge data base divided by country where I need to import from each country data base individually and then, in Power Query, append the queries as one.
When I imported the US files, the Power Query automatically generated a Transform File folder with 4 helper queries:
Then I just duplicated the query US - Sales and named it as UK - Sales pointing it to the UK sales folder:
The Transform File folder didn't duplicate, though.
Everything seems to be working just fine right now, however I'd like to know if this could be problem in the near future, because I still have several countries to go. Should I manually import new queries as new connections instead of just duplicating them or it just doesn't matter?
Many thanks!
The Transform Files Folder group contains the code that is called to transform a list of files. It is re-usable code. You can see the Sample File, which serves as the template for the transform actions.
As long as the file that is arrived at for the Sample File has the same structure as the files that you are feeding into the command, then you can use any query with any list of files.
One thing you need to make sure is that the Sample File is not removed from your data source. You may want to create a new dummy file just for that purpose, make sure it won't be deleted, and then point the Sample File query to pull just that file.
The Transform Helper Queries are special queries that you may edit the queries, but you cannot delete and recreate your own manually. They are automatically created by PQ when combining list of contents and are inherently linked to the parent query.
That said, you cannot replicate them, and must use the Combine function provided by PQ to create the helper queries.
You may however, avoid duplicating the queries, instead replicate your steps in the parent query, and use table union to join the list before combining the contents with the same helper queries.
I am creating a data dictionary and I am supposed to track the location of any used field in a workbook. For example (superstore sample data), I need to specify which sheets/dashboards have the [sub-category] field.
My dataset has hundreds of measures/dimensions/calc fields, so it's incredibly time exhaustive to click into every single sheet/dashboard just to see if a field exists in there, so is there a quicker way to do this?
One robust, but not free, approach is to use Tableau's Data Catalog which is part of the Tableau Server Data Management Add-On
Another option is to build your own cross reference - You could start with Chris Gerrard's ruby libraries described in the article http://tableaufriction.blogspot.com/2018/09/documenting-dashboards-and-their.html
I have two separate Data flows in Azure Data Factory, and I want to combine them into a single Data flow.
There is a technique for copying elements from one Data flow to another, as described in this video: https://www.youtube.com/watch?v=3_1I4XdoBKQ
This does not work for Source or Sink stages, though. The Script elements do not contain the Dataset that the Source or Sink is connected to, and if you try to copy them, the designer window closes and the Data flow is corrupted. The details are in the JSON, but I have tried copying and pasting into the JSON and that doesn't work either - the source appears on the canvas, but is not usable.
Does anyone know if there is a technique for doing this, other than just manually recreating the objects on the canvas?
Thanks Leon for confirming that this isn't supported, here is my workaround process.
Open the Data Flow that will receive the merged code.
Open the Data Flow that contains the code to merge in.
Go through the to-be-merged flow and change the names of any transformations that clash with the names of transformations in the target flow.
Manually create, in the target flow, any Sources that did not already exist.
Copy the entire script out of the to-be-merged flow into a text editor.
Remove the Sources and Sinks.
Copy the remaining transformations into the clipboard, and paste them in to the target flow's script editor.
Manually create the Sinks, remembering to set all properties such as "Allow Update".
Be prepared that, if you make a mistake and paste in something that is not correct, then the flow editor window will close and the flow will be unusable. The only way to recover it is to refresh and discard all changes since you last published, so don't do this if you have other unpublished changes that you don't want to lose!
I have already established a practice in our team that no mappings are done in Sinks. All mappings are done in Derived Column transformations, and any column name ambiguity is resolved in a Select transformations, so the Sink is always just auto-map. That makes operations like this simpler.
It should be possible to keep the Source definitions in Step 6, remove the Source elements from the target script, and paste the new Sources in to replace them, but that's a little more complex and error-prone.
I am experimenting with Azure Data Factory to replace some other data-load solutions we currently have, and I'm struggling with finding the best way to organize and parameterize the pipelines to provide the scalability we need.
Our typical pattern is that we build an integration for a particular Platform. This "integration" is essentially the mapping and transform of fields from their data files (CSVs) into our Stage1 SQL database, and by the time the data lands in there, the data types should be set properly and the indexes set.
Within each Platform, we have Customers. Each Customer has their own set of data files that get processed in that Customer context -- within the scope of a Platform, all Customer files follow the same schema (or close to it), but they all get sent to us separately. If you looked at our incoming file store, it might look like (simplified, there are 20-30 source datasets per customer depending on platform):
Platform
Customer A
Employees.csv
PayPeriods.csv
etc
Customer B
Employees.csv
PayPeriods.csv
etc
Each customer lands in their own SQL schema. So after processing the above, I should have CustomerA.Employees and CustomerB.Employees tables. (This allows a little bit of schema drift between customers, which does happen on some platforms. We handle it later in our stage 2 ETL process.)
What I'm trying to figure out is:
What is the best way to setup ADF so I can effectively manage one set of mappings per platform, and automatically accommodate any new customers we add to that platform without having to change the pipeline/flow?
My current thinking is to have one pipeline per platform, and one dataflow per file per platform. The pipeline has a variable, "schemaname", which is set using the path of the file that triggered it (e.g. "CustomerA"). Then, depending on file name, there is a branching conditional that will fire the right dataflow. E.g. if it's "employees.csv" it runs one dataflow, if it's "payperiods.csv" it loads a different dataflow. Also, they'd all be using the same generic target sink datasource, the table name being parameterized and those parameters being set in the pipeline using the schema variable and the filename from the conditional branch.
Are there any pitfalls to setting it up this way? Am I thinking about this correctly?
This sounds solid. Just be aware that you if you define column-specific mappings with expressions that expect those columns to be present, you may have data flow execution failures if those columns are not present in your customer source files.
The ways to protect against that in ADF Data Flow is to use column patterns. This will allow you to define mappings that are generic and more flexible.
I'm trying to work out how I can export individual Cognos Reports via the command line, for the purposes of source versioning in Git at a report-by-report level. I presume XML would be the output format.
I read that the Cognos SDK can help but you need to build your own solution, which may be possible but this use case feels like something many others would already want and there'd be tooling already.
Of course, importing the individual report would also be needed.
Can anyone help here please?
Thanks.
If your end game is version control (Who changed what, when?), you should look into MotioCI. Last time I looked, there was no free version of MotioCI.
You can use tools like the ones provided by companies like http://www.motio.com. With the free version you can export the XML of the reports but only one by one.
You can also use a Cognos deployment of the reports that generates a zip file with the XML of the reports, but all the reports are in the same file and you will have to extract the XML of the individual reports by hand.
I found the SDK to be cumbersome and, when I got it working, slow.
Yes, report specs are XML.
I have created a process that produces output like what you are asking for. Here's what it involves:
A recursive common table expression (CTE) query to get the report
specs along with the folder structure as seen in Cognos.
A PowerShell script to run the query and write the results to the file system.
Another PowerShell script to pull the current content from the remote git repo, run the first PowerShell script, then add, commit, and push the results up to the remote git repo.
I also wrote a PowerShell script to perform the operations associated with git push. This involves using a program I found called HTML Tidy (http://tidy.sourceforge.net/) that can be used to make the XML human-readable. This helps with diffs in git. I use TFS, so I get a nice, side-by-side diff if I have tidied the XML. (Otherwise, it tells me the only line of XML has changed.)
I recently added output for dashboards (exploration) and data sets (dataSet2). Dashboards are stored as JSON, so my routine had to tidy that (simple in PowerShell).
I run my routine daily, getting new and modified content from the last 3 days (just in case), and weekly to do an entire dump (to capture the deletes). The weekly process takes about six minutes. The daily process is negligible.
Before you ask: I hesitate to provide actual code because I can't take any responsibility for your system.
Updates:
Hacking away at the Content Store database is not recommended and it is not supported by IBM.
For reference/comparison: I'm running IBM Cognos 11.0.7 on IIS on Windows 2012 R2 with the Content Store database on MS SQL Server 2016. Your system may be different.
Additional Resources
https://www.cognoise.com/index.php/topic,28289.msg113869.html#msg113869
https://www.cognoise.com/index.php/topic,17411.msg50409.html#msg50409
https://learn.microsoft.com/en-us/powershell/scripting/overview?view=powershell-6
https://learn.microsoft.com/en-us/sql/t-sql/language-reference?view=sql-server-2017
https://git-scm.com/docs
http://tidy.sourceforge.net/