Best practices for Informatica Webservice workflow

I have created an Informatica web service workflow which takes one parameter as input. A Web Service Provider source definition is used for this, and the mapping is a one-way type.
The workflow works fine when the parameter is passed. But when the same workflow is triggered from Informatica PowerCenter directly (in which case no parameter is passed), the mapping that contains the Web Service Provider source definition takes 3 minutes to complete (the log shows a timeout-based commit point).
Is it a good practice to run the web service workflow from PowerCenter directly? And is there a way to improve its performance when triggered from PowerCenter directly?
Note: I am trying to use one workflow for both cases - 1) passing the parameter via the web service call, and 2) scheduling the workflow in Informatica.

Answers to your questions below.
Is it a good practice to run the web service workflow from PowerCenter directly?
It depends on the requirement, of course - whether or not you need to extract data from the web service automatically. If you pass the parameter using a session, I don't see much of an issue here, as long as the session completes in time.
You can create a new session, command task, or shell script that generates a parameter file and then use that file in the original session, so the value is passed on to the web service.
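For illustration, a PowerCenter parameter file for this setup might look like the sketch below; the folder, workflow, session, and parameter names are assumptions, so substitute your own:

    [MyFolder.WF:wf_webservice_load.ST:s_m_webservice_load]
    $$InputParam=value_to_pass_to_the_web_service

The command task or shell script would regenerate this file before each run, and the session would pick it up through its parameter file setting.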
In a more complex scenario you may have to pass multiple values; in that case I would recommend a parent workflow that calls the original workflow multiple times, changing the parameter before each call.
Is there a way to improve its performance when triggered from PowerCenter directly?
It really depends on a few factors.
The web service - make sure you are using the correct input and output columns. Web services are often sensitive to how they are called, and you need to choose the right columns to extract data with good performance. Work with the web service admin to identify the correct columns.
If the Informatica flow itself is complex, then depending on the bottleneck transformation(s) (source, target, expression, lookup, aggregator, sorter), you can investigate and take action.
For a lookup, you can add a filter to exclude unwanted data, remove unused columns, etc.
For an aggregator, you can place a sorter before it to improve performance.
...and so on.

Related

Best practices for parameterizing load of multiple CSV files in Data Factory

I am experimenting with Azure Data Factory to replace some other data-load solutions we currently have, and I'm struggling with finding the best way to organize and parameterize the pipelines to provide the scalability we need.
Our typical pattern is that we build an integration for a particular Platform. This "integration" is essentially the mapping and transform of fields from their data files (CSVs) into our Stage1 SQL database, and by the time the data lands in there, the data types should be set properly and the indexes set.
Within each Platform, we have Customers. Each Customer has their own set of data files that get processed in that Customer context -- within the scope of a Platform, all Customer files follow the same schema (or close to it), but they all get sent to us separately. If you looked at our incoming file store, it might look like (simplified, there are 20-30 source datasets per customer depending on platform):
Platform
  Customer A
    Employees.csv
    PayPeriods.csv
    etc.
  Customer B
    Employees.csv
    PayPeriods.csv
    etc.
Each customer lands in their own SQL schema. So after processing the above, I should have CustomerA.Employees and CustomerB.Employees tables. (This allows a little bit of schema drift between customers, which does happen on some platforms. We handle it later in our stage 2 ETL process.)
What I'm trying to figure out is:
What is the best way to setup ADF so I can effectively manage one set of mappings per platform, and automatically accommodate any new customers we add to that platform without having to change the pipeline/flow?
My current thinking is to have one pipeline per platform, and one dataflow per file per platform. The pipeline has a variable, "schemaname", which is set using the path of the file that triggered it (e.g. "CustomerA"). Then, depending on file name, there is a branching conditional that will fire the right dataflow. E.g. if it's "employees.csv" it runs one dataflow, if it's "payperiods.csv" it loads a different dataflow. Also, they'd all be using the same generic target sink datasource, the table name being parameterized and those parameters being set in the pipeline using the schema variable and the filename from the conditional branch.
Are there any pitfalls to setting it up this way? Am I thinking about this correctly?
This sounds solid. Just be aware that if you define column-specific mappings with expressions that expect those columns to be present, you may get data flow execution failures when those columns are missing from a customer's source files.
The way to protect against that in ADF Data Flow is to use column patterns, which let you define mappings that are generic and more flexible.
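For reference, a rough sketch of the pipeline expressions the design above implies; the parameter, variable, and dataset parameter names here are assumptions, not anything ADF mandates:

    Storage event trigger, pipeline parameter mapping:
        folderPath = @triggerBody().folderPath
        fileName   = @triggerBody().fileName

    Set Variable "schemaname" (e.g. "Platform/CustomerA" -> "CustomerA"):
        @last(split(pipeline().parameters.folderPath, '/'))

    Branch condition for the Employees dataflow:
        @equals(toLower(pipeline().parameters.fileName), 'employees.csv')

    Parameters passed to the single generic sink dataset:
        schemaName = @variables('schemaname')
        tableName  = @replace(pipeline().parameters.fileName, '.csv', '')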

Workflow supporting multiple scenarios

I am building a base workflow that will support around 25 customers.
All customers fit the same basic workflow, but each one has slightly different requirements - say, one customer wants to send an email and another one doesn't.
What I am thinking of doing:
1- Make one workflow and, wherever the requirements differ, add a switch that checks who
the user is and routes each user to their requirements.
(Advantages) This is powerful for maintenance, and any common requirement is
easy to add.
(Disadvantages) The number of customers may grow to something like 100, each slightly
different, so we would end up with 100 users on one workflow, each with their own
small variations.
2- Make a different workflow for each customer, which means I would have 100 workflows
in the future and, at declaration time, would instantiate the object from the specific
workflow related to the current user.
(Advantages) Each workflow is separate.
(Disadvantages) It is hard to add even a simple feature, since it means writing the same
thing 100 times, which is not maintainable.
So what do I need?
I want to know whether these are the only approaches for this situation, or whether I am missing another technique.
One way would be to break your workflow out into smaller parts, each of which does a specific thing. You could organize a layout like the following to support multiple variations of the inbound request.
Customer1-Activity.xaml
- Common-Activity1.xaml
- Common-Activity2.xaml
Customer2-Activity.xaml
- Common-Activity1.xaml
- Common-Activity2.xaml
For any new customer, you only need to create a new root XAML activity, each containing the slight changes required for its incoming request parameters.
Option #2: Pass in a dictionary to your activity
I thought of a better idea: have your workflow take a Dictionary&lt;string, object&gt; as an input argument. The dictionary can contain the parameter/argument set that was given to your workflow, and the workflow can then query that set to initialize itself.
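A minimal C# sketch of that idea, assuming a code-based activity (the CustomerOnboarding name and the "SendEmail" key are invented for illustration):

    using System;
    using System.Activities;
    using System.Collections.Generic;

    // Hypothetical root activity whose per-customer settings arrive as one dictionary.
    public class CustomerOnboarding : CodeActivity
    {
        public InArgument<IDictionary<string, object>> Settings { get; set; }

        protected override void Execute(CodeActivityContext context)
        {
            IDictionary<string, object> settings = Settings.Get(context);

            // Branch on whatever the caller supplied, e.g. whether this customer wants email.
            bool sendEmail = settings.TryGetValue("SendEmail", out var value) && (bool)value;
            Console.WriteLine(sendEmail ? "Sending email..." : "Skipping email.");
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // The host maps inputs by argument name; "Settings" matches the InArgument above.
            WorkflowInvoker.Invoke(new CustomerOnboarding(), new Dictionary<string, object>
            {
                ["Settings"] = new Dictionary<string, object>
                {
                    ["SendEmail"] = false,
                    ["CustomerName"] = "Customer A"
                }
            });
        }
    }

The same dictionary-typed argument works for a XAML-based root activity as well; the host just supplies the dictionary under the argument's name.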

Should a command / handler hold the full aggregate or only its id?

I'm trying to play around with DDD and CQRS.
I have come up with these two options:
Add the AggregateId to my command / event. This is nice because I can use my command as my web service's parameter, and I can also return instances of my commands to my forms to say "you can do this command, this one, and this one".
Add my full aggregate to my command / event. This is nice because I'm sure I won't load my aggregate 100 times if there are a lot of events going on; I'll just pass the reference around (for instance, I won't load it in both my command's validator and my command handler). But I would have to create a parameter class for each command with only the id.
For now I have the id in the commands and the full model in the events (I trust my unit of work to cache Load(aggregateId), so I won't execute the same request 100 times for one command).
Is there a right / better way?
Yes your current approach is correct - reference the aggregate with an identity value on the command. A command is meant to be serialized and sent across process boundaries. Also, a command is normally constructed by a client who may not have enough information to create an entire aggregate instance. This is also why an identity should be used. And yes, your unit of work should take care of caching an aggregate for the duration of a unit of work, if need be.
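As a hedged C# sketch of that shape (all type names here are invented for illustration): the command carries only the identity, and the handler, backed by a repository/unit of work, is the one place the aggregate gets loaded.

    using System;

    // The command carries only the aggregate's identity plus the data needed for the change.
    public sealed class RenameCustomerCommand
    {
        public Guid CustomerId { get; }
        public string NewName { get; }

        public RenameCustomerCommand(Guid customerId, string newName)
        {
            CustomerId = customerId;
            NewName = newName;
        }
    }

    public interface ICustomerRepository
    {
        Customer Load(Guid id);        // the unit of work can cache this per request
        void Save(Customer customer);
    }

    // The handler is the only place that turns the id back into the full aggregate.
    public sealed class RenameCustomerHandler
    {
        private readonly ICustomerRepository _repository;

        public RenameCustomerHandler(ICustomerRepository repository) => _repository = repository;

        public void Handle(RenameCustomerCommand command)
        {
            Customer customer = _repository.Load(command.CustomerId);
            customer.Rename(command.NewName); // domain behaviour lives on the aggregate
            _repository.Save(customer);
        }
    }

    public sealed class Customer
    {
        public Guid Id { get; private set; }
        public string Name { get; private set; }
        public void Rename(string newName) => Name = newName;
    }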

How to make an InArgument's value dependent upon the value of another InArgument at design time

I have a requirement to allow a user to specify the value of an InArgument / property from a list of valid values (e.g. a combobox). The list of valid values is determined by the value of another InArgument (the value of which will be set by an expression).
For instance, at design time:
User enters a file path into workflow variable FilePath
The DependedUpon InArgument is set to the value of FilePath
The file is queried and a list of valid values is displayed to the user to select the appropriate value (presumably via a custom PropertyValueEditor).
Is this possible?
Considering this is being done at design time, I'd strongly suggest you provide for all this logic within the designer, rather than in the Activity itself.
Design-time logic shouldn't be contained within your Activity. Your Activity should be able to run independent of any designer. Think about it this way...
You sit down and design your workflow using Activities and their designers. Once done, you install/xcopy the workflows to a server somewhere else. When the server loads that Activity prior to executing it, what happens when your design logic executes in CacheMetadata? Either it is skipped using some heuristic to determine that you are not running in design time, or you include extra logic to skip this code when it is unable to locate that file. Either way, why is a server executing this design time code? The answer is that it shouldn't be executing it; that code belongs with the designers.
This is why, if you look at the framework, you'll see that Activities and their designers exist in different assemblies. Your code should be the same way--design-centric code should be delivered in separate assemblies from your Activities, so that you may deliver both to designers, and only the Activity assemblies to your application servers.
When do you want to validate this, at design time or run time?
Design time is limited because the user can use an expression that depends on another variable, and you can't read that value at design time. You can, however, look at the expression and possibly deduce an invalid combination that way. In this case you need to add code to the CacheMetadata function.
At run time you can get the actual values and validate them in the Execute function.
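A small C# sketch of where each kind of check can live (the activity and argument names are invented for illustration):

    using System;
    using System.Activities;
    using System.IO;

    public class SelectColumnActivity : CodeActivity
    {
        public InArgument<string> FilePath { get; set; }
        public InArgument<string> ColumnName { get; set; }

        // Design/load time: only structural checks are possible here, because argument
        // *values* (especially ones bound to expressions) are unknown until the run.
        protected override void CacheMetadata(CodeActivityMetadata metadata)
        {
            base.CacheMetadata(metadata);
            if (ColumnName == null || ColumnName.Expression == null)
                metadata.AddValidationError("ColumnName must be bound before running the workflow.");
        }

        // Run time: the actual values are available, so the combination can be validated.
        protected override void Execute(CodeActivityContext context)
        {
            string path = FilePath.Get(context);
            string column = ColumnName.Get(context);

            if (!File.Exists(path))
                throw new InvalidOperationException($"File '{path}' does not exist.");
            // ...query the file and check that 'column' is one of its valid values...
        }
    }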

How do I listen for, load and run user-defined workflows at runtime that have been persisted using SqlWorkflowInstanceStore?

The result of SqlWorkflowInstanceStore.WaitForEvents does not tell me what type of workflow is runnable. The constructor of WorkflowApplication takes a workflow definition, and at a minimum, I need to be able to store a workflow ID in the store and query it, so that I can determine which workflow definition to load for the WorkflowApplication.
I also don't want to create a SqlWorkflowInstanceStore for each custom workflow type, since there may be thousands of different workflows.
I thought about trying to use WorkflowServiceHost, but not every workflow has a Receive activity and I don't think it is feasible to have thousands of WorkflowServiceHosts running, each supporting a different workflow type.
Ideally, I just want to query the database for a runnable workflow, determine its workflow definition ID, load the appropriate XAML from a workflow definition table, instantiate WorkflowApplication with the workflow definition, and call LoadRunnableInstance().
I would like to have a way to correlate which workflow is related to a given HasRunnableWorkflowEvent raised by the SqlWorkflowInstanceStore (along with the custom workflow definition ID), or have an alternate way of supporting potentially thousands of different custom workflow types created at runtime. I must also load balance the execution of workflows across multiple application servers.
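For clarity, here is a minimal C# sketch of the loop described above; GetDefinitionXamlFor is a placeholder for the custom workflow definition table, since the store itself does not expose which definition a runnable instance belongs to, which is exactly the gap in the question:

    using System;
    using System.Activities;
    using System.Activities.DurableInstancing;
    using System.Activities.XamlIntegration;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.DurableInstancing;

    public static class RunnableInstancePump
    {
        public static void Pump(string connectionString)
        {
            var store = new SqlWorkflowInstanceStore(connectionString);
            InstanceHandle handle = store.CreateInstanceHandle();

            // Become an instance owner so the store will hand us runnable instances.
            InstanceView view = store.Execute(handle, new CreateWorkflowOwnerCommand(), TimeSpan.FromSeconds(30));
            store.DefaultInstanceOwner = view.InstanceOwner;

            while (true)
            {
                List<InstancePersistenceEvent> events;
                try
                {
                    // Block until the store signals activity (e.g. an instance became runnable).
                    events = store.WaitForEvents(handle, TimeSpan.FromMinutes(5));
                }
                catch (TimeoutException)
                {
                    continue; // nothing became runnable within the wait window
                }

                foreach (InstancePersistenceEvent evt in events)
                {
                    if (!evt.Equals(HasRunnableWorkflowEvent.Value))
                        continue;

                    // Assumed custom step: map the runnable instance to its definition XAML.
                    string xaml = GetDefinitionXamlFor(/* custom correlation lookup */);
                    Activity definition = ActivityXamlServices.Load(new StringReader(xaml));

                    var app = new WorkflowApplication(definition) { InstanceStore = store };
                    app.LoadRunnableInstance();
                    app.Run();
                }
            }
        }

        // Placeholder for the custom workflow definition table the question proposes.
        private static string GetDefinitionXamlFor() => throw new NotImplementedException();
    }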
There's a free product from Microsoft that does pretty much everything you say there, and then some. Oh, and it's excellent too.
Windows Server AppFabric. No, not Azure.
http://www.microsoft.com/windowsserver2008/en/us/app-main.aspx
-Oisin