CloudFormation parameters limit exceeded - aws-cloudformation

I have exceeded the parameter limit in my CloudFormation template. Is there any way I can keep all the parameters in a file in S3 and include it in my template?

It appears you have an AWS CloudFormation template with so many parameters that you are unable to create the stack.
The CloudFormation limits documentation says:
Maximum number of parameters that you can declare in your AWS CloudFormation template: 60 parameters
To specify more parameters, you can use mappings or lists in order to assign multiple values to a single parameter.
This advice suggests that some parameters could be combined and passed in as one parameter (a list). Alternatively, if multiple values are related to each other, you could create a mapping such that one input value allows multiple values to be retrieved via a lookup.
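For example, a sketch of both techniques in one template (the parameter, mapping, and property names here are illustrative):
Parameters:
  SubnetIds:
    Type: CommaDelimitedList   # one list parameter instead of SubnetId1, SubnetId2, ...
  Environment:
    Type: String
    AllowedValues: [dev, prod]
Mappings:
  EnvironmentConfig:           # one Environment value looks up several related values
    dev:
      InstanceType: t3.micro
    prod:
      InstanceType: m5.large
Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !FindInMap [EnvironmentConfig, !Ref Environment, InstanceType]
      SubnetId: !Select [0, !Ref SubnetIds]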
Worst case, you could use an AWS Lambda function as a Custom Resource to perform additional logic and return values -- these values could be retrieved from an object in Amazon S3 or through some other processing.
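A sketch of what the Lambda handler for such a custom resource could look like (assuming the function code is defined inline in the template so that the cfnresponse helper module is available; the bucket and key names are illustrative):
import json
import boto3
import cfnresponse  # helper module CloudFormation provides for inline (ZipFile) Lambda code

def handler(event, context):
    try:
        s3 = boto3.client('s3')
        obj = s3.get_object(Bucket='my-config-bucket', Key='stack-parameters.json')
        values = json.loads(obj['Body'].read())
        # Everything in 'values' becomes readable in the template via !GetAtt on the custom resource.
        cfnresponse.send(event, context, cfnresponse.SUCCESS, values)
    except Exception as e:
        cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': str(e)})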
Also, it is normally recommended to avoid large, complex templates. Instead, break the template into smaller templates and call them as nested stacks. This makes it easier to maintain and debug.
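A sketch of how a parent template can call a nested template (the resource name, TemplateURL, and parameter are illustrative):
Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-template-bucket/network.yaml
      Parameters:
        VpcCidr: 10.0.0.0/16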

It appears that the limit on the number of parameters has since been increased to 200, per the documentation.

Related

How can I apply different windows to one PCollection at once?

So my use case is that the elements in my PCollection should be put into windows of different lengths (which are specified in the Row itself), but the following operations like the GroupBy are the same, so I don't want to split up the PCollection at this point.
So what I'm trying to do is basically this:
windowed_items = (
    items
    | 'windowing' >> beam.WindowInto(window.SlidingWindows(lambda row: int(row.WINDOW_LENGTH), 60))
)
However, when building the pipeline I get the error TypeError: '<=' not supported between instances of 'function' and 'int'.
An alternative to applying different windows to one PCollection would be to split/branch the PCollection based on the defined window into multiple PCollections and apply the respective window to each. However, this would mean hardcoding the windowing for every allowed value, and in my case this is possibly a huge number, which is why I want to avoid it.
So from the error I'm getting (but not being able to find it explicitly in the docs) I understand that the SlidingWindows parameters have to be provided when building the pipeline and cannot be determined at runtime. Is this correct? Is there some workaround that lets me apply different windows to one PCollection at once, or is it simply not possible? If that is the case, are there any other alternative approaches to the one I outlined above?
I believe that custom session windowing is what you are looking for. However, it's not supported in the Python SDK yet.
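If a workaround is needed in the meantime, the branching alternative described in the question could look roughly like this (a sketch; ALLOWED_LENGTHS, the 60-second period, and the items collection from the question are assumptions):
import apache_beam as beam
from apache_beam import window

ALLOWED_LENGTHS = [60, 300, 900]  # hypothetical set of allowed WINDOW_LENGTH values

windowed_branches = []
for length in ALLOWED_LENGTHS:
    branch = (
        items
        | f'filter_{length}' >> beam.Filter(lambda row, l=length: int(row.WINDOW_LENGTH) == l)
        | f'window_{length}' >> beam.WindowInto(window.SlidingWindows(size=length, period=60))
    )
    windowed_branches.append(branch)
# Each branch can then flow into the same downstream GroupBy logic.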

Graphite/Grafana query wildcard for unknown level

Is it possible to write a query to match all of these metrics:
foo.bar.something1.ending_word
foo.bar.something1.something2.ending_word
foo.bar.something1.something_else.ending_word
foo.bar.something1.something2.something3.something4.ending_word
foo.bar.ending_word
Something like this:
foo.bar[*].ending_word
?
I am trying to use this to query data in Grafana.
If your metrics are not tagged metrics, then you have to use the normal target expression syntax, in which case wildcards cannot cover an indeterminate number of levels. The Graphite docs say:
All wildcards apply only within a single path element. In other words, they do not include or cross dots (.). Therefore, servers.* will not match servers.ix02ehssvc04v.cpu.total.user, while servers.*.*.*.* will.
If your metrics are tagged metrics, then you must use seriesByTag(); in this case the metric name is treated as just another tag called "name". You may use regular expressions in seriesByTag() with "name=~regEx", which means you can use .* to cover as many levels as you want. See the Graphite docs on querying by tags for additional information.
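For example, a sketch of such a query against tagged metrics (the regular expression is illustrative and written to match the metric names listed in the question):
seriesByTag('name=~foo\.bar(\..+)?\.ending_word$')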
If you can't control the metric naming such that you can use tagging, I don't see a way to do what you want. Be warned that switching a metric to being tagged means that you'll have to change all existing references to it (as well as migrate the data).
There may be some way to do this in Grafana, but I don't know Grafana.

Looping over a range of numbers in Data Factory

I'm quite new to Data Factory and have a question regarding looping over a list of consecutive numbers.
I have been assigned a task to retrieve data from an API in Data Factory (based on a pre-existing template). The problem is that the API is split into multiple pages and the link is ".../2020/entries?skippages=1&pagesize=1000".
In my pipeline I therefore need to loop over the page number (the number of iterations is 11).
I have looked a bit into ForEach and Until loops but it seems a lot more complicated than need be.
What is best practice for such a task?
Hopefully, this makes sense. If not, please let me know and I will elaborate.
Thanks in advance.
Azure Data Factory (ADF) and Synapse Pipelines have a number of functions you can use in your pipelines, including range which generates a range of numbers.
All you have to do is specify range in the Items section of a ForEach loop. A simple example: @range(1,11)
To explain the definition a bit further: all ADF expressions (not including Mapping Data Flows) start with the @ symbol, range is the function, 1 is the start index and 11 is the count of numbers to generate (so with a start index of 1, the loop runs from 1 to 11). See the help for the range function here.
In order to access the number inside the loop, use the item() syntax, e.g. @item(), remembering the @ at the start of the expression.
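Inside the ForEach you could then build the paged URL from the question with an expression along these lines (a sketch; where you put it, e.g. a relative URL parameter on the dataset or copy source, depends on your setup):
@concat('.../2020/entries?skippages=', string(item()), '&pagesize=1000')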
As you are paging from a web API, you should have a good look at the Pagination section of the Copy activity which may offer an alternate and more dynamic approach depending on the capabilities of the API you are calling.

KubeFlow, handling large dynamic arrays and ParallelFor with current size limitations

I've been struggling to find a good solution for this manner for the past day and would like to hear your thoughts.
I have a pipeline which receives a large & dynamic JSON array (containing only stringified objects),
I need to be able to create a ContainerOp for each entry in that array (using dsl.ParallelFor).
This works fine for small inputs.
Right now the array comes in as an HTTP URL to a file, due to the pipeline input argument size limitations of Argo and Kubernetes (or that is what I understood from the current open issues). However, when I try to read the file in one Op and use its output as input for the ParallelFor, I run into the output size limitation.
What would be a good & reusable solution for such a scenario?
Thanks!
the array comes in as an HTTP URL to a file, due to the pipeline input argument size limitations of Argo and Kubernetes
Usually the external data is first imported into the pipeline (downloaded and output). Then the components use inputPath and outputPath to pass big data pieces as files.
The size limitation only applies to data that you consume as a value (inputValue) instead of as a file (inputPath).
The loops consume the data by value, so the size limit applies to them.
What you can do is make this data smaller. For example, if your data is a JSON list of big objects [{obj1}, {obj2}, ... , {objN}], you can transform it into a list of indexes [1, 2, ... , N], pass that list to the loop, and then inside the loop have a component that uses the index plus the full data file to select the single piece to work on (N -> {objN}).
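A minimal sketch of that index-list idea, assuming the KFP v1 SDK (kfp.dsl / kfp.components); all component, parameter, and output names here are illustrative:
from kfp import dsl
from kfp.components import InputPath, OutputPath, create_component_from_func


def download_data(url: str, data_path: OutputPath(str)):
    """Download the big JSON array once and write it to an output file."""
    import urllib.request
    urllib.request.urlretrieve(url, data_path)


def make_index_list(data_path: InputPath(str)) -> list:
    """Return only the (small) list of indexes, which is safe to pass by value."""
    import json
    with open(data_path) as f:
        return list(range(len(json.load(f))))


def process_item(index: int, data_path: InputPath(str)):
    """Read the big file again and select the single object this iteration works on."""
    import json
    with open(data_path) as f:
        obj = json.load(f)[index]
    print(obj)


download_op = create_component_from_func(download_data)
index_op = create_component_from_func(make_index_list)
process_op = create_component_from_func(process_item)


@dsl.pipeline(name='parallel-by-index')
def pipeline(data_url: str):
    data = download_op(url=data_url)
    indexes = index_op(data=data.outputs['data'])         # small list passed by value
    with dsl.ParallelFor(indexes.output) as idx:          # loop over indexes, not objects
        process_op(index=idx, data=data.outputs['data'])  # big data passed as a file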

How to expose a function that takes two input files as a REST resource?

I need to expose a function, let's say compute, that takes two input files: a plan file and a system file. The compute function uses the system file to see whether the plan in the plan file can be executed or not. It produces an output file containing the result of this check, including recommendations for the plan.
I need to expose this functionality in a REST architecture and have no influence on the compute function itself (it is being developed by another organization). I can control the interface through which it is accessed.
What would be a recommended way to expose this functionality in a REST architecture?
Create a /compute resource that accepts multipart/form-data and POST your files to it. There is a fairly good and complete form-based example of just such a REST service here.
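A server-side sketch of such a resource (assuming Flask; the part names 'plan' and 'system' and the run_compute wrapper around the external compute function are hypothetical):
from flask import Flask, request

app = Flask(__name__)

def run_compute(plan_bytes, system_bytes):
    # Hypothetical wrapper around the externally developed compute function;
    # it should return the contents of the generated output file.
    raise NotImplementedError

@app.route('/compute', methods=['POST'])
def compute_resource():
    plan_file = request.files['plan']      # multipart part named 'plan'
    system_file = request.files['system']  # multipart part named 'system'
    result = run_compute(plan_file.read(), system_file.read())
    # Return the check result / recommendations to the client.
    return result, 200, {'Content-Type': 'text/plain'}

A client can then send both files in one multipart request, e.g. curl -F plan=@plan.txt -F system=@system.txt https://example.com/compute (the file names and host are illustrative).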