Can anyone please help me increase the maximum number of parameters PostgreSQL accepts in a query? I don't want to change my approach; I want to keep using a normal query as I do now. If I pass 90,000 parameters in an IN query, how can I make the query execute?
If you believe this page, https://msdn.microsoft.com/en-us/library/ms143432.aspx, the number of parameters (for example in a stored procedure, statement, or function) is fixed.
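One way to keep a plain query while avoiding tens of thousands of bind parameters is to send the whole list as a single array parameter and compare with `= ANY(...)`. The sketch below is only an illustration: it assumes a psycopg2-style driver (which adapts a Python list to a PostgreSQL array) and a hypothetical table `items` with an integer `id` column. It only builds the statement and its arguments, so it runs without a database.

```python
def build_any_query(ids):
    """Return (sql, params) matching many ids through ONE bind parameter."""
    # Hypothetical table and column names; %s is the psycopg2-style placeholder.
    sql = "SELECT * FROM items WHERE id = ANY(%s)"
    # The whole list travels as a single array-valued parameter.
    params = (list(ids),)
    return sql, params


sql, params = build_any_query(range(90_000))
print(sql)
print(len(params), "parameter carrying", len(params[0]), "values")
```

With a real connection, the pair would be handed to the driver as `cursor.execute(sql, params)`.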
I'm using Dataprep on GCP to wrangle a large file with a billion rows. I would like to limit the number of rows in the output of the flow, as I am prototyping a Machine Learning model.
Let's say I would like to keep one million rows out of the original billion. Is it possible to do this with Dataprep? I have reviewed the documentation on sampling, but that only applies to the input of the Transformer tool and not to the outcome of the process.
You can do this, but it does take a bit of extra work in your Recipe. Set up a formula in a new column using something like RANDBETWEEN to give you a random integer between 1 and 1,000 (in this million-out-of-a-billion case). From there, pick any one integer between 1 and 1,000 and keep only the rows that have that value; your output will then contain only your randomized subset. Just have the last step of the recipe remove this temporary column.
So indeed there are 2 approaches to this.
As Courtney Grimes said, you can use one of the 2 functions that generate a random number within a range.
randbetween :
rand :
These methods can be used to slice an "even" portion of your data. As suggested, compute randbetween(1,1000), then keep only the rows where the result equals one chosen value (for example 1), because that leaves about 1/1000 of the data (a million out of a billion).
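The bucket-and-filter idea can be simulated outside Dataprep to check the expected sample size. This is a minimal sketch, not Dataprep code: `sample_rows` and its parameters are made-up names, and `randbetween(1, 1000)` is assumed to be inclusive on both ends.

```python
import random


def sample_rows(row_count, buckets=1000, keep=1, seed=0):
    """Keep roughly 1/buckets of the rows by random bucket assignment."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = []
    for row in range(row_count):
        # Stand-in for randbetween(1, 1000); both bounds inclusive.
        bucket = rng.randint(1, buckets)
        if bucket == keep:  # keep one bucket, drop the other 999
            kept.append(row)
    return kept


kept = sample_rows(200_000)
print(len(kept), "of 200000 rows kept")
```

The sample size is only approximately 1/1000 of the input, since each row is kept independently at random.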
Alternatively, if you just want a million records in your output, but either
don't want to rely on knowing the size of the entire table, or
just want the first million rows, regardless of how many rows there are,
you can use one of the row-filtering methods (top rows or range).
P.S
By understanding the $sourcerownumber metadata parameter (see the in-product documentation), you can filter or keep a portion of the data (as per the first scenario) in 1 step, that is, without creating an additional column.
BTW, an easy way to discover how-tos in Trifacta is to type what you're looking for in the "search transformation" pane (accessed via ctrl-k). By searching "filter", you'll get most of the relevant options for your problem.
Cheers!
I need help with a basic calculation that I'm unable to figure out in Tableau.
I am trying to set up a calculated field that depends on its previous value to calculate its current value. Here is a simple example from Excel:
Sample Exhibit
As you can see, each value in a row is dependent on its previous value and multiplied by a constant.
In Tableau, when I'm trying to create a calculated field, it is not letting me refer to itself (-1 lagged value) in the code. I'd appreciate any help on how this can be resolved. Thanks in advance!
Tableau can do this client side with a table calc. You'll have to learn how table calcs operate from the help, especially partitioning and addressing. Then you can use the function PREVIOUS_VALUE() to refer to the previous value. Practice on something simple first to make sure you understand how PREVIOUS_VALUE() works. Hint: the argument to that function doesn't mean what most people assume it means.
If you want to perform this calculation server side instead, then you'll need to use custom SQL so you can specify an analytic (aka windowing) query.
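To make the recurrence concrete, here is a small sketch of "each row is the previous row times a constant" in plain Python. The function name and the sample numbers are made up for illustration. As far as I understand Tableau's PREVIOUS_VALUE(), its argument plays the role of `initial` below: it is the value used for the first row, where no previous row exists.

```python
def running_values(initial, constant, periods):
    """Each value is the previous value multiplied by a constant."""
    values = []
    previous = initial  # seed for the first row (no earlier row exists)
    for _ in range(periods):
        values.append(previous)
        previous = previous * constant  # what the next row will see
    return values


print(running_values(100, 2, 4))  # [100, 200, 400, 800]
```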
Check the LOOKUP function to get the value from the preceding row. For example: LOOKUP(SUM([Value]),-1)
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#lookupexpression-offset
You may need to familiarize yourself with Table Calculation partitioning if you don't get the expected result.
Sorry if this seems trivial, but I am fairly new to Tableau. I have a simple table that has 1 dimension for columns and 1 dimension for rows. My Marks are the Count of a third dimension. I'd like to divide only 1 of the columns in the table by a constant, but not all of them. When I tried conditional statements, I received the error about mixing non-aggregate and aggregate arguments.
What is the best way to divide a single column's values based upon a condition?
Thanks in advance.
Typically the error regarding non-aggregate and aggregate statements can be resolved using the ATTR() function.
SUM([Sales]) / [Constant]
Turns to:
SUM([Sales]) / ATTR([Constant])
Or conversely, which might or might not fit your data:
[Sales] / [Constant]
You just can't mix the two as in the first example.
Edit
This is probably a more accurate place for the ATTR() function given what I'm guessing is your use case:
IF ATTR([Segment]) = 'Corporate'
THEN COUNT([Sales]) / SUM([Constant])
END
Try turning the constant to a discrete measure and see if that works. (right-click on measure and select 'discrete')
Also, without seeing the conditional code you are using, you probably need to wrap the entire condition with count() in order to not get the Aggregate/Non-Aggregate error, like this:
Count(If [MyDimension] = "XX" then [MyOtherDimension] else Null End)
NOT like this:
If [MyDimension] = "XX" then Count([MyOtherDimension]) else Null End
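The difference between the two forms can be pictured in plain Python: the working form evaluates the IF on every row and only then aggregates the results. This is a sketch with made-up sample rows; `count_if` is a hypothetical helper, not a Tableau function.

```python
# Made-up sample rows; the field names follow the formulas above.
rows = [
    {"MyDimension": "XX", "MyOtherDimension": "a"},
    {"MyDimension": "YY", "MyOtherDimension": "b"},
    {"MyDimension": "XX", "MyOtherDimension": "c"},
]


def count_if(rows, wanted):
    """Mimic Count(If [MyDimension] = wanted then [MyOtherDimension] else Null End)."""
    # Row level: the IF yields a value or None for each row.
    per_row = [
        r["MyOtherDimension"] if r["MyDimension"] == wanted else None
        for r in rows
    ]
    # Aggregate level: the count ignores the nulls.
    return sum(value is not None for value in per_row)


print(count_if(rows, "XX"))  # 2
```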
I have multiple Prometheus instances providing the same metric, such as:
my_metric{app="foo", state="active", instance="server-1"} 20
my_metric{app="foo", state="inactive", instance="server-1"} 30
my_metric{app="foo", state="active", instance="server-2"} 20
my_metric{app="foo", state="inactive", instance="server-2"} 30
Now I want to display this metric in a Grafana singlestat widget. When I use the following query...
sum(my_metric{app="foo", state="active"})
...it, of course, sums up all values and returns 40. So I tell Prometheus to sum it by instance...
sum(my_metric{app="foo", state="active"}) by (instance)
...which results in a "Multiple Series Error" in Grafana. Is there a way to tell Prometheus/Grafana to only use the first of the results?
I don't know of a distinct, but I think this would work too:
topk(1, sum(my_metric{app="foo", state="active"}) by (instance))
Check out the second to last example in here:
https://prometheus.io/docs/prometheus/latest/querying/examples/
One way I just found is to additionally do an average over all values:
avg(sum(my_metric{app="foo", state="active"}) by(instance))
If you need to return an arbitrary time series out of multiple matching time series, then this can be done with topk() or bottomk() functions. For example, the following query returns a single time series with the maximum value out of multiple time series which match my_metric{app="foo", state="active"}:
topk(1, my_metric{app="foo", state="active"})
You need to set the instant query option in Grafana when using topk(). Otherwise topk(1, ...) may return multiple time series when it is used for building a graph with a range query. This is because topk(1, ...) selects the single time series with the max value independently for each point on the graph, and different points may have different time series with the max value. There is a workaround in alternative Prometheus-like systems such as VictoriaMetrics, which allows returning a single series out of many on a graph: it provides topk_* and bottomk_* functions for this purpose. See, for example, topk_last or topk_avg.
Note that topk() has nothing in common with DISTINCT from SQL. If you need to select distinct label values with PromQL, use count(...) by (label). It returns the unique values of the given label alongside the number of time series for each label value. For example, count(my_metric) by (app) returns the unique app label values for time series named my_metric. This is roughly equivalent to the following SQL with a DISTINCT clause:
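The per-point behaviour of topk(1, ...) in a range query can be illustrated with a small simulation. This is plain Python, not PromQL; the sample values are made up, and `topk1_per_step` is a hypothetical helper that picks the winning series at each evaluation step.

```python
# Made-up samples: one value per evaluation step for each series.
series = {
    "server-1": [20, 25, 10],
    "server-2": [15, 30, 40],
}


def topk1_per_step(series):
    """Return the name of the max-valued series at each step."""
    steps = len(next(iter(series.values())))
    winners = []
    for i in range(steps):
        # The selection is made independently at every step.
        winners.append(max(series, key=lambda name: series[name][i]))
    return winners


print(topk1_per_step(series))  # ['server-1', 'server-2', 'server-2']
```

Because the winner changes between steps, the graph ends up containing points from two different series, which is why a single-value panel complains.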
SELECT DISTINCT app FROM my_metric
See count() docs for details.
Every reference I found on how to do a cumulative sum / running total says it's better to use a window function, so I did:
select grandtotal,sum(grandtotal)over(order by agentname) from call
but I realized that the results are only correct as long as the values in each row are different. Here is the result:
Is there any way to fix this?
You might want to review the documentation on window specifications (which is here). With an ORDER BY, the default frame is "range between unbounded preceding and current row", which defines the frame by the ORDER BY values, so rows with an equal agentname are treated as peers and all get the same sum. You want "rows between":
select grandtotal,
sum(grandtotal) over (order by agentname rows between unbounded preceding and current row)
from call;
Alternatively, you could include an id column in the sort to guarantee uniqueness and not have to deal with the issue of equal key values.
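The difference between the two frame types can be sketched in plain Python. The sample rows of (agentname, grandtotal) are made up, and the two functions only imitate what the database does for a frame ending at the current row.

```python
from itertools import groupby


def running_sum_rows(rows):
    """ROWS frame: the sum grows one physical row at a time."""
    total, out = 0, []
    for _, value in rows:
        total += value
        out.append(total)
    return out


def running_sum_range(rows):
    """RANGE frame: rows with an equal sort key are peers and share one sum."""
    total, out = 0, []
    for _, group in groupby(rows, key=lambda r: r[0]):
        peers = list(group)
        total += sum(value for _, value in peers)
        out.extend([total] * len(peers))  # every peer sees the same total
    return out


# Made-up (agentname, grandtotal) rows, sorted by agentname.
rows = sorted([("alice", 10), ("alice", 20), ("bob", 5)], key=lambda r: r[0])
print(running_sum_range(rows))  # [30, 30, 35]
print(running_sum_rows(rows))   # [10, 30, 35]
```

The two rows for "alice" tie on the sort key, so the RANGE version repeats 30 for both, which matches the behaviour described in the question.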