I have a workflow engine. Using its REST API I am starting concurrent workflows, and each API call returns a UUID. I want to query all of these UUIDs concurrently as well. How can I do that with JMeter?
If you want to start workflows and query them within the bounds of a single test, just extract the IDs of the started workflows using a suitable JMeter Post-Processor (e.g. the JSON Extractor) and query them using a separate Sampler.
If you want to create the workflows first and then just query them (i.e. you're not interested in creation time and the associated metrics), you can create the workflows in a setUp Thread Group and write the UUIDs into a file using the Flexible File Writer. Then, in the "normal" Thread Group, use a CSV Data Set Config to read the UUIDs from the file.
I have a use case where I am using Spring Batch and writing to 3 different data sources based on the job parameters. All of this is working absolutely fine, but the only problem is the metadata: Spring Batch uses the default DataSource to write its metadata. So whenever I write the data for a job, the transactional data always goes to the correct DB, but the batch metadata always goes to the default DB.
Is it possible to selectively write the metadata to the respective databases as well, based on the job parameters?
@michaelMinella, @MahmoudBenHassine, can you please help?
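For reference, here is a minimal sketch (Spring Batch 4.x annotation style; the bean name metadataDataSource is an assumption) of the mechanism described above: the JobRepository that writes the BATCH_* tables is wired to exactly one DataSource through the BatchConfigurer, and that DataSource is resolved once at startup rather than per job execution, which is why the metadata always lands in one database regardless of job parameters.

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.BatchConfigurer;
import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchMetadataConfig {

    // "metadataDataSource" is a hypothetical bean name for whichever DB should hold the BATCH_* tables.
    @Bean
    public BatchConfigurer batchConfigurer(@Qualifier("metadataDataSource") DataSource metadataDataSource) {
        // DefaultBatchConfigurer builds the JobRepository/JobExplorer on this single DataSource.
        // It is created once at application startup, not per job execution, so there is no
        // built-in hook here to pick a database based on job parameters.
        return new DefaultBatchConfigurer(metadataDataSource);
    }
}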
I want to export data from Cloud SQL (Postgres) to a CSV file periodically (once a day, for example), and rows that have already been exported must not be exported again in the next export task.
I'm currently using a POST request to perform the export task using Cloud Scheduler. The problem here (or at least as far as I know) is that it won't be able to export and delete (or update the rows to mark them as exported) in a single HTTP export request.
Is there any way to automatically delete (or update) the rows which have been exported, using some Cloud SQL parameter in the HTTP export request?
If not, I assume it should be done in a Cloud Function triggered by Pub/Sub (using Cloud Scheduler to publish a message once a day), but is there an optimal way to capture the IDs of the rows retrieved by the SELECT statement (which is used in the export) so that they can be deleted (or updated) later?
You can export and delete (or update) at the same time using RETURNING.
\copy (DELETE FROM pgbench_accounts WHERE aid<1000 RETURNING *) to foo.txt
The problem would be in the face of crashes. How can you know that foo.txt has been written and flushed to disk before the DELETE is allowed to commit? Or the reverse: foo.txt is partially (or fully) written, but a crash prevents the DELETE from committing.
Can't you make the system idempotent, so that exporting the same row more than once doesn't create problems?
You could use a setup like the following to achieve what you are looking for (a sketch of how the function could look follows the documentation links below):
1. Create a Cloud Function that extracts the information from the database and subscribes to a Pub/Sub topic.
2. Create a Pub/Sub topic to trigger that function.
3. Create a Cloud Scheduler job that invokes the Pub/Sub trigger.
4. Run the Cloud Scheduler job.
5. Then create a trigger which activates another Cloud Function to delete all the required data from the database once the CSV has been created.
Here I leave you some documents which could help you if you decide to follow this path.
Using Pub/Sub to trigger a Cloud Function: https://cloud.google.com/scheduler/docs/tut-pub-sub
Connecting to Cloud SQL from Cloud Functions: https://cloud.google.com/sql/docs/mysql/connect-functions
Cloud Storage Tutorial: https://cloud.google.com/functions/docs/tutorials/storage
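Purely as an illustration of steps 1 and 2 (project, instance, bucket, table and column names are all assumptions, and it folds the "mark as exported" step into the same function using the RETURNING idea from above), a Pub/Sub-triggered Cloud Function in Java could look roughly like this:

import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExportAndMarkFunction implements BackgroundFunction<ExportAndMarkFunction.PubSubMessage> {

  // Minimal Pub/Sub payload holder; the message content itself is not used here.
  public static class PubSubMessage { public String data; }

  // Connection string uses the Cloud SQL JDBC socket factory; the instance name,
  // user, table and column names below are placeholders for your own setup.
  private static final String JDBC_URL =
      "jdbc:postgresql:///postgres"
          + "?cloudSqlInstance=my-project:my-region:my-instance"
          + "&socketFactory=com.google.cloud.sql.postgres.SocketFactory"
          + "&user=export_user&password=secret";

  @Override
  public void accept(PubSubMessage message, Context context) throws Exception {
    StringBuilder csv = new StringBuilder("id,column1,column2\n");

    try (Connection conn = DriverManager.getConnection(JDBC_URL)) {
      conn.setAutoCommit(false);
      try (Statement st = conn.createStatement();
           // UPDATE ... RETURNING gives us the exported rows and flags them in one statement.
           ResultSet rs = st.executeQuery(
               "UPDATE public.mytable SET exported = true "
                   + "WHERE exported = false RETURNING id, column1, column2")) {
        while (rs.next()) {
          csv.append(rs.getLong("id")).append(',')
             .append(rs.getString("column1")).append(',')
             .append(rs.getString("column2")).append('\n');
        }
      }

      // Upload the CSV first and only commit the UPDATE once the object is stored,
      // so a failed upload leaves the rows marked as not exported.
      Storage storage = StorageOptions.getDefaultInstance().getService();
      BlobInfo blob = BlobInfo.newBuilder(
          "my-bucket", "export-" + System.currentTimeMillis() + ".csv").build();
      storage.create(blob, csv.toString().getBytes(StandardCharsets.UTF_8));

      conn.commit();
    }
  }
}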
Another method, aside from @jjanes' suggestion, would be to partition your table by date. This would allow you to create an index on the date, making exporting or deleting a day's entries very easy. With this implementation, you could also create a cron job that deletes all tables (partitions) older than X days.
The documentation provided will walk you through setting up range partitioning:
The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
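As a rough sketch of what such a daily maintenance job could run (table name, date-based naming scheme and 30-day retention window are assumptions; the statements are plain PostgreSQL DDL issued here over JDBC):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionMaintenance {
  public static void main(String[] args) throws Exception {
    LocalDate today = LocalDate.now();
    DateTimeFormatter suffix = DateTimeFormatter.BASIC_ISO_DATE; // e.g. 20240502

    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
         Statement st = conn.createStatement()) {

      // Create tomorrow's partition ahead of time.
      String child = "mytable_" + today.plusDays(1).format(suffix);
      st.execute("CREATE TABLE IF NOT EXISTS " + child + " PARTITION OF mytable "
          + "FOR VALUES FROM ('" + today.plusDays(1) + "') TO ('" + today.plusDays(2) + "')");

      // Drop the partition that has fallen out of the retention window; this removes
      // that day's rows in one cheap metadata operation instead of a bulk DELETE.
      st.execute("DROP TABLE IF EXISTS mytable_" + today.minusDays(30).format(suffix));
    }
  }
}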
Thank you for all your answers. There are multiple ways of doing this, so I'm going to explain how I did it.
In the database I have included a column which contains the date when the data was inserted.
I used a Cloud Scheduler job with the following body:
{"exportContext":{"fileType": "CSV", "csvExportOptions" :{"selectQuery" : "select \"column1\", \"column2\",... , \"column n\" from public.\"tablename\" where \"Insertion_Date\" = CURRENT_DATE - 1" },"uri": "gs://bucket/filename.csv","databases": ["postgres"]}}
This scheduler will be triggered once a day, and it will export only the data from the previous day.
Also, note that in the query used in Cloud Scheduler you can choose which columns you want to export; this way you can avoid exporting the Insertion_Date column and use it only as an auxiliary column.
Finally, Cloud Scheduler will automatically create the CSV file in a bucket.
I am trying to access Apache Ignite over the HTTP REST API. I see that the API mostly provides the ability to request data by a specific key (meaning you should always know/have the key to query data).
However, I would like to understand:
1) whether we have the ability to, say, query a set of records filtered on one or more of the value fields of the POJO value;
2) whether we can run join-like SQL queries through the REST API when my data is spread across more than one cache and some fields in them have common values that create the relation among the caches.
Take a look at the REST API documentation for the list of the available commands - there are commands that allow you to execute SQL.
Also take a look at the Ignite SQL documentation for the syntax reference and some examples.
Finally, please see the Ignite SQL examples - you can find them in the Ignite distribution or in the git repository. E.g. SqlDmlExample should give a notion of how to execute various SQL queries on an Ignite cache.
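As a hedged illustration (host, port, cache, table and field names are assumptions; the qryfldexe command and its cacheName/pageSize/qry parameters are taken from the Ignite REST documentation, so double-check them against your Ignite version), a SQL fields query that filters on value fields and joins across two caches could be issued over REST like this, using only standard JDK HTTP classes:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class IgniteRestSqlQuery {
  public static void main(String[] args) throws Exception {
    // A fields query can filter on POJO value fields and join tables that live in
    // different caches, which covers both questions (1) and (2) above.
    String sql = "SELECT p.name, pur.amount "
        + "FROM \"personCache\".Person p "
        + "JOIN \"purchaseCache\".Purchase pur ON p.id = pur.personId "
        + "WHERE pur.amount > 100";

    String url = "http://localhost:8080/ignite?cmd=qryfldexe"
        + "&cacheName=personCache"
        + "&pageSize=100"
        + "&qry=" + URLEncoder.encode(sql, StandardCharsets.UTF_8);

    HttpResponse<String> response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(url)).GET().build(),
        HttpResponse.BodyHandlers.ofString());

    // The response is a JSON envelope with the result rows under "response".
    System.out.println(response.body());
  }
}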
I have to upsert around 80K records in a daily nightly batch. I was thinking of using a REST service to receive the records in an array, and creating a Batch Apex job to upsert them in Salesforce. I will not be getting a CSV; that is out of the question.
Will I be able to iterate over the array of records in the Batch Apex start method to process them in batches?
Is this the right approach?
Is there a better approach for receiving a large number of records and processing them in bulk?
The REST API upserts records to the database. You cannot use it to pass data to a Batch Apex class. You can use the REST API to save data and then later have a batch class perform additional operations on that data, but why bother, if you can just have that logic performed as part of your upsert?
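If it helps, here is a rough sketch (instance URL, API version, token, object and field names are all assumptions) of pushing such an array straight through the sObject Collections upsert resource in chunks of 200 records, so the platform performs the upsert without any staging for Batch Apex:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class SalesforceCollectionsUpsert {

  private static final String INSTANCE = "https://myorg.my.salesforce.com"; // placeholder
  private static final String TOKEN = "OAUTH_ACCESS_TOKEN";                 // placeholder

  public static void main(String[] args) throws Exception {
    // Each element is the JSON for one record; External_Id__c is an assumed external id field.
    List<String> recordJson = List.of(
        "{\"attributes\":{\"type\":\"Account\"},\"External_Id__c\":\"A-1\",\"Name\":\"Acme\"}");

    HttpClient client = HttpClient.newHttpClient();
    int chunkSize = 200; // sObject Collections limit per request

    for (int i = 0; i < recordJson.size(); i += chunkSize) {
      List<String> chunk = recordJson.subList(i, Math.min(i + chunkSize, recordJson.size()));
      String body = "{\"allOrNone\":false,\"records\":[" + String.join(",", chunk) + "]}";

      // Upsert by external id against the sObject Collections endpoint.
      HttpRequest request = HttpRequest.newBuilder(
              URI.create(INSTANCE + "/services/data/v57.0/composite/sobjects/Account/External_Id__c"))
          .header("Authorization", "Bearer " + TOKEN)
          .header("Content-Type", "application/json")
          .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
          .build();

      HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode() + " " + response.body());
    }
  }
}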
I have a Cloud Dataflow job that does a bunch of processing for an App Engine app. At one stage in the pipeline, I group by a particular key, and for each record that matches that key I would like to write a file to Cloud Storage (using that key as part of the file name).
I don't know in advance how many of these records there will be. So this usage pattern doesn't fit the standard Cloud Dataflow data sink pattern (where the sharding of that output stage determines the number of output files, and I have no control over the output file names per shard).
I am considering writing to Cloud Storage directly as a side-effect in a ParDo function, but have the following queries:
Is writing to cloud storage as a side-effect allowed at all?
If I were writing from outside a Dataflow pipeline, it seems I should use the Java client for the JSON Cloud Storage API. But that involves authenticating via OAuth to do any work, which seems inappropriate for a job already running on GCE machines as part of a Dataflow pipeline. Will this work?
Any advice gratefully received.
Answering the first part of your question:
While nothing directly prevents you from performing side-effects (such as writing to Cloud Storage) in your pipeline code, it is usually a very bad idea. You should consider the fact that your code is not running in a single-threaded fashion on a single machine. You would need to deal with several problems (a sketch addressing them follows this list):
Multiple writers could be writing at the same time. You need to find a way to avoid conflicts between writers. Since Cloud Storage doesn't support appending to an object directly, you might want to use composite object techniques.
Workers can be aborted, e.g. in case of transient failures or problems with the infrastructure, which means that you need to be able to handle the interrupted/incomplete writes issue.
Workers can be restarted (after they were aborted). That would cause the side-effect code to run again. Thus you need to be able to handle duplicate entries in your output in one way or another.
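To make that concrete, here is a hedged sketch (Beam/Dataflow Java API; the bucket name and naming scheme are assumptions) of a ParDo that writes one GCS object per grouped key, using a deterministic, key-derived object name so that a retried or duplicated bundle overwrites the same object instead of producing duplicates:

import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class WritePerKeyFileFn extends DoFn<KV<String, Iterable<String>>, String> {

  private transient Storage storage;

  @Setup
  public void setup() {
    // On Dataflow workers this picks up the worker service account automatically,
    // so no explicit OAuth handling is needed in the pipeline code.
    storage = StorageOptions.getDefaultInstance().getService();
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    KV<String, Iterable<String>> element = c.element();

    StringBuilder content = new StringBuilder();
    for (String record : element.getValue()) {
      content.append(record).append('\n');
    }

    // Deterministic name: the same key always maps to the same object, so a retried
    // write overwrites rather than duplicates. "my-output-bucket" is a placeholder.
    BlobInfo blob = BlobInfo.newBuilder(
        "my-output-bucket", "per-key/" + element.getKey() + ".txt").build();
    storage.create(blob, content.toString().getBytes(StandardCharsets.UTF_8));

    c.output(element.getKey());
  }
}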
Nothing in Dataflow prevents you from writing to a GCS file in your ParDo.
You can use GcpOptions.getCredential() to obtain a credential to use for authentication. This will use a suitable mechanism for obtaining a credential depending on how the job is running. For example, it will use a service account when the job is executing on the Dataflow service.