How to check in NiFi whether all the flowfiles or messages in a process group have cleared? - rest

Is there a way to check, using the REST API from a shell script, whether all the messages/flowfiles in a process group have cleared?
My use case is that I stop a NiFi processor from my shell script with a curl command. I then need to wait until all the queues are empty and the flowfiles have passed through before proceeding.

Yes, you can query the status of individual connections, or the status of the whole process group, via the REST API. The easiest way to find the exact request is to perform the action in your browser, watch the request in the Developer Tools, and copy it into your invoking tool (curl, for example).
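As a rough illustration of the polling loop, here is a minimal Python sketch. It assumes an unsecured NiFi instance and that the process group status endpoint is `/nifi-api/flow/process-groups/{id}/status`, returning the queued count under `processGroupStatus.aggregateSnapshot.flowFilesQueued`. Please confirm both against what your browser's Developer Tools show for your NiFi version. The function names, the base URL and the group ID placeholder are hypothetical.

```python
import json
import time
import urllib.request


def queued_flowfiles(status: dict) -> int:
    """Pull the queued flowfile count out of a process-group status response.

    The field path is an assumption; check it against your NiFi version.
    """
    return status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"]


def fetch_status(base_url: str, group_id: str) -> dict:
    """GET the process-group status (assumes no authentication is required)."""
    url = f"{base_url}/nifi-api/flow/process-groups/{group_id}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_until_empty(fetch, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll fetch() until no flowfiles are queued.

    Returns True once the queues are empty, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if queued_flowfiles(fetch()) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)


if __name__ == "__main__":
    # Hypothetical values; replace with your own host and process group ID.
    base = "http://localhost:8080"
    group = "your-process-group-id"
    ok = wait_until_empty(lambda: fetch_status(base, group))
    raise SystemExit(0 if ok else 1)
```

A shell script could call this after stopping the processor and continue only when the exit code is 0. On a secured instance the request would also need whatever credentials or token the browser request carries.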

Related

How can command output be accessed through the uDeploy REST API?

I am using https://host/cli/componentProcessRequest/info/ to get the execution details of a component process request. But this gives only basic information, not details such as logs.
In this process we execute a shell script. I want to get the shell output log through the REST API. Is there any way I can achieve this?

How Do I Send a Message To A Running Yarn Application?

I want to have my application already started on my YARN cluster and allow users to send it additional commands. I am still in the design phase, and I'm unsure of the best way of going about this. Is this possible? Could the user send some sort of REST command to the Application Master or Resource Manager that could then be passed to the running YARN application?
You can, if you're willing to build a custom AM and write your own REST API, but writing a custom AM is not trivial. As for the RM, you can kill your application or move it to another queue via REST API calls, but not much else.
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

Find out which program reads from a given MSMQ queue

I have a queue that should only be read by a specific program. However, I have discovered that some other program is "stealing" messages from that queue. Is there a way to determine which program does that? I couldn't come up with anything.
For a given service account, set read-message and peek-message permissions on the queue, and then ensure that only the consumer runs under that service account.
UPDATE
On Windows server 2008 or Windows 7 or higher, MSMQ has a dedicated system event log which records everything the MSMQ subsystem does. It may be possible to see what user account is accessing the queue via this.

Tailing 'Jobs' with Perl under mod_perl

I've got this project running under mod_perl that shows some information on a host. On this page is a text box with a dropdown that allows users to ping/nslookup/traceroute the host. The output is shown in the text box like a tail -f.
It works great under CGI. When the user requests a ping it would make an AJAX call to the server, where it essentially starts the ping with the output going to a temp file. Then subsequent ajax calls would 'tail' the file so that the output was updated until the ping finished. Once the job finished, the temp file would be removed.
However, under mod_perl, no matter what I do I can't stop it from creating zombie processes. I've tried everything: double forking, using IPC::Run, etc. In the end, system calls are not encouraged under mod_perl.
So my question is, maybe there's a better way to do this? Is there a CPAN module available for creating command line jobs and tailing output that will work under mod_perl? I'm just looking for some suggestions.
I know I could probably create some sort of 'job' daemon that I signal with details and get updates from. It would run the commands and keep track of their status etc. But is there a simpler way?
Thanks in advance.
I had a short timeframe on this one and had no luck with CPAN, so I'll provide my solution here (I probably re-invented the wheel). I had to get something done right away.
I'll use ping in this example.
When ping is requested by the user, the AJAX script creates a record in a database with the details of the ping (host, interval, count etc.). The record has an auto-incrementing ID field. It then sends a SIGHUP to a job daemon, which is just a daemonised perl script.
This job daemon receives the SIGHUP, looks for new jobs in the database and processes each one. When it gets a new job, it forks, writes the PID and 'running' status to the DB record, opens up stdout/stderr files based on the unique job ID and uses IPC::Run to direct STDOUT/STDERR to these files.
The job daemon keeps track of the forked jobs, killing them if they run too long etc.
To tail the output, the AJAX script sends the job ID back to the browser. Then, on a Javascript timer, the AJAX script is called again; it checks the status of the job via the database record and tails the files.
When the ping finishes, the job daemon sets the record status to 'done'. The AJAX script checks for this on its regular status checks.
One of the reasons I did it this way is that the AJAX script and the job daemon talk through an authenticated means (the DB).
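The 'tail' step on each timer tick can be sketched roughly as below. The original solution is in Perl; this is only an illustration in Python, and it assumes the browser sends back the byte offset it has already received. The function name and the offset handling are hypothetical, not part of the original answer.

```python
import os
import tempfile


def read_new_output(path, offset):
    """Return (new_text, new_offset) for output appended since `offset`.

    The caller stores new_offset and passes it in on the next poll,
    so each call returns only what the job has written in the meantime.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


if __name__ == "__main__":
    # Tiny demonstration with a temporary file standing in for the job's stdout file.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"first chunk of ping output\n")
        text, pos = read_new_output(path, 0)
        print(repr(text), pos)
        os.write(fd, b"second chunk\n")
        text, pos = read_new_output(path, pos)
        print(repr(text), pos)
    finally:
        os.close(fd)
        os.remove(path)
```

Tracking a byte offset keeps each poll cheap, since the file is never re-read from the start.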

Scheduled Tasks for Web Applications

What are the different approaches for creating scheduled tasks for web applications, with or without a separate web/desktop application?
If we're talking Microsoft platform, then I'd always develop a separate Windows Service to handle such batch tasks.
You can always reference the same assemblies that are being used by your web application to avoid any nasty code duplication.
Jeff discussed this on the Stack Overflow blog -
https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/
Basically, Jeff proposed using the CacheItemRemovedCallback as a timer for calling certain tasks.
I personally believe that automated tasks should be handled as a service, a Windows scheduled task, or a job in SQL Server.
Under Linux, check out cron.
I think Stack Overflow itself is using an ApplicationCache expiration to run background code at intervals.
If you're on a Linux host, you'll almost certainly be using cron.
Under linux you can use cron jobs (http://www.unixgeeks.org/security/newbie/unix/cron-1.html) to schedule tasks.
Use URL fetchers like wget or curl to make HTTP GET requests.
Secure your URLs with authentication so that no one can execute the tasks without knowing the user/password.
I think Windows' built-in Task Scheduler is the suggested tool for this job. That requires an outside application.
This may or may not be what you're looking for, but read the article "Simulate a Windows Service using ASP.NET to run scheduled jobs". I think Stack Overflow may use this method, or at least discussed using it.
A very simple method that we've used where I work is this:
Set up a webservice/web method that executes the task. This webservice can be secured with username/pass if desired.
Create a console app that calls this web service. If desired, you can have the console app send parameters and/or get back some sort of metrics for output to the console or external logging.
Schedule this executable in the task scheduler of choice.
It's not pretty, but it is simple and reliable. Since the console app is essentially just a heartbeat to tell the app to go do its work, it does not need to share any libraries with the application. Another plus of this methodology is that it's fairly trivial to kick off manually when needed.
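A minimal sketch of such a heartbeat client, written in Python rather than as a compiled console app, might look like this. The URL, the credentials, and the use of HTTP Basic authentication with a POST request are all assumptions for illustration; the answer does not specify how the web method is secured or invoked.

```python
import base64
import sys
import urllib.request


def build_request(url, user, password):
    """Build a POST request carrying HTTP Basic credentials (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


def trigger(url, user, password, timeout=30):
    """Call the task endpoint and return the HTTP status code."""
    with urllib.request.urlopen(build_request(url, user, password), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical endpoint and credentials; schedule this script with cron
    # or Task Scheduler, as described above.
    status = trigger("https://example.invalid/tasks/run-nightly", "taskuser", "secret")
    print(f"task endpoint returned {status}")
    sys.exit(0 if status == 200 else 1)
```

Exiting non-zero on failure lets the scheduler's own logging or alerting pick up a failed run.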
You can also tell cron to run PHP scripts directly, for example. You can set the permissions on the PHP files to prevent other people from accessing them or, better yet, keep these utility scripts out of any web-accessible directory.
Java and Spring -- Use quartz. Very nice and reliable -- http://static.springframework.org/spring/docs/1.2.x/reference/scheduling.html
I think there are easier ways than using cron (Linux) or Task Scheduler (Windows). You can build this into your web-app using:
(a) quartz scheduler,
or if you don't want to integrate another 3rd party library into your application:
(b) create a thread on startup which uses the standard Java 'java.util.Timer' class to run your tasks.
I recently worked on a project that does exactly this (obviously it is an external service but I thought I would share).
https://anticipated.io/
You can receive a webhook or an SQS event at a specific scheduled time. Dealing with these schedulers can be a pain, so I thought I'd share in case someone is looking to offload their concerns.