Google cloud storage gsutil tool with Java - google-cloud-storage

If we have around 30G files (ranged from 50MB to 4GB) need to be uploaded to Google Cloud Storage everyday, according to google docs, gsutil might be the only fitted choice, isn't it?
I want to call gsutil command by Java, now the code below can work. But If I delete that while loop, the program will stop immediately after the runtime.exec(command) but python process was started but doing no uploading and it will soon be killed. I wonder why.
The reason I read from sterr stream is inspired by Pipe gsutil output to file
I decide whether gsutil finish executing by read util the last line of its status output, but is it a reliable way? Is there any better ways to detect whether gsutil execution is end in Java?
String command="python c:/gsutil/gsutil.py cp C:/SFC_Data/gps.txt"
+ " gs://getest/gps.txt";
try {
Process process = Runtime.getRuntime().exec(command);
System.out.println("the output stream is "+process.getErrorStream());
BufferedReader reader=new BufferedReader(new InputStreamReader(process.getErrorStream()));
String s;
while ((s = reader.readLine()) != null){
System.out.println("The inout stream is " + s);
}
} catch (IOException e) {
e.printStackTrace();
}

There are certainly more than one way to upload 30G worth of data per day to GCS. Since you are working in Java, have you considered to use the Cloud Storage API Java client library?
https://developers.google.com/api-client-library/java/apis/storage/v1
As for the specific questions about calling gsutil from Java using Runtime.exec(), I suspect when there is no while loop, the program will exit immediately after creating the sub-process, causing the "process" variable to be GC'ed, which might kill the sub-process.
I think you should wait for the sub-process to complete, which is effectively what the while loop is doing. Or you can just call waitFor() and check the existValue() if you don't care about the output:
http://docs.oracle.com/javase/7/docs/api/java/lang/Process.html

I draw the following pic according to Zhihong Yao's explanation. Hope it can help anyone with the same question as mine.

Related

How to capture command line input from Vert.x

Env: Mac OS 12.1, JDK 17, Vert.x 4.2.4
Question: how to capture command line input from a verticle? Tried so far following in the public void start(Promise<Void> startPromise) throws Exception method:
getVertx().createSharedWorkerExecutor("sys-in").executeBlocking(promise -> {
try (final BufferedReader br = new BufferedReader(new InputStreamReader(System.in))) {
String line;
int count = 0;
do {
System.out.print("message to MC: ");
line = br.readLine();
count++;
//doSth(line); // e.g. send line over multicast
} while (count < 3);
} catch (Throwable t) {
// log.info("<start> ", t);
} finally {
// bye(); // send a final message and close vertx
promise.complete();
}
});
This will start, get 3 nulls from br, and exit. Also tried a separated ExecutorService, in vain. Couldn't find any help in Vert.x doc either. Any hints are appreciated:
aware of the warnings of Vert.x when doing blocking stuff
Vert.x might not meant to be used this way, but would be cool if it (reading from command line) can be done with the same toolkit
I understand what you are trying to accomplish, but the problem is that that goes against fundamentals of verticles concept. Waiting for user input is potentially infinitely blocking operation i.e. there is no guarantee user will ever input the values. In that case, you are left with the verticle that is hung forever, spending resources and stuck in one spot. Multiply this if you are using worker verticles and you might have serious problems with the app. This issue is also emphasized here: https://vertx.io/docs/vertx-core/java/#blocking_code (under Warning).
In the link provided you can also find a suggested solution with a separate thread solution. Non-vertx thread won't mind being blocked and when the user input is provided can inform the vertx part of the application via the event bus that the user input dependent code can now be executed.
This might not be the solution you had in mind since it's not pure vertx, but have in mind that vert.x is just another tool, and that tool is not a good fit for what you are trying to accomplish here. However, it can be paired well with plain Java and it won't mind.

PipedInputStream / PipedOutputStream, ImageIO and ffmpeg

I have the following code in Scala:
val pos = new PipedOutputStream()
val pis = new PipedInputStream(pos)
Future {
LOG.trace("Start rendering")
generateFrames(videoRenderParams.length) {
img ⇒ ImageIO.write(img, "PNG", pos)
}
pos.flush()
IOUtils.closeQuietly(pos)
LOG.trace("Finished rendering")
} onComplete {
case Success(_) ⇒
LOG.trace("Complete successfully")
case Failure(err) ⇒
LOG.error("Can't render stuff", err)
IOUtils.closeQuietly(pis)
IOUtils.closeQuietly(pos)
}
val prc = (ffmpegCli #< pis).!(logger)
the Future simply writes the generated images one by one to the OutputStream. Now the ffmpeg process reads the input images from stdin and converts them to MP4 file.
That works pretty well, but for some reason sometimes I'm getting the following stacktraces:
I/O error Pipe closed for process: <input stream>
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at scala.sys.process.BasicIO$.loop$1(BasicIO.scala:236)
at scala.sys.process.BasicIO$.transferFullyImpl(BasicIO.scala:242)
at scala.sys.process.BasicIO$.transferFully(BasicIO.scala:223)
at scala.sys.process.ProcessImpl$PipeThread.runloop(ProcessImpl.scala:159)
at scala.sys.process.ProcessImpl$PipeSource.run(ProcessImpl.scala:179)
At the same time I'm getting the following error from another stream:
javax.imageio.IIOException: I/O error writing PNG file!
at com.sun.imageio.plugins.png.PNGImageWriter.write(PNGImageWriter.java:1168)
at javax.imageio.ImageWriter.write(ImageWriter.java:615)
at javax.imageio.ImageIO.doWrite(ImageIO.java:1612)
at javax.imageio.ImageIO.write(ImageIO.java:1578)
at
So it seems that the streams were broken somewhere in between, so ffmpeg can not read the data, and ImageIO can not write the data.
What is even more interesting - the problem is reproducible only on certain Linux server (Amazon). It works flawlessly on other Linux boxes. So I wonder if somebody could point me out to the possible causes of this error.
What I've tried so far:
use Oracle JDK 8 and OpenJDK
use different versions of FFMPEG
Nothing worked by the moment.
The problem was kind of predictable and weird at the same time. So there were ten concurrent ffmpeg processes scheduled to handle the input, and the input was a set of hundreds of FullHD pictures. Obviously that takes lot of computation capacity, hence the kernel randomly shut down ffmpeg processes, causing Java wrapper to report broken input and output pipe at the same time.
Thus /var/log/messages contained many of logs like below:
Out of memory: Kill process 25778 (java) score 159 or sacrifice child
Killed process 25931 (ffmpeg) total-vm:2337040kB, anon-rss:966340kB, file-rss:104kB
Reducing the number of concurrent ffmpeg processes solved the issue.

Mirth is reading too slow from disk

I am using Mirth 3.0.1 version. I am reading a file (using File Reader) having 34,000 records. Every record is having 45 columns and are pipe(|) separated. Mirth is taking too much time while reading the file from the disk. Mirth is installed on the same server where file is located.Earlier, I was facing the java head space issue which I resolved after setting the -Xms1024m -Xmx4096m in files mcserver.vmoptions & mcservice.vmoptions. Now I have to solve reading performance issue. Please find in attachment the channel for the same.
The answer to this problem is highly dependent on the solution itself. As an example, if you are doing transformations when you benchmark, it might be that the problem is not with reading the files, but rather with doing massive amounts of filtering and transformations in Mirth. Since Mirth converts everything you configure into basically one gigantic Javascript that executes on the server, it might just as well be that this is causing the performance problem. Pre-processor scripts might also create a problem if you do something that causes Mirth to read the whole file.
It migh also be that your 34.000 lines in the file contains huge quantities of information, simply making the file very big and extensive to process. If every record in the file is supposed to create new messages within Mirth, you might also want to check your batch settings for the reader.
And in addition to this, the performance of the read operations from disk is of course affected a lot by the infrastructure and hardware of the platform itself. You did mention that you are reading the files locally and that you had to increase the memory for Mirth. All of this could of course be a problem in itself. To make a benchmark you would want to compare this to something else. Maybe write a small Java program to just read the file to compare performance outside of Mirth.
Thanks for the suggestions.
I have used router.routeMessage('channelName','PartOfMsg') to route the 5000 records(from one channel to second channel) from the file having 34000 of records. This has helped to read faster from the file and processing the records at the same time.
For Mirth Community, below is the code to route the msg from one channel to other channel, this solution is also for the requirement if you have bulk of records to process in batches
In Source Transformer,
debug = "ON";
XML.ignoreWhitespace = true;
logger.debug('Inside source transformer "SplitFileIntoFiles" of channel: SplitFile');
var
subSegmentCounter = 0,
xmlMessageProcessCounter = 0,
singleFileLimit = 5000,
isError = false,
xmlMessageProcess = new XML(<delimited><row><column1></column1><column2></column2></row></delimited>),
newSubSegment = <row><column1></column1><column2></column2></row>,
totalPatientRecords = msg.children().length();
logger.debug('Total number of records found in the patient input file are: ');
logger.debug(totalPatientRecords);
try{
for each (seg in msg.children())
{
xmlMessageProcess.appendChild(newSubSegment);
xmlMessageProcess['row'][xmlMessageProcessCounter] = msg['row'][subSegmentCounter];
if (xmlMessageProcessCounter == singleFileLimit -1)
{
logger.debug('Now sending the 5000 records to the next channel from channel DOR Batch File Process IHI');
router.routeMessage('DOR SendPatientsToMedicare',xmlMessageProcess);
logger.debug('After sending the 5000 records to the next channel from channel DOR Batch File Process IHI');
xmlMessageProcessCounter = 0;
delete xmlMessageProcess['row'];
}
subSegmentCounter++;
xmlMessageProcessCounter++;
}// End of FOR loop
}// End of try block
catch (exception)
{
logger.error('The exception has been raised in source transformer "SplitFileIntoFiles" of channel: SplitFile');
logger.error(exception);
globalChannelMap.put('isFailed',true);
globalChannelMap.put('errDesc',exception);
return true;
}
if (xmlMessageProcessCounter > 1)
{
try
{
logger.debug('Now sending the remaining records to the next channel from channel DOR Batch File Process IHI');
router.routeMessage('DOR SendPatientsToMedicare',xmlMessageProcess);
logger.debug('After sending the remaining records to the next channel from channel DOR Batch File Process IHI');
delete xmlMessageProcess['row'];
}
catch (exception)
{
logger.error('The exception has been raised in source transformer "SplitFileIntoFiles" of channel: SplitFile');
logger.error(exception);
globalChannelMap.put('isFailed',true);
globalChannelMap.put('errDesc',exception);
return true;
}
}
return true;
// End of JavaScript
Hope, this will help.

Perl IPC - FIFO and daemons & CPU Usage

I have a mail parser perl script which is called every time a mail arrives for a user (using .qmail). It extracts a calendar attachment out of the mail and places the "path" of the file in a FIFO queue implemented using the Directory::Queue module.
Another perl script which reads the path of the calendar attachment and performs certain file operations on the local system as well as on the remote CalDAV server, is being run as a daemon, as explained here. So basically this script looks like:
my $declarations
sub foo {
.
.
}
sub bar {
.
.
}
while ($keep_running) {
for(keep-checking-the-queue-for-new-entries) {
sub caldav_logic1 {
.
.
}
sub caldav_logic2 {
.
.
}
}
}
I am using Proc::Daemon for running the script as a daemon. Now the problem is, this process has almost 100% CPU usage. What are the suggested ways to implement the daemon in a more standard, safer way ? I am using pretty much the same code as mentioned in the link mentioned for usage of Proc::Daemon.
I bet it is your for loop and checking for new queue entries.
There are ways to watch a directory for file changes. These ways are OS dependent but there might be a Perl module that wraps them up for you. Use that instead of busy looping. Even with a sleep delay, the looping is inefficient when you can have your program told exactly when to wake up by an OS event.
File::ChangeNotify looks promising.
Maybe you don't want truly continuous polling. Is keep-checking-the-queue-for-new-entries a CPU-intensive part of the code, even when the queue is empty? That would explain why your processor is always busy.
Try putting a sleep 1 statement at the very top (or very bottom) of the while loop to let the processor rest between queue checks. If that doesn't degrade the program performance too much (i.e., if everyone can tolerate waiting an extra second before the company calendars get updated) and if the CPU usage still seems high, try sleep 2, sleep 5, etc.
cpan Linux::Inotify2
The kernel knows when files change and sends this information to your program which runs the sub. Maybe this will be better because the program will run the sub only when the file is changed.

SWFUpload + jQuery.SWFUpload - Remove File From Queue

I'm facing a big issue IMO.
First, here's my code:
.bind('uploadSuccess', function(event, file, serverData){
if(serverData === 'nofile') {
var swfu = $.swfupload.getInstance('#form');
swfu.cancelUpload(file.id); // This part is not working :(
} else {
alert('File uploaded');
}
})
In this part I'm checking server response (I'm have strict validation restrictions). Now my question. Is it possible to remove uploaded file from queue? Basically, if server returns error I display error message, but... this file still exsit in the queue (I've implemented checking filename and filesize to avoid duplicated uploads) and user is not possible to replace this file (due to upload and queue limit).
I was trying to search for a solution, but without success. Any ideas?
Regards,
Tom
From the link
http://swfupload.org/forum/generaldiscussion/881
"The cancelUpload(file_id) function
allows you to cancel any file you have
queued.
You just have to keep the file's ID
value so you can pass it to
cancelUpload when you call it."
Probably you have to keep the file ID before sending anything to the server