How can I play HLS streams from Wasabi in Ant Media Server? - s3fs

I want to play my HLS streams from Wasabi. I enabled the S3 options in the Ant Media Server dashboard, but it seems that Ant Media Server only uploads the HLS files after the stream ends. How can I play the HLS chunks from Wasabi while the stream is running?

s3fs 1.88 and later buffer data locally and flush it according to the -o max_dirty_data option, which defaults to 5 GB. If you reduce this value, you should see updates more often. Note that these flushes require server-side copies and may do more I/O than you anticipate.
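For example, a much smaller flush threshold can be passed at mount time (a sketch based on the mount command in the steps below; the 64 MB value is only an illustration, and max_dirty_data is given in MB):
sudo s3fs mybucket /usr/local/antmedia/webapps/LiveApp/streams/ \
  -o max_dirty_data=64 -o allow_other \
  -o url=https://s3.us-west-1.wasabisys.com -o use_path_request_style \
  -o passwd_file=${HOME}/.passwd-s3fs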

We recommend s3fs (S3 via FUSE) for near-instant transfer and deletion of your HLS files in S3. You do not need to activate S3 in the panel: if the application's streams folder under the Ant Media directory is mounted to a folder in your S3 bucket, the files are automatically synced to S3.
The steps are briefly listed below:
Install s3fs
sudo apt install s3fs
Add the access key and secret key from your Wasabi account:
echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs
To mount the bucket, replace mybucket below with your Wasabi bucket name, set the folder you want to mount, and pass the endpoint as the url option, for example https://s3.us-west-1.wasabisys.com. Replace us-west-1 with your own region; you can find the Region parameter in the bucket list.
sudo s3fs -o dbglevel=info -o curldbg -o allow_other -o use_cache=/tmp/s3-cache \
  mybucket /usr/local/antmedia/webapps/LiveApp/streams/ \
  -o url=https://s3.us-west-1.wasabisys.com -o use_path_request_style \
  -o passwd_file=${HOME}/.passwd-s3fs
Check that the mount succeeded by running df; you should see a line similar to the one below in the output.
s3fs 274877906944 0 274877906944 0% /usr/local/antmedia/webapps/LiveApp/streams
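If you want the mount to persist across reboots, an /etc/fstab entry can be used instead of running s3fs manually (a sketch, assuming the credentials have also been stored in the system-wide /etc/passwd-s3fs file):
mybucket /usr/local/antmedia/webapps/LiveApp/streams fuse.s3fs _netdev,allow_other,use_path_request_style,url=https://s3.us-west-1.wasabisys.com,passwd_file=/etc/passwd-s3fs 0 0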

Related

RMAN backup into Google Cloud Storage

I want to take an Oracle database backup using RMAN directly into Google Cloud Storage, but I am unable to find a plugin for taking RMAN backups into Cloud Storage. There is a plugin for Amazon S3, and I am looking for an equivalent for Google Cloud Storage.
I don't believe there's an official way of doing this, although I did file a Feature Request for the Cloud Storage engineering team to look into, which you can find here.
I recommend starring the Feature Request for easy visibility and access, so you can follow its status updates. The Cloud Storage team might ask questions there too.
You can use gcsfuse to mount a GCS bucket as a file system on your machine and have RMAN create backups there.
You can find more information about gcsfuse on its GitHub page. Here are the basic steps to mount a bucket and run RMAN:
Create a bucket oracle_bucket. Check that it doesn't have a retention policy defined on it (it looks like gcsfuse has some issues with retention policies).
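If you prefer to do this from the command line, the bucket can be created and its retention status checked with gsutil (a quick sketch using the bucket name from this step):
gsutil mb gs://oracle_bucket
gsutil retention get gs://oracle_bucket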
Please have a look at mounting.md, which describes credentials for GCS. For example, I created a service account with the Storage Admin role and created a JSON key for it.
Next, set up credentials for gcsfuse on your machine. In my case, I set GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON key created above. Run:
sudo su - oracle
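# gcsfuse picks up Application Default Credentials; point GOOGLE_APPLICATION_CREDENTIALS
# at the service-account JSON key mentioned above (the path below is a hypothetical example)
export GOOGLE_APPLICATION_CREDENTIALS=/home/oracle/gcs-key.json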
mkdir ./mnt_bucket
gcsfuse --dir-mode 755 --file-mode 777 --implicit-dirs --debug_fuse oracle_bucket ./mnt_bucket
From gcsfuse docs:
Important: You should run gcsfuse as the user who will be using the file system, not as root. Do not use sudo.
Configure RMAN to create a backup in mnt_bucket. For example:
configure controlfile autobackup format for device type disk to '/home/oracle/mnt_bucket/%F';
configure channel device type disk format '/home/oracle/mnt_bucket/%U';
After you run backup database, you'll see the backup files created in your GCS bucket.
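A rough sketch of that last step from the shell (the connect string is illustrative and depends on your environment):
rman target / <<'EOF'
backup database;
EOF
# the backup pieces should now be visible in the bucket
gsutil ls gs://oracle_bucket/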

How to copy yarn ssh logs automatically using scala to blob storage

We have a requirement to download the YARN logs to blob storage automatically. I found that the YARN logs do get added to the storage account under the /app-logs/user/logs/ path, but they are in a binary format and there is no documented way to convert them into text format. So we are trying to run the external command yarn logs -applicationId <application_id> using Scala at the end of our application run to capture the logs and save them to blob storage, but we are facing issues with that. We are looking for a solution to get these logs automatically downloaded to the storage account as part of the Spark pipeline itself.
I tried redirecting the output of the yarn logs command to a temp file and then copying the file from local storage to blob storage. These commands work fine when I SSH into the head node of the Spark cluster and run them, but they do not work when executed from a Jupyter notebook or a Scala application.
("yarn logs -applicationId application_1561088998595_xxx > /tmp/yarnlog_2.txt") !!
("hadoop dfs -fs wasbs://dev52mss#sahdimssperfdev.blob.core.windows.net -copyFromLocal /tmp/yarnlog_2.txt /tmp/") !!
When I run these commands from a Jupyter notebook, the first command works fine and redirects to a local file, but the second one, which copies the file to blob storage, fails with the following error:
warning: there was one feature warning; re-run with -feature for details
java.lang.RuntimeException: Nonzero exit value: 1
at scala.sys.package$.error(package.scala:27)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:132)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
... 56 elided
Initially, I tried capturing the output of the command as a DataFrame and writing the DataFrame to blob storage. It succeeded for small logs, but for huge logs it failed with the error:
Serialized task 15:0 was 137500581 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values
val yarnLog = Seq(Process("yarn logs -applicationId " + "application_1560960859861_0003").!!).toDF()
yarnLog.write.mode("overwrite").text("wasbs://container@storageAccount.blob.core.windows.net/Dev/Logs/application_1560960859861_0003.txt")
Note: You can directly access the log files using Azure Storage => Blobs => Select Container => app logs
Azure HDInsight stores its log files both in the cluster file system and in Azure storage. You can examine log files in the cluster by opening an SSH connection to the cluster and browsing the file system, or by using the Hadoop YARN Status portal on the remote head node server. You can examine the log files in Azure storage using any of the tools that can access and download data from Azure storage.
Examples are AzCopy, CloudXplorer, and the Visual Studio Server Explorer. You can also use PowerShell and the Azure Storage Client libraries, or the Azure .NET SDKs, to access data in Azure blob storage.
For more details, refer to "Manage logs for an Azure HDInsight cluster".
Hope this helps.
Currently, you will need to use the 'yarn logs' command to view Yarn logs.
As regards your requirement, there are two methods to achieve this:
Method 1:
Schedule a daily copy of the app-logs folder into a desired container within blob storage. This does a differential copy every day at a specific time. For this one, I had to use Azure Data Factory to achieve the scheduling; it is quite easy, and no manual copying or coding is required.
However, because the YARN application logs are stored in the TFile binary format and can only be read using the 'yarn logs' command, you will need another tool to read the files from the destination later on. You can use this tool to read them: https://github.com/shanyu/hadooplogparser
Alternatively, you can have your own simple script that converts the logs to a readable file before the transfer. A sample script is below:
yarn logs -applicationId application_15645293xxxxx > /tmp/source/applog_back.txt
hadoop dfs -fs wasbs://hdiblob@sandboxblob.blob.core.windows.net -copyFromLocal /tmp/source/applog_back.txt /tmp/destination
Method 2:
This is the simplest and cheapest method. You can disable the retention period of the YARN application logs, which means the logs will be retained indefinitely. To do this, set the config yarn.log-aggregation.retain-seconds to -1. This config can be found in yarn-site.xml.
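For reference, the property in yarn-site.xml would look like the snippet below (standard Hadoop property syntax; on HDInsight you would normally change this through Ambari rather than editing the file by hand):
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>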
Once this is done, you can read your YARN application logs at any time from the cluster using the YARN UI or CLI.
Hope this helps

Performance of gsutil cp command has declined

We have observed that the gsutil cp command for copying a single file to Google Storage performed better when only a few such processes were running, each copying a different single file to a different location on Google Storage. The normal speed at that time was ~50 Mbps. But as the number of "gsutil cp" processes copying single files has increased, the average speed these days has dropped to ~10 Mbps.
I suppose the "gsutil -m cp" command will not improve performance, as there is only one file to be copied.
What can be attributed to this low speed as the number of gsutil cp processes copying single files increases, and what can we do to increase the speed of these processes?
gsutil can upload a single large file in parallel. It does so by uploading parts of the file as separate objects in GCS, asking GCS to compose them together afterwards, and then deleting the individual sub-objects.
N.B. Because this involves uploading objects and then almost immediately deleting them, you shouldn't do this on Nearline buckets, since there's an extra charge for deleting objects that have been recently uploaded.
You can set a file size above which gsutil will use this behavior. Try this:
gsutil -o GSUtil:parallel_composite_upload_threshold=100M cp bigfile gs://your-bucket
More documentation on the feature is available here: https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads
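If you want this threshold to apply to every gsutil invocation, it can also be set persistently in your boto configuration file (typically ~/.boto), for example:
[GSUtil]
parallel_composite_upload_threshold = 100M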

What is the fastest way to duplicate google storage bucket?

I have a 10 TB bucket and need to create a copy of it as quickly as possible. What is the fastest and most effective way of doing this?
Assuming you want to copy the bucket to another bucket in the same location and storage class, you could run gsutil rsync on a GCE instance:
gsutil -m rsync -r -d -p gs://source-bucket gs://dest-bucket
If you want to copy across locations or storage classes the above command will still work, but it will take longer because in that case the data (not just metadata) need to be copied.
Either way, you should check the result status and re-run the rsync command if any errors occurred. (The rsync command will avoid re-copying objects that have already been copied.) You should repeat the rsync command until the bucket has successfully been fully copied.
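A minimal sketch of that retry loop as a shell script (bucket names are placeholders; gsutil rsync exits non-zero when any copy failed):
until gsutil -m rsync -r -d -p gs://source-bucket gs://dest-bucket; do
  echo "rsync reported errors; retrying..." >&2
  sleep 30
done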
One simple way is to use Google's Cloud Storage Transfer Service. It may also be the fastest, though I have not confirmed this.
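For example, with a recent Google Cloud SDK a one-off transfer job between two buckets can be created from the command line (a sketch, assuming the Storage Transfer Service API is enabled on the project; bucket names are placeholders):
gcloud transfer jobs create gs://source-bucket gs://dest-bucket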
You can achieve this easily with gsutil.
gsutil -m cp -r gs://source-bucket gs://duplicate-bucket
Are you copying within Google Cloud Storage to a bucket with the same location and storage class? If so, this operation should be very fast. If the buckets have different locations and/or storage classes, the operation will be slower (and more expensive), but this will still be the fastest way.

how to rotate file while doing a streaming transfer to google cloud storage

We are working on a POC where we want to stream our web logs to Google Cloud Storage. We learned that objects in Google Cloud Storage are immutable and cannot be appended to from the Java API. However, we can do streaming transfers using gsutil, according to this link: https://cloud.google.com/storage/docs/concepts-techniques?hl=en#streaming
Now we would like to write hourly files. Is there a way to change the file name every hour, like logrotate does?
gsutil doesn't offer any logrotate-style features for object naming.
With a gsutil streaming transfer, the resulting cloud object is named according to the destination object in your gsutil cp command. To achieve rotation, your job that produces the stream could close the stream on an hourly basis, select a new filename, and issue a new streaming gsutil cp command.
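A minimal sketch of that approach as a shell wrapper (produce_logs and the bucket name are placeholders for your own log producer and destination; each iteration streams to a new, hour-stamped object):
while true; do
  name="weblog-$(date -u +%Y%m%d-%H).log"
  # timeout closes the producer (and therefore the pipe) after an hour,
  # which lets gsutil finalize the current object before the next one starts
  timeout 3600 produce_logs | gsutil cp - "gs://my-log-bucket/${name}"
done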