Generate Audio using a VTT as the text source with Google Text to Speech? - google-text-to-speech

Sign language videos normally have no audio track; however, we do provide a VTT caption file with English captions. Each cue in the VTT has a start time and an end time for its text block.
I am wondering if it's possible to use the VTT as the text source to generate the Text to Speech audio, wherein the speed is controlled by the time codes in the caption file.
I haven't found anything so far. Usually it's the other way around (audio to subtitles), but I want to go from subtitles to audio (US English).
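To make the idea concrete, here is a minimal Python sketch of the timing side only, not a tested solution. It parses the cues out of a VTT string and estimates how much faster or slower than normal each cue would have to be spoken to fit its time window. The names `Cue`, `parse_vtt` and `speaking_rate` are made up for illustration, and the 150 words-per-minute baseline is an assumption, not a measured property of any voice. The resulting multiplier is the kind of value one might pass to a TTS engine's rate control (Google Cloud Text-to-Speech has a `speaking_rate` audio setting; check its docs for the allowed range).

```python
import re
from dataclasses import dataclass

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours field is optional in WebVTT.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(vtt: str) -> list:
    """Return the cues found in a WebVTT document, in file order."""
    cues = []
    # Blocks (header, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def speaking_rate(cue: Cue, baseline_wpm: float = 150.0) -> float:
    """Rate multiplier needed to fit the cue text into its time window.

    baseline_wpm is an ASSUMED natural speed for the voice.
    """
    words = len(cue.text.split())
    duration = cue.end - cue.start
    if words == 0 or duration <= 0:
        return 1.0
    needed_wpm = words / duration * 60
    return needed_wpm / baseline_wpm
```

Each cue could then be synthesized separately at its own rate and the clips placed at `cue.start` on a silent timeline. Very short cues may need clamping to whatever range the TTS engine accepts.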

Related

Subtitles for DRM content

I use Azure Media Player v2.3.2. There are existing encoded videos that work great, to which I need to add subtitles (VTT). The videos are DRM protected. I cannot find any documentation discussing the topic, just a few (too) simple samples.
Should I use one separate shared container holding ALL subtitles of ALL media assets, or should I create one separate asset for subtitles of each media asset?
If I should create separate assets for each video, how can the media player know which subtitle asset belongs to which media asset, as they're separate?
If I should create one shared asset for all subtitles of all media files, are there any limits on how many subtitles I can have in this asset?
How does the media player know what all subtitles are available in the subtitle asset, to begin with?
How should I secure this subtitle asset to prevent downloading subtitles from it?
Jussi,
Thanks for asking. It is a difficult topic to find a lot of information on in our docs and samples.
We recommend keeping all the VTT files in the same Asset. We recently introduced a new "Tracks" API on Asset as well, to make it easier to "late-bind" caption and audio tracks to an Asset. Docs are still being worked on, but I have a sample in TypeScript up here that shows how to add a VTT track to an existing asset: https://github.com/Azure-Samples/media-services-v3-node-tutorials/blob/main/Assets/add-WebVTT-tracks.ts
Once you add a VTT track, AMS will convert it to an IMSC text track that AMP and other players can understand. You have to set the language code correctly, and then tell AMP which language code's IMSC1 tracks to display.
Don't create separate assets; it is not required.
I don't believe we set a limit on Tracks at all... But I need to check with dev.
AMP only knows what subtitles are there if you configure it to find them. The AMP API has a setting on it to describe which IMSC1 to load. You can do this in the imsc1caption settings. For example, if you add an 'en-us' VTT file with the tracks API, you have to set the AMP player up to look for the 'en-us' IMSC1 file. It will read that from the DASH manifest.
https://amp.azure.net/libs/amp/latest/docs/index.html#amp.player.imsc1captionssettings
For other 3rd party players, they will read the caption track information from the DASH or HLS manifest file. We decorate them in the manifest appropriately according to those specifications so that the players can parse and load them. Each player differs a bit in support, and there are some known issues across players.

Writing MP4 tags for M4A or MP4 audio files

I have a strange problem with MP4 tagging. I can figure out two styles of tags: one that works with mp3tag and tagscanner, and another that works with MusicBee. But I can't figure out one that works universally with all of them, so I write two sets of tags into the file.
Even this isn't enough. Players like AIMP and Clementine still can't read MP4 files I tagged this way. I need to open mp3tag, load my files, and save them; then it writes tags that those music players understand. But I can't find good documentation anywhere.
Does anyone know what kind of tags I need to write so that all of them can read the tags? I tried looking at MP4s that work in all of them, but it was no use: I see tags like "Artist", and I already write a tag called "Artist". It looks like "Artist" in exif too, and that is the tag I wrote that MusicBee understands.
I use the AudioGenie Windows Library to write the tags. There are two different methods for writing a tag. One is called an ISLT text frame (I have no idea what that is) and requires an integer code as well as text when writing. The other is called an iTune text frame and requires a string frame ID as well as text.
I tried to shove MP3 ID3v2 tags into both of those as well, to see if that was what the third group of players that can't read my tags wanted, but that didn't work. I only tried this because I read somewhere that ID3v2 tags are widely used in MP4 files (it was only in one comment on Stack Overflow, so I'm skeptical).
Could someone point me in the right direction?

Extract subtitles from mp4 without time marks or position locators

I have an mp4 from a university lecture that has embedded subtitles. I am aware there are tools to extract the captions.
The lecturer is reading from a script which we don't have access to. I just want to extract all the text of the subtitles without the time stamps so I have the text in a word document to study.
Is this possible? If not, is there any tool or script that could help me eliminate the post-extraction time stamps?
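For the second part of the question, a short script can strip the timing information. The following is a minimal Python sketch, assuming the captions have already been extracted to an SRT or WebVTT file; the function name `subtitles_to_text` is made up for illustration. It drops the WebVTT header, SRT cue counters, timing lines and inline markup, and joins what is left into running text.

```python
import re

# Timing lines look like "00:00:01,000 --> 00:00:04,000" (SRT)
# or "00:00:01.000 --> 00:00:04.000 position:10%" (WebVTT).
TIMING_LINE = re.compile(r"^(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+")
# Inline markup such as <i>, </i> or <c.yellow>.
TAG = re.compile(r"<[^>]+>")


def subtitles_to_text(raw: str) -> str:
    """Return only the spoken text from SRT/WebVTT content."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank separator between cues
        if stripped == "WEBVTT" or stripped.startswith("WEBVTT "):
            continue  # WebVTT file header
        if stripped.isdigit():
            continue  # SRT cue counter (also drops digit-only captions)
        if TIMING_LINE.match(stripped):
            continue  # start --> end line, with any position settings
        text = TAG.sub("", stripped).strip()
        # Skip a line that exactly repeats the previous one.
        if text and (not kept or kept[-1] != text):
            kept.append(text)
    return " ".join(kept)
```

The output can be pasted into a Word document as-is. Note the limitation flagged in the comments: a caption line consisting only of digits would be mistaken for a cue counter and dropped.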

asp.net web application to convert pdf to word

Is there any clear and proper process to convert a pdf file into a word file with all formatting and images in asp.net web application?
The best way to do that is to use OCR. It will recognize the text and the images in the PDF file, and then you can save the result as a DOC file. I know of a third-party toolkit named LEADTOOLS that should meet your requirements, since it supports the ASP.NET environment. You can check their Online OCR Demo.
Also, you can check their website for more information, or contact their support team.
PDF is a presentational format in which all content is placed at absolute positions. There are no paragraphs or other structural elements (unless it is a Tagged PDF). Technically, a PDF can output every word character by character in any order and still look like normal text visually. Thus, a proper conversion to Word requires content recognition or some kind of OCR (e.g. ABBYY FineReader).
There are some paid components on the market that do text extraction, and some that convert pages to images (obviously, the latter is not a desirable approach for converting to Word).
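To illustrate the absolute-positioning point, here is a toy Python sketch of the simplest kind of content recognition: rebuilding reading order from positioned text fragments. It does not read a real PDF. The `Fragment` type, the `rebuild_lines` function and the convention that `y` is measured from the top of the page are all assumptions made for the example; a real converter also has to detect columns, paragraphs, tables and fonts.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment, in points
    y: float  # baseline, in points from the TOP of the page (assumption)


def rebuild_lines(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line by x."""
    lines = []  # each item is a list of fragments sharing one baseline
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # Same line if the baseline is within tolerance of the line's first fragment.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    ]
```

Even this tiny case needs a tolerance, because fragments on one visual line rarely share an identical baseline value.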

Looking for a way to output the top viewed youtube video to a text file for a search term

I would like to output just the top YouTube video for a particular search term, e.g. Tennis, to a text file. I prefer command-line options but am open to other solutions.
You can fetch the data you need in XML format from YouTube's API.
(Note: The results may differ from the HTML website)
Then parse the XML with anything you want, e.g. Perl's XML::LibXML::XPathContext. It's a bit fiddly though, if you haven't used that module before.
Once you have the video URL, you can pass it to youtube-dl.
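The parse step can be sketched in Python as well. This is only an illustration under stated assumptions: `SAMPLE_FEED` is a made-up Atom-style document, not real API output, and the code assumes the feed is already sorted so that the first entry is the top video. The function names `first_video` and `write_top_result` are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up Atom-style feed for illustration; real API output will differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Tennis highlights</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID"/>
  </entry>
  <entry>
    <title>Tennis lesson</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=OTHER_ID"/>
  </entry>
</feed>"""


def first_video(feed_xml: str):
    """Return (title, url) of the first entry, or None if there is none."""
    root = ET.fromstring(feed_xml)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        return None
    title = entry.findtext("atom:title", default="", namespaces=ATOM)
    link = entry.find("atom:link[@rel='alternate']", ATOM)
    url = link.get("href") if link is not None else ""
    return title, url


def write_top_result(feed_xml: str, path: str) -> bool:
    """Write 'title<TAB>url' for the first entry to a text file."""
    result = first_video(feed_xml)
    if result is None:
        return False
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(f"{result[0]}\t{result[1]}\n")
    return True
```

Calling `write_top_result(SAMPLE_FEED, "top.txt")` would leave a one-line text file whose URL can be handed to youtube-dl.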