Avformat cannot seek to beginning of file - libavcodec

I need to quickly seek thru H.264 encoded video stream in MP4 container. I am using libav to decode frames, so I stumbled upon avformat_seek_file() method.
My problem is, assuming H.264 stream begins with keyframe, when I seek to timestamp 0 (regardless of time_base), I should be at the beggining of the stream. But Im not. I usually get few seconds into video. Also, if I seek to, for example 10 seconds, I usually get around 12 or so. Is it possible for keyframes to be so "rare"? It seems that AVSEEK_FLAG_ANY has no impact on seek result. Tested on multiple FullHD H.264 MP4 videos.
Code:
unsigned long seekTo = 0;
//Doesen´t actually matter for 0 since it will be also 0
seekTo = av_rescale_q(seekTo, AVRational{1, AV_TIME_BASE}, pFormatCtx->streams[videoStream]->time_base);
int result = avformat_seek_file(pFormatCtx, videoStream, INT_FAST64_MIN, seekTo, seekTo, AVSEEK_FLAG_ANY);
avcodec_flush_buffers(pCodecCtx);

Try using av_seek_frame instead. Read here for some gotchas about using that and seeking around.

My problem is, assuming H.264 stream begins with keyframe, when I seek to timestamp 0 (regardless of time_base), I should be at the beggining of the stream
Note that some files can have their first keyframe at a negative DTS, e.g. you need to seek to timestamp -1 or something like this.
You can set the flag inside AVFMT_SEEK_TO_PTS into AVInputFormat::flags before opening the AVFormatContext to use PTS which will be 0-based.

Related

Is it possible to reorder and recombine fragments in fmp4?

I have fragmented mp4 which I want to send users through HLS. It's ok if I just send it as is. But I need opportunity to reorder fragments in this video.
For example initial video, which looks like this:
original video format
I want reorganize fragments and get this:
expected video format
I try make it locally, and it's work in VLS player (HLS). For this I modified sequence number for fragments in moof (mfhd). But when I try play it remotely (HLS) it does not work. I think, that some players (js) expect some additional information from each fragment, probably for example time offset. But I can not find which atom (box) contain this information. I spent a lot of time searching and I'm still at the very beginning of the problem.
I tried to modify the fragment sequence number, but it doesn't work.
The "Track Fragment Media Decode Time Box" (tfdt) stores the baseMediaDecodeTime which is the accumulative decode time.
Consider the following...
baseMediaDecodeTime must increase monotonically for each chunk.
This means you must update (replace) the tfdt entry of chunk with expected next tftd entry.
When you naively reorder the chunks, the baseMediaDecodeTime will be invalid.
The "Track Fragment Media Decode Time Box" (tfdt) is located inside each moof header at:
moof --> traf --> tfdt

Gapless playback in pyglet

I understood this page to mean that queuing in pyglet provides a gapless transition between audio tracks. But when I test it out, there is a noticeable gap. Has anyone here worked with gapless audio in pyglet?
Example:
player = pyglet.media.Player()
source1 = pyglet.media.load([file1]) # adding streaming=False doesn't fix the issue
source2 = pyglet.media.load([file2])
player.queue(source1)
player.queue(source2)
player.play()
player.seek([time]) # to avoid having to wait until the end of the track. removing this doesn't fix the gap issue
pyglet.app.run()
I would suggest you either edit your url1 and url2 into caching them locally if they're external sources. And then use Player().time to identify when you're about to reach the end. And then call player.next_source.
Or if it's local files and you don't want to programatically solve the problem you could chop up the audio files in something like Audacity to make them seamless on start/stop.
You could also experiment with having multiple players and layer them on top of each other. But if you're only interested in audio playback, there's other alternatives.
It turns out that there were 2 problems.
The first one: I should have used
source_group = pyglet.media.SourceGroup()
source_group.add(source1)
source_group.add(source2)
player.queue(source_group)
The second one: mp3 files are apparently slightly padded at the beginning and at the end, so that is where the gap is coming from. However, this does not seem to be an issue with any other file type.

MFCreateFMPEG4MediaSink does not generate MSE-compatible MP4

I'm attempting to stream a H.264 video feed to a web browser. Media Foundation is used for encoding a fragmented MPEG4 stream (MFCreateFMPEG4MediaSink with MFTranscodeContainerType_FMPEG4, MF_LOW_LATENCY and MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS enabled). The stream is then connected to a web server through IMFByteStream.
Streaming of the H.264 video works fine when it's being consumed by a <video src=".."/> tag. However, the resulting latency is ~2sec, which is too much for the application in question. My suspicion is that client-side buffering causes most of the latency. Therefore, I'm experimenting with Media Source Extensions (MSE) for programmatic control over the in-browser streaming. Chrome does, however, fail with the following error when consuming the same MPEG4 stream through MSE:
Failure parsing MP4: TFHD base-data-offset not allowed by MSE. See
https://www.w3.org/TR/mse-byte-stream-format-isobmff/#movie-fragment-relative-addressing
mp4dump of a moof/mdat fragment in the MPEG4 stream. This clearly shows that the TFHD contains an "illegal" base data offset parameter:
[moof] size=8+200
[mfhd] size=12+4
sequence number = 3
[traf] size=8+176
[tfhd] size=12+16, flags=1
track ID = 1
base data offset = 36690
[trun] size=12+136, version=1, flags=f01
sample count = 8
data offset = 0
[mdat] size=8+1624
I'm using Chrome 65.0.3325.181 (Official Build) (32-bit), running on Win10 version 1709 (16299.309).
Is there any way of generating a MSE-compatible H.264/MPEG4 video stream using Media Foundation?
Status Update:
Based on roman-r advise, I managed to fix the problem myself by intercepting the generated MPEG4 stream and perform the following modifications:
Modify Track Fragment Header Box (tfhd):
remove base_data_offset parameter (reduces stream size by 8bytes)
set default-base-is-moof flag
Add missing Track Fragment Decode Time (tfdt) (increases stream size by 20bytes)
set baseMediaDecodeTime parameter
Modify Track fragment Run box (trun):
adjust data_offset parameter
The field descriptions are documented in https://www.iso.org/standard/68960.html (free download).
Switching to MSE-based video streaming reduced the latency from ~2.0 to 0.7 sec. The latency was furthermore reduced to 0-1 frames by calling IMFSinkWriter::NotifyEndOfSegment after each IMFSinkWriter::WriteSample call.
There's a sample implementation available on https://github.com/forderud/AppWebStream
I was getting the same error (Failure parsing MP4: TFHD base-data-offset not allowed by MSE) when trying to play a fmp4 via MSE. The fmp4 had been created from a mp4 using the following ffmpeg comand:
ffmpeg -i myvideo.mp4 -g 52 -vcodec copy -f mp4 -movflags frag_keyframe+empty_moov myfmp4video.mp4
Based on this question I was able to find out that to have the fmp4 working in Chrome I had to add the "default_base_moof" flag. So, after creating the fmp4 with the following command:
ffmpeg -i myvideo.mp4 -g 52 -vcodec copy -f mp4 -movflags frag_keyframe+empty_moov+default_base_moof myfmp4video.mp4
I was able to play successfully the video using Media Source Extensions.
This Mozilla article helped to find out that missing flag:
https://developer.mozilla.org/en-US/docs/Web/API/Media_Source_Extensions_API/Transcoding_assets_for_MSE
The mentioned 0.7 sec latency (in your Status Update) is caused by the Media Foundation's MFTranscodeContainerType_FMPEG4 containterizer which gathers and outputs each roughly 1/3 seconds (from unknown reason) of frames in one MP4 moof/mdat box pair. This means that you need to wait 19 frames before getting any output from MFTranscodeContainerType_FMPEG4 at 60 FPS.
To output single MP4 moof/mdat per each frame, simply lie that MF_MT_FRAME_RATE is 1 FPS (or anything higher than 1/3 sec). To play the video at the correct speed, use Media Source Extensions' <video>.playbackRate or rather update timescale (i.e. multiply by real FPS) of mvhd and mdhd boxes in your MP4 stream interceptor to get the correctly timed MP4 stream.
Doing that, the latency can be squeezed to under 20 ms. This is barely recognizable when you see the output side by side on localhost in chains such as Unity (research) -> NvEnc -> MFTranscodeContainerType_FMPEG4 -> WebSocket -> Chrome Media Source Extensions display.
Note that MFTranscodeContainerType_FMPEG4 still introduces 1 frame delay (1st frame in, no output, 2nd frame in, 1st frame out, ...), hence the 20 ms latency at 60 FPS. The only solution to that seems to be writing own FMPEG4 containerizer. But that is order of magnitude more complex than intercepting of Media Foundation's MP4 streams.
The problem was solved by following roman-r's advise, and modifying the generated MPEG4 stream. See answer above.
Another way to do this is again using the same code #Fredrik mentioned but I write my own IMFByteStream and and I check the chunks written to the IMFByteStream.
FFMpeg writes the atoms almost once at a time. So you can check the atom name and do the mods. It is the same thing. I wish there was an MSE compliant windows sinker.
Is there one that can generate .ts files for HLS?

GPUImageMovieWriter frame presentationTime

I have a GPUImageColorDodgeBlend filter with two inputs connected:
A GPUImageVideoCamera which is getting frames from the iPhone video camera.
A GPUImageMovie which is an (MP4) video file that I want to have laid over the live camera feed.
The GPUImageColorDodgeBlend is then connected to two outputs:
A GPUImageImageView to provide a live preview of the blend in action.
A GPUImageMovieWriter to write the movie to storage once a record button is pressed.
Now, before the video starts recording, everything works OK 100% of the time. The GPUImageVideo is blended over the live camera video fine, and no issues or warnings are reported.
However, when the GPUImageMovieWriter starts recording, things start to go wrong randomly. About 80-90% of the time, the GPUImageMovieWriter works perfectly, there are no errors or warnings and the output video is written correctly.
However, about 10-20% of the time (and from what I can see, this is fairly random), things seem to go wrong during the recording process (although the on-screen preview continues to work fine).
Specifically, I start getting hundreds & hundreds of Program appending pixel buffer at time: errors.
This error originates from the - (void)newFrameReadyAtTime:(CMTime)frameTime atIndex:(NSInteger)textureIndex method in GPUImageWriter.
This issue is triggered by problems with the frameTime values that are reported to this method.
From what I can see, the problem is caused by the writer sometimes receiving frames numbered by the video camera (which tend to have extremely high time values like 64616612394291 with a timescale of 1000000000). But, then sometimes the writer gets frames numbered by the GPUImageMovie which are numbered much lower (like 200200 with a timescale of 30000).
It seems that GPUImageWriter is happy as long as the frame values are increasing, but once the frame value decreases, it stops writing and just emits Program appending pixel buffer at time: errors.
I seem to be doing something fairly common, and this hasn't been reported anywhere as a bug, so my questions are (answers to any or all of these are appreciated -- they don't all need to necessarily be answered sequentially as separate questions):
Where do the frameTime values come from -- why does it seem so arbitrary whether the frameTime is numbered according to the GPUImageVideoCamera source or the GPUImageMovie source? Why does it alternative between each -- shouldn't the frame numbering scheme be uniform across all frames?
Am I correct in thinking that this issue is caused by non-increasing frameTimes?
...if so, why does GPUImageView accept and display the frameTimes just fine on the screen 100% of the time, yet GPUImageMovieWriter requires them to be ordered?
...and if so, how can I ensure that the frameTimes that come in are valid? I tried adding if (frameTime.value < previousFrameTime.value) return; to skip any lesser-numbered frames which works -- most of the time. Unfortunately, when I set playsAtActualSpeed on the GPUImageMovie this tends to become far less effective as all the frames end up getting skipped after a certain point.
...or perhaps this is a bug, in which case I'll need to report it on GitHub -- but I'd be interested to know if there's something I've overlooked here in how the frameTimes work.
I've found a potential solution to this issue, which I've implemented as a hack for now, but could conceivably be extended to a proper solution.
I've traced the source of the timing back to GPUImageTwoInputFilter which essentially multiplexes the two input sources into a single output of frames.
In the method - (void)newFrameReadyAtTime:(CMTime)frameTime atIndex:(NSInteger)textureIndex, the filter waits until it has collected a frame from the first source (textureInput == 0) and the second, and then forwards on these frames to its targets.
The problem (the way I see it) is that the method simply uses the frameTime of whichever frame comes in second (excluding the cases of still images for which CMTIME_IS_INDEFINTE(frameTime) == YES which I'm not considering for now because I don't work with still images) which may not always be the same frame (for whatever reason).
The relevant code which checks for both frames and sends them on for processing is as follows:
if ((hasReceivedFirstFrame && hasReceivedSecondFrame) || updatedMovieFrameOppositeStillImage)
{
[super newFrameReadyAtTime:frameTime atIndex:0]; // this line has the problem
hasReceivedFirstFrame = NO;
hasReceivedSecondFrame = NO;
}
What I've done is adjusted the above code to [super newFrameReadyAtTime:firstFrameTime atIndex:0] so that it always uses the frameTime from the first input and totally ignores the frameTime from the second input. So far, it's all working fine like this. (Would still be interested for someone to let me know why this is written this way, given that GPUImageMovieWriter seems to insist on increasing frameTimes, which the method as-is doesn't guarantee.)
Caveat: This will almost certainly break entirely if you work only with still images, in which case you will have CMTIME_IS_INDEFINITE(frameTime) == YES for your first input'sframeTime.

Playback skipping/seeking in an MP4 file

I'm trying to figure out the proper technique for performing skipping ahead or seeking within an mp4 (or m4a) audio file while playing it using the AudioFileStream and AudioQueue APIs on the iPhone.
If I pass the complete mp4 header (up to the mdat box) to an open AudioFileStream, the underlying audio file type is properly identified (in my case, AAC) and when I then pass the actual mdat data portion of the file, the AudioFileStream correctly begins generating audio packets and these can be sent to the AudioQueue and playback works.
However, if I try a random access approach to the playing back the file, I can't seem to get it to work properly, unless I always send the first frame of the mdat box to the AudioFileStream. If instead, after sending the mp4 header to the AudioFileStream, I then attempt to initially skip ahead to a later frame in the mdat by first calling AudioFileStreamSeek() and then passing the data for the associated packets, the AudioFileStream appears to generate audio packets, but when I pass these on to the AudioQueue and call AudioQueuePrime(), I always get an error of 'nope' returned.
My question is this: am I always required to at least pass in the first packet of the mdat box before attempting to do random playback of other packets in the mp4 file?
I can't seem to find any documentation on doing random playback of sections of an mp4 file while using an AudioFileStream and an AudioQueue. I've found Apple's QuickTime File Format pdf which describes the technique of randomly seeking within an mp4 file, but it's just a high level description and doesn't have any mention of using specific APIs (such as AudioFileStream).
Thanks for any insights.
It turns out the approach I was using with AudioFileStreamSeek() is valid, I just wasn't sending the full initial mp4 header to the AudioFileStreamParseBytes() routine.
The problem was I had assumed the packets began immediately after the mdat box tag. By examining the data offset value (kAudioFileStreamProperty_DataOffset) returned by the AudioFileStream Property Listener callback, I discovered the true start of the packet data was 18 bytes later.
These 18 bytes are considered part of the initial mp4 header that must be sent to the AudioFileStream parser before sending the data of arbitrary packets after calls to AudioFileStreamSeek().
If these extra bytes are left out, then the AudioQueuePrime() call will always fail with a 'nope' error even though you may have sent valid parsed audio packets to the AudioQueue.