I'm wondering why Vorbis needs any container at all. I know you can put Vorbis in an Ogg container or a Matroska container, but if I'm not going to bundle it with video or any other multimedia, why can't the Vorbis data stand alone in its own file?
Has anyone had any experience doing this? I googled before searching SO and only found a single mention on the oggvorbis mailing list, with no details.
It is completely possible. You do not need to know beforehand the length of any Vorbis packet (whether headers or audio) to be able to decode it. Without the Ogg wrapper (or an alternative wrapper) you will miss out on a few things, but they might not be important for your application:
The page checksums - not too important if you are reading from a disk or another fairly reliable source
The page granule/last-sample positions - useful for improving seeking performance, and for determining the total length of the stream in samples
However, you can fairly trivially extract a pure Vorbis bytestream from an Ogg file (given that it contains only one Vorbis stream) by:
1. Skip 26 bytes (the fixed part of the Ogg page header, up to the segment count)
2. n = read 1 byte (the number of segments in the page)
3. countOfBytesToRead = sum of the next n bytes (the segment table)
4. Read countOfBytesToRead bytes into your Vorbis bytestream
Repeat steps 1-4 until the Ogg file is exhausted.
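The steps above can be sketched in a few lines of Python (a minimal illustration, not a robust Ogg parser - it assumes a single well-formed stream and adds a check of the "OggS" capture pattern so corrupt input fails loudly):

```python
def extract_vorbis(ogg_bytes):
    """Strip Ogg page framing, returning the concatenated packet data
    of the single contained stream."""
    out = bytearray()
    pos = 0
    while pos < len(ogg_bytes):
        # Every page starts with the 4-byte capture pattern "OggS".
        assert ogg_bytes[pos:pos + 4] == b"OggS", "not at an Ogg page boundary"
        # The 26 fixed header bytes: capture pattern (4), version (1),
        # header type (1), granule position (8), serial number (4),
        # page sequence number (4), checksum (4).
        seg_count = ogg_bytes[pos + 26]                      # step 2: n
        table = ogg_bytes[pos + 27:pos + 27 + seg_count]     # segment table
        payload_len = sum(table)                             # step 3
        data_start = pos + 27 + seg_count
        out += ogg_bytes[data_start:data_start + payload_len]  # step 4
        pos = data_start + payload_len                       # next page
    return bytes(out)
```

Note that packets larger than 255 bytes span multiple segments (lacing values of 255 mean "continued"), but since we only want the raw bytestream, summing the segment table handles that automatically.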
Related
Does ExoPlayer download a chunk completely before processing (decrypting, decoding) it? Is there a way to override this and start decoding/playback before the chunk is completely downloaded? The content is MPEG-DASH content with a 6 second chunk size.
I am on the latest version of ExoPlayer. I am trying to improve the video start time, hence this query. Also, will smaller chunk sizes impact the video start time?
I think you mean a DASH segment when you say chunk - the terminology is important because DASH segments can contain subsegments, each of which may be decodable, but it is also confusing because the terms 'chunk' and 'segment' are both used in the ExoPlayer code.
It's useful when discussing this area to remember that the video download is actually a series of requests and responses, rather than a constant stream of media.
To start decoding earlier you typically have to request smaller 'pieces' (trying to avoid tripping over terminology...) of the video.
To be decodable, a video piece usually needs to start with a frame which does not reference any previous frames - an IDR frame, or more generally a Stream Access Point (SAP).
Looking at ExoPlayer itself, you can set the number of segments to download per chunk (ExoPlayer's terminology for the piece of video you download) - take a look at the 'maxSegmentsPerLoad' attribute in 'DefaultDashChunkSource': https://github.com/google/ExoPlayer/blob/bd54394391b0527893f382c9d641b8a55ffca765/library/dash/src/main/java/com/google/android/exoplayer2/source/dash/DashChunkSource.java
However, I think this is the opposite of what you are looking for - you would like to request a smaller piece of video - e.g. subsegments rather than the whole segments.
For that you most likely want to look at the new low-latency mechanisms introduced for DASH and for HLS - ExoPlayer has added support for these, and there is a public design document which provides a very good explanation of the background and the approach (link correct at the time of writing - the original ExoPlayer git issue is also included for reference - https://github.com/google/ExoPlayer/issues/4904):
https://docs.google.com/document/d/1z9qwuP7ff9sf3DZboXnhEF9hzW3Ng5rfJVqlGn8N38k/edit#
The diagrams in this document explain it well, but the short answer to your question is yes: this approach does allow smaller 'pieces' of video to be downloaded and played back, and it does indeed help with video start time.
I have 2 questions:
1) How do I write raw data to a file sink? I am trying to mux.
2) How do I make sure the sink's output is written not to a file but to a memory buffer?
So, in detail:
I am trying to use the Windows MPEG-4 File Sink to write some Intel SDK-encoded AVC or HEVC to memory and send it over a websocket.
What is the right approach?
Can I just feed raw HEVC or AVC as (byte*, length) to the MPEG-4 File Sink?
Or do I need to wrap the Intel encoder in a custom Windows Media Foundation encoder (well, I can just use its GUID to get the Intel encoder anyway)? Correct me if I am wrong, please.
So I have 2 problems: how do I write my raw data (AVC or HEVC, encoded by a third-party encoder) to the MP4 sink?
Do I need to implement a custom sink, and how custom does it have to be? Can I inherit part of the MPEG-4 sink (after all, I do not want to reimplement a full MP4 container)?
Or can I modify the MPEG-4 sink's behavior so that it writes to memory instead of a file?
I know I feel like I reiterated myself a few times. Sorry about that.
1) If you wrap the encoded bitstream in an IMFSample you can just call IMFStreamSink::ProcessSample. To wrap it in the IMFSample, create a memory buffer (IMFMediaBuffer) with MFCreateMemoryBuffer, then create an IMFSample with MFCreateSample and add the buffer to it with IMFSample::AddBuffer. Then pass the sample to the stream sink. Also, if you can constrain the output bitstream length, you can actually use the underlying memory from the IMFMediaBuffer directly: use IMFMediaBuffer::Lock to obtain a pointer to it and pass that to the Intel SDK.
2) When creating the MPEG-4 sink via MFCreateMPEG4MediaSink you pass in an IMFByteStream instance. You can make your own class which implements this interface and writes the data directly to memory or wherever you need. If you do not want to do a full implementation, there are also MFCreateMFByteStreamOnStream and MFCreateMFByteStreamOnStreamEx, which can wrap an IStream instance in an IMFByteStream, but I have never used those and I am not aware of the underlying memory semantics. You can create a memory-backed IStream with SHCreateMemStream or CreateStreamOnHGlobal.
I used the Intel SDK quite a long time ago, and if I remember correctly it had an MFT-compatible encoder, but I always used the plain C++ one, so I am not sure how they differ in terms of configuration etc. But if the MFT one works, then you can set up a proper pipeline without processing the bitstream samples yourself as described in (1), and just handle (2).
Also, performance-wise, since as far as I remember the Intel SDK could work on Direct3D surfaces as well, you could look into MFCreateDXSurfaceBuffer to use Direct3D surfaces instead of memory buffers for wrapping the data.
I want to decode an MP3 file. I managed to find the 32 bits in the header (sync word, ID, layer, bitrate, etc.). The problem is that I have no idea how to find the start (the position) of main_data_begin in the side information. I am using MATLAB in this case.
I know it may be a simple question, but I really need your help. Please.
Thank you.
MPEG-1/2 Layer III uses main_data_begin as a kind of pseudo-VBR mechanism over the granule headers and data. The simplest way to handle it is to implement a circular buffer that receives all the physical frame data after the side info and throws away the unused bytes at the beginning of the buffer (as indicated by main_data_begin) before starting frame decode.
Your best bet is to read an existing decoder's source. The spec is also really good for this, but main_data_begin is mis-documented in publicly-available versions (as best as I can find).
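To make the buffer idea concrete, here is a toy Python model of the bit reservoir (the class name and return value are my own invention for illustration; a real decoder would read only as many bits as the granules' part2_3_length values demand, not the whole span):

```python
class BitReservoir:
    """Toy model of the Layer III bit reservoir: main data accumulates
    across physical frames, and each frame's main_data_begin says how
    many bytes *before* this frame's own main data its audio data
    actually starts (0 means no back-reference)."""

    def __init__(self):
        self.buf = bytearray()

    def add_frame(self, main_data, main_data_begin):
        # A frame may point back at most to data we still hold.
        assert main_data_begin <= len(self.buf), "reservoir underrun"
        # This frame's audio data starts main_data_begin bytes before
        # the end of the data accumulated from previous frames.
        start = len(self.buf) - main_data_begin
        self.buf += main_data
        frame_span = bytes(self.buf[start:])
        # Bytes before `start` can never be referenced again - this is
        # the "throw away the unused bytes" step from the answer above.
        del self.buf[:start]
        return frame_span
```

A decoder would feed each physical frame's post-side-info bytes into add_frame along with that frame's main_data_begin, then bit-parse the returned span.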
I am a newbie in socket programming (in C); maybe this question is a little bit stupid. In C socket programming, how should I determine the size of the buffer for recv()/read()? In many cases, we don't know the size of the data sent using send()/write(). Thanks a lot!
how should I determine the size of the buffer for recv()/read()?
Ideally one shouldn't look at these buffer sizes and should stick to the classic TCP model: keep reading bytes while bytes are available.
If you are asking this question for things like: "how big should be the buffer into which I receive?", the simple answer is to pick a size and just pass that. If there's more data you can read again.
Back to your original question, different stacks give you different APIs. For example on some Unixes you have things like SIOCINQ and FIONREAD. These give you the amount of data the kernel has in its receive buffer, waiting for you to copy it out.
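For illustration, FIONREAD can be queried from Python on Linux (the helper name is mine; on other platforms the ioctl may not exist, which is exactly why the answer calls this stack-dependent):

```python
import fcntl
import socket
import struct
import termios

def bytes_pending(sock):
    """Ask the kernel how many bytes are waiting in sock's receive
    buffer, via the FIONREAD ioctl (Linux)."""
    arg = struct.pack("i", 0)
    res = fcntl.ioctl(sock.fileno(), termios.FIONREAD, arg)
    return struct.unpack("i", res)[0]

# Demo: a local socket pair, so sent data is queued immediately.
a, b = socket.socketpair()
a.sendall(b"hello")
print(bytes_pending(b))  # 5 bytes waiting to be read
```

Note that even when FIONREAD is available, the value is only a snapshot - more data can arrive between the ioctl and the recv() call, so it should not replace proper message framing.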
If you don't really know how many bytes are expected, use a large buffer and pass a large buffer size to recv/read. These functions will return how many bytes were put into the buffer. Then you can deal with this data printing it, for example.
But keep in mind that data is often either sent in chunks of a known size, or sent with a message size in the first bytes, so the receiver can work out how many bytes to read.
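The message-size-first approach can be sketched like this (Python for brevity; the same loop structure applies to C's recv(), which may likewise return fewer bytes than requested):

```python
import socket
import struct

def send_msg(sock, payload):
    # Prefix each message with its length as a 4-byte big-endian int.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived;
    a single recv() may return fewer bytes than asked for."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        data += chunk
    return bytes(data)

def recv_msg(sock):
    # First read the 4-byte length header, then exactly that many bytes.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With this framing the receiver never needs to guess a buffer size: the header tells it exactly how much to allocate and read.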
I am making an iPhone app that will transmit data via sound. It takes a binary string and plays a tone for each 1 and silence for each 0.
(String example)
NSString* asciiString = @"www.google.com";
NSString* binaryString = AsciiToBinaryString(asciiString);
// binaryString == @"01110111 01110111 01110111 00101110 01100111 01101111 01101111 01100111 01101100 01100101 00101110 01100011 01101111 01101101"
However, this simple tone method is prone to errors, and I think I need to use binary phase-shift keying.
I would like to know how to apply binary phase shift keying to my binary string.
I'm not sure how to implement this. Any suggestions or code samples would be appreciated.
PS:
I did do a search on stack overflow and google and was not satisfied with what I found.
I looked at the GNU Radio project but don't understand python.
PSK is not going to be easy on the iPhone, if it's possible at all. Frequency-shift keying (FSK) should be possible. But I think what you need first is a higher-level protocol that breaks the data stream up into bytes, for example, and includes some checks. Think of the good old RS-232 serial protocol with start/stop bits and parity.
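For what it's worth, the BPSK idea itself is simple to demonstrate offline; a Python sketch (sample rate, carrier frequency, and bit duration are arbitrary choices of mine, and real-world reception would need synchronization and filtering that this ignores):

```python
import math

SAMPLE_RATE = 8000       # samples per second (assumed)
CARRIER_HZ = 1000        # carrier frequency (assumed)
SAMPLES_PER_BIT = 80     # 10 ms per bit -> 10 carrier cycles per bit

def bpsk_modulate(bits):
    """BPSK: each bit keys the carrier's phase - 0 -> 0 rad, 1 -> pi rad,
    which is the same as multiplying the carrier by +1 or -1."""
    samples = []
    for i, bit in enumerate(bits):
        sign = -1.0 if bit else 1.0
        for n in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + n) / SAMPLE_RATE
            samples.append(sign * math.sin(2 * math.pi * CARRIER_HZ * t))
    return samples

def bpsk_demodulate(samples):
    """Coherent detection: correlate each bit period against a reference
    carrier; a negative correlation means the phase was flipped (bit 1)."""
    bits = []
    for i in range(0, len(samples), SAMPLES_PER_BIT):
        corr = sum(samples[i + n] *
                   math.sin(2 * math.pi * CARRIER_HZ *
                            ((i + n) / SAMPLE_RATE))
                   for n in range(SAMPLES_PER_BIT))
        bits.append(1 if corr < 0 else 0)
    return bits
```

The hard part on a phone is not generating these samples but recovering the carrier phase at the receiver through a speaker-air-microphone channel, which is why FSK (detecting which frequency is present, phase be damned) is the more forgiving choice.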
On the other hand, there are various apps that use the iPhone's audio port for data transfer. I did not look, but I can imagine that people must have written about this. And I would not be surprised if there are open-source or commercial projects that give you this functionality out of the box.
Good luck & enjoy, it sounds like a fun project!