Difference between 'display_aspect_ratio' and 'sample_aspect_ratio' in ffprobe [duplicate]

I am trying to change the dimensions of a video file with FFmpeg.
I want to convert any video file to 480x360.
This is the command that I am using:
ffmpeg -i oldVideo.mp4 -vf scale=480:360 newVideo.mp4
After this command, the 1280x720 video comes out as 640x360.
I have also attached a video; it will take less than a minute for any experts out there. Is there anything wrong?
You can see it here. (In the video, after 20 seconds, jump directly to 1:35; the rest is just processing time.)
UPDATE:
I found the command in this tutorial.

Every video has a Sample Aspect Ratio associated with it. A video player will multiply the video width with this SAR to produce the display width. The height remains the same. So, a 640x720 video with a SAR of 2 will be displayed as 1280x720. The ratio of 1280 to 720 i.e. 16:9 is labelled the Display Aspect Ratio.
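You can check both values directly with ffprobe, e.g. on the file from the question (standard ffprobe options, shown here as a quick sketch):
ffprobe -v error -select_streams v:0 -show_entries stream=width,height,sample_aspect_ratio,display_aspect_ratio -of default=noprint_wrappers=1 oldVideo.mp4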
The scale filter preserves the input's DAR in the output, so that the output does not look distorted. It does this by adjusting the SAR of the output. The remedy is to reset the SAR after scaling.
ffmpeg -i oldVideo.mp4 -vf scale=480:360,setsar=1 newVideo.mp4
Since the DAR may no longer be the same, the output can look distorted. One way to avoid this is by scaling proportionally and then padding with black to achieve target resolution.
ffmpeg -i oldVideo.mp4 -vf scale=480:360:force_original_aspect_ratio=decrease,pad=480:360:(ow-iw)/2:(oh-ih)/2,setsar=1 newVideo.mp4

Related

FFMPEG Interpolated Video | First frame frozen without any interpolation to the second one

I have a small issue when generating interpolated videos from a short image sequence for a VQGAN+CLIP art project.
The problem is that the first frame is stuck for a moment, then it jumps to the second one; the second one is also stuck for a moment, but then it starts and works nicely. My biggest problem is the harsh transition from the 1st to the 2nd frame; the 2nd frame's "start delay" is not that big an issue to me, but it would also be nice to get rid of.
This is my command for generating the video from an image sequence:
ffmpeg -framerate 1 -i Upscaled\%d_out.png -vcodec h264_nvenc -pix_fmt yuv420p -strict -2 -vf minterpolate="mi_mode=mci:me=hexbs:me_mode=bidir:mc_mode=aobmc:vsbmc=1:mb_size=8:search_param=32:fps=30" InterpolatedVideo.mp4
You can see the result >> HERE <<
Now my question is whether that is fixable by editing the command and, if so, how?
I'd like to keep the first frame, but have it interpolate to the second frame.
I want to avoid manually cutting the video afterwards, as I'd need to know the time to cut, etc.
Thanks for any help in advance
Greetings from Vienna
Okay, so the problem was the scene change detection, i.e. the scd parameter of the minterpolate filter. It is set to fdiff (frame difference) by default. Setting it to none with scd=none in the filter options gets rid of it.
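For reference, this is presumably what the command from the question looks like with scd=none appended (untested sketch, all other options unchanged):
ffmpeg -framerate 1 -i Upscaled\%d_out.png -vcodec h264_nvenc -pix_fmt yuv420p -strict -2 -vf minterpolate="mi_mode=mci:me=hexbs:me_mode=bidir:mc_mode=aobmc:vsbmc=1:mb_size=8:search_param=32:fps=30:scd=none" InterpolatedVideo.mp4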
I also had to copy the 1st frame TWICE at the end to create a smooth loop. With only one copy, it entirely missed the last (copied first) frame. I copied it once more at the end and it now works super smoothly. I guess the very last frame could be anything, as it is skipped anyway.

FFMPEG scene detection: overlay original frame number

I'm able to extract all frames that are not similar to the previous frame from a video file using ffmpeg -i video.mp4 -vf "select=gt(scene\,0.003),setpts=N/(30*TB)" frame%d.jpg (source)
I would like to overlay the frame number onto each selected frame. I tried adding drawtext=fontfile=/Windows/Fonts/Arial.ttf: text='frame\: %{frame_num}': x=(w-tw)/2: y=h-(2*lh): fontcolor=white: box=1: boxcolor=0x00000000#1: fontsize=30 to the filter after select and setpts, however %{frame_num} returns 1, 2, 3, ... (source)
If I put drawtext before select and setpts, I get something like 16, 42, 181, ... as frame numbers (which is exactly what I want), but since the scene detection runs after adding the text overlay, changes in the overlay may be detected as well.
Is it possible to do the scene detection and the overlay independently of one another? [in] split [out0][out1] can be used to apply filters separately, but I don't know how to "combine" the results again.
You are on the right track. Use split first to create two streams. Run scene detection on one, and draw text on another. Then use overlay to paint the numbered stream on the pruned stream - only the corresponding pruned numbered frames will be emitted.
ffmpeg -i video.mp4 -vf "split=2[num][raw];[raw]select=gt(scene\,0.003)[raw];[num]drawtext=fontfile=/Windows/Fonts/Arial.ttf: text='frame\: %{frame_num}': x=(w-tw)/2: y=h-(2*lh): fontcolor=white: box=1: boxcolor=0x00000000#1: fontsize=30[num];[raw][num]overlay=shortest=1,setpts=N/(30*TB)" -r 30 frame%d.jpg

Lossless compression of a sequence of similar grayscale images

I would like to have the best compression ratio for a sequence of similar grayscale images. Note that I need an absolutely lossless solution (meaning I should be able to verify it with a hash algorithm).
What I tried
I had the idea to convert my images into a video because there is a chronology between the images. The encoding algorithm would compress using the fact that not all of the scene changes between two pictures. So I tried using ffmpeg, but I had several problems due to the sRGB -> YUV colorspace conversion. I didn't understand everything, but it seems like a nightmare.
Example of code used:
ffmpeg -i %04d.png -c:v libx265 -crf 0 video.mp4 #To convert into video
ffmpeg -i video.mp4 %04d.png #To recover images
My second idea was to do it by hand with ImageMagick. So I took the first image as a reference and created a new image that is the difference between image 1 and image 2. Then I tried to add the difference image to image 1 (trying to recover image 2), but it didn't work. Judging from the size of the recreated picture, it's clear that the image is not the same. I think there was an unwanted compression during the process.
Example of code used:
composite -compose difference 0001.png 0002.png diff.png #To create the diff image
composite -compose difference 0001.png diff.png recover.png #To recover image 2
Do you have any idea about my problem?
And why can't I manage a perfect recovery with ImageMagick?
Thanks ;)
Here are 20 samples images : https://cloud.damien.gdn/d/f1a7954a557441989432/
I tried a few ideas with your dataset and summarise what I found below. My calculations and percentages assume that 578kB is a representative image size.
Method 1 - crush - 69%
I just ran pngcrush on one of your images like this:
pngcrush -bruteforce input.png crushed.png
The output size was 400kB, so your image is now only taking 69% of the original space on disk.
Method 2 - rotate and crush - 34%
I rotated your images through 90 degrees and crushed the result:
magick input.png -rotate 90 result.png
pngcrush -bruteforce result.png crushed.png
The rotated crushed image takes 34% of the original space on disk.
Method 3 - rotate and difference - 24%
I rotated your images with ImageMagick, then differenced two adjacent images in the series and saved the result. I then "pngcrushed" that which resulted in 142kB, or 24% of the original space.
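The exact commands for this method aren't listed above; a plausible reconstruction (my assumption, using two adjacent frames from the set) is:
magick 0001.png -rotate 90 rot1.png
magick 0002.png -rotate 90 rot2.png
magick rot1.png rot2.png -compose difference -composite diff.png
pngcrush -bruteforce diff.png crushed.png
Note that an absolute difference like this is fine for estimating sizes, but recovering the originals losslessly would need an invertible (signed) difference.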
Method 4 - combined to RGB - 28%
I combined three of your single channel images into a 3-channel RGB image and pngcrushed the result:
magick 000[123].png -combine result.png
pngcrush -bruteforce result.png crushed.png
That resulted in a 490kB file containing 3 images, i.e. 163kB per image or 28% of the original size.
I suspect video with "motion" estimation/detection would yield the best results if you are able to do it losslessly.
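If you do try the video route, a minimal sketch that keeps everything in a single grey plane and avoids the sRGB -> YUV round-trip from the question would use a lossless codec such as FFV1 (my suggestion, untested on this data):
ffmpeg -framerate 25 -i %04d.png -c:v ffv1 -pix_fmt gray video.mkv #lossless grayscale video
ffmpeg -i video.mkv recovered_%04d.png #recover the images
Bear in mind that the recovered PNG files may not be byte-identical even when the pixels are, because PNG re-encoding can differ, so comparing decoded frames (for example with ffmpeg's framemd5 output) is the safer check.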
You might get some gain out of MNG, which is intended for lossless animation compression. You can use libmng to try it out.

How to split a frame into two images (even field and odd field) from interlaced (NV12 format) raw data

I have raw NV12 YUV progressive data and need to split each frame into images with even and odd fields (interlaced data).
If you want to do all the jobs manually:
Extract each frame from the .yuv file
Depending on the format and resolution of your stream, calculate the size of one frame. Then you can do the extraction (a worked example follows this list).
Split the .yuv frame into .yuv fields
Calculate the size of each line, and split the frame by odd/even lines. Take care of the UV lines if the format is yuv420.
Convert each .yuv field to a .bmp image
If the format is not yuv444, convert it to yuv444 first. Then do the YUV to RGB conversion, and store the image in .bmp format.
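As a worked example of the size calculation (my numbers, assuming 8-bit NV12 at 1920x1080): the Y plane is 1920*1080 = 2,073,600 bytes and the interleaved UV plane is 1920*540 = 1,036,800 bytes, so one frame is 3,110,400 bytes. A single frame can then be cut out of the raw file with something like dd (hypothetical file names):
dd if=input.yuv of=frame_004.yuv bs=3110400 skip=4 count=1 #extracts the 5th frame (zero-based index 4)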
With the help of ffmpeg and ImageMagick, it can also be done (more easily) in two steps (supposing that the resolution of a frame is 1920x1080 and of a field is 1920x540):
Convert YUV to Images
ffmpeg -s 1920x1080 -i input.yuv frame_%3d.bmp
-pix_fmt can be used to specify the format (pixel layout) of the .yuv file.
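For NV12 input specifically, that would be something like this (my assumption; adjust the resolution to match your stream):
ffmpeg -s 1920x1080 -pix_fmt nv12 -i input.yuv frame_%03d.bmp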
Split Images to Odd/Even
convert frame_000.bmp -define sample:offset=25 -sample 100%x50% frame_000_top.bmp
convert frame_000.bmp -define sample:offset=75 -sample 100%x50% frame_000_bot.bmp
These two commands can be found in the last part of de-interlace a video frame.

Tesseract Trained data

I am trying to extract data from receipts and bills using Tesseract; I am using Tesseract version 3.02.
I am using only the English data, and still the output accuracy is about 60%.
Is there any trained data available which I can just drop into the tessdata folder?
This is the image nicky provided as a "typical example file":
Looking at it I'd clearly say: "Forget it, nicky! You cannot train Tesseract to recognize 100% of text from this type of image!"
However, you could train yourself to make better photos with your iPhone 3GS (that's the device which was used for the example pictures) of this type of receipt. Here are a few tips:
Don't use a dark background. Use white instead.
Don't let the receipt paper crumble. Straighten it out.
Don't place the receipt loosely on an uneven surface. Fix it to a flat surface:
Either place it on a white sheet of paper and put a glass plate over it.
Or use some glue and glue it flat on a white sheet of paper without any bent-up edges or corners.
Don't use a low resolution like just 640x480 pixels (as the example picture has). Use a higher one, such as 1280x960 pixels instead.
Don't use standard exposure. Set the camera to use extremely high contrast. You want the letters to be black and the white background to be really white (you don't need the grays in the picture...)
Try to make it so that any character of a 10-12 pt font uses about 24-30 pixels in height (that is, make the image roughly 300 dpi at 100% zoom).
That said, something like the following ImageMagick command will probably increase Tesseract's recognition rate by some degree:
convert \
http://i.stack.imgur.com/q3Ad4.jpg \
-colorspace gray \
-rotate 90 \
-crop 260x540+110+75 +repage \
-scale 166% \
-normalize \
-colors 32 \
out1.png
It produces the following output:
You could even add something like -threshold 30% as the last command-line option to the above command to get this:
(You should play a bit with some variations to the 30% value to tweak the result... I don't have the time for this.)
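Spelled out, that variant would look something like this (same options as above, with the threshold appended):
convert \
http://i.stack.imgur.com/q3Ad4.jpg \
-colorspace gray \
-rotate 90 \
-crop 260x540+110+75 +repage \
-scale 166% \
-normalize \
-colors 32 \
-threshold 30% \
out2.png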
Taking accurate info from a receipt is not impossible with Tesseract. You will need to add image filters and some other tools such as OpenCV, NumPy, and ImageMagick alongside Tesseract. There was a presentation at PyCon 2013 by Franck Chastagnol where he describes how his company did it.
Here is the link:
http://pyvideo.org/video/1702/building-an-image-processing-pipeline-with-python
You can get a much cleaner post-processed image before using Tesseract to OCR the text. Try using the Background Surface Thresholding (BST) technique rather than other simple thresholding methods. You can find a white paper on the subject here.
There is an implementation of BST for OpenCV that works pretty well: https://stackoverflow.com/a/22127181/3475075
I needed exactly the same thing and tried some image optimisations to improve the output.
You can find my experiment with Tesseract here:
https://github.com/aryansbtloe/ExperimentWithTesseract