ffmpeg: Stripping Audio and Scaling Down Video

Rather than use animated GIFs when I’m trying to show a video without sound, I prefer to use ffmpeg to strip out the audio and scale down the video. I’ve looked this command up way too many times:

ffmpeg -i test.mp4 -c:v libx264 -profile:v baseline -vf scale=640:-1 -an test-640.mp4
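A note on the -1 in scale=640:-1: it asks ffmpeg to pick the height that preserves the aspect ratio, and that height can come out odd, which libx264 with 4:2:0 chroma subsampling rejects. Using -2 instead rounds to a multiple of 2. The sketch below approximates that arithmetic in plain shell; scaled_height is a hypothetical helper of mine, not part of ffmpeg, and the rounding is my reading of the scale filter's behavior.

```shell
#!/bin/sh
# scaled_height IN_W IN_H OUT_W [MULTIPLE]
# Approximates the height ffmpeg's scale filter picks for scale=OUT_W:-MULTIPLE
# (hypothetical helper; rounds to the nearest multiple of MULTIPLE).
scaled_height() {
    in_w=$1; in_h=$2; out_w=$3; mult=${4:-1}
    denom=$((in_w * mult))
    echo $(( (out_w * in_h + denom / 2) / denom * mult ))
}

scaled_height 1920 1080 640      # scale=640:-1 -> 360
scaled_height 1920 1080 500      # scale=500:-1 -> 281 (odd)
scaled_height 1920 1080 500 2    # scale=500:-2 -> 282 (even)
```

If an encode fails with a "not divisible by 2" error, switching -1 to -2 in the scale filter is usually the fix.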

Update, 7 October 2018

Twitter has some video requirements, such as a maximum framerate, that need to be accounted for:

ffmpeg -i test.mp4 -c:v libx264 -profile:v baseline -vf scale=540:-1 -t 30 -r 30 test-540.mp4

The above command rescales the video to 540 px wide (540x960 for a portrait Full HD source, a quarter of the pixel count), limits the output to 30 seconds (the -t option), and sets the output video framerate to 30 frames per second (the -r option).

Update, 2 July 2019

ffmpeg -i input.mp4 -c:v libx264 -profile:v baseline -vf scale=1280:-1 -an -ss 5.5 test-1280.mp4

Set the start time at 5.500 seconds and scale to 1280px width.

ffmpeg -i input.mp4 -vframes 1 -vf scale=1280:-1 "poster.jpg"

Create a poster image for the video file.

Update, 30 August 2019

Measure-Command { ffmpeg -i video-nonoisereduct-noedgeenhance-with-log-medium.mp4 -c:v ffvhuff -an -filter:v "scale=640:-1" test-output-ffvhuff-640.mkv }

Measuring the amount of time ffmpeg takes to transcode on Windows using PowerShell (similar to the time command on Linux).
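On Linux the simplest equivalent is to prefix the command with time. For shells where that is not available, a minimal sketch of a wrapper follows; timed is a hypothetical helper name, and it only has whole-second resolution because it relies on date +%s.

```shell
#!/bin/sh
# timed: run any command and report wall-clock seconds on stderr.
# Hypothetical helper; resolution is limited to whole seconds.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}

# Demonstration with a stand-in command so the sketch runs without ffmpeg.
timed sleep 1
```

To time a real transcode, replace the stand-in with the ffmpeg invocation, for example: timed ffmpeg -i input.mp4 -c:v ffvhuff -an -filter:v "scale=640:-1" output.mkv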

This was a test of ffvhuff as a codec and of scaling the file down to a proxy-sized format, something I’ll be learning more about in the future.

Update, 11 December 2019

This one had a bit of audio hum in it that I wanted to remove using Audacity.

Step 1: Copy and Trim Off 15s

ffmpeg -i "input.avi" -c:a copy -c:v copy -ss 15 output-1.mkv

Step 2: Copy audio out

ffmpeg -i output-1.mkv -c:a copy output-2.wav

Step 3: Edit in Audacity (open output-2.wav, export the result as output-3.wav)

Amplify:

Remove the 60Hz hum using https://wiki.audacityteam.org/wiki/Nyquist_Effect_Plug-ins#Hum_Remover and the following settings:

Remove remaining noise with Noise Reduction effect:

Step 4: Remerge audio, replacing existing audio, re-encode video, and deinterlace.

ffmpeg -i output-1.mkv -i output-3.wav -ar 32k -c:a aac -b:a 128k -c:v libx264 -profile:v main -pix_fmt yuv420p -movflags +faststart -preset veryslow -crf 17 -vf "yadif=mode=1" -map 0:v:0 -map 1:a:0 -t 10 output-4.mp4

Update, 30 May 2020

GitHub Markdown only allows images at the moment, so I actually do use animated GIFs there.

You can convert a video to GIF directly, with a generated palette based on the most-used colors in the video, which is important to make the GIF look good.

ffmpeg -i test.mp4 -an -filter_complex "[0:v] palettegen [palette]; [0:v][palette] paletteuse" test.gif

The filter_complex filtergraph feeds the video stream from test.mp4 into palettegen, which writes the generated palette to the stream labeled [palette]. The filtergraph then feeds the same video stream and [palette] into paletteuse, which maps each output frame onto that palette to create the GIF.
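For longer clips it can help to lower the framerate and width before the palette is generated. The sketch below only builds the filtergraph string; gif_filtergraph is a hypothetical helper, and the 12 fps and 480 px values are illustrative assumptions. The fps, scale, and split filters run first, and split duplicates the processed stream so palettegen and paletteuse see the same frames.

```shell
#!/bin/sh
# gif_filtergraph FPS WIDTH
# Prints a filter_complex string: reduce framerate and size, split the result,
# generate a palette from one copy and apply it to the other.
# Hypothetical helper; labels [a], [b] and [palette] are arbitrary names.
gif_filtergraph() {
    fps=$1; width=$2
    printf '[0:v] fps=%s,scale=%s:-1:flags=lanczos,split [a][b]; [a] palettegen [palette]; [b][palette] paletteuse\n' "$fps" "$width"
}

gif_filtergraph 12 480
```

The printed string can be passed to ffmpeg directly: ffmpeg -i test.mp4 -an -filter_complex "$(gif_filtergraph 12 480)" test.gif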

Animated GIFs are most useful for things like command-line captures:

Animated GIF using palette generated by ffmpeg.

This animated GIF comes from the GoodParallel project.

Update, 8 June 2020 (Screencast Edition)

When doing the initial recordings with OBS, use -crf 0 -preset ultrafast so that the screencast is captured losslessly at the full refresh rate. Also select BT.709 and the Full color range.

On my ancient Intel i7-3770 CPU, I can capture ~150 frames/sec using the ultrafast preset, which is more than enough to do 1440p60 screencasts and prevent dropped frames.

Note that the files produced with this method will not play back directly in the Windows 10 Movie Player. I use ffplay from the command line to view them. The files are also enormous, because we are trading storage for CPU time during capture.

For shrinking and archiving those videos, use -crf 0 -preset veryslow -color_primaries bt709 -color_trc bt709 -colorspace bt709. The output is again lossless and we tell ffmpeg to preserve the original colorspace information, which otherwise is dropped.

Using the veryslow preset, we trade CPU time for storage and use more computationally-expensive bidirectional predictor frames to eliminate as much redundant information from the input file as possible.

Storage savings of up to 90% are possible and the output remains lossless. I haven’t yet experimented with -c:v libx265 to see if that offers even better lossless archival compression. For screencast captures, which usually have lots of low-entropy regions, it should be reasonable to expect a high compression ratio.
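To check what a given archive pass actually saved, the file sizes can be compared directly. This is a minimal sketch; savings_percent is a hypothetical helper, and the two temporary files only stand in for a capture and its archived copy.

```shell
#!/bin/sh
# savings_percent ORIGINAL_FILE ARCHIVED_FILE
# Prints the size reduction as a whole-number percentage of the original.
# Hypothetical helper; uses integer arithmetic, so the result is truncated.
savings_percent() {
    orig=$(wc -c < "$1")
    new=$(wc -c < "$2")
    echo $(( (orig - new) * 100 / orig ))
}

# Stand-in files so the sketch runs without real video: 1000 bytes vs 100 bytes.
tmp=$(mktemp -d)
head -c 1000 /dev/zero > "$tmp/capture.mkv"
head -c 100 /dev/zero > "$tmp/archive.mkv"
savings_percent "$tmp/capture.mkv" "$tmp/archive.mkv"   # prints 90
rm -r "$tmp"
```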

For publishing, I might use -vf "scale=1920:-2" -crf 17 -preset veryslow -color_primaries bt709 -color_trc bt709 -colorspace bt709 or similar, though generation loss could be an issue if the video is run through another transcoder at YouTube, et al.

Remember to always provide the colorspace flags, and check the source material to see if it provides this information using mediainfo.

Different encoders yield wildly different colorspace info.

Here’s a sample of 2160p30 video from my smartphone (Qualcomm Snapdragon 835 + Adreno 540); it uses BT.601 NTSC:

Video
ID : 2
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L5.1
Format settings : CABAC / 1 Ref Frames
Format settings, CABAC : Yes
Format settings, ReFrames : 1 frame
Format settings, GOP : M=1, N=30
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 57 s 319 ms
Source duration : 57 s 318 ms
Bit rate : 48.0 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Variable
Frame rate : 30.000 FPS
Minimum frame rate : 29.479 FPS
Maximum frame rate : 30.364 FPS
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.193
Stream size : 328 MiB (100%)
Source stream size : 328 MiB (100%)
Title : VideoHandle
Language : English
Encoded date : UTC 2020-06-09 11:49:52
Tagged date : UTC 2020-06-09 11:49:52
Color range : Full
Color primaries : BT.601 NTSC
Transfer characteristics : BT.601
Matrix coefficients : BT.601
mdhd_Duration : 57319

And here’s a sample of the 720p timelapse video information from the same smartphone; it uses BT.601 PAL (it is odd that the normal and timelapse modes use different colorspaces):

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L3.1
Format settings : CABAC / 1 Ref Frames
Format settings, CABAC : Yes
Format settings, ReFrames : 1 frame
Format settings, GOP : M=1, N=30
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 54 s 700 ms
Bit rate : 12.0 Mb/s
Width : 1 280 pixels
Height : 720 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 30.000 FPS
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.434
Stream size : 78.3 MiB (100%)
Title : VideoHandle
Language : English
Encoded date : UTC 2020-06-09 13:55:33
Tagged date : UTC 2020-06-09 13:55:33
Color range : Full
Color primaries : BT.601 PAL
Transfer characteristics : BT.601
Matrix coefficients : BT.601

And here’s the 1440p60 screencast recorded with OBS; it uses BT.709:

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High 4:4:4 Predictive@L5.1
Format settings : 1 Ref Frames
Format settings, CABAC : No
Format settings, ReFrames : 1 frame
Codec ID : V_MPEG4/ISO/AVC
Duration : 15 min 10 s
Width : 2 560 pixels
Height : 1 440 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 60.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Writing library : x264 core 157 r2945 72db437
Encoding settings : cabac=0 / ref=1 / deblock=0:0:0 / analyse=0:0 / me=dia / subme=0 / psy=0 / mixed_ref=0 / me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=0 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=0 / threads=12 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=0 / weightp=0 / keyint=250 / keyint_min=25 / scenecut=0 / intra_refresh=0 / rc=cqp / mbtree=0 / qp=0
Default : Yes
Forced : No
Color range : Full
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709
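Since only a few of those fields matter for the colorspace flags, it can be handy to pull them out of the report automatically. The sketch below is one way to do that under some assumptions: mediainfo_field and ffmpeg_primaries_name are hypothetical helpers, the report is assumed to use "Name : Value" lines as in the samples above, and the label-to-flag mapping covers only the three values seen here.

```shell
#!/bin/sh
# mediainfo_field NAME
# Reads a MediaInfo text report on stdin and prints the first value for NAME.
# Hypothetical helper; assumes "Name : Value" lines with optional padding.
mediainfo_field() {
    sed -n "s/^$1 *: //p" | head -n 1
}

# ffmpeg_primaries_name LABEL
# Maps a MediaInfo color primaries label to an ffmpeg -color_primaries value.
# Assumed mapping for the three labels seen in the samples only.
ffmpeg_primaries_name() {
    case "$1" in
        "BT.709")      echo bt709 ;;
        "BT.601 NTSC") echo smpte170m ;;
        "BT.601 PAL")  echo bt470bg ;;
        *)             echo unknown ;;
    esac
}

# A few lines copied from the smartphone sample stand in for a real report.
report='Color range : Full
Color primaries : BT.601 NTSC
Transfer characteristics : BT.601
Matrix coefficients : BT.601'

primaries=$(printf '%s\n' "$report" | mediainfo_field "Color primaries")
echo "$primaries -> $(ffmpeg_primaries_name "$primaries")"
```

Against a real file the report would come from the tool itself, for example: mediainfo input.mp4 | mediainfo_field "Color primaries"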

Update, 10 June 2020

Not sure yet if it makes sense to push all input videos to the same colorspace, but here’s how to do BT.601 to BT.709 and use the Ut Video editing codec.

ffmpeg -t 10 -i input.mp4 -vf "scale=2560:-2:in_color_matrix=bt601:out_color_matrix=bt709" -map_metadata -1 -c:v utvideo -c:a copy output.mkv

Update, 12 June 2020

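Extracting the last frame of a video, or a frame at a specific time offset:

ffmpeg -sseof -3 -i input.mp4 -update 1 -q:v 1 output.jpg

ffmpeg -ss 5:00 -i input.mp4 -frames:v 1 -q:v 1 output.png

The first command seeks to 3 seconds before the end of the input and, because of -update 1, overwrites output.jpg with every frame it decodes, so the file ends up holding the last frame. The second command grabs a single frame at the 5-minute mark.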

Selecting and merging video from one file and audio from another. This is useful if you only need to rework a small section of audio and don’t want to re-render the entire video:

ffmpeg -i video.mkv -i audio.wav -map 0:v -map 1:a -c:v copy -c:a copy output.mkv

To normalize the audio in two passes, use ffmpeg-normalize as follows:

ffmpeg-normalize -o output.mp4 -p -c:a aac -b:a 128K -ar 48000 -e="-color_primaries bt709" -e="-color_trc bt709" -e="-colorspace bt709" -e="-movflags +faststart" input.mkv

This runs the ffmpeg loudnorm filter to measure the audio, works out the right normalization, applies it, and encodes the result as AAC at 128 kbit/s.

Update, 20 October 2020

Converting MP4 files to APNG is similar to the GIF conversion, and you still want to do a most-used color palette analysis:

ffmpeg -ss 2.5 -i test.mp4 -an -filter_complex "[0:v] palettegen [palette]; [0:v][palette] paletteuse" -r 6 -t 5.5 -plays 0 test.apng

Update, 30 October 2020

Archiving losslessly-captured screencasts:

ffmpeg -benchmark -i input.mkv -crf 0 -preset veryslow -c:a copy -color_primaries bt709 -color_trc bt709 -colorspace bt709 output.mkv
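With a folder full of captures, the same command can be generated per file. The sketch below is a dry run that only prints the commands; archive_command is a hypothetical helper, the captures and archive directory names are assumptions, and I have spelled out -c:v libx264 and added -n (never overwrite) even though the command above leaves the codec to ffmpeg's default.

```shell
#!/bin/sh
# archive_command INPUT OUTPUT
# Prints (does not run) a lossless archive command for one file.
# Hypothetical helper; file names containing double quotes are not handled.
archive_command() {
    printf 'ffmpeg -n -i "%s" -c:v libx264 -crf 0 -preset veryslow -c:a copy -color_primaries bt709 -color_trc bt709 -colorspace bt709 "%s"\n' "$1" "$2"
}

# Print one command per capture; the directory names are placeholders.
for f in captures/*.mkv; do
    [ -e "$f" ] || continue          # skip when the glob matches nothing
    archive_command "$f" "archive/$(basename "$f")"
done
```

Once the printed commands look right, the output of the loop can be piped to sh to run them.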