How to use ffmpeg



19 ffmpeg commands for any need

Translator's note:
Many people know that ffmpeg is powerful, but not everyone knows just how powerful. It is versatile and nearly limitless, its man page is huge and in places hard to follow, and only a few have mastered working with it professionally. Nevertheless, this tool can be useful to almost anyone who works with video and audio even occasionally, even for everyday tasks. This article covers some useful ffmpeg console commands. In some places I took the liberty of adding links to explanatory articles.

ffmpeg is a cross-platform, open-source library for processing video and audio files. I've collected 19 useful and amazing commands that cover almost every need: converting video, extracting the audio track, converting for iPod or PSP, and much more.

1. Getting information about a video file

2. Turning a set of images into a video

This command converts all the images in the current directory (named image1.jpg, image2.jpg, etc.) into the video file video.mpg.

(Translator's note: I prefer this format:

here the frame rate (12) is set for the video, and the pattern "image_%010d.png" means that the images are looked up as image_0000000001.png, image_0000000002.png and so on, i.e. using printf-style formatting)
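To see exactly which file names such a pattern expands to, here is a small C sketch that applies the same printf-style formatting with snprintf. The function name image_name is hypothetical and not part of ffmpeg. The sketch assumes the pattern contains exactly one integer conversion, such as %d or %010d:

```c
#include <assert.h>
#include <stdio.h>

/* Expand a printf-style image pattern (as ffmpeg does for image sequences)
 * for frame number n. The pattern must contain exactly one integer
 * conversion such as %d or %010d. Returns the length of the resulting
 * name, or -1 if it did not fit in buf. */
int image_name(char *buf, size_t size, const char *pattern, int n)
{
    assert(buf != NULL && size > 0 && pattern != NULL);
    int len = snprintf(buf, size, pattern, n);
    if (len < 0 || (size_t)len >= size)
        return -1; /* encoding error, or the name was truncated */
    return len;
}
```

For example, image_name(buf, sizeof buf, "image_%010d.png", 2) yields image_0000000002.png, while the plain "image%d.jpg" pattern gives the image1.jpg, image2.jpg names used in the first command.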

3. Splitting a video into images

This command creates the files image1.jpg, image2.jpg, etc. The PGM, PPM, PAM, PGMYUV, JPEG, GIF, PNG, TIFF and SGI formats are supported as well.

FFmpeg

From the project home page:

Installation

Encoding examples

Screen capture

FFmpeg includes the x11grab and ALSA virtual devices that enable capturing the entire user display and audio input.

To take a screenshot screen.png :

To take a screencast screen.mkv with lossless encoding and without audio:

Here, the Huffyuv codec is used, which is fast, but produces huge file sizes.

To take a screencast screen.mp4 with lossy encoding and with audio:

Here, the x264 codec with the fastest possible encoding speed is used. Other codecs can be used; if writing each frame is too slow (either due to inadequate disk performance or slow encoding), then frames will be dropped and video output will be choppy.

Recording webcam

FFmpeg includes the video4linux2 and ALSA input devices that enable capturing webcam and audio input.

The following command will record a video webcam.mp4 from the webcam without audio, assuming that the webcam is correctly recognized under /dev/video0 :

The above produces a silent video. To record a video webcam.mp4 from the webcam with audio:

VOB to any container

Concatenate the desired VOB files into a single stream and mux them to MPEG-2:

Lossless

The ultrafast preset will provide the fastest encoding and is useful for quick capturing (such as screencasting):

On the opposite end of the preset spectrum is veryslow, which encodes more slowly than ultrafast but produces a smaller output file:

Both examples will provide the same quality output.

Constant rate factor

Two-pass (very high-quality)

Audio deactivated as only video statistics are recorded during the first of multiple pass runs:

Video stabilization

Video stabilization using the vid.stab plugin entails two passes.

First pass

The first pass records stabilization parameters to a file and/or a test video for visual analysis.

Second pass

The second pass parses the stabilization parameters generated from the first pass and applies them to produce "output-stab_final". You will want to apply any additional filters at this point so as to avoid subsequent transcoding to preserve as much video quality as possible. The following example performs the following in addition to video stabilization:

Example command showing the defaults when libx265 is invoked without any parameters (Constant Rate Factor encoding):

See FFmpeg H.265/HEVC Video Encoding Guide for more information.

Single-pass MPEG-2 (near lossless)

Allow FFmpeg to automatically set DVD standardized parameters. Encode to DVD MPEG-2 at

Encode to DVD MPEG-2 at

Subtitles

Extracting

Subtitles embedded in container files, such as MPEG-2 and Matroska, can be extracted and converted into SRT, SSA, WebVTT among other subtitle formats [1].

Hardsubbing

(instructions based on HowToBurnSubtitlesIntoVideo at the FFmpeg wiki)

Hardsubbing entails merging subtitles with the video. Hardsubs cannot be disabled, nor language switched.

Volume gain

Here volume=1.5 provides a 150% volume gain; instead of 1.5, use for example 0.5 to halve the volume. The volume filter can also take a decibel measure: use volume=3dB to increase the volume by 3dB or volume=-3dB to decrease it by 3dB.

Volume normalization

In this example, print_format=summary is also added to display the input and output loudness values of the audio file.

Extracting audio

-vn disables the processing of the video stream.

Extract audio stream with certain time interval:

Stripping audio

Splitting files

You can use the copy codec to perform operations on a file without changing the encoding. For example, this allows you to easily split any kind of media file into two:

Hardware video acceleration

Encoding/decoding performance may be improved by using hardware acceleration APIs; however, only specific codecs are supported, and the results may not always be the same as with software encoding.

VA-API

VA-API can be used for encoding and decoding on Intel CPUs (requires libva-intel-driver ) and on certain AMD GPUs when using the open-source AMDGPU driver (requires libva-mesa-driver ). See the FFmpeg documentation or Libav documentation for information about available parameters and supported platforms.

An example of encoding using the supported H.264 codec:

For a quick reference, a constant quality encoding can be achieved with:

NVIDIA NVENC/NVDEC

NVENC and NVDEC can be used for encoding/decoding when using the proprietary NVIDIA driver with the nvidia-utils package installed. Minimum supported GPUs are from 600 series, see Hardware video acceleration#NVIDIA for details.

This old gist provides some techniques. NVENC is somewhat similar to CUDA, so it works even from a terminal session. Depending on the hardware, NVENC is several times faster than Intel's VA-API encoders.

To print available options execute ( hevc_nvenc may also be available):

Intel QuickSync (QSV)

Intel® Quick Sync Video uses media processing capabilities of an Intel GPU to decode and encode fast, enabling the processor to complete other tasks and improving system responsiveness.

The usage of QuickSync is described in the FFmpeg Wiki. It is recommended to use VA-API [2] with either the iHD or i965 driver instead of using libmfx directly; see the FFmpeg Wiki section Hybrid transcode for encoding examples and Hardware video acceleration#Configuring VA-API for driver instructions.

AMD AMF

AMD added support for H.264-only video encoding on Linux through the AMD Video Coding Engine (GPU encoding) with the AMDGPU PRO proprietary packages, and ffmpeg added support for AMF video encoding. To encode with the h264_amf video encoder, amf-amdgpu-pro AUR is therefore required. You may need to point to the ICD file provided by the AMDGPU PRO packages via a variable; otherwise ffmpeg could use the open AMDGPU ICD file and fail to use this video encoder. An example encoding command could be as follows:

For a quick reference, a constant quality encoding can be achieved with:

Animated GIF

Whilst animated GIFs are generally a poor choice of video format due to their poor image quality, relatively large file size and lack of audio support, they are still in frequent use on the web. The following command can be used to turn a video into an animated GIF:

See http://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html for more information on using the palette filters to generate high quality GIFs.

Preset files

/.ffmpeg with the default preset files:

Create new and/or modify the default preset files:

Using preset files

libavcodec-vhq.ffpreset

Tips and tricks

Reduce verbosity

Use a combination of the following options to reduce verbosity to the desired level:

protrolium / ffmpeg.md

Converting Audio into Different Formats / Sample Rates

-vn means no video.
-acodec copy says to use the audio stream that's already in there.

Replace Audio on a Video without re-encoding.

You say you want to "extract audio from them (mp3 or ogg)". But what if the audio in the mp4 file is not one of those? You'd have to transcode anyway. So why not leave the audio format detection up to ffmpeg?

To convert one file:

To convert many files:

You can of course select any ffmpeg parameters for audio encoding that you like, to set things like bitrate and so on.

Note that in this case, the audiofile format has to be consistent with what the container has (i.e. if the audio is AAC format, you have to say audiofile.aac). You can use the ffprobe command to see which formats you have, this may provide some information:

A possible way to automatically parse the audio codec and name the audio file accordingly would be:

Note that this command uses sed to parse output from ffprobe for each file, it assumes a 3-letter audio codec name (e.g. mp3, ogg, aac) and will break with anything different.

Encoding multiple files

You can use a Bash "for loop" to encode all files in a directory:

Extract Single Image from a Video at Specified Frame

-ss offset = frame number divided by the FPS of the video, which gives the decimal value (in seconds) that ffmpeg needs, e.g. 130.5
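Here is a minimal C sketch of the same formula, with the frame rate given as a fraction so that rates like 30000/1001 work too. The function name is hypothetical. Frame 3915 at 30 fps is only an illustration that happens to give the 130.5 above. The sketch assumes a constant frame rate:

```c
#include <assert.h>

/* Offset in seconds of a given frame, for use with -ss:
 * seconds = frame_number / fps, with fps given as a fraction
 * (fps_num / fps_den, e.g. 30/1 or 30000/1001). Assumes a constant frame
 * rate; for variable-frame-rate files this is only an approximation. */
double frame_to_seconds(long frame_number, int fps_num, int fps_den)
{
    assert(fps_num > 0 && fps_den > 0);
    return (double)frame_number * fps_den / fps_num;
}
```

The result can be printed with something like printf("%.3f", ...) and passed as the -ss value.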

Merge Multiple Videos

concat demuxer
$ cat mylist.txt
file '/path/to/file1'
file '/path/to/file2'
file '/path/to/file3'
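Writing such a list by hand is easy to get wrong when paths contain quotes. The following C sketch produces one line of the list. The function name concat_entry is hypothetical. It follows ffmpeg's quoting rules: everything inside single quotes is taken literally, and a single quote itself is written as '\'' (close the quote, escape it, reopen the quote):

```c
#include <assert.h>
#include <stddef.h>

/* Append the characters of s to buf at *n, always leaving room for a
 * final NUL. Returns -1 if buf is too small. */
static int put(char *buf, size_t size, size_t *n, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*n + 1 >= size)
            return -1;
        buf[(*n)++] = *s;
    }
    return 0;
}

/* Build one "file '...'\n" line for the concat demuxer list in buf.
 * Returns the line length, or 0 (with buf set to "") if it does not fit. */
size_t concat_entry(char *buf, size_t size, const char *path)
{
    assert(buf != NULL && size > 0 && path != NULL);
    size_t n = 0;
    if (put(buf, size, &n, "file '") < 0)
        goto too_small;
    for (const char *p = path; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        /* a literal ' becomes '\'' ; everything else is copied as-is */
        if (put(buf, size, &n, *p == '\'' ? "'\\''" : one) < 0)
            goto too_small;
    }
    if (put(buf, size, &n, "'\n") < 0)
        goto too_small;
    buf[n] = '\0';
    return n;
too_small:
    buf[0] = '\0';
    return 0;
}
```

Each line can then be written to mylist.txt, which is typically used as ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4. The -safe 0 option is needed for absolute paths like the ones above.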

Rotate Video by editing metadata (without re-encoding).

Split a Video into Images

Convert Images into a Video

Convert Single Image into a Video

Convert non-sequentially named Images in a directory

Convert image sequence of many different sizes and conform to specific frame size

Guarantee aspect ratio from image sequence

Evaluate which ratio to apply for scaling, then scale with the requisite amount of padding
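The gist's command for this step is not preserved here. The C sketch below therefore only shows the arithmetic such a pipeline performs:

1. Pick the limiting side.
2. Scale the other side to keep the aspect ratio.
3. Center the result with padding.

It assumes the common scale=W:H:force_original_aspect_ratio=decrease,pad=W:H:(ow-iw)/2:(oh-ih)/2 approach. ffmpeg's own rounding may differ by a pixel, and the type and function names are hypothetical:

```c
#include <assert.h>

/* Where a source frame ends up inside a fixed-size output frame. */
typedef struct {
    int scaled_w, scaled_h; /* size of the image after scaling */
    int pad_x, pad_y;       /* top-left offset of the image in the frame */
} FitGeometry;

/* Round down to an even number: yuv420p output needs even dimensions. */
static int even_floor(int v) { return v & ~1; }

/* Scale src_w x src_h to fit inside dst_w x dst_h without distortion,
 * then center it; the remaining area is filled by padding. */
FitGeometry fit_and_pad(int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    FitGeometry g;
    /* src_w/src_h > dst_w/dst_h, compared without floating point:
     * a relatively wider source is limited by width (bars top/bottom). */
    if ((long long)src_w * dst_h > (long long)dst_w * src_h) {
        g.scaled_w = dst_w;
        g.scaled_h = even_floor((int)((long long)src_h * dst_w / src_w));
    } else { /* taller or equal aspect: limited by height (bars left/right) */
        g.scaled_h = dst_h;
        g.scaled_w = even_floor((int)((long long)src_w * dst_h / src_h));
    }
    g.pad_x = (dst_w - g.scaled_w) / 2;
    g.pad_y = (dst_h - g.scaled_h) / 2;
    return g;
}
```

For example, a 1920x1080 image conformed to 640x480 is scaled to 640x360 and placed 60 pixels from the top. A 1080x1920 portrait image becomes 270x480, offset 185 pixels from the left.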

Simple FLAC convert

Mix Stereo to Mono

If you want to use the right channel, write 0.1.1 instead of 0.1.0.

Trim End of file (mp3)

Subdivide an audio file by time interval

Use ffmpeg to cut an mp4 video without re-encoding

Use ffmpeg to cut an mp4 video with re-encoding

It reduced a 100MB video to 9MB, with very little change in video quality.

make a grayscale version and scale to 640×480

Convert MP4 to WEBM

Convert MKV to MP4

Check for the streams you want (video/audio). Be sure to convert/specify the DTS 6-channel audio stream.

Add Watermark overlay (png) to the center of a video

Reverse a video

Concat a video with a reversed copy of itself for ping-pong looping effect

Convert to different frame rate while preserving audio sync

FFmpeg libav tutorial

I spent a long time looking for a book that would explain in detail how to use FFmpeg as a library, known as libav (the name stands for library audio video). I found the tutorial "How to write a video player in less than 1000 lines". Unfortunately, the information there is outdated, so I had to write a guide myself.

Most of the code will be in C, but don't worry: you will easily understand it and be able to apply it in your favorite language. FFmpeg libav has plenty of bindings for many languages (including Python and Go). And even if your language has no direct bindings, you can still hook into it via ffi (here is an example with Lua).

We'll start with a quick overview of what video, audio, codecs and containers are. Then we'll move on to a crash course on using the FFmpeg command line, and finally we'll write code. Feel free to skip straight to the section "Learn FFmpeg libav the Hard Way".

Some people (not just me) believe that internet video streaming has already taken over from traditional TV. Either way, FFmpeg libav is definitely worth studying.

Introduction

Video is what you see!

If you change a sequence of images at a given rate (say, 24 images per second), you create an illusion of movement. That is the basic idea behind video: a series of images (frames) shown at a given speed.

Illustration from 1886.

Audio is what you hear!

Although a silent video can evoke a wide range of feelings, adding sound makes it far more enjoyable.

Sound is vibration that propagates as waves through the air or any other transmission medium (such as a gas, liquid or solid).

In a digital audio system, a microphone converts sound into an analog electrical signal. Then an analog-to-digital converter (ADC), typically using pulse-code modulation (PCM), converts the analog signal into a digital one.

Codec: data compression

A codec is an electronic circuit or piece of software that compresses or decompresses digital audio/video. It converts raw (uncompressed) digital audio/video into a compressed format, or vice versa.

But if we decided to pack millions of images into a single file and call it a movie, we might end up with a huge file. Let's do the math:

Suppose we create a video with a resolution of 1080×1920 (height × width). We spend 3 bytes per pixel (the smallest point on the screen) to encode color (24-bit color, which gives us 2^24 = 16,777,216 different colors). The video runs at 24 frames per second and lasts 30 minutes in total.

This video would need about 250.28 GiB of storage, or a bitrate of about 1.11 Gibit/s (in binary units)! That's why we need a codec.
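The arithmetic behind these figures fits in a small C sketch (the function names are hypothetical). The 250.28 and 1.11 values come out when dividing by 2^30, so they are binary gigabytes and gigabits:

```c
#include <assert.h>
#include <stdint.h>

/* Size of uncompressed video in bytes:
 * width * height * bytes_per_pixel per frame, times fps, times seconds. */
uint64_t raw_video_bytes(uint64_t width, uint64_t height,
                         uint64_t bytes_per_pixel, uint64_t fps,
                         uint64_t seconds)
{
    uint64_t bytes_per_second = width * height * bytes_per_pixel * fps;
    /* guard against silent wrap-around for absurd inputs */
    assert(seconds == 0 || bytes_per_second <= UINT64_MAX / seconds);
    return bytes_per_second * seconds;
}

/* Bitrate of the same uncompressed stream in bits per second. */
uint64_t raw_video_bits_per_second(uint64_t width, uint64_t height,
                                   uint64_t bytes_per_pixel, uint64_t fps)
{
    return width * height * bytes_per_pixel * fps * 8; /* 8 bits per byte */
}
```

For 1920x1080, 3 bytes per pixel, 24 fps and 30 minutes, raw_video_bytes returns 268,738,560,000 bytes. That is about 250.28 GiB. The bitrate is 1,194,393,600 bit/s, about 1.11 Gibit/s.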

Container: a convenient way to store audio/video

A container (wrapper) format is a metafile format whose specification describes how different data elements and metadata coexist in a computer file.

It is a single file that contains all the streams (mostly audio and video), provides synchronization, holds common metadata (such as title and resolution), and so on.

Usually the format of a file is inferred from its extension: for example, video.webm is most likely a video using the webm container.

FFmpeg command line

A complete, cross-platform solution to record, convert and stream audio and video.

For working with multimedia we have an amazing tool: a library called FFmpeg. Even if you don't use it in your own code, you are still using it (you do use Chrome, right?).

The library comes with a command line program called ffmpeg (in lowercase, unlike the name of the library itself). It is a simple yet powerful binary. For example, you can convert from mp4 to avi just by typing this command:

We've just done a remuxing: we converted from one container to another. Technically, FFmpeg can also do transcoding, but more on that later.

FFmpeg command line tool 101

FFmpeg has documentation that explains very well how everything works.

Schematically, the FFmpeg command line program expects its arguments in the following format to do its job: ffmpeg <1> <2> -i <3> <4> <5>, where:

<1>: global options
<2>: input file options
<3>: input URL
<4>: output file options
<5>: output URL

Parts <2>, <3>, <4> and <5> can contain as many arguments as needed. The argument format is easier to understand with an example:

# WARNING: the linked file is about 300MB

This command takes an input mp4 file containing two streams (audio encoded with the aac codec and video encoded with the h264 codec) and converts it to webm, changing the audio and video codecs as well.
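To make the <1> <2> -i <3> <4> <5> ordering concrete, here is a hypothetical C helper (not part of FFmpeg) that assembles an argument vector slot by slot. The option values in the usage example (-y, libvpx-vp9, libvorbis, input.mp4, output.webm) are illustrative only, since the article's actual command is not preserved here:

```c
#include <assert.h>
#include <stddef.h>

/* Append a NULL-terminated list of strings to argv, keeping one slot free
 * for the final NULL terminator. A NULL list means "no options". */
static int append_list(const char **argv, int cap, int *argc,
                       const char *const *list)
{
    for (; list != NULL && *list != NULL; list++) {
        if (*argc + 1 >= cap)
            return -1;
        argv[(*argc)++] = *list;
    }
    return 0;
}

/* Build: ffmpeg <1 global> <2 input opts> -i <3 input> <4 output opts> <5 output>
 * Options before -i apply to that input; options after it apply to the
 * output that follows. Returns argc, or -1 if argv is too small. */
int build_ffmpeg_argv(const char **argv, int cap,
                      const char *const *global_opts,
                      const char *const *input_opts, const char *input,
                      const char *const *output_opts, const char *output)
{
    assert(argv != NULL && input != NULL && output != NULL);
    const char *const program[] = { "ffmpeg", NULL };
    const char *const in[] = { "-i", input, NULL };
    const char *const out[] = { output, NULL };
    int argc = 0;
    if (append_list(argv, cap, &argc, program) < 0 ||
        append_list(argv, cap, &argc, global_opts) < 0 || /* <1> */
        append_list(argv, cap, &argc, input_opts) < 0 ||  /* <2> */
        append_list(argv, cap, &argc, in) < 0 ||          /* -i <3> */
        append_list(argv, cap, &argc, output_opts) < 0 || /* <4> */
        append_list(argv, cap, &argc, out) < 0)           /* <5> */
        return -1;
    argv[argc] = NULL;
    return argc;
}
```

With global options {"-y"}, no input options and output options {"-c:v", "libvpx-vp9", "-c:a", "libvorbis"}, the result is: ffmpeg -y -i input.mp4 -c:v libvpx-vp9 -c:a libvorbis output.webm. The codec options land after -i, so they apply to the output. The array is NULL-terminated, which is the shape exec-family functions expect.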

We could simplify the command above, but keep in mind that FFmpeg will then pick default values for you. For example, if you just type

which audio/video codec does it use to create output.mp4?

Werner Robitza wrote a must-read (and must-run) tutorial about encoding and editing with FFmpeg.

Basic video operations

When working with audio/video, we usually perform a set of tasks with the media.

Transcoding

What? The act of converting one of the streams (audio or video, or both at once) from one codec to another. The file format (container) does not change in the process.

Why? Sometimes devices (TVs, smartphones, consoles, etc.) don't support audio/video format X but do support audio/video format Y. Or newer codecs are preferable because they provide better compression.

How? For example, converting H264 (AVC) video to H265 (HEVC):

Transmuxing

What? Converting from one format (container) to another.

Why? Sometimes devices (TVs, smartphones, consoles, etc.) don't support file format X but do support file format Y. Or newer containers, unlike legacy ones, provide modern features that are required.

How? Converting mp4 to webm:

Transrating

What? Changing the bitrate, or producing a different rendition.

Why? People may watch your video on a low-end smartphone over a 2G connection or on a 4K TV over a fiber connection, so you should offer more than one rendition of the same video at different bitrates.

How? Producing a rendition with a bitrate between 3856K and 2000K.

Transrating usually goes hand in hand with transsizing. Werner Robitza wrote another must-read article about FFmpeg rate control.

Transsizing (rescaling)

What? Changing the resolution. As mentioned above, transsizing is often done together with transrating.

Why? For the same reasons as transrating.

How? Downscaling from 1080 to 480:

Bonus: adaptive streaming

What? Producing many resolutions (bitrates), splitting the media into chunks and serving them over HTTP.

Why? To provide flexible media that can be watched on anything from a budget smartphone to a 4K plasma TV, and that is easy to scale and deploy (though it may add latency).

How? Creating an adaptive WebM using DASH:

Going beyond

There are countless other uses for FFmpeg. I use it together with iMovie to produce and edit some videos for YouTube, and nothing stops you from using it professionally too.

Learn FFmpeg libav the Hard Way

Don't you wonder sometimes 'bout sound and vision?

David Robert Jones

FFmpeg is very useful as a command line tool for performing essential operations on media files. Can we use it in our programs too?

FFmpeg is composed of several libraries that can be integrated into our own programs. Usually, when you install FFmpeg, all of these libraries are installed automatically. I'll refer to this set of libraries as FFmpeg libav.

The section title is a tribute to Zed Shaw's series Learn [...] the Hard Way, in particular his book Learn C the Hard Way.

Chapter 0: a simple hello world

Our hello world won't actually print "hello world" to the console. Instead, we'll print out information about the video: its format (container), duration, resolution and audio channels. Finally, we'll decode some frames and save them as image files.

FFmpeg libav architecture

But before we start writing code, let's look at how the FFmpeg libav architecture works and how its components communicate with each other.

Here is a diagram of the video decoding process:

First, the media file is loaded into a component called AVFormatContext (the video container is also known as the format). It doesn't actually load the whole file: it often reads only the header.

Once we've loaded the minimal header of our container, we can access its streams (think of them as elementary audio and video data). Each stream is available in a component called AVStream.

Suppose our video has two streams: audio encoded with the AAC codec and video encoded with the H264 (AVC) codec. From each stream we can extract pieces of data called packets, which are loaded into components called AVPacket.

The data inside the packets is still encoded (compressed), and to decode the packets we need to pass them to a specific AVCodec.

The AVCodec decodes them into an AVFrame, which finally gives us the uncompressed frame. Note that the terminology and the process are the same for both audio and video streams.

Requirements

Since some people run into trouble compiling or running the examples, we'll use Docker as our development/runtime environment. We'll also use the Big Buck Bunny video, so if you don't have it on your local machine, just run the command make fetch_small_bunny_video in the console.

The code

TLDR; show me the runnable code, bro:

We'll skip some details, but don't worry: the source code is available on github.

We're going to allocate memory for the AVFormatContext component, which will hold information about the format (container).

Now we're going to open the file, read its header and fill the AVFormatContext with minimal information about the format (note that the codecs are usually not opened). The function used for this is avformat_open_input. It expects the address of our AVFormatContext pointer, a filename and two optional arguments: an AVInputFormat (if you pass NULL, FFmpeg will guess the format) and an AVDictionary (which holds demuxer options).

We can also print the format name and the media duration:

To access the streams, we need to read data from the media. The function avformat_find_stream_info does that. Now pFormatContext->nb_streams will hold the number of streams, and pFormatContext->streams[i] will give us the i-th stream (an AVStream).

Let's loop through all the streams:

For each stream, we're going to keep the AVCodecParameters, which describe the properties of the codec used by the i-th stream:

Using the codec properties, we can look up the proper codec with the function avcodec_find_decoder. It finds the registered decoder for the codec id and returns an AVCodec, the component that knows how to enCOde and DECode the stream:

Now we can print information about the codecs:

With the codec, we allocate memory for an AVCodecContext, which will hold the context for our decoding/encoding process. Then we need to fill this codec context with the codec parameters; we do that with avcodec_parameters_to_context.

Once the codec context is filled, we need to open the codec. We call the function avcodec_open2, and then we can use it:

Now we're going to read packets from the stream and decode them into frames, but first we need to allocate memory for both components (AVPacket and AVFrame).

Let's read packets from the streams with the function av_read_frame, for as long as it returns packets:

Now let's send the raw data packet (compressed frame) to the decoder through the codec context, using the function avcodec_send_packet:

And let's receive the raw data frame (uncompressed frame) from the decoder through the same codec context, using the function avcodec_receive_frame:

We can print the frame number, PTS, DTS, frame type, etc.:

Finally, we can save our decoded frame as a simple grayscale image. The process is very simple: we'll use pFrame->data, where the index refers to the Y, Cb and Cr planes. We just pick 0 (Y) to save our grayscale image:
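The saving step itself needs no libav, so it can be sketched in plain C. This helper takes the values pFrame->data[0], pFrame->linesize[0], pFrame->width and pFrame->height plus a file name. The name save_gray_frame and the binary PGM ("P5") output format are assumptions of this sketch. The key detail is that rows are linesize bytes apart, which can be more than the width because of padding, so each row is written separately:

```c
#include <assert.h>
#include <stdio.h>

/* Save one 8-bit plane (e.g. the Y plane in frame->data[0]) as a binary
 * PGM ("P5") grayscale image. `wrap` is the line size in bytes
 * (frame->linesize[0]); only the first xsize bytes of each row are
 * visible pixels. Returns 0 on success, -1 on error. */
int save_gray_frame(const unsigned char *buf, int wrap, int xsize, int ysize,
                    const char *filename)
{
    assert(buf != NULL && filename != NULL && wrap >= xsize);
    FILE *f = fopen(filename, "wb");
    if (f == NULL)
        return -1;
    /* PGM header: magic number, width, height, maximum gray value */
    fprintf(f, "P5\n%d %d\n%d\n", xsize, ysize, 255);
    for (int y = 0; y < ysize; y++) {
        /* skip the padding: row y starts at buf + y * wrap */
        if (fwrite(buf + (size_t)y * wrap, 1, (size_t)xsize, f) != (size_t)xsize) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A call would look like save_gray_frame(pFrame->data[0], pFrame->linesize[0], pFrame->width, pFrame->height, "frame-1.pgm"). The file name is only an example.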

And voilà! Now we have a 2MB grayscale image:

Chapter 1: syncing audio and video

Be the player: a young JS developer writing a new MSE video player.

Before we move on to coding a transcoding example, let's talk about timing, or how a video player knows the right time to play a frame.

In the previous example, we saved some frames:

When we design a video player, we need to play each frame at a given pace. Otherwise the video is hard to enjoy, because it plays either too fast or too slow.

Therefore we need to define some logic to play each frame smoothly. For that purpose, each frame has a presentation timestamp (PTS), which is an increasing number expressed in units of a timebase. The timebase is a rational number whose denominator, known as the timescale, is divisible by the frame rate (fps).

It's easier to understand with examples. Let's simulate some scenarios.

For fps = 60/1 and timebase = 1/60000, each PTS increases by timescale / fps = 1000, so the real PTS time for each frame could be (assuming it starts at 0):

Almost the same scenario, but with a timebase equal to 1/60:

For fps = 25/1 and timebase = 1/75, each PTS increases by timescale / fps = 3, and the PTS time could be:
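These scenarios reduce to two formulas:

- the PTS step per frame is timescale / fps;
- the real time of a frame is pts * timebase.

Below is a small C sketch of both. The Rational type is a stand-in for libav's AVRational, defined here so the code compiles without FFmpeg headers, and the function names are hypothetical:

```c
#include <assert.h>

/* A rational number, mirroring libav's AVRational {num, den}; defined here
 * so the sketch compiles without FFmpeg headers. */
typedef struct { int num, den; } Rational;

/* How much the PTS grows from one frame to the next:
 * (1 / fps) in timebase units = fps.den * tb.den / (fps.num * tb.num).
 * With timebase = 1/timescale this is simply timescale / fps. */
long long pts_step(Rational timebase, Rational fps)
{
    long long num = (long long)fps.den * timebase.den;
    long long den = (long long)fps.num * timebase.num;
    assert(den > 0 && num % den == 0); /* timescale divisible by fps */
    return num / den;
}

/* Real presentation time in seconds: pts * timebase. */
double pts_time(long long pts, Rational timebase)
{
    assert(timebase.den != 0);
    return (double)pts * timebase.num / timebase.den;
}
```

For fps = 60/1 and timebase = 1/60000, the step is 1000. Frame 3 therefore has pts = 3000 and pts_time = 0.05 s. For fps = 25/1 and timebase = 1/75, the step is 3, and pts = 6 corresponds to 0.08 s. In libav code the same conversion is commonly written as pts * av_q2d(stream->time_base).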

Now, with the pts_time, we can find a way to render this in sync with the audio pts_time or with a system clock. FFmpeg libav provides this information through its API:

Just out of curiosity, the frames we saved were sent in DTS order (frames: 1, 6, 4, 2, 3, 5) but played in PTS order (frames: 1, 2, 3, 4, 5). Also notice how cheap B-frames are compared to P- or I-frames:

Chapter 2: remuxing

Remuxing is the act of changing from one format (container) to another. For instance, we can convert an MPEG-4 video to MPEG-TS without much pain using FFmpeg:

The MP4 file will be demuxed, but it won't be decoded or encoded (-c copy), and in the end it will be muxed into an mpegts file. If you don't provide the format with -f, ffmpeg will try to guess it from the file extension.

The general usage of FFmpeg or libav follows this pattern/architecture, or workflow:

Now let's write an example using libav that achieves the same effect as running this command:

We're going to read from an input (input_format_context) and change it into another output (output_format_context):

We usually start by allocating memory and opening the input format. For this specific case, we're going to open an input file and allocate memory for an output file:

We're going to remux only the video, audio and subtitle streams, so we keep track of the streams we'll use in an array of indexes:

Right after allocating the required memory, we need to loop through all the streams and create a new output stream for each of them in our output format context, using the function avformat_new_stream. Note that we mark every stream that isn't video, audio or subtitles so that we can skip it.
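The bookkeeping for this stream selection can be sketched without libav. A hypothetical MediaType enum stands in for libav's AVMEDIA_TYPE_* values, and streams_list[i] holds the output index for input stream i, or -1 if that stream is skipped:

```c
#include <assert.h>

/* Stand-in for libav's AVMEDIA_TYPE_* values, so the sketch compiles alone. */
typedef enum {
    MEDIA_VIDEO, MEDIA_AUDIO, MEDIA_SUBTITLE, MEDIA_DATA, MEDIA_ATTACHMENT
} MediaType;

/* Fill streams_list[i] with the output stream index for input stream i, or
 * -1 if the stream is skipped (anything that is not video, audio or
 * subtitles). Returns the number of output streams to create. */
int map_streams(const MediaType *types, int nb_streams, int *streams_list)
{
    assert(nb_streams >= 0 && (nb_streams == 0 || (types && streams_list)));
    int stream_index = 0;
    for (int i = 0; i < nb_streams; i++) {
        if (types[i] != MEDIA_VIDEO && types[i] != MEDIA_AUDIO &&
            types[i] != MEDIA_SUBTITLE) {
            streams_list[i] = -1; /* marked: packets of this stream are dropped */
            continue;
        }
        streams_list[i] = stream_index++; /* next free output stream */
    }
    return stream_index;
}
```

In the real program, each kept stream would get its own avformat_new_stream() call, with its codec parameters copied via avcodec_parameters_copy(). In the packet loop, a packet whose stream_index maps to -1 is released with av_packet_unref() and skipped. Otherwise, the mapped value becomes the packet's new stream_index.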

Now we create the output file:

After that, we can copy the streams, packet by packet, from our input to our output streams. This happens in a loop while there are packets (av_read_frame). For each packet we need to recalculate the PTS and DTS, and then finally write it (av_interleaved_write_frame) to our output format context.
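The PTS/DTS recalculation is a change of timebase: the same instant is expressed in the output stream's units instead of the input stream's. For example, the MPEG-TS muxer uses a 1/90000 timebase. Here is a simplified C sketch of that conversion. The type and function names are hypothetical. It assumes the intermediate product fits in 64 bits and does not special-case AV_NOPTS_VALUE, both of which libav's own functions handle:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for libav's AVRational. */
typedef struct { int num, den; } Rational;

/* Convert ts from timebase `from` to timebase `to` (ts * from / to),
 * rounding to nearest with halfway cases away from zero, the same rounding
 * mode as AV_ROUND_NEAR_INF in av_rescale_q_rnd(). */
int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    int64_t num = (int64_t)from.num * to.den; /* ts * num / den */
    int64_t den = (int64_t)from.den * to.num;
    int64_t prod = ts * num;
    if (prod >= 0)
        return (prod + den / 2) / den;
    return -((-prod + den / 2) / den);
}
```

With illustrative values, a PTS of 1000 in a 1/60000 input timebase becomes 1500 in a 1/90000 output timebase. In the real loop, a packet's pts, dts and duration are all converted from the input stream's time_base to the output stream's time_base. libav's av_packet_rescale_ts() does this for a whole packet.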

To finish, we need to write the stream trailer to the output media file with the function av_write_trailer:

It works! Don't believe it? Check it with ffprobe:

To sum up what we did: we can now revisit our initial idea of how libav works, except that we skipped the codec part, as shown in the diagram.

Before we end this chapter, I'd like to show an important part of the remuxing process: passing options to the muxer. Let's say we want to deliver the MPEG-DASH format, so we need to use fragmented mp4 (sometimes referred to as fmp4) instead of MPEG-TS or plain MPEG-4.

With the command line, that's easy:

It's almost as easy in the libav version: we just pass the options when writing the output header, right before copying the packets:

Now we can generate this fragmented mp4 file:

To make sure everything is right, you can use the amazing site/tool gpac/mp4box.js or the site http://mp4parser.com/ to see the differences. First, load the regular mp4.

As you can see, it has a single indivisible mdat box, which is where the video and audio frames are. Now load the fragmented mp4 to see how it spreads the frames across multiple mdat boxes:

Chapter 3: transcoding

TLDR; show me the code and execution:

We'll skip some details, but don't worry: the source code is available on github.

In this chapter we'll create a minimalist transcoder, written in C, that can convert videos from H264 to H265 using the FFmpeg libav libraries, specifically libavcodec, libavformat and libavutil.

AVFormatContext is an abstraction for the format of a media file, i.e. the container (MKV, MP4, Webm, TS).
AVStream represents each type of data for a given format (e.g. audio, video, subtitles, metadata).
AVPacket is a chunk of compressed data obtained from an AVStream, which can be decoded by an AVCodec (e.g. av1, h264, vp9, hevc), producing raw data called an AVFrame.

Transmuxing

Let's start with a simple transmuxing operation; the first step is to load the input file.

Now we set up the decoder. The AVFormatContext gives us access to all the AVStream components. For each of them we can get its AVCodec and create the corresponding AVCodecContext. Finally, we can open the given codec so we can proceed to the decoding process.

The AVCodecContext holds data about the media configuration, such as bitrate, frame rate, sample rate, channels, height and many others.

We also need to prepare the output media file for transmuxing. First, we allocate memory for the output AVFormatContext. Then we create each stream in the output format. To pack each stream properly, we copy the codec parameters from the decoder.

We set the AV_CODEC_FLAG_GLOBAL_HEADER flag, which tells the encoder that it can use global headers. Finally, we open the output file for writing and write the headers:

We get the AVPackets from the demuxer, adjust their timestamps and write each packet properly to the output file. Even though the function av_interleaved_write_frame says "write frame", we are storing packets. We finish the transmuxing process by writing the stream trailer to the file.

Transcoding

In the previous section we had a simple transmuxer program. Now we'll add the ability to encode files, specifically to transcode video from h264 to h265.

After preparing the decoder, but before setting up the output media file, we're going to set up the encoder.

We need to extend our decoding loop to transcode the video stream:

We converted the media stream from h264 to h265. As expected, the h265 version of the media file is smaller than the h264 one, and the program can do more than that:

To be honest, this was harder than I expected. I had to dig into the FFmpeg command line source code and test a lot. I'm probably missing something somewhere, because I had to use force-cfr for h264 to work. I'm also still seeing some warning messages, such as one saying that frame type (5) was forcibly changed to frame type (3).


ffmpeg is a very fast video and audio converter that can also grab from a live audio/video source. It can also convert between arbitrary sample rates and resize video on the fly with a high quality polyphase filter.

As a general rule, options are applied to the next specified file. Therefore, order is important, and you can have the same option on the command line multiple times. Each occurrence is then applied to the next input or output file. Exceptions from this rule are the global options (e.g. verbosity level), which should be specified first.

Do not mix input and output files – first specify all input files, then all output files. Also do not mix options which belong to different files. All options apply ONLY to the next input or output file and are reset between files.

The format option may be needed for raw input files.
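Because options apply to the next file named on the command line, the same flag can mean different things depending on where it appears. The sketch below builds two argument lists modeled on the classic frame-rate case. Placed after -i, -r 24 sets the output frame rate. Placed before -i, -r 1 forces the input frame rate, which only makes sense for raw inputs. The helper options_for_file is a hypothetical, simplified model that assumes every option takes exactly one value. File names are placeholders.

```python
from __future__ import annotations


def output_rate_cmd() -> list[str]:
    # -r 24 comes after -i input.avi, so it applies to output.avi.
    return ["ffmpeg", "-i", "input.avi", "-r", "24", "output.avi"]


def input_and_output_rate_cmd() -> list[str]:
    # The first -r applies to the raw input input.m2v,
    # the second -r applies to output.avi.
    return ["ffmpeg", "-r", "1", "-i", "input.m2v", "-r", "24", "output.avi"]


def options_for_file(argv: list[str], filename: str) -> list[str]:
    """Return the option/value pairs that precede filename.

    Simplified model: every option takes one value, "-i" introduces an
    input, and the collected options reset after each file.
    """
    opts: list[str] = []
    i = 1  # skip the program name
    while i < len(argv):
        if argv[i] == "-i":
            if argv[i + 1] == filename:
                return opts
            opts = []
            i += 2
        elif argv[i].startswith("-"):
            opts += argv[i:i + 2]
            i += 2
        else:
            if argv[i] == filename:
                return opts
            opts = []
            i += 1
    raise ValueError(f"{filename} not found")
```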

3 Detailed description

The transcoding process in ffmpeg for each output can be described by the following diagram:

ffmpeg calls the libavformat library (containing demuxers) to read input files and get packets containing encoded data from them. When there are multiple input files, ffmpeg tries to keep them synchronized by tracking the lowest timestamp on any active input stream.

Encoded packets are then passed to the decoder (unless streamcopy is selected for the stream, see further for a description). The decoder produces uncompressed frames (raw video/PCM audio/...) which can be processed further by filtering (see next section). After filtering, the frames are passed to the encoder, which encodes them and outputs encoded packets. Finally those are passed to the muxer, which writes the encoded packets to the output file.

3.1 Filtering

Before encoding, ffmpeg can process raw audio and video frames using filters from the libavfilter library. Several chained filters form a filter graph. ffmpeg distinguishes between two types of filtergraphs: simple and complex.

3.1.1 Simple filtergraphs

Simple filtergraphs are those that have exactly one input and output, both of the same type. In the above diagram they can be represented by simply inserting an additional step between decoding and encoding:

Note that some filters change frame properties but not frame contents. For example, the fps filter changes the number of frames but does not touch the frame contents. Another example is the setpts filter, which only sets timestamps and otherwise passes the frames unchanged.

3.1.2 Complex filtergraphs

Complex filtergraphs are those which cannot be described as simply a linear processing chain applied to one stream. This is the case, for example, when the graph has more than one input and/or output, or when the output stream type is different from the input. They can be represented with the following diagram:

A trivial example of a complex filtergraph is the overlay filter, which has two video inputs and one video output, containing one video overlaid on top of the other. Its audio counterpart is the amix filter.
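As an illustration, the overlay case can be written with -filter_complex. The graph takes labeled inputs and produces a labeled output, which is then selected with -map. This sketch only builds the argument list. The file names and the [outv] label are placeholders chosen here.

```python
from __future__ import annotations

import shlex


def overlay_cmd(background: str, foreground: str, output: str) -> list[str]:
    # Two video inputs ([0:v] and [1:v]) feed one overlay filter whose single
    # video output is labeled [outv]. Because -map is used, only [outv] is
    # written to the output (no audio).
    graph = "[0:v][1:v]overlay[outv]"
    return [
        "ffmpeg", "-i", background, "-i", foreground,
        "-filter_complex", graph,
        "-map", "[outv]",
        output,
    ]


if __name__ == "__main__":
    print(shlex.join(overlay_cmd("video.mkv", "logo.png", "out.mkv")))
```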

3.2 Stream copy

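Stream copy is a mode selected by supplying the copy parameter to the -codec option. It makes ffmpeg omit the decoding and encoding step for the specified stream, so it does only demuxing and muxing. Since there is no decoding or encoding, it is very fast and there is no quality loss. However, it might not work in some cases because of many factors. Applying filters is obviously also impossible, since filters work on uncompressed data.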

4 Stream selection

4.1 Description

The sub-sections that follow describe the various rules that are involved in stream selection. The examples that follow next show how these rules are applied in practice.

While every effort is made to accurately reflect the behavior of the program, FFmpeg is under continuous development and the code may have changed since the time of this writing.

4.1.1 Automatic stream selection

In the absence of any map options for a particular output file, ffmpeg inspects the output format to check which type of streams can be included in it, viz. video, audio and/or subtitles. For each acceptable stream type, ffmpeg will pick one stream, when available, from among all the inputs.

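It will select that stream based upon the following criteria: for video, it is the stream with the highest resolution; for audio, it is the stream with the most channels; for subtitles, it is the first subtitle stream found. There is a caveat for subtitles: the output format's default subtitle encoder can be either text-based or image-based, and only a subtitle stream of the same type will be chosen.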

In the case where several streams of the same type rate equally, the stream with the lowest index is chosen.
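The selection rule can be modeled in a few lines. This is a simplified sketch, not ffmpeg's actual code. It assumes the criteria of highest resolution for video, most channels for audio, and first found for subtitles, with ties broken by the lowest index. It ignores disposition flags and the text/image subtitle caveat. The Stream type is defined here for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    index: int          # stream index, in input order
    kind: str           # "video", "audio" or "subtitle"
    width: int = 0
    height: int = 0
    channels: int = 0


def score(s: Stream) -> int:
    # Higher is better.
    if s.kind == "video":
        return s.width * s.height
    if s.kind == "audio":
        return s.channels
    return 0  # subtitles all rate equally, so the first one wins


def pick(streams: list[Stream], kind: str) -> Optional[Stream]:
    candidates = sorted((s for s in streams if s.kind == kind), key=lambda s: s.index)
    if not candidates:
        return None
    # max() returns the first maximal element, i.e. the lowest index on ties.
    return max(candidates, key=score)
```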

4.1.2 Manual stream selection

4.1.3 Complex filtergraphs

If there are any complex filtergraph output streams with unlabeled pads, they will be added to the first output file. This will lead to a fatal error if the stream type is not supported by the output format. In the absence of the map option, the inclusion of these streams leads to the automatic stream selection of their types being skipped. If map options are present, these filtergraph streams are included in addition to the mapped streams.

Complex filtergraph output streams with labeled pads must be mapped once and exactly once.

4.1.4 Stream handling

An exception exists for subtitles. If a subtitle encoder is specified for an output file, the first subtitle stream found of any type, text or image, will be included. ffmpeg does not validate if the specified encoder can convert the selected stream or if the converted stream is acceptable within the output format. This applies generally as well: when the user sets an encoder manually, the stream selection process cannot check if the encoded stream can be muxed into the output file. If it cannot, ffmpeg will abort and all output files will fail to be processed.

4.2 Examples

The following examples illustrate the behavior, quirks and limitations of ffmpeg’s stream selection methods.

They assume the following three input files.

Example: automatic stream selection

out2.wav accepts only audio streams, so only stream 3 from B.mp4 is selected.

For the first two outputs, all included streams will be transcoded. The encoders chosen will be the default ones registered by each output format, which may not match the codec of the selected input streams.

Example: automatic subtitles selection

Example: unlabeled filtergraph outputs

Example: labeled filtergraph outputs

The above command will fail, as the output pad labelled [outv] has been mapped twice. None of the output files shall be processed.

The command should be modified as follows,
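One corrected command consistent with the description that follows uses split to clone the hue output, so that each label is mapped exactly once. It is reconstructed here as an assumption, with input and output names following the surrounding examples. The block also includes a hypothetical checker for the "mapped exactly once" rule, using a deliberately simple parser for the filtergraph string.

```python
from __future__ import annotations

import re
from collections import Counter

# Corrected command (reconstruction): [outv1] and [outv2] are each mapped once.
FIXED = [
    "ffmpeg", "-i", "A.avi", "-i", "B.mp4", "-i", "C.mkv",
    "-filter_complex", "[1:v]hue=s=0,split=2[outv1][outv2];overlay;aresample",
    "-map", "[outv1]", "-an", "out1.mp4",
    "out2.mkv",
    "-map", "[outv2]", "-map", "1:a:0", "out3.mkv",
]

# Failing variant: the single [outv] label is mapped to two outputs.
BROKEN = [
    "ffmpeg", "-i", "A.avi", "-i", "B.mp4", "-i", "C.mkv",
    "-filter_complex", "[1:v]hue=s=0[outv];overlay;aresample",
    "-map", "[outv]", "-an", "out1.mp4",
    "out2.mkv",
    "-map", "[outv]", "-map", "1:a:0", "out3.mkv",
]


def output_labels(graph: str) -> set[str]:
    """Labels at the end of each filter chain, e.g. [outv1][outv2].

    Simplified: assumes no quoting or escaping inside the graph.
    """
    labels: set[str] = set()
    for chain in graph.split(";"):
        trailing = re.search(r"(?:\[[^\]]+\])+$", chain.strip())
        if trailing:
            labels.update(re.findall(r"\[([^\]]+)\]", trailing.group(0)))
    return labels


def labels_mapped_once(argv: list[str]) -> bool:
    graph = argv[argv.index("-filter_complex") + 1]
    mapped = Counter(
        argv[i + 1].strip("[]")
        for i, arg in enumerate(argv[:-1])
        if arg == "-map" and argv[i + 1].startswith("[")
    )
    return all(mapped[label] == 1 for label in output_labels(graph))
```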

The video stream from B.mp4 is sent to the hue filter, whose output is cloned once using the split filter, and both outputs labelled. Then a copy each is mapped to the first and third output files.

The video, audio and subtitle streams mapped to out2.mkv are entirely determined by automatic stream selection.

5 Options

All the numerical options, if not specified otherwise, accept a string representing a number as input, which may be followed by one of the SI unit prefixes, for example: 'K', 'M', or 'G'.

If 'i' is appended to the SI unit prefix, the complete prefix will be interpreted as a unit prefix for binary multiples, which are based on powers of 1024 instead of powers of 1000. Appending 'B' to the SI unit prefix multiplies the value by 8. This allows using, for example: 'KB', 'MiB', 'G' and 'B' as number suffixes.
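The suffix rules can be checked with a small parser. This is a minimal sketch that handles only the K, M and G prefixes named above; ffmpeg itself accepts more SI prefixes than this. parse_size is a hypothetical helper, not an ffmpeg API.

```python
import re

_DECIMAL = {"k": 1000, "K": 1000, "M": 1000**2, "G": 1000**3}
_BINARY = {"k": 1024, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> float:
    """Parse strings such as '10K', '1Ki', '1KB', '1MiB', '2G' or '1B'."""
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)([kKMG]?)(i?)(B?)", text)
    if m is None or (m.group(3) and not m.group(2)):
        raise ValueError(f"bad number: {text!r}")  # 'i' needs a prefix
    number, prefix, binary, byte = m.groups()
    value = float(number)
    if prefix:
        # 'i' switches the prefix from powers of 1000 to powers of 1024.
        value *= (_BINARY if binary else _DECIMAL)[prefix]
    if byte:
        value *= 8  # 'B' multiplies the value by 8
    return value
```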

Options which do not take arguments are boolean options, and set the corresponding value to true. They can be set to false by prefixing the option name with "no". For example using "-nofoo" will set the boolean option with name "foo" to false.

5.1 Stream specifiers

Some options are applied per-stream, e.g. bitrate or codec. Stream specifiers are used to precisely specify which stream(s) a given option belongs to.

Possible forms of stream specifiers are:

p:program_id[:additional_stream_specifier]

#stream_id or i:stream_id

Match the stream by stream id (e.g. PID in MPEG-TS container).

m:key[:value]

Matches streams with the metadata tag key having the specified value. If value is not given, matches streams that contain the given tag with any value.

u

Matches streams with usable configuration: the codec must be defined and the essential information such as video dimension or audio sample rate must be present.
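A toy matcher makes the specifier forms concrete. This hypothetical sketch is not ffmpeg's parser. It supports only the empty specifier (all streams), a plain stream index, a stream type ('v', 'a', 's', 'd', 't') with an optional per-type index, and m:key[:value].

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stream:
    index: int                                   # index within the input file
    kind: str                                    # 'v', 'a', 's', 'd' or 't'
    metadata: dict[str, str] = field(default_factory=dict)


def select(streams: list[Stream], spec: str) -> list[Stream]:
    """Return the streams matched by a simplified stream specifier."""
    if spec == "":
        return list(streams)
    parts = spec.split(":")
    if parts[0] == "m":
        key = parts[1]
        value = ":".join(parts[2:]) if len(parts) > 2 else None
        return [s for s in streams
                if key in s.metadata and (value is None or s.metadata[key] == value)]
    if parts[0].isdigit():
        return [s for s in streams if s.index == int(parts[0])]
    of_type = [s for s in streams if s.kind == parts[0]]
    if len(parts) == 1:
        return of_type
    n = int(parts[1])                            # n-th stream of that type
    return of_type[n:n + 1]
```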

5.2 Generic options

These options are shared amongst the ff* tools.

Show help. An optional parameter may be specified to print help about a specific item. If no argument is specified, only basic (non advanced) tool options are shown.

Possible values of arg are:

Print advanced tool options in addition to the basic tool options.

Print complete list of options, including shared and private options for encoders, decoders, demuxers, muxers, filters, etc.

Show the build configuration, one option per line.

Show available formats (including devices).

Show available demuxers.

Show available muxers.

Show available devices.

Show all codecs known to libavcodec.

Note that the term ’codec’ is used throughout this documentation as a shortcut for what is more correctly called a media bitstream format.

Show available decoders.

Show all available encoders.

Show available bitstream filters.

Show available protocols.

Show available libavfilter filters.

Show available pixel formats.

Show available sample formats.

Show channel names and standard channel layouts.

Show stream dispositions.

Show recognized color names.

-sources device[,opt1=val1[,opt2=val2]...]

Show autodetected sources of the input device. Some devices may provide system-dependent source names that cannot be autodetected. The returned list cannot be assumed to be always complete.

Show autodetected sinks of the output device. Some devices may provide system-dependent sink names that cannot be autodetected. The returned list cannot be assumed to be always complete.

Set logging level and flags used by the library.

The optional flags prefix can consist of the following values:

Indicates that repeated log output should not be compressed to the first line and the "Last message repeated n times" line will be omitted.

Indicates that log output should add a [level] prefix to each message line. This can be used as an alternative to log coloring, e.g. when dumping the log to file.

loglevel is a string or a number containing one of the following values:

Show nothing at all; be silent.

Only show fatal errors which could lead the process to crash, such as an assertion failure. This is not currently used for anything.

Only show fatal errors. These are errors after which the process absolutely cannot continue.

Show all errors, including ones which can be recovered from.

Show all warnings and errors. Any message related to possibly incorrect or unexpected events will be shown.

Show informative messages during processing. This is in addition to warnings and errors. This is the default value.

Show everything, including debugging information.

For example to enable repeated log output, add the level prefix, and set loglevel to verbose:

Another example that enables repeated log output without affecting current state of level prefix flag or loglevel :

Setting the environment variable FFREPORT to any value has the same effect as the -report option. If the value is a ':'-separated key=value sequence, these options will affect the report; option values must be escaped if they contain special characters or the options delimiter ':' (see the “Quoting and escaping” section in the ffmpeg-utils manual).

The following options are recognized:

file

Set the file name to use for the report; %p is expanded to the name of the program, %t is expanded to a timestamp, %% is expanded to a plain %.

level

Set the log verbosity level using a numerical value (see -loglevel).

For example, to output a report to a file named ffreport.log using a log level of 32 (alias for log level info):
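The logging examples above can be expressed as argument lists plus an environment for subprocess.run. This is a sketch assuming ffmpeg is on PATH. The input/output names are placeholders, and report_env is a hypothetical helper.

```python
from __future__ import annotations

import os


def verbose_with_level_cmd(src: str, dst: str) -> list[str]:
    # Flags and level in one value: repeat + level prefix + verbose.
    return ["ffmpeg", "-loglevel", "repeat+level+verbose", "-i", src, dst]


def repeat_only_cmd(src: str, dst: str) -> list[str]:
    # A leading '+' sets only the repeat flag and leaves the level-prefix
    # flag and the log level unchanged.
    return ["ffmpeg", "-loglevel", "+repeat", "-i", src, dst]


def report_env(filename: str = "ffreport.log", level: int = 32) -> dict[str, str]:
    # FFREPORT holds ':'-separated key=value pairs; level 32 is "info".
    # A file name containing ':' or other special characters would need escaping.
    env = dict(os.environ)
    env["FFREPORT"] = f"file={filename}:level={level}"
    return env
```

A typical call would be subprocess.run(["ffmpeg", "-i", "input", "output"], env=report_env(), check=True).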

Errors in parsing the environment variable are not fatal, and will not appear in the report.

Suppress printing banner.

All FFmpeg tools will normally show a copyright notice, build options and library versions. This option can be used to suppress printing this information.

-cpuflags flags (global)

Allows setting and clearing cpu flags. This option is intended for testing. Do not use it unless you know what you’re doing.

Possible flags for this option are:

Override detection of CPU count. This option is intended for testing. Do not use it unless you know what you’re doing.

Set the maximum size limit for allocating a block on the heap by ffmpeg’s family of malloc functions. Exercise extreme caution when using this option. Don’t use if you do not understand the full consequence of doing so. Default is INT_MAX.

5.3 AVOptions

These options can be set for any container, codec or device. Generic options are listed under AVFormatContext options for containers/devices and under AVCodecContext options for codecs.

These options are specific to the given container, device or codec. Private options are listed under their corresponding containers/devices/codecs.

For example to write an ID3v2.3 header instead of a default ID3v2.4 to an MP3 file, use the id3v2_version private option of the MP3 muxer:
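A sketch of such a command as an argument list, assuming ffmpeg on PATH. The input and output file names are placeholders.

```python
from __future__ import annotations


def mp3_id3v23_cmd(src: str, dst: str) -> list[str]:
    # -id3v2_version is a private option of the MP3 muxer; placed before
    # the output file name, it applies to that output.
    return ["ffmpeg", "-i", src, "-id3v2_version", "3", dst]
```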

All codec AVOptions are per-stream, and thus a stream specifier should be attached to them:
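A command of the kind the next paragraph describes might look like the following. The file names are placeholders. Output stream 2 is the second audio stream because the video stream is mapped first.

```python
from __future__ import annotations


def multichannel_cmd() -> list[str]:
    return [
        "ffmpeg", "-i", "multichannel.mxf",
        "-map", "0:v:0", "-map", "0:a:0", "-map", "0:a:0",  # audio mapped twice
        "-c:a:0", "ac3", "-b:a:0", "640k",                   # first audio output
        "-ac:a:1", "2", "-c:a:1", "aac",                     # second: stereo downmix
        "-b:2", "128k",                                      # absolute output index 2
        "out.mp4",
    ]
```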

In the above example, a multichannel audio stream is mapped twice for output. The first instance is encoded with codec ac3 and bitrate 640k. The second instance is downmixed to 2 channels and encoded with codec aac. A bitrate of 128k is specified for it using absolute index of the output stream.

Note: the old undocumented way of specifying per-stream AVOptions by prepending v/a/s to the options name is now obsolete and will be removed soon.

5.4 Main options

Force input or output file format. The format is normally auto detected for input files and guessed from the file extension for output files, so this option is not needed in most cases.

Overwrite output files without asking.

Do not overwrite output files, and exit immediately if a specified output file already exists.

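-stream_loop number (input)

Set the number of times the input stream shall be looped. Loop 0 means no loop, loop -1 means infinite loop.

-recast_media (global)

Allow forcing a decoder of a different media type than the one detected or designated by the demuxer. Useful for decoding media data muxed as data streams.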

Select an encoder (when used before an output file) or a decoder (when used before an input file) for one or more streams. codec is the name of a decoder/encoder or a special value copy (output only) to indicate that the stream is not to be re-encoded.

For example, ffmpeg -i INPUT -map 0 -c:v libx264 -c:a copy OUTPUT encodes all video streams with libx264 and copies all audio streams.

For each stream, the last matching c option is applied, so ffmpeg -i INPUT -map 0 -c copy -c:v:1 libx264 -c:a:137 libvorbis OUTPUT will copy all the streams except the second video, which will be encoded with libx264, and the 138th audio, which will be encoded with libvorbis.
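The "last matching option wins" rule can be modeled with a small resolver. This is a hypothetical sketch, not ffmpeg's option parser. It handles only the c, c:TYPE and c:TYPE:N forms with type-relative indices, as in c:v:1 (the second video stream) and c:a:137 (the 138th audio stream).

```python
from __future__ import annotations

from typing import Optional


def spec_matches(option: str, kind: str, n: int) -> bool:
    """True if an option such as "c", "c:v" or "c:v:1" applies to the
    n-th stream of type kind ('v', 'a', 's', ...)."""
    parts = option.split(":")[1:]      # drop the option name itself
    if not parts:
        return True                    # plain "c" applies to every stream
    if parts[0] != kind:
        return False
    return len(parts) == 1 or int(parts[1]) == n


def resolve_codec(options: list[tuple[str, str]], kind: str, n: int) -> Optional[str]:
    chosen = None
    for option, value in options:
        if spec_matches(option, kind, n):
            chosen = value             # a later match overrides an earlier one
    return chosen


OPTIONS = [("c", "copy"), ("c:v:1", "libx264"), ("c:a:137", "libvorbis")]
```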

-t duration (input/output)

-to position (input/output)

-fs limit_size (output)

Set the file size limit, expressed in bytes. No further chunk of bytes is written after the limit is exceeded. The size of the output file is slightly more than the requested file size.

-ss position (input/output)

-sseof position (input)

-isync input_index (input)

Assign an input as a sync source.

This will take the difference between the start times of the target and reference inputs and offset the timestamps of the target file by that difference. The source timestamps of the two inputs should derive from the same clock source for expected results. If copyts is set then start_at_zero must also be set. If either of the inputs has no starting timestamp then no sync adjustment is made.

-itsoffset offset (input)

Set the input time offset.

-itsscale scale (input,per-stream)

Rescale input timestamps. scale should be a floating point number.

-timestamp date (output)

Set the recording timestamp in the container.

-metadata[:metadata_specifier] key=value (output,per-metadata)

Set a metadata key/value pair.

For example, for setting the title in the output file:

To set the language of the first audio stream:
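Both metadata cases can be sketched as argument lists. The file names and values are placeholders. Since the list is passed to subprocess.run without a shell, a title containing spaces needs no extra quoting.

```python
from __future__ import annotations


def set_title_cmd(src: str, dst: str, title: str) -> list[str]:
    # Without a specifier, -metadata writes global (file-level) metadata.
    return ["ffmpeg", "-i", src, "-metadata", f"title={title}", dst]


def set_audio_language_cmd(src: str, dst: str, lang: str = "eng") -> list[str]:
    # ":s:a:0" targets the stream-level metadata of the first audio stream.
    return ["ffmpeg", "-i", src, "-metadata:s:a:0", f"language={lang}", dst]
```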

Sets the disposition for a stream.

value is a sequence of items separated by '+' or '-'. The first item may also be prefixed with '+' or '-', in which case this option modifies the default value. Otherwise (the first item is not prefixed) this option overrides the default value. A '+' prefix adds the given disposition, '-' removes it. It is also possible to clear the disposition by setting it to 0.

For example, to make the second audio stream the default stream:

To make the second subtitle stream the default stream and remove the default disposition from the first subtitle stream:

To add an embedded cover/thumbnail:

Not all muxers support embedded thumbnails, and those that do only support a few formats, like JPEG or PNG.
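The '+'/'-' rules for disposition values can be modeled directly. The block also includes the three disposition cases above written out as argument lists. apply_disposition is a hypothetical sketch of the rules, not ffmpeg code. The file names, including IMAGE, are placeholders.

```python
from __future__ import annotations

import re


def apply_disposition(current: set[str], value: str) -> set[str]:
    """Apply a -disposition value to a stream's current flags.

    "0" clears everything; if the first item carries a '+'/'-' prefix the
    current flags are modified, otherwise they are replaced.
    """
    if value == "0":
        return set()
    items = re.findall(r"([+-]?)([^+-]+)", value)
    result = set(current) if items and items[0][0] else set()
    for sign, name in items:
        if sign == "-":
            result.discard(name)
        else:
            result.add(name)
    return result


MAKE_SECOND_AUDIO_DEFAULT = [
    "ffmpeg", "-i", "in.mkv", "-c", "copy", "-disposition:a:1", "default", "out.mkv",
]
SWAP_DEFAULT_SUBTITLE = [
    "ffmpeg", "-i", "in.mkv", "-c", "copy",
    "-disposition:s:0", "0", "-disposition:s:1", "default", "out.mkv",
]
ADD_COVER_IMAGE = [
    "ffmpeg", "-i", "in.mp4", "-i", "IMAGE", "-map", "0", "-map", "1",
    "-c", "copy", "-c:v:1", "png", "-disposition:v:1", "attached_pic", "out.mp4",
]
```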

-target type (output)

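Specify the target file type (vcd, svcd, dvd, dv, dv50). type may be prefixed with pal-, ntsc- or film- to use the corresponding standard. All the format options (bitrate, codecs, buffer sizes) are then set automatically. Nevertheless, you can specify additional options as long as you know they do not conflict with the standard, as in: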
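A sketch of such a command, assuming that two B-frames do not conflict with the VCD standard. The file names are placeholders.

```python
from __future__ import annotations


def vcd_cmd(src: str = "myfile.avi", dst: str = "/tmp/vcd.mpg") -> list[str]:
    # -target vcd presets the format options; -bf 2 is an extra encoder
    # option layered on top of the preset.
    return ["ffmpeg", "-i", src, "-target", "vcd", "-bf", "2", dst]
```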

The parameters set for each target are as follows.

VCD

SVCD

DVD

The dv50 target is identical to the dv target except that the pixel format set is yuv422p for all three standards.

Any user-set value for a parameter above will override the target preset value. In that case, the output may not comply with the target standard.

-dframes number (output)

-frames[:stream_specifier] framecount (output,per-stream)

Stop writing to the stream after framecount frames.

Use fixed quality scale (VBR). The meaning of q/qscale is codec-dependent. If qscale is used without a stream_specifier, it applies only to the video stream. This maintains compatibility with previous behavior, and avoids applying the same codec-specific value to two different codecs (audio and video), which is generally not what is intended when no stream_specifier is used.

-filter[:stream_specifier] filtergraph (output,per-stream)

Create the filtergraph specified by filtergraph and use it to filter the stream.

-filter_script[:stream_specifier] filename (output,per-stream)

-reinit_filter[:stream_specifier] integer (input,per-stream)

This boolean option determines if the filtergraph(s) to which this stream is fed gets reinitialized when input frame parameters change mid-stream. This option is enabled by default as most video and all audio filters cannot handle deviation in input frame properties. Upon reinitialization, existing filter state is lost, like e.g. the frame count n reference available in some filters. Any frames buffered at time of reinitialization are lost. The properties where a change triggers reinitialization are, for video, frame resolution or pixel format; for audio, sample format, sample rate, channel count or channel layout.

-filter_threads nb_threads (global)

Defines how many threads are used to process a filter pipeline. Each pipeline will produce a thread pool with this many threads available for parallel processing. The default is the number of available CPUs.

-pre[:stream_specifier] preset_name (output,per-stream)

Specify the preset for matching stream(s).

-stats_period time (global)

Set period at which encoding progress/statistics are updated. Default is 0.5 seconds.

-progress url (global)

Send program-friendly progress information to url. Progress information is written periodically and at the end of the encoding process. It is made of "key=value" lines. key consists of only alphanumeric characters. The last key of a sequence of progress information is always "progress".
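Because each block ends with the progress key, the output can be split into blocks with a small parser. This is a sketch that assumes the progress lines are read as text, for example from -progress pipe:1 on standard output. The sample below is illustrative, not captured from a real run.

```python
from __future__ import annotations


def parse_progress(text: str) -> list[dict[str, str]]:
    """Group -progress output into blocks; each block ends with a "progress" key."""
    blocks: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue                     # skip blank or malformed lines
        current[key] = value
        if key == "progress":            # "continue" or "end"
            blocks.append(current)
            current = {}
    return blocks


SAMPLE = """frame=10
fps=0.00
out_time=00:00:00.400000
progress=continue
frame=25
fps=24.80
out_time=00:00:01.000000
progress=end
"""
```

In a real script the same loop can consume lines incrementally from subprocess.Popen(..., stdout=subprocess.PIPE, text=True).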

Print timestamp information. It is off by default. This option is mostly useful for testing and debugging purposes, and the output format may change from one version to another, so it should not be employed by portable scripts.

-attach filename (output)

Note that for Matroska you also have to set the mimetype metadata tag:

(assuming that the attachment stream will be third in the output file).

-dump_attachment[:stream_specifier] filename (input,per-stream)

E.g. to extract the first attachment to a file named 'out.ttf':

To extract all attachments to files determined by the filename tag:

Technical note – attachments are implemented as codec extradata, so this option can actually be used to extract extradata from any stream, not just attachments.
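The attachment options above can be written out as argument lists. The font file, INPUT and output names are placeholders. The -metadata:s:2 specifier assumes the attachment ends up as the third output stream, as the Matroska note says.

```python
from __future__ import annotations


def attach_font_cmd(src: str = "INPUT", font: str = "DejaVuSans.ttf",
                    dst: str = "out.mkv") -> list[str]:
    # Matroska needs a mimetype tag on the attachment stream (index 2 here).
    return ["ffmpeg", "-i", src, "-attach", font,
            "-metadata:s:2", "mimetype=application/x-truetype-font", dst]


def dump_first_attachment_cmd(src: str = "INPUT", out: str = "out.ttf") -> list[str]:
    # -dump_attachment is an input option, so it must precede -i.
    return ["ffmpeg", "-dump_attachment:t:0", out, "-i", src]


def dump_all_attachments_cmd(src: str = "INPUT") -> list[str]:
    # An empty file name means: use each attachment's filename tag.
    return ["ffmpeg", "-dump_attachment:t", "", "-i", src]
```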

5.5 Video Options

-r[:stream_specifier] fps (input/output,per-stream)

Set frame rate (Hz value, fraction or abbreviation).

-fpsmax[:stream_specifier] fps (output,per-stream)

Set maximum frame rate (Hz value, fraction or abbreviation).

-s[:stream_specifier] size (input/output,per-stream)

As an input option, this is a shortcut for the video_size private option, recognized by some demuxers for which the frame size is either not stored in the file or is configurable – e.g. raw video or video grabbers.

As an output option, this inserts the scale video filter to the end of the corresponding filtergraph. Please use the scale filter directly to insert it at the beginning or some other place.

-aspect[:stream_specifier] aspect (output,per-stream)

-vcodec codec (output)

-pass[:stream_specifier] n (output,per-stream)

-vf filtergraph (output)

Create the filtergraph specified by filtergraph and use it to filter the stream.

5.6 Advanced Video options

-sws_flags flags (input/output)

Set SwScaler flags.

-rc_override[:stream_specifier] override (output,per-stream)

Rate control override for specific intervals, formatted as an "int,int,int" list separated with slashes. The first two values are the beginning and end frame numbers; the last one is the quantizer to use if positive, or the quality factor if negative.

Specifies which version of the vstats format to use. Default is 2. With version 1 the format is:

frame= %5d q= %2.1f PSNR= %6.2f f_size= %6d s_size= %8.0fkB time= %0.3f br= %7.1fkbits/s avg_br= %7.1fkbits/s

With version 2 or higher the format is:

out= %2d st= %2d frame= %5d q= %2.1f PSNR= %6.2f f_size= %6d s_size= %8.0fkB time= %0.3f br= %7.1fkbits/s avg_br= %7.1fkbits/s
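Lines in a vstats file (written with -vstats or -vstats_file) follow the printf-style formats above, so a regular expression can recover the fields. This is a minimal sketch for the version 2 format. It round-trips a line produced from the documented format string with made-up values.

```python
import re

V2_FORMAT = ("out= %2d st= %2d frame= %5d q= %2.1f PSNR= %6.2f f_size= %6d "
             "s_size= %8.0fkB time= %0.3f br= %7.1fkbits/s avg_br= %7.1fkbits/s")

# "name=" followed by optional padding and a number; unit suffixes such as
# kB and kbits/s are not part of the captured value.
FIELD = re.compile(r"(\w+)=\s*(-?[0-9.]+)")


def parse_vstats_line(line: str) -> dict:
    return {name: float(value) for name, value in FIELD.findall(line)}


if __name__ == "__main__":
    sample = V2_FORMAT % (0, 0, 25, 28.0, 36.52, 1520, 123.0, 1.0, 12.2, 950.4)
    print(parse_vstats_line(sample))
```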

-top[:stream_specifier] n (output,per-stream)

top=1/bottom=0/auto=-1 field first

-vtag fourcc/tag (output)

Show QP histogram

force_key_frames can take arguments of the following form:

For example, to insert a key frame at 5 minutes, plus key frames 0.1 second before the beginning of every chapter:

The expression in expr can contain the following constants:

n

the number of the current processed frame, starting from 0

n_forced

the number of forced frames

prev_forced_n

the number of the previous forced frame, it is NAN when no keyframe was forced yet

prev_forced_t

the time of the previous forced frame, it is NAN when no keyframe was forced yet

t

the time of the current processed frame

For example to force a key frame every 5 seconds, you can specify:

To force a key frame 5 seconds after the time of the last forced one, starting from second 13:
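This sketch shows which frames the two rules select. It evaluates hand-written Python predicates over frames at 1-second intervals and is not ffmpeg's expression evaluator. The predicates are assumed translations of expr:gte(t,n_forced*5) and expr:if(isnan(prev_forced_t),gte(t,13),gte(t,prev_forced_t+5)), written in the syntax described above.

```python
from __future__ import annotations

import math
from typing import Callable

Rule = Callable[[float, int, float], bool]


def forced_keyframes(times: list[float], rule: Rule) -> list[float]:
    """Return the frame times at which rule(t, n_forced, prev_forced_t) is true."""
    forced: list[float] = []
    prev_forced_t = math.nan            # NAN until a key frame is forced
    for t in times:
        if rule(t, len(forced), prev_forced_t):
            forced.append(t)
            prev_forced_t = t
    return forced


def every_5s(t: float, n_forced: int, prev_forced_t: float) -> bool:
    # expr:gte(t,n_forced*5)
    return t >= n_forced * 5


def every_5s_from_13(t: float, n_forced: int, prev_forced_t: float) -> bool:
    # expr:if(isnan(prev_forced_t),gte(t,13),gte(t,prev_forced_t+5))
    if math.isnan(prev_forced_t):
        return t >= 13
    return t >= prev_forced_t + 5
```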

Note that forcing too many keyframes is very harmful for the lookahead algorithms of certain encoders: using fixed-GOP options or similar would be more efficient.

-copyinkf[:stream_specifier] (output,per-stream)

When doing stream copy, copy also non-key frames found at the beginning.

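-init_hw_device type[=name][:device[,key=value...]]

Initialize a new hardware device of type type called name, using the given device parameters. If no name is specified, it will receive a default name of the form "type%d".

The meaning of device and the following arguments depends on the device type:

cuda

device is the number of the CUDA device.

The following options are recognized:

primary_ctx

If set to 1, uses the primary device context instead of creating a new one.

Examples:

-init_hw_device cuda:1

Choose the second device on the system.

-init_hw_device cuda:0,primary_ctx=1

Choose the first device and use the primary device context.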

dxva2

device is the number of the Direct3D 9 display adapter.

d3d11va

device is the number of the Direct3D 11 display adapter.

vaapi

device is either an X11 display name or a DRM render node. If not specified, it will attempt to open the default X11 display ($DISPLAY) and then the first DRM render node (/dev/dri/renderD128).

vdpau

device is an X11 display name. If not specified, it will attempt to open the default X11 display ($DISPLAY).

qsv

device selects a value in 'MFX_IMPL_*'. Allowed values are:

auto sw hw auto_any hw_any hw2 hw3 hw4

If not specified, 'auto_any' is used. (Note that it may be easier to achieve the desired result for QSV by creating the platform-appropriate subdevice ('dxva2' or 'd3d11va' or 'vaapi') and then deriving a QSV device from that.)

Alternatively, 'child_device_type' helps to choose the platform-appropriate subdevice type. On Windows 'd3d11va' is used as the default subdevice type.

Examples:

-init_hw_device qsv:hw,child_device_type=d3d11va

Choose the GPU subdevice with type 'd3d11va' and create a QSV device with 'MFX_IMPL_HARDWARE'.

-init_hw_device qsv:hw,child_device_type=dxva2

Choose the GPU subdevice with type 'dxva2' and create a QSV device with 'MFX_IMPL_HARDWARE'.

opencl

device selects the platform and device as platform_index.device_index.

The set of devices can also be filtered using the key-value pairs to find only devices matching particular platform or device strings.

The strings usable as filters are:

platform_profile platform_version platform_name platform_vendor platform_extensions device_name device_vendor driver_version device_version device_profile device_extensions device_type

The indices and filters must together uniquely select a device.

Examples:

-init_hw_device opencl:0.1

Choose the second device on the first platform.

-init_hw_device opencl:,device_name=Foo9000

Choose the device with a name containing the string Foo9000.

-init_hw_device opencl:1,device_type=gpu,device_extensions=cl_khr_fp16

Choose the GPU device on the second platform supporting the cl_khr_fp16 extension.

vulkan

If device is an integer, it selects the device by its index in a system-dependent list of devices. If device is any other string, it selects the first device with a name containing that string as a substring.

The following options are recognized:

debug

If set to 1, enables the validation layer, if installed.

linear_images

If set to 1, images allocated by the hwcontext will be linear and locally mappable.

instance_extensions

A plus separated list of additional instance extensions to enable.

device_extensions

A plus separated list of additional device extensions to enable.

Examples:

-init_hw_device vulkan:1

Choose the second device on the system.

-init_hw_device vulkan:RADV

Choose the first device with a name containing the string RADV.

-init_hw_device vulkan:0,instance_extensions=VK_KHR_wayland_surface+VK_KHR_xcb_surface

Choose the first device and enable the Wayland and XCB instance extensions.
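When building these values from a script, a small helper keeps the type[=name][:device[,key=value...]] shape consistent. init_hw_device_arg is a hypothetical sketch. It only assembles strings and does not check whether a device type or option exists in a given ffmpeg build.

```python
from __future__ import annotations

from typing import Optional


def init_hw_device_arg(dev_type: str, device: str = "",
                       options: Optional[dict[str, str]] = None,
                       name: Optional[str] = None) -> str:
    """Build the value for -init_hw_device type[=name][:device[,key=value...]]."""
    value = dev_type if name is None else f"{dev_type}={name}"
    opts = ",".join(f"{k}={v}" for k, v in (options or {}).items())
    if device or opts:
        value += ":" + device          # device may be empty, e.g. "opencl:,..."
        if opts:
            value += "," + opts
    return value
```

The result is passed as the argument after -init_hw_device, e.g. ["ffmpeg", "-init_hw_device", init_hw_device_arg("cuda", "1"), ...].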

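-init_hw_device type[=name]@source

Initialize a new hardware device of type type called name, deriving it from the existing device with the name source.

-init_hw_device list

List all hardware device types supported in this build of ffmpeg.

-filter_hw_device name

Pass the hardware device called name to all filters in any filter graph. This is a global setting, so all filters will receive the same device.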

-hwaccel[:stream_specifier] hwaccel (input,per-stream)

Use hardware acceleration to decode the matching stream(s). The allowed values of hwaccel are:

none

Do not use any hardware acceleration (the default).

auto

Automatically select the hardware acceleration method.

vdpau

Use VDPAU (Video Decode and Presentation API for Unix) hardware acceleration.

dxva2

Use DXVA2 (DirectX Video Acceleration) hardware acceleration.

d3d11va

Use D3D11VA (DirectX Video Acceleration) hardware acceleration.

vaapi

Use VAAPI (Video Acceleration API) hardware acceleration.

qsv

Use the Intel QuickSync Video acceleration for video transcoding.

Unlike most other values, this option does not enable accelerated decoding (that is used automatically whenever a qsv decoder is selected), but accelerated transcoding, without copying the frames into the system memory.

For it to work, both the decoder and the encoder must support QSV acceleration and no filters must be used.

This option has no effect if the selected hwaccel is not available or not supported by the chosen decoder.

Note that most acceleration methods are intended for playback and will not be faster than software decoding on modern CPUs. Additionally, ffmpeg will usually need to copy the decoded frames from the GPU memory into the system memory, resulting in further performance loss. This option is thus mainly useful for testing.

-hwaccel_device[:stream_specifier] hwaccel_device (input,per-stream)

Select a device to use for hardware acceleration.

List all hardware acceleration components enabled in this build of ffmpeg. Actual runtime availability depends on the hardware and its suitable driver being installed.

5.7 Audio Options

-ar[:stream_specifier] freq (input/output,per-stream)

Set the audio sampling frequency. For output streams it is set by default to the frequency of the corresponding input stream. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

-ac[:stream_specifier] channels (input/output,per-stream)

Set the number of audio channels. For output streams it is set by default to the number of input audio channels. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

-acodec codec (input/output)

-sample_fmt[:stream_specifier] sample_fmt (output,per-stream)

-af filtergraph (output)

Create the filtergraph specified by filtergraph and use it to filter the stream.

5.8 Advanced Audio options

-guess_layout_max channels (input,per-stream)

If some input channel layout is not known, try to guess only if it corresponds to at most the specified number of channels. For example, 2 tells ffmpeg to recognize 1 channel as mono and 2 channels as stereo but not 6 channels as 5.1. The default is to always try to guess. Use 0 to disable all guessing.

5.9 Subtitle options

5.10 Advanced Subtitle options

Fix subtitle durations. For each subtitle, wait for the next packet in the same stream and adjust the duration of the first to avoid overlap. This is necessary with some subtitle codecs, especially DVB subtitles, because the duration in the original packet is only a rough estimate and the end is actually marked by an empty subtitle frame. Failing to use this option when necessary can result in exaggerated durations or muxing failures due to non-monotonic timestamps.

Note that this option will delay the output of all data until the next subtitle packet is decoded: it may increase memory consumption and latency a lot.

Set the size of the canvas used to render subtitles.

5.11 Advanced options

Using this option disables the default mappings for this output file.

To map ALL streams from the first input file to output

create multiple streams

To select the stream with index 2 from input file a.mov (specified by the identifier 0:2), and stream with index 6 from input b.mov (specified by the identifier 1:6), and copy them to the output file out.mov:

To select all video and the third audio stream from an input file:

To map all the streams except the second audio, use negative mappings

To pick the English audio stream:
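The -map cases above can be collected as argument lists. The file names are placeholders. Note that 0:m:language:eng matches every stream tagged language=eng, whatever its type.

```python
from __future__ import annotations

import shlex

MAP_EXAMPLES: dict[str, list[str]] = {
    # every stream of input file 0
    "all_from_first": ["ffmpeg", "-i", "INPUT", "-map", "0", "output"],
    # stream 2 of a.mov and stream 6 of b.mov, stream-copied
    "two_specific": [
        "ffmpeg", "-i", "a.mov", "-i", "b.mov",
        "-c", "copy", "-map", "0:2", "-map", "1:6", "out.mov",
    ],
    # all video streams plus the third audio stream
    "video_and_third_audio": ["ffmpeg", "-i", "INPUT", "-map", "0:v", "-map", "0:a:2", "OUTPUT"],
    # everything except the second audio stream (negative mapping)
    "all_but_second_audio": ["ffmpeg", "-i", "INPUT", "-map", "0", "-map", "-0:a:1", "OUTPUT"],
    # streams whose language tag is eng
    "english": ["ffmpeg", "-i", "INPUT", "-map", "0:m:language:eng", "OUTPUT"],
}

if __name__ == "__main__":
    for name, argv in MAP_EXAMPLES.items():
        print(f"{name}: {shlex.join(argv)}")
```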

Ignore input streams with unknown type instead of failing if copying such streams is attempted.

Allow input streams with unknown type to be copied instead of failing if copying such streams is attempted.

For example, assuming INPUT is a stereo audio file, you can switch the two audio channels with the following command:

If you want to mute the first channel and keep the second:

The order of the "-map_channel" options specifies the order of the channels in the output stream. The output channel layout is guessed from the number of channels mapped (mono if one "-map_channel", stereo if two, etc.). Using "-ac" in combination with "-map_channel" causes the channel gain levels to be updated if input and output channel layouts don't match (for instance two "-map_channel" options and "-ac 6").

You can also extract each channel of an input to specific outputs; the following command extracts two channels of the INPUT audio stream (file 0, stream 0) to the respective OUTPUT_CH0 and OUTPUT_CH1 outputs:

The following example splits the channels of a stereo input into two separate streams, which are put into the same output file:

Note that currently each output stream can only contain channels from a single input stream; you can't for example use "-map_channel" to pick multiple input audio channels contained in different streams (from the same or different files) and merge them into a single output stream. It is therefore not currently possible, for example, to turn two separate mono streams into a single stereo stream. However splitting a stereo stream into two single channel mono streams is possible.

If you need this feature, a possible workaround is to use the amerge filter. For example, to merge a media file (here input.mkv) that has 2 mono audio streams into a single stereo audio stream, while keeping the video stream, you can use the following command:
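The channel-mapping cases above, with the amerge workaround, can be written as argument lists. File names are placeholders, and the stream indices in the amerge graph assume the two mono streams are streams 1 and 2 of input.mkv. Newer FFmpeg releases deprecate -map_channel, with the pan filter as the usual replacement, so check the version in use.

```python
from __future__ import annotations

import shlex

CHANNEL_EXAMPLES: dict[str, list[str]] = {
    # swap the two channels of a stereo input (file 0, stream 0)
    "swap": ["ffmpeg", "-i", "INPUT", "-map_channel", "0.0.1", "-map_channel", "0.0.0", "OUTPUT"],
    # mute the first channel (-1) and keep the second
    "mute_first": ["ffmpeg", "-i", "INPUT", "-map_channel", "-1", "-map_channel", "0.0.1", "OUTPUT"],
    # one channel per output file
    "split_files": [
        "ffmpeg", "-i", "INPUT",
        "-map_channel", "0.0.0", "OUTPUT_CH0", "-map_channel", "0.0.1", "OUTPUT_CH1",
    ],
    # two mono streams in one output file (output_file.stream after ':')
    "split_streams": [
        "ffmpeg", "-i", "stereo.wav", "-map", "0:0", "-map", "0:0",
        "-map_channel", "0.0.0:0.0", "-map_channel", "0.0.1:0.1", "-y", "out.ogg",
    ],
    # workaround: merge two mono streams into one stereo stream, keep video
    "amerge": [
        "ffmpeg", "-i", "input.mkv", "-filter_complex", "[0:1] [0:2] amerge",
        "-c:a", "pcm_s16le", "-c:v", "copy", "output.mkv",
    ],
}

if __name__ == "__main__":
    for name, argv in CHANNEL_EXAMPLES.items():
        print(f"{name}: {shlex.join(argv)}")
```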

g

global metadata, i.e. metadata that applies to the whole file

s[:stream_spec]

per-stream metadata. stream_spec is a stream specifier as described in the Stream specifiers chapter. In an input metadata specifier, the first matching stream is copied from. In an output metadata specifier, all matching streams are copied to.

c:chapter_index

per-chapter metadata. chapter_index is the zero-based chapter index.

p:program_index

per-program metadata. program_index is the zero-based program index.

If metadata specifier is omitted, it defaults to global.

By default, global metadata is copied from the first input file, per-stream and per-chapter metadata is copied along with streams/chapters. These default mappings are disabled by creating any mapping of the relevant type. A negative file index can be used to create a dummy mapping that just disables automatic copying.

For example to copy metadata from the first stream of the input file to global metadata of the output file:

To do the reverse, i.e. copy global metadata to all audio streams:

Note that a simple 0 would work as well in this example, since global metadata is assumed by default.
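Both metadata mappings as argument lists, with placeholder file names:

```python
from __future__ import annotations


def stream_meta_to_global_cmd(src: str = "in.ogg", dst: str = "out.mp3") -> list[str]:
    # Input specifier 0:s:0 = first stream of input 0; the output side has
    # no specifier, so the target is global metadata.
    return ["ffmpeg", "-i", src, "-map_metadata", "0:s:0", dst]


def global_meta_to_audio_cmd(src: str = "in.mkv", dst: str = "out.mkv") -> list[str]:
    # Output specifier :s:a = all audio streams; input 0:g = global metadata.
    return ["ffmpeg", "-i", src, "-map_metadata:s:a", "0:g", dst]
```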

-map_chapters input_file_index (output)

Copy chapters from input file with index input_file_index to the next output file. If no chapter mapping is specified, then chapters are copied from the first input file with at least one chapter. Use a negative file index to disable any chapter copying.
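A minimal sketch of the "negative file index" case: stream-copy a file while dropping all of its chapters. The file names are hypothetical.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "in.mkv",
    "-map_chapters", "-1",  # negative index: copy no chapters at all
    "-c", "copy",           # stream copy, no re-encoding
    "out.mkv",
]
print(shlex.join(cmd))
```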

-benchmark (global)

Show benchmarking information at the end of an encode. Shows real, system and user time used and maximum memory consumption. Maximum memory consumption is not supported on all systems; where it is unsupported, it is usually displayed as 0.

-benchmark_all (global)

Show benchmarking information during the encode. Shows real, system and user time used in various steps (audio/video encode/decode).

-timelimit duration (global)

Exit after ffmpeg has been running for duration seconds in CPU user time.

-dump (global)

Dump each input packet to stderr.

-hex (global)

When dumping packets, also dump the payload.

-readrate speed (input)

Limit input read speed.

Mainly used to simulate a capture device or live input stream (e.g. when reading from a file). Should not be used with a low value when input is an actual capture device or live stream as it may cause packet loss.

It is useful when the flow speed of output packets is important, such as in live streaming.
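A sketch of simulating a live source from a file. `-readrate` is an input option, so it goes before `-i`. A value of 1 means real-time speed. The input name and the local UDP destination are assumptions for illustration, and `-readrate` only exists in reasonably recent releases.

```python
import shlex

cmd = [
    "ffmpeg",
    "-readrate", "1",           # ingest at most 1 s of media per second of wall-clock time
    "-i", "input.mkv",
    "-c", "copy",               # no re-encoding, just pace the packets
    "-f", "mpegts", "udp://127.0.0.1:1234",
]
print(shlex.join(cmd))
```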

-vsync parameter (global)
-fps_mode[:stream_specifier] parameter (output,per-stream)

Set video sync method / framerate mode. vsync is applied to all output video streams but can be overridden for a stream by setting fps_mode. vsync is deprecated and will be removed in the future.

For compatibility reasons, some of the values for vsync can be specified as numbers (shown in parentheses in the following list).

passthrough (0): Each frame is passed with its timestamp from the demuxer to the muxer.

cfr (1): Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.

vfr (2): Frames are passed through with their timestamp or dropped so as to prevent 2 frames from having the same timestamp.

drop: As passthrough but destroys all timestamps, making the muxer generate fresh timestamps based on frame-rate.

auto (-1): Chooses between cfr and vfr depending on muxer capabilities. This is the default method.

Note that the muxer may modify the timestamps further after this step, for example when the format option avoid_negative_ts is enabled.
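A sketch that forces a constant 25 fps output. It assumes a release that has `-fps_mode`; on older releases that only know `-vsync`, the equivalent request is `-vsync cfr`. File names and the rate are illustrative.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "input.mkv",
    "-fps_mode", "cfr",  # duplicate/drop frames to hit the rate exactly
    "-r", "25",          # output frame rate to enforce
    "output.mp4",
]
print(shlex.join(cmd))
```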

-async samples_per_second

Audio sync method. This option has been deprecated. Use the aresample audio filter instead.

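-apad parameters (output,per-stream)

Pad the output audio stream(s). This is the same as applying -af apad. The argument is a string of filter parameters composed the same way as for the apad filter. -shortest must be set for this output for the option to take effect.

-copyts

Do not process input timestamps, but keep their values without trying to sanitize them. In particular, do not remove the initial start time offset value.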

Note that, depending on the vsync option or on specific muxer processing (e.g. when the format option avoid_negative_ts is enabled), the output timestamps may not match the input timestamps even when this option is selected.

-copytb mode

Specify how to set the encoder timebase when stream copying. mode is an integer numeric value, and can assume one of the following values:

1: Use the demuxer timebase.

The time base is copied to the output encoder from the corresponding input demuxer. This is sometimes required to avoid non-monotonically increasing timestamps when copying video streams with variable frame rate.

0: Use the decoder timebase.

The time base is copied to the output encoder from the corresponding input decoder.

-1: Try to make the choice automatically, in order to generate a sane output.

-enc_time_base[:stream_specifier] timebase (output,per-stream)

Set the encoder timebase. timebase is a floating point number, and can assume one of the following values:

0: Assign a default value according to the media type.

-1: Use the input stream timebase when possible. If an input stream is not available, the default timebase will be used.

>0: Use the provided number as the timebase.

This field can be provided as a ratio of two integers (e.g. 1:24, 1:48000) or as a floating point number (e.g. 0.04166, 2.0833e-5).

Default value is 0.
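The two notations describe the same quantity: 1:24 and 0.04166 are both roughly 1/24 of a second per tick. The sketch below parses either form with `fractions.Fraction` to show that. The parser is a hypothetical helper, not ffmpeg's own. It also shows the option on a command that re-encodes video, because the encoder timebase only matters when an encoder runs. libx264 is assumed to be available.

```python
import shlex
from fractions import Fraction


def parse_timebase(text: str) -> Fraction:
    """Accept 'num:den' (e.g. 1:24) or a decimal (e.g. 0.04166 or 2.0833e-5)."""
    if ":" in text:
        num, den = text.split(":")
        return Fraction(int(num), int(den))
    return Fraction(text)  # Fraction parses decimal and exponent notation


tb = parse_timebase("1:24")
print(tb)  # 1/24

cmd = ["ffmpeg", "-i", "input.mkv", "-c:v", "libx264", "-enc_time_base:v", "1:24", "output.mkv"]
print(shlex.join(cmd))
```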

-bitexact (input/output)

Enable bitexact mode for (de)muxer and (de/en)coder.

-shortest (output)

Finish encoding when the shortest output stream ends.
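A common use of -shortest is to turn a still image plus an audio track into a video. The image is looped forever, so without -shortest the output would never end. `-loop 1` is an option of the image2 demuxer. The file names are hypothetical.

```python
import shlex

cmd = [
    "ffmpeg",
    "-loop", "1", "-i", "cover.png",  # repeat the still image indefinitely
    "-i", "audio.mp3",
    "-c:a", "copy",                   # keep the audio as-is
    "-shortest",                      # stop when the finite audio stream ends
    "output.mkv",
]
print(shlex.join(cmd))
```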

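-shortest_buf_duration duration (output)

The -shortest option may require buffering large amounts of data when at least one of the streams is sparse, i.e. has large gaps between frames (typically the case for subtitles). This option controls the maximum duration of buffered frames in seconds. The default value is 10 seconds.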

-dts_delta_threshold threshold

Timestamp discontinuity delta threshold.

-dts_error_threshold seconds

Timestamp error delta threshold. This threshold is used to discard crazy/damaged timestamps. The default is 30 hours, which is arbitrarily picked and quite conservative.

-muxdelay seconds (output)

Set the maximum demux-decode delay.

-muxpreload seconds (output)

Set the initial demux-decode delay.

-streamid output-stream-index : new-value (output)

Assign a new stream-id value to an output stream. This option should be specified prior to the output filename to which it applies. For the situation where multiple output files exist, a streamid may be reassigned to a different value.

For example, to set the stream 0 PID to 33 and the stream 1 PID to 36 for an output mpegts file:
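A sketch of that command, after the upstream manual's example. Each `-streamid` takes `output-stream-index:new-value` and is placed before the output file it applies to. inurl is the manual's placeholder input.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "inurl",
    "-streamid", "0:33",  # output stream 0 -> PID 33
    "-streamid", "1:36",  # output stream 1 -> PID 36
    "out.ts",
]
print(shlex.join(cmd))
```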

-tag[:stream_specifier] codec_tag (input/output,per-stream)

Force a tag/fourcc for matching streams.

-timecode hh:mm:ssSEPff

Specify the timecode for writing. SEP is ’:’ for non-drop timecode and ’;’ (or ’.’) for drop-frame timecode.
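A small sketch that builds the value from its parts, picking the separator as described above. The helper is hypothetical. The command mirrors the upstream manual's example, which uses ’.’ and therefore requests drop-frame timecode. In a shell, ’;’ would need quoting, but an argument list passed to `subprocess` needs none.

```python
import shlex


def format_timecode(hh: int, mm: int, ss: int, ff: int, drop_frame: bool = False) -> str:
    """Return hh:mm:ssSEPff with ':' for non-drop and ';' for drop-frame timecode."""
    sep = ";" if drop_frame else ":"
    return f"{hh:02d}:{mm:02d}:{ss:02d}{sep}{ff:02d}"


cmd = ["ffmpeg", "-i", "input.mpg", "-timecode", "01:23:45.10", "output.mpg"]
print(format_timecode(1, 23, 45, 10, drop_frame=True))  # 01:23:45;10
print(shlex.join(cmd))
```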

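-filter_complex filtergraph (global)

Define a complex filtergraph, i.e. one with an arbitrary number of inputs and/or outputs. Note that with this option it is possible to use only lavfi sources, without normal input files.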

For example, to overlay an image over video:
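A sketch of the fully labelled form, reconstructed after the upstream manual's example. The input labels select the streams that feed the overlay filter, and `-map '[out]'` sends the labelled filter output to the file. video.mkv and image.png are placeholder names.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "video.mkv", "-i", "image.png",
    "-filter_complex", "[0:v][1:v]overlay[out]",  # main video + image -> labelled output [out]
    "-map", "[out]",                              # write the filter output to out.mkv
    "out.mkv",
]
print(shlex.join(cmd))  # labels contain brackets, so shlex.join quotes them
```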

Here [0:v] refers to the first video stream in the first input file, which is linked to the first (main) input of the overlay filter. Similarly the first video stream in the second input is linked to the second (overlay) input of overlay.

Assuming there is only one video stream in each input file, we can omit the input labels, so the above is equivalent to using the filtergraph overlay[out] (still mapped with -map '[out]').

Furthermore, we can omit the output label and the -map option. The single output of the filtergraph is then added to the output file automatically, so the filtergraph can be simply overlay.

As a special exception, you can use a bitmap subtitle stream as input: it will be converted into a video with the same size as the largest video in the file, or 720×576 if no video is present. Note that this is an experimental and temporary solution. It will be removed once libavfilter has proper support for subtitles.

For example, to hardcode subtitles on top of a DVB-T recording stored in MPEG-TS format, delaying the subtitles by 1 second:
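A sketch of that command, after the upstream manual's example. Streams are addressed by their MPEG-TS PIDs with the `#` prefix. `setpts=PTS+1/TB` delays the subtitles by one second. `-sn` stops the subtitle stream from also being selected automatically. input.ts and output.mkv are placeholder names.

```python
import shlex

filtergraph = (
    "[#0x2ef] setpts=PTS+1/TB [sub] ; "  # subtitle stream, shifted by 1 second
    "[#0x2d0] [sub] overlay"             # burn them onto the video stream
)
cmd = [
    "ffmpeg", "-i", "input.ts",
    "-filter_complex", filtergraph,
    "-sn",                # no separate subtitle stream in the output
    "-map", "#0x2dc",     # keep the audio stream, selected by PID
    "output.mkv",
]
print(shlex.join(cmd))
```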

(0x2d0, 0x2dc and 0x2ef are the MPEG-TS PIDs of respectively the video, audio and subtitles streams; 0:0, 0:3 and 0:7 would have worked too)

To generate 5 seconds of pure red video using lavfi color source:
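A sketch after the upstream manual's example. The color source is the only input, so there is no `-i` at all. `-t 5` limits the output duration. out.mkv is a placeholder.

```python
import shlex

cmd = [
    "ffmpeg",
    "-filter_complex", "color=c=red",  # lavfi source producing solid red frames
    "-t", "5",                         # stop after 5 seconds of output
    "out.mkv",
]
print(shlex.join(cmd))
```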

-lavfi filtergraph (global)

Define a complex filtergraph. Equivalent to -filter_complex.

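-filter_complex_script filename (global)

This option is similar to -filter_complex. The only difference is that its argument is the name of the file from which a complex filtergraph is to be read.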

-thread_queue_size size (input/output)

For input, this option sets the maximum number of queued packets when reading from the file or device. With low latency / high rate live streams, packets may be discarded if they are not read in a timely manner; setting this value can force ffmpeg to use a separate input thread and read packets as soon as they arrive. By default ffmpeg only does this if multiple inputs are specified.

For output, this option specifies the maximum number of packets that may be queued to each muxing thread.
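A sketch of the input-side use: two live devices, each with a larger packet queue. As an input option, `-thread_queue_size` must come before the `-i` it applies to. The device names (/dev/video0, ALSA "default") and the value 1024 are assumptions for illustration.

```python
import shlex

cmd = [
    "ffmpeg",
    "-thread_queue_size", "1024", "-f", "v4l2", "-i", "/dev/video0",  # camera
    "-thread_queue_size", "1024", "-f", "alsa", "-i", "default",      # microphone
    "output.mkv",
]
print(shlex.join(cmd))
```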

-sdp_file file (global)

Print SDP information for an output stream to file.

-discard (input)

Allows discarding specific streams or frames from streams. Any input stream can be fully discarded using the value all, whereas selective discarding of frames from a stream occurs at the demuxer and is not supported by all demuxers.

none: Discard no frame.

default: Default, which discards no frames.

noref: Discard all non-reference frames.

bidir: Discard all bidirectional frames.

nokey: Discard all frames except keyframes.

all: Discard all frames.

-abort_on flags (global)

Stop and abort on various conditions. The following flags are available:

empty_output: No packets were passed to the muxer, the output is empty.

empty_output_stream: No packets were passed to the muxer in some of the output streams.

-max_error_rate (global)

Set the fraction of decoding frame failures across all inputs above which ffmpeg will return exit code 69. Crossing this threshold does not terminate processing. The range is a floating-point number between 0 and 1. Default is 2/3.

-xerror (global)

Stop and exit on error.
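A sketch of a script-friendly invocation. It aborts if nothing reaches the muxer and reports a high decode-failure rate through the exit code. The 0.1 threshold and file names are assumptions. `describe_exit` is a hypothetical helper that only distinguishes the codes mentioned above.

```python
import shlex

cmd = [
    "ffmpeg",
    "-abort_on", "empty_output",  # abort if no packets at all reach the muxer
    "-max_error_rate", "0.1",     # keep going, but exit with 69 if >10% of frames fail to decode
    "-i", "input.mkv", "output.mp4",
]


def describe_exit(code: int) -> str:
    """Simplified interpretation of ffmpeg's return code for this command."""
    if code == 0:
        return "success"
    if code == 69:
        return "decoding error rate exceeded -max_error_rate"
    return "failed"


print(shlex.join(cmd))
# e.g. describe_exit(subprocess.run(cmd).returncode)
```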

-max_muxing_queue_size packets (output,per-stream)

When transcoding audio and/or video streams, ffmpeg will not begin writing into the output until it has one packet for each such stream. While waiting for that to happen, packets for other streams are buffered. This option sets the size of this buffer, in packets, for the matching output stream.

The default value of this option should be high enough for most uses, so only touch this option if you are sure that you need it.
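When the default does turn out to be too small, the usual remedy is to raise the limit on the transcoding command. Without a stream specifier, the option applies to every output stream. The value 1024, the codecs and the file names are assumptions for illustration.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "input.mkv",
    "-c:v", "libx264", "-c:a", "aac",
    "-max_muxing_queue_size", "1024",  # buffer up to 1024 packets per output stream
    "output.mp4",
]
print(shlex.join(cmd))
```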

-muxing_queue_data_threshold bytes (output,per-stream)

This is a minimum threshold until which the muxing queue size is not taken into account. Defaults to 50 megabytes per stream, and is based on the overall size of packets passed to the muxer.

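-bits_per_raw_sample[:stream_specifier] value (output,per-stream)

Declare the number of bits per raw sample in the given output stream to be value. Note that this option only sets the information provided to the encoder/muxer; it does not change the stream to conform to this value.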

5.12 Preset files

A preset file contains a sequence of option=value pairs, one per line, specifying options that would be awkward to give on the command line. Lines starting with the hash (’#’) character are ignored and can be used for comments. Check the presets directory in the FFmpeg source tree for examples.
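To make the format concrete, here is a simplified reader for such a file. It is a sketch, not FFmpeg's own parser. It skips `#` comment lines and blank lines and splits every other line at the first `=`. The sample content is hypothetical.

```python
from typing import Dict


def parse_preset(text: str) -> Dict[str, str]:
    """Return the option=value pairs of a preset file, ignoring comments and blank lines."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and empty lines carry no option
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not an option=value pair: {raw!r}")
        options[key.strip()] = value.strip()
    return options


sample = """# hypothetical preset
g=250
bf=2
"""
print(parse_preset(sample))  # {'g': '250', 'bf': '2'}
```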

There are two types of preset files: ffpreset and avpreset files.

5.12.1 ffpreset files

5.12.2 avpreset files

avpreset files are specified with the pre option. They work similarly to ffpreset files, but they only allow encoder-specific options. Therefore, an option=value pair specifying an encoder cannot be used.

6 Examples

6.1 Video and Audio grabbing

If you specify the input format and device then ffmpeg can grab video and audio directly.
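A sketch of such a capture, reconstructed after the upstream manual's example. It takes OSS audio from /dev/dsp and Video4Linux2 video from /dev/video0. `-f` before each `-i` names the input device format.

```python
import shlex

cmd = [
    "ffmpeg",
    "-f", "oss", "-i", "/dev/dsp",            # audio via OSS
    "-f", "video4linux2", "-i", "/dev/video0",  # video via V4L2
    "/tmp/out.mpg",
]
print(shlex.join(cmd))
```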

Or with an ALSA audio source (mono input, card id 1) instead of OSS:
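The ALSA variant, again following the upstream manual. `-ac 1` is given before `-i` so it applies to the capture device (mono), and `hw:1` selects card id 1.

```python
import shlex

cmd = [
    "ffmpeg",
    "-f", "alsa", "-ac", "1", "-i", "hw:1",       # mono audio from ALSA card 1
    "-f", "video4linux2", "-i", "/dev/video0",
    "/tmp/out.mpg",
]
print(shlex.join(cmd))
```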

Note that you must activate the right video source and channel before launching ffmpeg with any TV viewer such as xawtv by Gerd Knorr. You also have to set the audio recording levels correctly with a standard mixer.

6.2 X11 grabbing

Grab the X11 display with ffmpeg via
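A sketch after the upstream manual's example. It grabs a CIF-sized (352x288) region of display :0.0 at 25 frames per second.

```python
import shlex

cmd = [
    "ffmpeg", "-f", "x11grab",
    "-video_size", "cif",  # 352x288 capture area
    "-framerate", "25",
    "-i", ":0.0",          # display.screen of the X11 server
    "/tmp/out.mpg",
]
print(shlex.join(cmd))
```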

0.0 is display.screen number of your X11 server, same as the DISPLAY environment variable.
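To grab from an offset rather than the top-left corner, the upstream manual appends `+x,y` to the display name. A hypothetical helper that builds that string, used in the offset variant of the previous command:

```python
import shlex


def x11grab_input(display: str, x: int = 0, y: int = 0) -> str:
    """Return the x11grab input name, adding '+x,y' only when an offset is given."""
    return f"{display}+{x},{y}" if (x or y) else display


cmd = [
    "ffmpeg", "-f", "x11grab", "-video_size", "cif", "-framerate", "25",
    "-i", x11grab_input(":0.0", 10, 20),  # start 10 px right and 20 px down
    "/tmp/out.mpg",
]
print(shlex.join(cmd))
```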

Again, 0.0 is the display.screen number of your X11 server; 10 is the x-offset and 20 the y-offset for the grabbing.

6.3 Video and Audio file format conversion

Any supported file format and protocol can serve as input to ffmpeg:

It will use the files:

test.yuv is a file containing raw YUV planar data. Each frame is composed of the Y plane followed by the U and V planes at half vertical and horizontal resolution.
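The layout described above fixes the size of every frame. The Y plane is width × height bytes, and U and V are each a quarter of that. A raw file has no header, so a common way to read one is to state the format, size and rate explicitly. The 352x288 size and 25 fps below are assumptions. The demuxer options are those of the rawvideo demuxer.

```python
import shlex
from typing import List


def yuv420p_frame_size(width: int, height: int) -> int:
    """Bytes per frame: full-size Y plane plus U and V at half width and half height."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # one chroma plane; assumes even dimensions
    return luma + 2 * chroma


def raw_yuv_to_avi(src: str, dst: str, size: str = "352x288", fps: str = "25") -> List[str]:
    # No header in the file, so tell the demuxer the pixel format, size and rate.
    return [
        "ffmpeg", "-f", "rawvideo", "-pixel_format", "yuv420p",
        "-video_size", size, "-framerate", fps,
        "-i", src, dst,
    ]


print(yuv420p_frame_size(352, 288))  # 152064
print(shlex.join(raw_yuv_to_avi("/tmp/test.yuv", "/tmp/out.avi")))
```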

Converts the audio file a.wav and the raw YUV video file a.yuv to MPEG file a.mpg.
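A sketch of that conversion, after the upstream manual's example. `-s 640x480` sits before `-i /tmp/a.yuv`, so it tells ffmpeg the frame size of the headerless YUV input rather than resizing the output.

```python
import shlex

cmd = [
    "ffmpeg",
    "-i", "/tmp/a.wav",
    "-s", "640x480", "-i", "/tmp/a.yuv",  # input option: size of the raw frames
    "/tmp/a.mpg",
]
print(shlex.join(cmd))
```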

Converts a.wav to MPEG audio at 22050 Hz sample rate.
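A sketch of that conversion, after the upstream manual. `-ar` placed before the output file sets the output sample rate, and the .mp2 extension selects MPEG audio.

```python
import shlex

cmd = ["ffmpeg", "-i", "/tmp/a.wav", "-ar", "22050", "/tmp/a.mp2"]
print(shlex.join(cmd))
```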

Converts a.wav to a.mp2 at 64 kbit/s and to b.mp2 at 128 kbit/s. -map file:index specifies which input stream is used for each output stream, in the order in which the output streams are defined.
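A sketch of that two-output command, after the upstream manual. Every option before an output file applies only to that file, so each output gets its own `-map` and bitrate.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "/tmp/a.wav",
    "-map", "0:a", "-b:a", "64k", "/tmp/a.mp2",   # first output: 64 kbit/s
    "-map", "0:a", "-b:a", "128k", "/tmp/b.mp2",  # second output: 128 kbit/s
]
print(shlex.join(cmd))
```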

For extracting images from a video:
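A sketch after the upstream manual's example, which writes one frame per second to foo-001.jpeg, foo-002.jpeg and so on. The manual writes the size as WxH. Here it is a parameter with an assumed default of 320x240.

```python
import shlex
from typing import List


def extract_frames(src: str, size: str = "320x240") -> List[str]:
    return [
        "ffmpeg", "-i", src,
        "-r", "1",          # one output frame per second
        "-s", size,         # rescale every image to this size
        "-f", "image2", "foo-%03d.jpeg",
    ]


print(shlex.join(extract_frames("foo.avi")))
```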

For creating a video from many images:
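The reverse direction, after the upstream manual. `-framerate 12` is an input option of the image2 demuxer and says how many images make one second of video. The 320x240 output size stands in for the manual's WxH placeholder.

```python
import shlex

cmd = [
    "ffmpeg", "-f", "image2",
    "-framerate", "12",        # 12 images per second of video
    "-i", "foo-%03d.jpeg",     # foo-001.jpeg, foo-002.jpeg, ...
    "-s", "320x240",
    "foo.avi",
]
print(shlex.join(cmd))
```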

The syntax foo-%03d.jpeg specifies to use a decimal number composed of three digits padded with zeroes to express the sequence number. It is the same syntax supported by the C printf function, but only formats accepting a normal integer are suitable.
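Python's `%` operator follows the same printf rules, so it is a quick way to check which file names a pattern expands to. The width is only a minimum, so numbers with more digits are not truncated.

```python
def frame_filename(pattern: str, index: int) -> str:
    """Expand a printf-style image pattern for one sequence number."""
    return pattern % index


print(frame_filename("foo-%03d.jpeg", 7))       # foo-007.jpeg
print(frame_filename("foo-%03d.jpeg", 1234))    # foo-1234.jpeg
print(frame_filename("image_%010d.png", 1))     # image_0000000001.png
```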

For example, for creating a video from filenames matching the glob pattern foo-*.jpeg :
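A sketch after the upstream manual. `-pattern_type glob` makes the image2 demuxer expand the wildcard itself. In a shell the pattern has to be quoted so the shell does not expand it first. An argument list passed to `subprocess` never goes through a shell, so no quoting is needed. The size is an assumption.

```python
import shlex

cmd = [
    "ffmpeg", "-f", "image2",
    "-pattern_type", "glob",  # let ffmpeg, not the shell, expand the wildcard
    "-framerate", "12",
    "-i", "foo-*.jpeg",
    "-s", "320x240",
    "foo.avi",
]
print(shlex.join(cmd))  # shlex.join adds the quotes a shell would need
```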

The resulting output file test12.nut will contain the first four streams from the input files in reverse order.
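The command this sentence describes is missing from the page. Following the upstream manual, it puts several streams of the same type into one output with stream copy. The `-map` options list the input streams in the order the output streams should have.

```python
import shlex

cmd = [
    "ffmpeg", "-i", "test1.avi", "-i", "test2.avi",
    "-map", "1:1", "-map", "1:0",  # second file's streams first, reversed
    "-map", "0:1", "-map", "0:0",  # then the first file's streams, reversed
    "-c", "copy", "-y", "test12.nut",
]
print(shlex.join(cmd))
```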

7 See Also

8 Authors

The FFmpeg developers.

For details about the authorship, see the Git history of the project (git://source.ffmpeg.org/ffmpeg), e.g. by typing the command git log in the FFmpeg source directory, or browsing the online repository at http://source.ffmpeg.org.

Maintainers for the specific components are listed in the file MAINTAINERS in the source code tree.

This document was generated on August 14, 2022 using makeinfo.

Sources:
http://wiki.archlinux.org/title/FFmpeg
http://gist.github.com/protrolium/e0dbd4bb0f1a396fcb55
http://habr.com/ru/company/edison/blog/495614/
http://ffmpeg.org/ffmpeg.html


FFmpeg

Contents

Installation

Encoding examples

Screen capture

Recording webcam

VOB to any container

Lossless

Constant rate factor

Two-pass (very high-quality)

Video stabilization

First pass

Second pass

Single-pass MPEG-2 (near lossless)

Subtitles

Extracting

Hardsubbing

Volume gain

Volume normalization

Extracting audio

Stripping audio

Splitting files

Hardware video acceleration

VA-API

NVIDIA NVENC/NVDEC

Intel QuickSync (QSV)

AMD AMF

Animated GIF

Preset files

Using preset files

libavcodec-vhq.ffpreset

Tips and tricks

Reduce verbosity

protrolium / ffmpeg.md

FFmpeg libav tutorial

Contents

Introduction

Video: what you see!

Audio: what you hear!

Codec: compressing data

Container: a convenient place to store audio/video

FFmpeg command line

FFmpeg command line tool 101

Common video operations

Transcoding (re-encoding)

Transmuxing

Transrating

Transsizing (resizing)

Bonus: adaptive streaming

Going further

The hard way of learning FFmpeg libav

Chapter 0: a simple "Hello World"

FFmpeg libav architecture

Requirements

The code

Chapter 1: syncing audio and video

Chapter 2: remuxing

Chapter 3: transcoding

Transmuxing

Transcoding


3 Detailed description

3.1 Filtering

3.1.1 Simple filtergraphs

3.1.2 Complex filtergraphs

3.2 Stream copy

4 Stream selection

4.1 Description

4.1.1 Automatic stream selection

4.1.2 Manual stream selection

4.1.3 Complex filtergraphs

4.1.4 Stream handling

4.2 Examples

Example: automatic stream selection

Example: automatic subtitles selection