
Encoding raw images to Ogg Theora video using libavcodec

In an earlier blog post we learned how to decode JPEG images using libavcodec. This is the second part of that post. In it we will learn how to encode the decoded (raw) images to Theora and write them to an Ogg video file. At the end of the first part we saved our raw image in the raw_data variable and its length in the raw_data_size variable. Let's assume that we packaged all our decoding code in one function called "decode_jpeg_image" with the following signature:
int decode_jpeg_image(char *filename,int file_name_size,uint8_t *raw_data,int *raw_data_size)
filename = name of jpeg file to decode
file_name_size = length of jpeg file's name
raw_data = contains decoded raw image on return
raw_data_size = contains length of raw_data on return

Now let's start working on how to encode the image in raw_data to Theora and write it to an Ogg video file.

Finding Theora encoder
We first have to find the encoder for Theora, which is represented by an AVCodec structure.
Here is how we find the encoder using Theora's codec id:
AVCodec *theoraEncoder = avcodec_find_encoder(AV_CODEC_ID_THEORA);
Initializing AVCodecContext and getting PixelFormat supported by encoder
The next step is to allocate an AVCodecContext and to check which pixel formats the encoder supports. The pix_fmts member of AVCodec contains an array of pixel formats supported by the encoder. Let's just take the very first pixel format this encoder supports:
enum AVPixelFormat pixFormat = theoraEncoder->pix_fmts[0];
Now we will allocate an AVCodecContext using the AVCodec we found in the first step. An AVCodecContext is allocated using the function avcodec_alloc_context3. Let's do it:
AVCodecContext *encoderContext=avcodec_alloc_context3(theoraEncoder);
Configuring AVCodecContext
We will now configure our encoderContext to suit our needs. There are many fields that can be set to many different values. The following configuration worked best for me for this project:
encoderContext->bit_rate=(width*height); //width=width of decoded image,height=height of decoded image
encoderContext->bit_rate_tolerance=0;
encoderContext->width=width;
encoderContext->height=height;
encoderContext->time_base.den=frame_rate;
encoderContext->time_base.num=1;
encoderContext->pix_fmt=pixFormat;
encoderContext->gop_size=1;
encoderContext->keyint_min=1;
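A quick aside on what the time_base fields above mean (a plain-C sanity check, not a libav function): a frame's presentation time in seconds is its pts multiplied by the time base, so with time_base = 1/frame_rate and pts increasing by one per frame, frame n is displayed at n/frame_rate seconds.

```c
/* Presentation time in seconds for a given pts under a time base
   of num/den, e.g. num=1, den=frame_rate as configured above. */
static double pts_to_seconds(long pts, int num, int den)
{
    return (double)pts * num / den;
}
```

So at 25 frames per second, the frame with pts 50 is shown exactly 2 seconds in.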
Actually opening the encoder
So far we have only found the encoder, allocated an AVCodecContext, and configured it with our values. The next step is to actually open the encoder so it is initialized and uses our configuration. Codecs are opened using the avcodec_open2 function:
int retVal=avcodec_open2(encoderContext,theoraEncoder,NULL); //a negative retVal means the encoder failed to open
Allocating Output Format
So we have opened the encoder with our configuration. Now we have to handle writing the encoded output to an Ogg file. We first allocate an AVFormatContext using avformat_alloc_output_context2. We can pass an AVOutputFormat, a format name, or a filename to this function so it can determine the output type. For this post we will use the format name, since we know in advance that it is "ogg":
AVOutputFormat *outFormat=NULL;
AVFormatContext *outContext;
avformat_alloc_output_context2(&outContext,NULL,"ogg",NULL);
//set output format returned in outContext to outFormat
outFormat=outContext->oformat;
Adding video stream to newly allocated AVFormatContext
Our outContext has no streams added to it yet. Adding a stream is necessary because streams are what carry the encoded data in the container. It is very easy to add a new stream for any AVCodec to an AVFormatContext using the avformat_new_stream function:
AVStream *videoStream = avformat_new_stream(outContext,theoraEncoder);
videoStream->codec = encoderContext;
videoStream->pts.val = 0;
Allocating Input/Output Context to write to our destination
AVIOContext is a structure we can assign to the pb member of outContext so that outContext writes its encoded data to our callback function, since we have not given outContext any output destination. We allocate the AVIOContext with a large buffer using the avio_alloc_context function:
int ioBuffSize=7200000*30; //a large value that worked for me
uint8_t *ioBuff=(uint8_t *)av_malloc(ioBuffSize);
//data_tag is an arbitrary pointer of ours (the opaque argument); it is passed back as the first argument of write_buf
AVIOContext *ioContext=avio_alloc_context(ioBuff,ioBuffSize,1,data_tag,NULL,write_buf,NULL);
ioContext->seekable=0; //seeking is not needed for ogg, though it is for mp4
outContext->pb=ioContext;
Implementing the write_buf function to write data to our ogg file:
static int write_buf(void *data_tag,uint8_t *buff,int buff_size){
    //buff contains the encoded data
    //buff_size contains the length of valid data in buff
    FILE *fp=fopen("encoded.ogg","ab"); //append in binary mode
    fwrite(buff,1,buff_size,fp);
    fclose(fp);
    return buff_size; //report how many bytes were consumed
}
The muxer will call write_buf every time it has data to write (an encoded frame, the header, or the trailer). data_tag is the opaque pointer you passed to avio_alloc_context, which you can use to access your own state inside write_buf.
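To make the opaque pointer concrete, here is a small plain-C sketch (write_buf2 and the FILE*-as-opaque pattern are my own illustration, not from the original code): instead of reopening the file on every call, you can keep an already-opened FILE* in the opaque pointer and have the callback use it.

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch: 'opaque' is whatever pointer was passed as the fourth
   argument of avio_alloc_context (called data_tag above); libav
   hands it back to the callback untouched. Here we keep an open
   FILE* there, avoiding fopen/fclose on every call. */
static int write_buf2(void *opaque, uint8_t *buff, int buff_size)
{
    FILE *fp = (FILE *)opaque;
    size_t written = fwrite(buff, 1, (size_t)buff_size, fp);
    return (int)written; /* report the number of bytes consumed */
}
```

You would then pass the opened FILE* as the opaque argument, e.g. avio_alloc_context(ioBuff, ioBuffSize, 1, fp, NULL, write_buf2, NULL);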

Writing header of file before starting encoding process
Most media files have a header containing information about the encoded data they carry, and Ogg files are no exception. There is a very simple function to write the header through our output context. Here is how we write it:
avformat_write_header(outContext,NULL);
Allocating AVFrame to hold our raw data
We have our raw data bytes saved in the raw_data variable. But libav functions expect AVFrame and AVPicture data types for picture data, whether raw or encoded. So we will allocate an AVPicture to hold our raw (decoded) picture, and we will pass this AVPicture to the encoder to be encoded into a Theora frame.
AVPicture *rawPic=(AVPicture *)av_frame_alloc(); //allocate a full AVFrame so the pts field we set later actually exists
After allocating the AVPicture we allocate its buffers for our dimensions and pixel format. Let's assume we use a YUV format for our raw picture:
avpicture_alloc(rawPic,AV_PIX_FMT_YUV420P,width,height);
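As a quick sanity check on the size of the buffer allocated above (the helper below is my own illustration, not a libav function): in YUV 4:2:0 the Y plane is full resolution while the U and V planes are subsampled by two in each direction, so the raw picture takes width*height*3/2 bytes for even dimensions.

```c
/* Compute the raw buffer size for a tightly packed YUV 4:2:0
   picture: one full-resolution luma plane plus two chroma planes
   at quarter resolution each. */
static int yuv420p_buffer_size(int width, int height)
{
    int y_size  = width * height;             /* Y plane */
    int uv_size = (width / 2) * (height / 2); /* one chroma plane */
    return y_size + 2 * uv_size;              /* Y + U + V */
}
```

Note that avpicture_alloc may pad line sizes for alignment, so the real allocation can be slightly larger; this is the minimum the pixel data itself needs.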
Encoding raw data to THEORA frame
Next step is to pass raw data to encoder and get it encoded into a THEORA frame. We first need to allocate an AVPacket that will hold the encoded THEORA frame returned by encoder.
AVPacket encodedPic;
av_init_packet(&encodedPic);
encodedPic.data=NULL;
encodedPic.size=0;
encodedPic.stream_index=videoStream->index;
Before calling the encoding function we need to fill the AVPicture (rawPic) we allocated with our raw data, like this:
av_image_fill_arrays(rawPic->data,rawPic->linesize,raw_data,AV_PIX_FMT_YUV420P,width,height,1);
Next we set the PTS value of rawPic by first casting it to an AVFrame and then setting PTS on it:
AVFrame *castedRawPic=(AVFrame *)rawPic;
castedRawPic->pts=frame_count+1; //frame_count can be an integer starting at 0, incremented once per frame
Now we will call the actual encoding function, avcodec_encode_video2, which takes our rawPic and returns the encoded frame in encodedPic. This function takes an AVFrame as input rather than an AVPicture; luckily we have already cast rawPic to an AVFrame (castedRawPic). So let's just call it:
int gotit;
avcodec_encode_video2(encoderContext,&encodedPic,castedRawPic,&gotit);
If the call succeeds and gotit is non-zero, you have encoded data in the encodedPic variable.

Writing encoded data to OGG video file
We have successfully encoded our raw (decoded) image to a Theora frame, but we still have to save it in the Ogg file. Instead of using fwrite or anything else, we just use the libav function av_interleaved_write_frame and pass our encoded picture to it. This function handles all the interleaving, and when data is ready to be written it calls the write_buf function we declared earlier, because we configured our context to invoke that callback whenever output is available. Before calling av_interleaved_write_frame we first have to rescale the PTS values of our AVPacket (encodedPic) from the encoder's time base to the stream's time base. It is very simple:
av_packet_rescale_ts(&encodedPic, encoderContext->time_base, videoStream->time_base); //the time bases are passed by value, not by pointer
encodedPic.stream_index=videoStream->index;
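To make the conversion concrete, here is a plain-C sketch of the arithmetic av_packet_rescale_ts performs on each timestamp (the stream time base 1/12800 below is just an example value; the real av_rescale_q also handles rounding modes and overflow):

```c
#include <stdint.h>

/* Convert a tick count from one time base (src_num/src_den) to
   another (dst_num/dst_den):
   seconds   = ts * src_num / src_den
   dst ticks = seconds * dst_den / dst_num
   so ts_dst = ts * src_num * dst_den / (src_den * dst_num). */
static int64_t rescale_ts(int64_t ts, int src_num, int src_den,
                          int dst_num, int dst_den)
{
    return ts * src_num * dst_den / ((int64_t)src_den * dst_num);
}
```

For example, pts 3 in an encoder time base of 1/25 becomes 1536 in a hypothetical stream time base of 1/12800: both represent the same 0.12 seconds.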
Now it is time to call our last function, which outputs this encoded frame to the ogg file with proper interleaving:
av_interleaved_write_frame(outContext,&encodedPic);
To make an ogg video file from multiple jpeg images, call decode_jpeg_image in a loop and pass the returned data to the encoder functions. After the last frame, call av_write_trailer(outContext) so the muxer can finish the file.

Read Part1 to understand how to decode jpeg images
