I got a chance to work on a video encoding application that decodes a series of JPEG files and converts them into an Ogg Theora video file. I used the well-known libavcodec library that ships with FFmpeg.
I decided to write blog posts explaining how I decode JPEG images and convert them into an Ogg video file. This is the first part, in which I will explain how to decode JPEG images using libavcodec. To learn how to write the decoded images to an Ogg video file, please read
http://random-stuff-mine.blogspot.com/2017/07/encoding-raw-images-to-ogg-theora-video.html
Before reading this blog post you should be familiar with setting up and using libavcodec. I highly recommend this tutorial to learn the basics:
http://www.ffmpeg.org/doxygen/0.6/api-example_8c-source.html
Allocating input format context
We will first allocate an input format context for reading the file. We will use the avformat_open_input function, which allocates the AVFormatContext structure passed to it and detects the input type from the filename.
Here is how we allocate the AVFormatContext structure.
AVFormatContext *iFormatContext = NULL; //must be NULL before avformat_open_input so the function allocates it
if (avformat_open_input(&iFormatContext, filename, NULL, NULL) != 0)
{
    printf("Error in opening input file %s", filename);
    return ERROR_CODE;
}
Finding stream information
The next step is to find the AVStream objects in our input context (iFormatContext). For this libavformat provides the function avformat_find_stream_info. It takes a pointer to the AVFormatContext and an AVDictionary for extra options, which for now we will ignore and pass as NULL.
if (avformat_find_stream_info(iFormatContext, NULL) < 0)
{
    printf("Error in finding stream info");
    avformat_close_input(&iFormatContext); //release AVFormatContext memory
    return ERROR_CODE;
}
Finding video stream from number of streams
After avformat_find_stream_info returns successfully, the nb_streams member of AVFormatContext contains the number of streams found and the streams member contains the streams themselves. Our next task is to find the video stream among them. Since we opened a JPEG file it will contain only one stream, the video one, but to keep our code adaptable to other formats we will loop through the streams member and find the video stream.
int videoStreamIndex = -1;
for (unsigned int a = 0; a < iFormatContext->nb_streams; a++)
{
    if (iFormatContext->streams[a]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
    {
        videoStreamIndex = a;
        break;
    }
}
if (videoStreamIndex == -1)
{
    printf("Couldn't find video stream");
    avformat_close_input(&iFormatContext);
    return ERROR_CODE;
}
Finding decoder for video stream
videoStreamIndex now contains the index of the stream we are interested in. Next we will find a decoder to decode the frames of this stream.
AVCodecContext *pCodecCtx = iFormatContext->streams[videoStreamIndex]->codec;
AVCodec *pCodec = avcodec_find_decoder(pCodecCtx->codec_id);
if (pCodec == NULL)
{
    printf("Cannot find decoder");
    avformat_close_input(&iFormatContext);
    return ERROR_CODE;
}
if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) //avcodec_open is deprecated, use avcodec_open2
{
    printf("Cannot open decoder");
    avformat_close_input(&iFormatContext);
    return ERROR_CODE;
}
Reading video frame
Now we will read a frame from our input using the av_read_frame function. It takes a pointer to the AVFormatContext and a pointer to an AVPacket. On success the AVPacket contains the encoded video frame. Before calling this function we have to initialize the AVPacket using the av_init_packet function.
AVPacket encodedPacket;
av_init_packet(&encodedPacket);
encodedPacket.data = NULL;
encodedPacket.size = 0;
//now read a frame into this AVPacket
if (av_read_frame(iFormatContext, &encodedPacket) < 0)
{
    printf("Cannot read frame");
    av_free_packet(&encodedPacket);
    avcodec_close(pCodecCtx);
    avformat_close_input(&iFormatContext);
    return ERROR_CODE;
}
Decoding the encoded video frame
To decode the video frame we will use the avcodec_decode_video2 function. It takes a pointer to the AVCodecContext, a pointer to an AVFrame, a pointer to an int and a pointer to the AVPacket. On success the AVFrame passed in contains the decoded video frame. We have to allocate the AVFrame using the avcodec_alloc_frame function (av_frame_alloc in newer versions of the library).
int frameFinished = 0;
AVFrame *decodedFrame = avcodec_alloc_frame();
avcodec_decode_video2(pCodecCtx, decodedFrame, &frameFinished, &encodedPacket);
//frameFinished on return contains 1 or 0.
//If it is 1, a whole frame was decoded.
//If it is 0, no frame was decoded.
Converting the pixel format
We mostly use RGB data to represent images, but encoders and decoders work with other pixel formats too, such as YUV. The image returned by the JPEG decoder is in a YUV format, so to convert it into RGB we will use avpicture_alloc from libavcodec together with sws_getContext and sws_scale from libswscale.
AVPicture destPic;
//we will convert to RGB32
PixelFormat destFormat = PIX_FMT_RGB32;
avpicture_alloc(&destPic, destFormat, decodedFrame->width, decodedFrame->height);
SwsContext *ctxt = sws_getContext(decodedFrame->width, decodedFrame->height,
    (PixelFormat)decodedFrame->format, decodedFrame->width, decodedFrame->height,
    destFormat, SWS_BILINEAR, NULL, NULL, NULL);
if (ctxt == NULL)
{
    printf("Error while calling sws_getContext");
    return ERROR_CODE;
}
sws_scale(ctxt, decodedFrame->data, decodedFrame->linesize, 0, decodedFrame->height, destPic.data, destPic.linesize);
sws_freeContext(ctxt);
Copying decoded frame to buffer
We now have our decoded picture in RGB format in the destPic variable. But we cannot access destPic.data directly as tightly packed RGB32 data, because each row may be padded out to destPic.linesize[0] bytes. So we will copy the data in destPic to our own byte array for easy use and processing.
uint8_t *raw_data;
int raw_data_size = decodedFrame->height * decodedFrame->width * 4; //each RGB32 pixel is 4 bytes
raw_data = (uint8_t *)malloc(raw_data_size);
av_image_copy_to_buffer(raw_data, raw_data_size, (const uint8_t * const *)destPic.data, destPic.linesize,
    PIX_FMT_RGB32, decodedFrame->width, decodedFrame->height, 1);
Now our raw_data byte array contains each pixel of the decoded image.