Teaching Activity in SIPL


Project Title: Content insertion into H.264 coded video – supporting B-frames

Student: Yan Michalevsky

Supervisor: Tamar Shoham

Semester Registered: 2009 (Winter)

Submission Date: May 2009

SIPL Archive number: P 17-2-07


Abstract


H.264, the modern ITU standard for video coding, has become increasingly popular, offering solutions for many applications requiring video compression. In some of these applications there is a need to insert content into an already compressed video. This operation incurs high computational cost if a naïve approach is taken. Therefore, the concept of reusing encoding information, called “Guided Encoding”, was developed in SIPL.

In this project, we extend this technique and apply "Guided Encoding" to the Main Profile of H.264, supporting features such as bi-directional prediction, weighted prediction and Direct mode.

The result is a set of recommendations and algorithmic guidelines, as well as an implementation of the proposed solution within the H.264 reference software. Evaluation of our solution shows a significant run-time improvement compared to the naïve approach.

General scheme of an H.264 encoder

The following block diagram gives a general illustration of the H.264 encoding process:

Content Insertion – General Description

In previous work on MPEG-2 encoded video streams, content insertion was performed in the transform domain (the DCT domain, in the case of MPEG-2). In H.264, however, the transform computation itself takes negligible time; most of the encoding time is spent choosing the optimal coding mode for each macroblock. Therefore, content insertion is performed in the pixel domain, and the savings come from reusing the original mode decisions rather than from avoiding the transform.

The inserted logo (as we shall call the inserted content from now on) may be opaque, or transparent to a degree defined by a transparency factor specified as part of the encoder input. The transparency factor is a real value in the range [0, 1], where 1 indicates an opaque logo and 0 indicates total transparency.

(Based on [4])
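To make the role of the transparency factor concrete, the sketch below blends a logo into one plane of a decoded frame in the pixel domain. It assumes standard alpha blending, output = α·logo + (1 − α)·original, with α standing for the transparency factor; this formula is our assumption, and the exact blending used by the Negev Encoder may differ. The function name `blend_logo`, the placement parameters `x0`/`y0` and the representation of a plane as a list of lists of 8-bit samples are hypothetical.

```python
def blend_logo(frame, logo, x0, y0, alpha):
    """Return a copy of `frame` with `logo` alpha-blended in at column x0, row y0.

    frame, logo: 2-D lists of 8-bit samples for one plane (e.g. luma Y).
    alpha: transparency factor in [0, 1]; 1 = opaque logo, 0 = fully transparent.
    The logo is assumed to fit entirely inside the frame.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("transparency factor must lie in [0, 1]")
    out = [row[:] for row in frame]  # keep the decoded frame unchanged
    for dy, logo_row in enumerate(logo):
        for dx, logo_sample in enumerate(logo_row):
            y, x = y0 + dy, x0 + dx
            # Convex combination of logo and original sample (hypothetical blending rule).
            blended = alpha * logo_sample + (1.0 - alpha) * frame[y][x]
            out[y][x] = int(blended + 0.5)  # round to nearest (values are >= 0)
    return out
```

With a transparency factor of 0.5, a logo sample of 200 over an original sample of 100 yields 150. For 4:2:0 video, the same function would be applied to the U and V planes with the logo position and size halved.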

Below is an example of logo insertion into a frame with a transparency factor of 0.5:

Short Review of Guided Encoding

A thorough description of the original guided encoding concept can be found in [4]. Here we provide only a short review, omitting technical details that are outside the scope of this project.

Our method for efficiently encoding video with inserted content relies on reusing the information contained in the original encoded video. An encoded H.264 stream contains many useful parameters that result from the decisions made while encoding the original video: the reference frames chosen for each predicted frame, the encoding mode chosen for each macroblock, the motion vectors calculated for each block, the prediction direction and the weighted prediction alpha factor. To extract these parameters, we first run a modified version of the JVT decoder (an open-source reference implementation of an H.264 decoder by ITU) that writes picture-level and macroblock-level information to a file in textual format. We call this modified decoder the Negev Decoder, and its textual output containing the encoding parameters is called Negev Data.

To insert content into the video sequence, we operate on the YUV sequence decoded by the Negev Decoder. A modified version of the JVT encoder (which we call the Negev Encoder) takes the decoded video, the YUV sequence of the content to be inserted and the Negev Data file as input, and produces an encoded H.264 stream that combines the new content with the original.

The encoding parameters for the area of each frame that contains new content have to be fully recomputed, since the original parameters are no longer relevant. This area is called directly affected. We also have to recalculate the encoding parameters of indirectly affected macroblocks: those that use directly affected macroblocks as prediction references. Since the reference pixels have changed, the original prediction information is no longer valid for these macroblocks either.
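The distinction between directly and indirectly affected macroblocks can be sketched as follows. This is a simplified illustration, not the Negev Encoder's actual logic: the `MacroblockInfo` record is a hypothetical stand-in for the per-macroblock Negev Data, motion vectors are taken in whole pixels (H.264 actually uses quarter-sample precision), and the `margin` value is an assumed allowance for the sub-sample interpolation filter. A B-frame macroblock may carry two predictions (one per reference list), and both are checked.

```python
from dataclasses import dataclass, field

MB_SIZE = 16  # H.264 macroblock size in luma samples


@dataclass(frozen=True)
class Rect:
    """Axis-aligned pixel rectangle; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "Rect") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass
class MacroblockInfo:
    """Hypothetical per-macroblock record standing in for Negev Data."""
    mb_x: int
    mb_y: int
    # One (reference frame number, mv_x, mv_y) tuple per prediction;
    # motion vectors in whole pixels (a simplification). Empty for intra.
    predictions: list = field(default_factory=list)

    def rect(self) -> Rect:
        return Rect(self.mb_x * MB_SIZE, self.mb_y * MB_SIZE,
                    (self.mb_x + 1) * MB_SIZE, (self.mb_y + 1) * MB_SIZE)


def classify(mbs, frame_no, logo_rects, margin=3):
    """Return (directly, indirectly) affected macroblock coordinates of one frame.

    logo_rects maps a frame number to the Rect covered by the logo in that frame.
    margin (assumed) widens each reference block to cover interpolation taps.
    """
    direct, indirect = set(), set()
    logo = logo_rects.get(frame_no)
    for mb in mbs:
        key = (mb.mb_x, mb.mb_y)
        r = mb.rect()
        if logo is not None and r.overlaps(logo):
            direct.add(key)  # new content: parameters must be fully recomputed
            continue
        for ref, mv_x, mv_y in mb.predictions:
            ref_logo = logo_rects.get(ref)
            if ref_logo is None:
                continue  # reference frame carries no inserted content
            ref_block = Rect(r.x0 + mv_x - margin, r.y0 + mv_y - margin,
                             r.x1 + mv_x + margin, r.y1 + mv_y + margin)
            if ref_block.overlaps(ref_logo):
                indirect.add(key)  # predicts from pixels the logo changed
                break
    return direct, indirect
```

Testing the whole macroblock rectangle displaced by each motion vector is conservative: a partition's displaced block always lies inside the displaced macroblock, so the sketch may flag extra macroblocks but never misses one whose reference overlaps the logo. It does not handle intra-predicted neighbors of the logo area, nor changes propagating through chains of references, both of which a complete implementation may need to consider.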

This project was implemented based on the source code of JVT JM version 11.0.

 

Demonstration of guided encoding

 

Original video sequence:

 

 

Content to be inserted into video sequence:

 

 

Encoded video sequence with inserted content:

 

 

(Note: this demo uses animated GIF images and therefore does not reflect the actual quality of the encoding.)

 

Run-time improvement

 


 

The measurements show a significant run-time improvement for all tested modes. The average run-time reduction for each mode is:

Bi-predictive coding without Direct mode: 77%

Temporal Direct: 86%

Spatial Direct: 75%
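The run-time reductions above can be converted into speed-up factors: a reduction r in run-time corresponds to a speed-up of 1/(1 − r). The short sketch below performs this conversion for the three reported averages; the helper name `speedup_from_reduction` is ours. The resulting factors range from roughly 4x to 7x, in line with the summary's estimate of about 5 times faster.

```python
def speedup_from_reduction(reduction):
    """Speed-up factor implied by a fractional run-time reduction (e.g. 0.75 -> 4x)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Average run-time reductions reported above, per mode.
reductions = {
    "Bi-predictive, no Direct": 0.77,
    "Temporal Direct": 0.86,
    "Spatial Direct": 0.75,
}
for mode, r in reductions.items():
    print(f"{mode}: {r:.0%} reduction -> {speedup_from_reduction(r):.1f}x faster")
# Bi-predictive, no Direct: 77% reduction -> 4.3x faster
# Temporal Direct: 86% reduction -> 7.1x faster
# Spatial Direct: 75% reduction -> 4.0x faster
```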

 

Summary and conclusions

Applicability to Main Profile features

We have shown that the concept of guided encoding is applicable to Main Profile features, from simple ones such as weighted prediction to more complex ones such as Direct mode. Furthermore, the idea appears to be applicable to other profiles as well, and possibly to other codecs.

Improvement in run-time

By applying the guided encoding method to B-frames, we were able to speed up their encoding significantly: encoding is about 5 times faster than with the naïve approach (depending on the characteristics of the specific video). The largest improvement was achieved for Temporal Direct mode, with an average run-time reduction of 86%. Bi-directional prediction without Direct macroblocks achieved an average reduction of 77%, and Spatial Direct mode 75%.

No degradation in quality

We offer solutions that speed up encoding without any loss of quality, thereby meeting an important initial requirement of the project. For Spatial Direct encoding we also offer a solution that is sub-optimal in terms of quality, leaving the trade-off between run-time and video quality to the user.

Bit-rate

The improved run-time comes at the cost of a higher bit-rate. Note that this increase cannot actually be controlled, since the inserted content may contain much denser information than the original video. To obtain a predictable, controlled bit-rate, rate control functionality would have to be incorporated into the Negev Encoder.

 

References

[1] Iain E. G. Richardson, "H.264 and MPEG-4 Video Compression".

[2] Tamar Shoham, "Documentation of code for NEGEV within JVT JM11.0 code".

[3] Alexis Michael Tourapis, Feng Wu and Shipeng Li, "Direct Mode Coding for Bipredictive Slices in the H.264 Standard", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 1, January 2005.

[4] Dan Vardi and Yuval Bymel, "H.264 content insertion in the coding domain".

[5] Natan Goldfarb and Ori Rottenstreich, "Main Profile H.264 Content Insertion in the Coding Domain".

[6] Matan Ziv and Doron Brot, "Compressed content insertion into H.264 video".

[7] Asaf Tzabari and Itai Shpak, "Logo insertion into compressed video".

 

Related Documents

For more information:

Report (pdf)

PowerPoint presentation (pdf)

Project Poster (pdf)