Project Title: Content insertion into H.264 coded video – supporting B-frames
Students: Yan Michalevsky
Supervisor: Tamar Shoham
Semester Registered: 2009 (Winter)
Submission Date: May 2009
SIPL Archive number: P 17-2-07
Abstract
H.264, the modern ITU standard for video coding, has become increasingly
popular, offering solutions for many applications requiring video
compression. In some of these applications there is a need to insert
content into an already compressed video. This operation incurs high
computational cost if a naïve approach is taken. Therefore,
the concept of reusing encoding information, called “Guided
Encoding”, was developed in SIPL.
In this project, we extend this technique and
apply “Guided Encoding” to the Main Profile of H.264 to support
features such as Bi-directional prediction, weighted prediction and Direct
encoding mode.
The result is a set of recommendations and algorithmic
pointers, as well as an implementation of the proposed
solution within the H.264 reference software. Evaluation of
our solution shows a significant improvement in run-time compared to
the naïve approach.
General scheme of an H.264 encoder
The encoding process of H.264 coded video is illustrated in
general by the following block diagram:

In previous work on MPEG-2 encoded video streams, content
insertion was performed in the transform domain (DCT in the case of
MPEG-2). In H.264 the cost of the transform calculation itself is
negligible; most of the encoding time is spent choosing the optimal mode
for each macroblock. Therefore the content
insertion is done in the pixel domain.
The inserted logo (as we shall call the inserted
content from now on) may be opaque or transparent to a degree defined by
the transparency factor, which is specified as part
of the input to the encoder. The factor takes real values in the range [0, 1], where 1 indicates an
opaque logo and 0 indicates total transparency.

(Based on [4])
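The role of the transparency factor can be illustrated with a minimal sketch. It assumes a plain linear blend of logo and frame samples, which is consistent with the stated convention (1 is opaque, 0 is fully transparent) but is not taken from the project code. The names `blend_pixel`, `insert_logo`, `top` and `left` are hypothetical, and only a single 8-bit plane (for example luma) is handled.

```python
def blend_pixel(frame_value, logo_value, alpha):
    """Mix one 8-bit logo sample into one 8-bit frame sample.

    Assumed formula: out = alpha * logo + (1 - alpha) * frame.
    """
    mixed = alpha * logo_value + (1.0 - alpha) * frame_value
    # Round to the nearest integer and clamp to the 8-bit sample range.
    return max(0, min(255, int(mixed + 0.5)))


def insert_logo(frame, logo, top, left, alpha):
    """Return a copy of `frame` with `logo` blended in at row `top`, column `left`.

    `frame` and `logo` are lists of rows of 8-bit samples for one plane.
    alpha = 1.0 gives an opaque logo, alpha = 0.0 leaves the frame unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the range [0, 1]")
    out = [row[:] for row in frame]  # copy so the input frame is not modified
    for r, logo_row in enumerate(logo):
        for c, logo_value in enumerate(logo_row):
            out[top + r][left + c] = blend_pixel(
                frame[top + r][left + c], logo_value, alpha
            )
    return out
```

With a factor of 0.5, every sample in the logo area becomes the average of the logo sample and the original frame sample; the chroma planes would be treated the same way at their own resolution.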
Below is an example of logo insertion into a frame with
a transparency factor of 0.5:

Short Review of Guided Encoding
A thorough description of the original guided encoding
concept can be found in [4]. We will
provide a short review without going into too many technical details that
are out of the scope of this project.
Our method of performing efficient encoding of video along
with content insertion relies on reusing the information that exists in the
original encoded video. An encoded H.264 stream contains many useful
parameters which are the results of the decision making process performed
while encoding the original video. These include the reference frames chosen
for each predicted frame, the encoding modes chosen for each macroblock, the motion vectors calculated for each
block, the prediction direction and the weighted prediction alpha factor. To
extract these parameters we first run a modified version of the JVT decoder
(an open-source reference implementation of an H.264 decoder by ITU) that
writes picture level information and macroblock
level information to a file in a textual format. We call this version of
the JVT decoder the Negev Decoder. Its textual output, containing the encoding parameters, is
called Negev Data.
To insert content into the video sequence we operate on the
YUV sequence decoded by the Negev Decoder. We run a modified version of the
JVT encoder (that we shall call Negev Encoder) that takes the decoded
video, the YUV sequence of the content to be inserted and the Negev Data
file as its input and produces an encoded H.264 stream, combining the new
content with the original.
Encoding parameters for the area in each frame that contains
new content have to be fully recomputed, since the old parameters are no
longer relevant. This area is called directly affected. We also have to
recalculate encoding parameters for macroblocks
that are indirectly affected by the content insertion: they use
directly affected macroblocks as their prediction
references, so their reference information is no longer valid
for prediction either.
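The distinction between directly affected, indirectly affected and reusable macroblocks can be sketched as follows. This is a simplified illustration, not the Negev Encoder logic: it assumes one whole-pixel motion vector per macroblock, a single reference frame in which the logo occupies the same rectangle, and no intra prediction or sub-pixel interpolation. All function names and the label strings are hypothetical.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def overlaps(a, b):
    """True if two rectangles (left, top, right, bottom) share any pixel.

    The right and bottom coordinates are exclusive.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def mb_rect(mb_x, mb_y):
    """Pixel rectangle covered by the macroblock at grid position (mb_x, mb_y)."""
    left, top = mb_x * MB_SIZE, mb_y * MB_SIZE
    return (left, top, left + MB_SIZE, top + MB_SIZE)


def classify_macroblocks(mbs_wide, mbs_high, logo_rect, motion_vectors):
    """Label every macroblock as 'direct', 'indirect' or 'reuse'.

    `motion_vectors` maps (mb_x, mb_y) to a (dx, dy) displacement in whole
    pixels; a missing entry means the macroblock has no inter prediction.
    """
    grid = [(x, y) for y in range(mbs_high) for x in range(mbs_wide)]
    # Directly affected: the macroblock itself contains inserted content.
    direct = {pos for pos in grid if overlaps(mb_rect(*pos), logo_rect)}

    labels = {}
    for pos in grid:
        if pos in direct:
            labels[pos] = "direct"
            continue
        labels[pos] = "reuse"
        mv = motion_vectors.get(pos)
        if mv is not None:
            left, top, right, bottom = mb_rect(*pos)
            ref = (left + mv[0], top + mv[1], right + mv[0], bottom + mv[1])
            # Indirectly affected: the prediction reference touches a
            # directly affected macroblock in the reference frame.
            if any(overlaps(ref, mb_rect(*d)) for d in direct):
                labels[pos] = "indirect"
    return labels
```

Under these assumptions, only the macroblocks labelled `reuse` can keep the parameters read from the Negev Data file; the other two groups need their encoding decisions recomputed.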
This project was implemented based on the source code of JVT
JM ver. 11.
Demonstration of guided encoding
Original video sequence:

Content to be inserted into video sequence:

Encoded video sequence with inserted content:

(Note: this demo uses animated GIF images and therefore does not
represent the real quality of the encoding.)
Run-time improvement
The measurements show a significant improvement in run-time for
all tested modes. The average run-time reduction for each mode is
presented below:
Bi-predictive coding without Direct mode: 77%
Temporal Direct: 86%
Spatial Direct: 75%
Summary and conclusions:
Applicability to Main Profile features
The concept of "guided" encoding proved to be
applicable to Main Profile features, from simple ones such as
Weighted Prediction to more complex ones such as Direct
Mode. Furthermore, the idea seems to be applicable to other profiles as
well, and possibly to other codecs.
Improvement in run-time
By applying the "guided" encoding method to
B-frames we were able to speed up their encoding significantly. Encoding runs
about 5 times faster (subject to specific video characteristics) than with
the naïve approach. The most significant improvement was achieved
for Temporal Direct, with an average run-time reduction of 86%. For
Bi-directional prediction without Direct macroblocks
the average reduction was 77%, and for Spatial Direct mode it was 75%.
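The relation between the reported run-time reductions and the "about 5 times faster" figure can be checked with a short calculation. The sketch assumes that a reduction of r% means the guided encoder takes (100 - r)% of the naïve run-time, so the speed-up factor is 1 / (1 - r/100); the function name and the dictionary are illustrative only.

```python
def speedup_from_reduction(reduction_percent):
    """Convert a run-time reduction in percent into a speed-up factor.

    Assumes reduction = 100 * (1 - t_guided / t_naive).
    """
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Average reductions reported for the three tested modes.
reported = {
    "Bi-predictive without Direct mode": 77.0,
    "Temporal Direct": 86.0,
    "Spatial Direct": 75.0,
}

for mode, reduction in reported.items():
    factor = speedup_from_reduction(reduction)
    print(f"{mode}: {reduction:.0f}% less time, about {factor:.1f}x faster")

# Speed-up implied by the mean of the three reductions.
mean_reduction = sum(reported.values()) / len(reported)
print(f"Mean reduction {mean_reduction:.1f}%: "
      f"about {speedup_from_reduction(mean_reduction):.1f}x faster")
```

Under this assumption the three modes correspond to roughly 4.3x, 7.1x and 4.0x, and the mean reduction of about 79% corresponds to roughly 4.8x, in line with the figure quoted above.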
No degradation in quality
We managed to offer solutions that speed up encoding without
loss in quality, and in that sense met an important initial requirement of
the project. We also offered a solution for Spatial Direct encoding that is
sub-optimal in terms of quality, leaving the trade-off between run-time and
video quality to the user.
Bit-rate
The much better run-time comes at the cost of
a higher bit-rate. Note that the increase in
bit-rate is not actually controllable, since the information in the inserted
content may well be much denser than in the original video. To have a
predictable and controlled bit-rate, rate control functionality has to be
incorporated into the Negev Encoder.
References
[1] "H.264 and MPEG-4 Video Compression", Iain E. G. Richardson.
[2] "Documentation of code for NEGEV within JVT JM11.0 code", Tamar Shoham.
[3] "Direct Mode Coding for Bipredictive Slices in the H.264 Standard", Alexis Michael Tourapis, Feng Wu and Shipeng Li, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 1, January 2005.
[4] "H.264 content insertion in the coding domain", Dan Vardi and Yuval Bymel.
[5] "Main Profile H.264 Content Insertion in the Coding Domain", Natan Goldfarb and Ori Rottenstreich.
[6] "Compressed content insertion into H.264 video", Matan Ziv and Doron Brot.
[7] "Logo insertion into compressed video", Asaf Tzabari and Itai Shpak.
Related Documents
For more information:
Report (pdf)
Power Point presentation (pdf)
Project Poster (pdf)