Xvid MP4 AVI and SMI Files

Another type of MPEG4 video with an SMI subtitle file.

Before we dive head first into this file, it’s worth mentioning that, due to the huge array of possibilities when trying to decipher an encoded piece of video, it may not be necessary to do everything, every time! You may be able to do what you want quickly and easily. There may be other times, though, when something doesn’t look quite right and, in order to identify what’s going on, you need to do a bit of research.

The disk contains two files with the same name but different extensions: one an .avi and the other a .smi. Dropping the .avi directly into SMPlayer results in the video playing with a subtitle overlay displaying the date / time information. The .smi is the subtitle file that holds this timecode.

Xvid Mp4 file playing in SMplayer with associated subtitle


Although it will play, there appears to be some black video at the start. Reviewing the file in Virtualdub shows a series of blank frames at the beginning. If I have to deal with the video, do I have to deal with these as well? To figure out what is going on, we need to take a closer look.


We can now see that, although the file extension and structure are the same as those files discussed here, the MPEG4 format is different.

To transcode the video and subtitle together, it is possible to use either ffmpeg or virtualdub.

For Virtualdub, you need either Subtitle Workshop or Subtitle Edit.

I have started to use Subtitle Edit more as it does not require installing. Once you have loaded your .smi subtitle into Subtitle Edit, simply save it in .ssa format.

SubTitle Edit


From reviewing the subtitle, we can see that each time display should last exactly 1 second.

Load the .avi file into Virtualdub then add the subtitler filter. This filter is already in the virtualdub included in my Software Pack. If you need to install it yourself, you can find it HERE. This filter will then ask you for your newly created .ssa file. Once this is in, your subtitle will appear over the video. You can then clip and output the video for any use and the date / time information will be on the top of the video.

The ffmpeg route is usually quicker as there is no need to convert the .smi file to .ssa format first. This is what I chose in this instance.
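
As a rough illustration of the ffmpeg route, the sketch below builds one possible burn-in command using ffmpeg's `subtitles` filter. This is an assumption on my part about how such a command could look, not necessarily the exact command used for this file. It also assumes an ffmpeg build with libass and that the filenames contain no characters that need escaping inside a filter string. The helper name `burn_in_command` and the filenames are hypothetical.

```python
def burn_in_command(video_path, subtitle_path, output_path):
    """Build an ffmpeg argument list that renders a subtitle file onto the picture."""
    return [
        "ffmpeg",
        "-i", video_path,
        # 'subtitles' is ffmpeg's libass-based filter for hard-burning text subtitles.
        # Assumption: the build can read the subtitle format directly.
        "-vf", f"subtitles={subtitle_path}",
        output_path,
    ]


# Hypothetical filenames, for illustration only.
cmd = burn_in_command("inputfile.avi", "inputfile.smi", "newfile.avi")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.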

My resulting video had the date / time code on, however I was missing 1 frame. The original was 22500 and my new video with timecode was 22499!

What’s happened?

It all comes down to how the original Mpeg4 has been encoded. It’s plainly obvious from the uniformity of the file, and the xvid encoding, that this is a transcode from what was originally recorded by the DVR. Understanding the encoding reveals the answer.

GSpot’s visualisation of the GOP structure


Upon examining the GOP structure, a number of issues present themselves. Of the 8988 coded frames, there is a pattern of coding within each GOP. The small blue lines in a frame indicate the presence of an N-VOP. These are duplicate markers. By moving through the GOPs, it is possible to see that the sequence changes, although the size of the GOP stays the same.

We need to delve deeper!

The usual FFprobe reports were obtained using:

ffprobe -show_streams -count_frames -pretty inputfile.avi > inputfile.txt
ffprobe -show_frames -print_format xml inputfile.avi > inputfile.xml

The first one confirmed some of the data reported by GSpot and the second gave a detailed table of every frame.
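
The per-frame XML report is easier to work with once it is reduced to a list of timestamps. Below is a minimal sketch of that step, assuming each video `<frame>` element carries a `pict_type` attribute and a timestamp attribute named `pts_time`, `pkt_pts_time` or `best_effort_timestamp_time` (the name varies between ffprobe versions). The sample XML is made up purely for illustration and is not taken from the real report.

```python
import xml.etree.ElementTree as ET


def video_frame_times(root):
    """Return a list of (pict_type, time_in_seconds) for each video frame."""
    frames = []
    for frame in root.iter("frame"):
        if frame.get("media_type") != "video":
            continue
        # Assumption: one of these timestamp attributes is present.
        t = (frame.get("pts_time")
             or frame.get("pkt_pts_time")
             or frame.get("best_effort_timestamp_time"))
        if t is None:
            continue
        frames.append((frame.get("pict_type"), float(t)))
    return frames


def gaps_ms(frames):
    """Return the gap between consecutive frames, rounded to whole milliseconds."""
    times = [t for _, t in frames]
    return [round((b - a) * 1000) for a, b in zip(times, times[1:])]


# Made-up sample, shaped like ffprobe's XML output.
sample = """<ffprobe><frames>
<frame media_type="video" pict_type="I" pkt_pts_time="0.000000"/>
<frame media_type="video" pict_type="P" pkt_pts_time="0.120000"/>
<frame media_type="video" pict_type="P" pkt_pts_time="0.240000"/>
<frame media_type="video" pict_type="P" pkt_pts_time="0.280000"/>
</frames></ffprobe>"""

frames = video_frame_times(ET.fromstring(sample))
print(gaps_ms(frames))
```

For the real report, `ET.parse("inputfile.xml").getroot()` can be passed in place of the parsed sample.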

I also then created an Avinaptic report.

This gave me something to start with:

[ Video bitstream ]

Bitstream type: MPEG-4 Part 2
Note: it seems like the video doesn’t start with a keyframe
Packed bitstream: No
QPel: No
Interlaced: No
Aspect ratio: Square pixels
Quant type: H.263
Total frames: 22,500
Drop/delay frames: 11
Corrupt frames: 0

I-VOPs: 899 ( 3.996 %) #
P-VOPs: 8089 ( 35.951 %) #######
B-VOPs: 0 ( 0.000 %)
S-VOPs: 0 ( 0.000 %)
N-VOPs: 13501 ( 60.004 %) ############

It detected 11 Drop / Delay frames and there were the 11 frames at the start that were blank. It’s also handy to note that it does not start with a keyframe.
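
The numbers in the Avinaptic report can be cross-checked with some simple arithmetic. This is just a sanity check using the figures quoted above; the variable names are my own.

```python
total_frames = 22500
drop_delay = 11
vops = {"I": 899, "P": 8089, "B": 0, "S": 0, "N": 13501}

# Frames that actually carry picture data (I-VOPs and P-VOPs).
coded = vops["I"] + vops["P"]

# Every frame in the AVI should be either a VOP or a drop/delay frame.
accounted = sum(vops.values()) + drop_delay

print("coded frames:", coded)
print("accounted for:", accounted, "of", total_frames)
for kind, count in vops.items():
    print(f"{kind}-VOPs: {100 * count / total_frames:.3f} %")
```

The VOP counts plus the 11 drop/delay frames account for all 22,500 frames, and the I-VOPs and P-VOPs together give the 8988 coded frames seen in GSpot.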

Comparing this to the FFprobe and GSpot reports gives us the information we need.

A visualisation of how each frame corresponds to each other. The colouring relates to the Gspot output above.


The full ffprobe report details that each frame should have a duration of 40ms (0.04 seconds). When reviewing the presentation timestamps, however, the majority of the gaps between frames are longer than 40ms.


Initially, it starts with an encoding pattern of:

  • Coded Picture 1 – 120ms
  • Coded Picture 2 – 120ms
  • Coded Picture 3 – 120ms
  • Coded Picture 4 – 120ms
  • Coded Picture 5 – 120ms
  • Coded Picture 6 – 80ms
  • Coded Picture 7 – 40ms
  • Coded Picture 8 – 120ms
  • Coded Picture 9 – 40ms
  • Coded Picture 10 – 120ms

Total = 1 second
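
The pattern above can be checked against the nominal 40ms frame duration. The sketch below assumes that any display time beyond 40ms for a coded picture is filled by duplicate (N-VOP) frames, which is my reading of the reports rather than something they state outright.

```python
FRAME_MS = 40  # nominal frame duration from the ffprobe report

# Display time of each of the 10 coded pictures in the first pattern.
pattern_ms = [120, 120, 120, 120, 120, 80, 40, 120, 40, 120]

total_ms = sum(pattern_ms)
slots = total_ms // FRAME_MS  # 40ms frame slots in the pattern

# Assumption: a picture shown for n slots is followed by n - 1 duplicates.
duplicates = sum(d // FRAME_MS - 1 for d in pattern_ms)

print(total_ms, "ms across", slots, "frame slots")
print(len(pattern_ms), "coded pictures,", duplicates, "duplicates")
print(f"duplicate share: {100 * duplicates / slots:.0f} %")
```

Ten coded pictures and 15 duplicates fill the 25 slots in each second, which sits comfortably alongside the 60% of N-VOPs reported by Avinaptic.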

The pattern changes a number of times through the encoding but, by examining the last GOP, I can see that the pattern should end halfway through a GOP with a P frame and then a duplicate.

The missing frame is the last duplicate. As I used ffmpeg to transcode a new file with the subtitle hard burnt into the video, it ignored the AVI header detailing 22500 frames and purely dealt with the MPEG4 stream. As a result, it gets to the last true frame and then stops.

I can now go ahead and use my new video with one frame less as I can answer the question of where it is, and importantly, why it’s missing!

The analysis of this file type raises again the issue of frame timing and motion.

From analysis of the timecode held in the subtitle file, it can be seen that there are 903 separate 1 second entries. This makes the timecode cover a period of 15mins and 3 secs. As the footage is 15mins in length, what is with the extra 3 seconds? Should the video be that length and the transcoding has ‘evened’ it out? Should the footage start 3 seconds later and then end when the subtitle ends? The non-uniform presentation order is also important if you are observing motion.
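
The size of the mismatch is easy to confirm. This is a small worked check using the figures already quoted (903 subtitle entries, 22500 frames, 40ms per frame); the variable names are my own.

```python
subtitle_entries = 903   # 1 second entries in the .smi file
frame_count = 22500      # frame count from the AVI header
frame_ms = 40            # nominal frame duration

subtitle_s = subtitle_entries * 1           # each entry lasts 1 second
video_s = frame_count * frame_ms // 1000    # nominal video length in seconds

mins, secs = divmod(subtitle_s, 60)
print(f"subtitle covers {mins}m {secs}s")
print(f"video covers {video_s // 60}m {video_s % 60}s")
print(f"difference: {subtitle_s - video_s}s")
```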

The blank frames are important to note as some transcoding programs will produce a fault if the first frame is not a Keyframe. The initial blank frames appear to have been used to ‘pad’ the video out so the first subtitle can be the 0-1 second. This is due to the first keyframe being in the middle of a second.

The usual ffmpeg command of:

ffmpeg -i inputfile.avi -vcodec rawvideo -vsync drop -fflags genpts newfile.avi

…results in a clean, image only, avi consisting of true frames only – No Duplicates, and no blank frames at the start!


As we have our ffprobe report of the original, it is possible to create a new clean file ourselves. The true duration from start to finish, based upon the data, is 899.6 seconds. Dividing the 8988 coded frames by this duration gives a rate of 9.99 FPS.
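
The frame rate figure can be reproduced from the numbers in the reports. This is a minimal worked check, assuming the true frames are the 899 I-VOPs plus the 8089 P-VOPs counted by Avinaptic.

```python
coded_frames = 899 + 8089   # I-VOPs + P-VOPs
true_duration_s = 899.6     # start to finish, from the ffprobe data

fps = coded_frames / true_duration_s
average_frame_ms = 1000 * true_duration_s / coded_frames

print(f"{fps:.2f} FPS")
print(f"{average_frame_ms:.1f} ms per true frame on average")
```

Bear in mind this is only an average; as the pattern earlier shows, the individual frames are not evenly spaced.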

To wrap up,

  • We can transcode and add a subtitle in Virtualdub.
  • We can transcode and add a subtitle in ffmpeg.
  • We can analyse the video to understand its timing structure.
  • We can remove the duplicates.
  • We can cross reference real frame numbers to the subtitle times.
  • We can assess the reliability of the times given, and the motion observed, by identifying the encoded frame durations.

This has been quite a difficult one to explain but, as always, I hope some of this helps in your understanding of the video.

By Spreadys Posted in EEPIP
