While most video sources were originally created for film, the footage you will encounter is formatted to broadcast standards. To manipulate the footage well, you need to understand the terms that describe those standards. To show their basic structure, we will examine standard definition (SD) footage from a DVD that uses the NTSC (National Television System Committee) broadcast standard. Here is a description of video formatted to this standard:
Resolution: 720×480
Aspect Ratio: 4:3
Color Space: YUV
Color Matrix: Rec.601
Frame Rate: 29.97
Frame Type: Interlaced
Compression Format: MPEG-2
To understand everything about this description we will break down each part and explain its significance.
The resolution of a video gives you the dimensions and total pixel count of every frame (or image) displayed. A video that is 720×480 is 720 pixels wide by 480 pixels tall and has a total pixel count of 345600.
The aspect ratio is simply the ratio of the width to the height, reduced to lowest terms. All NTSC standard definition video is transmitted at a resolution of 720×480, which has an aspect ratio of 3:2, but almost all video you will encounter is intended to be displayed as fullscreen (4:3) or widescreen (16:9). Since the stored frame does not have the intended shape, the video carries additional information (a flag) that tells the playback device what the correct aspect ratio is. The playback device uses this flag to resize and display the images the way they were intended to be viewed. When editing standard definition video it is important to know the proper aspect ratio so you can avoid distortion when resizing. When resizing, fullscreen sources are changed to 640×480, while widescreen sources are typically resized to either 856×480 or 848×480. The exact widescreen resolution would be 853.33×480, but resolutions are required to be divisible by either 4 or 8, so these two resolutions are the closest practical options.
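The arithmetic above can be checked with a short sketch. The function names are hypothetical and not part of any editing tool. Rounding to the nearest multiple of 8 gives 856, and rounding to the nearest multiple of 16 gives 848.

```python
from fractions import Fraction


def storage_aspect(width, height):
    """Reduce width:height to lowest terms."""
    ratio = Fraction(width, height)
    return ratio.numerator, ratio.denominator


def display_width(height, aspect_w, aspect_h, multiple=8):
    """Width for a display aspect ratio, rounded to the nearest multiple."""
    exact = Fraction(height * aspect_w, aspect_h)
    return round(exact / multiple) * multiple


print(storage_aspect(720, 480))        # (3, 2)
print(display_width(480, 4, 3))        # 640
print(display_width(480, 16, 9))       # 856
print(display_width(480, 16, 9, 16))   # 848
```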
Color space deals with how the color information is stored and later decoded. While computers display the RGB (Red Green Blue) color space, industry standard video is distributed and broadcast using the YUV color space. Both color spaces consist of 3 layers that are combined to give you the color images you see. The YUV color space was developed during the move from black & white to color broadcasts. The Y layer is the luma (or brightness) layer, which appears black & white when displayed by itself, and the U and V layers contain the color information. This type of signal allowed black & white TVs to read just the Y layer and discard the rest, while color TVs would read all the layers and display color video. This is not the only advantage of YUV; it also allows for better compression of video signals to save bandwidth in transfer or broadcasting. While the Y layer needs to maintain detail, the U and V layers can be heavily reduced and the human eye will hardly notice. This makes the YUV color space ideal for storage and transmission of video where space or bandwidth is limited. There are different variants of the YUV color space, but the only one you will need to be concerned with is YV12 (YUV 4:2:0).
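To give a rough sense of the savings, here is a minimal sketch that compares the size of one 720×480 frame in YV12 and in RGB. It assumes 8 bits per sample and that 4:2:0 stores the U and V layers at half the width and half the height of the Y layer; the function names are made up for illustration.

```python
def yv12_plane_sizes(width, height):
    """Bytes per plane for 8-bit YV12 (4:2:0): full-size Y, quarter-size U and V."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma, chroma, chroma


def rgb24_frame_size(width, height):
    """Bytes for an 8-bit RGB frame: three full-size layers."""
    return width * height * 3


y, u, v = yv12_plane_sizes(720, 480)
print(y, u, v)                       # 345600 86400 86400
print(y + u + v)                     # 518400
print(rgb24_frame_size(720, 480))    # 1036800
```

Under these assumptions an uncompressed YV12 frame takes half the space of the same frame in RGB, before any codec is applied.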
The purpose of a color matrix is to allow proper storage and retrieval of RGB color information when using a YUV color space. You don’t necessarily need to understand how color matrices work, but it is important to use the correct one to avoid color distortion. There are only two color matrices you will need to know about: the ones from the Rec.601 and Rec.709 standards. Standard definition footage follows the Rec.601 recommendations and uses that color matrix for storing the color information. High definition footage follows the Rec.709 recommendations, which use a different color matrix. AviSynth by default follows the Rec.601 standard for interpreting the color information in YUV footage, so if you are converting high definition footage with it you will need to specify the Rec.709 color matrix when converting to RGB. You will also need to specify Rec.709 when converting from RGB back to YUV for encoding.
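The effect of picking the wrong matrix can be illustrated with a small sketch. It uses the published luma coefficients of the two standards (Rec.601: Kr = 0.299, Kb = 0.114; Rec.709: Kr = 0.2126, Kb = 0.0722) on values in the 0.0 to 1.0 range. This is simplified math for illustration only: it ignores limited range and rounding, and it is not how AviSynth is invoked.

```python
# Luma coefficients (Kr, Kb) from each standard; Kg = 1 - Kr - Kb.
COEFFS = {"601": (0.299, 0.114), "709": (0.2126, 0.0722)}


def rgb_to_yuv(r, g, b, matrix):
    """Convert full-range RGB (0.0-1.0) to Y plus two color-difference values."""
    kr, kb = COEFFS[matrix]
    kg = 1 - kr - kb
    y = kr * r + kg * g + kb * b
    u = (b - y) / (2 * (1 - kb))
    v = (r - y) / (2 * (1 - kr))
    return y, u, v


def yuv_to_rgb(y, u, v, matrix):
    """Invert rgb_to_yuv using the coefficients of the chosen matrix."""
    kr, kb = COEFFS[matrix]
    kg = 1 - kr - kb
    r = y + 2 * (1 - kr) * v
    b = y + 2 * (1 - kb) * u
    g = (y - kr * r - kb * b) / kg
    return r, g, b


green = (0.0, 1.0, 0.0)
stored = rgb_to_yuv(*green, "601")
print(yuv_to_rgb(*stored, "601"))   # same matrix: pure green comes back
print(yuv_to_rgb(*stored, "709"))   # wrong matrix: the color is shifted
```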
Frame Rate & Interlacing
These two aspects of the video footage are discussed together because they relate to one another. The frame rate of the video is simply the number of frames displayed during one second of playback. Unlike high definition, which allows several frame rates in its standard, NTSC only allows 29.97 frames per second, which poses a problem for footage that was animated or captured at 23.976 frames per second. Simply adding frames to 23.976fps footage to reach 29.97fps would mean repeating some frames but not others, which results in jerky motion during playback. This is where interlacing comes in handy: it allows 23.976fps content to be displayed on a 60Hz TV set without causing jerkiness in the motion. The process of converting 23.976fps progressive video to 29.97fps interlaced video is called telecine.
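The relationship between the two rates is easier to see with exact fractions. The sketch below assumes the exact values behind the rounded figures, 24000/1001 and 30000/1001 frames per second, which are not spelled out in the text above.

```python
from fractions import Fraction

# Exact rates behind the rounded 23.976 and 29.97 figures (assumed values).
FILM = Fraction(24000, 1001)
NTSC = Fraction(30000, 1001)

print(round(float(FILM), 3))   # 23.976
print(round(float(NTSC), 3))   # 29.97
print(NTSC / FILM)             # 5/4
print(NTSC * 2)                # 60000/1001
```

The 5/4 ratio means every 4 source frames have to fill the time of 5 NTSC frames, and the doubled rate is the number of fields shown per second.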
Each frame of interlaced footage is made up of two fields. One field consists of every even line of horizontal pixels and the other consists of every odd line. Standard definition footage is displayed on TV sets at 60Hz, or 60 fields per second, by displaying the two fields in each frame one at a time. This way of storing frames creates a problem for editing, because the information of one original frame may end up split between two frames of interlaced footage. To edit with this footage it is important to reassemble it into single image frames at the correct frame rate. This type of video is known as progressive and is the ideal type to work with. The process of restoring progressive video from the interlaced source is known as inverse telecine or IVTC. Interlacing is less of a concern with high definition footage, as it is rarely used outside of TV broadcasts.
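As a rough illustration of telecine and its inverse, the sketch below works on frame labels instead of pixels. It assumes the common 2:3 pulldown pattern, where every 4 progressive frames are spread over 5 interlaced frames written as (top field, bottom field) pairs. The function names are hypothetical, and real IVTC filters compare image content rather than labels.

```python
def telecine(frames):
    """Spread every 4 progressive frames over 5 interlaced (top, bottom) frames."""
    if len(frames) % 4:
        raise ValueError("expected a multiple of 4 frames")
    interlaced = []
    for i in range(0, len(frames), 4):
        a, b, c, d = frames[i:i + 4]
        # 2:3 pulldown: frames 3 and 4 of each group mix two source frames.
        interlaced += [(a, a), (b, b), (b, c), (c, d), (d, d)]
    return interlaced


def inverse_telecine(interlaced):
    """Field-match each frame, then drop the duplicates that matching produces."""
    matched = []
    previous_bottom = None
    for top, bottom in interlaced:
        # A mixed frame's top field belongs with the previous frame's bottom field.
        if top != bottom and previous_bottom != top:
            raise ValueError("field pattern is not 2:3 pulldown")
        matched.append(top)
        previous_bottom = bottom
    progressive = []
    for frame in matched:
        # Decimate: keep a frame only if it differs from the one before it.
        if not progressive or progressive[-1] != frame:
            progressive.append(frame)
    return progressive


source = ["A", "B", "C", "D"]
pulled = telecine(source)
print(pulled)                    # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(inverse_telecine(pulled))  # ['A', 'B', 'C', 'D']
```

The two mixed frames, ('B', 'C') and ('C', 'D'), are the ones that show combing when an interlaced source is viewed frame by frame. The decimation step here relies on every source label being distinct.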
Compression Format & Codecs
A compression format is a set of standards that dictate how video information is stored digitally. AVC (H.264) is an example of a compression format, and there are many encoders and decoders that will process video according to this specification. A codec is software used for encoding and decoding digital video in a specific compression format. Some codecs, such as Xvid, encode video according to a standardized compression format, which makes it possible to decode it using other codecs that follow the same standard. Other codecs, such as Lagarith, use a compression format of their own, so that specific codec must be installed to decode those videos.
When compressing there are two methods to choose from, lossless or lossy, and it is important to know the difference between the two. Video compressed losslessly, such as with Lagarith, is identical to the uncompressed version once it is decompressed, yet it is typically much smaller than its uncompressed counterpart because it avoids storing redundant information. Lossy compression, such as H.264 and Xvid, reduces file size further by also discarding detail in the video. When it is necessary to convert video for editing, it is highly recommended that you select lossless compression to avoid degrading the video quality and losing detail.
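The difference can be demonstrated on plain bytes. The sketch below uses Python's general-purpose zlib module as a stand-in for a lossless codec and simple quantization as a stand-in for the detail a lossy codec discards. Neither is a video codec, and the test data is an artificial 720×480 gradient.

```python
import zlib

WIDTH, HEIGHT = 720, 480
# One horizontal gradient row, repeated for every line of the "frame".
row = bytes(x * 255 // (WIDTH - 1) for x in range(WIDTH))
raw = row * HEIGHT


def lossless_roundtrip(data):
    """Compress and decompress without changing the data."""
    packed = zlib.compress(data)
    return packed, zlib.decompress(packed)


def lossy_roundtrip(data, step=16):
    """Discard fine detail by quantizing values, then compress the result."""
    reduced = bytes(value // step * step for value in data)
    packed = zlib.compress(reduced)
    return packed, zlib.decompress(packed)


packed, restored = lossless_roundtrip(raw)
print(len(raw), len(packed), restored == raw)      # restored matches the original

lossy_packed, lossy_restored = lossy_roundtrip(raw)
print(len(lossy_packed), lossy_restored == raw)    # detail was thrown away
```

The lossless round trip returns exactly the original bytes, while the lossy one cannot, no matter how it is decompressed.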
Multimedia containers are used to store audio and video in a single file. Some containers, such as MP4, are designed to hold video and audio encoded in specific compression formats. Other containers, such as MKV, allow for a wide variety of audio and video formats. It is possible to move audio and video from one container to another without needing to re-encode, assuming the destination container supports them. This is useful when a particular program can read multimedia encoded in a specific compression format but not the container in which it is stored.
Term Reference Guide:
Frame – One image of a sequence of images that make up a video.
Frame Rate – The number of frames shown every second in a video.
FPS – The abbreviation of Frames Per Second, which indicates the frame rate of a video.
Field – The entire collection of even or odd rows of pixels in a frame.
Field Order – The order in which the video decoder displays the fields.
Progressive – Video where each frame is a single image displayed all at once.
Interlacing – Method of doubling the perceived frame rate by displaying fields individually.
Telecine – The process of converting progressive video to interlaced video.
Inverse Telecine – The process of restoring progressive video from interlaced video.
IVTC – The abbreviation of Inverse Telecine.
Aspect Ratio – The shape in which video data should be displayed.
Color Space – The way that video color information is stored.
Compression – Encoding data so that the information stored is less than the original.
Compression Format – Set of specifications for how multimedia data should be digitally stored.
Codec – Software that is used to compress and decompress a particular compression format.
Lossless – Compression method that avoids storing redundant data to reduce the file size.
Lossy – Compression method that discards less noticeable information to reduce the file size.