Temporal Rate Conversion
Dave Marsh. Microsoft Technical Evangelist, TV and Video
Updated: December 4, 2001
This paper is of great relevance to anyone who wants to display video-originated material on desktop CRT displays at a flicker-free refresh rate (such as 75Hz) with good quality. It is of slightly less importance to people building systems with displays intended for a viewing distance of about 10 feet for the 60Hz market, or to people who plan to use flat-panel desktop displays. It is also of less importance if you plan to display only film-originated material at high quality and are happy with the video-originated material being juddery.
The nature of video and how it's displayed
Temporal rate and its history
Temporal rate definition
It is best when quoting a TV standard to use the temporal rate figure because this is the rate that the motion was sampled at and is the figure that has the greatest bearing on how the video is subsequently processed.
It is worth noting that in an existing NTSC video-originated interlaced signal, even though the frame rate is only 30Hz, the temporal rate is actually 60Hz, and this is the rate at which the motion in the original scene was sampled. In this case, however, only a half-resolution image is captured on each temporal sample. Video-originated NTSC should be quoted as "480 interlaced active lines with a temporal rate of 60Hz." To help stress the full consequences of interlace, it is common to quote the NTSC format as 480i30, where 30 refers to the frame rate, but this is very misleading when trying to assess the temporal rate conversion problem. In this paper, the NTSC format will be referred to as 480i60, where the 60 refers to the temporal sampling frequency, otherwise known as the "Temporal Rate."
480i60 - Interlaced sampling of ball movement. Temporal sampling rate is 60Hz.
480p24 - Film sampling of ball movement. Temporal sampling rate is 24Hz.
Describing film-originated NTSC material is even more confusing; it is best quoted as "480p24 pseudo-interlaced using 3:2 pulldown." The important figure to quote is the rate at which the motion in the original scene was captured, which in the film case is 24Hz. Everything else is really just an implementation detail.
480p60 - Progressive 60Hz sampling provides the best of both worlds for sports action.
The origins of the 60Hz (and 50Hz) field rate
The good thing about the existing field rate of 60Hz (and to a lesser extent, the 50Hz European standard) is that it allows the analog transmission bandwidth to be kept to a minimum and yet is not so low that the picture would fail to do a reasonable job of portraying motion or be seen as excessively flickery when viewed on the small screened TVs envisioned back in the 1930s when the TV system was being designed. Ideally it would have been good to have a higher field rate (above 70Hz), but analog transmission bandwidth was (and still is) too expensive to make this practical.
When TV was being designed in the 1930s and electronics was in its infancy, it was difficult to design oscillator and power regulation circuits, so it was necessary to make the TV field rate the same as the power frequency. This power frequency had been arrived at back in the Victorian era, because it was an efficient rate at which to run a power generation turbine and worked well for the transformers necessary for power distribution. From the 1970s onward, with the advent of modern electronics, the requirement to base TV designs on the power frequency went away, but of course the TV standard was well established by then.
60Hz generators, so 60Hz video.
The current video standards are well entrenched
The following is a run-down of the video standards currently used in each geographic area. Note that each area actually uses two distinct standards that at first sight seem the same, but are actually very different. Some TV shows actually consist of a mixture of the two. (The number of lines in the NTSC signal is simplified to 480, even though it typically has a couple extra. The nominal 60Hz rate, where appropriate, is stated as its correct frequency of 59.94Hz, because this has relevance in discussion of temporal issues.)
480i59.94Hz (video-originated, that is, scene sampled at 59.94Hz). 480i59.94Hz (film-originated, that is, scene sampled at 24Hz, changed to 59.94 using 3:2 pulldown run fractionally slow). In many ways this signal is best regarded as 480p23.976Hz.
480i60Hz (video-originated, that is, scene sampled at 60Hz). 480i60Hz (film originated, that is, scene sampled at 24Hz, changed to 60 using 3:2 pulldown). In many ways, this signal is best regarded as 480p24Hz.
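The 3:2 pulldown cadence mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pulldown_32`, the choice to start the cadence with a three-field hold, and the "top"/"bottom" field labels are all hypothetical and not taken from any broadcast specification.

```python
from fractions import Fraction

def pulldown_32(film_frames):
    """Spread film frames across interlaced fields with a 3:2 cadence.

    Frames are held alternately for 3 fields and 2 fields, so every
    4 film frames fill 10 fields (24 frames -> 60 fields per second).
    Returns a list of (frame, field_parity) tuples.
    """
    fields = []
    for i, frame in enumerate(film_frames):
        repeats = 3 if i % 2 == 0 else 2  # assumed: cadence starts with 3
        for _ in range(repeats):
            # Assumed labeling: fields simply alternate top, bottom, top, ...
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields

cadence = [frame for frame, _ in pulldown_32(["A", "B", "C", "D"])]
print(cadence)  # ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']

# Film "run fractionally slow" to match 59.94Hz: 24 * 1000/1001.
slowed_film_rate = Fraction(24) * Fraction(1000, 1001)
print(float(slowed_film_rate))  # roughly 23.976
```

Note that frames A and C each span three fields while B and D span two; this uneven hold time is the source of the 3:2 pulldown judder discussed later in the paper.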
The issue of display rate needed to avoid flicker is independent of the rate needed to describe the motion in a scene
Flicker refers to the eye's perception of flashing light
CRT displays have an impulse light output characteristic
CRTs have an impulse light output characteristic, that is, light is given out only by the dot from the electron beam and from a short trail of glowing phosphor behind the moving dot. The phosphor light output from each phosphor target decays away in something like 50 microseconds after the beam has passed it. The persistence needed to turn the moving dot into an image is provided by your eye. Effectively, the moving spot does an impulse update of the image that is already on your retina. Because of eye tracking, the image being updated is a stationary image even if the actual object is moving. This is how you are able to see detail even in moving objects.
An ideal TV system would use impulse sampling of the scene, using an electronic camera with a fast shutter, and a display at the other end that also has an impulse response. Because of this, CRTs still produce the sharpest picture quality and dynamic resolution when compared with the newer flat-panel technologies. Obviously, there are other advantages with flat-panel displays that can make up for the lack of dynamic resolution, such as the ability to hang them on the wall. The other thing to note is that the impulse characteristic of a CRT, which produces good dynamic resolution, is also what causes flicker. To avoid this, you need a faster refresh rate than 60Hz, which means you run into the judder picture quality issue associated with trying to process a 60Hz signal into something faster.
Flat-panel displays have a sample-and-hold characteristic
All of the newer display technologies such as LCD, plasma, DLP, and so on, have essentially a sample-and-hold characteristic. When a pixel is addressed, it is loaded with a value and stays at that light output value until it is next addressed. From an image portrayal point of view, this is the wrong thing to do. The sample of the original scene is only valid for an instant in time. After that instant, the objects in the scene will have moved to different places. It is not valid to try to hold the images of the objects at a fixed position until the next sample comes along that portrays the object as having instantly jumped to a completely different place.
Your eye tracking will be trying to smoothly follow the movement of the object of interest and the display will be holding it in a fixed position for the whole frame. The result will inevitably be a blurred image of the moving object.
Sample and hold pixel characteristic causes blur.
The good thing about displays with a sample-and-hold characteristic is that they do not produce any flicker when driven at 60Hz. This is because the sample values from the video signal are held for the entire frame time, rather than being just flashed onto the screen. The fact that they can be driven at the same rate as the video source is very significant because it avoids the need for temporal rate conversion.
Leaving aside the temporal rate conversion difficulties, displays with a sample-and-hold characteristic, such as LCD and plasma, would produce better motion portrayal if operated at rates above 60Hz. Flat panels are normally run at 60Hz, because it is perceived that this is all you need to do since there is no flicker problem. The reality is that a faster update rate would be beneficial in order to reduce the blurring effect associated with the sample-and-hold characteristic. Pixels with a sample-and-hold characteristic effectively extend what should have been an instantaneous sample into a constant value that lasts for a whole frame. The result of this is motion smearing. This smearing is reduced if you can update the sample and hold circuits more often with new sample values.
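To put rough numbers on this, the smear seen on a tracked object can be approximated as the distance the object travels while a pixel value is held. The sketch below assumes this simple first-order model; the function name `hold_smear_pixels` and the `hold_fraction` parameter (1.0 for a full sample-and-hold pixel, much smaller for an impulse-type display) are illustrative assumptions, not measured display properties.

```python
def hold_smear_pixels(speed_px_per_s, refresh_hz, hold_fraction=1.0):
    """Approximate smear width, in pixels, for an eye-tracked moving object.

    Assumed model: smear = object speed * time the pixel value is held.
    hold_fraction is the portion of the frame period for which the
    pixel emits light (1.0 = pure sample-and-hold).
    """
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    return speed_px_per_s * hold_fraction / refresh_hz

# An object crossing the screen at 600 pixels per second.
for hz in (60, 75, 120):
    print(hz, "Hz ->", hold_smear_pixels(600, hz), "pixels of smear")
```

Under this model the smear shrinks in proportion to the update rate, which is the motivation for running hold-type displays faster than 60Hz even though they do not flicker.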
There are two reasons why almost all LCD and plasma displays currently operate at 60Hz. Least importantly, the drive electronics only go that fast. More importantly, there is no standardized faster rate, and the drive electronics are all designed for single-frequency operation. LCD manufacturers would welcome a higher rate standard such as 75Hz, but they won't use a higher rate while no such standard exists.
The sample-and-hold pixel characteristic is also the reason why you cannot feed a flat-panel display with an interlaced signal. If you did, you would end up with both fields displayed at once, which would be seen as "feathering" (or "mice teeth") on vertical edges that are moving horizontally. If the movement between fields is far enough, then you actually see two separate images of the object.
Flicker on CRTs is seen on large areas of uniform bright colors
The amount of perceived flicker increases as the light output from the CRT increases
The amount of perceived flicker on a CRT display increases as screens get larger and wider
The human eye's sensitivity to flicker is determined by approximately a power of 4 law
Above 72Hz on a CRT display, whether flicker is seen, depends on the particular person, but many still see flicker if the rate is less than about 85Hz
If you sit close to a low refresh rate CRT screen, as when working on a desktop PC, you will see considerable flicker because of the high subtended angle and the high light output reaching your eye
PC graphics often has large areas of bright white and this causes considerable flicker on a CRT display at 60Hz
Nobody would buy a PC with a 60Hz CRT display these days
It is acceptable to use 60Hz for CRT displays intended for 10-foot viewing
Temporal rate needed to describe the motion
The temporal rate needed to carry the motion information is a subjective issue and is very dependent on the type of motion being portrayed
What it comes down to is that the steadier the motion, the less the temporal sample rate needed to portray that motion. Fast motion that keeps changing direction, such as the drummer's drum sticks in a heavy metal rock band, needs a very fast temporal sampling rate if it is to be fully portrayed. Motion like this needs a temporal sampling rate of many hundreds of Hertz. Luckily most motion is slower and more linear than this and so can be represented by a slower temporal sampling rate.
Whatever temporal sampling rate you choose, it's unlikely to be fast enough
The sampling theorem specifies that you need to sample at a minimum of twice the maximum frequency present in the signal
The theorem applies to all things being sampled including spatial frequencies (that is, detail in the scenes) and temporal frequencies (that is, the rate at which the objects in the scene are moving). There is no fundamental reason why you must avoid aliasing, but it is important to understand its consequences and artifacts.
The temporal sampling rate used by video is not fast enough to avoid temporal aliasing
Quality of existing TV is OK because of eye tracking
If the temporal rate is excessively low, then the result will be "Temporal Sampling Judder"
This type of judder is commonly seen in film-originated material because a 24Hz sampling rate is far too slow for much of the motion in many scenes. It is often referred to as "film judder." It is often seen in backgrounds when the film camera pans horizontally. It is also responsible for the amusing artifact in Westerns, where wagon wheels are seen to go backwards.
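The wagon-wheel effect can be reproduced with a small calculation. The sketch below folds a true frequency into the range that a given sample rate can represent; the function name `aliased_frequency` and the example wheel (12 spokes turning at 1.9 revolutions per second) are assumptions chosen purely for illustration.

```python
def aliased_frequency(true_hz, sample_hz):
    """Fold true_hz into the range [-sample_hz/2, +sample_hz/2].

    A negative result means the motion appears to run backwards.
    """
    return true_hz - sample_hz * round(true_hz / sample_hz)

spokes = 12            # assumed: identical spokes, so the pattern repeats 12x per turn
wheel_rev_per_s = 1.9  # assumed rotation speed
film_rate_hz = 24

pattern_hz = spokes * wheel_rev_per_s          # about 22.8 spoke passes per second
apparent_pattern_hz = aliased_frequency(pattern_hz, film_rate_hz)
apparent_rev_per_s = apparent_pattern_hz / spokes

# The spoke pattern appears to move at about -1.2Hz, so the wheel
# seems to turn slowly backwards at about 0.1 revolutions per second.
print(round(apparent_pattern_hz, 3), round(apparent_rev_per_s, 3))
```

A spoke pattern just below the 24Hz sample rate is indistinguishable from a slow backward motion, which is exactly what the viewer sees.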
Professional film cameramen and other cinematographers try hard to avoid motion in the scenes that would cause Temporal Sampling Judder. Typically, the camera is accurately panned to follow the moving object of interest, thereby making it stationary relative to the camera picture. Also, a shallow depth of field is used to avoid judder in the background as the camera pans past. Another factor that helps is the temporal characteristic of the film camera, which keeps each film frame exposed for about half of each frame period, thus introducing some temporal smear. This is very different from CCD video cameras, which have a very fast electronic shutter.
This Temporal Sampling Judder is one of the components of "the film look" and most people have become used to it, so it's not particularly annoying. Typically (given a constant bandwidth transmission medium) the loss in temporal resolution is made up for by a corresponding increase in spatial resolution.
Another form of judder is "3:2 pulldown judder"
Although 3:2 pulldown judder is found annoying by many Europeans (who are used to seeing movies converted to 50Hz by consistently repeating each film frame twice), people in 60Hz countries are used to it and don't find it particularly annoying. Effectively a 3:2 judder filter has been learned from childhood.
Temporal rate conversion
Why temporal rate conversion judder occurs
It's all due to temporal aliasing
Any processing of the video signal also needs to track the motion if it is not going to be confused by the temporal aliasing and therefore cause annoying judder.
Judder is the brain's way of saying: "What was that?"
Judder is most noticeable on camera pans
Small difference frequencies cause the most judder
Film-originated material is OK
This film-originated 60Hz material can be upconverted to, say, 75Hz without incurring standards conversion judder, because the material is really just 24Hz material that has been over-sampled at 60Hz. The actual difference frequency is 51Hz, not 15Hz.
Don't be fooled by demos that use film-originated material to "prove" to you that their system has no temporal rate conversion judder.
Receiver factors affect the amount of perceived judder
Noise in the picture helps mask the problem. Obviously this is not the solution to the judder problem, particularly as we move to clean digital signals.
Poor video circuitry, such as currently used in PCs, causes soft pictures, helping to mask the problem. The video being fed to the PC's graphics subsystem for field rate conversion has very little high frequency detail, that is, it tends to be blurred. The aim now is to provide TV quality on a PC that is considerably better than a consumer TV, and therefore it is essential that the temporal rate conversion problem is fully understood and a solution found.
The characteristics of the display also affect the amount of perceived judder
The amount of motion judder that you get on a CRT display when changing the temporal rate also increases as the size of the display increases. If the picture is large (and you stay the same distance away), then the judder will be more noticeable because the distance (angle of view) over which it judders will be greater.
When feeding a high refresh rate signal such as 75Hz that has been linearly converted from a 60Hz source into a flat-panel display, you get less judder than feeding that same signal into a CRT display. The flat-panel display acts as a temporal post filter and is able to reduce the judder. Of course, all that is really happening is that blur and smear are being substituted in place of the judder, but even so, the results can be quite acceptable. A good way to test this is to use an LCD light-valve projector, because it is not restricted to single-frequency operation. It has a thin film of LCD material onto which an image is written using a CRT projection tube.
One way of reducing judder in CRTs would be to increase the persistence of the phosphors from the current time of about 1 line period to closer to a field or frame time. Currently a typical phosphor will decay to about a third of its peak value in about 50 microseconds. The problem, of course, with increasing its persistence is that this temporal lag would cause smearing, thereby destroying the good thing that CRTs have going for them.
The amount of motion judder that you get when changing the temporal rate is dependent on how much motion blur was introduced in the video capture process
When watching fast-shuttered material in its native form on a display that does not introduce motion blur, such as on a CRT display, your eye tracks the motion of the object of interest and sees a crisp image of the object as it moves. The eye is able to do this because the object is stationary on the retina due to the eye moving and tracking as the object moves.
A linear temporal rate converter cannot track motion. If you don't track the motion and you leave the moving objects sharp, then the result will be judder. The only way a linear converter can reduce the judder is by introducing some of the blur that you would have got from a tube-based camera. The resulting video will have slightly reduced judder, but it will have lost the detail that was present in the original captured image.
Unfortunately, as we fix the other video quality problems on the PC, the judder will jump up and bite us
This is why we need to start thinking of solutions now.
Currently used linear temporal rate conversion methods
The least expensive way to change the refresh rate is the "pulldown" method
The same basic method can be used to convert between rates that are not integer multiples. You can write into a frame store at any input rate and clock it out at any other rate. The implementation complexity does, however, increase slightly since it is necessary to double buffer to stop the operation of writing into the frame store being visible on the screen.
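The frame-store behavior described above can be sketched as a schedule of which input frame is read out for each output frame. This is a minimal model, assuming that input frame n becomes available at time n/in_hz and that the output always shows the most recent complete input frame; the function name `repeat_schedule` is hypothetical.

```python
def repeat_schedule(in_hz, out_hz, n_out):
    """Index of the input frame shown for each of n_out output frames.

    Output frame k is displayed at time k/out_hz and shows the latest
    input frame available by then, that is, floor(k * in_hz / out_hz).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return [(k * in_hz) // out_hz for k in range(n_out)]

# 60Hz source shown on a 75Hz display: every 5 output frames consume
# 4 input frames, so one input frame in each group is shown twice.
print(repeat_schedule(60, 75, 10))  # [0, 0, 1, 2, 3, 4, 4, 5, 6, 7]

# The same rule applied to 24Hz film on a 60Hz display gives the 3:2 cadence.
print(repeat_schedule(24, 60, 10))  # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
```

The irregular repeats in the 60Hz to 75Hz schedule are what the eye perceives as a periodic hesitation in otherwise smooth motion.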
Because of the relative simplicity of the pulldown method, it is possible to make it operate faster than the temporal interpolation method (described in the following section). This is important for the fast display scan rates used in today's PCs.
A slightly more expensive way to change the refresh rate is to use linear temporal interpolation
The temporal interpolation method can produce marginally better results than the pulldown method because it allows you to substitute some blur-and-smear in place of some of the judder. The theory is that blur-and-smear is less objectionable than judder, but blur-and-smear needs to be used sparingly. A temporal interpolation converter could blur-and-smear away the judder, but the picture would be unwatchable. In practice, it is only possible to blur-and-smear out about 30% of the judder. The temporal interpolation method does not cure the judder problem; it just tries to make it marginally less objectionable. The problem with it is that because it is more complex to implement, it is difficult (expensive) to get it to operate at the pixel rate required for modern PCs.
Interpolation involves low pass filtering and is an averaging method. To create a required output frame, it averages together various percentages of the surrounding input frames. It is not hard, when thinking about it in these terms, to see why the technique has problems. Consider a video of a man moving his arm down. The first frame shows his arm up high, by the next frame it is half way down, and on the third frame it is down at his side.
Suppose you want to double the frame rate by creating 2 additional frames, one between each pair of original frames. The result will be as shown in the picture below: the man will look like he has grown extra arms. Your eye, seeing this, will no longer be able to properly track the motion of the arm.
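The "extra arms" effect follows directly from the averaging. The sketch below uses a one-dimensional "frame" with a single bright pixel standing in for the arm; the helper names `make_frame` and `blend`, the frame width, and the pixel positions are all illustrative assumptions.

```python
def make_frame(width, position):
    """A 1-D frame that is dark except for one bright pixel (the 'arm')."""
    return [1.0 if x == position else 0.0 for x in range(width)]

def blend(frame_a, frame_b, weight_b):
    """Linear temporal interpolation: a weighted average of two frames."""
    return [(1 - weight_b) * a + weight_b * b for a, b in zip(frame_a, frame_b)]

arm_up = make_frame(9, 1)        # first input frame: arm at position 1
arm_half = make_frame(9, 4)      # next input frame: arm at position 4

# The new frame that should lie halfway in time between the two inputs.
in_between = blend(arm_up, arm_half, 0.5)
print(in_between)
# [0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
```

A motion-tracking converter would place one full-brightness arm between positions 2 and 3. The linear average instead produces two half-brightness arms at the old and new positions, with nothing in between.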
Changing the temporal rate using either the pulldown method or the linear temporal interpolation method will produce "temporal rate conversion judder"
Relating this to the sampling theorem, judder results from the fact that the temporal sampling rate is not at least two times the rate of the fastest motion in the scene. As explained by Shannon and Nyquist, this results in temporal aliasing. If the field rate is 60Hz, then by the sampling theorem, the maximum movement frequency allowable in the signal being sampled is 30Hz.
Unfortunately, objects move a lot faster than this, so temporal aliasing nearly always occurs. As stated earlier, this is not too much of a problem when a human views the material on a TV using the native video standard, because of the eye's ability to track moving objects. When the eye tracks the motion of the object of interest, the moving object is stationary relative to the eye's retina, so it's as if it were not moving. This means that the temporal aliases are not seen. Unfortunately, when the video signal passes through a linear temporal rate converter, the aliasing causes interpolation theory to break down. The converter cannot tell the aliasing from genuine signals and resamples both to produce the output fields. These multiple alias images are the cause of the perceived judder.
A linear temporal rate converter is faced with a dilemma of whether to keep the annoying judder or to apply considerable low pass filtering to change the object into a low resolution blur as it moves.