There are two types of redundancy present in a video signal -- spatial and temporal. Spatial redundancy means that if a pixel is red, the ones near it are quite likely to also be red. Temporal redundancy means that if a pixel is red, the same pixel in the next frame is also likely to be red.
The spatial redundancy is handled exactly as it is in JPEG. You break the signal up into its frequency components and throw away the ones you are not likely to notice. The "brightness" (luminance) information is kept at full resolution, but the "color" and "saturation" (chrominance) values are reduced in resolution, because the eye is not as sensitive to those.
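If you want to see the frequency-component idea in code, here is a minimal Python sketch. The single "coarseness" number is my own stand-in for JPEG's per-frequency quantization tables, and the use of scipy's DCT routines is just for illustration -- nothing here is taken from the MPEG spec:

    import numpy as np
    from scipy.fft import dctn, idctn

    def compress_block(block, coarseness=16):
        # Transform an 8x8 block into its frequency components.
        coeffs = dctn(block.astype(float) - 128, norm='ortho')
        # Quantize: small high-frequency coefficients round to zero,
        # which is how they get "thrown away".
        return np.round(coeffs / coarseness)

    def decompress_block(quantized, coarseness=16):
        # Undo the quantization and the transform.
        return idctn(quantized * coarseness, norm='ortho') + 128

    block = np.random.randint(0, 256, (8, 8))
    restored = decompress_block(compress_block(block))

Most of the quantized array ends up as zeros, and long runs of zeros compress extremely well, which is where the actual size savings come from.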
Here is how temporal redundancy is handled. Out of every nine frames, basic MPEG compression fully compresses one of them JPEG-style (an "I" frame). Another two of the nine are stored as error differences from the preceding full or difference frame ("P" frames). The remaining six can reference either a previous or a future anchor frame and encode only the error ("B" frames). This means that if you have a static shot of a person talking, the background might not move at all, so there would be NO error for those areas. Then, you only need to encode the slight head and lip motion of the actor speaking.
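Here is a toy sketch of the difference-frame idea, assuming frames are plain numpy arrays of brightness values. A real encoder entropy-codes the (mostly zero) difference; this just computes it:

    import numpy as np

    def encode_difference(anchor, new_frame):
        # Static areas produce a difference of exactly zero,
        # which costs almost nothing to store.
        return new_frame.astype(int) - anchor.astype(int)

    def decode_difference(anchor, diff):
        return anchor.astype(int) + diff

    frame0 = np.zeros((64, 64), dtype=np.uint8)  # static background
    frame1 = frame0.copy()
    frame1[30:34, 28:36] += 50                   # slight lip motion
    diff = encode_difference(frame0, frame1)
    print(np.count_nonzero(diff), "of", diff.size, "pixels changed")

Only the handful of pixels around the mouth show up in the difference; everything else is zero.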
Further MPEG improvements involve motion compensation. You look for a block of pixels that moves around between frames, and encode its displacement (a motion vector) plus whatever error remains. This can greatly reduce the size, but it takes a LOT of processing power to search for the movement.
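Here is a brute-force block-matching sketch so you can see why it is so expensive. I am assuming 8x8 blocks, a small search window, and the common SAD (sum of absolute differences) cost; real encoders use much smarter search strategies than this exhaustive loop:

    import numpy as np

    def find_motion_vector(prev_frame, cur_frame, y, x, size=8, radius=4):
        target = cur_frame[y:y+size, x:x+size].astype(int)
        best_cost, best_vec = None, (0, 0)
        # Try every offset in the window: this nested loop over
        # candidate positions is where all the processing power goes.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if (yy < 0 or xx < 0 or yy + size > prev_frame.shape[0]
                        or xx + size > prev_frame.shape[1]):
                    continue
                candidate = prev_frame[yy:yy+size, xx:xx+size].astype(int)
                cost = np.abs(candidate - target).sum()  # SAD
                if best_cost is None or cost < best_cost:
                    best_cost, best_vec = cost, (dy, dx)
        return best_vec  # encode this vector plus the residual error

Even with this tiny 9x9 window you do 81 full block comparisons per block; widen the window and the cost grows quadratically, which is why motion estimation dominates encoding time.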
I hope that this helps.