psx h4x0rz in teh wired: 2010

Monday, September 6, 2010

PlayStation Video Decoders: The Final Showdown

Updated from the previous comparison with the latest versions, plus three new decoders!

Most importantly, I finally captured what it ACTUALLY looks like on PlayStation hardware (in the dead center).

Those on top get it (more) correct, those on the bottom get it (more) wrong (and ffmpeg and Q-gears are just weird).

Naturally jPSXdec dominates in quality and accuracy. :)

The lineup:

PSXPlay
a fixed version of PCSX by Gabriele Gorla
jPSXdec
MAME
PsxMC
PSmplay
PsxTulz
reevengi
PCSX
ffmpeg
Q-gears

See the download for all the juicy (and technical) details.

Thursday, August 19, 2010

Immaculate Decoding

Just writing a straightforward PlayStation 1 video decoder has been a lot of work. However, for the absolute most impeccable quality, there is so much more that can be considered in the process.

Upsampling

When PlayStation videos are created, the pixels are broken up into luma (brightness) and chroma (color) components. As with JPEG and MPEG, 3/4 of the chroma information is thrown away (each 2x2 group of pixels shares a single Cb and a single Cr sample) because the human eye can't really tell the difference. This is an example of lossy compression.

When decoding, that lost chroma information needs to be recreated somehow to convert the pixels back into RGB. Unfortunately there is no one 'right' way to do it, because there's really no way to get that lost information back. All you can do is 'guess' by filling in the blanks from the surrounding samples using some kind of interpolation. Some of the best-known kinds of interpolation are nearest neighbor, bilinear, bicubic, and Lanczos. I've read about more advanced chroma upsampling approaches that also take the luma component into account. This works because luma and chroma are often strongly correlated: when the luma changes, the chroma probably will too. I'd like to find the best one, but I haven't had much luck finding good resources about them all.
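To make the 'guessing' concrete, here is a minimal sketch of bilinear chroma upsampling, one of the options listed above (not jPSXdec's code). It assumes a 4:2:0 chroma plane stored as doubles and the MPEG1-style siting described further down, where each chroma sample sits at the center of the 2x2 luma block it covers. The class and method names are made up for illustration.

```java
/**
 * Bilinear upsampling of one 4:2:0 chroma plane (Cb or Cr) to full size.
 * Assumption: MPEG-1 style siting, i.e. chroma sample (i, j) sits at the
 * centre of the 2x2 luma block it covers. Edges are clamped.
 */
final class ChromaUpsampler {
    static double[][] upsampleBilinear(double[][] chroma) {
        int ch = chroma.length, cw = chroma[0].length;
        double[][] out = new double[ch * 2][cw * 2];
        for (int y = 0; y < ch * 2; y++) {
            // Where this luma row falls in chroma-sample coordinates.
            double cy = clamp((y - 0.5) / 2.0, 0, ch - 1);
            int y0 = (int) cy, y1 = Math.min(y0 + 1, ch - 1);
            double fy = cy - y0;
            for (int x = 0; x < cw * 2; x++) {
                double cx = clamp((x - 0.5) / 2.0, 0, cw - 1);
                int x0 = (int) cx, x1 = Math.min(x0 + 1, cw - 1);
                double fx = cx - x0;
                // Blend horizontally on two chroma rows, then vertically.
                double top    = chroma[y0][x0] * (1 - fx) + chroma[y0][x1] * fx;
                double bottom = chroma[y1][x0] * (1 - fx) + chroma[y1][x1] * fx;
                out[y][x] = top * (1 - fy) + bottom * fy;
            }
        }
        return out;
    }

    private static double clamp(double v, double lo, double hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

For a single chroma row of {0, 100}, the four output pixels come out as 0, 25, 75, 100. Nearest neighbor would instead just be `out[y][x] = chroma[y / 2][x / 2]`, giving 0, 0, 100, 100.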

Now, because this is essentially just scaling a 2D image, I've been worried about this article that points out a nasty little gremlin called gamma correction. It seems nearly everyone has been doing image scaling wrong since the sRGB gamma-corrected color space became popular. I'm assuming video isn't immune to the same problem, yet I've never seen anyone mention it.
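A small sketch of the gremlin, using the standard sRGB transfer function on plain 8-bit RGB: averaging a black and a white pixel directly on the gamma-encoded numbers gives 128, while averaging in linear light and re-encoding gives 188. This only illustrates the RGB case; how (or whether) it should apply to Y'CbCr chroma upsampling is exactly the open question above. The class and method names are hypothetical.

```java
/** sRGB transfer functions (IEC 61966-2-1) and a two-pixel average done both ways. */
final class GammaAwareAverage {
    /** 8-bit sRGB value to linear light in [0, 1]. */
    static double toLinear(int srgb8) {
        double c = srgb8 / 255.0;
        return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Linear light in [0, 1] back to an 8-bit sRGB value. */
    static int toSrgb8(double linear) {
        double c = linear <= 0.0031308 ? linear * 12.92
                                       : 1.055 * Math.pow(linear, 1 / 2.4) - 0.055;
        return (int) Math.round(Math.max(0, Math.min(1, c)) * 255);
    }

    /** What most scalers do: average the gamma-encoded numbers directly. */
    static int naiveAverage(int a, int b) {
        return (int) Math.round((a + b) / 2.0);
    }

    /** Decode to linear light, average, then re-encode. */
    static int linearAverage(int a, int b) {
        return toSrgb8((toLinear(a) + toLinear(b)) / 2);
    }

    public static void main(String[] args) {
        System.out.println(naiveAverage(0, 255));  // 128
        System.out.println(linearAverage(0, 255)); // 188
    }
}
```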

Deblocking

Assuming we find the upsampling method of choice, there are still ways the image can be improved. Most video codecs break frames down into 'blocks', then encode each block separately, again losing information along the way. When everything is reconstructed, that lost information often shows up as visible seams between blocks. More recent codecs such as H.264 address this with a deblocking filter built into the decoder, but it is still a problem with the older MPEG2. I believe nearly all DVD players do some deblocking before showing the final frame.

Even though MPEG2 has been around a long time and deblocking is so common, I've had the darndest time finding much mention of which deblocking algorithms are in use today. UnBlock and this page on JPEG Post-Processing are the best I've come across. I think I've read somewhere that some advanced deblockers can even use the original MPEG2 data to improve the deblocking.
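The basic idea behind most deblockers can be sketched in a few lines. This is a deliberately naive filter written for illustration (it is not UnBlock, and not the H.264 filter): at every 8-pixel block boundary, if the step across the boundary is small it is assumed to be a blocking artifact and the two touching pixels are pulled toward each other; a large step is assumed to be a real edge and left alone. The threshold is an assumed tuning parameter.

```java
/**
 * Naive deblocking of one 8-bit plane (illustration only). Small steps
 * across 8-pixel block boundaries are softened; large steps are kept.
 */
final class NaiveDeblock {
    static final int BLOCK = 8;

    static void deblock(int[][] plane, int threshold) {
        int h = plane.length, w = plane[0].length;
        // Vertical block edges: compare left/right neighbours.
        for (int y = 0; y < h; y++)
            for (int x = BLOCK; x < w; x += BLOCK)
                smooth(plane, y, x - 1, y, x, threshold);
        // Horizontal block edges: compare upper/lower neighbours.
        for (int y = BLOCK; y < h; y += BLOCK)
            for (int x = 0; x < w; x++)
                smooth(plane, y - 1, x, y, x, threshold);
    }

    private static void smooth(int[][] p, int py, int px, int qy, int qx, int threshold) {
        int a = p[py][px], b = p[qy][qx];
        int step = b - a;
        if (step == 0 || Math.abs(step) >= threshold) return; // probably a real edge
        int d = step / 4; // move each side a quarter of the way across
        p[py][px] = a + d;
        p[qy][qx] = b - d;
    }
}
```

Real deblockers look at more pixels on each side, adapt the threshold to the quantization level, and (as mentioned above) may use the compressed data itself.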

I still consider myself a multimedia novice, so there are probably more post-processing methods that would really make the output shine. A big bummer about all the research in that area is that if you can think of it, you can pretty much count on it being patented.

Given how difficult all this stuff is, I really, really wish I could just hand the problem off to the big players in the field, such as ffmpeg (i.e. libavcodec). I've even considered writing a PSX video to MPEG2 video translator so the MPEG2 video can be fed into ffmpeg. Unfortunately there are some big reasons why doing this still makes me uneasy.

IDCT

The PlayStation uses its own particular IDCT approach that I've never seen anywhere else. Since it matters so much that the IDCT used to decode matches the DCT used to encode, no existing decoder reproduces it (except jPSXdec, of course).

Differences in YCbCr

Another worry is that a really good MPEG2 decoder will position the chroma samples where MPEG2 puts them (lined up with every other luma column, halfway between luma rows), as opposed to how I believe PSX does it (the MPEG1 way: centered in between the luma samples).
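Written down as numbers, the two conventions differ only in the horizontal offset, so decoding one as if it were the other shifts every chroma sample half a luma pixel sideways. A sketch, with a hypothetical enum; the MPEG2 entry assumes progressive 4:2:0 (interlaced fields complicate the vertical position).

```java
/** Position of 4:2:0 chroma sample (i, j), measured in luma pixel units. */
enum ChromaSiting {
    /** MPEG-1: centred between the four luma samples it covers. */
    MPEG1(0.5, 0.5),
    /** MPEG-2 (progressive 4:2:0): in line with even luma columns, halfway between rows. */
    MPEG2(0.0, 0.5);

    private final double offsetX, offsetY;

    ChromaSiting(double offsetX, double offsetY) {
        this.offsetX = offsetX;
        this.offsetY = offsetY;
    }

    double lumaX(int i) { return 2 * i + offsetX; }
    double lumaY(int j) { return 2 * j + offsetY; }
}
```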

To make things a bit more complicated, MPEG2 uses the proper Rec.601 Y'CbCr color space with a [16-235] luma range and a [16-240] chroma range. PSX, on the other hand, uses the full [0-255] range for color information. Many video converters don't handle that discrepancy very well. On top of that, pretty much all converters store the data as integers, so fractional information is lost after every conversion. In contrast, jPSXdec keeps all that fractional information until the very end.
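A quick way to see why the integer rounding matters: squeeze full-range [0, 255] luma into the Rec.601 [16, 235] range and back. With rounding after each step, 256 inputs must share at most 220 intermediate values, so at least 36 of them can't come back; keeping the fraction until the end loses none. A sketch covering luma only, with made-up names.

```java
/** Full-range luma to Rec.601 studio range and back, with and without intermediate rounding. */
final class RangeRoundTrip {
    static double fullToStudio(double y) { return 16 + y * 219.0 / 255.0; }
    static double studioToFull(double y) { return (y - 16) * 255.0 / 219.0; }

    /** Counts full-range values that do not survive the round trip. */
    static int lostValues(boolean roundEachStep) {
        int lost = 0;
        for (int y = 0; y < 256; y++) {
            double studio = fullToStudio(y);
            if (roundEachStep) studio = Math.round(studio); // what integer storage does
            long back = Math.round(studioToFull(studio));
            if (back != y) lost++;
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("rounded each step: " + lostValues(true) + " of 256 lost");
        System.out.println("fractions kept:    " + lostValues(false) + " of 256 lost");
    }
}
```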

In general though, I have not been impressed with ffmpeg's quality, so I can't suggest people use it when looking for good quality.

One advantage of building all these enhancements into jPSXdec is a much nicer user experience: no need to hop between multiple tools to get the best results.

So if I were to actually implement all these features, where would I get the information I lack? Perhaps the doom9.org forums could help. If any multimedia gurus happen to pass by this post, please, if you could, toss some wisdom my way.

Friday, March 26, 2010

YCbCr to RGB Conversion Showdown

In trying to ensure pixel-perfect accuracy in my color conversions, I wanted to compare how two popular video converters handle YCbCr to RGB conversion: ffmpeg* and VirtualDub v1.9.8.

The Rec.601 YCbCr to RGB equation is defined as follows, given a Y range of [16, 235] and a Cb, Cr range of [16, 240]:

[ 1.164   0       1.596 ]   [ Y  - 16  ]     [ r ]
[ 1.164  -0.391  -0.813 ] * [ Cb - 128 ]  =  [ g ]
[ 1.164   2.018   0     ]   [ Cr - 128 ]     [ b ]

You can generate a table of the YCbCr to RGB conversion using this bit of code. Values outside valid YCbCr ranges are simply mapped to white.


public class YCbCrAndRgb {
    public static void main(String[] args) {
        // Walk every possible 8-bit Y, Cb, Cr combination (256^3 rows).
        for (int y = 0; y < 256; y++) {
            for (int cb = 0; cb < 256; cb++) {
                for (int cr = 0; cr < 256; cr++) {
                    // Only convert values inside the Rec.601 ranges:
                    // Y in [16, 235], Cb and Cr in [16, 240].
                    if (y >=  16 && cb >=  16 && cr >= 16 &&
                        y <= 235 && cb <= 240 && cr <= 240)
                    {
                        // Rec.601 matrix, rounded to the nearest integer.
                        int r = (int)Math.round( (y - 16) * 1.164                       + (cr - 128) *  1.596 );
                        int g = (int)Math.round( (y - 16) * 1.164 + (cb - 128) * -0.391 + (cr - 128) * -0.813 );
                        int b = (int)Math.round( (y - 16) * 1.164 + (cb - 128) *  2.018                       );

                        // Clamp to the displayable [0, 255] range.
                        if (r < 0) r = 0; else if (r > 255) r = 255;
                        if (g < 0) g = 0; else if (g > 255) g = 255;
                        if (b < 0) b = 0; else if (b > 255) b = 255;

                        // One row: YCbCr <tab> RGB, each as 6 hex digits.
                        System.out.printf("%02x%02x%02x\t%02x%02x%02x%n", y, cb, cr, r, g, b);
                    } else {
                        // Out-of-range input is mapped to white.
                        System.out.printf("%02x%02x%02x\tffffff%n", y, cb, cr);
                    }
                }
            }
        }
    }
}

Using a pixel format without subsampling should let me convert pixels without any blending interfering; however, ffmpeg still applies blending even to 4:4:4, which would distort the results. So instead I generated AVIs with small dimensions (8x8) in the YV12 fourcc pixel format (4:2:0), each frame containing one solid color. Since every pixel in a frame is the same color, blending has nothing to mix. That came out to 256 AVI files, each with 256*256 frames.
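For anyone reproducing this, the raw bytes of one solid-color YV12 frame are easy to build: YV12 stores the full-size Y plane, then the quarter-size V (Cr) plane, then the quarter-size U (Cb) plane. This sketch leaves out writing the AVI container itself, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Raw bytes of one YV12 frame filled with a single Y'CbCr colour. */
final class SolidYv12Frame {
    static byte[] build(int width, int height, int y, int cb, int cr) {
        int lumaSize = width * height;
        int chromaSize = (width / 2) * (height / 2);
        byte[] frame = new byte[lumaSize + 2 * chromaSize];
        Arrays.fill(frame, 0, lumaSize, (byte) y);                                // Y plane
        Arrays.fill(frame, lumaSize, lumaSize + chromaSize, (byte) cr);           // V (Cr) plane
        Arrays.fill(frame, lumaSize + chromaSize, frame.length, (byte) cb);       // U (Cb) plane
        return frame;
    }
}
```

An 8x8 frame comes out to 64 + 16 + 16 = 96 bytes.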

Those AVIs were fed through ffmpeg and VirtualDub and converted to uncompressed RGB AVIs. This ffmpeg command converts a YCbCr AVI to an RGB AVI:

ffmpeg -i inYCbCr.avi -vcodec rawvideo -pix_fmt bgr24 outRgb.avi

Under VirtualDub's Video->Color Depth menu you can set the output pixel format.

A little script walked through every AVI and pulled out the first RGB pixel of each frame and associated it with the original YCbCr color.

At that point I had a table with 256^3 rows and 4 columns:

Original YCbCr color
RGB generated with the standard equation and floating-point math
RGB generated with VirtualDub
RGB generated with ffmpeg

Here you can download 4096x4096 images of the resulting RGB values using the three conversion methods.

[Images: Floating-point | VirtualDub | ffmpeg]

Now to analyze, starting with some visual comparisons. Diffing and autoleveling (normalizing) exposes which pixels differ.
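A sketch of that diff-and-autolevel step on packed 0xRRGGBB pixels: take the absolute difference per channel, then stretch linearly so the largest difference becomes 255 and a 1-off error is actually visible. The linear stretch is an assumption about what "autoleveling" means here; image editors' auto-levels may clip or work per channel.

```java
/** Per-channel absolute difference of two packed-RGB images, stretched so the largest difference is 255. */
final class DiffAutolevel {
    static int[] diffAutolevel(int[] a, int[] b) {
        int[] diff = new int[a.length];
        int max = 0;
        for (int i = 0; i < a.length; i++) {
            int d = 0;
            for (int shift = 16; shift >= 0; shift -= 8) { // R, G, B
                int ch = Math.abs(((a[i] >> shift) & 0xff) - ((b[i] >> shift) & 0xff));
                d |= ch << shift;
                max = Math.max(max, ch);
            }
            diff[i] = d;
        }
        if (max == 0) return diff; // identical images: nothing to stretch
        for (int i = 0; i < diff.length; i++) {
            int out = 0;
            for (int shift = 16; shift >= 0; shift -= 8)
                out |= (((diff[i] >> shift) & 0xff) * 255 / max) << shift;
            diff[i] = out;
        }
        return diff;
    }
}
```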

Floating-point vs. VirtualDub

Floating-point vs. ffmpeg

It seems VirtualDub is far more accurate than ffmpeg, but it still doesn't match the floating-point version perfectly.

Now some numbers.

VirtualDub has 1795792 pixels (11%) different from the floating-point conversion.
ffmpeg has 10827725 pixels (65%) different from the floating-point conversion.

Differences broken down by color channel.

I'm disappointed but not surprised that there are so many off-by-one values in general. But ffmpeg is off by as much as -3?? Wow, I hope I'm doing something wrong, because that's pretty bad.
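The counting behind numbers like these can be sketched as a tally over packed 0xRRGGBB pixels: one counter for pixels that differ at all, plus a per-channel histogram of signed errors (test minus reference). The class is hypothetical, not the scripts from the download.

```java
/** Counts differing pixels and builds a per-channel histogram of signed errors. */
final class DiffStats {
    long differingPixels;
    final long[][] histogram = new long[3][511]; // [R, G, B][error + 255]

    DiffStats(int[] reference, int[] test) {
        for (int i = 0; i < reference.length; i++) {
            if ((reference[i] & 0xffffff) != (test[i] & 0xffffff)) differingPixels++;
            for (int c = 0; c < 3; c++) {
                int shift = 16 - 8 * c; // R, then G, then B
                int error = ((test[i] >> shift) & 0xff) - ((reference[i] >> shift) & 0xff);
                histogram[c][error + 255]++;
            }
        }
    }

    /** How often channel c (0 = R, 1 = G, 2 = B) was off by exactly 'error'. */
    long count(int c, int error) { return histogram[c][error + 255]; }

    double percentDifferent(int pixelCount) { return 100.0 * differingPixels / pixelCount; }
}
```

For example, 1795792 differing pixels out of 256^3 = 16777216 is about 10.7%, which rounds to the 11% quoted above.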

In the rare case someone has over an hour and 10GB to spare, along with various strange prerequisites, you can download the scripts used to generate these details.

FFmpeg version SVN-r22107, Copyright (c) 2000-2010 the FFmpeg developers
  built on Feb 28 2010 06:11:15 with gcc 4.4.2
  configuration: --enable-memalign-hack --cross-prefix=i686-mingw32- --cc=ccache-i686-mingw32-gcc --arch=i686 --target-os=mingw32 --enable-runtime-cpudetect --enable-avisynth --enable-gpl --enable-version3 --enable-bzlib --enable-libgsm --enable-libfaad --enable-pthreads --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libmp3lame --enable-libopenjpeg --enable-libxvid --enable-libschroedinger --enable-libx264 --enable-libopencore_amrwb --enable-libopencore_amrnb
  libavutil     50. 9. 0 / 50. 9. 0
  libavcodec    52.55. 0 / 52.55. 0
  libavformat   52.54. 0 / 52.54. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0.10. 0 /  0.10. 0

Saturday, March 13, 2010

IDCT Demystified (a little)

The inverse discrete cosine transform is a very mysterious and intimidating equation.

(apologies if I messed up the notation)
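For reference, this is the textbook 8x8 two-dimensional IDCT as JPEG and MPEG define it, with F(u,v) the coefficients and f(x,y) the reconstructed samples:

$$
f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\,
\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
$$

Folding all the constants into one 8x8 matrix turns the double sum into two matrix products, since c(u)c(v) = C(u)C(v)/4:

$$
A_{u,x} = c(u)\cos\frac{(2x+1)u\pi}{16},\quad c(0)=\frac{1}{\sqrt{8}},\; c(u>0)=\frac{1}{2}
\quad\Longrightarrow\quad
f(x,y) = \sum_{u}\sum_{v} A_{u,x}\,F(u,v)\,A_{v,y} = \left(A^{\mathsf T} F A\right)_{x,y}
$$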

For the longest time I let the IDCT remain a black box. I found a handful of Java IDCT implementations, plugged them in, and crossed my fingers.

I know what the 2D DCT does: it pushes most of the image information into the low-frequency coefficients in the top-left corner of the block, while the IDCT undoes that magic. I'm not sure how it does this, but just knowing what it does is enough for me.

But recently I finally discovered that the IDCT is simply a couple of matrix multiplications.

idct_matrix^T . coefficients . idct_matrix

The IDCT equation doesn't really suggest that to the casual mathematician. Of course if you take a class or pay for a book on the subject, maybe this is old news to you.
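Here is a sketch that checks the matrix claim numerically: build the 8x8 IDCT matrix, compute A^T * F * A, and compare against the double-sum formula. It assumes rows of A index frequency u and columns index sample position x; the class name is made up. Because A is orthonormal, the forward DCT is just the transpose arrangement, A * f * A^T.

```java
/** 8x8 IDCT as two matrix multiplications: samples = A^T * F * A. */
final class MatrixIdct {
    static final int N = 8;
    /** A[u][x] = c(u) * cos((2x + 1) * u * pi / 16), c(0) = 1/sqrt(8), else 1/2. */
    static final double[][] A = new double[N][N];
    static {
        for (int u = 0; u < N; u++)
            for (int x = 0; x < N; x++)
                A[u][x] = (u == 0 ? 1 / Math.sqrt(8) : 0.5) * Math.cos((2 * x + 1) * u * Math.PI / 16);
    }

    static double[][] multiply(double[][] p, double[][] q) {
        double[][] r = new double[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    r[i][j] += p[i][k] * q[k][j];
        return r;
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                t[j][i] = m[i][j];
        return t;
    }

    /** coefficients[u][v] in, samples[x][y] out. */
    static double[][] idct(double[][] coefficients) {
        return multiply(multiply(transpose(A), coefficients), A);
    }

    /** The textbook double sum, for comparison. */
    static double[][] idctDirect(double[][] F) {
        double[][] f = new double[N][N];
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++) {
                double sum = 0;
                for (int u = 0; u < N; u++)
                    for (int v = 0; v < N; v++) {
                        double cu = u == 0 ? 1 / Math.sqrt(2) : 1;
                        double cv = v == 0 ? 1 / Math.sqrt(2) : 1;
                        sum += cu * cv * F[u][v]
                             * Math.cos((2 * x + 1) * u * Math.PI / 16)
                             * Math.cos((2 * y + 1) * v * Math.PI / 16);
                    }
                f[x][y] = sum / 4;
            }
        return f;
    }
}
```

A block with only a DC coefficient of 8 comes back as a flat block of 1.0, since every entry in row 0 of A is 1/sqrt(8).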

For those uninformed like me, let's take a closer look at this IDCT matrix.

Theoretically we could throw a bunch of trigonometric identities at this matrix to simplify it, but it turns out to be so much easier to just calculate it and see which decimal values are the same. In the end, there turn out to be only 7 unique values, ignoring sign (listed here in varying forms).

1/sqrt(8)       =  cos(  PI/ 4)/2
cos(1*PI/16)/2  =  cos(  PI/16)/2  =  sqrt(2+sqrt(2+sqrt(2)))/4
cos(2*PI/16)/2  =  cos(  PI/ 8)/2  =  sqrt(2+sqrt(2))/4
cos(3*PI/16)/2  =  cos(3*PI/16)/2  =  sqrt(2+sqrt(2-sqrt(2)))/4
cos(5*PI/16)/2  =  cos(5*PI/16)/2  =  sqrt(2-sqrt(2-sqrt(2)))/4
cos(6*PI/16)/2  =  cos(3*PI/ 8)/2  =  sqrt(2-sqrt(2))/4
cos(7*PI/16)/2  =  cos(7*PI/16)/2  =  sqrt(2-sqrt(2+sqrt(2)))/4
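Both claims are easy to check numerically. This sketch (hypothetical names) builds all 64 entries of the IDCT matrix, counts distinct values with a small tolerance to absorb rounding noise, and compares each cosine with its closed form from the list above: 7 distinct magnitudes, 14 once the sign is included.

```java
import java.util.Arrays;

/** Numeric check of the distinct IDCT matrix values and their closed forms. */
final class IdctMatrixValues {
    /** All 64 entries A[u][x] = c(u) * cos((2x + 1) * u * pi / 16). */
    static double[] entries() {
        double[] e = new double[64];
        for (int u = 0; u < 8; u++)
            for (int x = 0; x < 8; x++)
                e[u * 8 + x] = (u == 0 ? 1 / Math.sqrt(8) : 0.5) * Math.cos((2 * x + 1) * u * Math.PI / 16);
        return e;
    }

    static double[] magnitudes(double[] values) {
        double[] m = new double[values.length];
        for (int i = 0; i < values.length; i++) m[i] = Math.abs(values[i]);
        return m;
    }

    /** Counts values that differ from their sorted neighbour by more than the tolerance. */
    static int countDistinct(double[] values, double tolerance) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int distinct = 1;
        for (int i = 1; i < sorted.length; i++)
            if (sorted[i] - sorted[i - 1] > tolerance) distinct++;
        return distinct;
    }

    /** Each cosine form against its nested-square-root form. */
    static boolean closedFormsMatch() {
        double r2 = Math.sqrt(2);
        double[][] pairs = {
            { Math.cos(Math.PI / 4) / 2,      1 / Math.sqrt(8) },
            { Math.cos(1 * Math.PI / 16) / 2, Math.sqrt(2 + Math.sqrt(2 + r2)) / 4 },
            { Math.cos(2 * Math.PI / 16) / 2, Math.sqrt(2 + r2) / 4 },
            { Math.cos(3 * Math.PI / 16) / 2, Math.sqrt(2 + Math.sqrt(2 - r2)) / 4 },
            { Math.cos(5 * Math.PI / 16) / 2, Math.sqrt(2 - Math.sqrt(2 - r2)) / 4 },
            { Math.cos(6 * Math.PI / 16) / 2, Math.sqrt(2 - r2) / 4 },
            { Math.cos(7 * Math.PI / 16) / 2, Math.sqrt(2 - Math.sqrt(2 + r2)) / 4 },
        };
        for (double[] p : pairs)
            if (Math.abs(p[0] - p[1]) > 1e-12) return false;
        return true;
    }
}
```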

Now the IDCT matrix can be simplified to this:
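Writing C_k = cos(kπ/16)/2 (so C_4 = cos(π/4)/2 = 1/sqrt(8), the first value in the list; the C_k labels are chosen here only for illustration), and taking rows as frequency u and columns as sample position x, the matrix works out to:

$$
A = \begin{bmatrix}
 C_4 &  C_4 &  C_4 &  C_4 &  C_4 &  C_4 &  C_4 &  C_4 \\
 C_1 &  C_3 &  C_5 &  C_7 & -C_7 & -C_5 & -C_3 & -C_1 \\
 C_2 &  C_6 & -C_6 & -C_2 & -C_2 & -C_6 &  C_6 &  C_2 \\
 C_3 & -C_7 & -C_1 & -C_5 &  C_5 &  C_1 &  C_7 & -C_3 \\
 C_4 & -C_4 & -C_4 &  C_4 &  C_4 & -C_4 & -C_4 &  C_4 \\
 C_5 & -C_1 &  C_7 &  C_3 & -C_3 & -C_7 &  C_1 & -C_5 \\
 C_6 & -C_2 &  C_2 & -C_6 & -C_6 &  C_2 & -C_2 &  C_6 \\
 C_7 & -C_5 &  C_3 & -C_1 &  C_1 & -C_3 &  C_5 & -C_7
\end{bmatrix}
$$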

Taking things a step further, let's multiply out the two IDCT matrix multiplications (Maxima is awesome). After a lot of trigonometric simplification, it turns into a massive matrix. This tiny portion below resembles what the entire matrix looks like.

You can download the full 30,000 pixel wide image if you dare.

All those additions/subtractions help to explain why fast IDCT implementations consist of so many sums and only occasional multiplications.
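One concrete example of trading multiplications for additions: in the IDCT matrix, even-frequency rows read the same from both ends and odd-frequency rows flip sign, so outputs x and 7-x can share every product. This sketch of an 8-point 1D IDCT uses that to go from 64 multiplications to 32 (a 2D IDCT applies it to each row and then each column). It only shows the first step of the idea; real fast IDCTs keep splitting the even half the same way. The class name is hypothetical.

```java
/** 8-point 1D IDCT, direct and with the even/odd symmetry split. */
final class EvenOddIdct {
    static final double[][] A = new double[8][8];
    static {
        for (int u = 0; u < 8; u++)
            for (int x = 0; x < 8; x++)
                A[u][x] = (u == 0 ? 1 / Math.sqrt(8) : 0.5) * Math.cos((2 * x + 1) * u * Math.PI / 16);
    }

    /** Straight matrix product: 64 multiplications. */
    static double[] direct(double[] F) {
        double[] out = new double[8];
        for (int x = 0; x < 8; x++)
            for (int u = 0; u < 8; u++)
                out[x] += A[u][x] * F[u];
        return out;
    }

    /** Even/odd split: 32 multiplications, then one add and one subtract per output pair. */
    static double[] evenOdd(double[] F) {
        double[] out = new double[8];
        for (int x = 0; x < 4; x++) {
            double even = 0, odd = 0;
            for (int u = 0; u < 8; u += 2) even += A[u][x] * F[u]; // A[u][7-x] ==  A[u][x]
            for (int u = 1; u < 8; u += 2) odd  += A[u][x] * F[u]; // A[u][7-x] == -A[u][x]
            out[x] = even + odd;
            out[7 - x] = even - odd;
        }
        return out;
    }
}
```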

The bare math still makes it difficult to identify patterns, so I took things to the extreme and visualized it a bit.

Sunday, February 7, 2010

Writing a Java-only Video Player

Finally designed and put together a fully working implementation, involving a custom blocking queue and 6 threads. I’ve tested it on Windows XP, OS X, and an old Kubuntu Hardy. Windows and Mac both look amazing and run perfectly smoothly. Unfortunately, on Linux the video playback stutters a lot. At first I thought it was due to the GC. After integrating an impressive object pool design, the cause actually turned out to be the Java audio API.

I use the audio playback position to determine when a frame should be displayed. Running a little test exposes how reliable this timer is.

import javax.sound.sampled.*;

public class AudioPositionTest {
    public static void main(String[] args) throws LineUnavailableException {
        AudioFormat audFmt = new AudioFormat(18900, 16, 2, true, true);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, audFmt);
        SourceDataLine player = (SourceDataLine) AudioSystem.getLine(info);
        player.open(audFmt);
        player.start();

        byte[] abBuf = new byte[2 * 400];

        long lngTestLength = 20 * 1000; // 20 seconds
        long lngTestStart = System.currentTimeMillis();
        long lngTestEnd = lngTestStart + lngTestLength;

        System.out.println(System.getProperty("os.name") + "\tJava " +
                           System.getProperty("java.version"));
        StringBuilder sb = new StringBuilder(400);
        while (System.currentTimeMillis() < lngTestEnd) {
            player.write(abBuf, 0, abBuf.length);
            long lngTime = System.currentTimeMillis();
            long lngPos = player.getLongFramePosition();
            sb.append(lngTime - lngTestStart);
            sb.append('\t');
            sb.append(lngPos);
            System.out.println(sb);
            sb.setLength(0);
            Thread.yield();
        }

        player.stop();
        player.close();
    }
}

Graphing some of the output makes it pretty clear why Linux playback is choppy.
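For context, turning the audio position into a "which frame now" decision can be as simple as the sketch below, which is why any jumpiness in getLongFramePosition() shows up directly as stutter. The method name is hypothetical, and the 15 fps in the usage note is only an assumed example frame rate.

```java
/** Picks the video frame to display from the audio line's playback position. */
final class AudioClock {
    static long videoFrameAt(long audioFramePosition, float sampleRate, double fps) {
        double seconds = audioFramePosition / (double) sampleRate;
        return (long) Math.floor(seconds * fps);
    }
}
```

With the 18900 Hz format from the test program, `AudioClock.videoFrameAt(player.getLongFramePosition(), 18900f, 15)` would return 15 once exactly one second of audio has played.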

Since that isn’t going to work, my next test will involve registering a listener for start and stop events and tracking playback time manually. Though I’m worried my time and the playback time are going to drift out of sync.
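A minimal sketch of that plan, assuming the clock simply runs on System.nanoTime() between the line's START and STOP events. The class name and its started/stopped/elapsedMicros methods are hypothetical; LineListener, LineEvent and LineEvent.Type are the real javax.sound.sampled types.

```java
import javax.sound.sampled.LineEvent;
import javax.sound.sampled.LineListener;

/** Playback clock driven by START/STOP events instead of the line's frame position. */
final class ManualPlaybackClock implements LineListener {
    private boolean running;
    private long startedAtNanos;    // valid only while running
    private long accumulatedNanos;  // time played before the most recent stop

    @Override
    public synchronized void update(LineEvent event) {
        if (event.getType() == LineEvent.Type.START) started(System.nanoTime());
        else if (event.getType() == LineEvent.Type.STOP) stopped(System.nanoTime());
    }

    synchronized void started(long nowNanos) {
        if (!running) { running = true; startedAtNanos = nowNanos; }
    }

    synchronized void stopped(long nowNanos) {
        if (running) { accumulatedNanos += nowNanos - startedAtNanos; running = false; }
    }

    /** Microseconds of playback so far, as seen at time nowNanos. */
    synchronized long elapsedMicros(long nowNanos) {
        long total = accumulatedNanos + (running ? nowNanos - startedAtNanos : 0);
        return total / 1000;
    }
}
```

It would be hooked up with `player.addLineListener(clock)` and read with `clock.elapsedMicros(System.nanoTime())`. The timestamp is taken when the event is delivered, not when sound actually leaves the speakers, which is exactly where the drift worry comes in.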
