Saturday, August 13, 2011
Replicate
A very impressive Russian site is trying to recreate most of the game's content for browsing on the web. What impressed me even more is the creator managed to reverse-engineer some of the game's data types before I did. He kindly gave jPSXdec a shout out since it was used heavily to extract nearly everything on the site.
I've seen this very old Japanese site before, and it also did a good job of documenting the game's content.
Friday, August 5, 2011
Translation Hacking
I figured it would be a bit rough to use this approach. Unfortunately, anything more than this would multiply the amount of work many times.
I've also discovered there are 34 images on the game discs that don't seem to ever appear in the game. They're not particularly interesting, however.
Wednesday, June 1, 2011
Decoding MPEG-like bitstreams
While developing jPSXdec for the last 4 years, I've run across three different methods of decoding bitstreams. Of course, in the honored tradition of multimedia hacking, none of these approaches are documented anywhere. So I thought I'd break that unwritten rule and actually write about them.
If you'd like to learn more about what part this plays in MPEG and PlayStation .STR decoding, check out my thorough document on the subject: PlayStation_STR_format.txt
Approach 1: Brute force
This is the most obvious approach. For each code, peek the next n-bits until the bits match something.
Next17Bits = Peek17Bits()
For Each Possible Code
    If Next17Bits starts with code bits
        Skip bit code length
        If END_OF_BLOCK code
            return END_OF_BLOCK
        Else If ESCAPE_CODE
            ParseEscapeCode()
        Else
            return matching code
        End If
    End If
Next
In the worst case, this approach requires 111 conditional checks to identify a bit code. To be honest, I've never actually seen this implemented anywhere besides by me years ago when first learning about bitstream parsing.
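Here is a minimal Java sketch of the brute-force idea. It is not jPSXdec's code: the class and method names are made up for illustration, the bitstream is a string of '0' and '1' characters for readability, and only a handful of codes from the full table are included (with the trailing sign bit left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Brute-force matching against a small, illustrative subset of the code table. */
class BruteForceVlc {
    // Code bits (trailing sign bit left out) mapped to a label for the demo.
    static final Map<String, String> CODES = new LinkedHashMap<>();
    static {
        CODES.put("10", "END_OF_BLOCK");
        CODES.put("11", "11s");
        CODES.put("011", "011s");
        CODES.put("0100", "0100s");
        CODES.put("0101", "0101s");
        CODES.put("000001", "ESCAPE_CODE");
    }

    /** Tries every known code in turn; returns its label, or null if nothing matches. */
    static String match(String bits, int pos) {
        for (Map.Entry<String, String> code : CODES.entrySet()) {
            if (bits.startsWith(code.getKey(), pos)) {
                return code.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "0101" + sign bit "1" + "10"
        System.out.println(match("0101110", 0)); // the 0101s code
        System.out.println(match("0101110", 5)); // the end-of-block code
    }
}
```

Because the codes form a prefix code, at most one entry can match at any position, so the order of the checks only affects speed, not the result.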
Approach 2: Binary tree
I actually ran across this approach implemented in the Serial Experiments Lain PlayStation game. You have a tree of conditionals testing the value of each bit until a match is found.
If ReadNextBit() == '1'
    If ReadNextBit() == '0'
        return END_OF_BLOCK
    Else
        If ReadNextBit() == '0'
            return ('11+0'.ZeroRun, '11+0'.AC)
        Else
            return ('11+1'.ZeroRun, '11+1'.AC)
        End If
    End If
Else
    If ReadNextBit() == '1'
        If ReadNextBit() == '1'
            // '011s'
            ...
        Else
            // '010... and so on
        End If
    Else
        // '00... and so on
    End If
End If
The branching can be optimized a bit for most leaves: once the length of the bit code is clear, the remaining bits can be used as the index in several small lookup tables. The jPSXdec implementation only requires (in the worst case) 12 branches to determine the longest bit codes.
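The same walk can also be expressed as data instead of hard-coded conditionals. The sketch below is a variant I'm assuming for illustration (it is neither the Lain code nor the jPSXdec code): it builds a binary tree from bit-code strings and follows one branch per bit until it reaches a leaf. All names are hypothetical.

```java
/** A data-driven binary tree for variable-length codes (illustrative only). */
class VlcTree {
    static final class Node {
        Node zero, one;
        String label; // non-null only on leaves
    }

    final Node root = new Node();

    /** Adds one code, creating the branch nodes it needs along the way. */
    void add(String codeBits, String label) {
        Node n = root;
        for (char c : codeBits.toCharArray()) {
            if (c == '1') {
                if (n.one == null) n.one = new Node();
                n = n.one;
            } else {
                if (n.zero == null) n.zero = new Node();
                n = n.zero;
            }
        }
        n.label = label;
    }

    /** Follows one branch per bit; returns the leaf label, or null on a dead end. */
    String decode(String bits, int pos) {
        Node n = root;
        while (n != null && n.label == null && pos < bits.length()) {
            n = (bits.charAt(pos++) == '1') ? n.one : n.zero;
        }
        return (n == null) ? null : n.label;
    }

    public static void main(String[] args) {
        VlcTree tree = new VlcTree();
        tree.add("10", "END_OF_BLOCK");
        tree.add("011", "011s");
        System.out.println(tree.decode("0110", 0));
        System.out.println(tree.decode("10", 0));
    }
}
```

The number of branches taken equals the length of the code, which is why replacing the tail of long codes with small lookup tables cuts the worst case.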
Approach 3: Array lookup
I believe this type of approach is used in ffmpeg and the Q-gears decoder. Thanks to the unspoken tradition of never documenting anything, I was unable to understand what it was doing. It wasn't until I reverse-engineered the .iki bitstream parsing that I finally saw how this approach works.
At least for MPEG-1 (and PSX STR), you can take advantage of its particular set of variable length bit codes. Only the first code ('11s') and the end-of-block code ('10') need special parsing. The rest of the codes fall under one of three groups. The group a code belongs to can be determined by looking at how many initial zeros it has.
- Group one starts with between 1 and 4 zeros (this also includes the escape code 000001).
- Group two starts with between 6 and 8 zeros.
- Group three starts with between 9 and 11 zeros.
All codes in their groups:
[Special handling]
10 // end-of-block
11s
---- [Group 1] ----
0 11s
0 100s
0 101s
0 0101s
0 0110s
0 0111s
0 00100s
0 00101s
0 00110s
0 00111s
0 000100s
0 000101s
0 000110s
0 000111s
0 00001 // escape code
0 0100000s
0 0100001s
0 0100010s
0 0100011s
0 0100100s
0 0100101s
0 0100110s
0 0100111s
---- [Group 2] ----
000000 1000s
000000 1001s
000000 1010s
000000 1011s
000000 1100s
000000 1101s
000000 1110s
000000 1111s
000000 010000s
000000 010001s
000000 010010s
000000 010011s
000000 010100s
000000 010101s
000000 010110s
000000 010111s
000000 011000s
000000 011001s
000000 011010s
000000 011011s
000000 011100s
000000 011101s
000000 011110s
000000 011111s
000000 0010000s
000000 0010001s
000000 0010010s
000000 0010011s
000000 0010100s
000000 0010101s
000000 0010110s
000000 0010111s
000000 0011000s
000000 0011001s
000000 0011010s
000000 0011011s
000000 0011100s
000000 0011101s
000000 0011110s
000000 0011111s
---- [Group 3] ----
000000000 10000s
000000000 10001s
000000000 10010s
000000000 10011s
000000000 10100s
000000000 10101s
000000000 10110s
000000000 10111s
000000000 11000s
000000000 11001s
000000000 11010s
000000000 11011s
000000000 11100s
000000000 11101s
000000000 11110s
000000000 11111s
000000000 010000s
000000000 010001s
000000000 010010s
000000000 010011s
000000000 010100s
000000000 010101s
000000000 010110s
000000000 010111s
000000000 011000s
000000000 011001s
000000000 011010s
000000000 011011s
000000000 011100s
000000000 011101s
000000000 011110s
000000000 011111s
000000000 0010000s
000000000 0010001s
000000000 0010010s
000000000 0010011s
000000000 0010100s
000000000 0010101s
000000000 0010110s
000000000 0010111s
000000000 0011000s
000000000 0011001s
000000000 0011010s
000000000 0011011s
000000000 0011100s
000000000 0011101s
000000000 0011110s
000000000 0011111s
Each group has its own lookup table of 256 entries, and each code will be associated with one or more entries in the lookup table. After stripping off the minimum number of zeros in the group, no code in the group will have more than 8 bits remaining. For codes that have exactly 8 bits remaining, that value is the associated table index. For codes that have fewer than 8 bits remaining, you have to walk through every combination of the unused low bits to find all associated indexes.
Example:
Use 0 for sign bit for now: 001100
Strip off first leading 0: 01100
Find all combinations of remaining bits:
01100+000 = 96 (table index)
01100+001 = 97
01100+010 = 98
01100+011 = 99
01100+100 = 100
01100+101 = 101
01100+110 = 102
01100+111 = 103
Thus bit code 00110s will be associated with table indexes 96-103.
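The index range in that example can be computed directly rather than by enumerating combinations. A small sketch, with a hypothetical helper name, that takes the bits left after stripping the group's leading zeros (sign bit included):

```java
/** Computes which 8-bit table indexes a partially specified code occupies. */
class TableIndexes {
    /**
     * remainingBits: the code after stripping the group's leading zeros,
     * sign bit included, as a string of 1 to 8 binary digits.
     * Returns { firstIndex, lastIndex }.
     */
    static int[] indexRange(String remainingBits) {
        if (remainingBits.isEmpty() || remainingBits.length() > 8) {
            throw new IllegalArgumentException("need 1 to 8 bits");
        }
        int freeBits = 8 - remainingBits.length();
        int first = Integer.parseInt(remainingBits, 2) << freeBits; // unused bits all 0
        int last = first | ((1 << freeBits) - 1);                   // unused bits all 1
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        int[] range = indexRange("01100");
        System.out.println(range[0] + "-" + range[1]); // prints 96-103
    }
}
```

Every index in that range gets the same table entry, so whatever bits follow the code in the stream still land on the right entry.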
Now each table entry needs three values: the run of zero-value alternating current (AC) coefficients that will be fed to the inverse discrete cosine transform (IDCT), the non-zero AC coefficient value, and the number of bitstream bits that should be skipped.
Once all three tables are constructed, the following pseudo code will parse your bitstream.
If ReadNextBit() == '1'
    If ReadNextBit() == '0'
        return END_OF_BLOCK
    Else
        If ReadNextBit() == '0'
            return ('11+0'.ZeroRun, '11+0'.AC)
        Else
            return ('11+1'.ZeroRun, '11+1'.AC)
        End If
    End If
Else
    // The first zero bit has already been read, so the leading-zero
    // counts here are one less than the totals listed for each group
    Next16Bits = Peek16Bits()
    If NumberOfLeadingZeros(Next16Bits) <= 4
        Match = LookupTable1[(Next16Bits >> 8) & 0xff]
    Else If NumberOfLeadingZeros(Next16Bits) <= 7
        Match = LookupTable2[(Next16Bits >> 3) & 0xff]
    Else If NumberOfLeadingZeros(Next16Bits) <= 10
        Match = LookupTable3[Next16Bits & 0xff]
    Else
        // bitstream error
    End If
    If Match == ESCAPE_CODE
        SkipBits(ESCAPE_CODE.BitLength)
        ParseEscapeCode()
    Else
        SkipBits(Match.BitLength)
        return (Match.ZeroRun, Match.AC)
    End If
End If
Of course the implementation details can vary, but this gives the idea. The Approach 3 I implemented for jPSXdec requires about 8 conditionals to identify a bit code in the worst case. I've found it to be about 10%-15% faster than the Approach 2 I've been using.
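The group selection and index extraction can be sketched in Java as follows. This is an illustration under my own assumptions, not the jPSXdec source: the 16 bits are assumed to be peeked after the first zero bit of the code has already been consumed, so the three groups show up as 0-4, 5-7 and 8-10 leading zeros in the peeked value. The class and method names are hypothetical.

```java
/** Picks the lookup table and index for 16 bits peeked after the first zero bit. */
class VlcGroup {
    /** Leading zeros within a 16-bit value (16 if the value is zero). */
    static int leadingZeros16(int next16) {
        return Integer.numberOfLeadingZeros(next16 & 0xFFFF) - 16;
    }

    /** Returns 1, 2 or 3 for the lookup table to use, or -1 on a bitstream error. */
    static int group(int next16) {
        int zeros = leadingZeros16(next16);
        if (zeros <= 4) return 1;
        if (zeros <= 7) return 2;
        if (zeros <= 10) return 3;
        return -1;
    }

    /** Extracts the 8 bits that index into the chosen table. */
    static int tableIndex(int next16) {
        switch (group(next16)) {
            case 1: return (next16 >> 8) & 0xFF; // 8 bits right after the first zero
            case 2: return (next16 >> 3) & 0xFF; // 8 bits after 6 zeros in total
            case 3: return next16 & 0xFF;        // 8 bits after 9 zeros in total
            default: throw new IllegalArgumentException("bitstream error");
        }
    }

    public static void main(String[] args) {
        // Code 00110s with s=0, first zero already read, followed by arbitrary bits.
        int next16 = 0b0110_0101_0000_0000;
        System.out.println(group(next16) + " " + tableIndex(next16)); // prints 1 101
    }
}
```

Index 101 falls inside the 96-103 range worked out earlier for code 00110s, so the lookup lands on the right entry no matter which bits follow the code.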
Monday, September 6, 2010
PlayStation Video Decoders: The Final Showdown
Most importantly, I finally captured what it ACTUALLY looks like on PlayStation hardware (in the dead center).
Those on top get it (more) correct, those on the bottom get it (more) wrong (and ffmpeg and Q-gears are just weird). Naturally jPSXdec dominates in quality and accuracy. :)
The lineup:
- PSXPlay
- a fixed version of PCSX by Gabriele Gorla
- jPSXdec
- MAME
- PsxMC
- PSmplay
- PsxTulz
- reevengi
- PCSX
- ffmpeg
- Q-gears
Thursday, August 19, 2010
Immaculate Decoding
Upsampling
When PlayStation videos are created, the pixels are broken up into luma (brightness) components and chroma (color) components. Like with JPEG and MPEG formats, 3/4 of the chroma information is thrown away because the human eye can't really tell (this is an example of lossy compression).
When decoding, that lost chroma information needs to be recreated somehow to convert the pixels back into RGB. Unfortunately there is no one 'right' way to do it, because there's really no way to get that lost information back. All you can do is 'guess' by filling in the blanks based on the information around the pixels using some kind of interpolation. Some of the most well known kinds of interpolation are: nearest neighbor, bilinear, bicubic, and Lanczos.
I've read about more advanced chroma upsampling approaches that also take into account the luma component. This works because there is often a lot of correlation between the luma and chroma components--when the luma changes, the chroma probably will also. I'd like to try to find the best one, but I haven't had much luck finding good resources about them all.
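To make the difference between two of those interpolation methods concrete, here is a small Java sketch that doubles a chroma plane in both directions with nearest neighbor and with bilinear interpolation. It is only an illustration: the names are made up, and the bilinear version assumes each chroma sample sits in the center of its 2x2 block of pixels, which is one possible siting among several.

```java
/** Doubles a chroma plane in both directions (illustrative sketch). */
class ChromaUpsample {
    /** Nearest neighbor: each chroma sample is repeated over a 2x2 block. */
    static int[][] nearest(int[][] chroma) {
        int h = chroma.length, w = chroma[0].length;
        int[][] out = new int[h * 2][w * 2];
        for (int y = 0; y < h * 2; y++) {
            for (int x = 0; x < w * 2; x++) {
                out[y][x] = chroma[y / 2][x / 2];
            }
        }
        return out;
    }

    /** Bilinear: blends the four surrounding chroma samples, clamping at the edges. */
    static double[][] bilinear(int[][] chroma) {
        int h = chroma.length, w = chroma[0].length;
        double[][] out = new double[h * 2][w * 2];
        for (int y = 0; y < h * 2; y++) {
            double sy = clamp(y / 2.0 - 0.25, h - 1); // position in chroma coordinates
            int y0 = (int) sy, y1 = Math.min(y0 + 1, h - 1);
            double fy = sy - y0;
            for (int x = 0; x < w * 2; x++) {
                double sx = clamp(x / 2.0 - 0.25, w - 1);
                int x0 = (int) sx, x1 = Math.min(x0 + 1, w - 1);
                double fx = sx - x0;
                double top = chroma[y0][x0] * (1 - fx) + chroma[y0][x1] * fx;
                double bottom = chroma[y1][x0] * (1 - fx) + chroma[y1][x1] * fx;
                out[y][x] = top * (1 - fy) + bottom * fy;
            }
        }
        return out;
    }

    static double clamp(double v, int max) {
        return Math.max(0, Math.min(v, max));
    }

    public static void main(String[] args) {
        int[][] chroma = { { 0, 100 } };
        System.out.println(java.util.Arrays.toString(nearest(chroma)[0]));
        System.out.println(java.util.Arrays.toString(bilinear(chroma)[0]));
    }
}
```

Nearest neighbor keeps the hard step between the two samples, while bilinear spreads it over the pixels in between. Neither recovers the discarded information; they just guess differently.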
Now, because this is essentially just scaling of a 2D image, I've been worried about this article that points out a nasty little gremlin called gamma correction. It seems nearly everyone has been doing image scaling wrong since the popularization of the sRGB gamma corrected color space. I'm assuming video isn't immune to the same problem, yet I've never seen anyone mention it.
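A tiny Java sketch of the gamma problem, using the sRGB transfer function purely as an illustration (video normally uses a different transfer curve, and the names here are my own): averaging two encoded values directly gives a different answer than converting to linear light, averaging, and converting back.

```java
/** Compares naive averaging with averaging in linear light (sRGB curve assumed). */
class GammaAverage {
    /** Decodes an 8-bit sRGB value to linear light in [0, 1]. */
    static double srgbToLinear(int v) {
        double c = v / 255.0;
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Encodes linear light in [0, 1] back to an 8-bit sRGB value. */
    static int linearToSrgb(double l) {
        double c = (l <= 0.0031308) ? l * 12.92 : 1.055 * Math.pow(l, 1 / 2.4) - 0.055;
        return (int) Math.round(c * 255);
    }

    /** Averages the encoded values directly (the common shortcut). */
    static int naiveAverage(int a, int b) {
        return (a + b + 1) / 2;
    }

    /** Averages in linear light, then re-encodes. */
    static int linearAverage(int a, int b) {
        return linearToSrgb((srgbToLinear(a) + srgbToLinear(b)) / 2);
    }

    public static void main(String[] args) {
        System.out.println("naive:  " + naiveAverage(0, 255));
        System.out.println("linear: " + linearAverage(0, 255));
    }
}
```

For black and white the naive average is 128, while the linear-light average comes out noticeably brighter. Any interpolating scaler that blends encoded values makes the same kind of error on every blended pixel.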
Deblocking
Assuming we find the upsampling method of choice, there are still ways the image can be improved. Most video codecs break the frames down into 'blocks', then encode each block separately--again losing information along the way. When everything is reconstructed, that lost information can often be seen as visible distortions between blocks. This problem has been addressed in more recent video codecs such as H.264, but is still a problem with the older MPEG2. I believe nearly all DVD players do some deblocking before showing the final frame.
Even though MPEG2 has been around a long time and deblocking is so common, I've had the darndest time trying to find much mention of what deblocking algorithms are in use today. UnBlock, and this page on JPEG Post-Processing are the best I've come by. I think I've read somewhere that some advanced deblockers can even make use of the original MPEG2 data to improve the deblocking.
I still consider myself a multimedia novice, so there are probably more post-processing methods that would really make the output shine. A big bummer among all research in that area is that if you can think it, you can pretty much count on it having been patented.
Given how difficult all this stuff is, I really really wish I could just pass that problem off to the big players in the field, such as ffmpeg (i.e. libavcodec). I've even considered writing a PSX video to MPEG2 video translator so the MPEG2 video can be fed into ffmpeg. Unfortunately there are some big reasons why doing this still makes me uneasy.
IDCT
The PlayStation uses its own particular IDCT approach that I've never seen anywhere else. It is important that the IDCT used to decode matches the DCT used to encode the video, and no existing decoder can do that (except jPSXdec of course).
Differences in YCbCr
Another worry is that a really good quality MPEG2 decoder will spatially position the chroma components in the proper location (vertically aligned with every other luma component) as opposed to how I believe PSX does it (the MPEG1 way: in-between luma components).
To make things a bit more complicated, MPEG2 uses the proper Rec.601 Y'CbCr color space with [16-235] luma, and [16-240] chroma range. PSX on the other hand, uses the full [0-255] range for color information. Many video converters don't handle that discrepancy very well. Related to that, pretty much all converters store the data as integers, so any fractional information is lost after every conversion. In contrast, jPSXdec maintains all that fractional information until the very end.
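The cost of storing intermediate results as integers is easy to demonstrate. The sketch below is an assumption-laden illustration, not how any particular converter works: it squeezes full-range luma [0-255] into the [16-235] range with rounding, expands it back, and counts how many of the 256 values fail to survive the round trip.

```java
import java.util.HashSet;
import java.util.Set;

/** Round-trips full-range luma through the [16, 235] range using integers. */
class RangeRoundTrip {
    /** Full range [0, 255] to limited range [16, 235], rounded to an integer. */
    static int toLimited(int full) {
        return (int) Math.round(16 + full * 219.0 / 255.0);
    }

    /** Limited range [16, 235] back to full range [0, 255], rounded to an integer. */
    static int toFull(int limited) {
        return (int) Math.round((limited - 16) * 255.0 / 219.0);
    }

    /** Number of distinct limited-range values that the 256 inputs map to. */
    static int distinctLimited() {
        Set<Integer> seen = new HashSet<>();
        for (int y = 0; y < 256; y++) {
            seen.add(toLimited(y));
        }
        return seen.size();
    }

    /** Number of inputs that do not come back unchanged. */
    static int countChanged() {
        int changed = 0;
        for (int y = 0; y < 256; y++) {
            if (toFull(toLimited(y)) != y) changed++;
        }
        return changed;
    }

    public static void main(String[] args) {
        System.out.println("distinct limited values: " + distinctLimited());
        System.out.println("values changed by round trip: " + countChanged());
    }
}
```

Since 256 inputs have to share 220 integer slots, at least 36 of them must collide with a neighbor, and that is after a single conversion. Keeping the values as floating point until the final RGB output avoids stacking up these errors.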
In general though, I have not been impressed with ffmpeg's quality, so I can't suggest people use it when looking for good quality.
One advantage that comes when incorporating all these enhancements in jPSXdec is it provides a much nicer user experience. No need to be hopping between multiple tools to get the best results.
So if I were to actually implement all these features, where would I get the information I lack? Perhaps the doom9.org forums could help. If any multimedia gurus happen to pass by this post, please, if you could, toss some wisdom my way.
Friday, March 26, 2010
YCbCr to RGB Conversion Showdown
The Rec.601 YCbCr to RGB equation is defined as such:
Given Y color range of [16, 235] and Cb,Cr color range of [16, 240].

[ 1.164   0      1.596 ]   [ Y  -  16 ]   [ r ]
[ 1.164  -0.391 -0.813 ] * [ Cb - 128 ] = [ g ]
[ 1.164   2.018  0     ]   [ Cr - 128 ]   [ b ]
You can generate a table of the YCbCr to RGB conversion using this bit of code. Values outside valid YCbCr ranges are simply mapped to white.
public class YCbCrAndRgb {
    public static void main(String[] args) {
        for (int y = 0; y < 256; y++) {
            for (int cb = 0; cb < 256; cb++) {
                for (int cr = 0; cr < 256; cr++) {
                    if (y >= 16 && cb >= 16 && cr >= 16 &&
                        y <= 235 && cb <= 240 && cr <= 240)
                    {
                        // Rec.601 equation with floating-point math, rounded at the end
                        int r = (int)Math.round( (y - 16) * 1.164 + (cr - 128) * 1.596 );
                        int g = (int)Math.round( (y - 16) * 1.164 + (cb - 128) * -0.391 + (cr - 128) * -0.813 );
                        int b = (int)Math.round( (y - 16) * 1.164 + (cb - 128) * 2.018 );
                        // Clamp to the valid RGB range
                        if (r < 0) r = 0; else if (r > 255) r = 255;
                        if (g < 0) g = 0; else if (g > 255) g = 255;
                        if (b < 0) b = 0; else if (b > 255) b = 255;
                        System.out.format("%02x%02x%02x\t%02x%02x%02x", y, cb, cr, r, g, b);
                        System.out.println();
                    } else {
                        // Outside the valid YCbCr ranges: map to white
                        System.out.format("%02x%02x%02x\tffffff", y, cb, cr);
                        System.out.println();
                    }
                }
            }
        }
    }
}
Using a pixel format without subsampling should let me convert pixels without blending interfering; however, ffmpeg still adds blending even to 4:4:4, which would distort the results. So instead I generated several AVIs with small dimensions (8x8) with fourcc YV12 pixel format (4:2:0), each frame containing one solid color. That came out to 256 AVI files, each with 256*256 frames.
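For reference, the raw data of one such solid-color frame is simple to build. This Java sketch (hypothetical names, and only the frame payload, not the AVI container around it) lays out an 8x8 YV12 frame: a full-resolution Y plane, followed by the quarter-size V (Cr) plane, then the quarter-size U (Cb) plane.

```java
import java.util.Arrays;

/** Builds the raw bytes of one solid-color YV12 (4:2:0) frame. */
class SolidYv12Frame {
    static byte[] solidFrame(int width, int height, int y, int cb, int cr) {
        int lumaSize = width * height;
        int chromaSize = (width / 2) * (height / 2);
        byte[] frame = new byte[lumaSize + 2 * chromaSize];
        // YV12 plane order: Y, then V (Cr), then U (Cb)
        Arrays.fill(frame, 0, lumaSize, (byte) y);
        Arrays.fill(frame, lumaSize, lumaSize + chromaSize, (byte) cr);
        Arrays.fill(frame, lumaSize + chromaSize, frame.length, (byte) cb);
        return frame;
    }

    public static void main(String[] args) {
        byte[] frame = solidFrame(8, 8, 0x80, 0x40, 0xC0);
        System.out.println(frame.length + " bytes per frame"); // prints 96 bytes per frame
    }
}
```

Because every sample in a plane has the same value, any blending the converter does between neighboring chroma samples cannot change the result, which is the point of using solid-color frames.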
Those AVIs were fed through ffmpeg and VirtualDub and converted to uncompressed RGB AVIs. This ffmpeg command converts YCbCr to RGB AVI.
ffmpeg -i inYCbCr.avi -vcodec rawvideo -pix_fmt bgr24 outRgb.avi
Under VirtualDub's Video->Color Depth menu you can set the output pixel format.
A little script walked through every AVI and pulled out the first RGB pixel of each frame and associated it with the original YCbCr color.
At that point I had a table with 256^3 rows and 4 columns:
- Original YCbCr color
- RGB generated with the standard equation and floating-point math
- RGB generated with VirtualDub
- RGB generated with ffmpeg
[Image: Floating-point]
[Images: VirtualDub | ffmpeg]
[Images: Floating-point vs. VirtualDub | Floating-point vs. ffmpeg]
Now some numbers.
- VirtualDub has 1795792 pixels (11%) different from the floating-point conversion.
- ffmpeg has 10827725 pixels (65%) different from the floating-point conversion.
I'm disappointed but not surprised that there are so many 1-off values in general. But ffmpeg's variance is as much as -3?? Wow, I hope I'm doing something wrong because that's pretty bad.
In the rare case someone has over an hour and 10GB to spare, along with various strange prerequisites, you can download the scripts used to generate these details.
FFmpeg version SVN-r22107, Copyright (c) 2000-2010 the FFmpeg developers
built on Feb 28 2010 06:11:15 with gcc 4.4.2
configuration: --enable-memalign-hack --cross-prefix=i686-mingw32- --cc=ccache-i686-mingw32-gcc --arch=i686 --target-os=mingw32 --enable-runtime-cpudetect --enable-avisynth --enable-gpl --enable-version3 --enable-bzlib --enable-libgsm --enable-libfaad --enable-pthreads --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libmp3lame --enable-libopenjpeg --enable-libxvid --enable-libschroedinger --enable-libx264 --enable-libopencore_amrwb --enable-libopencore_amrnb
libavutil 50. 9. 0 / 50. 9. 0
libavcodec 52.55. 0 / 52.55. 0
libavformat 52.54. 0 / 52.54. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
Saturday, March 13, 2010
IDCT Demystified (a little)
(apologies if I messed up the notation)
For the longest time I let the IDCT remain a black box. I found a handful of Java IDCT implementations, plugged them in, and crossed my fingers.
I know what the 2D DCT does: it pushes all the image data to the top left corner of the block, while the IDCT undoes that magic. I'm not sure how it does this, but just knowing what it does is enough for me.
But recently I finally discovered that the IDCT is simply a couple of matrix multiplications.
idct_matrix^T . coefficients . idct_matrix
The IDCT equation doesn't really suggest that to the casual mathematician. Of course if you take a class or pay for a book on the subject, maybe this is old news to you.
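Spelled out in Java, the two multiplications look like the sketch below. It assumes the standard orthonormal 8x8 DCT-II matrix with one row per frequency, which may not match the orientation or scaling of every implementation (and it is not the PlayStation's own IDCT). Names are hypothetical.

```java
/** 8x8 IDCT written as two plain matrix multiplications (illustrative sketch). */
class MatrixIdct {
    /** Rows are frequencies u, columns are sample positions x. */
    static double[][] idctMatrix() {
        double[][] m = new double[8][8];
        for (int u = 0; u < 8; u++) {
            double scale = (u == 0) ? 1 / Math.sqrt(8) : 0.5;
            for (int x = 0; x < 8; x++) {
                m[u][x] = scale * Math.cos((2 * x + 1) * u * Math.PI / 16);
            }
        }
        return m;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[8][8];
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] p = new double[8][8];
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                for (int k = 0; k < 8; k++) {
                    p[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return p;
    }

    /** idct_matrix^T . coefficients . idct_matrix */
    static double[][] idct(double[][] coefficients) {
        double[][] m = idctMatrix();
        return multiply(multiply(transpose(m), coefficients), m);
    }

    public static void main(String[] args) {
        double[][] coefficients = new double[8][8];
        coefficients[0][0] = 8; // DC only: every output pixel should be 1 (up to rounding error)
        double[][] pixels = idct(coefficients);
        System.out.println(pixels[0][0] + " " + pixels[7][7]);
    }
}
```

With only the DC coefficient set, the output is a flat block, which is a handy sanity check for the scaling. This direct form needs 1024 multiplications per block, which is why fast IDCT implementations exist at all.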
For those uninformed like me, let's take a closer look at this IDCT matrix.

Theoretically we could throw a bunch of trigonometric identities at this matrix to simplify it, but it turns out to be so much easier to just calculate it and see which decimal values are the same. In the end, there turn out to be only 7 unique values (listed here in varying forms).
1/sqrt(8)      = cos(  PI/ 4)/2
cos(1*PI/16)/2 = cos(  PI/16)/2 = sqrt(2+sqrt(2+sqrt(2)))/4
cos(2*PI/16)/2 = cos(  PI/ 8)/2 = sqrt(2+sqrt(2))/4
cos(3*PI/16)/2 = cos(3*PI/16)/2 = sqrt(2+sqrt(2-sqrt(2)))/4
cos(5*PI/16)/2 = cos(5*PI/16)/2 = sqrt(2-sqrt(2-sqrt(2)))/4
cos(6*PI/16)/2 = cos(3*PI/ 8)/2 = sqrt(2-sqrt(2))/4
cos(7*PI/16)/2 = cos(7*PI/16)/2 = sqrt(2-sqrt(2+sqrt(2)))/4
Now the IDCT matrix can be simplified to this:
Taking things a step further, let's multiply the two IDCT matrix multiplications out (Maxima is awesome). After a lot of trigonometric simplification, it turns into a massive matrix. This tiny portion below resembles what the entire matrix looks like.
You can download the full 30,000 pixel wide image if you dare.
All those additions/subtractions help to explain why fast IDCT implementations consist of so many sums and only occasional multiplications.
The bare math still makes it difficult to identify patterns, so I took things to the extreme and visualized it a bit.




