Reverse Engineering Flash Games
by Javantea
Dec 7, 2014
Unproprietary 0.4: Nov 25, 2015
Unproprietary 0.5: Sept 11, 2016
unproprietary-0.5 [sig]
unproprietary-0.4 [sig]
unproprietary-0.3 [sig]
unproprietary-0.2.1 [sig]
Git repository: git clone https://www.altsci.com/repo/unproprietary.git
Lume is a simple point-and-click Flash game available from Steam and Humble Bundle. I got it as part of the Humble Weekly Sale: Amanita & Friends bundle and played it because I wanted a short puzzle game one night. Since it's only 30 MB, it's pretty much guaranteed to be a short game. It took an hour or so to complete and had some excellent puzzles. One of the main features of the game is its graphics, which were made by a good artist with good style. Today, I was able to reverse engineer the game in a short amount of time using some custom tools I wrote, so I'm going to release them and ask for pull requests. Reverse engineering file formats is not a difficult process, but it is time-consuming and hard to automate, so tools that do the work for us are valuable. That is why I'm releasing this simple set of tools.
If you'd like to follow along, you can buy Lume on Humble Store for $5.99. It supports Linux, Mac, and Windows. Lume has a Metacritic score of 69 and a high score of 83 by GameShark. A sequel was released recently called Lumino City (5 days ago) and it has gotten good reviews. It looks brilliant but it isn't released for Linux yet.
Let's get to it.
Method
First, we are only given an ELF file for Lume. This makes it easy to choose what to use to reverse it. Strings is an important part of the reverse engineering toolset. It prints any ASCII strings in the file. You can also use flags to get UTF-16 strings, which are common in Windows programs and Mono programs.
strings -a -n 10 Lume |less
There are a huge number of strings, but this should not stop us. Skip to the bottom and work your way up. These are strings from the game. After a while, we see a ton of "LAME3.98.2". This is the version tag that the LAME MP3 encoder leaves in its output. This means we can pull out all the MP3s with a tool that can detect them. We also see HTML, which is a sign of Adobe Flash. It isn't proof that they're using Flash because a lot of people use HTML, but it's a good sign. We also see "Adobe Photoshop CS Windows" along with "IEC http://www.iec.ch", which is a pretty clear sign of JPEG files. When you see "GCC: (GNU) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)", you're at the bottom of the ELF, so you don't have to go further up unless you want to.
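What strings does here can be sketched in a few lines of Python, which is handy when you want the offset of each hit in the same script. This is a minimal sketch, not the real strings implementation: ascii_strings is a hypothetical helper, and it only treats the bytes from space through tilde as printable.

```python
import re


def ascii_strings(data: bytes, min_len: int = 10):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # Space (0x20) through tilde (0x7e) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Made-up sample bytes: one long string and one that is too short to report.
sample = b"\x00\x01LAME3.98.2\x00abc\x00"
for offset, text in ascii_strings(sample):
    print(offset, text)
```

On a real file you would pass the bytes read with open("Lume", "rb").read() and the same minimum length of 10 used in the strings command above.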
The next step is to look at the executable itself. Objdump is a common tool to disassemble and look at different parts of the executable. It's a part of binutils and should be on every system. If it isn't, install it. You never know when you will need to disassemble a binary.
objdump -x Lume
The program headers and the section headers don't go past 10500519. Suspicious, right? The file is 30MB, but the ELF only describes 10MB. Our hex editor shows that the game stuff starts at 10MB. I use KDE's hex editor, okteta, but you just need a hex editor that works.
okteta Lume
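The gap between what the ELF headers describe and the real file size can also be estimated from the ELF header itself. In many ELF files the section header table is the last thing the ELF describes, so e_shoff + e_shentsize * e_shnum is a reasonable guess at where appended data begins. This is a minimal sketch under that assumption (elf_end_offset is a hypothetical helper, not part of unproprietary), and a hex editor is still the way to confirm the real offset.

```python
import struct


def elf_end_offset(header: bytes) -> int:
    """Guess where the ELF image ends: the end of the section header table."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2               # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64bit:
        (e_shoff,) = struct.unpack_from(endian + "Q", header, 0x28)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", header, 0x3A)
    else:
        (e_shoff,) = struct.unpack_from(endian + "I", header, 0x20)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", header, 0x2E)
    return e_shoff + e_shentsize * e_shnum
```

The first 64 bytes of the file are enough input for either ELF class. If the result is far from what the hex editor shows, the assumption about the section header table does not hold for that binary.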
Note the offset at which the interesting stuff starts. In this case it's a03a90. Since it's hex, you'll want the decimal version of that number. You can use Python to do that. Simply type 0xa03a90 at the Python prompt and it will give you the decimal value. The opposite operation (decimal to hex) is done with the hex function: hex(10500752) == '0xa03a90'. Note that it returns a string, which is the form you will usually want in your programs.
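The same conversions as a short script, in case you want them in a file rather than at the prompt. Only built-in functions are used.

```python
# Hex literal to decimal: the offset read from the hex editor.
offset = 0xa03a90
print(offset)

# Decimal back to hex; hex() returns a string.
print(hex(offset))

# A hex string without the 0x prefix (as copied from a hex editor) to an integer.
from_editor = int("a03a90", 16)
print(from_editor == offset)
```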
Now we get to use the first program from the file format reversing suite I wrote, unproprietary. Find_compress1 is a simple program that uses some of my knowledge of file formats to automate reversing them. It steps through a file and tries to decide whether there is a known file at each position. It can detect swf, mp3, jpeg, png, gif, ogg, elf, zip, wav, tar, rar, sh, bzip, gzip, xz, and zlib. Those are very common in firmware, games, and custom archive formats for software. These formats are so common because free or widely available libraries and frameworks let programmers add compressed formats to their software for little or no cost and save megabytes or gigabytes compared to raw data. There are hundreds of other common formats used by programs, which is why I wrote Magic1. Magic1 uses the immense magic library to find every possible file format. It can find incredibly old formats, incredibly inane formats, and even file types that have no format at all. That makes its false positive rate so high that it's unusable for many applications, but incredibly useful for certain ones. Unlike Find_compress1, Magic1 doesn't have an offset flag yet, so it can't be used for large files with the data at 10MB, but that will be an easy upgrade when someone needs that functionality.
python find_compress1.py -m -o 10500752 Lume >find_Lume_10500752.txt
File size: 31062179
Lume: zlib (offset: 10500888)
Lume: zlib (offset: 10501296)
Lume: zlib (offset: 10501377)
Lume: swf (offset: 10501924)
Lume: zlib (offset: 10503705)
You can see that Find_compress1 has a lot of false positives in finding zlib, but you can filter those out if you like. It found the SWF instantly, which makes our job very easy. If you look at the end of the find_Lume_10500752.txt file, you can see that the last position it found something is 11548811. That's because it only attempts 1MB at a time. You can change that by using the -l flag, but I recommend searching in 1MB chunks unless you just want results (I often do that).
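To make the idea of stepping through a file concrete, here is a minimal sketch of a signature scan. It is not the code in find_compress1.py: the scan function, its start and length parameters, and the short signature table are assumptions for illustration. It does show why zlib produces so many false positives: the zlib header is only two bytes with a weak checksum, so random data matches it often.

```python
SIGNATURES = {
    b"FWS": "swf",
    b"CWS": "swf",                  # zlib-compressed SWF
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"OggS": "ogg",
    b"PK\x03\x04": "zip",
}


def looks_like_zlib(data: bytes, offset: int) -> bool:
    """Check the two-byte zlib header: deflate with a 32K window, checksum divisible by 31."""
    if offset + 1 >= len(data) or data[offset] != 0x78:
        return False
    return (data[offset] * 256 + data[offset + 1]) % 31 == 0


def scan(data: bytes, start: int = 0, length: int = 1024 * 1024):
    """Return (offset, name) for every signature found in data[start:start + length]."""
    hits = []
    end = min(len(data), start + length)
    for offset in range(start, end):
        for magic, name in SIGNATURES.items():
            if data.startswith(magic, offset):
                hits.append((offset, name))
        if looks_like_zlib(data, offset):
            hits.append((offset, "zlib"))
    return hits


# Made-up sample: padding, a compressed-SWF signature, padding, a JPEG start marker.
blob = b"\x00" * 5 + b"CWS" + b"\x00" * 4 + b"\xff\xd8\xff\xe0"
print(scan(blob))
```

A real detector would go on to parse each candidate to weed out false positives, which is where most of the work lies.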
Now that we know where the SWF is, we can use the script Skipto to pull it out of the ELF file. Skipto takes a filename and an offset and returns the data at that offset (to the end of the file) to standard output. It's a useful tool and works for things like this.
python3 skipto.py Lume 10501924 >Lume.swf
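The operation Skipto performs is small enough to sketch. This is not the skipto.py shipped with unproprietary, only an illustration of the technique: seek to the offset and copy the rest in chunks. The copy_from_offset name and the chunk size are assumptions.

```python
import io


def copy_from_offset(src, offset, out, chunk_size=65536):
    """Copy everything from `offset` to the end of `src` into `out`."""
    src.seek(offset)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)


# In-memory demonstration with made-up bytes. A real run would pass
# open("Lume", "rb") as src and sys.stdout.buffer as out.
source = io.BytesIO(b"ELF-part" + b"FWS-part")
result = io.BytesIO()
copy_from_offset(source, 8, result)
print(result.getvalue())
```

Copying in chunks keeps memory use flat even when the tail of the file is tens of megabytes.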
At this point we have an SWF file which we can work with. swfdump from swftools is a pretty handy tool, but doesn't give us what we really want. You can skip this command if you're not interested in the file structure of an SWF.
swfdump Lume.swf |less
The program swftopython on the other hand is an essential tool in reverse engineering Flash games. It's practically a reverse engineering tool for SWF. Along with swfextract from swftools, you can do a ton, which we will do. swftopython comes from ming, which is a Flash library with many language bindings. If you want to make a simple SWF file, ming is the place to be.
swftopython Lume.swf >Lume.py
Look at the output of swftopython and be amazed. First of all, it's the entire Flash (minus some important embedded files) in Python syntax. You can see vector graphics, animation, and so on. The converter doesn't do everything correctly, so there are plenty of bugs to work out, but the big conversion is done. You can get all the text from the game using this command:
grep addString Lume.py >Lume.txt
Images are among the most important assets in a game. You can get them using this hackish code:
swf=Lume.swf
grep swfextract Lume.py |cut -c 3-|sed 's/\$swf/$swf;/g'
# The following command is extremely dangerous and should only be run if you validate the output of the above command and find no malicious code.
eval $(grep swfextract Lume.py |cut -c 3-|sed 's/\$swf/$swf;/g')
There's a more complex operation that does this more securely, but I can't be asked to work on that until I've got all the assets. Some of the JPEGs didn't work, so we'll have to take a look at what went wrong with them.
The second most important asset for a game is audio. As part of the Humble Bundle I got the OST in FLAC, so I don't care about getting all the audio in MP3 format, but there are invaluable skills that we can gain by getting those. We can see that the swftopython script has created placeholders for all the audio in the form of SWFSound objects. Grepping for music is a useful task which gives us information about the background music.
grep -i music Lume.py
music.attachSound('lumeScoreMain');\
Since they are using the word lumeScoreMain as an identifier, we should look for that.
grep -i Score Lume.py
m.addExport(character6,'lumeScoreMain');
So now we know that character 6 is the music. Let's grab that first. We must use swfextract and then our script that decompresses Flash files. Flash files are often compressed with zlib which is very easy to decompress in Python.
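The zlib step is short enough to sketch. A compressed SWF starts with the signature CWS, followed by a version byte and a 4-byte length, and everything after those 8 bytes is a zlib stream. This is a minimal sketch of that layout, not the swf_decompress.py from unproprietary, and decompress_swf is a hypothetical name.

```python
import zlib


def decompress_swf(data: bytes) -> bytes:
    """Return an uncompressed (FWS) SWF from a zlib-compressed (CWS) one."""
    signature = data[:3]
    if signature == b"FWS":
        return data  # already uncompressed
    if signature != b"CWS":
        raise ValueError("not a zlib-compressed SWF")
    # Bytes 3..7 (version and uncompressed length) are stored in the clear.
    return b"FWS" + data[3:8] + zlib.decompress(data[8:])


# Made-up sample: build a tiny CWS-style blob and round-trip it.
body = b"payload " * 10
sample = b"CWS" + bytes([9]) + (8 + len(body)).to_bytes(4, "little") + zlib.compress(body)
print(decompress_swf(sample)[:3])
```

The output is still an SWF container, so any embedded MP3 or JPEG sits behind a short tag header that has to be skipped separately.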
swfextract -i 6 -o Lumem3.swf Lume.swf
python swf_decompress.py Lumem3.swf >Lumem3a.mp3
Success. The 2.5MB mp3 file is the actual Lume soundtrack, 2:07 long. Find_compress1 is unable to find the MP3 but Magic1 is.
python magic1.py Lumem3a.mp3
New result 'data' at 0
New result 'Hitachi SH big-endian COFF executable, stripped' at 2
New result 'data' at 3
New result 'raw G3 data' at 5
New result 'data' at 6
New result 'PDP-11 overlaid separate executable not stripped' at 10
New result 'data' at 11
New result 'TTComp archive data' at 29
New result 'data' at 30
New result '8086 relocatable (Microsoft)' at 33
New result 'data' at 34
New result 'ps database from kernel \\377\\373\\242d' at 35
New result 'data' at 36
New result 'MPEG ADTS, layer III, v1, 160 kbps, 44.1 kHz, JntStereo' at 39
You can see that there are a ton of false positives, but that's all in a day's work as a reverse engineer. You can remove the unnecessary junk from the file using skipto or you could just leave it be since mplayer can play it and exiftool is actually able to read it as well.
exiftool Lumem3a.mp3
ExifTool Version Number : 9.77
File Name : Lumem3a.mp3
Directory : .
File Size : 2.4 MB
File Modification Date/Time : 2014:12:07 16:30:16-08:00
File Access Date/Time : 2014:12:07 16:30:16-08:00
File Inode Change Date/Time : 2014:12:07 16:30:16-08:00
File Permissions : rw-r--r--
File Type : MP3
MIME Type : audio/mpeg
MPEG Audio Version : 1
Audio Layer : 3
Audio Bitrate : 160 kbps
Sample Rate : 44100
Channel Mode : Joint Stereo
MS Stereo : On
Intensity Stereo : Off
Copyright Flag : False
Original Media : True
Emphasis : None
Duration : 0:02:07 (approx)
Find_compress1 found a false positive of a jpeg in Lumem3a.mp3. A look at it showed that it wasn't really a jpeg. Jpeg almost never has false positives, but remember your information science. Given enough random bytes, any false positive is possible given a simple enough parser.
Now let's find the rest of the sounds.
grep SWFSound Lume.py >Lume_sounds.txt
There aren't that many, but let's automate the extraction process. It's a bit messy because we're grabbing the number out of a script, but since it's a regular pattern, it works. This could be done with sed, but I don't want to get into that.
for x in $(gawk '{print $1}' Lume_sounds.txt | cut -c 10-); do
    echo "$x"
    swfextract -i "$x" -o Lumes"$x".swf Lume.swf
    python swf_decompress.py Lumes"$x".swf >Lumes"$x".mp3
    rm Lumes"$x".swf
done
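Grabbing the number out of the script can also be done with a regular expression, which is a little less fragile than counting characters with cut. This sketch assumes the lines in Lume_sounds.txt look like character6 = SWFSound(...), which is a guess based on the gawk and cut arguments used here. The sound_ids helper and the sample lines are hypothetical.

```python
import re

# Matches lines such as: character6 = SWFSound(...)
CHARACTER_ID = re.compile(r"^\s*character(\d+)\s*=\s*SWFSound\(")


def sound_ids(lines):
    """Return the numeric character IDs of all SWFSound assignments."""
    ids = []
    for line in lines:
        match = CHARACTER_ID.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


# Made-up sample lines in the assumed format.
sample_lines = [
    "character6 = SWFSound('sound6.mp3');",
    "# a comment that should be ignored",
    "character946 = SWFSound('sound946.mp3');",
]
print(sound_ids(sample_lines))
```

Each ID can then be handed to swfextract -i in the same way the shell loop does.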
One of my favorite patterns in all of computing is the histogram of lines. When you have a program that outputs data that is repetitive, sort | uniq -c | sort -nr will do a histogram putting the most common stuff at the top and the least common stuff at the bottom. Exiftool is a very repetitive program, so we use that pattern. Ah yes, the power of GNU utilities.
exiftool Lumes*.mp3 | sort |uniq -c |sort -nr |less
This shows us that there was 1 error and 38 files with the same Sample Rate. 22 are stereo, 16 are mono. Most use 160 kbps, but some use 112, and some use 128. Two of them use 64 kbps. Lumes946 is the one with the error, so let's take a look. It's 631KB and it's a nice song with piano and some electronic sounds. It's the second longest after the score. For now we forget about it.
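The sort | uniq -c | sort -nr histogram has a direct equivalent in Python's standard library, useful when the counting happens inside a larger script. This is a small sketch; line_histogram is a hypothetical helper and the sample lines are made up. Ties may come out in a different order than the shell pipeline gives.

```python
from collections import Counter


def line_histogram(lines):
    """Return (line, count) pairs, most common first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return counts.most_common()


# Made-up sample in the style of exiftool output.
sample_output = [
    "Sample Rate : 44100\n",
    "Channel Mode : Stereo\n",
    "Sample Rate : 44100\n",
    "Sample Rate : 44100\n",
    "Channel Mode : Stereo\n",
    "Channel Mode : Single Channel\n",
]
for line, count in line_histogram(sample_output):
    print(count, line)
```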
I wonder if there are any movies in the game. Certainly there should be, if memory serves me. What video format does Flash use? It has its own FLV or something like that.
# SWF_DEFINEVIDEOSTREAM
# You'll need to extract video175.flv
character175 = SWFVideoStream('video175.flv');# 1650 frames advertised
character175.setFrameMode(SWF_VIDEOSTREAM_MODE_MANUAL);
# SWF_DEFINEVIDEOSTREAM
# You'll need to extract video740.flv
character740 = SWFVideoStream('video740.flv');# 150 frames advertised
character740.setFrameMode(SWF_VIDEOSTREAM_MODE_MANUAL);
Easy enough. Let's extract.
swfextract -i 175 -o video175.flv Lume.swf
Mplayer can play this file, so we view it in mplayer and see that it is all the transitions between places in the game as well as all the rooms. All of the transitions are linked together making a nice little tour of the game.
swfextract -i 740 -o video740.flv Lume.swf
video740.flv is a shakycam video of a messy building that I don't remember from the game. I believe it's part of the ending. So now we've grabbed the text, the images (not all of them, but any that we could easily grab), the audio, and the video. Is there anything left? Yes, the game logic. All of the game's script is in the generated Python file. If you want to see what is going on and how to solve the puzzles, you need only look. A programmer who knew that their program was going to be reversed might take measures to obfuscate their code and answers. This is not the case in Lume. Most of the solutions are pretty easy to find if you understand JavaScript or ActionScript.
For the images that we weren't able to pull using swfextract -j, we can use swfextract -i which extracts as compressed SWF and then decompress and pull off the header. Luckily we have the tools to make that easy.
swfextract -i 742 -o Lumei742.swf Lume.swf
python swf_decompress.py Lumei742.swf >Lumei742.jpg
exiftool Lumei742.jpg
python3 skipto.py Lumei742.jpg 38 > Lumei742a.jpg
rm Lumei742.swf Lumei742.jpg
It turns out that the junk at the beginning is just 38 bytes long, so we can strip it off with skipto.py.
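The header length does not have to be read off by hand. A JPEG begins with the start-of-image marker FF D8 followed by another FF, so searching for those three bytes gives the offset directly. This is a minimal sketch under the assumption that the marker does not also appear by chance inside the junk header; strip_to_jpeg is a hypothetical helper, not part of unproprietary.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker


def jpeg_offset(data: bytes) -> int:
    """Return the offset of the first JPEG start-of-image marker."""
    offset = data.find(JPEG_SOI)
    if offset < 0:
        raise ValueError("no JPEG start-of-image marker found")
    return offset


def strip_to_jpeg(data: bytes) -> bytes:
    """Drop everything before the first JPEG start-of-image marker."""
    return data[jpeg_offset(data):]


# Made-up sample: 38 bytes of junk in front of a fake JPEG start.
sample = b"\x00" * 38 + b"\xff\xd8\xff\xe0" + b"rest-of-image"
print(jpeg_offset(sample))
```

Because the offset is found per file, the same function handles files whose header is a different length.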
It turns out there are 58 image files that didn't work with -j, so let's get them using the above in a script.
for f in $(grep swfextract Lume.py |cut -c 3- | gawk '{print $5}'); do
    if ! [ -e "$f" ]; then
        num=$(basename "$f" .jpg)
        num=${num:9:15}
        swfextract -i "$num" -o Lumei"$num".swf Lume.swf
        python swf_decompress.py Lumei"$num".swf >Lumei"$num".jpg
        python3 skipto.py Lumei"$num".jpg 38 > Lumei"$num"a.jpg
        rm Lumei"$num".swf Lumei"$num".jpg
    fi
done
This massive script finds all files that currently don't exist and runs the above commands on them. A couple didn't turn out, so let's look at them.
file Lumei*.jpg |grep -v JPEG
Lumei22a.jpg: data
Lumei37a.jpg: data
Lumei40a.jpg: data
Lumei81a.jpg: data
Lumei85a.jpg: data
All of them are JPEGs but we got the wrong offset. So let's figure out the right one manually since there are so few.
exiftool Lumei22.jpg
Warning : Skipped unknown 36 byte header
So it's off by 2. Let's see about the others.
for num in 22 37 40 81 85; do
    swfextract -i "$num" -o Lumei"$num".swf Lume.swf
    python swf_decompress.py Lumei"$num".swf >Lumei"$num".jpg
    python3 skipto.py Lumei"$num".jpg 36 > Lumei"$num"a.jpg
    rm Lumei"$num".swf Lumei"$num".jpg
done
file Lumei* |grep -v JPEG
It succeeded. So there we go. Lumei22a.jpg is a wrench in case you're interested. There is nothing left to do, so that's our tutorial.
Why would someone want to reverse engineer a Flash game? For one, there's the challenge and the thrill of learning to understand things. A pretty good reason is that you never want to work with Adobe tools, but you are forced to rewrite legacy Flash components for your corporate website in HTML5 and you want the original JPEGs, vector graphics, and animations from the actual file you formerly used. Another good reason is to take good music from Flash games so that you can play it on music players that only accept MP3, Ogg, WAV, and FLAC (mobile players come to mind, but also library-based music players like gmusicbrowser and VLC). Another good reason to reverse Flash games is to make wallpapers out of the images. Authors often make a handful of wallpapers for their game, but they rarely make the perfect wallpaper, so a person can make a low-resolution wallpaper from one of the source files (Lume is an excellent example: their background here is a decent background, but there are dozens more that could be made from the source files).
Conclusion
Thanks for reading this diversion into the world of reverse engineering. The tools I used were simple and are not as complex as most tasks I carry out to reverse engineer challenging binaries. Flash games are a very common target of reverse engineering so I thought I would release tools and show how they work. Now you too can reverse engineer these games.
END of transmission.
Comments: 2
Dig this primer into reverse engineering! Thanks for sharing.
I'm glad that you were able to read this. I'll write a few more for you in the future.