I wanted to do a quick post on PDF analysis. This will be a two-part post; I don’t have time to finish it this week because Shmoocon is this weekend and I need to do other things :) The sample I'm using can be found here.
Probably the easiest and fastest thing to do is to run the PDF in a VM with Acrobat Reader and whatever tools you use to monitor system changes, then snag whatever dropped files you get and analyze those. If that doesn’t work, and you have a throwaway system lying around that you can re-image later, you could just open the PDF in Acrobat on a real physical machine and collect your files.
Also, if we select the last struct that's defined in 010 as a PDFXref, up top you will see one "%%EOF", and if you scroll to the very bottom (about 164700 bytes on down) you will see another "%%EOF". Looks fishy, eh? :) Probably embedded file(s). At this point, your fastest route would probably be to look shortly after the first "%%EOF" for an 'M' and see if the next byte could be XOR'd with something to get a 'Z', then track down the PE section and try to dig the binary out manually. If you can do that, you can save yourself the following steps. I took a glance, and I noticed all incrementing and decrementing bytes:
Those are probably NULLs, so it looked like an incrementing / decrementing 2-byte key, but I fiddled with it for an hour or so and my un-XOR'd version certainly wasn't a valid PE file, so I decided to analyze the shellcode instead. I haven't analyzed the shellcode yet, so it's probably still some simple XOR encryption.
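For anyone who wants to try the manual route, here is a rough Python sketch of the two ideas above: listing every end-of-file marker in the PDF, and testing one guess at an "incrementing / decrementing" 2-byte XOR key. The key layout (even bytes use a key that counts up, odd bytes use a key that counts down) and all function names are my own assumptions, not something confirmed from the sample. My own attempt didn't produce a valid PE, so the real scheme may well be different.

```python
def find_eof_offsets(pdf_bytes):
    """Return the offset of every %%EOF marker in the file."""
    marker = b"%%EOF"
    offsets = []
    pos = pdf_bytes.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = pdf_bytes.find(marker, pos + len(marker))
    return offsets


def xor_two_byte_rolling(data, k_even, k_odd, step_even=1, step_odd=-1):
    """XOR with a 2-byte key whose halves change after every byte pair.

    ASSUMPTION: even offsets use k_even stepping by step_even, odd offsets
    use k_odd stepping by step_odd. This is only a guess at the encoding.
    XOR is symmetric, so the same call encodes and decodes.
    """
    out = bytearray(len(data))
    for i, b in enumerate(data):
        pair = i // 2
        if i % 2 == 0:
            key = (k_even + pair * step_even) & 0xFF
        else:
            key = (k_odd + pair * step_odd) & 0xFF
        out[i] = b ^ key
    return bytes(out)


def guess_keys_from_mz(blob):
    """Derive the starting key bytes by assuming the plaintext begins with 'MZ'."""
    return blob[0] ^ ord("M"), blob[1] ^ ord("Z")


def looks_like_pe(candidate):
    """Cheap sanity check: MZ magic plus the usual DOS stub text near the start."""
    return candidate[:2] == b"MZ" and b"This program" in candidate[:512]
```

The idea would be to slice the file just after the first marker returned by find_eof_offsets, skip ahead to wherever the suspicious bytes start, run guess_keys_from_mz on that slice, decode with xor_two_byte_rolling, and see whether looks_like_pe agrees. If it doesn't, try other step values before giving up on the approach.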
So let's use yet another useful tool by Didier Stevens, pdf-parser, which can be found here. If we run "pdf-parser.py --stats xxxxxxxx.pdf" on the file, it looks like object 1 has an embedded file:
So let's get a little more info on that object by running "pdf-parser.py --object 1 --raw xxxxxxxx.pdf", and we see this object contains a stream of compressed data:
Getting warmer :) So let's inspect this fishy ass object by using the --filter flag and sending the output to an XML file like so:
and if we open this up with Notepad++ we see the following:
Looks like base64-encoded shellcode to me :) Let's decode it in either Notepad++ or 010; both can do it for us. I prefer 010, so copy everything between the quotes after that "sBase=" tag, starting at "SUkq..." down to "...AAC=", and paste it into a new file in 010. You should be here:
After you have it in the new file, highlight everything and run the DecodeBase64 script on it, then switch to the hex view and we should see this:
Scrolling through, it looks like a NOP sled followed by some shellcode, doesn't it? Now you can save this new binary file and open it in IDA to take a look, or use a tool that will convert the hex to a binary we could debug. That's what we'll do next post.
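If you'd rather not eyeball the hex view, a small Python sketch like the one below can point you at the sled. It just finds the longest run of one byte value in the decoded file. I'm assuming the sled is made of plain x86 NOPs (0x90), and the function name is made up for illustration; sleds built from other filler bytes would need a different value.

```python
def longest_run(data, value=0x90):
    """Return (start_offset, length) of the longest run of `value` in data.

    Returns (-1, 0) if the byte never appears.
    ASSUMPTION: the sled is a run of single-byte x86 NOPs (0x90).
    """
    best_start, best_len = -1, 0
    i, n = 0, len(data)
    while i < n:
        if data[i] == value:
            j = i
            while j < n and data[j] == value:
                j += 1
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j  # jump past the run we just measured
        else:
            i += 1
    return best_start, best_len
```

Whatever sits right after the reported run (start + length) is a reasonable first place to start disassembling.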
If you're pressed for tools, Python can also do the base64 decode with something like:
import base64; base64.decode(open("input.txt", "rb"), open("output.hex", "wb"))
where you saved "SUkq..." down to "...AAC=" in a text file in your current directory and named it "input.txt".
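If you want to skip the copy and paste as well, something along these lines should pull the blob straight out of the filtered output and decode it. This is only a sketch: it assumes the data sits between double quotes right after "sBase=" as described above, and the function name and the "filtered.xml" file name in the usage note are placeholders of mine.

```python
import base64
import re


def extract_sbase(text):
    """Find the first sBase="..." value in `text` (bytes) and base64-decode it.

    ASSUMPTION: the value is wrapped in double quotes directly after sBase=.
    b64decode ignores stray whitespace/newlines inside the value by default.
    """
    match = re.search(rb'sBase\s*=\s*"([^"]+)"', text)
    if match is None:
        raise ValueError("no sBase attribute found")
    return base64.b64decode(match.group(1))
```

To use it, read the filtered output in binary mode (for example open("filtered.xml", "rb").read()), pass the bytes to extract_sbase, and write the result to a new file opened with "wb".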
Thanks to Didier for his blog, which is chock-full of useful information. Check it out if you haven't already.