Archive for December, 2009

21
Dec
09

PDF shenanigans!

_MDL_ from the MalwareDomainList website tweeted on the weekend about a couple of PDF samples that wouldn’t decode.

http://www.malwaredomainlist.com/forums/index.php?topic=3626

So naturally I jumped in and had a look.

Putting the samples on a testbed made the problem clear straight away: pdf-parser.py from Didier Stevens didn’t support all of the filters used to encode the data stream. UHOH!

So now what?

“Let’s make some decoders!”, I hear you call out.

Well, screw that! 😛

“Let’s steal some decoders!”, you yell at me.

Yeah, ok.

First we need to know what we’re looking for. These are standard filters in Adobe products, so there should be an implementation in one of the open source projects that we can look at.

sample 1: /Filter [ /ASCIIHexDecode /LZWDecode /ASCII85Decode /RunLengthDecode ]

sample 2: /Filter [/ASCIIHexDecode /LZWDecode /ASCII85Decode /RunLengthDecode /FlateDecode ]

Didier’s parser already supports ASCIIHexDecode, ASCII85Decode and FlateDecode, so what we need now is LZW and RLE decoding.
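For anyone who hasn’t met chained filters before, here is a rough sketch of how a /Filter array gets applied: each filter runs in the order listed, and the output of one feeds the next. This is not how pdf-parser.py is written. The function names and the `DECODERS` table are made up for illustration, it is Python 3, and it covers only the three filters that can be done with the standard library.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits, whitespace ignored, '>' marks end of data."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: strip the optional '<~' and the '~>' end marker first."""
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


# Hypothetical lookup table: filter name -> decoder function.
DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/ASCII85Decode": ascii85_decode,
    "/FlateDecode": zlib.decompress,
}


def apply_filters(data: bytes, filters: list) -> bytes:
    """Run each filter in the order it appears in the /Filter array."""
    for name in filters:
        decoder = DECODERS.get(name)
        if decoder is None:
            raise NotImplementedError("no decoder for " + name)
        data = decoder(data)
    return data
```

Feed it a filter list containing /LZWDecode or /RunLengthDecode and it raises NotImplementedError, which is the same wall the samples hit.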

After a bunch of time sorting the crap links from the not so crap links, I stumbled upon “pdfminer”, which already had an LZW routine set up in Python that I could use. Still no RLE decoder though.
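If you’re curious what an LZW routine for PDF has to do, here is a from-scratch sketch in Python 3. To be clear, this is not pdfminer’s code. It is my reading of the LZWDecode variant PDF uses: codes are packed most significant bit first, they start at 9 bits and grow to 12, code 256 clears the table and code 257 ends the data. It assumes the default “early change” behaviour and ignores predictors.

```python
def lzw_decode(data: bytes) -> bytes:
    """Decode a PDF-style LZW stream (9 to 12 bit codes, early change assumed)."""
    CLEAR, EOD = 256, 257

    def fresh_table():
        # Codes 0..255 are single bytes; 256 and 257 are reserved placeholders.
        return [bytes([i]) for i in range(256)] + [b"", b""]

    table = fresh_table()
    width = 9          # current code width in bits
    prev = None        # previously emitted sequence
    out = bytearray()
    buf = 0            # bit buffer
    nbits = 0          # number of valid bits in buf

    for byte in data:
        buf = (buf << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            code = (buf >> nbits) & ((1 << width) - 1)
            buf &= (1 << nbits) - 1

            if code == CLEAR:
                table = fresh_table()
                width = 9
                prev = None
                continue
            if code == EOD:
                return bytes(out)

            if prev is None:
                entry = table[code]
            elif code < len(table):
                entry = table[code]
                table.append(prev + entry[:1])
            elif code == len(table):
                # Code not in the table yet: it must be prev + first byte of prev.
                entry = prev + prev[:1]
                table.append(entry)
            else:
                raise ValueError("invalid LZW code %d" % code)

            out += entry
            prev = entry

            # Early change: widen one entry before the table fills the width.
            if width < 12 and len(table) + 1 >= (1 << width):
                width += 1

    return bytes(out)
```

The nine bytes 80 0B 60 50 22 0C 0C 85 01 make a handy smoke test: they unpack to the codes 256, 45, 258, 258, 65, 259, 66, 257, which decode to five 0x2D bytes, 0x41, three 0x2D bytes and 0x42.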

One small grace for the first sample, RLE was the last stage to decode. By removing the /RunLengthDecode filter statement, I was able to get the script to parse the file and give me some output!
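The same trick can be done in code instead of by editing the PDF: decode for as long as you have decoders, then stop and look at what you’ve got. A minimal sketch in Python 3, with a hypothetical helper name and a deliberately tiny decoder table (only two stdlib-backed filters), just to show the idea:

```python
import binascii
import zlib


def decode_supported_prefix(data, filters, decoders):
    """Apply filters in order until one has no decoder.

    Returns the partially decoded data and the list of filters still pending.
    """
    for index, name in enumerate(filters):
        decoder = decoders.get(name)
        if decoder is None:
            return data, list(filters[index:])
        data = decoder(data)
    return data, []


# Assumed table for the example; a real tool would support more filters.
decoders = {
    "/ASCIIHexDecode": binascii.unhexlify,
    "/FlateDecode": zlib.decompress,
}

# A hex-wrapped stream whose last stage is still run-length encoded.
stream = binascii.hexlify(b"\x02abc\x80")
partial, pending = decode_supported_prefix(
    stream, ["/ASCIIHexDecode", "/RunLengthDecode"], decoders
)
```

Here `pending` tells you which stages are still missing, and `partial` still has the literal bytes `abc` sitting in plain view between the RLE control bytes.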

The output wasn’t as clean as it could have been, but the RLE had done very little to compress the stream and only partially obfuscated it. Much of the JavaScript was readable and it was possible to figure out what the intention was. MAYHEM! And trojans. I was able to immediately spot some well known PDF exploits: Collab.getIcon, Collab.collectEmailInfo, etc.

The 2nd sample had to wait, as it needed a clean decode from the RLE to pass into the FlateDecode filter.

I passed the samples on to others in my team to look at, and one of the guys who is better at Python than me ( everyone is better at Python than me 😛 ) wrote an RLE decode function that worked great, so we were able to decode both samples.
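In the meantime, here is a rough sketch of what a RunLengthDecode function looks like. This is not my teammate’s code, just my own reading of the filter in Python 3: a length byte from 0 to 127 means “copy the next length + 1 bytes as they are”, 129 to 255 means “repeat the next byte 257 - length times”, and 128 marks the end of the data.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode a PDF RunLengthDecode stream."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        length = data[i]
        i += 1
        if length == 128:
            # End-of-data marker.
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes unchanged.
            out += data[i:i + length + 1]
            i += length + 1
        else:
            # Repeated run: the next byte appears 257 - length times.
            if i >= n:
                break  # truncated stream, keep what we have
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


decoded = run_length_decode(b"\x02abc\xfex\x80")
```

The literal runs are why the first sample was still mostly readable without this step: data that doesn’t repeat is stored as is, with a control byte in front of each chunk.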

Once I get his all clear, I’ll see about releasing something for your use/abuse at home. :]

Not much tech in this post, but I think those who’ve been in this position before know what it’s like to find a tool that doesn’t quite meet your needs and then discover you have the ability to modify it to suit. Don’t give up, and remember that Google is a great tool. Respect the copyrights of what you use, credit the authors where possible and even let the original author update their tool with your findings.
