Wednesday, January 10, 2007

Unpacking I

Recently, I was given a copy of a piece of malware by Curt Wilson. He had unpacked it in memory, but wasn't quite sure how to finish the process in order to load it up again for further analysis. As a simple howto, and as a way to keep a few notes for myself, I'm documenting the unpacking process.

The sample in question was found as upnp.exe on disk. Looking at it, it was packed with Morphine. I don't personally consider knowing which packer it is ahead of time to be critical, though there are a couple of exceptions. First, if I know it is UPX packed, then I may just try using the latest UPX to unpack it. It works maybe half the time. The other half, there are UPX "corrupters" out that there will break that, and there is at least one packer designed to look somewhat like UPX. Second, there are a couple of packer out there that are probably easily beyond my skill level, and I wouldn't bother trying. The two I can think off of the top of my head are both written by Nicolas Brulez.

If you want to find out what packer was used, you can usually get an answer from PEiD, or by trying the VirusTotal service. Here is the VirusTotal analysis of upnp.exe, for example. Both of those correctly identify this as Morphine, though I got through the hard part of the unpacking without knowing that.

The basic unpacking technique is to execute the program with a debugger until the original binary (or as much as is left) is uncompressed in memory, and then you dump the copy in memory. Usually, when you dump it you also fix the imports so that your analysis tool will know which functions are being called. I'll show you an example in a moment. For a somewhat more advanced example, you can watch a video where Nicolas does an unpack on a binary that has more than one packer used on it, each with multiple antidebugging tricks. This was from a talk we gave at RECON.

First thing, the warnings: If you choose to try to unpack malware, you are taking a chance that you will make a mistake, and just run the thing. If you do this on a real production machine, you will be sad, and infected. If you're smart, you will have a sacrificial machine you can do this on, that you can restore to a known state with no place for the malware to hide. VMWare is popular, though unfortunately, a lot of malware now checks to see if it is running in a VM and shuts down.

The strictest AV guys will also tell you that it is irresponsible to do any analysis on a non-isolated machine, because there's a good chance you will spread it further. If you work on a non-isolated machine and people find out, there's a chance that some or all AV companies will never employ you. That may not seem like much of a threat, but you never know who Symantec or McAfee are going to buy next.

In other words, do as I say, not as I do. When you press the wrong button in your debugger, and run the malware all the way, you will find yourself very interested in finishing the analysis in a hurry in order to find out what you've just done to your machine. You should also know which cable to pull in a hurry to disconnect yourself from the Internet.

So, on with the debugging. Load the program in your favorite debugger. I like to use the debugger now built into IDA Pro. Another popular (and free) debugger is Ollydbg. With both, you need to set an initial breakpoint, and then run the program. Generally speaking, what you will be doing is stepping through the code until you get to the point where you think you've hit the original packed binary, then you leave it paused.

This is the easiest place to screw up. For one, in both debuggers, the step, step over, and run keys are all next to each-other. If you fat-finger the keypress, you just infected your machine. Also, you may encounter antidebugging tricks. I can't say I noticed any with Morphine, but I have certainly seen them with others. Even if you're single-stepping, if you miss accounting for an antidebugger trick, you may find that the program finishes executing without you.

One of the things that pretty much all packers do is to replace a certain portion of the OS's loader. For Windows, this almost always means replacing the portion that takes care of loading and mapping the imports. So, if you are tracking through packer code, you will see the packer calling LoadLibrary and GetProcAddress, in a loop. Packers also almost always compress and/or obfuscate the binary code, so there's also going to be some loops where it is iterating over memory segments. These memory segments are usually create by calling VirtualAlloc.

I bring this up, because you really, really want to step over these functions and not waste time stepping through them or trying to follow them into the kernel. You will also need to become adept at spotting loops. You will need to skip those most of the time, just because they will be too tedious to step through manually. Yet another place to screw up.

Here's an example of something I can spot from experience:
db1

You see where it's pushing a bunch of bytes in the ASCII range onto the stack, and then calling something? Let me decode it to make it a little easier to read:

db2


Read the function names backwards. This tells me I'm in the beginning stages of restoring the imports. Then it calls VirtualAlloc:

db3


And you trace through some loops where it is importing all the libraries and fixing up pointers to the functions.

Eventually, you will arrive at something like this:

db4


There is often a telltale "JMP EAX" or "CALL EAX", or similar. Step one more instruction, and you're at the Original Entry Point (OEP). This is when you're unpacked, or as much as you're going to be. If you trace much farther, you start initializing things, and you might start causing trouble for your analysis. This is what it looks like when we're at the OEP:

db5


I usually like to take note of the OEP and the last address before the OEP. I like to set a hardware breakpoint on execute at one or both of those. In this case, the OEP isn't mapped to a memory segment until after the program has run some portion of the way through the packer, so I set it on the last address before the OEP, and save the database. That way, if I have trouble with the dump step, I can replay right up to that point without having to manual step it again. That works in this case (Morphine) but not in all cases. Sometimes you have to account for antidebugger tricks along the way.

Now that you're at the OEP, you need to dump the binary in memory. I've used two tools for this, Import Reconstructor (imprec) and LordPE. Before I get into the technical details on each, I should talk about the reason I ended up putting this post together.

I was having some trouble getting a good dump of upnp.exe. Specifically, I had traced it to the OEP as I have described, but I couldn't get imprec to dump it properly. The imports table wouldn't come out right. That's when I went to Jason Geffner for help. Jason is another of these guys who is better at reverse engineering than I am. I met him originally in the class I took from Nicolas Brulez. Jason was taking it too, but he didn't really need it.

Jason wanted told me to just use LordPE. He said that Morphine ended up rebuilding the original PE file in memory, and that LordPE did a perfect dump. He even made a screenshot of what settings I should use:
lordpe

Sure enough, I used LordPE and it dumped perfectly. I'd had good luck with imprec before. Nicolas had shown me the tool in his class. Before that I had been doing raw memory dumps and manually naming offsets. No fun. So I've had a tendency to reach for imprec because I'm used to it.

But, there was no arguing with the fact that LordPE worked for me in this case, and imprec didn't. So part of what I planned to do with this post was to recommend LordPE. So I repeated the steps on my home machine so I could take screenshots and so on. When I got to the step where I was going to show the bad dump made by imprec... I found that it had dumped it perfectly.

Thinking back, I believe why imprec didn't work before was because I had done it on a work machine, which was Windows XP x64. When I tried to use imprec on the 64-bit Windows, I had a problem with some of the imports not being valid. That probably had to do with why it wasn't writing out the import table properly. I had removed the "bad" imports, but it probably just broke the process. I think I was able to use LordPE on the same machine, but now I'm going to have to go back and check.

Which brings me to a general point about tools. If you've got tools you use that did into the guts of a system, then those tools are probably going to quit working when you move to a newer system. This is especially true of tools for which development has ceased (which seems to be the case with imprec.) If it's not being actively maintained, then it will eventually "expire" when the OS moves on. On my home machine, which is regular XP, imprec still works fine. Further, malware and packers tend to account for popular tools by implementing countermeasures. So, if you plan to keep up on reverse engineering, you should also plan to keep looking for the latest and greatest tools.

But in any case, my thanks to Jason for encouraging me to check out LordPE and for fixing my mistake. Back to the techie bits.

I'll skip the imprec demo for now. If you're interested in me spelling out the same steps for imprec, leave me a comment, and I'll write it up. In LordPE, you basically run the tool, find the process you still have paused at the OEP in your debugger, right-click it and select dump full:
lordpe1

In this particular case at least, you now have a good copy of the unpacked executable, and you can load it up in your favorite analysis tool:
unpacked1

If you're curious, the binary is a fairly typical call-home-to-an-IRC-C&C bot.

Notes:
  • Sorry about the pictures, the arrangement isn't ideal. If you click on one, you can drill down a couple of levels and see the full size so you can read it. I'll probably try tweaking the pictures to work a bit better. Any Blogger and/or Zooomr advice is welcome.
  • I realize I've got a weird mix of beginner and advanced topics here. Sorry about that. Again, this is at least partially to remind myself as well. If you liked the post and want me to take the tech level up or down, let me know. It probably won't be hard to talk me into writing about it more.
  • Both Nicolas and Jason teach this topic as a training attached to security conferences. Nicolas teaches it at RECON. I don't know for sure yet if Nicolas will be teaching this year. Jason has taught at Black Hat, but it doesn't look like the Black Hat training schedule for this year has been announced yet either. I'll post an update if I find out anything about either of them teaching again.

4 comments:

Anonymous said...

Nice writeup, Ryan.

I have no experience of reverse engineering whatsoever but your post really made my curious. I've been collecting suspected malware everytime I run into them (links in emails/IMs and so on) in the event of wanting to learn reverse engineering at some point. No need to say but it's growing steadily...

Do you have any sound suggestions on good books/papers/articles where one could start out and move on?

Anonymous said...

Oops... I didn't trim around the LordPE window before sending you the screenshot. Now we're going to start seeing unpacking stubs that check to see whether the desktop background's RGB color is 0x00800000 and if so assume I'm the one analyzing it and crash the program :)

Kartik said...

Nice one Ryan,

Was trying to do some malware analysis and surely your blog helped. Great writeup.

Ryan Russell said...

kartik:

Thanks, glad I could help.