DBI framework for fuzzing on the board, part I.

I have started researching fuzzers, fuzzing techniques and practices. As I studied materials about fuzzing, the code (node / edge) coverage approach quickly impressed me. But a good DBI is essential for this method. Pin or Valgrind are good solutions, but I will try to build one in a lightweight way, specialized for my further fuzzing needs.

Already implemented features :

  • BTF – Hypervisor based
  • PageTable walker
  • VAD walker
  • full Process control [images, threads, memory]
  • Syscall monitoring – implemented process virtual memory monitor

FEATURES

  • BTF – Hypervisor based

Branch tracing is a known method already implemented in some tracers, but the methods known to me implement it only under a debugger. When it comes to playing with binary code (tracing, unpacking, monitoring), I don't like a simulated environment like a debugger, because it is too slow… It is clumsy to set up BTF in the MSR in ring0, wait for exception processing and finally for your exception handler to run and handle the branch trace, and then, to keep tracing, switch to ring0 again to set BTF in the MSR once more! This looks like a solid performance killer.

But in a previous post I mentioned how Intel VT-x technology can be used to gain some advantages for a reasonable performance penalty. After playing a bit with the documentation and some debugging, I found an easy way to extend the hypervisor introduced there so that it handles TRAPS and keeps an eye on BTF!

In other words, when the trap flag is set, each trap exception causes a VM_EXIT. So branch tracing is handled not by the system exception handling but by our monitor, and the performance penalty == our BTF processing + the VM_EXIT cost!
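To make the moving parts concrete, here is a minimal Python sketch of the bits involved. The MSR index and bit positions follow the Intel SDM; the helper names are mine and not taken from the framework, and everything is modeled with plain integers instead of real MSR or VMCS access.

```python
from typing import Tuple

# Bit-level model of the branch tracing setup (no hardware access).
IA32_DEBUGCTL = 0x1D9        # MSR index of the debug control register
DEBUGCTL_BTF = 1 << 1        # BTF: single-step on branches instead of instructions
RFLAGS_TF = 1 << 8           # trap flag in RFLAGS
DB_VECTOR = 1                # vector of the debug exception (#DB)


def enable_btf(debugctl: int, rflags: int) -> Tuple[int, int]:
    """Return (debugctl, rflags) with branch single-stepping armed."""
    return debugctl | DEBUGCTL_BTF, rflags | RFLAGS_TF


def trap_db_in_vmm(exception_bitmap: int) -> int:
    """Set the #DB bit so that the debug exception causes a VM exit."""
    return exception_bitmap | (1 << DB_VECTOR)


def causes_vm_exit(exception_bitmap: int, vector: int) -> bool:
    """True if the given exception vector is intercepted by the VMM."""
    return bool(exception_bitmap & (1 << vector))
```

Since the processor clears BTF when it delivers the debug exception, a monitor built this way would re-arm it on every VM exit, which is exactly the step that otherwise costs an extra ring0 round trip.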

  • PageTable walker

For effective fuzzing it is necessary to fuzz the application from a particular state (or you can just kill performance by re-running the application for every fuzz test case), and this requires saving the context -> memory address space, thread context (registers / stack). It is no big deal to just enumerate the memory address space and save the context .. but it needs a lot of memory and it is time consuming as well …

To handle this step, I propose protecting all memory (except write-copy pages and stacks) as non-writable, monitoring write accesses and saving the affected memory (with custom PAGE_SIZE granularity, not just the affected bytes alone => lookup & copy performance). But doing it through additional VirtualProtect calls does not bring time-effective results…
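The save-on-first-write idea can be modeled in a few lines. This is a user-mode Python simulation, not the driver code: memory is a bytearray, the "write fault" is an ordinary method call, and all names are hypothetical.

```python
PAGE_SIZE = 0x1000


class Snapshot:
    """Saves a page the first time it is written and can roll everything back."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}                      # page base -> original page bytes

    def on_write(self, addr: int, data: bytes) -> None:
        # Stands in for the write-fault callback: save each touched page once.
        first = addr & ~(PAGE_SIZE - 1)
        last = (addr + len(data) - 1) & ~(PAGE_SIZE - 1)
        for base in range(first, last + 1, PAGE_SIZE):
            if base not in self.saved:
                self.saved[base] = bytes(self.memory[base:base + PAGE_SIZE])
        self.memory[addr:addr + len(data)] = data

    def restore(self) -> None:
        # Copy back only the pages that were dirtied since the snapshot.
        for base, page in self.saved.items():
            self.memory[base:base + PAGE_SIZE] = page
        self.saved.clear()
```

Restoring then costs one page copy per dirtied page rather than a copy of the whole address space, which is the point of tracking writes at page granularity.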

So after educating myself on how exactly the PageTable looks, and some googling on how to handle it, I created a PoC as a pykd script :

And, surprised by how easy it was, I implemented a C++ equivalent :

which saved my day from the performance kill of the VirtualProtect API.
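As an illustration of what such a walker computes (this is not the script from the post), the sketch below derives the page table indices and the self-mapped PTE address for an x64 virtual address and toggles the hardware Write bit. It assumes 4 KB pages and the fixed PTE base used by x64 Windows before the base was randomized in later Windows 10 builds; the function names are mine.

```python
PTE_BASE = 0xFFFFF68000000000    # assumption: fixed self-map base (e.g. Windows 8 x64)
PTE_VALID = 1 << 0               # hardware Present bit
PTE_WRITE = 1 << 1               # hardware R/W bit


def table_indices(va: int) -> dict:
    """Split a 48-bit virtual address into its four 9-bit table indices."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def pte_address(va: int) -> int:
    """Virtual address of the PTE that maps `va` (8 bytes per entry)."""
    return PTE_BASE + ((va >> 9) & 0x7FFFFFFFF8)


def set_writable(pte: int, writable: bool) -> int:
    """Return the PTE value with the Write bit set or cleared."""
    if not pte & PTE_VALID:
        raise ValueError("PTE is not valid yet")
    return pte | PTE_WRITE if writable else pte & ~PTE_WRITE
```

In a kernel debugger session the value at `pte_address(va)` would be read and written back through the debugger's memory API; here the PTE is just an integer.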

Write attempts from the app to protected memory are handled via a hook on PageFault, in which the memory is temporarily updated with its original protection mask.

  • VAD walker

But it has some issues! .. first of all, I tried to disable writes by clearing the Write flag in the PTE for the address at the moment the memory is allocated, buut … at this moment PTE(addr).Valid == 0 … the magic is that, for performance reasons, m$ does not create the PTE per allocation request, but on the first access (== pagefault) to this memory instead.

This can be overcome by handling it after the PTE has been created for the given memory range, but a simpler option comes up here. How does m$ code know the flags of the memory, and whether that memory is even allocated so that access should be granted at all? The answer is the VAD! Some interesting reading can be found in Windows Internals, 6th edition [Chapter 10, Memory Management].

So let's update the VAD instead of the PTE on each alloc. The PTE is in exchange used to unlock (write enable) the memory in the PageFault caused by the application's attempt to write to its own memory, which also gives us a callback that particular bytes are likely to change.

The VAD can be found in the EPROCESS structure, and I am not proud of it, but it needs some system-dependent constants (how to avoid rebuilding the whole project when it is shipped to another version of Windows is covered by a TODO at the end of this post). Also, a great source of knowledge about m$ internals (besides the ntoskrnl binary itself) is the ReactOS project.

From here it is easy to handle :

  • VAD is AVL-tree structured
  • ptr to VAD is stored in EPROCESS
  • locking the address space is necessary

* With the VAD walker it is also easy to enumerate the whole process address space *
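The walk itself is an ordinary in-order traversal of a balanced binary tree keyed by virtual page number. The sketch below models that in Python with a hypothetical VadNode class; the real node layout is version dependent, and a real walker must hold the address space lock while traversing.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

PAGE_SHIFT = 12


@dataclass
class VadNode:
    """Hypothetical stand-in for a VAD node: a page range plus protection."""
    starting_vpn: int
    ending_vpn: int
    protection: str
    left: Optional["VadNode"] = None
    right: Optional["VadNode"] = None


def walk(root: Optional[VadNode]) -> Iterator[VadNode]:
    """Yield nodes in ascending address order (iterative in-order traversal)."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node
        node = node.right


def find(root: Optional[VadNode], addr: int) -> Optional[VadNode]:
    """Binary search for the node whose page range covers `addr`."""
    vpn, node = addr >> PAGE_SHIFT, root
    while node:
        if vpn < node.starting_vpn:
            node = node.left
        elif vpn > node.ending_vpn:
            node = node.right
        else:
            return node
    return None
```

`walk` corresponds to enumerating the whole address space, while `find` is the lookup a page fault handler would need to decide whether a faulting address belongs to a monitored region.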

  • full Process control [images, threads, memory]

Under a debugger you get events about everything, but if you set things up correctly in your on-the-fly (debugger-free) monitor, you can get the same results as callbacks, which speeds up the whole processing.

  • Syscall monitoring – implemented process virtual memory monitor

“System calls provide an essential interface between a process and the operating system.” So it is a nice point to hook in order to monitor a process. For now only the virtual memory monitor is implemented, to keep an eye on the memory address space and the protection of memory pages.

PROBLEMS :

  • SysCall hook => PatchGuard
  • PageFault hook => PatchGuard
  • VAD walker => windows version dependent!
  • ring3 – ring0, ring3 – vmm communication => performance

SOLUTIONS [only partially implemented] :

  • PatchGuard => VMM
      • SysCall protect MSR via VMX_EXIT_RDMSR [implemented]
      • PageFault protection via DRx – VMX_EXIT_DRX_MOVE
        • ?? ->
        • we can easily protect the hook via DRx at the IDT[page_fault] pointer
        • to get around this, PatchGuard needs to clear dr7
        • in VMM we trap dr access and fool / terminate the PatchGuard thread
  • windows version dependent constants => constants should be provided by the user app using this framework. These constants can be obtained manually from windbg, ida, by windbg + a pykd script, or by playing with SymLoadModuleEx
  • communication with ring0 and vmm parts of dbi => implement own fast calls [not properly implemented yet]
    • ring3-ring0 => SYSENTER (mov eax, VMM_FASTCALL_R0)
    • ring3-vmm => CPUID (mov eax, VMM_FASTCALL_VMM)
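One way to picture the CPUID-based fast call: every CPUID executed in the guest causes a VM exit, so the VMM can inspect eax and treat a magic value as a request instead of emulating CPUID. The Python model below only illustrates that dispatch decision. The numeric value of VMM_FASTCALL_VMM, the use of ecx as a service id and the handler names are placeholders of mine, not the framework's actual ABI.

```python
VMM_FASTCALL_VMM = 0xDB100001     # placeholder value, not the framework's constant


def passthrough_cpuid(regs: dict) -> dict:
    # Stand-in for executing the real CPUID on behalf of the guest.
    return dict(regs, handled_by="cpuid")


def make_cpuid_exit_handler(services: dict):
    """Build a CPUID VM-exit handler that serves fast calls by service id in ecx."""
    def on_cpuid_exit(regs: dict) -> dict:
        if regs.get("eax") != VMM_FASTCALL_VMM:
            return passthrough_cpuid(regs)
        service = services.get(regs.get("ecx"))
        if service is None:
            # Unknown request: report failure in eax.
            return dict(regs, eax=0xFFFFFFFF, handled_by="vmm")
        return dict(regs, eax=service(regs), handled_by="vmm")
    return on_cpuid_exit
```

The ring3-ring0 path via SYSENTER would follow the same pattern, with the check living in the syscall hook instead of the VM-exit handler.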

The idea is to implement this DBI tool for fuzzing as a module which can be fully used from Python (or others: C++, Ruby …) and has fastcall access to the DBI modules.

FEATURES – present + TODO :

  • implement all needed callbacks :
    • branch tracing or single step
    • memory access
    • exception occurred
    • process / thread termination
    • syscalls
  • make all needed info accessible :
    • loaded images
    • enumerate whole process address space
    • per thread information – stack, context …
    • child processes
  • full process control
    • save  + restart state  [ memory + context ]
    • stop / pause / resume threads
    • deny / allow process creation
    • alter process context / memory
    • alter process control flow
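From the fuzzer's side, such a module could be driven through a small callback registry. The following Python sketch is purely hypothetical (the class, event names and signatures are mine and not the framework's API); it only shows the shape of the interface that the list above describes.

```python
from collections import defaultdict
from typing import Callable, Dict, List

EVENTS = ("branch", "memory_access", "exception", "termination", "syscall")


class Monitor:
    """Hypothetical front end: register callbacks, then feed it events."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, callback: Callable[..., None]) -> None:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._callbacks[event].append(callback)

    def dispatch(self, event: str, **info) -> int:
        """Invoke every callback registered for `event`; return how many ran."""
        for callback in self._callbacks[event]:
            callback(**info)
        return len(self._callbacks[event])
```

An edge coverage collector would then be a single `branch` callback that adds each (source, destination) pair to a set.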

So the idea looks like this :

For now I have only implemented a PoC on a crappy ring3 app: Windows 8, dbi driver x64, app x86.

.. as a demo I wrote this concept: at alloc it sets the allocated memory as non-writable, and the first write access is refused, so the exception handling in the try block is invoked, but on the second access the write is granted by setting PTE(address).Write = 1 in PageFault

and some DbgPrint output from monitoring it follows :

00000000BADF00D0 is the BTF marker; before it the source instruction (which changed control flow) is printed, and after it follows the destination address (current rip). @VirtualMemoryCallback + @Prologue + @Epilogue is the current state of the implementation of SYSCALL + PageFault cooperating to handle memory writes, using PTE and VAD.

.. so first step is done! -> PoC of monitor.

The next part will implement the callbacks and introduce communication with a concept of a Python-based fuzzer to demonstrate control over the fuzzed process.

[ SRCs available on github, feel free to mail me ]

1 Comments.

  1. A simpler solution also exists on older Windows, e.g. WinXP. For the current uses of the VMM it is easy to omit the VMM there :

    I. the fast call can be replaced (and also sped up!) by using callgates -> “call far fword ” (http://www.zer0mem.sk/?p=34 – ring3-ring0 callgate)

    II. no PatchGuard, so the IDT hooks and the SYSCALL hook can be left unprotected

    —–

    but with the VMM we have total control over the system, and moreover the ring3 framework can serve as some kind of model for a ring0 dbi fuzz framework, which is even more promising
