Nothing Special   »   [go: up one dir, main page]

Be A Binary Rockstar

Download as pdf or txt
Download as pdf or txt
You are on page 1of 77

Be a Binary Rockst r

An Introduction to Program Analysis with


Binary Ninja
Agenda
● Motivation
● Current State of Program Analysis
● Design Goals of Binja Program
Analysis
● Building Tools
2
Motivation
3
● Tooling - Concrete -> Symbolic
○ Increase speed & effectiveness of RE / VR
● Make Program Analysis more accessible &
useful

4
Foundations

• Need to understand code semantics


• Could be done directly on the assembly
• An Intermediate Language (IL) is needed

5
Why IL?
• Architecture Abstraction
• Smaller number of instructions

6
Easy to lift
• Simple flags calculation
• As close to native instructions as possible
• Typeless - types inferred later

7
Easy to read
• Intuitive to read
• Tree-based infix notation
• No register abstraction
• Flags calculation only when necessary
• Avoid excessive temporaries

8
IL Instruction Set
Size

ze
aly

Ea
an

sie
r to

r to
sie

lift
Ea

Instruction
Set Size

9
The Options
10
Existing Options for IL
• BAP
• VEX
• REIL
• LLVM
• IDA

11
BAP
• Tree-tree based :)
• Flags are explicit and inhibit readability :(
• Written in OCAML :(

12
addr 0x0 @asm ”add %eax,%ebx” t:u32 = REBX:u32
REBX:u32 = REBX:u32 + REAX:u32
RCF:bool = REBX:u32 < t:u32
addr 0x2 @asm ”shl %cl,%ebx”
add ebx, eax t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 &
shl ebx, cl 0x1f:u32)
RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) &
RCF:bool | ̃((RECX:u32 & 0x1f:u32) = 0:u32) &
low:bool(t1:u32)

13
VEX
• Register names are abstracted :(
• Single assignment :(
• Over 1000 instructions! :(
• Yet they call it “RISC-like”
• Even Angr is planning a move away from it

14
t0 = GET:I32(16)
t1 = 0x8:I32
subs R2, R2, #8 t3 = Sub32(t0,t1)
PUT(16) = t3
PUT(68) = 0x59FC8:I32

15
REIL
• Tiny instruction set
• Horrible readability
• Makes abstractions nearly impossible
• Flags are explicit and inhibit readability :(

16
00000000.00 STR R_EAX:32, , V_00:32
00000000.01 STR 0:1, , R_CF:1
00000000.02 AND V_00:32, ff:8, V_01:8
00000000.03 SHR V_01:8, 7:8, V_02:8
00000000.04 SHR V_01:8, 6:8, V_03:8
00000000.05 XOR V_02:8, V_03:8, V_04:8
00000000.06 SHR V_01:8, 5:8, V_05:8
00000000.07 SHR V_01:8, 4:8, V_06:8
00000000.08 XOR V_05:8, V_06:8, V_07:8
00000000.09 XOR V_04:8, V_07:8, V_08:8
00000000.0a SHR V_01:8, 3:8, V_09:8
00000000.0b SHR V_01:8, 2:8, V_10:8
test eax, eax 00000000.0c XOR V_09:8, V_10:8, V_11:8
00000000.0d SHR V_01:8, 1:8, V_12:8
00000000.0e XOR V_12:8, V_01:8, V_13:8
00000000.0f XOR V_11:8, V_13:8, V_14:8
00000000.10 XOR V_08:8, V_14:8, V_15:8
00000000.11 AND V_15:8, 1:1, V_16:1
00000000.12 NOT V_16:1, , R_PF:1
00000000.13 STR 0:1, , R_AF:1
00000000.14 EQ V_00:32, 0:32, R_ZF:1
00000000.15 SHR V_00:32, 1f:32, V_17:32
00000000.16 AND 1:32, V_17:32, V_18:32
00000000.17 EQ 1:32, V_18:32, R_SF:1
00000000.18 STR 0:1, , R_OF:1 17
LLVM
● Easy to analyze and has great tools already
available
● It’s a compiler!
○ Reversers want a decompiler.
○ Cannot be the only goal

18
LLVM Challenges

● Hard to lift well from compiled binaries


○ Designed for compiler output
● Expects type information in the instructions
● SSA form - assembly is not
● Stack in assembly looks like a structure, but
structures lose many advantages of SSA
19
IDA

?
20
Binary Ninja’s Answer
• Binary Ninja Intermediate Language (BNIL)

21
IL Goals & Design
22
Why Another IL?
● Popular existing ILs for compiled binaries are not very
human readable. They are extremely low level and
verbose.
● Existing ILs are single stage. Heavyweight analysis must
be performed to get anywhere close to decompiled output.
● Writing a lifter for a new architecture is usually very time
consuming.

23
Binary Ninja IL
● Create a family of ILs with multiple stages of analysis
● Lowest level is close to assembly
● After analysis and transformations, higher levels are
closer to decompiled output and would be much easier to
translate to good LLVM code
● Analysis involved in each transformation is easy to
understand, fast, and directly aids further analysis

24
IL Design Goals
● Human readable
● Computer understandable (SSA, 3AF, etc.)
● Plugin understandable
● Easy to lift native architectures
● Translation to other ILs such as LLVM

25
Human Readable

● Reads like pseudocode, even in lowest level


form
● Flags are resolved into readable expressions

26
Low Level IL Example
lea rax, [0x201047] rax = 0x201047
lea rdi, [0x201040] rdi = 0x201040
push rbp push(rbp)
sub rax, rdi rax = rax - rdi
mov rbp, rsp rbp = rsp
cmp rax, 0xe if (rax u> 0xe) then
ja 0x68d 6 @ 0x68d else 8 @ 0x68b

x86-64 Assembly Low Level IL

27
Low Level IL Example
addiu $sp, $sp, -0x18 $sp = $sp - 0x18
sw $ra, 0x14($sp) [$sp + 0x14].d = $ra
lw $a0, ($a1) $a0 = [$a1].d
jal atoi call(atoi)
nop $at = $v0 u< 0x20 ? 1 : 0
sltiu $at, $v0, 0x20 if ($at == 0) then
beqz $at, 0x4002d8 7 @ 0x4002d8 else
nop 12 @ 0x400290

MIPS Assembly Low Level IL


28
Computer Understandable

● Multiple IL forms
● Pick the right IL for the task at hand

29
IL Forms
Lifted IL Low Level IL SSA /
ASM -> IL Flags use resolved 3AF

High Level IL Medium Level IL


Calls in high level form Stack usage resolved SSA /
3AF
Expression folding Type propagation
Like decompiled output
30
Plugin Understandable

● All IL forms directly accessible from API


● Analysis performed on IL also accessible by
API

31
Easy to Lift

● Expression tree
● Designed for quick, modular lifter
implementations
● Semantic flags eases the burden of describing
flag effects during lifting

32
Semantic Flags
● Architecture plugins define the set of flags and
their semantic roles
● Instructions can define a set of flags they write
● Data flow analysis is performed to link flag
uses to flag writes

33
Semantic Flags
● In most compiled code, flags are resolved to
simple comparison expressions with no effort
from the architecture plugin
● Special cases fall back to emitting concrete
flag write expressions

34
Semantic Flags Example
“Writes to all ALU flags” Folded expression
describing use of flags
sub.q{*}(rax, 0xe)
if (rax u> 0xe) then
if (u>) then
… else …
… else …

“Flag state representing


unsigned greater than”
35
Translating Upwards

● Semantic flags analysis gives Low Level IL with flag


usage fully resolved
● Stack is represented as memory accesses, so data flow
can be difficult to compute on stack variables in Low Level
IL
● Need to analyze and translate to Medium Level IL

36
Low Level IL to Medium Level IL
● Low Level IL is translated to SSA form
● Use implicit data flow from SSA to resolve stack layout
● Data flow based stack layout resolution avoids problems
with nonstandard frame pointer behavior
● Translate loads and stores on stack to stack variable uses
and assignments

37
Medium Level IL Example
push(ebp)
ebp = esp
var_4 = ebp
esp = esp - 0x18
eax = arg_4
eax = [ebp + 8].d
var_1c = eax
[esp].d = eax
free(var_1c)
call(free)
ebp = var_4
esp = ebp
return
ebp = pop
<return> jump(pop)
Medium Level IL
38
Medium Level IL
● Registers and stack usage are now both treated as
variables
● Stack variables no longer use explicit memory access
● Translate to SSA form to obtain implicit data flow on both
registers and stack variables
● Type propagation is performed on SSA form

39
Using Medium Level IL - Jump Tables

40
Using Medium Level IL - Jump Tables
● Jump table resolution based on path-sensitive data flow
● SSA conversion process also tracks control flow
dependence for every block
● Data flow computations allow disjoint sets of possible
values
● Reads from memory are simulated
● At jump site, possible values are the possible jump targets

41
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)
Medium Level IL
Solve for this to get jump targets SSA Form
42
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Track flow backwards with SSA to find definitions


43
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Memory read depends on value of x8#1


44
Jump Table Example
x8#1 = zx.q(x0#2.d) Value used in
if (x0#2.d u> 0x1f) branch
then … else … comparison

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

45
Jump Table Example
x8#1 = zx.q(x0#2.d) Branch condition
if (x0#2.d u> 0x1f) must be false to
then … else … reach jump site

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

46
Jump Table Example
When false, we
x8#1 = zx.q(x0#2.d) know that x0#2.d
if (x0#2.d u> 0x1f) is between 0 and
then … else … 0x1f inclusive

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

47
Jump Table Example
Resolve forward
x8#1 = zx.q(x0#2.d) to obtain possible
if (x0#2.d u> 0x1f) jump targets
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Set of possible values here are the jump targets 48


Using Medium Level IL - Jump Tables
● More complex idioms need to combine multiple sources of
information
● Value through SSA ϕ-functions is the set union of the
inputs
● Value of a specific SSA variable is the set intersection of
the information found in the definition and all uses of the
variable

49
Leveraging the Jump Table Algorithm
● Single jump table algorithm works on all architectures with
no additional effort from architecture plugin
● Control flow dependence information accessible from API
● Queries for set of possible values accessible from API

50
The Final Forms
● Medium Level IL has the type information, stack
knowledge, and SSA form to translate easily into LLVM IR
● Further analysis can be performed to translate to High
Level IL, the Binary Ninja IL that will be used to create its
decompiler
● All aspects of every IL form are plugin accessible, so
translating to other representations is straightforward

51
Binary Ninja for Profit
52
● Python, C and C++ API’s
Binja API
(headless)

● Branches: Basic block/ Function edges


(incoming & outgoing)

● Get the register states, some naive range


analysis

● api.binary.ninja/search.html
53
binja_memcpy.py: IL
/bin/bash

54
binja_memcpy.py: IL
/bin/bash

55
binja_memcpy.py: API

56
binja_memcpy.py: API

57
binja_memcpy.py: API

58
binja_memcpy.py: API

59
binja_memcpy.py: Output

60
SSA: Uninitialized variable
for func in bv.functions:
for block in func.medium_level_il.ssa_form:
for instr in block:
visit_instr(instr)

61
SSA: Uninitialized variable
def visit_instr(inst):
# Read of variable
if inst.operation == MLIL_VAR_SSA:
# Not written
if inst.index == 0:
if inst.src.type == StackVariableSourceType:
# Local variables
if inst.src.identifier < 0:
print
("Uninitialized stack variable reference at " + hex(inst.address))

62
SSA: Uninitialized variable
else:
for op in instr.operands:
if isinstance(op, MediumLevelILInstruction):
visit_instr(op)

63
Symbolic Execution
● Very accurate

● Takes time, data, and memory, often not feasible

● IDEA! Reasoning only about what we care about.

● Apply complex data to abstract domains !

● Domains: type, sign, range, color etc….


64
Abstract Interpretation ● Sets of
concrete values
are abstracted
imprecisely

● Galois
Connection
formalizes
Concrete
<-> Abstract

65
Abstract Interpretation
● X ‘s value is imprecise
int x; ● Compilers perform imprecise
int[] a = new abstraction
int[10]; 1. Add precision - i.e. declare
a[2 * x] = 3; 1. abstract value [0, 9]
2. Symbolically execute with
abstract domain/ values
● Requires control-flow analysis
66
Abstract Domains & Sign Analysis
int a,b,c; ● Map variables to an
a = 42; abstract value
b = 87;
if (input) {
c = a + b;
} else {
c = a - b;
} 67
Abstract Domains & Sign Analysis
● Binary Ninja plugin

● Under approximate
● Path sensitive - construct lattices of
abstract values
● One abstract state per CFG node
● Avoid loss in precision for fractions.
68
Demo!
● Analyze
example
program

● PHP
CVE-2016-6289

69
UAF Analysis:
PointsTo for Binja IL
● Before: Allocation -> Write
● UAF Analysis: Allocation -> Free -> Use
● Key Idea: Data flow graph, assignments, copies, dereferences,
and frees of pointers
● Context and path sensitive (path API == soon!).
blog.trailofbits.com/2016/03/09/the-problem-with-dynamic-program-analysis/
70
Devirtualizing C++
● VTable Function Call
● Example: mov eax, [ecx]; call [eax + 4]

https://blog.trailofbits.com/2017/02/13/devirtualizing-c-with-binary-ninja/ 71
Devirtualizing C++

72
Devirtualizing C++

73
Playing with Scripts!
● memcpy, headless python
API script
https://github.com/
● Depth-first-search, path
sensitive CFG template trailofbits/binjascripts
● Sign analysis, abstract domain
plugin, CFG traversal script
● And much much more ….
74
Conclusion: Resources
● binary.ninja/
● Abstract Interpretation talk:
santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
● Static Program Analysis Book!
cs.au.dk/~amoeller/spa/spa.pdf

75
Conclusion: Binary Ninja

76
Contact Us
Sophia d’Antoine
- IRC/Slack: @quend
- sophia@trailofbits.com

Peter LaFosse
Rusty Wagner
- https://binaryninjaslack.herokuapp.com
- binaryninja@vector35.com
77

You might also like