Be A Binary Rockstar
Be A Binary Rockstar
Be A Binary Rockstar
4
Foundations
5
Why IL?
• Architecture Abstraction
• Smaller number of instructions
6
Easy to lift
• Simple flags calculation
• As close to native instructions as possible
• Typeless - types inferred later
7
Easy to read
• Intuitive to read
• Tree-based infix notation
• No register abstraction
• Flags calculation only when necessary
• Avoid excessive temporaries
8
IL Instruction Set
Size
ze
aly
Ea
an
sie
r to
r to
sie
lift
Ea
Instruction
Set Size
9
The Options
10
Existing Options for IL
• BAP
• VEX
• REIL
• LLVM
• IDA
11
BAP
• Tree-tree based :)
• Flags are explicit and inhibit readability :(
• Written in OCAML :(
12
addr 0x0 @asm ”add %eax,%ebx” t:u32 = REBX:u32
REBX:u32 = REBX:u32 + REAX:u32
RCF:bool = REBX:u32 < t:u32
addr 0x2 @asm ”shl %cl,%ebx”
add ebx, eax t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 &
shl ebx, cl 0x1f:u32)
RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) &
RCF:bool | ̃((RECX:u32 & 0x1f:u32) = 0:u32) &
low:bool(t1:u32)
13
VEX
• Register names are abstracted :(
• Single assignment :(
• Over 1000 instructions! :(
• Yet they call it “RISC-like”
• Even Angr is planning a move away from it
14
t0 = GET:I32(16)
t1 = 0x8:I32
subs R2, R2, #8 t3 = Sub32(t0,t1)
PUT(16) = t3
PUT(68) = 0x59FC8:I32
15
REIL
• Tiny instruction set
• Horrible readability
• Makes abstractions nearly impossible
• Flags are explicit and inhibit readability :(
16
00000000.00 STR R_EAX:32, , V_00:32
00000000.01 STR 0:1, , R_CF:1
00000000.02 AND V_00:32, ff:8, V_01:8
00000000.03 SHR V_01:8, 7:8, V_02:8
00000000.04 SHR V_01:8, 6:8, V_03:8
00000000.05 XOR V_02:8, V_03:8, V_04:8
00000000.06 SHR V_01:8, 5:8, V_05:8
00000000.07 SHR V_01:8, 4:8, V_06:8
00000000.08 XOR V_05:8, V_06:8, V_07:8
00000000.09 XOR V_04:8, V_07:8, V_08:8
00000000.0a SHR V_01:8, 3:8, V_09:8
00000000.0b SHR V_01:8, 2:8, V_10:8
test eax, eax 00000000.0c XOR V_09:8, V_10:8, V_11:8
00000000.0d SHR V_01:8, 1:8, V_12:8
00000000.0e XOR V_12:8, V_01:8, V_13:8
00000000.0f XOR V_11:8, V_13:8, V_14:8
00000000.10 XOR V_08:8, V_14:8, V_15:8
00000000.11 AND V_15:8, 1:1, V_16:1
00000000.12 NOT V_16:1, , R_PF:1
00000000.13 STR 0:1, , R_AF:1
00000000.14 EQ V_00:32, 0:32, R_ZF:1
00000000.15 SHR V_00:32, 1f:32, V_17:32
00000000.16 AND 1:32, V_17:32, V_18:32
00000000.17 EQ 1:32, V_18:32, R_SF:1
00000000.18 STR 0:1, , R_OF:1 17
LLVM
● Easy to analyze and has great tools already
available
● It’s a compiler!
○ Reversers want a decompiler.
○ Cannot be the only goal
18
LLVM Challenges
?
20
Binary Ninja’s Answer
• Binary Ninja Intermediate Language (BNIL)
21
IL Goals & Design
22
Why Another IL?
● Popular existing ILs for compiled binaries are not very
human readable. They are extremely low level and
verbose.
● Existing ILs are single stage. Heavyweight analysis must
be performed to get anywhere close to decompiled output.
● Writing a lifter for a new architecture is usually very time
consuming.
23
Binary Ninja IL
● Create a family of ILs with multiple stages of analysis
● Lowest level is close to assembly
● After analysis and transformations, higher levels are
closer to decompiled output and would be much easier to
translate to good LLVM code
● Analysis involved in each transformation is easy to
understand, fast, and directly aids further analysis
24
IL Design Goals
● Human readable
● Computer understandable (SSA, 3AF, etc.)
● Plugin understandable
● Easy to lift native architectures
● Translation to other ILs such as LLVM
25
Human Readable
26
Low Level IL Example
lea rax, [0x201047] rax = 0x201047
lea rdi, [0x201040] rdi = 0x201040
push rbp push(rbp)
sub rax, rdi rax = rax - rdi
mov rbp, rsp rbp = rsp
cmp rax, 0xe if (rax u> 0xe) then
ja 0x68d 6 @ 0x68d else 8 @ 0x68b
27
Low Level IL Example
addiu $sp, $sp, -0x18 $sp = $sp - 0x18
sw $ra, 0x14($sp) [$sp + 0x14].d = $ra
lw $a0, ($a1) $a0 = [$a1].d
jal atoi call(atoi)
nop $at = $v0 u< 0x20 ? 1 : 0
sltiu $at, $v0, 0x20 if ($at == 0) then
beqz $at, 0x4002d8 7 @ 0x4002d8 else
nop 12 @ 0x400290
● Multiple IL forms
● Pick the right IL for the task at hand
29
IL Forms
Lifted IL Low Level IL SSA /
ASM -> IL Flags use resolved 3AF
31
Easy to Lift
● Expression tree
● Designed for quick, modular lifter
implementations
● Semantic flags eases the burden of describing
flag effects during lifting
32
Semantic Flags
● Architecture plugins define the set of flags and
their semantic roles
● Instructions can define a set of flags they write
● Data flow analysis is performed to link flag
uses to flag writes
33
Semantic Flags
● In most compiled code, flags are resolved to
simple comparison expressions with no effort
from the architecture plugin
● Special cases fall back to emitting concrete
flag write expressions
34
Semantic Flags Example
“Writes to all ALU flags” Folded expression
describing use of flags
sub.q{*}(rax, 0xe)
if (rax u> 0xe) then
if (u>) then
… else …
… else …
36
Low Level IL to Medium Level IL
● Low Level IL is translated to SSA form
● Use implicit data flow from SSA to resolve stack layout
● Data flow based stack layout resolution avoids problems
with nonstandard frame pointer behavior
● Translate loads and stores on stack to stack variable uses
and assignments
37
Medium Level IL Example
push(ebp)
ebp = esp
var_4 = ebp
esp = esp - 0x18
eax = arg_4
eax = [ebp + 8].d
var_1c = eax
[esp].d = eax
free(var_1c)
call(free)
ebp = var_4
esp = ebp
return
ebp = pop
<return> jump(pop)
Medium Level IL
38
Medium Level IL
● Registers and stack usage are now both treated as
variables
● Stack variables no longer use explicit memory access
● Translate to SSA form to obtain implicit data flow on both
registers and stack variables
● Type propagation is performed on SSA form
39
Using Medium Level IL - Jump Tables
40
Using Medium Level IL - Jump Tables
● Jump table resolution based on path-sensitive data flow
● SSA conversion process also tracks control flow
dependence for every block
● Data flow computations allow disjoint sets of possible
values
● Reads from memory are simulated
● At jump site, possible values are the possible jump targets
41
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …
45
Jump Table Example
x8#1 = zx.q(x0#2.d) Branch condition
if (x0#2.d u> 0x1f) must be false to
then … else … reach jump site
46
Jump Table Example
When false, we
x8#1 = zx.q(x0#2.d) know that x0#2.d
if (x0#2.d u> 0x1f) is between 0 and
then … else … 0x1f inclusive
47
Jump Table Example
Resolve forward
x8#1 = zx.q(x0#2.d) to obtain possible
if (x0#2.d u> 0x1f) jump targets
then … else …
49
Leveraging the Jump Table Algorithm
● Single jump table algorithm works on all architectures with
no additional effort from architecture plugin
● Control flow dependence information accessible from API
● Queries for set of possible values accessible from API
50
The Final Forms
● Medium Level IL has the type information, stack
knowledge, and SSA form to translate easily into LLVM IR
● Further analysis can be performed to translate to High
Level IL, the Binary Ninja IL that will be used to create its
decompiler
● All aspects of every IL form are plugin accessible, so
translating to other representations is straightforward
51
Binary Ninja for Profit
52
● Python, C and C++ API’s
Binja API
(headless)
● api.binary.ninja/search.html
53
binja_memcpy.py: IL
/bin/bash
54
binja_memcpy.py: IL
/bin/bash
55
binja_memcpy.py: API
56
binja_memcpy.py: API
57
binja_memcpy.py: API
58
binja_memcpy.py: API
59
binja_memcpy.py: Output
60
SSA: Uninitialized variable
for func in bv.functions:
for block in func.medium_level_il.ssa_form:
for instr in block:
visit_instr(instr)
61
SSA: Uninitialized variable
def visit_instr(inst):
# Read of variable
if inst.operation == MLIL_VAR_SSA:
# Not written
if inst.index == 0:
if inst.src.type == StackVariableSourceType:
# Local variables
if inst.src.identifier < 0:
print
("Uninitialized stack variable reference at " + hex(inst.address))
62
SSA: Uninitialized variable
else:
for op in instr.operands:
if isinstance(op, MediumLevelILInstruction):
visit_instr(op)
63
Symbolic Execution
● Very accurate
● Galois
Connection
formalizes
Concrete
<-> Abstract
65
Abstract Interpretation
● X ‘s value is imprecise
int x; ● Compilers perform imprecise
int[] a = new abstraction
int[10]; 1. Add precision - i.e. declare
a[2 * x] = 3; 1. abstract value [0, 9]
2. Symbolically execute with
abstract domain/ values
● Requires control-flow analysis
66
Abstract Domains & Sign Analysis
int a,b,c; ● Map variables to an
a = 42; abstract value
b = 87;
if (input) {
c = a + b;
} else {
c = a - b;
} 67
Abstract Domains & Sign Analysis
● Binary Ninja plugin
● Under approximate
● Path sensitive - construct lattices of
abstract values
● One abstract state per CFG node
● Avoid loss in precision for fractions.
68
Demo!
● Analyze
example
program
● PHP
CVE-2016-6289
69
UAF Analysis:
PointsTo for Binja IL
● Before: Allocation -> Write
● UAF Analysis: Allocation -> Free -> Use
● Key Idea: Data flow graph, assignments, copies, dereferences,
and frees of pointers
● Context and path sensitive (path API == soon!).
blog.trailofbits.com/2016/03/09/the-problem-with-dynamic-program-analysis/
70
Devirtualizing C++
● VTable Function Call
● Example: mov eax, [ecx]; call [eax + 4]
https://blog.trailofbits.com/2017/02/13/devirtualizing-c-with-binary-ninja/ 71
Devirtualizing C++
72
Devirtualizing C++
73
Playing with Scripts!
● memcpy, headless python
API script
https://github.com/
● Depth-first-search, path
sensitive CFG template trailofbits/binjascripts
● Sign analysis, abstract domain
plugin, CFG traversal script
● And much much more ….
74
Conclusion: Resources
● binary.ninja/
● Abstract Interpretation talk:
santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
● Static Program Analysis Book!
cs.au.dk/~amoeller/spa/spa.pdf
75
Conclusion: Binary Ninja
76
Contact Us
Sophia d’Antoine
- IRC/Slack: @quend
- sophia@trailofbits.com
Peter LaFosse
Rusty Wagner
- https://binaryninjaslack.herokuapp.com
- binaryninja@vector35.com
77