Waas
06/23/10 Confidential
Greenplum Database
Software-only solution
Not an appliance
No proprietary hardware
SQL MapReduce
...
...
Network Interconnect
Segment Servers
Query processing & data storage
...
...
External Sources
Loading, streaming, etc.
Customer Profiles
100+ global enterprise customers
10s of TB to multi-PB
10s of tables to 15,000+
10s of concurrent users to 150+
Integration w/ existing data analysis ecosystem, e.g., BI tools, ETL/ELT infrastructure
Compute intensive, storage intensive
Mixed workloads
Urban myth: pure DW workloads
Specific Challenges
Parallel processing
Distributed transactions
Distributed DDL
Parallel query processing
Multi-core aware query optimization
Data management
In-cluster replication
DAS = unmanaged storage
TBs per box
~10PB today, ~100PB tomorrow
Stress tests
Similar; more components
Performance
Same basic principles; much more data to capture
In-cluster replication
Special hooks into product
Scale testing
True at-scale testing prohibitively expensive
Fault-tolerance
Elaborate external harnesses
Test challenges: Fault-tolerance
Significantly larger test matrix
Need to capture detailed system state
Observing distributed systems cannot rely on time stamps
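One common way to order events without trusting wall-clock time stamps is a logical clock. The sketch below uses Lamport clocks, which the slides do not name; the class, the node names, and the event labels are all hypothetical and serve only to illustrate how a test harness might reconstruct a causally consistent event order across nodes.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: counts events instead of reading wall-clock time."""
    node: str
    time: int = 0

    def tick(self) -> int:
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned stamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # On receipt, jump past both the local time and the sender's stamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


def demo():
    # Hypothetical two-node exchange; names are illustrative only.
    master = LamportClock("master")
    segment = LamportClock("segment1")
    events = []

    events.append((master.tick(), master.node, "begin txn"))
    stamp = master.send()
    events.append((stamp, master.node, "dispatch plan"))
    events.append((segment.receive(stamp), segment.node, "receive plan"))
    events.append((segment.tick(), segment.node, "scan done"))
    reply = segment.send()
    events.append((reply, segment.node, "send result"))
    events.append((master.receive(reply), master.node, "receive result"))

    # Sorting by logical stamp yields an order consistent with causality,
    # regardless of how far the nodes' physical clocks have drifted.
    return sorted(events)
```

Lamport stamps only guarantee that a cause is ordered before its effect; concurrent events on different nodes may still tie or sort arbitrarily, so a harness that needs to detect concurrency would need something richer, such as vector clocks.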
Fault scenarios
Network bisection
Node failures
Gradually degrading hardware
Test strategies
Random fault injection
Network/drive failure simulations
Explicit K-safety scenarios
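The first and last strategies can be sketched against a toy model of a cluster. Everything here is an assumption made for illustration: the ring-style placement of segment copies, the function names, and the reading of K-safety as "no data is lost when any K hosts fail". It is not a description of how Greenplum places mirrors or how its test harness works.

```python
import random
from itertools import combinations


def build_cluster(num_hosts: int, k: int) -> dict[int, set[int]]:
    """Hypothetical placement: segment i keeps k+1 copies on hosts i..i+k (mod n)."""
    assert num_hosts > k, "need more hosts than copies per segment"
    return {
        seg: {(seg + j) % num_hosts for j in range(k + 1)}
        for seg in range(num_hosts)
    }


def survives(placement: dict[int, set[int]], failed: set[int]) -> bool:
    """True if every segment still has at least one copy on a live host."""
    return all(hosts - failed for hosts in placement.values())


def inject_random_faults(placement, num_hosts, num_failures, trials, seed):
    """Random fault injection: fail random host sets, collect those losing data.

    The seed makes every failing scenario reproducible.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        failed = set(rng.sample(range(num_hosts), num_failures))
        if not survives(placement, failed):
            losses.append(failed)
    return losses


def check_k_safety(placement, num_hosts, k) -> bool:
    """Explicit K-safety scenario: exhaustively fail every set of k hosts."""
    return all(
        survives(placement, set(combo))
        for combo in combinations(range(num_hosts), k)
    )
```

For example, `check_k_safety(build_cluster(6, 1), 6, 1)` holds for this placement, while the same call with `k=2` on the checker does not, because two adjacent hosts hold both copies of one segment. Random injection explores the larger matrix cheaply; the exhaustive check is only feasible for small K and host counts.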
Higher standards of quality
Transactional data management
Test methods highly specialized
System specific
Programming language specific
The Road Ahead
Distributed programming will become pervasive
Above test challenges will be ubiquitous
Cannot afford to implement these as black-box tests outside of system
Re-implements highly complex logic
Too costly to maintain