NIST - Towards A Reference Architecture For BIG DATA
Updated: 07.2013
BIG DATA General Reference Architecture:
Big Data Simple Definition
Approach:
Take Advantage Of Many Available Data Sources To Expose Hidden
Knowledge Lost In Traditional Data Processing
How:
Employing Social Media, Text Processing, Natural Language
Processing, Flexible/Dynamic Database Schemas
While:
Often Bypassing Traditional Tools, Policy And Processes, Accelerating
Results
Comprehensive Capabilities Taxonomy
Transforms “Other” Capabilities Formats To A Common Reference Architecture Consumable
General Systems Capabilities
Account Management And Monitoring
User Administration And Monitoring
Security
Federation (Models) Management And Monitoring
Configuration (Models) Management And Monitoring
Note: Nearly 500 Detailed Capabilities/Functions Defined, About 25% - 30% Complete
Deployment (Models) Management And Monitoring
Availability – Metrics And Qualitative Levels (Experimental, Commercial, Mission Critical, Life Critical)
Procurement Compliance Management And Monitoring?
Maintenance & Diagnostics Management And Monitoring
License Management And Monitoring
Data Management And Monitoring
Supported Ingest Formats
Supported Output Formats
Note: Some Capabilities Are Functionally Cross-Cutting
Supported Devices
Supported Interfaces
RA and Standards Compliance
Performance (Models) Management, Monitoring, Metrics And Qualitative Levels
User Support Capabilities- Education, Help Management And Monitoring
Vendor Support Capabilities - Maintenance Management And Monitoring
System Specific Capabilities
Data Characterizations (Dynamics, Types of Change, Rate Of Change, Confidence, Quality, Demand)
Workload Management And Monitoring
Infrastructure Management And Monitoring (Compute Management, Storage Management, Network Management)
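The taxonomy above can be sketched as a small data structure. This is a minimal illustration, not part of the reference architecture: the class names, the `cross_cutting` flag, and the choice to mark Security as cross-cutting are assumptions made for the example. Only the capability names, the two groups, and the four availability levels come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class AvailabilityLevel(Enum):
    # The four qualitative availability levels named in the taxonomy.
    EXPERIMENTAL = 1
    COMMERCIAL = 2
    MISSION_CRITICAL = 3
    LIFE_CRITICAL = 4


@dataclass
class Capability:
    name: str
    group: str  # "General Systems" or "System Specific"
    cross_cutting: bool = False  # hypothetical flag for the cross-cutting note
    sub_areas: list = field(default_factory=list)


def by_group(capabilities):
    """Index capability names by taxonomy group, preserving input order."""
    index = {}
    for cap in capabilities:
        index.setdefault(cap.group, []).append(cap.name)
    return index


# A few entries only; marking Security as cross-cutting is an assumption.
TAXONOMY = [
    Capability("Security", "General Systems", cross_cutting=True),
    Capability("License Management And Monitoring", "General Systems"),
    Capability("Workload Management And Monitoring", "System Specific"),
    Capability(
        "Infrastructure Management And Monitoring",
        "System Specific",
        sub_areas=["Compute Management", "Storage Management", "Network Management"],
    ),
]

grouped = by_group(TAXONOMY)
```

Keeping the availability levels as an ordered enum lets a profile state a minimum level and compare against it by value.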
Application Profile Landscape
BIG DATA Applications Have Widely Differing Operating Needs
Role Applications Characterization
General Reference Architecture Views
Eco-System:
Aligns Market Drivers With Solutions And Participants
Capability:
Identifies and Aligns System Abilities
Facilitate Alignment To Requirements
Technical:
Identifies and Aligns Technical Areas
Defines Areas of Technical Responsibilities
Defines Interface Surfaces
Technology Agnostic
Data Processing Order Agnostic
Resource Flows:
Definition of operational concepts
Applying a local context to a capability
Allocation of activities to resources
Deployment:
Identifies Approaches And Options Surrounding Solution Topology
Security:
Aligns Security Approaches And Features With Other RA Models
May Consider Other Reference Types and Topic Areas:
RA of Adopted RAs
Processes
Life Cycles
Ecosystem Viewpoint
[Diagram labels: Data Sources; Data Objects; Individual Data Transfer; Big Data Transfer; Management; Conditioning; Security; Aggregation; Matching; Data Mining; PII, Pseudo-anonymized and Anonymized data; Data Usage]
[Data users shown: Network Operators / Telecom; Industries / Businesses; Government (incl. health & financial institutions); Academia]
[Diagram labels: Design, Develop, and Deploy Tools; Real-time, Interactive and Batch Analytics and Interfaces; Security; Process Management; High Performance Operational Databases; Operational Databases; Analytics Databases; Data Resource Management; Visualization Devices; Data Reports; Web Services; Traditional Data Processing (Reports, pdfs, Web); File Shares; Web APIs; External Web Pages; Semi-Structured Data Sources; Social Media (Forums, Blogs, Twitter); BIG DATA]
BIG DATA High Level Operational Resource Flow (OV-2)
[Diagram labels: roles Ingest Specialist, Data Scientist, Data Analyst, War Fighters; sources Sensors, Data Entry, Imaging, Telemetry; Ingest Processing; Highly Structured Data; Semi Structured Data; Query API; Data Analytics and Visualization; outputs Reports, Web, Web Services, pdfs]
BIG DATA Common Reference Architecture:
e.g. Reference Architecture Mapped to Accumulo/Hadoop
Stages: #1 Data Design, #2 Data Ingest, #3 Analytics, #4 Utilization
Roles: Specialist, Scientist, Analyst, Consumer
[Diagram labels: Data Sources; Legacy Data; Ingest Planning; Ingest Plans; Data Ingest; Data Models; Catalog; Accumulo; MapReduce (Hive, Pig, . . .); Analytic Applications; Query Tool; Data Queries; Data Results; Data Visualization; Widgets / Apps; Visualization Devices]
Data Sources ingested into Accumulo as NuWave Tables
Enriched relationships generated using MapReduce analytics and stored in Accumulo
Updated: Added Slide,11/2012 Accumulo/Hadoop Attribution: “Big Data from a DoD Perspective 0.2”
Accumulo
Storage Queries
Open Source Integrated Management Tools
Storage Virtualization Standard: CDMI
Infrastructure Virtualization Standard: OCCI/PAAS
Management Standards: OCCI/CDMI/CIM
Data Storage Infrastructure
Data Processing Infrastructure
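The ingest, analytics and utilization stages of the Accumulo/Hadoop mapping can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python dict stands in for an Accumulo table (no Accumulo or Hadoop API is used), the co-occurrence count is a made-up example of an "enriched relationship", and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Stand-in for an Accumulo table: a plain dict, NOT the Accumulo API.
table = {}


def ingest(records):
    """Stage #2: store raw records under a row id."""
    for row_id, entities in records:
        table[("raw", row_id)] = entities


def map_phase(entities):
    """Emit one (pair, 1) for every pair of entities seen together."""
    for pair in combinations(sorted(set(entities)), 2):
        yield pair, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each pair."""
    totals = defaultdict(int)
    for pair, count in pairs:
        totals[pair] += count
    return dict(totals)


def analyze():
    """Stage #3: derive relationships and store them back in the table."""
    emitted = []
    for key, entities in list(table.items()):
        if key[0] == "raw":
            emitted.extend(map_phase(entities))
    for pair, count in reduce_phase(emitted).items():
        table[("rel", pair)] = count


def query(entity):
    """Stage #4: look up stored relationships that involve one entity."""
    return {k[1]: v for k, v in table.items() if k[0] == "rel" and entity in k[1]}


ingest([("r1", ["alice", "bob"]), ("r2", ["bob", "alice", "carol"])])
analyze()
result = query("alice")
```

Writing the derived relationships back into the same store mirrors the slide's statement that enriched relationships are stored alongside the ingested data, so the query stage reads from one place.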
BIG DATA Commercial Enterprise Key Capabilities (should have wish list)
3. RESTful Cloud Object Management Interface Specification (to drive other new interface specifications)
4. Common Catalog Interface Specification – Searchable Capabilities, Services, Applications, Information, Data (profiles)
5. RESTful URI Search/Query Interface (CDR work?) (reduce dev/ops costs, increase deployment options)
6. Data Virtualization Interface Specification (reduce dev/ops costs, increase deployment options)
7. Infrastructure Management Harmonization Interface Spec. (reduce mgt costs, policy based, autonomic data center mgt)
8. Cloud PAAS/SAAS Management Interface Specification (for workload mgt, improved security)
10. Natural Language Query Specification (extend info harvesting to imaging/video, integrated redaction)
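As a rough illustration of what a RESTful URI search/query interface over a common catalog might look like from the client side, the sketch below builds a query URI with the Python standard library. The endpoint, the parameter names (`q`, `kind`, `limit`) and the function name are all hypothetical; the slides name the desired specifications but do not define them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter names; none come from the slides.
BASE = "https://catalog.example.org/search"


def build_query_uri(text, kind=None, limit=10):
    """Build a search URI; `kind` could select data, services, applications, etc."""
    params = {"q": text, "limit": limit}
    if kind is not None:
        params["kind"] = kind
    return f"{BASE}?{urlencode(params)}"


uri = build_query_uri("sensor telemetry", kind="data")

# Round-trip the query string to show it parses back to the same values.
parsed = parse_qs(urlsplit(uri).query)
```

Encoding the whole query in the URI keeps requests cacheable and easy to share, which fits the stated goal of reducing dev/ops costs.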
Core Capability Suppliers
Defense & Intel Communities
Institutional, Government, SDOs, Consultancies, Economy
Intermediaries, Funding Sources, SSOs, Open Source, Market & Media
Traditional Executive Portfolio
Chief Executive Officer
Chief Administration Officer
Chief Legal Counsel
New Additions
Response to New Opportunities and Concerns In Shifting Business and Social Landscape
Note: Procurement May Fall Under Either CFO, COO, CAO, CEO Responsibilities
Network Engineers
Data Scientists
Ingest Specialists
Storage Engineers
Report Specialists
Statisticians
Systems Administrators
Availability Specialists
Data Analysts
Database Administrators
Applications Development
Ops (DevOps)
07.2013 | Aggregate Several Presentations (broke storyboard) | Gary Mazzaferro | NIST Big Data Initiative
07.2013 | Added
07.2013 | Re-orged and Excerpted Slides | Robert Marcus | Align With NIST Big Data WGs
07.2013 | Added “Ecosystem To Capabilities Slides” And “Capabilities To Technical Viewpoint Slides”; Cleaned up Typos and Formatting; Added Stakeholder Slides | Gary Mazzaferro | Align With NIST Big Data RA WG