Batch-Based Accelerated Query Engine

This page contains information on the Batch-Based Accelerated Query Engine (BARQ).

Page Contents
  1. Introduction
  2. Support
  3. Example Query
  4. Analyzing BARQ engine performance
    1. Disabling the BARQ engine

Introduction

BARQ (Batch-Based Accelerated Query Engine) is the batch-based query execution engine for Stardog. It has been the default engine since version 10.2.

Stardog’s previous query engine was designed around a sophisticated optimizer that uses advanced selectivity statistics to minimize disk I/O. Execution is row-based, or tuple-at-a-time (i.e. the Volcano model), which works well for queries with selective patterns. It works less well for analytical, CPU-bound queries: for large joins in particular, the traditional tuple-at-a-time model cannot achieve high throughput.

BARQ is inspired by systems like MonetDB/X100 (later commercialized as VectorWise, now Actian Vector) and, more recently, Velox, in which operators consume and produce a batch of tuples at a time. This enables much higher throughput on CPU-bound query workloads.
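The difference between the two execution models can be illustrated with a small Python sketch. It is not Stardog code: the operator names (`scan_rows`, `filter_rows`, `scan_batches`, `filter_batches`), the sample data, and the batch size are all hypothetical and only show the shape of each model.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, int]  # hypothetical solution: (name, age)


def scan_rows(data: List[Row]) -> Iterator[Row]:
    """Tuple-at-a-time scan: hands one row to the parent per step."""
    for row in data:
        yield row


def filter_rows(child: Iterator[Row], min_age: int) -> Iterator[Row]:
    """Tuple-at-a-time filter: one pull from the child per row."""
    for row in child:
        if row[1] >= min_age:
            yield row


def scan_batches(data: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Batch scan: hands a list of up to batch_size rows to the parent per step."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def filter_batches(child: Iterator[List[Row]], min_age: int) -> Iterator[List[Row]]:
    """Batch filter: one pull from the child per batch, then a tight loop over it."""
    for batch in child:
        kept = [row for row in batch if row[1] >= min_age]
        if kept:  # skip batches in which no row survived
            yield kept


people = [("alice", 34), ("bob", 19), ("carol", 52), ("dave", 27)]

# Both pipelines compute the same answer; they differ in how often
# control passes between operators (per row vs. per batch).
row_result = list(filter_rows(scan_rows(people), 30))
batch_result = [row
                for batch in filter_batches(scan_batches(people, 2), 30)
                for row in batch]
```

In the batch pipeline the per-call overhead between operators is paid once per batch rather than once per row, which is the general reason batch-oriented engines tend to do better on CPU-bound workloads.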

BARQ is integrated into the query engine pipeline. Once it is enabled, the query translator can pick the appropriate execution model, and queries that use supported operators may use BARQ automatically.

Support

BARQ supports a broad set of query operators. When a plan contains operators that have no batch implementation, it can be executed in a hybrid way: Stardog may switch between the batch-based and non-batch-based query executors on the fly.

As of Stardog 10.2, BARQ supports most key SPARQL query operators: hash and merge joins, filters, simple aggregation, anti-joins (MINUS), and distinct. Not yet supported are traversals (property paths and path queries), services (particularly for virtualized data), and deeper integration with Stardog’s custom memory management layer, particularly for batch-based hash table lookups.

Example Query

This query from the Labelled Subgraph Query Benchmark executes in only a few seconds with BARQ enabled, yet may take 10-20 times as long with the previous query engine:

# Assumes the lp: prefix is declared or stored as a namespace in the database
SELECT (COUNT(*) AS ?count)
WHERE {
 ?person1 (lp:Person_knows_Person | ^lp:Person_knows_Person ) ?person2 .
 ?person2 (lp:Person_knows_Person | ^lp:Person_knows_Person ) ?person3 .
 ?person3 lp:Person_hasInterest_Tag ?tag .
 FILTER ( ?person1 != ?person3 )
}

Analyzing BARQ engine performance

The query plan profiler output shows which parts of a plan were executed with BARQ and which were not.

When running stardog query explain --profile myquery.sparql, the text output contains the batched keyword for each operator that was executed with BARQ:

Projection(?count) [#1], results: 1 (next: 0), wall time: 0 ms (0.0%), batched
`─ Group(aggregates=[(COUNT(*) AS ?count)]) [#1], results: 1 (next: 0), wall time: 1 ms (0.1%), batched
   `─ HashJoin(?personA) [#27.5M], memory: {total=12M (46.1%); max=12M}, results: 277K (next: 0), wall time: 572 ms (42.8%), batched
      +─ MergeJoin(?post) [#347K], results: 347K (next: 0), wall time: 15 ms (1.1%), batched
      │  +─ Scan[PSOC](?post, http://ldbcouncil.org/Post_hasCreator_Person, ?personB) [#414K], results: 84K (next: 1.6K, skip: 1.6K, reset: 0), wall time: 44 ms (3.3%), batched
      │  `─ Sort(?post) [#347K], memory: {total=13M (53.9%)}, results: 347K (next: 1, skip: 1, reset: 0), wall time: 361 ms (27.0%), batched
      │     `─ Restriction(?personA, ?post) [#347K]
      │        `─ MergeJoin(?comment) [#347K], results: 347K (next: 0), wall time: 41 ms (3.1%), batched
      │           +─ Scan[PSOC](?comment, http://ldbcouncil.org/Comment_replyOf_Post, ?post) [#347K], results: 347K (next: 0), wall time: 58 ms (4.3%), batched
      │           `─ Scan[PSOC](?comment, http://ldbcouncil.org/Comment_hasCreator_Person, ?personA) [#698K], results: 660K (next: 21K, skip: 21K, reset: 0), wall time: 143 ms (10.7%), batched
      `─ Union [#114K], results: 114K (next: 0), wall time: 5 ms (0.4%), batched
         +─ Scan[POSC](?personA, http://ldbcouncil.org/Person_knows_Person, ?personB) [#57K], results: 57K (next: 0), wall time: 10 ms (0.7%), batched
         `─ Scan[PSOC](?personB, http://ldbcouncil.org/Person_knows_Person, ?personA) [#57K], results: 57K (next: 0), wall time: 7 ms (0.5%), batched
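For larger plans it can help to list the operators and whether each one carries the batched keyword. The following Python sketch does this for profiler text in the format shown above. The function name `batched_operators` is hypothetical, and the parsing assumes that the keyword appears at the end of an operator line, as it does in this sample; it also treats every line without the keyword as not batched, which is an assumption about how to read the output rather than documented behavior.

```python
import re
from typing import List, Tuple

# Characters used to draw the plan tree in front of each operator
TREE_CHARS = " \t│`+─"


def batched_operators(profile_text: str) -> List[Tuple[str, bool]]:
    """Return (operator name, has 'batched' keyword) for each plan line."""
    operators = []
    for line in profile_text.splitlines():
        body = line.lstrip(TREE_CHARS)
        # The operator name is the leading run of letters, e.g. 'Scan' in 'Scan[PSOC](...)'
        match = re.match(r"[A-Za-z]+", body)
        if not match:
            continue  # blank line or something that is not an operator
        # Assumption: the keyword is the last item on the line
        is_batched = body.rstrip().endswith("batched")
        operators.append((match.group(0), is_batched))
    return operators


# Three lines taken from the profiler output above, re-indented as a small tree
SAMPLE = """
Sort(?post) [#347K], memory: {total=13M (53.9%)}, results: 347K (next: 1, skip: 1, reset: 0), wall time: 361 ms (27.0%), batched
`─ Restriction(?personA, ?post) [#347K]
   `─ MergeJoin(?comment) [#347K], results: 347K (next: 0), wall time: 41 ms (3.1%), batched
"""

for name, is_batched in batched_operators(SAMPLE):
    print(f"{name}: {'batched' if is_batched else 'no batched keyword'}")
```

Saving the output of the explain command to a file and passing its contents to the function gives a quick overview of where a plan falls outside BARQ.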

Disabling the BARQ engine

Our aim is for BARQ to be suitable for all workloads. If you experience suboptimal performance, do not hesitate to contact us.

Should it be necessary to disable BARQ, it can be turned off globally or per query. To disable BARQ globally for all SPARQL queries, add the following to stardog.properties, which causes the older engine to be used:

query.executor=LEGACY

To use the legacy engine for a single query, add this query hint to the top of the query (the preamble):

#pragma executor LEGACY
SELECT * { ?s ?p ?o }

The executor specified in the preamble has priority over the setting specified in stardog.properties.
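When queries are assembled in application code, the hint can be prepended programmatically. The Python helper below is a minimal sketch: the function name `with_executor_hint` is hypothetical, it only performs string handling, and LEGACY is the only executor value confirmed on this page. A query that already carries an executor hint is returned unchanged so that an explicit choice in the query is not overridden.

```python
def with_executor_hint(query: str, executor: str = "LEGACY") -> str:
    """Prepend '#pragma executor <executor>' unless the query already has one."""
    for line in query.splitlines():
        if line.strip().lower().startswith("#pragma executor"):
            return query  # keep the hint that is already present
    return f"#pragma executor {executor}\n{query}"


hinted = with_executor_hint("SELECT * { ?s ?p ?o }")
print(hinted)
```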