-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6301] Extend ‘Must-filter’ columns to support a conditional bypass list #3984
base: main
Are you sure you want to change the base?
Conversation
…bypass-field being selected but not filtered on, a must-filter field not being selected (thus can no longer be defused), and filtering on the bypass field in the enclosing query, thus defusing the must-filter field Fix style
270696b
to
028c6ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a ton of context on these changes but they seem really thoroughly tested so approved w/ minor comments
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
Outdated
Show resolved
Hide resolved
testkit/src/main/java/org/apache/calcite/test/catalog/MockCatalogReader.java
Outdated
Show resolved
Hide resolved
testkit/src/main/java/org/apache/calcite/test/catalog/MockCatalogReader.java
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/AbstractNamespace.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/AbstractNamespace.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/AbstractNamespace.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/SemanticTable.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Outdated
Show resolved
Hide resolved
@@ -1217,12 +1219,21 @@ protected void validateNamespace(final SqlValidatorNamespace namespace, | |||
// fields; these are neutralized if the consuming query filters on them. | |||
final ImmutableBitSet mustFilterFields = | |||
namespace.getMustFilterFields(); | |||
if (!mustFilterFields.isEmpty()) { | |||
// Remnant must filter fields are fields that are not selected and cannot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether the 'remnant' concept is necessary. Instead could we just remove must-filter fields that have been satisfied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is necessary to handle the case where the bypass field is selected, but the must-filter fields are not selected.
Imagine this setup
Table: EMP (empno, name, id)
must_filters: empno
bypass_filter: name
Query:
select * from (select name from emp) where name = 'bob'
The inner query select name from emp
can't error on empno
not being filtered at this point because name
was selected, and could be filtered on later. The must-filter for empno
is not satisfied yet at this point.
When we get to the select * from _ where name = 'bob'
, that's where we defuse the error for empno
based on seeing name
being filtered on.
If alternatively the query was
select * from (select name, id from emp) where id = '12'
When we get to select * from _ where id = '12'
, we're at the top-level, and remnantMustFilter's set is not empty, hence we throw an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After revising and simplifying the 3 fields mustFilterFields
, mustFilterBypassFields
and remnantMustFilterFields
into a single class, I was able to simplify away the interface and no longer need to expose getRemnantFilterFields
to users of SemanticTable
. I'll push new commits soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If t
has a must-filter field f
and bypass fields b0
and b1
, then the query select b0, b1 from t
has no must-filter fields, no bypass fields, but two remnant fields [b0, b1]
.
If you filter on any of those remnant fields then they all go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that is not how the sets are populated.
The purpose of remnant fields is to capture the fields that were marked as required, but no longer accessible to be filtered on and defused in the select clause. In other words, you cannot filter on any of the remnant fields. The presence of values being in the remant-filter set serves the purpose of being an indicator for erroring.
They can only be defused if any of the bypass fields are filtered on.
In your example this would be the correct set values:
select b0, b1 from t
where f
is a must-filter field
must-filter field: []
bypass-fields: [b0
, b1
]
remnant-fields: [f]
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Show resolved
Hide resolved
core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/MustFilterRequirements.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Outdated
Show resolved
Hide resolved
* in the current query, but can still be defused by filtering on a bypass field in the | ||
* enclosing query. | ||
*/ | ||
public class MustFilterRequirements { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must the class be public
?
Class names should not be plural (except utility classes). It becomes problematic to name collections of such classes.
'Must' and 'Requirement' seem redundant. So I suggest 'FilterRequirement'.
Maybe mustFilterFields
→ filterFields
, mustFilterBypassFields
→ bypassFields
; remnantMustFilterFields
→ filterAnyFields
.
Could the fields be final
? (Public final fields are fine, but if you must have non-final fields they should probably be private
.)
Immutable classes are much easier to reason about. Consider adding 'withXxx' methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be package-private.
I'll rename it to FilterRequirement
and filterFields
, bypassFields
.
Still thinking about the rename of remnantMustFilterFields
... I still think the name needs to capture the connotation that they're valid filter requirements left-over from a previous query, but also distinct from the set that can actively be defused
Latest commit looks fairly close. There needs to be an cogent explanation in the code for the concepts of bypass fields etc. The javadoc for Move the current descriptions of the 3 fields onto the fields themselves. 'Remnant' is not a good name because it forces you to talk about how we got here - the winnowing algorithm - rather than there we are. I prefer 'anyFitlerFields' because for the query to be valid you need to filter any of them. I think you can make |
Quality Gate passedIssues Measures |
No description provided.