[ENHANCEMENT] Make use of functional hierarchies more transparent in the UI #433

jenno-verdonck · 2022-12-15T10:27:01Z

Describe the bug
ARX gives different optimal solutions when using hierarchyBuilders than when using the hierarchies created from those builders.

To Reproduce
Steps to reproduce the behavior:

1. Open the example project in ARX
2. Anonymize and note down the best node
3. Write all hierarchies to CSV
4. Load all hierarchies back in from CSV so that you no longer use builders
5. Note the different solution

Expected behavior
I expected to get the same solution in both situations.

Files
example.zip

ARX GUI (please complete the following information):

OS: Windows
Version 3.9.1

prasser · 2022-12-15T10:33:32Z

Thanks. This issue doesn't contain enough info to understand the potential bug. Please provide further details.

prasser · 2022-12-15T10:37:07Z

PS: I'm pretty sure that this isn't a bug but intended behavior, but to be sure and to explain what is going on I need more details.

jenno-verdonck · 2022-12-15T10:52:30Z

Yeah, my bad. I accidentally posted the report before finishing it.

prasser · 2022-12-15T12:45:58Z

OK, thanks. As already suspected, this is not a bug but expected behaviour. In ARX, hierarchies that have been generated using the builders are associated with a "functional definition" of the hierarchy as meta-information. This information can be used to measure information loss more accurately. One example:

Assume you have a dataset with an integer attribute. In the records, you have three values: 1, 3 and 7.

When using an interval-based hierarchy builder, you specify the interval [0, 10[. As a result, ARX knows that [0, 10[ is a generalization of 10 integer values and might, e.g., estimate information loss as 1/10 = 0.1

When loading a hierarchy from a CSV file, ARX cannot "understand" what the entries in the hierarchy mean. In the case of our example, it can just see that "[0, 10[" is a generalization of 1, 3 and 7 and might, e.g., estimate information loss as 1/|{1, 3, 7}| = 1/3 = 0.33
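The arithmetic in this example can be sketched in a few lines of plain Python. This is only an illustration of the two estimates described above, not ARX code: the function names are hypothetical, and the actual utility models in ARX are more involved.

```python
def loss_with_functional_definition(interval_low, interval_high):
    """Estimate when the interval [low, high[ itself is known.

    The generalized value covers every integer in the interval,
    so the estimate is 1 / (number of integers in the interval).
    """
    domain_size = interval_high - interval_low
    return 1 / domain_size


def loss_from_observed_values(observed_values):
    """Estimate when only the values present in the data are known.

    Without the functional definition, the label "[0, 10[" is just a
    string that happens to generalize the distinct values seen so far.
    """
    distinct_values = set(observed_values)
    return 1 / len(distinct_values)


values_in_dataset = [1, 3, 7]

functional_loss = loss_with_functional_definition(0, 10)
csv_loss = loss_from_observed_values(values_in_dataset)

print(f"with functional definition: {functional_loss:.2f}")
print(f"from CSV hierarchy only:    {csv_loss:.2f}")
```

Both calls describe the same visible hierarchy entry, but the inputs to the estimate differ, which is why the optimal solution can change.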

You can also save and load the functional definitions of hierarchies in the wizards, using the "Save..." and "Load..." buttons.

jenno-verdonck · 2022-12-15T13:56:01Z

Thanks for the clarification.

I already suspected something like this. I can however see how this may be confusing for some users that expect the same result when visually seeing the same hierarchy in the GUI.

Calculating the score the way it is done with the CSV files seems to make more sense to me, as it takes into account the properties of the dataset in use and more accurately reflects the score specific to that dataset. I therefore suspect that the utility of the dataset obtained using CSV files will be higher.

prasser · 2022-12-16T07:41:13Z

> Calculating the score the way it is done with the CSV files seems to make more sense to me, as it takes into account the properties of the dataset in use and more accurately reflects the score specific to that dataset. I therefore suspect that the utility of the dataset obtained using CSV files will be higher.

Not sure. I think this depends on the context and use case.

> I already suspected something like this. I can however see how this may be confusing for some users that expect the same result when visually seeing the same hierarchy in the GUI.

I turned this issue into an "enhancement". We could make it more transparent in the UI whether a functional definition of a hierarchy is available and whether it should be used. Please note that, as a workaround, you can remove the functional representation by manually editing the hierarchy in the hierarchy viewer (not in the wizard).

idhamari · 2022-12-18T08:55:33Z

What about exporting and importing the functional definition of the hierarchies together with the hierarchies themselves? This way, if a functional definition is available, it can be used for more accurate loss calculation and one gets the same result every time.

jenno-verdonck · 2022-12-20T12:11:11Z

> What about exporting and importing the functional definition of the hierarchies together with the hierarchies themselves? This way, if a functional definition is available, it can be used for more accurate loss calculation and one gets the same result every time.

This would probably solve the import/export problems in the UI. A fix for this in the API could be to prevent users from building the hierarchy from the HierarchyBuilder themselves, or to give a warning when they do so. This would avoid scenarios where the user builds the Hierarchy and passes the result to the configuration, which removes the functional definition. At the moment, a user could do this without knowing the difference between Hierarchies and HierarchyBuilders.

Another option would be to merge the hierarchy and builder representations and to work with a toggle that enables or disables the functional definition when it is available. This would however require a major restructuring, I think.
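To make the toggle idea concrete, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not a proposal for the actual ARX classes: `MergedHierarchy`, `domain_sizes` and `use_functional_definition` are hypothetical names, and the loss formula is the simplified 1/n estimate from the example earlier in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MergedHierarchy:
    """Hypothetical merged representation of a hierarchy and its builder."""

    # Generalized label -> raw values from the dataset that it covers.
    groups: Dict[str, List[int]]
    # Optional functional definition: label -> size of the full domain.
    # None models a hierarchy that was loaded from CSV.
    domain_sizes: Optional[Dict[str, int]] = None

    def loss(self, label: str, use_functional_definition: bool = True) -> float:
        """Simplified loss estimate; the toggle falls back to observed data."""
        if use_functional_definition and self.domain_sizes is not None:
            return 1 / self.domain_sizes[label]
        return 1 / len(set(self.groups[label]))


label = "[0, 10["
from_builder = MergedHierarchy({label: [1, 3, 7]}, {label: 10})
from_csv = MergedHierarchy({label: [1, 3, 7]})

print(from_builder.loss(label))                                   # functional estimate
print(from_builder.loss(label, use_functional_definition=False))  # data-driven estimate
print(from_csv.loss(label))                                       # no definition available
```

With the toggle switched off, the builder-based hierarchy yields the same estimate as the one loaded from CSV, which is the consistency discussed above.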

idhamari · 2022-12-22T11:27:27Z

> This would probably solve the import/export problems in the UI.

I think one can do the same in the API, e.g. by saving both the hierarchy and the functional definition and then loading them again. I will try the above solution and propose a PR.

jenno-verdonck · 2023-01-04T13:48:18Z

After investigating this behavior a bit further, I noticed that the code only calculates the shares in the scoring functions differently when using Redaction- and Interval-based builders. For all other builder types, the calculation is identical to not having a functional definition. The utility metrics, on the other hand, are only calculated differently when using Redaction-based builders.

prasser · 2023-01-04T18:08:44Z

It's true that not all utility models make use of additional info provided by functional hierarchies and that not all hierarchy types provide such information.

jenno-verdonck added the bug label Dec 15, 2022

jenno-verdonck assigned prasser Dec 15, 2022

jenno-verdonck changed the title ~~[BUG] Inconsistent calculation metric using hierarchies vs hierarchyBuilders~~ [BUG] Inconsistent score calculation using hierarchies vs hierarchyBuilders Dec 15, 2022

prasser added enhancement and removed bug labels Dec 16, 2022

prasser changed the title ~~[BUG] Inconsistent score calculation using hierarchies vs hierarchyBuilders~~ [ENHANCEMENT] Make use of functional hierarchies more transparent in the UI Dec 16, 2022

This was referenced Jan 4, 2023

saving and loading functional hierarchies with csv files in the GUI #435

Open

loading .ahs files in the api #436

Closed
