Preprocessing CVS Data for Fine-Grained Analysis

T Zimmermann, P Weißgerber - MSR, 2004 - research.cs.queensu.ca
Abstract
All analyses of version archives have one phase in common: the preprocessing of data. Preprocessing has a direct impact on the quality of the results returned by an analysis. In this paper we discuss four essential preprocessing tasks necessary for a fine-grained analysis of CVS archives: (a) data extraction, (b) transaction recovery, (c) mapping of changes to fine-grained entities, and (d) data cleaning. We formalize the concept of sliding time windows and show how commit mails can relate revisions to transactions. We also present two approaches that map changes to the affected building blocks of a file, e.g., functions or sections.
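
To make the sliding-time-window idea for transaction recovery concrete, here is a minimal Python sketch. It is not the paper's formalization, which this record does not reproduce. It assumes that revisions sharing an author and log message belong to one transaction as long as each revision follows the previous one within a fixed gap. The `Revision` type, the `recover_transactions` function, and the 200-second default window are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Revision:
    """One per-file CVS revision (hypothetical record layout)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def recover_transactions(revisions, window=200):
    """Group revisions into transactions using a sliding time window.

    Assumption: two consecutive revisions with the same author and log
    message belong to the same transaction if they are at most `window`
    seconds apart. The window "slides" because the gap is measured from
    the previous revision, not from the first one in the transaction.
    """
    def sort_key(r):
        return (r.author, r.message, r.timestamp)

    def group_key(r):
        return (r.author, r.message)

    transactions = []
    ordered = sorted(revisions, key=sort_key)
    for _, group in groupby(ordered, key=group_key):
        current = []
        for rev in group:
            # Start a new transaction when the gap exceeds the window.
            if current and rev.timestamp - current[-1].timestamp > window:
                transactions.append(current)
                current = []
            current.append(rev)
        transactions.append(current)
    return transactions


revs = [
    Revision("a.c", "alice", "fix", 0),
    Revision("b.c", "alice", "fix", 150),
    Revision("c.c", "alice", "fix", 300),
    Revision("d.c", "alice", "fix", 600),
    Revision("e.c", "bob", "fix", 10),
]
for tx in recover_transactions(revs):
    print([r.path for r in tx])
```

In this example the first three revisions by `alice` form one transaction even though the last is 300 seconds after the first, because each consecutive gap is within the window. A fixed window anchored at the first revision would have split them.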