CN117155406B - Intelligent management system for social investigation data - Google Patents
Intelligent management system for social investigation data
- Publication number
- CN117155406B CN202311413375.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to the technical field of data compression management, and in particular to an intelligent management system for social investigation data. The system acquires census data through a data acquisition module. A data processing module analyzes the frequency of each class of character in the census data and applies Fano coding; in each grouping process, it obtains the weighted preference degree of each class of character in the second-class group to be confirmed from the frequency of that class of character and the frequencies of all characters in the first-class group to be confirmed, thereby obtaining the optimal character class grouping for that grouping process, and then constructs a coding tree for the census data. A data compression module obtains the compressed census data, and a data storage module stores it. By considering the frequency difference between the groups in each grouping process, the invention adaptively adjusts the grouping, reduces the difference in character frequency between groups, improves the data compression effect, and optimizes the intelligent data management system.
Description
Technical Field
The invention relates to the technical field of data compression management, and in particular to an intelligent management system for social investigation data.
Background
Managing social investigation data helps researchers understand and use the data more quickly, which improves research efficiency. However, some variables and indicators in such data are highly repetitive and regular, so data management suffers from high storage cost, slow transmission, and low processing efficiency.
In the prior art, Fano coding represents highly repetitive and regular data with shorter codes, which reduces the data size. However, conventional Fano coding divides the characters directly into a higher-frequency group and a lower-frequency group according to their frequency of occurrence in the data, so the frequency difference in the final grouping result is large, which degrades the compression effect and the effectiveness of data management.
Disclosure of Invention
In order to solve the technical problem that the data compression effect is poor due to the fact that the grouping condition of the coding tree cannot be adaptively adjusted in the prior art, the invention aims to provide an intelligent social investigation data management system, and the adopted technical scheme is as follows:
the invention provides an intelligent management system for social investigation data, which comprises:
the data acquisition module acquires census data;
the data processing module traverses the census data to acquire the frequency of each class of character in the census data; performs Fano coding on the characters according to the frequencies, wherein in each grouping process of the Fano coding algorithm the character with the highest frequency among the characters to be grouped is put into a first-class group to be confirmed and the remaining characters are put into a second-class group to be confirmed; obtains the frequency difference between the groups according to the distribution of the characters to be grouped between the first-class group to be confirmed and the second-class group to be confirmed; judges whether the corresponding grouping process needs to be adjusted according to the frequency difference, and if so, obtains the preference degree of each class of character in the second-class group to be confirmed according to the frequency of each class of character in the second-class group to be confirmed; obtains the preference degree influence factor of each class of character in the second-class group to be confirmed according to the difference between the frequency of that class of character and the frequencies of all characters in the first-class group to be confirmed; obtains the weighted preference degree of each class of character in the second-class group to be confirmed according to its preference degree and preference degree influence factor; adds characters from the second-class group to be confirmed to the first-class group to be confirmed according to the weighted preference degree until the corresponding grouping process no longer needs to be adjusted, obtaining the optimal character class grouping in each grouping process; and constructs a coding tree according to the optimal character class groupings in all grouping processes;
the data compression module is used for obtaining compressed census data according to the coding tree;
and the data storage module is used for storing the compressed census data.
Further, the method for acquiring the frequency difference includes:
the frequency differences include a first frequency difference and a second frequency difference;
calculating the sum of the frequencies of all classes of characters in the first-class group to be confirmed as the first-class group frequency;
calculating the sum of the frequencies of all classes of characters in the second-class group to be confirmed as the second-class group frequency;
calculating the difference between the first-class group frequency and the second-class group frequency, and obtaining the grouping difference from this difference through negative correlation mapping by an exponential function; if the first-class group frequency is greater than or equal to the second-class group frequency, carrying out negative correlation mapping and normalization on the grouping difference to obtain the first frequency difference; and if the first-class group frequency is smaller than the second-class group frequency, taking the opposite number of the first frequency difference to obtain the second frequency difference.
Further, the method for obtaining the preference degree comprises the following steps:
calculating the sum of the first-class group frequency and the frequency of each class of character in the second-class group to be confirmed as the first-class group update frequency of that class of character; calculating the difference between the second-class group frequency and the frequency of each class of character in the second-class group to be confirmed as the second-class group update frequency of that class of character;
and calculating the difference between the first-class group update frequency and the second-class group update frequency corresponding to each class of character in the second-class group to be confirmed, and obtaining the preference degree of that class of character through negative correlation mapping.
Further, determining whether the corresponding grouping process needs to be adjusted according to the frequency difference includes:
comparing the frequency difference with a preset empirical threshold; if the frequency difference is greater than or equal to the preset empirical threshold, judging that the corresponding grouping process does not need to be adjusted; and if the frequency difference is smaller than the preset empirical threshold, judging that the corresponding grouping process needs to be adjusted.
Further, the method for acquiring the preference degree influence factor comprises the following steps:
taking the $x$-th class of character in the second-class group to be confirmed as an example, the preference degree influence factor is expressed as $G_x=\exp\left(-\sum_{i=1}^{n}\left(\log_2 p_i-\log_2 q_x\right)\right)$; wherein $G_x$ represents the preference degree influence factor of the $x$-th class of character in the second-class group to be confirmed; $n$ represents the total number of character classes in the first-class group to be confirmed; $i$ represents the $i$-th class of character in the first-class group to be confirmed; $p_i$ represents the frequency of the $i$-th class of character in the first-class group to be confirmed; $q_x$ represents the frequency of the $x$-th class of character in the second-class group to be confirmed; $\log_2 p_i$ represents the base-2 logarithm of $p_i$; $\log_2 q_x$ represents the base-2 logarithm of $q_x$; and $\exp(\cdot)$ represents the exponential function with the natural constant $e$ as base.
Further, the method for acquiring the weighted preference degree comprises the following steps:
calculating the product of the preference degree and the preference degree influence factor of each class of character in the second-class group to be confirmed to obtain the weighted preference degree of that class of character;
the preference degree and the preference degree influence factor are both positively correlated with the weighted preference degree.
Further, the method for acquiring the optimal character class group comprises the following steps:
moving the character with the highest weighted preference degree in the second-class group to be confirmed into the first-class group to be confirmed, and continuing to adjust until the corresponding grouping process no longer needs to be adjusted, so as to obtain the optimal character class grouping in each grouping process.
Further, the method for acquiring the census compressed data comprises the following steps:
compressing the census data by Fano coding according to the coding tree to obtain the compressed census data.
Further, the preset empirical threshold is 0.
Further, the method of the negative correlation mapping is to perform the negative correlation mapping by an exponential function based on a natural constant.
The invention has the following beneficial effects:
the invention acquires the frequency of each class of character in the census data, because differing frequencies of occurrence affect how the characters are grouped; Fano coding is performed on the characters according to the frequencies, wherein in each grouping process of the Fano coding algorithm the character with the highest frequency among the characters to be grouped is put into a first-class group to be confirmed and the remaining characters are put into a second-class group to be confirmed; the frequency difference between the groups is obtained according to the distribution of the characters to be grouped between the two groups and is used to judge whether the grouping is optimal; whether the corresponding grouping process needs to be adjusted is judged according to the frequency difference, and if so, the preference degree of each class of character in the second-class group to be confirmed is obtained from its frequency, indicating how likely that class of character is to be added to the first-class group to be confirmed; the preference degree influence factor of each class of character in the second-class group to be confirmed is obtained from the difference between its frequency and the frequencies of all characters in the first-class group to be confirmed, which avoids a large frequency difference within the group after adjustment; the weighted preference degree of each class of character in the second-class group to be confirmed is obtained from its preference degree and preference degree influence factor, indicating how likely that class of character is to be added to the first-class group to be confirmed; characters from the second-class group to be confirmed are added to the first-class group to be confirmed according to the weighted preference degree until the corresponding grouping process no longer needs to be adjusted, giving the optimal character class grouping in each grouping process, so that the frequencies of the groups are close; the coding tree is built from the optimal character class groupings in all grouping processes, which saves storage cost, improves transmission efficiency, further improves the data compression effect, and optimizes the intelligent data management system.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a social survey data intelligent management system according to one embodiment of the present invention;
fig. 2 is a schematic diagram of a coding tree according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of a specific implementation, structure, characteristics and effects of the social investigation data intelligent management system according to the invention with reference to the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the social investigation data intelligent management system provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a system block diagram of a social survey data intelligent management system according to an embodiment of the present invention is shown, the system includes: a data acquisition module 101, a data processing module 102, a data compression module 103, and a data storage module 104.
The data acquisition module 101 acquires census data.
In the embodiment of the invention, census data is one kind of social investigation data. A census is a comprehensive count and survey of the population of a whole country or region; it collects various population-related data, including name, gender, age, ethnicity, and education level, which can be used to analyze and study socioeconomic phenomena, population structure and change, and similar questions. Census data is therefore acquired for study.
It should be noted that, in an embodiment of the present invention, the method for obtaining census data is a technical means well known to those skilled in the art, and will not be described in detail herein.
The data processing module 102 traverses the census data to acquire the frequency of each class of character in the census data; performs Fano coding on the characters according to the frequencies, wherein in each grouping process of the Fano coding algorithm the character with the highest frequency among the characters to be grouped is put into a first-class group to be confirmed and the remaining characters are put into a second-class group to be confirmed; obtains the frequency difference between the groups according to the distribution of the characters to be grouped between the first-class group to be confirmed and the second-class group to be confirmed; judges whether the corresponding grouping process needs to be adjusted according to the frequency difference, and if so, obtains the preference degree of each class of character in the second-class group to be confirmed according to the frequency of each class of character in the second-class group to be confirmed; obtains the preference degree influence factor of each class of character in the second-class group to be confirmed according to the difference between the frequency of that class of character and the frequencies of all characters in the first-class group to be confirmed; obtains the weighted preference degree of each class of character in the second-class group to be confirmed according to its preference degree and preference degree influence factor; adds characters from the second-class group to be confirmed to the first-class group to be confirmed according to the weighted preference degree until the corresponding grouping process no longer needs to be adjusted, obtaining the optimal character class grouping in each grouping process; and constructs a coding tree according to the optimal character class groupings in all grouping processes.
The kinds of characters in the census data differ. For example, a field such as gender takes few distinct values, only "male" and "female", whereas ages fall in different age groups and take many distinct values, so the frequencies of occurrence also differ, which affects how the data is grouped. The census data is therefore traversed to obtain the frequency of each class of character. In one embodiment of the invention, taking the $k$-th class of character as an example, the data is expressed as:
$$\{P_1, P_2, \dots, P_k, \dots, P_N\}$$
wherein $N$ represents the total number of character classes in the grouping process; $k$ represents the $k$-th class of character; and $P_k$ represents the frequency of the $k$-th class of character.
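The traversal step can be illustrated with a minimal Python sketch. It assumes that "frequency" means relative frequency (count divided by total length), which matches the decimal values used in the FIG. 2 walk-through; the function name `char_frequencies` is hypothetical and not part of the patent.

```python
from collections import Counter


def char_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each distinct character in text.

    Assumption: frequency = occurrences / total number of characters.
    """
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: count / total for ch, count in counts.items()}
```

For instance, `char_frequencies("aaaabbc")` maps `"a"` to 4/7, `"b"` to 2/7 and `"c"` to 1/7.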
Fano coding is performed on the characters according to the frequencies: in each grouping process of the Fano coding algorithm, the character with the highest frequency among the characters to be grouped is put into the first-class group to be confirmed and the remaining characters are put into the second-class group to be confirmed, so that the differences among the character frequencies can be analyzed. It should be noted that the grouping processes in the Fano coding algorithm are performed in sequence; that is, each grouping process operates on the result of the previous grouping.
The conventional Fano coding algorithm sorts the characters from high to low frequency of occurrence and repeatedly splits them in that order, so that the higher-frequency characters all fall into one group and the lower-frequency characters into the other. As a result, the frequency difference among characters in the same group is large during grouping, the frequency difference in the final grouping result is large, and the compression effect suffers. Because assigning the shortest code to the character with the highest frequency reduces the size of the encoded data the most, after the frequencies in each grouping process are sorted from high to low, the character with the highest frequency is put into a group alone as the first-class group to be confirmed, and all other characters are put into another group as the second-class group to be confirmed. This avoids the excessive frequency difference between the two groups that results from splitting directly by frequency, which would degrade the compression effect.
In each grouping process, the frequency sums of the two groups need to be as close as possible, so that characters with higher frequency receive shorter codes and characters with lower frequency receive longer codes. The frequency difference between the groups is therefore obtained according to the distribution of the characters to be grouped between the first-class group to be confirmed and the second-class group to be confirmed. The frequency difference indicates how close the frequencies of the two groups are: the closer they are, the more likely the grouping is optimal and the less adjustment is needed. Whether the corresponding grouping process needs to be adjusted can thus be judged from the frequency difference. Referring to FIG. 2, which shows a schematic diagram of a coding tree: in the first grouping process of FIG. 2, the frequency sequence corresponding to the characters is 0.4, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05. The character with the maximum frequency, 0.4, on the left is taken alone as one class, the first-class group to be confirmed; the remaining characters, with frequencies 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, are taken as the other class, the second-class group to be confirmed. After the frequency difference between the two groups is analyzed, it is judged that this grouping process does not need to be adjusted. Grouping then continues on the second-class group to be confirmed: in the second grouping process, the character with the maximum frequency, 0.1, is taken as the first-class group to be confirmed, and the other characters, with frequencies 0.05, 0.05, 0.05, 0.05, are taken as the second-class group to be confirmed. It is judged that this grouping process needs to be adjusted, and the characters in the second-class group to be confirmed are adjusted to obtain the groups (0.1, 0.05) and (0.05, 0.05, 0.05), after which the grouping process does not need to be adjusted again and the optimal grouping is obtained. Each subsequent grouping process executes the operation method provided by the embodiment of the invention.
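The sequential grouping can be sketched as a recursion in which every split extends the code of the first group with one bit and the code of the second group with the other. The sketch below is illustrative only: the names `assign_codes` and `balanced_split` are hypothetical, the choice of "0" for the first group and "1" for the second is an assumption, and `balanced_split` is a deliberately simplified placeholder that moves the next-largest character until the first group is at least as frequent as the second. It does not use the weighted preference degree of the invention.

```python
from typing import Callable

Freqs = dict[str, float]
Split = Callable[[Freqs], tuple[Freqs, Freqs]]


def balanced_split(freqs: Freqs) -> tuple[Freqs, Freqs]:
    """Placeholder splitter (assumption, not the patented selection rule)."""
    ordered = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    first, second = dict(ordered[:1]), dict(ordered[1:])
    # Move the next-largest character while the first group is still lighter.
    while sum(first.values()) < sum(second.values()) and len(second) > 1:
        ch = next(iter(second))
        first[ch] = second.pop(ch)
    return first, second


def assign_codes(freqs: Freqs, split: Split = balanced_split,
                 prefix: str = "") -> dict[str, str]:
    """Recursively split the characters and build a prefix-free code table."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        (ch,) = freqs
        return {ch: prefix or "0"}  # a lone character still needs one bit
    first, second = split(freqs)
    codes = assign_codes(first, split, prefix + "0")
    codes.update(assign_codes(second, split, prefix + "1"))
    return codes
```

Because the splitter is passed in as a parameter, a selection rule based on the weighted preference degree could be substituted without changing the recursion.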
When the difference between the frequency sums of the two groups is large, the likelihood that each class of character in the second-class group to be confirmed should be moved is analyzed in order to adjust the grouping toward the optimum. The preference degree of each class of character in the second-class group to be confirmed is obtained from the frequencies of the characters in the second-class group to be confirmed; the greater the preference degree, the more likely the character is to be moved. Because a large difference in character frequency within a group also degrades the compression effect, the preference degree influence factor of each class of character in the second-class group to be confirmed is obtained from the difference between its frequency and the frequencies of all characters in the first-class group to be confirmed; the larger this difference, the smaller the preference degree influence factor and the less likely the character is to be moved. The weighted preference degree of each class of character in the second-class group to be confirmed is obtained from its preference degree and preference degree influence factor; the greater the preference degree and the greater the preference degree influence factor, the greater the weighted preference degree, that is, the more likely the character is to be moved.
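The adjustment loop for a single grouping process can be sketched as follows. This is a sketch under stated assumptions rather than the patented implementation: the exact formulas did not survive extraction, so the frequency difference, preference degree and influence factor below are reconstructed from the symbol definitions in the text (exponential negative correlation mapping, base-2 logarithms), and the function name `split_group` is hypothetical.

```python
import math


def split_group(freqs: dict[str, float], threshold: float = 0.0):
    """Split characters into a first-class and a second-class group.

    Assumed reconstruction: the most frequent character starts the first
    group; while the frequency difference is below the threshold, the
    character with the highest weighted preference degree is moved over.
    """
    ordered = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    first, second = dict(ordered[:1]), dict(ordered[1:])

    def frequency_difference() -> float:
        s1, s2 = sum(first.values()), sum(second.values())
        mapped = math.exp(-math.exp(-abs(s1 - s2)))
        return mapped if s1 >= s2 else -mapped

    # len(second) > 1 is a safety guard so the second group is never emptied.
    while frequency_difference() < threshold and len(second) > 1:
        s1, s2 = sum(first.values()), sum(second.values())

        def weighted_preference(ch: str) -> float:
            q = second[ch]
            preference = math.exp(-abs((s1 + q) - (s2 - q)))
            influence = math.exp(
                -sum(math.log2(p) - math.log2(q) for p in first.values())
            )
            return preference * influence

        best = max(second, key=weighted_preference)
        first[best] = second.pop(best)
    return first, second
```

With the hypothetical frequencies 0.3, 0.25, 0.2, 0.15, 0.1, the character with frequency 0.2 would balance the sums exactly, but the character with frequency 0.25 has the higher weighted preference degree because it is closer in frequency to the first group, so it is the one moved.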
Preferably, in one embodiment of the present invention, the method for acquiring a frequency difference includes:
The frequency difference is used to analyze the distribution of character frequencies across the groups and their relationship. The frequency difference includes a first frequency difference and a second frequency difference. The sum of the frequencies of all classes of characters in the first-class group to be confirmed is calculated as the first-class group frequency, and the sum of the frequencies of all classes of characters in the second-class group to be confirmed is calculated as the second-class group frequency. The difference between the first-class group frequency and the second-class group frequency is calculated and mapped to the grouping difference through negative correlation mapping by an exponential function; the smaller the grouping difference, the more the first-class group frequency differs from the second-class group frequency and the worse the grouping. If the first-class group frequency is greater than or equal to the second-class group frequency, negative correlation mapping and normalization are applied to the grouping difference to obtain the first frequency difference; the larger the first frequency difference, the larger the difference between the groups and the less likely the grouping is optimal. If the first-class group frequency is smaller than the second-class group frequency, the opposite number of the first frequency difference is taken to obtain the second frequency difference; the larger the second frequency difference, the smaller the difference between the groups and the more likely the grouping is optimal. In one embodiment of the invention, the formula for the frequency difference is expressed as:
$$F=\begin{cases}\exp\left(-\exp\left(-\left|\sum_{i=1}^{n}p_i-\sum_{j=1}^{m}q_j\right|\right)\right), & \sum_{i=1}^{n}p_i\ge\sum_{j=1}^{m}q_j\\[6pt]-\exp\left(-\exp\left(-\left|\sum_{i=1}^{n}p_i-\sum_{j=1}^{m}q_j\right|\right)\right), & \sum_{i=1}^{n}p_i<\sum_{j=1}^{m}q_j\end{cases}$$
In the formula, $F$ represents the frequency difference of the two groups to be confirmed; $n$ represents the total number of character classes in the first-class group to be confirmed; $i$ represents the $i$-th class of character in the first-class group to be confirmed; $m$ represents the total number of character classes in the second-class group to be confirmed; $j$ represents the $j$-th class of character in the second-class group to be confirmed; $p_i$ represents the frequency of the $i$-th class of character in the first-class group to be confirmed; $q_j$ represents the frequency of the $j$-th class of character in the second-class group to be confirmed; $|\cdot|$ represents taking the absolute value; and $\exp(\cdot)$ represents the exponential function with the natural constant $e$ as base.
Wherein $\left|\sum_{i=1}^{n}p_i-\sum_{j=1}^{m}q_j\right|$ represents the difference between the sum of the character frequencies in the first-class group to be confirmed and the sum of the character frequencies in the second-class group to be confirmed. Negative correlation mapping and normalization by the exponential function with the natural constant as base give the grouping difference $\exp\left(-\left|\sum_{i=1}^{n}p_i-\sum_{j=1}^{m}q_j\right|\right)$: the smaller the difference between the first-class group frequency and the second-class group frequency, the larger the grouping difference, which means the closer the frequencies of the two groups and the better the grouping.
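A minimal Python sketch of the frequency difference follows. The printed formula did not survive extraction, so the form used here is an assumed reconstruction from the verbal description: the absolute difference of the two group frequency sums is negatively mapped by an exponential to give the grouping difference, which is negatively mapped once more, and the sign is flipped when the first-class group is the lighter one. The function name is hypothetical.

```python
import math


def frequency_difference(first: list[float], second: list[float]) -> float:
    """Signed frequency difference between two groups to be confirmed.

    Assumed reconstruction: positive when the first-class group frequency
    is at least the second-class group frequency, negative otherwise.
    """
    s1, s2 = sum(first), sum(second)
    grouping_difference = math.exp(-abs(s1 - s2))  # in (0, 1], 1 when balanced
    mapped = math.exp(-grouping_difference)        # second negative mapping
    return mapped if s1 >= s2 else -mapped
```

Under this reading only the sign matters when the preset empirical threshold is 0: a negative value signals that the first-class group is still lighter than the second-class group.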
It should be noted that, in other embodiments of the present invention, positive and negative correlation and normalization methods may be constructed by other basic mathematical operations, and specific means are technical means well known to those skilled in the art, and are not described herein.
Preferably, in one embodiment of the present invention, determining whether the corresponding packet needs to be adjusted according to the frequency difference includes:
comparing the frequency difference with a preset empirical threshold; if the frequency difference is greater than or equal to the preset empirical threshold, judging that the corresponding grouping process does not need to be adjusted; if the frequency difference is smaller than the preset empirical threshold, judging that the corresponding grouping process needs to be adjusted. In one embodiment of the present invention, the preset empirical threshold is 0; in other embodiments of the present invention, the preset empirical threshold may be set according to the specific situation, which is not limited herein.
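The decision rule itself is a single comparison, sketched below in Python. The function name `needs_adjustment` is hypothetical; the default threshold of 0 is the value given for one embodiment.

```python
def needs_adjustment(frequency_difference: float, threshold: float = 0.0) -> bool:
    """Return True when the grouping process must be adjusted.

    Per the rule above: adjust only if the frequency difference is
    strictly below the preset empirical threshold.
    """
    return frequency_difference < threshold
```

A frequency difference exactly equal to the threshold counts as "no adjustment needed", matching the "greater than or equal to" wording above.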
Preferably, in one embodiment of the present invention, the method for obtaining the preference degree includes:
calculating the sum of the first-class group frequency and the frequency of each class of character in the second-class group to be confirmed, and taking the sum as the first-class group update frequency of that class of character; calculating the difference between the second-class group frequency and the frequency of each class of character in the second-class group to be confirmed, and taking the difference as the second-class group update frequency of that class of character; calculating the difference between the first-class group update frequency and the second-class group update frequency of each class of character in the second-class group to be confirmed, and obtaining the preference degree of that class of character through negative correlation mapping, wherein the larger the difference between the first-class group update frequency and the second-class group update frequency, the smaller the preference degree of the character and the worse the effect after adjustment. In one embodiment of the invention, taking the $x$-th class of character in the second-class group to be confirmed as an example, the formula for the preference degree is:
$$Y_x=\exp\left(-\left|\left(\sum_{i=1}^{n}p_i+q_x\right)-\left(\sum_{j=1}^{m}q_j-q_x\right)\right|\right)$$
In the formula, $Y_x$ represents the preference degree of adding the $x$-th class of character in the second-class group to be confirmed to the first-class group to be confirmed; $n$ represents the total number of character classes in the first-class group to be confirmed; $i$ represents the $i$-th class of character in the first-class group to be confirmed; $m$ represents the total number of character classes in the second-class group to be confirmed; $j$ represents the $j$-th class of character in the second-class group to be confirmed; $p_i$ represents the frequency of the $i$-th class of character in the first-class group to be confirmed; $q_j$ and $q_x$ represent the frequencies of the $j$-th and $x$-th classes of character in the second-class group to be confirmed; $|\cdot|$ represents taking the absolute value; and $\exp(\cdot)$ represents the exponential function with the natural constant $e$ as base.
In the formula of the preference degree, $\left(\sum_{i=1}^{n}p_i+q_x\right)-\left(\sum_{j=1}^{m}q_j-q_x\right)$ represents the difference between the first-class group update frequency and the second-class group update frequency of the $x$-th class of character in the second-class group to be confirmed. This difference is negatively mapped by the exponential function with the natural constant as base; the smaller the difference between the two update frequencies, the greater the preference degree of the character.
It should be noted that, in other embodiments of the present invention, positive and negative correlation and normalization methods may be constructed by other basic mathematical operations, and specific means are technical means well known to those skilled in the art, and are not described herein.
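As an illustration of the preference degree described above, the following Python sketch computes it for every character in the second-class group to be confirmed. The function name and the dictionary representation are assumptions made for the example, and taking the absolute value of the update-frequency difference before the exponential mapping is also an assumption; the text only states that a larger difference gives a smaller preference degree.

```python
import math

def preference_degrees(first_group, second_group):
    """Preference degree of each character in the second-class group.

    For a character with frequency f:
      first-class update frequency  = F1 + f  (character added to group one)
      second-class update frequency = F2 - f  (character removed from group two)
    The gap between the two is mapped by exp(-|gap|) (assumed form), so a
    smaller gap yields a larger preference degree.
    """
    f1 = sum(first_group.values())
    f2 = sum(second_group.values())
    degrees = {}
    for char, f in second_group.items():
        first_update = f1 + f
        second_update = f2 - f
        degrees[char] = math.exp(-abs(first_update - second_update))
    return degrees
```

For example, with a first-class group `{"a": 4}` and a second-class group `{"b": 3, "c": 2, "d": 1}`, moving `"d"` balances the two groups exactly (5 versus 5), so `"d"` receives the largest preference degree.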
Preferably, in one embodiment of the present invention, the method for acquiring the preference degree influencing factor includes:
Taking the $a$-th class of character in the second-class group to be confirmed as an example, the formula of the preference degree influence factor is expressed as: [formula image not preserved in the source text]; where $E_a$ denotes the preference degree influence factor of the $a$-th class of character in the second-class group to be confirmed; $n$ denotes the total number of character classes in the first-class group to be confirmed; $i$ indexes the $i$-th class of character in the first-class group to be confirmed; $p_i$ denotes the frequency of the $i$-th class of character in the first-class group to be confirmed; $q_a$ denotes the frequency of the $a$-th class of character in the second-class group to be confirmed; $\log_2 p_i$ denotes the base-2 logarithm of $p_i$; $\log_2 q_a$ denotes the base-2 logarithm of $q_a$; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base.
In the formula of the preference degree influence factor, the logarithmic term indicates the overall degree of confusion after the $a$-th class of character in the second-class group to be confirmed is added into the first-class group to be confirmed. The values of $p_i$ and $q_a$ lie in the range 0 to 1, and the closer the character frequencies are to one another, the greater the degree of confusion. The degree of confusion is passed through a negative correlation mapping by the exponential function based on the natural constant. The greater the degree of confusion, the smaller the difference between the frequency of the $a$-th class of character and the frequencies of the characters in the first-class group to be confirmed, that is, the closer the frequencies, and the higher the preference degree of the character. The smaller the degree of confusion, the larger the difference between the frequency of the $a$-th class of character in the second-class group to be confirmed and the frequencies of the characters in the first-class group to be confirmed, the worse the effect of adding that character into the first-class group to be confirmed, the smaller its preference degree influence factor, and the lower its preference degree.
It should be noted that, in other embodiments of the present invention, positive and negative correlation and normalization methods may be constructed by other basic mathematical operations, and specific means are technical means well known to those skilled in the art, and are not described herein.
Preferably, in one embodiment of the present invention, the method for obtaining the weighted preference degree includes:
The character frequency difference between the groups and the character frequency difference within the group after the tentative adjustment are considered together to determine the character most favorable for adjusting the grouping. The product of the preference degree and the preference degree influence factor of each class of character in the second-class group to be confirmed is calculated to obtain the weighted preference degree of that character. The greater the preference degree and the greater the preference degree influence factor, the greater the weighted preference degree of the character, the more likely the character is to be moved, and the closer the frequencies of the groups become; both the preference degree and the preference degree influence factor are positively correlated with the weighted preference degree. In one embodiment of the invention, taking the $a$-th class of character in the second-class group to be confirmed as an example, the formula for the weighted preference degree is expressed as:
$$W_a = E_a \times Y_a;$$
where $W_a$ denotes the weighted preference degree of the $a$-th class of character in the second-class group to be confirmed; $E_a$ denotes the preference degree influence factor of the $a$-th class of character in the second-class group to be confirmed; $Y_a$ denotes the preference degree of the $a$-th class of character in the second-class group to be confirmed.
It should be noted that, in other embodiments of the present invention, the positive-negative correlation may be constructed by other basic mathematical operations, and specific means are technical means well known to those skilled in the art, which are not described herein.
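The weighting step is a per-character product, which the following Python sketch illustrates. The function names are hypothetical, and the preference degrees and influence factors are taken as already computed inputs, so the sketch does not assume any particular form for either quantity.

```python
def weighted_preferences(preferences, influence_factors):
    """Weighted preference degree = preference degree * influence factor.

    Both arguments map each character of the second-class group to be
    confirmed to a precomputed value.
    """
    return {char: preferences[char] * influence_factors[char]
            for char in preferences}

def best_candidate(weighted):
    """Character with the highest weighted preference degree."""
    return max(weighted, key=weighted.get)
```

Because the product is positively correlated with both inputs, a character is only selected when it both balances the group frequencies and fits the frequency profile of the first-class group.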
Adjusting the grouping within each grouping process reduces the size of the encoded data, which is useful when processing and transmitting large amounts of data: it lowers the storage space requirement and improves processing efficiency. Characters in the second-class group to be confirmed are added to the first-class group to be confirmed according to the weighted preference degree until the corresponding grouping process no longer needs adjustment, which yields the optimal character class grouping for each grouping process. In Fano coding, any group whose total number of character classes is not 1 is split further into 2 groups, and grouping continues until every group contains exactly 1 class of data. A coding tree is then constructed from the optimal character class groupings of all grouping processes; the coding tree enables effective data compression, and by encoding and compressing its nodes the storage space requirement of the data is reduced and data processing efficiency is improved.
Preferably, in one embodiment of the present invention, the method for acquiring the optimal character class packet includes:
A large difference between the groups after adjustment produces a poor compression effect, whereas a small difference improves the compression effect. The character with the highest weighted preference degree in the second-class group to be confirmed is therefore added to the first-class group to be confirmed, and this is repeated until the corresponding grouping process no longer needs adjustment, yielding the optimal character class grouping for each grouping process.
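The adjustment loop for a single grouping process can be sketched in Python as below. Several points are assumptions for the sake of a runnable example: the function names are hypothetical, the signed frequency difference and the preference degree use an `exp(-|x|)` mapping, the influence factor is supplied by the caller as a hypothetical callable `influence_factor(first_group, frequency)`, and the loop stops before the second-class group would become empty.

```python
import math

def signed_difference(first, second):
    """Assumed signed frequency difference: negative when group one is lighter."""
    f1, f2 = sum(first.values()), sum(second.values())
    mapped = math.exp(-abs(f1 - f2))
    return mapped if f1 >= f2 else -mapped

def adjust_groups(first, second, influence_factor, threshold=0.0):
    """Move characters from the second-class group into the first-class group
    until the grouping process no longer needs adjustment."""
    first, second = dict(first), dict(second)  # work on copies
    while signed_difference(first, second) < threshold and len(second) > 1:
        f1, f2 = sum(first.values()), sum(second.values())
        weighted = {}
        for char, f in second.items():
            # Preference degree: gap between the two update frequencies.
            preference = math.exp(-abs((f1 + f) - (f2 - f)))
            weighted[char] = preference * influence_factor(first, f)
        best = max(weighted, key=weighted.get)
        first[best] = second.pop(best)  # move the best candidate
    return first, second
```

Called as `adjust_groups({"a": 4}, {"b": 3, "c": 2, "d": 1}, lambda group, f: 1.0)`, the sketch moves `"d"` into the first-class group and then stops, because both groups then have a total frequency of 5.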
The data compression module 103 obtains census compressed data according to the coding tree.
In a census, the coding tree can be used to compress population data so that the data can be transmitted and stored more quickly. The census compressed data is obtained according to the coding tree; compression greatly reduces the storage space occupied by the data and therefore lowers the storage cost.
Preferably, in one embodiment of the present invention, the method for acquiring census compressed data includes:
Census compressed data saves storage cost and improves transmission efficiency; the census data is compressed by Fano coding according to the coding tree to obtain the census compressed data.
It should be noted that Fano coding is a technical means well known to those skilled in the art and is not described herein.
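For readers unfamiliar with Fano coding, the following Python sketch shows the textbook Shannon-Fano procedure: sort characters by descending frequency, split them into two groups whose total frequencies are as close as possible, assign one bit per split, and recurse. It does not include the weighted-preference adjustment of this invention, and the function names are hypothetical.

```python
from collections import Counter

def fano_codes(freqs):
    """Return {character: bit string} using textbook Shannon-Fano splitting."""
    symbols = sorted(freqs, key=lambda s: (-freqs[s], s))
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return  # a group with one class of character is a leaf
        total = sum(freqs[s] for s in group)
        running, best_cut, best_gap = 0, 1, None
        for k in range(1, len(group)):
            running += freqs[group[k - 1]]
            gap = abs(total - 2 * running)  # imbalance between the two halves
            if best_gap is None or gap < best_gap:
                best_cut, best_gap = k, gap
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    if len(symbols) == 1:
        codes[symbols[0]] = "0"  # a lone character still needs one bit
    return codes

def encode(text):
    """Compress text into a bit string; also return the code table."""
    codes = fano_codes(Counter(text))
    return "".join(codes[ch] for ch in text), codes
```

Frequent characters receive short codes, so a string such as `"aaaabbbccd"` is encoded in fewer bits than a fixed two-bit code per character would need.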
The data storage module 104 stores the census compressed data.
Compressed data can be stored and managed more efficiently. The data volume is reduced, so that operations such as searching, reading and managing are quicker and more convenient, and the working efficiency is improved; the cloud platform management system has strict security and privacy protection measures, and ensures the reliability and the integrity of census data. And the population census compressed data is stored in the cloud platform management system, so that the data can be conveniently analyzed and processed by a subsequent data analysis module.
In summary, the invention acquires census data through the data acquisition module. The data processing module analyzes the frequency of each class of character in the census data and applies Fano coding; in each grouping process it obtains the weighted preference degree of each class of character in the second-class group to be confirmed from the frequency of that character and the frequencies of all characters in the first-class group to be confirmed, thereby obtaining the optimal character class grouping for each grouping process, and it then constructs a coding tree of the census data. The data compression module obtains the census compressed data, and the data storage module stores the census compressed data. By considering the frequency difference between the groups in each grouping process, the invention adaptively adjusts the grouping, reduces the difference in character frequency between the groups, improves the data compression effect, and optimizes the intelligent data management system.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
Claims (10)
1. An intelligent management system for social survey data, the system comprising:
the data acquisition module acquires census data;
the data processing module traverses the census data to acquire the frequency of each class of character in the census data; performs Fano coding on the characters according to the frequency, wherein in each grouping process of the Fano coding algorithm the character with the largest frequency among the characters to be grouped is put into a first-class group to be confirmed and the other characters are put into a second-class group to be confirmed; obtains the frequency difference between the groups according to the distribution of the characters to be grouped between the first-class group to be confirmed and the second-class group to be confirmed; judges whether the corresponding grouping process needs to be adjusted according to the frequency difference, and if so, obtains the preference degree of each class of character in the second-class group to be confirmed according to the frequency of each class of character in the second-class group to be confirmed; obtains the preference degree influence factor of each class of character in the second-class group to be confirmed according to the difference between the frequency of that character and the frequencies of the characters in the first-class group to be confirmed; obtains the weighted preference degree of each class of character in the second-class group to be confirmed according to its preference degree and preference degree influence factor; adds characters in the second-class group to be confirmed to the first-class group to be confirmed according to the weighted preference degree until the corresponding grouping process no longer needs to be adjusted, obtaining the optimal character class grouping in each grouping process; and constructs a coding tree according to the optimal character class groupings in all grouping processes;
the data compression module is used for obtaining population census compressed data according to the coding tree;
and the data storage module is used for storing the census compressed data.
2. The intelligent social survey data management system of claim 1, wherein the method for obtaining the frequency difference comprises:
the frequency differences include a first frequency difference and a second frequency difference;
calculating the sum of the frequencies of all classes of characters in the first-class group to be confirmed as the first-class group frequency;
calculating the sum of the frequencies of all classes of characters in the second-class group to be confirmed as the second-class group frequency;
calculating the difference between the first-class group frequency and the second-class group frequency as the grouping difference; if the first-class group frequency is greater than or equal to the second-class group frequency, performing negative correlation mapping through an exponential function and normalization on the grouping difference to obtain the first frequency difference; and if the first-class group frequency is smaller than the second-class group frequency, taking the opposite number of the first frequency difference to obtain the second frequency difference.
3. The intelligent social survey data management system of claim 2, wherein the method for obtaining the preference degree comprises:
calculating the sum of the first-class group frequency and the frequency of each class of character in the second-class group to be confirmed as the first-class group update frequency; calculating the difference between the second-class group frequency and the frequency of each class of character in the second-class group to be confirmed as the second-class group update frequency;
and calculating the difference between the first-class group update frequency and the second-class group update frequency corresponding to each class of character in the second-class group to be confirmed, and obtaining the preference degree of each class of character in the second-class group to be confirmed through negative correlation mapping.
4. The intelligent social survey data management system of claim 1, wherein determining whether the corresponding grouping process requires adjustment based on the frequency difference comprises:
comparing the frequency difference with a preset empirical threshold; if the frequency difference is greater than or equal to the preset empirical threshold, judging that the corresponding grouping process does not need to be adjusted; and if the frequency difference is smaller than the preset empirical threshold, judging that the corresponding grouping process needs to be adjusted.
5. The intelligent social survey data management system of claim 1, wherein the method for obtaining the preference degree influence factor comprises:
taking the $a$-th class of character in the second-class group to be confirmed as an example, the preference degree influence factor is expressed as: [formula image not preserved in the source text]; wherein $E_a$ denotes the preference degree influence factor of the $a$-th class of character in the second-class group to be confirmed; $n$ denotes the total number of character classes in the first-class group to be confirmed; $i$ indexes the $i$-th class of character in the first-class group to be confirmed; $p_i$ denotes the frequency of the $i$-th class of character in the first-class group to be confirmed; $q_a$ denotes the frequency of the $a$-th class of character in the second-class group to be confirmed; $\log_2 p_i$ denotes the base-2 logarithm of $p_i$; $\log_2 q_a$ denotes the base-2 logarithm of $q_a$; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base.
6. The intelligent social survey data management system of claim 1, wherein the weighted preference degree acquisition method comprises:
calculating the product of the preference degree and the preference degree influence factor of each class of character in the second-class group to be confirmed to obtain the weighted preference degree of each class of character in the second-class group to be confirmed;
wherein the preference degree and the preference degree influence factor are both positively correlated with the weighted preference degree.
7. The intelligent social survey data management system of claim 1, wherein the method for obtaining the optimal character class groupings comprises:
moving the character with the highest weighted preference degree in the second-class group to be confirmed into the first-class group to be confirmed, and continuing to adjust until the corresponding grouping process no longer needs to be adjusted, so as to obtain the optimal character class grouping in each grouping process.
8. The intelligent social survey data management system of claim 1, wherein the method for obtaining census compressed data comprises:
compressing the census data by Fano coding according to the coding tree to obtain the census compressed data.
9. The intelligent social survey data management system of claim 4, wherein the predetermined empirical threshold is 0.
10. A social survey data intelligent management system according to claim 3 wherein the negative correlation mapping is performed by an exponential function based on a natural constant.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311413375.6A CN117155406B (en) | 2023-10-30 | 2023-10-30 | Intelligent management system for social investigation data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117155406A (en) | 2023-12-01 |
CN117155406B (en) | 2024-02-02 |
Family
ID=88897090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311413375.6A Active CN117155406B (en) | 2023-10-30 | 2023-10-30 | Intelligent management system for social investigation data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117155406B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0494038A2 (en) * | 1990-12-31 | 1992-07-08 | International Business Machines Corporation | Run-length encoding in extensible character sets |
US5260693A (en) * | 1991-10-11 | 1993-11-09 | Spacelabs Medical, Inc. | Method and system for lossless and adaptive data compression and decompression |
JP2002049512A (en) * | 1992-10-22 | 2002-02-15 | Nec Corp | File compression processor |
CN101520771A (en) * | 2009-03-27 | 2009-09-02 | 广东国笔科技股份有限公司 | Method and system for code compression and decoding for word library |
CN112711935A (en) * | 2020-12-11 | 2021-04-27 | 中国科学院深圳先进技术研究院 | Encoding method, decoding method, apparatus and computer readable storage medium |
CN116318173A (en) * | 2023-05-10 | 2023-06-23 | 青岛农村商业银行股份有限公司 | Digital intelligent management system for financial financing service |
CN116915258A (en) * | 2023-09-12 | 2023-10-20 | 湖南省湘辉人力资源服务有限公司 | Enterprise pay management method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6609404B2 (en) * | 2014-07-22 | 2019-11-20 | 富士通株式会社 | Compression program, compression method, and compression apparatus |
Non-Patent Citations (1)
Title |
---|
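Huffman-based privacy-preserving data compression scheme for the k-anonymity model; Yu Yue et al.; Chinese Journal of Network and Information Security; Vol. 9 (No. 4); pp. 64-73 * |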
Also Published As
Publication number | Publication date |
---|---|
CN117155406A (en) | 2023-12-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||