Fuzzy heaping mechanism for heaped count data with imprecision

  • Methodologies and Application
Soft Computing

Abstract

In genetic association studies, the traits of interest are sometimes collected from self-reported data. Because subjects give a mix of exact and rounded responses, the histogram of the data frequently exhibits spikes at particular values. This phenomenon, known as heaping, can cause difficulties when the association test is performed with standard modeling approaches. Recently, several models have been proposed to identify the true, unobservable underlying distribution from heaped data. However, all of these methods depend on probabilistic assumptions about the heaping mechanism. Probabilistic models cannot represent heaped data effectively, because heaping can be caused by imprecisely reported values, and this type of imprecision is different from the probabilistic uncertainty that a probabilistic model describes well. In this paper, we propose a fuzzy heaping model to identify genetic variants from heaped count data. Our fuzzy model uses a mixture of likelihood functions for precisely and imprecisely reported data, treating heaped data as imprecise data represented by fuzzy sets. Moreover, since reported count data may include excess zeros as well as heaped values, we extend our fuzzy heaping model to handle excess zeros. Through simulation studies, we show that the proposed fuzzy heaping model controls type I errors effectively and has high power to identify causal variants. We illustrate the proposed fuzzy heaping model with a study identifying genetic variants associated with the number of cigarettes smoked per day.
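The abstract does not give the model's equations, so the following is only a rough sketch of how a mixture of likelihoods for precisely and imprecisely reported counts might look. It is not the authors' specification. It assumes a Poisson count model, a triangular membership function around each heap value, and Zadeh's definition of the probability of a fuzzy event (the membership-weighted sum of probabilities). The heap points, the mixing weight `p_heap`, the `spread`, and all function names are hypothetical choices made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) for a rate lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def triangular_membership(y, center, spread):
    """Degree to which a true count y belongs to the fuzzy set 'about center'.

    The triangular shape is an assumption made for this sketch.
    """
    return max(0.0, 1.0 - abs(y - center) / spread)


def fuzzy_event_prob(center, lam, spread):
    """Probability of the fuzzy event 'about center': sum of mu(y) * P(Y = y)."""
    lo = max(0, center - spread)
    hi = center + spread
    return sum(
        triangular_membership(y, center, spread) * poisson_pmf(y, lam)
        for y in range(lo, hi + 1)
    )


def observation_likelihood(x, lam, heap_points=(10, 20, 30, 40),
                           p_heap=0.5, spread=5):
    """Likelihood contribution of one reported count x.

    Assumption: values off the heap points are treated as precisely reported;
    values on a heap point mix a precise term and a fuzzy (imprecise) term.
    """
    exact = poisson_pmf(x, lam)
    if x not in heap_points:
        return exact
    return (1.0 - p_heap) * exact + p_heap * fuzzy_event_prob(x, lam, spread)


def log_likelihood(data, lam, **kwargs):
    """Sum of log likelihood contributions over all reported counts."""
    return sum(math.log(observation_likelihood(x, lam, **kwargs)) for x in data)


# Made-up reports with spikes at 10 and 20, evaluated on a coarse grid of rates.
reports = [7, 10, 10, 12, 20, 20, 10, 15, 20, 9]
best_lam = max((l / 2 for l in range(10, 61)),
               key=lambda lam: log_likelihood(reports, lam))
```

In an association setting the rate would depend on genotype through a regression model, and the excess-zero extension mentioned in the abstract would add a further mixture component at zero. Both are omitted here to keep the sketch short.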



Acknowledgements

This work was supported by the Bio-Synergy Research Project (2013M3A9C4078158) of the Ministry of Science, ICT and Future Planning through the National Research Foundation and by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI16C2037).

Author information

Corresponding author

Correspondence to Taesung Park.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Jung, HY., Choi, H. & Park, T. Fuzzy heaping mechanism for heaped count data with imprecision. Soft Comput 22, 4585–4594 (2018). https://doi.org/10.1007/s00500-017-2641-4

  • DOI: https://doi.org/10.1007/s00500-017-2641-4
