GitHub - Tony-Hao/GenX: A Python tool for Gender Information Identification from Clinical Trial Texts

GenX

a Python tool to extract and structure transgender-related clinical trial eligibility criteria free text from ClinicalTrials.gov

Objective: To develop an automated approach to structuring transgender criteria text for enhancing transgender-related clinical trial recruitment.
Materials and Methods: A gender criteria data model incorporating transgender populations is designed and an automated method called GenX is developed. GenX extracts and standardizes transgender specifications from clinical trial summaries using the proposed data model. GenX is compared to 42 machine learning methods on 105,078 clinical trials.
Results: GenX achieves a macro-averaged precision of 0.82, a macro-averaged recall of 0.823 and a macro-averaged F1-measure of 0.821, outperforming all other 42 baseline methods.
Conclusion: This study addresses the problems of gender information incompleteness and ambiguity, particularly for transgender populations. A new gender data model extending gender type coverage and an automated method GenX for transgender criteria extraction promise to enhance transgender-related clinical trial recruitment. The experiment results demonstrate the effectiveness of GenX in transgender criteria extraction.

Usage


import GenX

Clean text by preprocessing and split text into sentences


GenX.textPreProcessing(text)

text is the original clinical research eligibility free text and this function will return a list of sentences split from that text

Load a set of rules that use to identify and annotate gender information


GenX.loadRule(ruleFile, genderDictionaryFile, catesFile)

ruleFile, genderDictionaryFile and catesFile are corresponding to the files that contain the regular expression rules, gender feature dictionary and the gender categories. Those files are named 'rule.csv', 'genderDic.csv' and 'cates.csv' respectively and are saved in the file named 'rules'

Extract and annotate the gender features in sentences


GenX.genderAnnotator(sentences, rules, pre_label=None)

Using the rules from GenX.loadRule(ruleFile, genderDictionaryFile, catesFile) to process the sentences from GenX.textPreProcessing(text). A set of candidate gender features will be returned. pre_label is the original gender type on ClinicalTrial.gov eligibility. This value can also not be input in this function.

Conclude the gender type


GenX.genderTypeConclusion(genderFeatureList, threshold)

Using the set threshold to conclude the final gender type based on the gender features list from GenX.genderAnnotator(sentences, rulePattern)

Normalize the output gender type


GenX.genderOutputNormalization(gender)

GenX method


GenX.GenX(text, threshold, pre_label=None)

Using GenX method with pre-set threshold to identify the gender type from a clinical trial text, while pre_label is the original gender type on ClinicalTrial.gov eligibility. This value can also not be input in this function.
e.g.,the text "Biologically male, including male-to-female transgender women" will be identified as ['Transgender Female', 'Biological Male'] with threshold 7.

GenX method in clinical trials

Using GenX method with a threshold to process a list of clinical trials, while the input trial data saved in the file 'clinicaltrial' and output data in the same file.
fileIn is the input csv file which contains a list of clinical trials with their clinical trial ID, original gender type in eligibility, study title, study description and eligibility criteria.
fileOut is the output csv file which contains a list of clinical trials with their clinical trial ID and the gender type annotated by GenX


processByGenX(fileIn, fileOut, threshold)

Citation

Hao T, Chen B, Qu Y. An Automated Method for Gender Information Identification from Clinical Trial Texts[C]// International Conference on Health Information Science. Springer International Publishing, 2016:109-118.

Contributors

Tianyong Hao
Boyu Chen

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
clinicaltrial		clinicaltrial
kernel/NLP		kernel/NLP
rules		rules
.gitattributes		.gitattributes
GenX.py		GenX.py
Gior.py		Gior.py
LICENSE		LICENSE
README.md		README.md
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenX

Usage

Clean text by preprocessing and split text into sentences

Load a set of rules that use to identify and annotate gender information

Extract and annotate the gender features in sentences

Conclude the gender type

Normalize the output gender type

GenX method

GenX method in clinical trials

Citation

Contributors

About

Releases

Packages

Languages

License

Tony-Hao/GenX

Folders and files

Latest commit

History

Repository files navigation

GenX

Usage

Clean text by preprocessing and split text into sentences

Load a set of rules that use to identify and annotate gender information

Extract and annotate the gender features in sentences

Conclude the gender type

Normalize the output gender type

GenX method

GenX method in clinical trials

Citation

Contributors

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages