Article
English
2018

Abstract 5303: Identification of myelofibrosis from electronic health records with novel algorithms and JAKextractor

Cancer Research
Vol 78 (13_Supplement)
DOI: 10.1158/1538-7445.am2018-5303

Cosmin A. Bejan, Andrew Sochacki, Shilin Zhao, Yaomin Xu, Michael Savona

Abstract

Myelofibrosis (MF) is a devastating myeloproliferative neoplasm (MPN) hallmarked by marrow fibrosis, extramedullary hematopoiesis, vascular thromboembolism, and a ~50% incidence of JAK2V617F. MF is difficult to study in large electronic health record (EHR) datasets because of clinical heterogeneity and unreliable ICD coding. The Synthetic Derivative is a cloned and de-identified research EHR with 2.9 million unique patients linked to BioVU, a DNA biorepository. To develop phenotype-genotype associations, we created an algorithm to classify MF using natural language processing (NLP) with negation detection of MF keywords, medications, and ICD coding. To enrich our cohort, we developed JAKextractor, an algorithm to identify patients tested clinically for JAK2V617F across all 248,000 BioVU patients. For MF identification, we trained a supervised learning algorithm to learn decision rules that encode counts of MF-specific ICD codes, medications, and text mentions, as well as the assertion status of MF and JAK2 mentions in patient notes. Experiments were evaluated using 10-fold cross-validation. JAKextractor used pattern matching to extract the status (WT vs MUT) of each JAK2 text mention; a machine learning model then predicted whether a patient was JAK2V617F positive from the statuses extracted from that patient's notes. We subsequently genotyped banked DNA from an enriched subset of MF cases with an Illumina® TruSight myeloid NGS panel to validate JAKextractor. The top-performing MF algorithm combined all sources of clinical information, achieved an F1-measure (F1) of 96%, and identified 309 MF patients in BioVU. The extracted decision rule for predicting an MF patient was [JAK2V617F ∧ ICD>1] ∨ [JAK2WT ∧ ICD>1 ∧ TXT>3]. ICD coding is necessary but not sufficient to identify MF: using only ICD counts yielded a significantly lower F1 of 88% (P<0.001). Our MF cohort had a mean age at onset of 60.3±12.6 years, a mean age at last visit of 63.1±12.1 years, and a JAK2V617F rate of 46.1%.
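The decision rule quoted in the abstract can be read as a small boolean predicate. The following is a minimal sketch only, not the authors' implementation: the `PatientFeatures` class, its field names, and the choice to represent an untested patient as `None` are assumptions made for illustration. Under a literal reading of the rule, a patient with no known JAK2 status satisfies neither clause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientFeatures:
    """Hypothetical per-patient feature record (names are illustrative)."""
    jak2_status: Optional[str]  # "MUT" (JAK2V617F), "WT", or None if untested
    icd_count: int              # count of MF-specific ICD codes
    text_mentions: int          # count of MF text mentions (TXT in the rule)


def predicts_mf(p: PatientFeatures) -> bool:
    """Literal reading of [JAK2V617F and ICD>1] or [JAK2WT and ICD>1 and TXT>3]."""
    mutated_clause = p.jak2_status == "MUT" and p.icd_count > 1
    wildtype_clause = (
        p.jak2_status == "WT" and p.icd_count > 1 and p.text_mentions > 3
    )
    return mutated_clause or wildtype_clause


if __name__ == "__main__":
    examples = [
        PatientFeatures("MUT", icd_count=2, text_mentions=0),
        PatientFeatures("WT", icd_count=2, text_mentions=3),
        PatientFeatures("WT", icd_count=2, text_mentions=4),
        PatientFeatures(None, icd_count=5, text_mentions=10),
    ]
    for patient in examples:
        print(patient, "->", predicts_mf(patient))
```

Both clauses require more than one MF-specific ICD code, which matches the abstract's statement that ICD coding is necessary but not sufficient: a wild-type patient also needs more than three text mentions.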
The mean age at MF onset was higher with JAK2V617F (64) than with JAK2WT (57) (P<0.001). Survival did not differ between JAK2V617F and JAK2WT MF cases by log-rank test (P=0.11), with a median survival of 108 months. Of 131 genotyped MF cases, NGS detected JAK2V617F in 71/131 (54.2%), compared with 66/131 (50.4%) identified by JAKextractor. Mean JAK2V617F allelic frequency was 0.569, with detected values ranging from 0.069 to 0.976. Ten cases showed disagreement between JAKextractor and NGS: 6/131 (4.6%) were JAKextractor errors (2 FP and 4 FN), and the remaining four comprised 2 true NGS failures, 1 incomplete chart, and 1 loss of JAK2V617F over time. NGS detected JAK2V617F in 7/131 (5.3%) MF patients who had not been previously tested. Our results demonstrated successful identification of MF and JAK2V617F within an EHR. We established the feasibility of creating an MPN database with retrospective genotyping of biobanked DNA. We plan for scaled implementation of similar algorithms across all myeloid disease within BioVU, with the ability to retrospectively genotype each case.

Citation Format: Cosmin A. Bejan, Andrew Sochacki, Shilin Zhao, Yaomin Xu, Michael Savona. Identification of myelofibrosis from electronic health records with novel algorithms and JAKextractor [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5303.
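The proportions in the validation results can be re-derived from the counts stated in the abstract. The sketch below only repeats that arithmetic. The variable and function names are hypothetical, and grouping the ten discordant cases by cause is one reading of the abstract's wording, not a confusion matrix reported by the authors.

```python
# Denominator: the 131 MF cases genotyped by NGS, as stated in the abstract.
GENOTYPED = 131

# label -> (count stated in the abstract, percentage stated in the abstract)
reported = {
    "NGS JAK2V617F positive": (71, 54.2),
    "JAKextractor JAK2V617F positive": (66, 50.4),
    "JAKextractor errors (2 FP + 4 FN)": (2 + 4, 4.6),
    "JAK2V617F in previously untested patients": (7, 5.3),
}

# Discordant cases grouped by stated cause (assumed reading of the abstract).
discordant = {
    "false positive": 2,
    "false negative": 4,
    "true NGS failure": 2,
    "incomplete chart": 1,
    "JAK2V617F lost over time": 1,
}


def percent(count: int, total: int = GENOTYPED) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


def mismatches(table: dict) -> list:
    """Labels whose recomputed percentage differs from the stated one."""
    return [
        label
        for label, (count, stated) in table.items()
        if percent(count) != stated
    ]


if __name__ == "__main__":
    for label, (count, stated) in reported.items():
        print(f"{label}: {count}/{GENOTYPED} = {percent(count)}% (stated {stated}%)")
    print("Discordant cases accounted for:", sum(discordant.values()))
    print("Mismatched percentages:", mismatches(reported))
```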

How to cite this publication

Cosmin A. Bejan, Andrew Sochacki, Shilin Zhao, Yaomin Xu, Michael R. Savona (2018). Abstract 5303: Identification of myelofibrosis from electronic health records with novel algorithms and JAKextractor. Cancer Research, 78(13_Supplement), Abstract nr 5303. DOI: 10.1158/1538-7445.am2018-5303.
