The model with the best precision, F1-score, and accuracy did not use machine learning, but rather leveraged list structures. With respect to the manual gold standard, the list model achieved 90.14 precision, 60.69 recall, 75.41 F1-score, and 92.60 accuracy. The list model also performed best with respect to the silver standard, where it out-performed the machine learning classifiers on precision, F1-score, and accuracy across all collections. It is important to note that the list model achieved these results even though only half of the abstracts in the training set included a list containing one of the initial survivorship terms.
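For reference, the evaluation metrics reported above are standard functions of the confusion-matrix counts. The sketch below shows how they are computed; the counts used are purely illustrative and are not the study's actual confusion matrix.

```python
def evaluation_metrics(tp, fp, fn, tn):
    # Precision: fraction of predicted outcomes that are correct.
    precision = tp / (tp + fp)
    # Recall: fraction of gold-standard outcomes that were found.
    recall = tp / (tp + fn)
    # F1-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    # Accuracy: fraction of all decisions, positive and negative, that are correct.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative counts only:
p, r, f1, acc = evaluation_metrics(tp=90, fp=10, fn=60, tn=840)
```

Note that accuracy can be high even when recall is modest, because true negatives (sentences with no outcome) dominate the denominator, which is consistent with the pattern of scores reported above.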
The initial set of 2082 seed terms produced more than 4000 outcomes in the training set, spanning both health effectiveness (e.g., resistance, progression, and incidence) and adverse effects (e.g., toxicity, death, and safety). The outcomes for the entire collection were identified using the silver standard, which enabled us to compare three different treatment strategies. Some outcomes, such as response, survival, toxicity, and overall survival, were frequently reported for all three treatments, but surrogate endpoints such as disease-free survival and progression-free survival differed with respect to the treatment type. The differences in the relative frequencies of outcome reporting reflected both the biological underpinnings of each treatment and the way in which it is used when treating breast cancer.
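Comparing outcome reporting across treatment collections amounts to tallying, per collection, how many abstracts mention each seed term. A minimal sketch of that tally, using a hypothetical handful of seed terms (the study's actual lexicon contained 2082) and hypothetical abstract texts:

```python
from collections import Counter

# Hypothetical seed terms; stand-ins for the study's 2082-term lexicon.
SEED_TERMS = {"survival", "toxicity", "response", "recurrence"}

def outcome_frequencies(abstracts):
    """Count how many abstracts in a collection mention each seed outcome term."""
    counts = Counter()
    for text in abstracts:
        tokens = set(text.lower().split())  # each abstract counted at most once per term
        for term in SEED_TERMS & tokens:
            counts[term] += 1
    return counts

# Hypothetical collection for one treatment strategy:
chemo_abstracts = [
    "overall survival and toxicity were assessed",
    "tumor response and toxicity were recorded",
]
freqs = outcome_frequencies(chemo_abstracts)
```

Dividing each count by the collection size yields the relative frequencies that can then be compared across treatment strategies, as described above.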
Taken together, these results show that an informatics approach can be used to automatically capture the outcomes reported in a study. To our knowledge, this paper is the first to quantify the number of outcome measures reported in the methods section of MEDLINE abstracts. Although further work is required to quantify the change in these measures and to extract the specific values, the approach presented in this paper brings us one step closer to providing the tools needed by physicians as they see patients in a clinical setting and by researchers who are striving to systematically review the literature.

Journal of Biomedical Informatics: X 1 (2019) 100005
Declaration of interests
The authors declare that there is no conflict of interest.
Acknowledgments

This material is based upon work supported in part by the National Science Foundation under Grant No. 1535167. We thank the reviewers for their thoughtful comments and suggestions.
C. Blake and R. Kehm