
Term extraction from medical documents using word embeddings



In this paper we present a new method for extracting discipline-specific terms from medical documents. Because medical text corpora are small and medical documents are highly specialized, approaches based solely on term frequencies are of limited use. Combining such methods with procedures that are sensitive to semantic aspects is therefore promising. We use word embeddings in a neighborhood-context-based method that we call Snowball because it works layer by layer. Snowball is integrated with established methods into an end-to-end pipeline that processes documents and extracts relevant terms. We provide a proof of concept on a gold standard recently created together with experts in medical coding. The preliminary results indicate the feasibility of our approach and its potential for automated, machine-learning-based text processing in the medical context.
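The abstract does not spell out Snowball's internals. The following is only a minimal sketch of what a layerwise neighborhood expansion over word embeddings can look like, assuming that each layer adds the `k` nearest neighbors (by cosine similarity, above a threshold `min_sim`) of the terms discovered in the previous layer. The function name `snowball_expand`, its parameters, and the toy 2-D vectors are hypothetical illustrations, not the authors' implementation; in practice the vectors would come from embeddings trained on a medical corpus.

```python
import numpy as np


def snowball_expand(seeds, vocab, vectors, k=3, layers=2, min_sim=0.5):
    """Hypothetical layerwise ("snowball") expansion of seed terms.

    Returns a dict mapping each discovered term to the layer in which it
    was first reached (seeds are layer 0).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    index = {w: i for i, w in enumerate(vocab)}

    found = {s: 0 for s in seeds if s in index}
    frontier = list(found)
    for layer in range(1, layers + 1):
        next_frontier = []
        for term in frontier:
            sims = unit @ unit[index[term]]
            taken = 0
            # Visit candidates from most to least similar.
            for j in np.argsort(-sims):
                if taken == k or sims[j] < min_sim:
                    break
                cand = vocab[j]
                if cand == term:
                    continue
                taken += 1
                if cand not in found:
                    found[cand] = layer
                    next_frontier.append(cand)
        frontier = next_frontier  # only new terms seed the next layer
        if not frontier:
            break
    return found


# Toy example with made-up 2-D "embeddings" (assumption for illustration).
vocab = ["diabetes", "insulin", "glucose", "hba1c", "car", "engine"]
vectors = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.3],
    [0.6, 0.5], [0.0, 1.0], [0.1, 0.9],
])
terms = snowball_expand(["diabetes"], vocab, vectors, k=2, layers=2, min_sim=0.9)
print(terms)
```

In this toy run, "hba1c" is not similar enough to the seed to be picked up directly, but it is reached in the second layer through "glucose", while the unrelated terms "car" and "engine" are never added. This growth from a seed through successive neighborhoods is what the layer-by-layer idea refers to.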


