Magnet.me  -  The smart network where students and professionals find their internship or job.

Assignment: Research on the Use of Large Language Models for Pathology Reports

Posted 3 Dec 2024
Work experience: 0 to 1 years
Full-time / part-time: Full-time
Salary: €600 per month
Required languages: English (Good), Dutch (Fluent)

The rise of large language models (LLMs) has opened new possibilities for processing and analysing medical texts, including pathology reports. These reports often contain complex medical terminology and detailed information that is crucial for making diagnoses and developing treatment plans. This research will investigate the effectiveness of various large language models in processing pathology reports in Dutch, and compare these models with traditional techniques such as a keyword matching approach, to evaluate which method is best suited for extracting useful data from these reports.

Objective of the Assignment:

The aim of this assignment is to research which large language model performs best at extracting relevant information from Dutch pathology reports and how these models compare to a keyword matching approach. You will also need to analyse the capabilities and limitations of both approaches, with a focus on their ability to handle Dutch-language reports. Additionally, there may be a need to train or fine-tune the model to improve its accuracy with Dutch pathology data.
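
To make the keyword matching baseline concrete, here is a minimal sketch of what such an approach could look like. The lexicon, field names, and example sentences are invented for illustration and do not come from Performation; a real baseline would need a lexicon built together with domain experts.

```python
import re

# Hypothetical lexicon (an assumption for this sketch):
# field -> {normalised label: [Dutch keyword variants]}
KEYWORDS = {
    "diagnosis": {
        "adenocarcinoma": ["adenocarcinoom"],
        "squamous cell carcinoma": ["plaveiselcelcarcinoom"],
    },
    "margin_status": {
        "free": ["snijvlak vrij", "snijvlakken vrij"],
        "involved": ["snijvlak niet vrij", "snijvlakken niet vrij"],
    },
}


def extract_by_keywords(report: str) -> dict:
    """Return every label whose keyword occurs as a whole word or phrase."""
    text = report.lower()
    found = {}
    for field, labels in KEYWORDS.items():
        for label, variants in labels.items():
            # \b keeps a keyword from matching inside a longer word.
            if any(re.search(r"\b" + re.escape(v) + r"\b", text) for v in variants):
                found.setdefault(field, []).append(label)
    return found


# Made-up example sentences, not real patient data.
print(extract_by_keywords("Biopt colon: adenocarcinoom. Snijvlakken vrij."))
# {'diagnosis': ['adenocarcinoma'], 'margin_status': ['free']}

print(extract_by_keywords("Geen aanwijzingen voor adenocarcinoom."))
# {'diagnosis': ['adenocarcinoma']}
```

The second call shows a typical limitation you would be expected to analyse: the sentence is negated ("no evidence of adenocarcinoma"), yet plain keyword matching still reports the diagnosis.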

Assignment Description:

  1. Literature Review
     Conduct a thorough literature review on the use of large language models in the medical field, with a specific focus on pathology reports in Dutch. Describe the advantages and disadvantages of various LLMs that could be used in this context, and discuss their capability to handle medical terminology in the Dutch language.
  2. Comparison of LLMs
     Select at least three different large language models (such as GPT, BERT, BioBERT) and evaluate them based on their performance in extracting information from Dutch pathology reports. Consider the following factors:
     • Accuracy of extraction
     • Ability to correctly interpret Dutch medical terminology
     • Processing speed
     • Amount of training data required (especially for Dutch-language reports)
     • Whether the model needs to be trained or fine-tuned on Dutch pathology reports to improve performance
  3. Training the Model (Possibly)
     If deemed necessary, train or fine-tune one or more of the selected large language models specifically for Dutch pathology reports. This may involve the following steps:
     • Preprocessing of pathology report data
     • Fine-tuning the model with labelled examples (if available)
     • Evaluating the model’s performance post-training

  4. Comparison with Keyword Matching
     Compare the performance of the selected LLMs with a keyword matching approach. Run both methods on the same set of Dutch pathology reports and evaluate:
     • The quality of extracted data
     • The number of errors or missing data
     • Ease of implementation
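
As an illustration of how the quality of extracted data and the number of errors or missing data could be measured, the sketch below scores one method's output against manually annotated reference values. The function name, the field names, and the toy data are assumptions made for this example; they are not results and not a prescribed evaluation protocol.

```python
def score_extraction(predicted: list[dict], gold: list[dict]) -> dict:
    """Compare per-report field values against reference annotations."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same reports")

    correct = wrong = missing = spurious = 0
    for pred, ref in zip(predicted, gold):
        for field, value in ref.items():
            if field not in pred:
                missing += 1          # field annotated but not extracted
            elif pred[field] == value:
                correct += 1
            else:
                wrong += 1            # field extracted with the wrong value
        # Fields the method produced that the annotators did not record.
        spurious += sum(1 for field in pred if field not in ref)

    n_pred = correct + wrong + spurious
    n_gold = correct + wrong + missing
    return {
        "correct": correct,
        "wrong": wrong,
        "missing": missing,
        "spurious": spurious,
        "precision": correct / n_pred if n_pred else 0.0,
        "recall": correct / n_gold if n_gold else 0.0,
    }


# Toy reference annotations and toy output of one method (both invented).
gold = [
    {"diagnosis": "adenocarcinoma", "margin_status": "free"},
    {"diagnosis": "benign"},
]
method_output = [
    {"diagnosis": "adenocarcinoma"},
    {"diagnosis": "adenocarcinoma"},
]

print(score_extraction(method_output, gold))
```

Running the same function on the output of each LLM and of the keyword matching approach, over the same reports, gives directly comparable counts of correct, wrong, and missing fields.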

At Performation we have a solution for every department within a healthcare organisation, built on a single data source as a stable foundation. We work continuously on improvements to make management even more efficient. Innovation, optimisation and efficiency are what drive us. We optimise healthcare operations so that care providers can focus on patient care.

IT
Zeist
Active in 3 countries
160 employees
40% men - 60% women
Average age is 35 years