


Mortality prediction and pathology detection in thoracic imaging screening using deep machine learning techniques



The project concerns the use of artificial intelligence techniques, including deep machine learning methods, to analyse medical X-ray and low-dose CT images from screening for early detection of pathological changes and prediction of risk of death.

Project objective

The aim of the project is to demonstrate that Trustworthy AI techniques applied to medical image analysis in screening enable early detection of pathological changes and assessment of mortality risk. The main tasks of the project are the development and validation of artificial intelligence models performing these tasks, and the development of methods for explaining the models' results, in particular with regard to the influence of low-level and high-level image information.



The basic computing technology for deep machine learning methods, particularly convolutional neural networks (CNNs), is so-called tensor computing. One of the most powerful architectures implementing such computations is the graphics processing unit (GPU). Due to the abundance and size of the data, the size of the models, and the need to iterate over many training epochs, supercomputer-class systems are best suited to this type of project, providing GPU computational power, large host and GPU memory resources, and fast access to stored data and metadata.


Problems investigated with AI techniques require large volumes of data and metadata (e.g. labels) for training and validation – often tens or even hundreds of thousands of images and associated records. For 3D medical imaging data (e.g. CT scans), a single data sample is usually an entire imaging study (a so-called series) on the order of 100 MB (e.g. 200 two-dimensional 512×512 slices at 2 bytes per pixel). A training set of tens of thousands of examinations amounts to several TB of data, and training a model requires multiple passes over the full set. In addition, metadata – labels in particular – are often assigned at the level of individual slices or even individual pixels of a study, so efficiently storing and searching them requires database resources several orders of magnitude larger than the number of studies themselves.
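The storage figures quoted above can be reproduced directly. The slice geometry (512×512 pixels, 2 bytes per pixel) and slice count (200) come from the text; the training-set size of 30,000 series is an illustrative assumption standing in for "tens of thousands of examinations":

```python
# Back-of-the-envelope storage estimate for a CT screening training set.
# Slice geometry and slice count are from the text; the 30,000-series
# training-set size is an illustrative assumption.
SLICE_BYTES = 512 * 512 * 2          # one 2D slice: 512x512 pixels, 2 B each
SLICES_PER_SERIES = 200              # typical series length
series_bytes = SLICE_BYTES * SLICES_PER_SERIES

training_series = 30_000             # "tens of thousands of examinations"
dataset_bytes = series_bytes * training_series

print(f"one series  ~ {series_bytes / 2**20:.0f} MiB")
print(f"training set ~ {dataset_bytes / 2**40:.1f} TiB")
```

This yields roughly 100 MiB per series and close to 3 TiB for the full set, consistent with the "several TB" estimate in the text.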


The thoracic screening analysis task undertaken in the project was based on a data resource of 90,000 CT image series. Given the size of the data and the complexity of the trained model, the following problems had to be solved: efficient metadata management at the level of individual images (18 million of them) for experiment planning, efficient access to the datasets, efficient computation, and sufficient GPU memory to accommodate both the model and an adequate number of data samples per batch.

All components of the model training process described above were realised using the Centre’s resources. Using high-speed storage (SSD/NVMe) and an in-memory database (IMDB), a dedicated solution was prepared that integrates the database with the medical image storage system (PACS) for metadata management and image-data addressing.
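One way such a metadata layer can be sketched is with an in-memory relational database that indexes slice-level labels against the series identifiers used to address images in the PACS. The project's actual IMDB/PACS integration is not described in detail, so the schema, identifiers, and labels below are illustrative assumptions only:

```python
import sqlite3

# Illustrative in-memory metadata index; schema and identifiers are
# assumptions, not the project's actual solution.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE slice_meta (
        series_uid TEXT,      -- DICOM series identifier, addresses the PACS
        slice_idx  INTEGER,   -- position of the 2D slice within the series
        label      TEXT       -- slice-level annotation used for training
    )
""")
db.execute("CREATE INDEX idx_series ON slice_meta (series_uid, slice_idx)")

# Insert example slice-level labels for one (hypothetical) series.
rows = [("1.2.840.0001", i, "nodule" if i == 42 else "normal")
        for i in range(200)]
db.executemany("INSERT INTO slice_meta VALUES (?, ?, ?)", rows)

# Experiment planning: locate slices of interest without touching image data.
hits = db.execute(
    "SELECT series_uid, slice_idx FROM slice_meta WHERE label = 'nodule'"
).fetchall()
print(hits)  # [('1.2.840.0001', 42)]
```

The point of the design is that queries over tens of millions of slice-level records run against the indexed metadata store, and only the small set of matching series is then fetched from the PACS by identifier.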

The Rysy cluster, based on NVIDIA V100 32 GB GPU cards, was used for the calculations, providing high computational performance and a large memory resource for the model. The calculations were implemented in Python in the TensorFlow environment.

Due to the high data-access intensity (a high ratio of data-reading time to computing time) and the large size of the training and validation sets, data had to be served to the GPU computing system through a hierarchical storage model.

The data was stored in full on the Lustre file system (Tethys) and asynchronously staged onto the high-speed local storage (SSD/NVMe) of the computing cluster.
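The staging step can be sketched as a background copy from the shared file system to node-local scratch, overlapping data movement with computation. The project's actual staging mechanism is not described, so the function and paths below are illustrative assumptions, with temporary directories standing in for the Lustre mount and the NVMe scratch area:

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def stage_async(files, scratch_dir, workers=4):
    """Copy files to node-local scratch in the background; return futures."""
    pool = ThreadPoolExecutor(max_workers=workers)
    return [pool.submit(shutil.copy2, f, scratch_dir) for f in files]

# Demo: temporary directories stand in for Lustre and local NVMe scratch.
lustre = Path(tempfile.mkdtemp(prefix="lustre_"))
scratch = Path(tempfile.mkdtemp(prefix="nvme_"))
for i in range(3):
    (lustre / f"series_{i}.dat").write_bytes(b"\0" * 1024)

futures = stage_async(sorted(lustre.glob("*.dat")), scratch)
staged = [Path(f.result()) for f in futures]  # block until each copy lands
print([p.name for p in staged])
```

In a real training loop, the futures for the next batch of series would be submitted while the current batch is being consumed, so the slow shared-storage reads are hidden behind GPU computation.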

Thanks to the use of HPC infrastructure:

  • training of more complex models at larger batch sizes was enabled;
  • the response time of the metadata database was reduced by several orders of magnitude;
  • data access times were reduced by several orders of magnitude.

Only these infrastructure improvements made it realistic, in terms of time, to carry out the many experiments planned in the project.




The project's results can be used in practice and, in future, implemented in clinical applications, subject to extended validation, demonstration of the credibility of the results, and embedding in measurable clinical benefits for thoracic screening, with particular emphasis on lung cancer screening.
