RadOnc News

Data Sharing and Dragon Slaying

Access to large amounts of high-quality data is a formidable roadblock on the path to best outcomes with AI.

Data in an AI-driven World

Every day, humanity produces 2.5 million terabytes of new data. Companies like Facebook, Google, and Amazon were built on processing large volumes of data and turning that data into meaningful information. Or at least, the information is meaningful enough to allow artificial intelligence (AI) algorithms to suggest a friend to add to your network, or to recommend the sprinkler to buy the kids for summertime fun.

But it might be a while before we have the same level of predictive power in radiation oncology. While we produce enormous volumes of data in the clinic, there are several roadblocks preventing its use in sophisticated models for patient benefit. The ability of these models to generalize the principles learned on training data—effectively and accurately—is largely predicated on the volume and quality of the data used.

Two requirements stand out. First, accurately labeled datasets are needed. Quality labeled data is hard to come by, but it allows the model builders to establish a ground truth and measure the accuracy of the model’s predictions against the human expertise encapsulated in the labels. Second, sufficiently large datasets with diverse samples are required. In a simplistic example, if a dog breed recognition model is trained on a set that only includes three breeds of dogs, it will only be able to detect those three breeds accurately.
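
The dog breed example can be made concrete with a toy classifier. The sketch below is illustrative only: the features (height and weight), the numbers, and the nearest-centroid method are all assumptions chosen for brevity, not how any production model works. It shows the two points above: accuracy is measured against expert labels, and a model trained on three breeds can only ever answer with one of those three.

```python
from collections import defaultdict
from math import dist

def fit_centroids(samples):
    """Average the feature vectors per label; samples is [(features, label), ...]."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the known label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

def accuracy(centroids, labeled_samples):
    """Fraction of samples whose prediction matches the expert label."""
    hits = sum(predict(centroids, f) == label for f, label in labeled_samples)
    return hits / len(labeled_samples)

# Made-up training data: (height in cm, weight in kg) with an expert label.
train = [
    ((25, 3), "chihuahua"), ((23, 2), "chihuahua"),
    ((60, 30), "labrador"), ((58, 32), "labrador"),
    ((80, 70), "great_dane"), ((78, 65), "great_dane"),
]
model = fit_centroids(train)

# Labeled held-out samples act as ground truth for measuring accuracy.
held_out = [((24, 3), "chihuahua"), ((59, 29), "labrador"), ((81, 68), "great_dane")]
print("accuracy on known breeds:", accuracy(model, held_out))

# A breed that was never in the training set is forced into a known class.
unseen_features, unseen_label = (38, 10), "beagle"
print("beagle predicted as:", predict(model, unseen_features))
```

However well the model scores on the three breeds it has seen, the beagle is always mislabeled, because "beagle" is simply not among the answers the model can give. More diverse training data is the only fix.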

Having sufficient high-quality data is the linchpin for success in solving problems with AI. Yet, while there are virtually limitless volumes of data, how do we (1) access the data and (2) sort out the incomplete, erroneous, or meaningless data?


The Problem – Data Silo Dragons

In our field, there are massive amounts of data within every clinic. Clinics handle everything from initial consultation to discharge and follow-up, and all the treatment data generated is reviewed, organized, and stored either locally or in the cloud, with access limited to the institution or specific individuals. In computer science, there is a term for this: data silo. Data silos exist across all industries and even within departments under the same roof.

This repository of data could enable research and the development of tools that improve the quality, efficiency, and safety of patient care. I like to think of the obstacles that prevent it from being used that way as the “data silo dragon”. Each clinic has one. The obstacles that form the dragon fall into two types: Technical and Bureaucratic.

Technical Obstacles:

  • How can Protected Health Information (PHI) remain protected through de-identification while still making enough data available to be useful?
  • How can the data be extracted efficiently and (if necessary) transformed into a format that can be used for training AI solutions? Clearly if it takes weeks of manual work from clinical staff to extract the data, it’s not going to happen.
  • How can the data be efficiently cleaned and standardized? For example, if labeling is inconsistent, the data must be modified to match a standard convention (like TG-263).
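
As a rough illustration of the first and third questions, the sketch below strips direct identifiers from an exported record and maps locally used structure names onto TG-263-style names. Everything in it is an assumption made for the example: the field names, the alias table, and the salted-hash pseudonym are hypothetical, and the target names should be verified against the TG-263 report. It is not a complete de-identification procedure. A real pipeline would need to handle every identifier category required by the applicable regulation, including dates and free-text fields.

```python
import hashlib
import re

# Hypothetical field names; a real export follows the clinic's own schema.
PHI_FIELDS = {"patient_name", "birth_date", "address", "phone"}

# Illustrative alias table: normalized local label -> TG-263-style name.
STRUCTURE_ALIASES = {
    "lt parotid": "Parotid_L",
    "parotid l": "Parotid_L",
    "rt parotid": "Parotid_R",
    "cord": "SpinalCord",
    "spinal cord": "SpinalCord",
    "brain stem": "Brainstem",
}

def pseudonymize_id(patient_id, salt):
    """Replace the patient ID with a salted SHA-256 digest (a pseudonym)."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return digest[:16]

def deidentify(record, salt):
    """Drop the listed PHI fields and swap the patient ID for a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in PHI_FIELDS}
    cleaned["patient_id"] = pseudonymize_id(record["patient_id"], salt)
    return cleaned

def standardize_name(raw_name):
    """Normalize separators and case, then look the label up in the alias table."""
    key = re.sub(r"[\s_\-.]+", " ", raw_name).strip().lower()
    return STRUCTURE_ALIASES.get(key)

def standardize_structures(names):
    """Return (mapped, unmapped); unmapped names are left for human review."""
    mapped, unmapped = {}, []
    for name in names:
        standard = standardize_name(name)
        if standard is None:
            unmapped.append(name)
        else:
            mapped[name] = standard
    return mapped, unmapped

record = {
    "patient_id": "MRN-0001",
    "patient_name": "Jane Doe",
    "birth_date": "1960-01-01",
    "structures": ["Lt_Parotid", "CORD", "Brain-Stem", "GTV primary"],
}
clean = deidentify(record, salt="site-secret")
mapped, unmapped = standardize_structures(clean["structures"])
print(clean["patient_id"], mapped, unmapped)
```

Returning the unmapped names separately, rather than guessing, keeps a person in the loop for labels the table does not recognize, which matters when the labels later serve as ground truth.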


Bureaucratic Obstacles:

  • How can institutional barriers to data sharing be addressed?
    • Is legal review and a data-use agreement necessary? Who in the institution can “sign off” on sharing the institution’s data?
    • What does the institution receive in exchange for sharing their valuable data?
    • What concerns do members of the clinical team have when it comes to data sharing? Are those concerns valid and/or is there a technical solution that would address them?

Before diving into how we can slay the data silo dragon, I hope to clearly convey why we should all join in this crusade.

Who “wins” when data is shared?

Everybody. Here’s why:

1. The patient wins because, as better tools are developed, they can be treated faster, more efficiently, and with higher quality. They also win because shared data enables research that leads to better decision-making by clinicians, which can save more lives.

2. The clinic wins because the solutions that are developed with their data are better able to account for the nuances of their treatment approaches (i.e. the AI becomes more general). They will be able to treat patients much sooner after their initial consultation with a radiation oncologist. The tools will ultimately enable them to implement advanced treatment approaches such as adaptive radiotherapy.

3. Vendors (like Radformation) win because they are able to develop novel products that enable better and more efficient patient care.

Please consider this: If indeed our purpose as members of the medical community is to provide the best possible care to our patients and there is a resource that sits idle which is absolutely necessary to progress in that endeavor and over which we have control, do we not have an obligation to do what we can to make the best possible use of that resource to improve patient care?


How to Slay the Dragon – Radformation’s Approach

At Radformation, we have expended great effort to solve the Technical obstacles described above when it comes to developing AutoContour, our automatic segmentation product. We have made it extremely easy for institutions to share data by creating an application that allows our clinical partners to automatically and securely upload anonymized data sets for review and inclusion in our training data sets. Setting up and running this application takes less than 30 minutes of their time.

Once we receive the image data, our dedicated AutoContour team reviews, cleans, and approves labels for inclusion in our training set. This shifts the burden of including accurate, high-quality data onto our team. By constantly increasing the size of our training pool, our models will continue to improve, which benefits everyone who has the product.

So what about the Bureaucratic obstacles? If you are a clinician with access to a data silo, you are in a unique position to help slay the dragon. You understand the clinic’s challenges, and you understand the importance of data in developing accurate tools to improve patient care. What can you do?


I have a few recommendations:

1.  Understand that properly de-identified data is no longer protected under HIPAA and GDPR.

“In most jurisdictions, including the European Union, anonymisation is considered a permitted use. This means that it is not necessary to obtain patient consent to anonymise the data.”

In the US, state laws vary and only New Hampshire specifies that medical records are property of the patient. Other states either have no specific law or specify that medical records are property of the provider or hospital.

2.  Find out whether your institution has specific policies for sharing data, and push for a streamlined data-sharing pipeline. That way, when an opportunity presents itself (e.g., a clinical trial or a collaboration with a vendor), getting final approval or setting up a data-use agreement does not take inordinate amounts of time and resources.

3.  Find the best outlets to share data and connect with them. This might be vendors (like Radformation) or perhaps clinical trials. In the recently published HyTEC introduction, there is an entire section dedicated to “Opportunities for Better Data.” The “idealized future state” envisioned in that article will only happen if we work together with “a concerted multi-pronged approach (e.g., involving vendors, administrators, providers, etc).” We must understand that the insights and evidence currently hoarded by the data silo dragons have the power to literally save lives – slaying these dragons won’t be easy, but “we should not shirk from this challenge.”



From Radformation’s perspective, the hardest part about slaying the dragon is not the Technical obstacles. The biggest challenge is changing the culture around data sharing. Sequestering data in private vaults prevents large-scale analysis of clinical practices, decreases confidence in findings, prevents the development of better tools, and limits the quality of care that patients can receive.

At Radformation, we are pioneering ways to enable clinics and their patients to benefit from sharing data, and our new product AutoContour represents a great opportunity for clinics to collaborate with us in that endeavor. If you want a new style, structure, support for special cases (HDR, contrast scans, synthetic CT, etc.), or a structure to perform better on your data, adding that functionality to AutoContour starts with the contribution of data using our automated anonymized data export tools. The development of AutoContour simply could not have happened without the aid of our outstanding clinical partners who faced down their data silo dragons with us. We hope you join in this crusade to use your data for the development of tools that push radiation oncology into the future.



If you are interested in becoming a clinical partner, reach out to us at info@radformation.com.

