From Beamline to Browser: Automating High-Throughput X-ray Imaging with WEBKNOSSOS at EMBL P14

March 2026


Angelika Svetlove, member of the X-ray imaging team at EMBL P14

At the European Molecular Biology Laboratory (EMBL) in Hamburg, the X-ray imaging team working at the P14 beamline on PETRA III at DESY has established a fully automated pipeline that transfers reconstructed synchrotron X-ray tomography datasets directly into a browser-based 3D environment powered by WEBKNOSSOS.

We spoke with Angelika Svetlove, a member of the X-ray imaging team who has been responsible for engineering this pipeline, about the challenges of managing terabytes per experiment, and why data analysis — not acquisition — is now the real bottleneck.

Can you briefly describe your role and what P14 focuses on?

I work at the P14 X-ray imaging beamline at EMBL Hamburg. We operate as a user facility: scientists, both internal and external to the organisation, apply for beam time, come to Hamburg, and acquire their data here.

Our core expertise is in X-ray imaging at high throughput. We focus on technical development, automation, and making sure users can access their reconstructed data as quickly and easily as possible — ideally during beam time or immediately after they leave.

Personally, I started working on X-ray imaging methods and their application in biology. Over time, I transitioned more into data engineering. We realized that while we were very good at producing data, our data flow and management infrastructure needed improvement. So I stepped into that gap.

How much data are we talking about?

A single tomography dataset is typically around 28–60 GB. But that’s just the beginning. We can produce up to 12 tomograms per hour, and beam time can run for 24 hours straight. So a single user can easily generate multiple terabytes in one session.
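The quoted figures make the scale concrete. A quick back-of-the-envelope calculation (using the dataset sizes and rates stated above, with 1 TB = 1000 GB):

```python
# Session data volume from the figures above:
# 28-60 GB per tomogram, up to 12 tomograms/hour, 24-hour beam time.

def session_volume_tb(gb_per_tomogram: float,
                      tomograms_per_hour: int = 12,
                      hours: int = 24) -> float:
    """Total data volume for one beam-time session, in terabytes (1 TB = 1000 GB)."""
    return gb_per_tomogram * tomograms_per_hour * hours / 1000

low = session_volume_tb(28)   # small datasets
high = session_volume_tb(60)  # large datasets
print(f"{low:.1f}-{high:.1f} TB per 24 h session")
```

So a fully used 24-hour shift lands in the range of roughly 8 to 17 TB, consistent with "multiple terabytes in one session".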

For large stitched volumes — multiple tiles combined into a bigger 3D dataset — we’re talking several terabytes per dataset. And most of our users are biologists or medical researchers. They’re not necessarily equipped with workstations that can open 60 GB volumes, let alone terabytes. Downloading everything “just to see what’s inside” is unrealistic.

That was our bottleneck.

What was the situation before WEBKNOSSOS?

Internally, we had optimized reconstruction extremely well. Members of the team have written in-house software that reconstructs raw data automatically into TIFF stacks almost immediately after acquisition. Users could inspect their data on-site with our local tools.

But once they went home, the data remained on our storage in Hamburg. They had to download it — often blindly — and try to work with it locally. In many cases, they didn’t even have the hardware to store it, let alone open it. So although we were fast in producing data, access outside the facility was a major limitation.

How did WEBKNOSSOS change that?

We discovered WEBKNOSSOS through collaborators and saw it in action at the beamline. It became clear that it could solve our problem exactly: enabling remote, browser-based inspection and analysis of massive 3D datasets.

The key idea was full automation.

Every reconstruction at P14 runs automatically. So we designed the pipeline such that once reconstruction finishes, the dataset is automatically converted into a WEBKNOSSOS-compatible format and uploaded to a dedicated WEBKNOSSOS server. We set up a dedicated machine that handles everything related to WEBKNOSSOS and nothing else.

For single tomograms, conversion happens immediately after reconstruction. For stitched datasets, the stitching software triggers the conversion once it completes. Depending on size, data appears in WEBKNOSSOS within minutes: two to five minutes for standard datasets, around 20–30 minutes for multi-terabyte stitched volumes.

How did you handle user management and permissions?

This was one of the more complex parts. When a user is granted beam time, they receive a beamline account. From that account, we generate individual credentials that allow automated actions on the WEBKNOSSOS machine.
We designed a structured data management scheme:

● Each user automatically gets a dedicated folder in WEBKNOSSOS.
● Within that, subfolders are created per beam-time date.
● Datasets are automatically sorted into the correct location.
● A corresponding team is automatically created for that user.
● Permissions are assigned so users only see their own data.

The beamline checks existing teams and folders via the API. If something doesn’t exist, it creates it. Manual intervention is only required when a user needs access to datasets across multiple teams, which we can set up on request.

In parallel to folder and permission management, metadata from the beamline is automatically propagated into each WEBKNOSSOS dataset. This includes acquisition parameters and technical settings relevant for reproducibility and publication, structured so that users can reproduce experimental conditions later. We also automatically write all metadata into a Google Sheet as an additional structured record, so even if users forget to document something during beam time, we still have it stored.
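The check-then-create logic can be sketched as an idempotent planning function. The folder layout (`/<user>/<beam-time date>`) and the `team_<user>` naming are illustrative assumptions; a real implementation would issue the corresponding calls against the WEBKNOSSOS API rather than inspecting sets.

```python
def ensure_user_structure(existing_folders: set[str], existing_teams: set[str],
                          user: str, beamtime_date: str) -> list[tuple[str, str]]:
    """Return the create-actions needed so `user` has a folder, a per-beam-time
    subfolder, and a team. Entities that already exist are skipped, so running
    this repeatedly (e.g. for a returning user) is safe."""
    actions = []
    user_folder = f"/{user}"
    beamtime_folder = f"/{user}/{beamtime_date}"
    team = f"team_{user}"
    if user_folder not in existing_folders:
        actions.append(("create_folder", user_folder))
    if beamtime_folder not in existing_folders:
        actions.append(("create_folder", beamtime_folder))
    if team not in existing_teams:
        actions.append(("create_team", team))
    return actions
```

For a returning user with a new beam-time date, only the new subfolder is created; everything else is recognized as already present.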


WEBKNOSSOS interface: datasets organized in folders, permissions assigned to teams and users, metadata propagated directly to each dataset.
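The metadata propagation amounts to merging beamline parameters into one structured, namespaced record per dataset. A minimal sketch, where the field names (`energy_kev`, `detector`, etc.) are invented for illustration and not the actual P14 schema:

```python
import json

def build_dataset_metadata(acquisition: dict, technical: dict) -> str:
    """Merge acquisition parameters and technical settings into a single
    namespaced record, serialized as JSON so it can be attached to a
    WEBKNOSSOS dataset and mirrored into the Google Sheet record."""
    record = {f"acquisition.{key}": value for key, value in acquisition.items()}
    record.update({f"technical.{key}": value for key, value in technical.items()})
    return json.dumps(record, sort_keys=True)

# Hypothetical example values:
print(build_dataset_metadata({"energy_kev": 18.0, "pixel_size_um": 0.65},
                             {"detector": "example-cam"}))
```

Namespacing the keys keeps acquisition parameters and technical settings distinguishable once everything is flattened into one record.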

Were there challenges integrating WEBKNOSSOS with your IT environment?

Yes, especially authentication. Our organisation requires two-factor authentication, and WEBKNOSSOS doesn’t natively support our specific 2FA setup. So we built a custom authentication layer on top: we essentially replaced the generic login with our own interface that authenticates against our internal system.

For external users — who make up about 90% of our audience — we had to implement a new protocol for account creation, group assignment, and 2FA setup. It was a big project, but now it works smoothly.

How do users continue working with the data?

Our users can continue working on their data in WEBKNOSSOS. They can browse, annotate, and segment directly on our server. Also, instead of downloading a full TIFF stack, users can define a bounding box in WEBKNOSSOS and download only the region they care about. That’s extremely useful for multi-terabyte datasets. For more advanced users, we provide Python scripts to query specific regions directly from WEBKNOSSOS without full downloads.
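Under the hood, a region query like this boils down to mapping the requested voxel bounding box onto the chunk grid of the stored volume and fetching only the overlapping chunks. A minimal sketch of that mapping (the chunk shape and coordinate convention are illustrative; client tooling such as the webknossos Python library handles this kind of selection for you):

```python
from itertools import product

def chunks_for_bbox(topleft: tuple[int, int, int], size: tuple[int, int, int],
                    chunk_shape: tuple[int, int, int]) -> list[tuple[int, int, int]]:
    """Per-axis chunk indices that overlap the given voxel bounding box.
    Only these chunks need to be fetched instead of the whole volume."""
    ranges = []
    for start, extent, chunk in zip(topleft, size, chunk_shape):
        first = start // chunk
        last = (start + extent - 1) // chunk  # last chunk touched by the box
        ranges.append(range(first, last + 1))
    return list(product(*ranges))
```

For a multi-terabyte volume, a small bounding box touches only a handful of chunks, which is why region downloads stay cheap regardless of total dataset size.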

We keep two copies of the data: the WEBKNOSSOS-Zarr dataset and the TIFF stack. Most commercial software is not able to work with OME-Zarr yet, which forces us to keep the TIFF stacks as well.

What’s the biggest benefit of this automation for the facility?

Immediate, remote access to data. Users can leave Hamburg and continue inspecting their datasets the same day without transferring large volumes of data.

And from a facility perspective, everything is automated. No one has to manually convert, upload, organize, or assign permissions.

It also makes us relatively unique. We focus on high-throughput imaging — fixed optics, extremely fast acquisition, very large numbers of datasets. We may not compete on the highest possible resolution, but we do one thing very fast and very reliably.

What is the biggest bottleneck now?

Data analysis. We can generate data much faster than people can analyze it.

Unlike electron microscopy, where you repeatedly see similar subcellular structures, we operate at tissue and organism scale. Every project can look completely different. That makes generic analysis pipelines difficult.

Machine learning will be critical here — but we’re still waiting for more mature, flexible solutions. We’re currently working on integrating WEBKNOSSOS datasets with external machine learning–based segmentation tools via Zarr links. The goal is to create a seamless workflow where users can move from acquisition to analysis with minimal friction.

What’s next?

We would like to integrate the WEBKNOSSOS datasets into external ML-based segmentation tools, enabling users to continue with automated analysis seamlessly.

Long term, we also want to avoid maintaining duplicate data representations and move to a unified Zarr format, generated directly from the beamline. Unfortunately, many commercial analysis tools still cannot read Zarr. Until the ecosystem catches up, the TIFF stacks remain necessary.

Many thanks to Angelika for taking the time to explain the technical details behind this pipeline and for sharing insights into how high-throughput imaging facilities are adapting to growing data demands. Read more about WEBKNOSSOS for imaging facilities here.