“High-throughput screening of catalytically active inclusion bodies using laboratory automation and Bayesian optimization”

Background
In recent years, the production of inclusion bodies that retain substantial catalytic activity has been demonstrated. These catalytically active inclusion bodies (CatIBs) are formed by genetically fusing an aggregation-inducing tag to a gene of interest via short linker polypeptides. The resulting CatIBs are known for their simple and cost-efficient production, their recyclability, and their improved stability. Recent studies have outlined the cooperative effects of linker and aggregation-inducing tag on CatIB activity. However, it is not yet possible to predict a priori which combination performs best. Consequently, extensive screening is required to find the best-performing CatIB variant.

Results
In this work, a semi-automated cloning workflow was implemented and used to rapidly generate 63 CatIB variants of glucose dehydrogenase from Bacillus subtilis (BsGDH). Furthermore, the variant BsGDH-PT-CBDCell was used to develop, optimize and validate an automated CatIB screening workflow that enables the parallel analysis of many CatIB candidates. Compared with previous CatIB studies, a key optimization was the elimination of plate position effects in the BioLector by changing the cultivation temperature. For the overall workflow, including strain construction, the manual workload for 48 variants was reduced from 59 h to 7 h (88%). After high reproducibility had been demonstrated, with a relative standard deviation of 1.9% across 42 biological replicates, the workflow was combined with a Bayesian process model and Thompson sampling. While the process model is crucial for deriving key performance indicators of CatIBs, Thompson sampling serves as a strategy to balance exploitation and exploration in screening procedures. Our methodology allowed the analysis of 63 BsGDH-CatIB variants within only three batch experiments. Because TDoT-PT-BsGDH was the most likely best performer, it was allocated 50 biological replicates across the three screening rounds, far more than the other, lower-performing variants.

Conclusions
At the current state of knowledge, every new enzyme requires screening of different linker/aggregation-inducing tag combinations. For this purpose, the presented CatIB toolbox facilitates fast and simplified construction and screening procedures. The methodology thus helps to identify the best CatIB producer from large libraries in a short time, enabling automated Design-Build-Test-Learn cycles that generate structure/function insights.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12934-024-02319-y.
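Thompson sampling, as mentioned in the abstract, allocates screening capacity in proportion to how likely each variant is to be the best. The sketch below is a minimal illustration, assuming that joint posterior draws of a per-variant key performance indicator (higher is better) are available as a NumPy array, for example from MCMC on the process model. The function name `thompson_allocate` and the rule "one cultivation well per posterior draw" are hypothetical and are not taken from the study's code.

```python
import numpy as np


def thompson_allocate(posterior_draws, n_slots, rng):
    """Distribute n_slots cultivation wells among variants by Thompson sampling.

    posterior_draws: array of shape (n_draws, n_variants) holding joint
    posterior samples of a performance indicator (higher is better).
    Returns an integer array with the number of wells per variant.
    """
    posterior_draws = np.asarray(posterior_draws, dtype=float)
    n_draws, n_variants = posterior_draws.shape
    counts = np.zeros(n_variants, dtype=int)
    for _ in range(n_slots):
        # Pick one joint posterior sample, i.e. one plausible "state of the world".
        draw = posterior_draws[rng.integers(n_draws)]
        # Give the well to the variant that is best in that sample.
        counts[np.argmax(draw)] += 1
    return counts
```

Variants whose posterior clearly dominates receive most wells (exploitation), while variants with wide, overlapping posteriors still win some draws and are re-tested (exploration).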

Table S1: List of all constructed plasmids. The sign "/" indicates that each of the listed sequences was integrated separately into a vector backbone. "N" and "C" denote the enzyme terminus that is fused to the linker and aggregation-inducing tag.
Vector Genotype

Overview of semi-automated CatIB construction workflow
To start the CatIB cloning process, plasmids containing the Golden Gate assembly (GGA) fragments, i.e., the gene of interest, linker and aggregation-inducing tag sequences, were commercially synthesized by Synbio Technologies (Monmouth Junction, US). Competent E. coli DH5α and E. coli BL21(DE3) cells were prepared manually because this step was performed only once, in a single large batch. The first transformation of E. coli DH5α, used to amplify the synthesized plasmids, was automated on the Opentrons OT-2 liquid handling system, with its integrated thermocycler used for heat-shocking and cooling the cells. After transformation, 5 µL of the cells were spotted onto an agar plate by the robotic platform. The transformed cells were then used to purify the amplified plasmids with an accelerated plasmid preparation in which the preparation filters were mounted on a vacuum station instead of performing several centrifugation steps.
Full automation of this step was challenging because the plasmid yields after automated purification were too low for efficient GGA. Nevertheless, accelerating this step already yielded a time saving of approximately 20 %. The GGA master mix was prepared by mixing the required enzymes and buffer with Milli-Q® water; this short step was performed manually to preserve the activity of the heat-sensitive reagents. GGA itself was again performed on the Opentrons OT-2 system, using the integrated thermocycler for the incubation steps with BsaI restriction enzyme and T4 ligase. The subsequent transformation of E. coli DH5α with the assembled GGA constructs was performed manually, because the integrated thermocycler is limited to transformation volumes of 200 µL, whereas reliable transformation after GGA required batches of at least 1 mL. The subsequent plasmid preparation and the retransformation of E. coli BL21(DE3) to generate the CatIB production strains were accelerated or automated as described above.
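Running GGA on a liquid handler requires mapping every terminus/linker/tag combination to a well of the reaction plate. The sketch below is a minimal, hypothetical planning helper, assuming a column-major 96-well layout (A1, B1, ..., H1, A2, ...) and the naming convention used throughout this document (C-terminal fusions written as BsGDH-linker-tag, N-terminal fusions as tag-linker-BsGDH). The helper names and the reduced linker/tag lists in the usage note are illustrative only and do not reproduce the full library or the study's robot protocol.

```python
from itertools import product

ROWS = "ABCDEFGH"  # rows of a standard 96-well plate


def well_name(index):
    """Column-major 96-well name: 0 -> A1, 1 -> B1, ..., 8 -> A2."""
    col, row = divmod(index, len(ROWS))
    return f"{ROWS[row]}{col + 1}"


def construct_name(terminus, linker, tag, enzyme="BsGDH"):
    """Name a fusion construct by which enzyme terminus carries linker and tag."""
    if terminus == "C":
        return f"{enzyme}-{linker}-{tag}"  # e.g. BsGDH-PT-CBDCell
    if terminus == "N":
        return f"{tag}-{linker}-{enzyme}"  # e.g. TDoT-PT-BsGDH
    raise ValueError(f"unknown terminus: {terminus!r}")


def plan_gga_plate(linkers, tags, termini=("C", "N")):
    """Assign every terminus/linker/tag combination to one well of a 96-well plate."""
    constructs = [construct_name(t, l, g) for t, l, g in product(termini, linkers, tags)]
    if len(constructs) > 96:
        raise ValueError("library does not fit on a single 96-well plate")
    return {well_name(i): name for i, name in enumerate(constructs)}
```

For example, `plan_gga_plate(["PT"], ["TDoT", "3HAMP", "CBDCell", "18AWT"])` places the four C-terminal fusions in A1 to D1 and the four N-terminal fusions in E1 to H1.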

Validation of the overall CatIB workflow after optimization
To validate the workflow with the optimized cultivation and purification steps described in the main manuscript, E. coli BL21(DE3) carrying BsGDH-PT-CBDCell was cultivated in a FlowerPlate with 48 biological replicates. After cultivation, the CatIBs from each well were purified separately in an automated manner (see Automated protein production and protein purification in the main manuscript), and the automated enzymatic assay was performed.
Instead of inactivating the enzyme with methanol, the enzymatic reaction was monitored online using a 20-fold diluted CatIB suspension. The enzymatic assay comprised 42 biological replicates, and 6 purified CatIB samples were analyzed as controls without substrate addition (Figure S2).
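One simple way to quantify the reproducibility of such a validation run is the relative standard deviation (RSD) of a per-replicate activity readout. The sketch below assumes that each replicate yields a kinetic NADH fluorescence trace and uses the background-corrected initial slope as the readout, with the no-substrate controls as background; the study itself derived its key performance indicators with the Bayesian process model, so this is an illustrative stand-in, and all function names are hypothetical.

```python
import numpy as np


def initial_slope(t, y, n_points=10):
    """Slope of a straight-line fit to the first n_points of a kinetic trace."""
    t = np.asarray(t, dtype=float)[:n_points]
    y = np.asarray(y, dtype=float)[:n_points]
    slope, _intercept = np.polyfit(t, y, 1)
    return slope


def relative_std(values):
    """Relative standard deviation in percent, using the sample SD (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


def replicate_rsd(traces, controls, t, n_points=10):
    """RSD of initial slopes across replicates after subtracting the control background."""
    background = np.mean([initial_slope(t, y, n_points) for y in controls])
    rates = [initial_slope(t, y, n_points) - background for y in traces]
    return relative_std(rates)
```

Subtracting the mean slope of the no-substrate controls removes signal drift that is unrelated to glucose conversion before the replicate spread is assessed.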

Comparison of cultivation conditions with microscopic analysis
To compare the standard CatIB cultivation conditions (3 h at 37 °C followed by 69 h at 15 °C) with the new cultivation conditions (72 h at 25 °C), microscopic images were acquired at 1,000-fold magnification. Cells containing CatIBs were counted and compared with cells without CatIB formation (Figure S8).
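The percentages reported in Figure S8 relate the cells containing CatIBs to all counted cells. A minimal sketch of that calculation with input validation; the function name is hypothetical and any counts passed to it in practice would come from the microscopy images.

```python
def catib_percentage(n_with_catib, n_total):
    """Share of counted cells that contain CatIBs, in percent."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_with_catib <= n_total:
        raise ValueError("n_with_catib must lie between 0 and n_total")
    return 100.0 * n_with_catib / n_total
```

Applying the same function to the counts from both cultivation conditions gives directly comparable percentages for each strain.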

Analysis of volumetric activity for selected BsGDH variants
Three CatIB variants (BsGDH-PT-3HAMP, BsGDH-PT-CBDCell, BsGDH-PT-18AWT) were tested for their specific volumetric activity Pv under the two cultivation conditions (shift from 37 °C to 15 °C or constant 25 °C) to assess the influence of temperature on the specific Pv (Figure S9). To determine CatIB productivity, the tested strains were cultivated at a larger scale in shake flasks. CatIBs were purified manually, the enzyme assay was performed manually, and the CatIB dry weight was determined [2].
For all three CatIB variants, cultivation at 25 °C increased CatIB productivity. Although the specific Pv of BsGDH-PT-18AWT remained below 10 g L⁻¹ d⁻¹ gCatIB⁻¹, this variant showed activity only after cultivation at 25 °C. Moreover, cultivation at 25 °C increased the specific Pv of BsGDH-PT-3HAMP by approx. 40 % and that of BsGDH-PT-CBDCell by approx. 18 %.
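The relative increases quoted above can be written as a worked formula, assuming that the standard temperature-shift cultivation (37 °C to 15 °C) serves as the reference:

$$\Delta_{rel} = \frac{P_{V,\,25\,^{\circ}\mathrm{C}} - P_{V,\,37/15\,^{\circ}\mathrm{C}}}{P_{V,\,37/15\,^{\circ}\mathrm{C}}} \times 100\,\%$$

For example, a hypothetical increase of the specific Pv from 10 to 14 g L⁻¹ d⁻¹ gCatIB⁻¹ would correspond to (14 − 10)/10 × 100 % = 40 %.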

Mathematical description of process and calibration model
According to Bayes' theorem, the posterior probability of the process model parameters $\theta_{pm}$ can be calculated from their prior probabilities and the likelihood of the observed data:

$$p(\theta_{pm} \mid y_{obs}) \propto \mathcal{L}(\theta_{pm} \mid y_{obs}) \, p(\theta_{pm})$$

The process model predicts the NADH concentration in each assay well as

$$P_{pred} = f(c, w, a, v)$$

and the likelihood is given by the calibration model $\varphi_{cm}$:

$$\mathcal{L}(\theta_{pm} \mid y_{obs}) = \varphi_{cm}(y_{obs}, P_{pred})$$

HalfNormal and LogNormal distributions were chosen to define the prior belief in the respective parameters. Indices $c$, $w$, $a$ and $v$ refer to the column in the assay microtiter plate, the cultivation well, the well in the assay microtiter plate, and the CatIB variant, respectively. The prior for the variant-specific parameter $\theta_{variant}$ is defined by the mean and the standard deviation of the population, which are expressed as hyperpriors $\theta_{mean}$ and $\theta_{std}$. Since the reaction is started column by column by adding the reaction substrate, a column-specific time offset $t_{offset}$ is modeled; it was measured experimentally to be approximately 17 s per column.
Similarly, the time needed to transfer the plate to the reader was measured as 20 s. These measured values were used as the means of the respective prior distributions. The parameter cf_nadh_assay reflects the pipetting error during dilution. The batch effect between biological replicates is represented by a separate parameter that depends on the cultivation well $w$. The prediction of the product concentration in a specific well $a$ of the assay microtiter plate combines the variant-specific priors, the time offset, the pipetting error and the batch effect.
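Two ingredients of the process model described above can be sketched without the full model: the measured timing values (17 s per column, 20 s plate transfer) used as prior means for the time offsets, and a hierarchical prior for the variant-specific parameter. This is a minimal NumPy prior-sampling sketch under explicit assumptions: substrate is assumed to be added from the first to the last column, the variant parameter is assumed LogNormal with a Normal population mean (log scale) and a HalfNormal population standard deviation, and all names and hyperparameter values are hypothetical. The study's actual model and priors are not reproduced here.

```python
import numpy as np

SECONDS_PER_COLUMN = 17.0  # measured delay between substrate addition in adjacent columns
PLATE_TRANSFER_S = 20.0    # measured time to move the plate into the reader


def column_time_offsets(n_columns=12):
    """Prior means for the reaction time elapsed at the first read, per assay column.

    Assumes substrate is added from column 0 to column n_columns - 1, so column 0
    has been reacting longest when the plate reaches the reader.
    """
    cols = np.arange(n_columns)
    return PLATE_TRANSFER_S + SECONDS_PER_COLUMN * (n_columns - 1 - cols)


def sample_variant_priors(n_variants, rng, mu_loc=0.0, mu_scale=1.0, sd_scale=0.5):
    """Draw one prior sample of a hierarchical LogNormal variant parameter."""
    population_mean = rng.normal(mu_loc, mu_scale)    # hyperprior for the mean (log scale)
    population_std = abs(rng.normal(0.0, sd_scale))   # HalfNormal hyperprior for the SD
    # All variants share the population-level hyperparameters.
    return rng.lognormal(population_mean, population_std, size=n_variants)
```

Sharing the hyperpriors lets variants with few replicates borrow strength from the population, which is what makes early Thompson sampling rounds informative.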
To derive posterior distributions for all of the parameters $\theta_{pm}$ of the process model by Markov chain Monte Carlo (MCMC) sampling, a likelihood function ℒ must be defined. While homoscedastic, normally distributed measurement noise is often assumed when no further knowledge exists, the likelihood function can instead be calibrated using known concentrations and the corresponding measurement readouts. Using these calibration data, the relationship between the independent and dependent variables can be modeled as a separate function, which we call the calibration model.
More details and theory on the methodology of process and calibration models can be found in [3].
In this study, we fitted a calibration model $\varphi_{cm}$ that describes the exponential relationship between the fluorescence observation $y_{obs}$ and the predicted concentration $P_{pred}$ obtained from the process model. We used the Python package calibr8 to fit the model to the calibration data. Figure S10 shows diagnostic plots of the resulting calibration model. The left panel shows the trend between the independent variable (NADH concentration) and the dependent variable (fluorescence readout). An exponential function was chosen to model this trend because the relationship between concentration and readout is non-linear. In addition, the measurement error increases with the NADH concentration, which is why a linear function was chosen to describe the width of the likelihood bands as a function of NADH concentration. The residuals between model and data (right panel) scatter mostly randomly, with a larger deviation at around 0.17 mM NADH. Overall, the chosen calibration model describes the data well and was therefore used as the likelihood function in the Bayesian process model. The code to reproduce the calibration model is provided in the accompanying GitHub repository.
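The calibration model described above acts as a heteroscedastic normal likelihood: an exponential trend for the mean fluorescence and a linearly increasing standard deviation. The following is a minimal NumPy stand-in, not the calibr8 API. It assumes a saturating exponential mean a + b·(1 − e^(−k·x)) and a noise standard deviation s0 + s1·x; the exact parametrization used in the study is not given in this section, and the parameter names and values are hypothetical.

```python
import numpy as np


def cm_mean(x, a, b, k):
    """Assumed exponential trend of fluorescence vs. NADH concentration x (mM)."""
    return a + b * (1.0 - np.exp(-k * x))


def cm_scale(x, s0, s1):
    """Standard deviation of the measurement noise, increasing linearly with x."""
    return s0 + s1 * x


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution, evaluated elementwise."""
    z = (y - mu) / sigma
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * z**2


def cm_loglikelihood(y_obs, p_pred, theta_cm):
    """Log-likelihood of fluorescence readouts given predicted NADH concentrations."""
    a, b, k, s0, s1 = theta_cm
    x = np.asarray(p_pred, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    mu = cm_mean(x, a, b, k)
    sigma = cm_scale(x, s0, s1)
    return float(np.sum(normal_logpdf(y, mu, sigma)))
```

During MCMC, `p_pred` would be the process-model prediction for each assay well, so a given fluorescence deviation is penalized more strongly at low NADH concentrations, where the calibrated noise is small, than near the upper detection limit.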

Figure S1: Overview of 76 tested BsGDH-CatIB combinations and the BsGDHWT control generated with the semi-automated cloning workflow. Constructs marked in blue were verified by sequencing. Constructs marked in red showed incorrect sequencing results or yielded no colonies at the end of the cloning process.

Figure S2: NADH fluorescence of BsGDH-PT-CBDCell replicates in a validation study. The CatIBs were purified from 650 µL of cells. The enzymatic assay was performed with 40 mM TAE (pH 7), 200 mM glucose and 0.4 mM NAD+. The resuspended, 20-fold diluted CatIBs and the enzyme assay solution were preheated at 40 °C for 10 min before mixing. The final reaction volume was 250 µL. The reaction was performed at 37 °C for 180 min and measured online at an excitation wavelength of 340 nm and an emission wavelength of 470 nm. To test enzyme activity, 42 biological replicates were analyzed, and 6 replicates served as negative controls without substrate addition. One column corresponds to six wells of a FlowerPlate.

Figure S3: Microscopic images of strains producing BsGDH-CatIBs with different linker/aggregation-inducing tag combinations fused to the C-terminus of the enzyme, and of BsGDHWT. The percentage of CatIB-producing cells is displayed for each variant. Phase contrast microscopy was conducted at 1,000-fold magnification. All strains were cultivated for 72 h at 25 °C in M9 AI medium.

Figure S4: Microscopic images of strains producing BsGDH-CatIBs with different linker/aggregation-inducing tag combinations fused to the N-terminus of the enzyme. The percentage of CatIB-producing cells is displayed for each variant. Phase contrast microscopy was conducted at 1,000-fold magnification. All strains were cultivated for 72 h at 25 °C in M9 AI medium.

Figure S5: Microscopic images of strains producing BsGDH-CatIBs with glycine or proline linkers of different lengths fused to the C- and N-terminus of the enzyme. The percentage of CatIB-producing cells is displayed for each variant. Phase contrast microscopy was conducted at 1,000-fold magnification. All strains were cultivated for 72 h at 25 °C in M9 AI medium.

Figure S6: Microscopic images of strains producing BsGDH-CatIBs with L6KD tag linkers of different lengths fused to the C- and N-terminus of the enzyme. The percentage of CatIB-producing cells is displayed for each variant. Phase contrast microscopy was conducted at 1,000-fold magnification. All strains were cultivated for 72 h at 25 °C in M9 AI medium.

Figure S7: Evaluation of CatIB formation for 63 BsGDH-CatIB variants and BsGDHWT by SDS-PAGE analysis. After cultivation, the cells were disrupted, and the crude cell extract was separated by centrifugation into the supernatant containing soluble protein and the pellet fraction containing insoluble CatIBs. The pellet fraction was washed once with Milli-Q® water. The samples were diluted 1:1 with SDS sample buffer, and 15 µL of each sample was loaded onto the gel and stained with SimplyBlue™ SafeStain. The molecular mass of wild-type BsGDH is 28 kDa. The molecular masses of the tags and linkers are listed in Table S4.

Figure S8: Influence of cultivation temperature on CatIB formation analyzed via microscopy. The cultivations were performed in M9 AI medium in a FlowerPlate at either 25 °C (72 h) or 37 °C (3 h) followed by 15 °C (69 h). Microscopic images of all 63 strains were acquired at 1,000-fold magnification. The numbers indicate the percentage of CatIB-producing cells relative to the total number of counted cells.

Figure S9: Influence of cultivation temperature on the specific volumetric productivity of BsGDH-CatIBs. Standard deviations were calculated from technical triplicates for all CatIB variants.

Figure S10: Calibration model describing a normally distributed measurement error for the measured NADH fluorescence in the assay. The mean of the normal distribution follows an exponential trend between true concentration and fluorescence. The standard deviation increases linearly with NADH concentration, reflecting larger errors at higher NADH concentrations towards the upper detection limit of the plate reader. More details on calibration models can be found in [3].

Table S4: Expected molecular weights for linker and tag fragments.