Machine learning enhanced evaluation of semiconductor quantum dots

A key challenge in quantum photonics today is the efficient and on-demand generation of high-quality single photons and entangled photon pairs. In this regard, one of the most promising types of emitters is the semiconductor quantum dot, a fluorescent nanostructure also described as an artificial atom. The main technological challenge in upscaling to an industrial level is the typically random spatial and spectral distribution of quantum dots resulting from their growth. Furthermore, depending on the intended application, different requirements are imposed on a quantum dot, which are reflected in its spectral properties. Given that an in-depth suitability analysis is lengthy and costly, it is common practice to pre-select promising candidate quantum dots based on their emission spectrum. Currently, this is done by hand. Therefore, to automate and expedite this process, we propose in this paper a data-driven, machine-learning-based method for evaluating the applicability of a semiconductor quantum dot as a single photon source. For this, a minimally redundant but maximally relevant feature representation of quantum dot emission spectra is first derived by combining conventional spectral analysis with an autoencoding convolutional neural network. The obtained feature vector is subsequently used as input to a neural network regression model, which is specifically designed to return not only a rating score gauging the technical suitability of a quantum dot, but also a measure of confidence for its evaluation. For training and testing, a large dataset of emission spectra of self-assembled InAs/GaAs semiconductor quantum dots is used, partially labelled by a team of experts in the field. Overall, highly convincing results are achieved, as quantum dots are reliably evaluated correctly. Note that the presented methodology can account for different spectral requirements and is applicable regardless of the underlying photonic structure, fabrication method and material composition.
We therefore consider it the first step towards a fully integrated evaluation framework for quantum dots, proving the use of machine learning beneficial in the advancement of future quantum technologies.

www.nature.com/scientificreports/

… robustness against decoherence and environmental noise. This makes them particularly advantageous, e.g. for long-distance communication through optical fibres 9,10. Besides, photons are comparatively easy to manipulate, making photonic setups experimentally very accessible 11,12. Clearly, the development of an efficient and on-demand single photon source is key, with brightness, purity and indistinguishability of the emitted photons taking priority 13,14. Many setups today make use of spontaneous parametric down-conversion, where a pair of entangled photons is generated from laser light in a non-linear birefringent crystal. While photons produced this way are highly indistinguishable, there is an intrinsic trade-off between brightness and single photon purity due to the Poissonian statistics of down-conversion 15,16. Improving on these limiting factors, quantum light sources embedding semiconductor quantum dots in photonic structures or cavity resonators have increasingly established themselves as promising candidates and a valid alternative 17,18.

Semiconductor quantum dots
Semiconductor quantum dots (QD) are nanoscale heterostructures whose band gap between the valence and the conduction band is smaller than that of their semiconductor environment. Because their size is comparable to the de Broglie wavelength of electrons, charge carriers (electrons and holes) are confined in all three spatial dimensions, which results in discrete, i.e. quantised, electronic states resembling the shells of atoms 13,19. This energy landscape is outlined in Fig. 1a. Here, photons are represented as blue wavelets, while solid arrows trace optical transitions and dotted arrows indicate non-radiative relaxation. As shown, under above-band laser irradiation (dark blue), an electron is promoted from the low-energy valence band to the high-energy conduction band by absorbing a photon whose energy exceeds the band gap. Through non-radiative energy dissipation, the excited electron and the remaining hole relax to the respective lowest-energy state of the QD (s-shell), forming a bound pair called an exciton. The exciton subsequently recombines radiatively, i.e. the electron decays by emitting a single photon with the energy of the occupied QD state (light blue). The fluorescence wavelength depends on the quantisation of the QD states, which, in turn, is directly determined by the size and geometry of the QD 20,21.
Interestingly, additional charge carriers can typically be found in a QD, causing real QD spectra to exhibit multiple emission lines. This is showcased in Fig. 1b, where the spectra of the excitation laser (dark blue) and the QD emission (light blue) are schematically plotted as functions of the photon wavelength λ.
While there are several fabrication techniques and material compositions for the realisation of QDs, in this work we consider self-assembled InAs QD samples grown in the Stranski-Krastanow mode on a GaAs platform using molecular beam epitaxy 22. A schematic cross-section of such an InAs/GaAs QD wafer is given in Fig. 1c. Despite their state-of-the-art photon emission properties, these QDs grow randomly distributed in space as a result of self-assembly. On top of that, even when synthesised under the same growth conditions, their size can vary from dot to dot, resulting in different quantum confinement and therefore different emission wavelengths 23. This implies that, in order to perform an intended experiment, a suitable QD first has to be found on a given wafer. This is illustrated in Fig. 2a, where a confocal µ-photoluminescence intensity map 24 of a representative InAs/GaAs wafer is shown. The image is recorded using the measurement setup schematised in Fig. 2b: an above-band laser is scanned across the sample over a 50 × 50 µm² area, while the QD luminescence is collected and analysed in a spectrometer. Plotting the intensity over the spatial position yields the colour map on the left, where QDs appear as bright spots over a dark background. The normalised emission spectra I(λ) of the QDs encircled in yellow are given in Fig. 3. All three emit at a fluorescence wavelength of about 900 nm. However, while in the first spectrum one optical transition is predominant, in the second one several transitions are excited simultaneously, resulting in a spectrum with multiple peaks. The third one, finally, besides having a significantly worse signal-to-noise ratio and an elevated baseline, exhibits a broadband feature in its emission spectrum at around 920 nm. For use as single photon sources in photonic systems, QDs ideally emit at only one specific, spectrally isolated wavelength with high intensity. In this regard, only the first QD appears to be applicable. Nevertheless, this does not mean that the other two are to be discarded straight away. For instance, with more sophisticated excitation schemes, a single desired optical transition can be addressed. Equally, spectrally matching a photonic resonator to the driven transition and exploiting cavity quantum electrodynamic effects can prove advantageous 25,26. Then again, a seemingly perfect QD photon source can turn out to be unfit on closer inspection, e.g. in a polarisation measurement 27 or a surface topography 28. Overall, this implies that the suitability of a semiconductor QD as single photon source cannot be decided definitively based solely on its emission spectrum, and that further analyses are necessary. However, given that these are usually time-consuming and/or resource-intensive, promising candidates must be pre-selected. Currently, this is done by hand: specifically trained experts inspect the emission spectra of all candidate QDs on a given sample and assess, based on their subjective experience, whether they meet the spectral requirements for additional in-depth investigations and an eventual application. This evaluation is neither trivial, nor are there any quantified conventions for it. In both research and industrial applications, this manual selection process represents a genuine bottleneck, limiting the productivity and thus the scalability of photonic technologies.
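Locating candidate QDs in a µ-photoluminescence map such as the one in Fig. 2a amounts to finding bright local maxima over a dark background. The following is a minimal sketch under stated assumptions, not the procedure used for this work: the function name `find_bright_spots`, the neighbourhood window and the relative threshold are hypothetical choices, and `scipy.ndimage.maximum_filter` is used to mark pixels that are the maximum of their neighbourhood.

```python
import numpy as np
from scipy import ndimage


def find_bright_spots(intensity_map, window=5, rel_threshold=0.5):
    """Return (row, col) indices of bright spots in a 2-D intensity map.

    A pixel counts as a spot if it is the maximum of its window x window
    neighbourhood and exceeds rel_threshold times the global maximum.
    Both parameters are hypothetical and would need tuning on real maps.
    """
    is_local_max = ndimage.maximum_filter(intensity_map, size=window) == intensity_map
    is_bright = intensity_map > rel_threshold * intensity_map.max()
    return np.argwhere(is_local_max & is_bright)


# Synthetic 50 x 50 map with two Gaussian "dots" on a dark background
rows, cols = np.mgrid[0:50, 0:50]
demo_map = (np.exp(-((rows - 12) ** 2 + (cols - 10) ** 2) / 4.0)
            + 0.8 * np.exp(-((rows - 30) ** 2 + (cols - 35) ** 2) / 4.0))
spots = find_bright_spots(demo_map)
```

Each returned index would then correspond to a position at which an emission spectrum is recorded and passed on to the evaluation described in the following sections.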

Contribution
This paper directly addresses this challenge. Here, we propose a data-driven, machine-learning-based solution for the automated evaluation of the applicability of a semiconductor QD as single photon source based on its emission spectrum. Specifically, the goal is to build an expert system approximating the current experience-based suitability analysis. With this contribution, the (pre-)selection of viable QDs can be parallelised and processing times can be significantly reduced. Moreover, the technical relevance of this work increases in the long term, as methods like the one presented here are a prerequisite for the large-scale production of semiconductor QD single photon sources and thus for an industrial implementation of photonic technologies. To the best of the authors' knowledge, no such approach had been reported in the literature at the time of writing. In Ref. 29, a machine-learning-based classifier of quantum sources is proposed, although it is limited to discriminating between the emission of single and non-single photons from nitrogen-vacancy centres in diamond. Otherwise, the focus lies on using machine learning to enhance the QD fabrication process itself 30,31. In particular, a variety of methods is employed either to provide the design parameters of the synthesis 32,33 or to make predictions about the resulting optical properties 34,35.
This paper is structured as follows: after covering the physical background and relevance of the topic in the introduction, the evaluation algorithm is presented in depth and contextualised within the state of the art. It is subsequently implemented and its performance showcased, which finally allows for a discussion of the proposed solution.

Quantum dot evaluation method
The development of a method for evaluating the photonic usability of a semiconductor QD based on its emission spectrum is mathematically equivalent to the identification of a function f : I(λ) ↦ s that maps a given emission spectrum I(λ) to a rating score s ∈ [0, 1], with s = 1 encoding a potentially perfect single photon source recommended for further analyses, and s = 0 a most likely unfit candidate. As discussed in section "Semiconductor quantum dots", given the complexity of this assessment combined with the lack of established conventions and the subjectivity involved, an analytical derivation of such a function is highly impractical. Therefore, a data-driven regression approach is proposed in this paper instead. Regression analysis is the statistical estimation of a functional relationship between an independent input u ∈ U and a dependent output y = f(u) ∈ Y by minimising a loss function ℒ over the parameter vector β ∈ B of a pre-defined regression model f_reg, given training data (u_train, y_train):

$$\beta^{*} = \arg\min_{\beta \in B} \mathcal{L}\big(f_{\mathrm{reg}}(u_{\mathrm{train}}, \beta),\, y_{\mathrm{train}}\big). \tag{1}$$

As this involves detecting underlying patterns in the data, limiting redundancies and noise by reducing the dimension of the function space can significantly improve the overall performance of the regression. With high-dimensional data, it is therefore common practice to first compress the independent variable u ∈ U into a lower-dimensional feature vector x(u) ∈ X ⊂ U from which to subsequently predict the dependent target variable y = f(x) 36,37.
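As a minimal illustration of the loss minimisation described above, the sketch below uses the simplest possible regression model, a linear one, together with the squared error as loss. For this particular choice the minimiser β* has a closed-form least-squares solution, computed here with `numpy.linalg.lstsq`. The function names and the synthetic data are illustrative assumptions; the method proposed in this paper uses an NN regressor instead.

```python
import numpy as np


def fit_linear_regression(x_train, y_train):
    """Return beta minimising sum_i (y_i - beta_0 - x_i . beta_rest)^2."""
    x_train = np.asarray(x_train, dtype=float)
    # Design matrix with a leading column of ones for the intercept beta_0
    design = np.column_stack([np.ones(len(x_train)), x_train])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return beta


def predict(x, beta):
    """Evaluate the fitted linear model f_reg(x, beta)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), x]) @ beta


x_demo = np.linspace(0.0, 1.0, 20)
y_demo = 2.0 + 3.0 * x_demo                        # noise-free linear relationship
beta_star = fit_linear_regression(x_demo, y_demo)  # approximately [2, 3]
```

For non-linear models such as NNs, no closed-form minimiser exists, which is why iterative optimisers like ADAM are used later on.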
Within the scope of this paper, this implies that a meaningful feature representation x is to be derived for the emission spectrum u = I(λ) of a semiconductor QD. For this, we consider both explainable spectral parameters, extracted by conventional methods of signal processing, and an abstract latent representation learned by an autoencoder. A subset X of minimally redundant but maximally relevant features that still describes the data sufficiently accurately 38,39 is then selected by correlation analysis and used as input of a regression model f_reg(x, β). Here, we propose a multivariate neural network regressor, specifically designed to predict not only a technical suitability score ŷ1 = ŝ, but also a measure of confidence ŷ2 = σ for its estimate. A visual abstract of the overall scheme is given in Fig. 4. The top half outlines the pre-processing and training, whereas the bottom half visualises the workflow for predicting the output ŷ_test of an unknown test input u_test, whose feature representation x_test is passed to the now optimised regression model. The following sections elaborate on the neural network regression analysis, highlighted in yellow, and the feature engineering, marked in blue.

Neural network regression analysis
Overall, the performance of any regression analysis is determined by the ability of the trained model to generalise, i.e. to make accurate predictions for unknown inputs x_test. Here, besides the quality of the training data and the numerical optimisation, the choice of the model function f_reg itself is key. In this regard, different regression techniques are distinguished. The most common is linear regression, which is easily implemented but limited in its application 40. For non-linear systems, kernel-based methods like support vector machines 41 or Gaussian process models 42 are widely established. Lastly, artificial neural networks (NN) are a class of universal function approximators 43 with a characteristic parameter structure of hierarchical layers intended to resemble interconnected biological neurons 44. A fully connected feed-forward NN regression model is defined as

$$f_{\mathrm{reg}}(x, \beta) = \big(f^{(L)} \circ f^{(L-1)} \circ \dots \circ f^{(1)}\big)(x), \tag{2}$$

where L ∈ ℕ denotes the number of layers. Apart from the first layer ℓ = 1, which is passed the feature vector x^(0) = x ∈ X, each layer ℓ is passed the output of the previous layer x^(ℓ−1) ∈ ℝ^(d_{ℓ−1}) as input. The last layer, finally, returns the model predictions ŷ = x^(L). A schematic depiction of the setup of an arbitrary layer ℓ is given in Fig. 5. Each layer comprises a non-linear activation function ϕ^(ℓ) : ℝ → ℝ, which is applied element-wise to the output of an affine mapping,

$$x^{(\ell)} = f^{(\ell)}\big(x^{(\ell-1)}\big) = \phi^{(\ell)}\big(W^{(\ell)} x^{(\ell-1)} + b^{(\ell)}\big), \tag{3}$$

where the weight matrix W^(ℓ) ∈ ℝ^(d_ℓ × d_{ℓ−1}) and the bias vector b^(ℓ) ∈ ℝ^(d_ℓ) are the model parameters to be optimised during training. Note that the general matrix multiplication in (3) can also be replaced by a discrete convolution with a set of learnable kernels 45.
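The layer recursion x^(ℓ) = ϕ^(ℓ)(W^(ℓ) x^(ℓ−1) + b^(ℓ)) described above can be sketched in a few lines of NumPy. This is a forward pass only, with randomly initialised and hence untrained parameters; the layer widths and the choice of ReLU hidden layers with a sigmoid output layer are assumptions made for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases, activations):
    """Feed-forward pass: x^(l) = phi^(l)(W^(l) x^(l-1) + b^(l)) for l = 1..L."""
    h = np.asarray(x, dtype=float)          # x^(0) = x
    for W, b, phi in zip(weights, biases, activations):
        h = phi(W @ h + b)                  # affine map, then element-wise activation
    return h                                # x^(L) = y_hat


rng = np.random.default_rng(0)
dims = [5, 8, 8, 2]                         # d_0 (features), hidden widths, d_L (outputs)
weights = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(len(dims) - 1)]
biases = [np.zeros(dims[l + 1]) for l in range(len(dims) - 1)]
activations = [relu, relu, sigmoid]
y_hat = forward(rng.standard_normal(5), weights, biases, activations)
```

Replacing `W @ h` by a convolution with learnable kernels, e.g. via `np.convolve`, would turn a layer into the convolutional variant mentioned in the text.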
Compared to conventional regression models, NN regressors stand out for their huge parameter space, which is further extended by the inclusion of additional connections, shortcuts or feedbacks between the layers in more sophisticated network architectures 46,47 .Because of this, NNs are particularly good at recognising patterns in unstructured data and making generalising predictions.Accordingly, they have been applied successfully to a wide range of problems, from the calibration of biosensing systems 48 , to the evaluation of chess positions 49 and the identification of objects in images 50 .

Feature engineering
As discussed above, the use of properly optimised features is crucial for pattern detection in data analysis and thus for regression-based function modelling, improving the overall performance and the prediction accuracy in particular. Like all spectral data, QD emission spectra are characterised by some rather self-evident features, first and foremost the number of peaks n_peak ∈ ℕ, which indicates how many optical transitions are excited at the same time. However, since an ideal single photon source emits at only one specific wavelength, usually only the brightest peak, with the maximum emission intensity u_max ∈ ℝ+, is of interest. Its relative dominance can be quantified by the ratio of its amplitude to the height of the next largest peak, denoted by r_dom ∈ ℝ+. Besides, its sharpness, best described by its full width at half maximum w_FWHM ∈ ℝ+, and its minimum distance d_min ∈ ℝ+ to neighbouring peaks determine the feasibility of isolating the corresponding level transition for single photon generation. Note that at this stage, the exact emission wavelength is of secondary importance and is hence not taken into account.
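Given the indices of the detected peaks, the explainable features n_peak, u_max, r_dom, w_FWHM and d_min follow from a few array operations, as sketched below. For this sketch only, the peaks are located with SciPy's `find_peaks` instead of the OS-CFAR detector used in this work, the width is computed with `scipy.signal.peak_widths` at half the peak prominence, and widths and distances are expressed in pixels. The function name, the dictionary keys and the convention of returning infinity for r_dom and d_min when only one peak is present are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths


def spectral_features(spectrum, peaks):
    """Explainable features of the brightest peak (widths/distances in pixels)."""
    spectrum = np.asarray(spectrum, dtype=float)
    peaks = np.asarray(peaks, dtype=int)
    heights = spectrum[peaks]
    order = np.argsort(heights)[::-1]            # peaks sorted by height, descending
    main = peaks[order[0]]                       # brightest peak
    u_max = spectrum[main]
    # dominance: ratio to the second-highest peak (inf if there is none)
    r_dom = u_max / heights[order[1]] if len(peaks) > 1 else np.inf
    # full width at half maximum, measured at half the peak prominence
    w_fwhm = peak_widths(spectrum, [main], rel_height=0.5)[0][0]
    others = np.delete(peaks, order[0])
    d_min = np.min(np.abs(others - main)) if len(others) > 0 else np.inf
    return {"n_peak": len(peaks), "u_max": u_max, "r_dom": r_dom,
            "w_fwhm_px": w_fwhm, "d_min_px": d_min}


px = np.arange(1024)
# Two Gaussian lines with standard deviation 3 px (FWHM of about 7.06 px)
demo = np.exp(-(px - 300) ** 2 / 18.0) + 0.25 * np.exp(-(px - 700) ** 2 / 18.0)
demo_peaks, _ = find_peaks(demo, height=0.1)
features = spectral_features(demo, demo_peaks)
```

Converting pixel-based widths and distances to nanometres would only require the spectral calibration of the spectrometer.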
All of the mentioned features represent explainable parameters that can be extracted from the data using conventional methods of signal processing. Here, we employ the Ordered Statistics Constant False Alarm Rate (OS-CFAR) peak detection algorithm, which is commonly used in radar technology, as it adapts the detection threshold to the surrounding noise baseline 51. As showcased in Fig. 6, this prevents spectral broadband features from being identified as a collection of consecutive individual peaks. Once the peaks have been localised within the spectrum, the algebraic computation of the corresponding feature values is straightforward.
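A minimal one-dimensional OS-CFAR sketch is given below. For every cell under test, the training cells on both sides (excluding a few guard cells next to it) are sorted, and the k-th smallest value, scaled by a factor alpha, serves as the local detection threshold. The window sizes, k and alpha are hypothetical defaults rather than the values used in this work, and reducing each contiguous run of detections to its maximum is an additional assumption of this sketch.

```python
import numpy as np


def os_cfar(signal, n_train=16, n_guard=4, k=24, alpha=3.0):
    """Ordered-statistic CFAR detection mask for a 1-D signal.

    For each cell under test, up to n_train training cells on either side
    (skipping n_guard guard cells) are sorted; alpha times the k-th smallest
    training value is the adaptive threshold.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    detected = np.zeros(n, dtype=bool)
    for i in range(n):
        left = signal[max(0, i - n_guard - n_train):max(0, i - n_guard)]
        right = signal[i + n_guard + 1:i + n_guard + 1 + n_train]
        train = np.sort(np.concatenate([left, right]))
        if train.size == 0:
            continue
        noise_level = train[min(k, train.size) - 1]   # k-th ordered statistic
        detected[i] = signal[i] > alpha * noise_level
    return detected


def cfar_peaks(signal, detected):
    """Reduce every contiguous run of detected cells to the index of its maximum."""
    peaks, i, n = [], 0, len(detected)
    while i < n:
        if detected[i]:
            j = i
            while j < n and detected[j]:
                j += 1
            peaks.append(i + int(np.argmax(signal[i:j])))
            i = j
        else:
            i += 1
    return peaks


px = np.arange(400)
demo_signal = (0.05                                        # flat noise baseline
               + np.exp(-(px - 200) ** 2 / 4.5)            # narrow emission line
               + 0.3 * np.exp(-(px - 300) ** 2 / 450.0))   # broadband feature
demo_peaks = cfar_peaks(demo_signal, os_cfar(demo_signal))
```

In this example only the narrow line at pixel 200 is reported: inside the broadband feature, the training cells are themselves elevated, so the adaptive threshold rises with them.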
These features, however, are not necessarily sufficient to describe every aspect of an emission spectrum and to fully evaluate QDs with regard to their suitability as single photon sources. Therefore, additional abstract features are extracted using a so-called autoencoder, an unsupervised machine learning technique for non-linear dimensionality reduction 52 and representation learning 53.
At its core, an autoencoder is an NN regression model estimating the identity function, which maps an independent variable u ∈ U onto itself. However, the network is set up such that the output of one intermediate layer is of reduced dimension 54,55. As can be seen in Fig. 7, this implies that the information contained in the data vector u is first encoded into a lower-dimensional latent feature vector ξ ∈ Ξ ⊂ U and subsequently decoded again in order to produce a reconstruction û ∈ U of the original input. Training the network by minimising the reconstruction error e_recon = ‖u − û‖² ∈ ℝ requires as little information as possible to be lost when propagating the data vector through the network. Hence, the latent representation ξ is automatically optimised as well and can subsequently be extracted by evaluating only the encoder part of the network (blue). Note that, while highly informative, the features derived this way are not necessarily explainable or unique. The residual reconstruction error e_recon, meanwhile, provides a measure for the loss of information and thus for deviations from learned patterns and regularities. It is therefore often considered in fault detection, for instance to detect sensor or actuator errors 56,57. In the context of this paper, it is exploited to quantify noise and spectral oddities, like the broadband feature in Fig. 6.
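The encoding and decoding principle and the role of the reconstruction error e_recon can be illustrated without a deep network. The sketch below swaps in the optimal linear autoencoder, i.e. a projection onto the leading principal components obtained by a singular value decomposition, rather than the convolutional autoencoder used in this paper; all names and the synthetic spectra are assumptions. Spectra built from familiar patterns are reconstructed almost perfectly, whereas an unfamiliar broadband shape yields a large e_recon.

```python
import numpy as np


def fit_linear_autoencoder(U, latent_dim):
    """Fit encoder/decoder of a linear autoencoder (PCA) to the rows of U."""
    mean = U.mean(axis=0)
    _, _, vt = np.linalg.svd(U - mean, full_matrices=False)
    components = vt[:latent_dim]                 # shape (latent_dim, dim U)

    def encode(u):
        return (u - mean) @ components.T         # latent vector xi

    def decode(xi):
        return xi @ components + mean            # reconstruction u_hat

    return encode, decode


def reconstruction_error(u, encode, decode):
    """e_recon = ||u - u_hat||^2, computed per spectrum (last axis)."""
    return np.sum((u - decode(encode(u))) ** 2, axis=-1)


px = np.arange(100)
line_a = np.exp(-(px - 30) ** 2 / 18.0)          # two "familiar" emission lines
line_b = np.exp(-(px - 60) ** 2 / 18.0)
rng = np.random.default_rng(1)
U_train = rng.random((200, 2)) @ np.vstack([line_a, line_b])
encode, decode = fit_linear_autoencoder(U_train, latent_dim=2)

familiar = 0.7 * line_a + 0.2 * line_b
broadband = np.exp(-(px - 80) ** 2 / 800.0)      # unfamiliar broadband shape
errors = reconstruction_error(np.vstack([familiar, broadband]), encode, decode)
```

A non-linear, convolutional autoencoder follows the same encode, decode and compare scheme, but can capture far richer spectral patterns than a linear projection.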
As outlined in section "Quantum dot evaluation method", subsequently, a set of minimally redundant, but maximally relevant features is to be selected to be used as input x for the NN regression model estimating a QD's viability as single photon source.The results hereof are presented in the next section.

Results
In this section, the proposed evaluation algorithm is implemented and validated. For this, we consider single-layer InAs/GaAs semiconductor QDs within an n-i-n diode structure. The QDs are located in the vertical antinode of a planar cavity formed by a bottom distributed Bragg reflector (DBR) and a lower-reflectivity top DBR. More information on the samples, alongside comprehensive optical and quantum optical characterisations, can be found in Ref. 22. Using an above-band excitation laser close to saturation, a dataset of 25 000 emission spectra is recorded over a spectral range of 30 nm on 1024 pixels, giving an input dimension of dim U = 1024. Moreover, a total of 300 spectra are labelled redundantly by a team of seven experienced experts in the field: each spectrum is assigned the average s ∈ [0, 1] of the experts' scores, which rate the viability of the emitting QD as a source of single, indistinguishable photons with isolated, bright emission lines and low background. This averaging reduces personal biases to a minimum. An approximate representation of the score distribution is given by the histogram in Fig. 8.
Excluding the marginal extrema, the distribution is roughly Gaussian (dark blue curve). Lastly, to augment the data, each sample is spectrally shifted twice by a random number of pixels. As this does not qualitatively affect any spectral properties, the size of both the labelled and the unlabelled dataset is increased by a factor of three, which benefits the training of the various machine learning models. For training, as is common practice, 80% of the data is used, with the remaining 20% being retained for testing and to track possible overfitting. Note that this split is applied to both the labelled and the unlabelled dataset. In the following, first, the results of the autoencoder feature extraction are presented; then, a suitable subset of features is selected by correlation analysis; and finally, the performance of the rating score prediction by NN regression is showcased.
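The augmentation by random spectral shifts and the 80/20 split can be sketched as follows. The maximum shift of 20 pixels, the padding of vacated pixels with the edge value and all function names are hypothetical choices; the text above only states that each spectrum is shifted twice by a random number of pixels.

```python
import numpy as np


def shift_spectrum(u, shift):
    """Shift a 1-D spectrum by `shift` pixels, padding with the edge value."""
    out = np.empty_like(u)
    if shift > 0:
        out[shift:] = u[:-shift]
        out[:shift] = u[0]
    elif shift < 0:
        out[:shift] = u[-shift:]
        out[shift:] = u[-1]
    else:
        out[:] = u
    return out


def augment(U, labels, rng, max_shift=20):
    """Keep every spectrum and append two randomly shifted copies (3x the data)."""
    U_aug, y_aug = [], []
    for u, y in zip(U, labels):
        U_aug.append(u)
        y_aug.append(y)
        for _ in range(2):
            shift = int(rng.integers(1, max_shift + 1)) * int(rng.choice([-1, 1]))
            U_aug.append(shift_spectrum(u, shift))
            y_aug.append(y)                      # a shift leaves the rating unchanged
    return np.array(U_aug), np.array(y_aug)


def train_test_split_indices(n, rng, train_fraction=0.8):
    """Random 80/20 split of n sample indices."""
    perm = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return perm[:n_train], perm[n_train:]


rng = np.random.default_rng(0)
U_demo, s_demo = rng.random((10, 50)), rng.random(10)
train_idx, test_idx = train_test_split_indices(len(U_demo), rng)
U_train, s_train = augment(U_demo[train_idx], s_demo[train_idx], rng)
```

In this sketch, the split is drawn before augmenting so that shifted copies of a test spectrum cannot end up in the training set; the order used for the original pipeline is not stated above.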

Autoencoder representation learning
For the derivation of an abstract feature representation for QD emission spectra, an autoencoder with latent space dimension dim Ξ = 16 is proposed in this paper. This value is the result of a hyper-parameter optimisation and represents a trade-off between the loss of too much information if the latent space dimension is too small, and a reduced correlation of the autoencoder states with the rating score if the latent space dimension is too large. Taking normalised input data with ‖u‖_max = 1, the encoder part handles the feature learning and the dimensionality reduction. For the former, deep convolutional NNs are typically employed, which excel at pattern detection but potentially suffer from vanishing or exploding gradients during optimisation 58. In this regard, residual blocks, i.e. sequences of two convolutional layers bridged by a skip connection, offer numerical benefits: since the block output is the sum of its input and the convolutional mapping, its derivative with respect to the input always contains an identity matrix. Moreover, the mapping of linear relations is facilitated 59,60.
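The identity term mentioned above follows directly from the structure of a residual block, y = ReLU(x + F(x)), where F denotes the two convolutional layers. The single-channel NumPy sketch below omits batch normalisation and uses hypothetical kernels and function names. With all kernel weights set to zero, F vanishes and the block passes non-negative inputs through unchanged, which illustrates the identity path along which gradients can always propagate.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Single-channel 1-D convolution with zero padding ('same' output length)."""
    return np.convolve(x, kernel, mode="same")


def residual_block(x, kernel_1, kernel_2):
    """y = ReLU(x + F(x)) with F(x) = conv_2(ReLU(conv_1(x)))."""
    h = np.maximum(0.0, conv1d_same(x, kernel_1))
    return np.maximum(0.0, x + conv1d_same(h, kernel_2))   # skip connection adds x


x_demo = np.array([0.0, 0.2, 1.0, 0.4, 0.1])
zero_kernel = np.zeros(3)
y_identity = residual_block(x_demo, zero_kernel, zero_kernel)
y_smoothed = residual_block(x_demo, np.array([0.25, 0.5, 0.25]), np.array([0.0, 1.0, 0.0]))
```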
For the dimensionality reduction, in turn, max pooling is state of the art. This is a downsampling technique in which the output of a convolutional layer is divided into blocks of equal size, with only the maximum value of each block being propagated further. Since the most dominant entries are retained this way, the overall performance of the network is not significantly impaired. In fact, max pooling introduces a certain degree of translational invariance and improves the computational efficiency of the network 61,62.
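Max pooling by a factor of three can be written as a reshape followed by a maximum over each block. In this NumPy sketch, trailing samples that do not fill a complete block are dropped, an assumption that mirrors the default behaviour of common deep learning frameworks. The small example also shows the translational invariance mentioned above: moving a peak within its block leaves the pooled output unchanged.

```python
import numpy as np


def max_pool1d(x, factor=3):
    """Non-overlapping 1-D max pooling; an incomplete trailing block is dropped."""
    x = np.asarray(x)
    n_blocks = len(x) // factor
    return x[: n_blocks * factor].reshape(n_blocks, factor).max(axis=1)


a = np.array([0, 5, 0, 0, 0, 3, 7, 1, 1, 9])
b = np.array([5, 0, 0, 0, 3, 0, 1, 7, 1, 9])   # same peaks, moved within their blocks
pooled_a, pooled_b = max_pool1d(a), max_pool1d(b)
```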
Here, the encoder is set up as a series of four residual units, each consisting of two residual blocks followed by max pooling (see Fig. 9). As indicated, the results of the convolutional mappings are batch normalised before being passed to the rectified linear unit (ReLU) activation function 63,

$$\phi_{\mathrm{ReLU}}(x) = \max(0, x). \tag{4}$$

In each residual unit, the four convolutional layers have the same structure and hyper-parameters (output shape, kernel size, stride, padding), whereas max pooling with a dimensionality reduction by a factor of three is adopted throughout. Lastly, the compact latent representation ξ ∈ Ξ, dim Ξ = 16, is produced by a fully connected feed-forward layer. For the subsequent recovery of the input vector and the associated increase in dimensionality, the decoder comprises six sequential transposed convolutional layers 64. Since the input data is normalised, the sigmoid activation function

$$\phi_{\mathrm{sigmoid}}(x) = \frac{1}{1 + e^{-x}} \tag{5}$$

is used here to constrain the output value range such that ‖û‖_max ≤ 1. A summary of the autoencoder's complete architecture is given in Table 1. Overall, the autoencoder has 233 313 trainable parameters, which are optimised with respect to the squared ℓ2-norm using the computationally efficient ADAM algorithm with learning rate scheduling 65 and a batch size of 512. As the autoencoder training is unsupervised, the unlabelled dataset is used for it. Fig. 10 displays the training and test learning curves of the autoencoder over 200 epochs of optimisation. Both decay approximately exponentially towards 0, indicating that the autoencoder's learnt latent representation is optimised without significant overfitting. The performance of the fully trained autoencoder is further showcased in Fig. 11, where the reconstructions of the three model spectra from Fig. 3 are shown. Note that these are part of the test data and were therefore not seen during training. As can be seen, the first two spectra are recovered reasonably well, with all major peaks captured and correspondingly low reconstruction errors. In contrast, the autoencoder struggles to reconstruct the third spectrum, as both the worse signal-to-noise ratio and the spectral broadband feature constitute severe deviations from learnt patterns and regularities. As discussed in section "Feature engineering", the reconstruction error e_recon is considered as a feature precisely to take such cases into account. On the other hand, no physical meaning could be inferred for the autoencoder's latent states.
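Assuming that the convolutions preserve the length of the signal ("same" padding) and that every pooling stage rounds down, the length of the feature maps along the spectral axis follows from repeated integer division of the 1024 input pixels by three. Both assumptions and the helper below are illustrative; the actual layer shapes are listed in Table 1, and the number of channels, which together with this length determines the input size of the fully connected layer producing ξ, is not reproduced here.

```python
def encoder_lengths(n_in=1024, n_units=4, pool_factor=3):
    """Feature-map length after each residual unit (floor division by the pooling factor)."""
    lengths = [n_in]
    for _ in range(n_units):
        lengths.append(lengths[-1] // pool_factor)
    return lengths


lengths = encoder_lengths()   # [1024, 341, 113, 37, 12]
```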

Feature selection
Combining the autoencoder's latent representation ξ and reconstruction error e_recon with the aforementioned characteristic spectral parameters, a set of 22 features,

$$\xi_1, \dots, \xi_{16},\; e_{\mathrm{recon}},\; n_{\mathrm{peak}},\; r_{\mathrm{dom}},\; u_{\mathrm{max}},\; w_{\mathrm{FWHM}},\; d_{\mathrm{min}}, \tag{6}$$

can be extracted from each QD emission spectrum u = I(λ). These are, however, neither inherently independent, nor do they necessarily affect the suitability evaluation that is the subject of this paper. Therefore, two correlation studies are performed to select a subset of minimally redundant but maximally relevant features to be used as input vector x for the NN regression model. For both, the absolute value of Spearman's rank correlation coefficient ρ ∈ [−1, 1] is used, as it is not limited to linear relationships but rather measures monotonicity 66. First, using the available labelled training data, the correlation between each feature and the rating score is computed. The results are listed in Table 2. In particular, the reconstruction error e_recon and the maximum emission intensity u_max stand out for their strong correlation of |ρ| > 0.9 with the target value. Subsequently, only features with a correlation coefficient |ρ| > 0.6 are retained, which reduces the set of features under consideration to

$$\xi_2,\; \xi_7,\; \xi_{13},\; e_{\mathrm{recon}},\; r_{\mathrm{dom}},\; u_{\mathrm{max}}, \tag{7}$$

where the considerable share of latent features (three out of six) justifies the use of the autoencoder. The remaining features are subject to a cross-correlation analysis, the results of which are visualised in Fig. 12. Clearly, the reconstruction error e_recon and the maximum emission intensity u_max are also comparatively strongly correlated with each other. Furthermore, both show a moderate cross-correlation with the remaining latent features ξ_2, ξ_7 and ξ_13. Since for the reconstruction error this can be attributed to the shared origin, i.e. the autoencoder, the maximum emission intensity is omitted in order to limit redundancies. This leaves five features that together form the input vector of the NN regression model estimating a QD's viability as single photon source. The comparatively high relevance and impact of the reconstruction error e_recon is revisited in section "Evaluation score prediction".
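The two correlation studies can be sketched as follows. Spearman's ρ is computed as Pearson's correlation of the ranks (via `scipy.stats.rankdata` and `numpy.corrcoef`), features with |ρ| > 0.6 with respect to the score are kept, and a feature is then dropped if it correlates strongly with a more relevant feature that has already been kept. This greedy redundancy rule and its threshold of 0.9 are assumptions for illustration; in this paper, the omission of u_max was decided by inspecting the cross-correlations in Fig. 12.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_matrix(data):
    """Absolute Spearman correlation between the columns of `data`."""
    ranks = rankdata(data, axis=0)                 # Spearman = Pearson on ranks
    return np.abs(np.corrcoef(ranks, rowvar=False))


def select_features(X, s, names, relevance_thr=0.6, redundancy_thr=0.9):
    """Keep relevant features, then greedily drop redundant ones (hypothetical rule)."""
    corr = spearman_matrix(np.column_stack([X, s]))
    relevance = corr[:-1, -1]                      # |rho| of each feature with the score
    cross = corr[:-1, :-1]                         # |rho| between features
    candidates = [j for j in np.argsort(-relevance, kind="stable")
                  if relevance[j] > relevance_thr]
    kept = []
    for j in candidates:                           # most relevant first
        if all(cross[j, k] <= redundancy_thr for k in kept):
            kept.append(j)
    return [names[j] for j in kept], relevance


rng = np.random.default_rng(0)
s_demo = rng.random(200)
f_relevant = s_demo + 0.01 * rng.standard_normal(200)
f_redundant = f_relevant + 0.05 * rng.standard_normal(200)   # near-copy, slightly noisier
f_irrelevant = rng.random(200)
selected, relevance = select_features(
    np.column_stack([f_relevant, f_redundant, f_irrelevant]), s_demo,
    ["f_relevant", "f_redundant", "f_irrelevant"])
```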

Evaluation score prediction
The last building block of the proposed QD evaluation scheme is the NN regression model. Considering that the goal is to replicate an expert's experience-based decision process with its inherent subjectivity, the network is set up such that it returns not only a rating score prediction ŝ ∈ [0, 1], but also a measure of confidence σ > 0 for it. To do so, using the N labelled training samples (x_i, s_i), the Gaussian negative log-likelihood loss

$$\mathcal{L}(\beta) = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{1}{2}\ln \sigma_i^2 + \frac{(s_i - \hat{s}_i)^2}{2\sigma_i^2} \right], \qquad (\hat{s}_i, \sigma_i) = f_{\mathrm{reg,NN}}(x_i, \beta),$$

is minimised over β for the multivariate NN regression model f_reg,NN with vector-valued output. For accurate predictions, this causes the optimiser to drive σ → 0, whereas for inaccurate predictions, σ must inevitably increase for the second summand to be minimised. Note that this is accomplished without supervision, as no target values for σ are required. Given that this loss is, up to an additive constant, the negative natural logarithm of a normal distribution, σ can be interpreted as the standard deviation of the prediction and is hence referred to as such 67.
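The Gaussian negative log-likelihood described above can be written in a few lines of NumPy, as sketched below; the function name is an assumption. PyTorch provides an equivalent loss as `torch.nn.GaussianNLLLoss`, which expects the variance σ² rather than σ. The check at the end illustrates the mechanism described in the text: setting the derivative of ½ ln σ² + e²/(2σ²) with respect to σ to zero gives σ = |e|, so for a fixed prediction error e the loss is minimised when σ matches that error.

```python
import numpy as np


def gaussian_nll(s, s_hat, sigma):
    """Mean of 0.5*ln(sigma^2) + (s - s_hat)^2 / (2*sigma^2); 0.5*ln(2*pi) is omitted."""
    s, s_hat, sigma = (np.asarray(a, dtype=float) for a in (s, s_hat, sigma))
    return np.mean(0.5 * np.log(sigma ** 2) + (s - s_hat) ** 2 / (2.0 * sigma ** 2))


# For a fixed error |s - s_hat| = 0.2, scan sigma and locate the minimum of the loss
sigma_grid = np.linspace(0.01, 1.0, 991)
losses = [gaussian_nll(0.8, 0.6, sig) for sig in sigma_grid]
sigma_best = sigma_grid[int(np.argmin(losses))]   # close to 0.2
```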
The regression model itself is designed as a fully connected feed-forward NN with four layers, using the sigmoid activation function throughout to constrain the predictions to ŝ ∈ [0, 1]. Table 3 provides further details regarding the architecture of the network. As before, the ADAM optimisation algorithm is employed, here with a batch size of 64; the resulting training and test learning curves over 2000 epochs are given in Fig. 13. Despite several outliers, both curves clearly decay, and the network is accordingly optimised with negligible overfitting. In fact, considering the widespread R² score as accuracy metric for regression analysis 68, a training score of 96% and a test score of 95% are achieved. For reference, taking the reconstruction error, i.e. the strongest feature, as the single input of the regression model, such that x = e_recon, yields an R² test score of 84%. Since the reconstruction error primarily quantifies noise, this implies that differentiating candidate QD spectra solely with respect to their signal-to-noise ratio would lead to significantly worse results. […] respectively. As can be seen, both the expert and the proposed expert system agree on their evaluation, highlighting the potential for automation in this field.
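For completeness, the R² score compares the residual sum of squares with the total variance of the targets, R² = 1 − Σ(s_i − ŝ_i)² / Σ(s_i − s̄)². A minimal NumPy version is sketched below; scikit-learn offers the same metric as `sklearn.metrics.r2_score`, and the example values are made up for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


r2_demo = r2_score([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])   # 1 - 1/5 = 0.8
```

A model that always predicts the mean score obtains R² = 0, so the reported test score of 95% indicates that the predictions explain most of the variance of the expert ratings.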

Discussion and outlook
The main objective of this paper was the development of a method to automatically evaluate the viability of a semiconductor QD as single photon source based on its emission spectrum. For this, a suitable feature representation for QD emission spectra is derived by combining spectral analysis and an autoencoder, and an NN regression model is trained on a given set of expert-labelled data. Overall, the proposed solution achieves highly convincing results by reliably predicting accurate ratings for unknown test inputs. Embedding the evaluation algorithm in a user application and establishing a required minimum rating enables the automation of the manual pre-selection of candidate QDs for further analyses. This not only significantly reduces processing times, but also introduces a certain degree of objectivity and comparability. Overall, this work showcases how machine learning can support and benefit the ongoing development of quantum technologies by solving practical challenges. However, several aspects are to be pointed out in this context. First, in this paper, a regression-based rather than a classification-based approach is employed. This has the advantage that the cut-off score can be chosen freely to render the selection more conservative or more permissive. In fact, it can even be defined as a function of the estimated measure of confidence of the rating prediction. Secondly, note that the data used here to train the regression model was labelled by a team of experts to minimise personal bias. In practice, however, different experts work on different topics and therefore have different spectral requirements. In particular, the distinction between exciton, biexciton and trion excitation is technically highly relevant. This can be accounted for by optimising the network only with regard to the labels assigned by one expert. In this case, the trained model will replicate their personal assessment and will be tuned to their application scenario. Since only a comparatively small dataset needs to be re-labelled and the re-training of the prediction model is of low computational effort, adapting the proposed evaluation method is considerably more efficient than adjusting a rating system not based on machine learning, for which a plethora of decision variables and threshold values would have to be fine-tuned. Note that the demand for QDs of one specific emission wavelength can be met by simply filtering the evaluated and selected spectra accordingly, which is why this parameter was not included as a feature in the analysis. Finally, it should be mentioned that, while this work focuses on self-assembled QDs grown in the Stranski-Krastanow mode by molecular beam epitaxy, comparable challenges arise with other fabrication methods as well. However, since the solution proposed here is transferable, the same approach can be adopted for each fabrication method, material composition and photonic structure.
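The freely selectable cut-off, including a dependence on the estimated standard deviation σ, could for instance take the form ŝ − kσ ≥ s_min, which only accepts a QD if its rating remains above the threshold after subtracting k standard deviations. Both this rule and the values of s_min and k below are hypothetical; larger values of k or s_min make the pre-selection more conservative.

```python
import numpy as np


def select_candidates(s_hat, sigma, s_min=0.7, k=1.0):
    """Indices of QDs whose pessimistic rating s_hat - k*sigma reaches s_min."""
    s_hat = np.asarray(s_hat, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.flatnonzero(s_hat - k * sigma >= s_min)


s_hat_demo = [0.90, 0.80, 0.75, 0.50]
sigma_demo = [0.05, 0.20, 0.01, 0.10]
conservative = select_candidates(s_hat_demo, sigma_demo, k=1.0)   # indices [0, 2]
permissive = select_candidates(s_hat_demo, sigma_demo, k=0.0)     # indices [0, 1, 2]
```

Here the second QD is rated highly but with a large σ, so it is only accepted by the permissive rule.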
In the long-term, the framework presented in this paper is to be expanded to a fully automated evaluation tool for semiconductor QDs, capable of taking into account not only emission spectra, but also further measurements and custom requirements, in order to streamline and support the synthesis of high quality single photon sources.

Figure 2. Confocal µ-photoluminescence intensity measurement. (a) Normalised emission intensity in a 50 × 50 µm² area of a representative self-assembled InAs/GaAs QD sample. Three exemplary QDs are marked in yellow. (b) Schematic measurement setup: the QD sample is placed inside a helium cryostat and excited by an above-band laser guided through a beamsplitter. The luminescence signal is collected and sent to a spectrometer.

Figure 4. Visual abstract of a regression analysis scheme (yellow) including feature engineering (blue). In this paper, the independent input u corresponds to a measured QD emission spectrum I(λ), and the dependent output y to the evaluation score s ranking the QD's technical usability as single photon source between 0 and 1 with confidence σ.

Figure 5. Schematic representation of a layer in a feed-forward NN.

Figure 7. Schematic representation of an autoencoder NN. The input vector u is first encoded into a low-dimensional latent feature representation ξ, which is subsequently decoded again to produce an estimate û of the input.

Figure 10.Average training and test loss of the autoencoder during training.

Figure 13.Average training and test loss of the NN regression model during training.

Table 2. Absolute Spearman correlation between each feature and the rating score. Significant values with |ρ| > 0.6 are in bold.
