Machine Learning techniques for state recognition and auto-tuning in quantum dots

Recent progress in building large-scale quantum devices for exploring quantum computing and simulation paradigms has relied upon effective tools for achieving and maintaining good experimental parameters, i.e., tuning up devices. In many cases, including in quantum-dot based architectures, the parameter space grows substantially with the number of qubits, and may become a limit to scalability. Fortunately, machine learning techniques for pattern recognition and image classification using so-called deep neural networks have shown surprising successes for computer-aided understanding of complex systems. In this work, we use deep and convolutional neural networks to characterize states and charge configurations of semiconductor quantum dot arrays when one can only measure a current-voltage characteristic of transport (here conductance) through such a device. For simplicity, we model a semiconductor nanowire connected to leads and capacitively coupled to depletion gates using the Thomas-Fermi approximation and Coulomb blockade physics. We then generate labelled training data for the neural networks, and find at least $90\,\%$ accuracy for charge and state identification for single and double dots purely from the dependence of the nanowire's conductance upon gate voltages. Using these characterization networks, we can then optimize the parameter space to achieve a desired configuration of the array, a technique we call 'auto-tuning'. Finally, we show how such techniques can be implemented in an experimental setting by applying our approach to an experimental data set, and outline further problems in this domain, from using charge sensing data to extensions to full one- and two-dimensional arrays, that can be tackled with machine learning.


I. INTRODUCTION
Tremendous progress in realizing high-quality quantum bits at the few qubit level has opened a window for new challenges in quantum computing: developing the necessary classical control techniques to scale systems to larger sizes. A variety of approaches [1][2][3][4][5][6][7][8][9] rely upon tuning individual quantum bits into the proper regime of operation. In semiconductor quantum computing, devices now have tens of individual electrostatic and dynamical gate voltages [4][10][11][12][13][14][15][16][17] which must be carefully set to isolate the system to the single electron regime and to realize good qubit performance. A similar problem arises in the control of ion positions in segmented ion traps [18][19][20][21]. Preliminary work to automate the laborious task of tuning such systems has primarily focused on fine tuning of analog parameters [22,23] using techniques from regression analysis and quantum control theory. At the same time, tremendous progress in automated classification suggests such techniques may be used [24][25][26] to bootstrap the experimental effort from a de novo device to a fully tuned device, replacing the gross-scale heuristics developed by experimentalists to deal with tuning of parameters particular to experiments.
In this work, we specifically consider the control problems associated with electrostatically defined quantum dots (QDs) present at the interface of semiconductor devices [27]. Each quantum dot is defined using voltages applied to metallic gate electrodes acting as depletion gates which confine a discrete number of electrons to a set of islands. We use machine learning (ML) and numerical optimization techniques to efficiently explore the multidimensional gate voltage space to find a desired island configuration, a technique we call 'auto-tuning'. Toward this end, we use ML to recognize the number of dots generated in the experiment.
* sandeshkalantre@iitb.ac.in † jmtaylor@umd.edu
In order to improve on the accuracy, we work with convolutional neural networks (CNNs) [25,28]. CNNs are a class of artificial neural networks designed for efficient pattern recognition and classification of images. When trained on high quality simulated data, CNNs can learn to identify the number of QDs. Once the neural network is trained to recognize dot configurations, we can recast the problem of finding a required configuration as an optimization problem. As a result, a neural network coupled to an optimization routine presents itself as a solution for determining a suitable set of gate voltages.
Training a machine learning algorithm requires a physical model that qualitatively mimics the experimental output and provides a large, fully labeled data set. In this paper, we develop a model for transport in gate-defined quantum dots and train neural networks to identify the number of islands under a given gate voltage configuration. We also describe the auto-tuning problem in the double-dot to single-dot transition regime. Finally, we discuss the performance of the recognition and auto-tuning for both simulated and experimental data. We report over 90 % accuracy with very simple neural network architectures on all these problems, where accuracy is defined as the fraction of cases in which the predicted configuration agreed with the pre-assigned label.
This paper is organized as follows. In Section II, we motivate the problems associated with tuning of quantum dot arrays and their relation to ML problems. In Section III, we present the physical setup for our devices and the model for transport calculations. In Section IV, we start with a toy example of using a neural network to learn Coulomb blockade and identify charge states of a single quantum dot. The charge and state identification problem for a double dot and its solution using CNNs is presented in Section V. In Section VI, we define the autotuning problem and its resolution. In Section VII, we test our trained CNN for state identification and autotuning on experimental data. In Section VIII, we describe how the machine learning techniques described in this work can be incorporated in an experimental setting and speculate on further problems that can be potentially solved using neural networks for quantum dots. Finally, we present our conclusions in Section IX.

II. MOTIVATION
Electrostatically defined quantum dots offer a means of localizing electrons in a solid-state environment. A generic device, consisting of a linear array of dots in a two-dimensional electron gas (2DEG), is presented in Fig. 1(a). Gate electrodes on top are used to confine electron density to certain regions, forming islands of electrons. The ends of the linear array are connected to reservoirs of electrons, i.e., contacts, which are assumed to be kept at a fixed chemical potential.
By applying suitable voltages to the gates, it is possible to define a one-dimensional potential profile $V(x)$. Alternating regions of electron density islands and barriers are formed, depending on the relation between the chemical potential and the electrostatic potential $V(x)$ (Fig. 1(b)). Barrier gates are used to control tunneling between the islands while the plunger gates control the depths of the potential wells.
A fixed number of islands requires a specific number of gates. Since the voltage on each gate can be set independently, the state space for the gate voltages is $\mathbb{R}^m$, with $m$ denoting the number of gates. By suitable choices of the gate voltages, it is possible to have a certain number of islands, each with a certain number of charges along the nanowire. We refer to the number of islands as the state. Though having a large number of gates implies a higher degree of control, it also presents a challenge in determining appropriate values for the gate voltages, given a required configuration [4].
Standard techniques of assigning voltages to the gates rely on heuristics and experimental intuition. Such techniques, however, present practical difficulties in implementation when the number of gates increases beyond a modest number. Hence, it is desirable to have a technique, given a desired configuration of the device, to determine an appropriate voltage set without the need for actual intervention by an experimenter.
Machine learning (ML) [29] is an algorithmic paradigm in artificial intelligence and computer science for learning patterns in data without explicitly programming the characteristic features of those patterns. An important task in machine learning is classification of data into categories, generically referred to as a classification problem. The algorithm learns about the categories from a dataset and produces a model that can assign previously unseen inputs to those categories.
In supervised learning, ML algorithms rely on labels identifying each data point to learn to classify data from a predefined and known representative subset (the training data) into assumed categories (thus the term supervised). Once trained, the algorithm then generalizes to an unknown data set, called the test set. Deep neural networks (DNNs), i.e., neural networks with multiple hidden layers, can be used to classify complex data into categories with high accuracy of over 90 % [25].
The central aim of this work is to enable an automated approach to navigation and tuning of quantum dot devices in the multidimensional space of gate voltages. Here, we define auto-tuning specifically as finding appropriate values for the gate electrodes to achieve a particular configuration. Identification of the state of the device is the first step in the tuning process. In light of the requirement for learning the state to achieve tuning and the success achieved with DNNs for data classification, we propose to use DNNs to determine charges and states of quantum dots. Once the state is identified, auto-tuning reduces to an optimization problem of reaching the required state, which can be handled with standard optimization routines.

III. PHYSICAL MODEL OF A NANOWIRE
A prerequisite for training of neural networks is the availability of a training data set which mimics the expected characteristics from a test set. We develop a model for electron transport under the Thomas-Fermi approximation to calculate the electron density $n(x)$ and the current $I$ (see Appendix A for details). This model allows us to construct a capacitance model for the islands given a potential landscape $V(x)$ and the fixed Fermi level of the contacts. The potential profile, in turn, is determined by the voltages set on the gates (Appendix C).
An infinitesimal bias is assumed to exist between the contacts. The discreteness in the number of electrons in the islands, along with inter-electronic Coulomb repulsion, leads to transport being blockaded across the nanowire. The charge configuration changes when there are two or more degenerate charge states. Such a degeneracy in energy leads to electron flow across the leads, i.e., current at an infinitesimal bias.
We model electron transport using a Markov chain among the charge states $(N_1, N_2, \ldots, N_k)$ of the $k$ islands, where $N_i$ represents the number of electrons on the $i$-th island. The rate of going from one state to another is calculated under a thermal selection rule set by the energy of the two configurations evaluated from the capacitance model and the tunneling rate. The tunneling rate is modeled as a product of the Wentzel-Kramers-Brillouin (WKB) tunnel probability [30] across the barrier and the classical attempt rate of electrons in the islands. From the steady state configuration of the Markov chain, we calculate the current for a given potential landscape $V(x)$ (Appendix B). In all, this simplistic approach provides the minimum model to reproduce basic charge configurations and transport characteristics qualitatively for linear arrays of quantum dots.
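The steady state of such a Markov chain can be illustrated with a minimal sketch. It assumes the transition rates between charge states are already known (the thermal selection rule and WKB factors are not reproduced here), and it uses uniformization followed by power iteration, which is one standard way to obtain a stationary distribution and not necessarily the method of Appendix B. The function name `steady_state` and the example rates are hypothetical.

```python
def steady_state(rates, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a continuous-time Markov chain.

    rates[i][j] is the transition rate from charge state i to charge state j
    (diagonal entries are ignored). The rates are converted into a stochastic
    matrix P = I + Q / lam (uniformization), which is then iterated.
    """
    n = len(rates)
    exit_rates = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    if max(exit_rates) == 0.0:
        raise ValueError("all transition rates are zero")
    lam = 1.1 * max(exit_rates)  # strictly larger than every exit rate
    # One-step transition matrix of the uniformized chain.
    P = [[rates[i][j] / lam if i != j else 1.0 - exit_rates[i] / lam
          for j in range(n)] for i in range(n)]
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy example: two charge states, N and N+1, with hypothetical rates.
example_rates = [[0.0, 2.0],   # N   -> N+1 at rate 2
                 [1.0, 0.0]]   # N+1 -> N   at rate 1
occupation = steady_state(example_rates)
```

For this two-state example the balance condition gives occupations of 1/3 and 2/3. In the model of the text, the current would then follow from these steady-state occupations and the lead-resolved tunneling rates.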
As a check on the qualitative performance of our model, we consider 3-gate and 5-gate configurations, as shown in Fig. 2(a) and Fig. 3(a), respectively. We consider a single island (two islands) defined using three (five) electrostatic gates: barrier gates $V_{Bi}$ with $i = 1, 2$ ($i = 1, 2, 3$) and plunger gates $V_{Pj}$ with $j = 1$ ($j = 1, 2$). By changing the depth of the wells, electrons can tunnel in or out of the islands. At a given value of the gate voltage, a fixed integer number of electrons are assumed to exist on each island. Current flows through the device when two charge states have the same energy predicted by the capacitance model. In such a state, electrons tunnel through one of the contacts into the island (or tunnel between islands for a 5-gate device) and tunnel out of the island through another contact. The direction of the electron flow is set by the sign of the bias applied across the leads.
In the simulation for the 3-gate device (Fig. 2(a) and 2(b)), a single dot is present along the nanowire. The contact chemical potentials are fixed to $\mu_1 = \mu_2 = 100.0$ meV with respect to the conduction band minimum. An infinitesimal bias of 10 µeV is present across the leads. The barrier gates are assumed to be kept at a fixed voltage with $V_{B1} = V_{B2} = -200$ mV. The third gate, $V_P$, is swept from 0 mV to 350 mV.
As can be seen in the current trace in Fig. 2(c), Coulomb blockade is reproduced. Our model also allows us to predict the most probable charge configuration from the Markov chain analysis. We see that the charge configuration jumps to a different state exactly at the position of the current peaks. The steady increase in the height of current peaks is a result of lowering of the tunnel barriers on increasing $V_P$. The decrease in spacing between adjacent current peaks with increasing values of $V_P$ is due to a slow increase in the capacitance of the dot with increasing electron number.
For the 5-gate device (Fig. 3(a)), the barrier voltages are set to $V_{B1} = V_{B2} = V_{B3} = -200$ mV. These values were chosen so that the device operates in a double dot configuration (Fig. 3(b)). We calculate the current as a function of the two plunger gate voltages, $V_{P1}$ and $V_{P2}$. We reproduce the expected features for such a system [27]: current flow only at triple points and honeycomb-shaped fixed-charge contours (Fig. 3(c) and 3(d)).
We note that while more sophisticated models should be used in future studies, we find our approach to be sufficient for showing how ML can help with the challenges outlined in Sec. II.

IV. LEARNING COULOMB BLOCKADE
We start by investigating whether a machine can learn to identify the charge on a single quantum dot, given the current as a function of $V_P$ (Fig. 2(c)). Formally, we define the broader problem of Learning Coulomb Blockade as:
Problem P1: Learning Coulomb Blockade: Given I(V), find CS(V),
where I is the current at infinitesimal bias, V denotes the vector of voltages applied to the gates and CS is a vector of the number of electrons on each island.
In the case of a single dot, only one gate voltage, $V_P$, is varied and the charge state is simply the number of electrons on the dot. Hence, V and CS are scalars. It is easy to see that this just amounts to learning to integrate the current characteristics and scaling to the appropriate charge number (Fig. 4(a)).
We generated a training data set for 1000 distinct realizations of the dots. Each sample point is a current and charge state vs. $V_P$ characteristic. Across the samples, parameters such as the gate positions, widths and heights are sampled from a Gaussian distribution with mean values in the parameter set and a standard deviation of 0.05 times the mean value (see Appendix D for details). Fig. 4(b) and 4(c) show sample current and charge data, respectively, of 100 such dots. The rationale behind generating a large dataset for the dots is twofold: having a variation in the dot parameters models the variations in different dots that are used in experiments and it presents a way to generate a generic training dataset for learning.
The machine learning problem is intended to map the current, given in Fig. 4(b), to the charge state, shown in Fig. 4(c). One can think of this as a regression problem from the vector of current values to the vector of charge values.
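The sampling of device parameters can be sketched as follows. This is only an illustration of drawing each parameter from a Gaussian whose standard deviation is 0.05 times its mean; the parameter names and nominal values in `NOMINAL` are hypothetical placeholders and are not the parameter set of Appendix D.

```python
import random


def sample_device(nominal, rel_std=0.05, rng=random):
    """Draw one device realization: each parameter ~ N(mean, (rel_std * mean)^2)."""
    return {name: rng.gauss(mean, rel_std * abs(mean))
            for name, mean in nominal.items()}


# Hypothetical nominal parameter set (names, values and units are illustrative only).
NOMINAL = {"gate_position": 100.0, "gate_width": 50.0, "gate_height": 20.0}

rng = random.Random(0)  # fixed seed so the data set is reproducible
devices = [sample_device(NOMINAL, rng=rng) for _ in range(1000)]
```

Each entry of `devices` would then be passed to the transport model to produce one current and charge state characteristic.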
We used a deep neural network with three hidden layers [31] and achieved 91% accuracy for the charge state values (see Appendix E for a description of the computing environment). Here, the accuracy for a single current-gate voltage curve (see Fig. 4(d) and Fig. 4(e)) is calculated by comparing the charge state predicted by the neural network with the charge state from the Thomas-Fermi model over the gate-voltage range. This accuracy is then averaged over all the samples to produce an accuracy for the test set. The sizes of the input and output layers correspond to the number of points in I(V) and CS(V). We used a 512-point input layer and a 512-point output layer. The result from the output layer was rounded to the nearest integer to get the charge state. The hidden layers comprised 1024, 256 and 12 neurons, respectively. The outcome of the training is a set of biases and weights corresponding to each neuron that allow the calculation of the final output.
Interestingly, we observed that a successive decrease in the number of neurons across the hidden layers was critical to achieving a respectable accuracy. This suggests a redundancy of information encoded in the current characteristics that the network must learn to ignore when estimating the charge states.
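To make the funnel shape of this network concrete, the following sketch counts the trainable parameters implied by the layer sizes quoted in the text (512 inputs, hidden layers of 1024, 256 and 12 neurons, 512 outputs). It assumes plain fully connected layers with one bias per neuron; the helper name is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


layers = [512, 1024, 256, 12, 512]  # input, three hidden layers, output
total_parameters = dense_parameter_count(layers)
print(total_parameters)  # 797452
```

Almost all parameters sit in the first two layers, while every output curve has to pass through the 12-neuron bottleneck, which is consistent with the redundancy argument made above.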
We can visualize the learning by means of a validation set evaluated at the end of a fixed number of training epochs. We observed that in the initial training stages, the network learns the charge boundaries in the plunger voltage space of an average dot and then slowly starts to learn to identify the charge states of individual dot samples.
We note that the problem identified above suffers from the charge-offset problem in the real world since the initial number of electrons on the dot might be unidentified. Hence, the network trained as a solution to Problem P1 has limited applicability in experimental settings but nevertheless exemplifies that machine learning can, in principle, be applied to charge identification.
The charge number identification on the single dot also offers a trivial solution to identifying the state of the single dot. If the charge on the dot is non-zero, we can then conclude that a single dot exists whereas a zero charge implies a no dot device. The identification of state of the device with multiple islands from the current presents additional possibilities which we describe in the next section.

V. LEARNING STATE
The state is the number of distinct dots or islands that exist in the nanowire. We now consider a 5-gate device which can exist in four possible dot configurations: a quantum point contact or barrier (Barrier), a single dot (SD), a double dot (DD) and a short circuit (SC) (see Fig. 3). To quantify the definition of a dot configuration, we define a probability vector p at each point in the V space. The elements of p correspond to the probability of being in each of the configurations described above, i.e., p = (SC, Barrier, SD, DD). For example, for a single dot state, the probability vector is p = (0, 0, 1.0, 0). For a region in V space, p is defined as the average of the probability vectors of the points in the region.
We are interested in determining the dot configuration (i.e., distinguishing between SC, Barrier, SD and DD) for a given set of barrier and plunger gate voltages. Formally, we define the problem as follows:
Problem P2: State Identification for full region: Given I(V), find the probability vector p at each point in the given voltage space.
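The averaging that defines the probability vector of a region can be sketched directly from this definition. The ordering of the entries follows p = (SC, Barrier, SD, DD) as given in the text; the function names and the example labels are hypothetical.

```python
STATES = ("SC", "Barrier", "SD", "DD")  # ordering of the entries of p


def one_hot(state):
    """Probability vector of a single point that is in the given state."""
    return [1.0 if s == state else 0.0 for s in STATES]


def region_probability(state_labels):
    """Average the per-point probability vectors over a region of voltage space."""
    vectors = [one_hot(s) for s in state_labels]
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(STATES))]


# A toy region of four points, three in a single dot state and one in a double dot state.
p_region = region_probability(["SD", "SD", "SD", "DD"])  # [0.0, 0.0, 0.75, 0.25]
```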
We generated a training set of 1000 gate configurations. Each sample point is the full two-dimensional map (100 × 100 points) from the space of plunger gate voltages $(V_{P1}, V_{P2})$ to current (see Fig. 5(b) for an example of such a map). A state map corresponding to the current map presented in Fig. 5(b) is shown in Fig. 5(c). The states are calculated via the electron density predicted from the Thomas-Fermi model. The number of distinct charge islands in the electron density, separated by regions of zero electron density corresponding to the barriers, is used to infer the state of the nanowire (see Appendix A). Note that there is more than one way for some of the configurations to exist. For instance, lower voltages on the barrier B2 with respect to the barriers B1 and B3, or higher voltages on B1 and B2 as compared to B3, both lead to a single dot configuration (see Fig. 3(a)). Analogously to the single dot case, gate and physical parameters are sampled from a Gaussian distribution with mean values in the parameter set (see Appendix D).
We note that Problem P2 is a regression problem from the I(V) space to the space of probability vectors. The aim is to go from Fig. 5(b) to Fig. 5(c). We used a neural network with three hidden layers, similar to the one employed for the single dot problem. The input and output layers now have a size equal to the number of points in the I(V) and CS(V) relationships, i.e., 100 × 100 points. It was possible to achieve 91 % accuracy on the state values, i.e., the state map in Fig. 5(c) could be reproduced across different devices with the state labels agreeing with the actual values 91 % of the time. As far as tuning the device is concerned, it is not very useful to know the probability vector at each point in the voltage space. Hence, we move to defining a probability vector for a sub-region as opposed to a single point in the voltage space.
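The island-counting step used for labelling can be illustrated with a short sketch. It assumes the electron density is available on a one-dimensional grid and that the leads have already been excluded from the array; the threshold and the function name are hypothetical, and the mapping from the island count to the four configurations (Appendix A) is not reproduced.

```python
def count_islands(density, threshold=0.0):
    """Count contiguous runs of grid points whose electron density exceeds threshold."""
    islands = 0
    inside = False  # True while the scan is inside an occupied segment
    for n in density:
        occupied = n > threshold
        if occupied and not inside:
            islands += 1  # a new island starts at this grid point
        inside = occupied
    return islands


# Toy density profile: two occupied segments separated by a depleted barrier.
n_x = [0.0, 0.3, 0.5, 0.0, 0.0, 0.2, 0.4, 0.0]
number_of_islands = count_islands(n_x)  # 2
```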

VI. AUTO-TUNING
We define the process of finding a range of gate voltage values in which the device is in a specific configuration as auto-tuning. The ability to characterize the state at any point in the voltage subspace provides a promising starting point for the automated tuning of the device to a particular state. In particular, having an automated protocol for achieving a stable desired electron state would allow for efficient control and manipulation of few-electron configurations. In practice, auto-tuning comprises two steps: (i) identifying the current state of the device and (ii) optimizing the voltage configuration to achieve a desired state. The steps are then repeated until the expected state is reached. For a device with m gates, this leads to the problem of finding an m-dimensional cuboid in the space of the m gate voltages. From a machine learning perspective, the recognition and tuning of the state can be expressed as the following two problems:
Problem P3a: State Identification for sub-region: Given I(V), find the average probability vector of the region.
Problem P3b: Auto-tuning: Given the I(V) characteristics, an initial subregion in V space and a desired dot configuration, find (tune to) a subregion with the desired dot configuration.
The idea behind auto-tuning in a two-dimensional space is presented in Fig. 6. For the case of the 5-gate double dot device defined in Sec. V, we consider the restricted problem in which the two gates $V_{P1}$ and $V_{P2}$ are controlled and the barrier gates remain fixed (see Fig. 3(a)). We start out in a double dot region and the desired dot configuration is set to be a single dot region.

A. State learning
As mentioned earlier, the first step in the auto-tuning process is the recognition of the existing configuration of the device. Such image classification problems have been successfully solved by convolutional neural networks (CNNs). CNNs have one or more sets of convolutional and pooling layers that precede the series of hidden layers (see Fig. 7(a)). A convolutional layer consists of a number of fixed size kernels which are convolved with the input. The number of kernels in a layer is referred to as the number of features in that layer. The weights in the kernel are determined by the training on the dataset. In order to reduce the dimensionality of the input for faster operation and to effectively learn larger scale features in the input, a convolutional layer is generally followed by a pooling layer. A pooling layer takes in a sub-region in the input and replaces it by an effective element in that region. A common pooling strategy is to let the effective element be the maximum element in the sub-region, which leads to the notion of a max-pooling layer.
The training set for the voltage subspace learning was generated based on the set of 1000 full two-dimensional maps of $I$ vs $(V_{P1}, V_{P2})$ from Sec. V. From these, 50 000 sub-maps of a fixed size (30 × 30 pixels) were generated. 90 % of the 50 000 samples were used as the training set and the rest were used to evaluate the performance of the network. The network achieved 96 % accuracy in prediction of the state. Two examples of the sub-maps and corresponding probability vectors from the evaluation stage are presented in Fig. 7(b), (c) and Fig. 7(d), (e) respectively.
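The two layer types described above can be illustrated on a small array. The sketch below is a plain-Python illustration, not the implementation used for the results: it applies a single kernel without padding (as in most CNN libraries, the kernel is not flipped) and then a non-overlapping max-pooling step. The toy image and kernel values are hypothetical.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def max_pool(image, size=2):
    """Replace each non-overlapping size x size block by its maximum element."""
    return [[max(image[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(image[0]) - size + 1, size)]
            for r in range(0, len(image) - size + 1, size)]


toy_image = [[1, 2, 0, 1],
             [0, 1, 3, 1],
             [2, 0, 1, 0],
             [1, 1, 0, 4]]
toy_kernel = [[1, 0],
              [0, 1]]  # responds to structure along the diagonal

feature_map = convolve2d_valid(toy_image, toy_kernel)  # 3 x 3 output
pooled = max_pool(toy_image)                           # [[2, 3], [2, 4]]
```

In a trained network the kernel entries are learned weights, and each convolutional layer holds several such kernels, one per feature.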
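The generation of sub-maps and the 90/10 split can be sketched as follows. The sketch assumes that windows are cut at uniformly random positions and that each full map contributes the same number of sub-maps (50 000 / 1000 = 50 per map); neither detail is stated in the text. A reduced number of placeholder maps is used to keep the example small, and all names are hypothetical.

```python
import random


def random_submap(full_map, size, rng):
    """Cut a size x size window at a random position out of a 2D map (list of rows)."""
    rows, cols = len(full_map), len(full_map[0])
    r0 = rng.randrange(rows - size + 1)
    c0 = rng.randrange(cols - size + 1)
    return [row[c0:c0 + size] for row in full_map[r0:r0 + size]]


def train_eval_split(samples, train_fraction, rng):
    """Shuffle the samples and split them into a training and an evaluation set."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(1)
# Placeholder current maps (all zeros); 20 maps stand in for the 1000 simulated ones.
full_maps = [[[0.0] * 100 for _ in range(100)] for _ in range(20)]
submaps = [random_submap(m, 30, rng) for m in full_maps for _ in range(50)]
train_set, eval_set = train_eval_split(submaps, 0.9, rng)
```

The label of each sub-map would be the average probability vector of the same window in the corresponding state map.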
For the training, we used two convolutional layers with kernels of size [5,5]. The layers both had 16 features. Each convolutional layer was followed by a max-pooling layer, wherein the pool size was set to [2,2]. The two hidden layers consisted of 1024 and 256 neurons. Rectified linear units (ReLU) with a dropout rate of 0.5 were used as neurons. Dropout regularization was introduced to avoid over-fitting [32]. Finally, an Adam optimizer was used to speed up the training process [33].
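The layer sizes above fix how many values reach the first hidden layer, up to details the text does not specify. The sketch below assumes stride-1 convolutions, pooling windows that do not overlap, and incomplete pooling windows being dropped; since the padding mode is not stated, both common choices are shown. The function names are hypothetical.

```python
def conv_output(n, kernel, padding):
    """Output length along one axis of a stride-1 convolution."""
    if padding == "same":
        return n
    if padding == "valid":
        return n - kernel + 1
    raise ValueError("padding must be 'same' or 'valid'")


def flattened_features(n, padding, kernel=5, pool=2, features=16, blocks=2):
    """Number of values entering the first hidden layer after the conv/pool blocks."""
    for _ in range(blocks):
        n = conv_output(n, kernel, padding)
        n = n // pool  # non-overlapping pooling, incomplete windows dropped
    return n * n * features


inputs_valid = flattened_features(30, "valid")  # 30 -> 26 -> 13 -> 9 -> 4, 4*4*16
inputs_same = flattened_features(30, "same")    # 30 -> 30 -> 15 -> 15 -> 7, 7*7*16
```

Under these assumptions a 30 × 30 sub-map is reduced to 256 values (no padding) or 784 values (padded convolutions) before the 1024-neuron hidden layer.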
We found that the introduction of the convolutional layers was crucial in achieving better results in terms of both accuracy and efficiency. Here, accuracy is evaluated by taking the state with the highest probability as the prediction, and efficiency is defined in terms of training time. We note that the state is predicted from the highest probability, though it might be possible that this highest probability is less than 0.5 (see Fig. 7(d) and Fig. 7(e)). Introducing more hidden layers did not affect the accuracy as much as introducing convolutional layers; this indicates that classification over the features seems to be a simpler task than producing an effective representation for the features.

B. Tuning the device
Having the state of the device identified for a sub-region, the procedure of auto-tuning corresponds to a simple optimization problem. Let $p$ be the probability vector of a given sub-region and $p_0$ be the desired probability vector. Define $\delta(p, p_0) = |p - p_0|$, where $|\cdot|$ denotes the vector norm. The problem of auto-tuning is then equivalent to minimization of $\delta(p, p_0)$ over the space of gate voltages V.
We used COBYLA from the Python package SciPy [34] as a numerical optimizer. The probability vector p was calculated using the neural network described in Sec. VI A. The starting region was set initially in a double dot region, as can be seen in Fig. 6(c). Around 15-30 evaluations of the probability vector using the CNN were required to ultimately find the required sub-region (Fig. 6(d), II in Fig. 6(a)) depending on the position of the initial sub-region. The starting region was varied over the space of $(V_{P1}, V_{P2})$ and in each case it was possible to auto-tune to the required sub-region.
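As a worked example of this fitness function, assume the Euclidean norm (the text does not specify which vector norm is used) and take a hypothetical sub-region that is mostly double dot, p = (0, 0, 0.2, 0.8), with a single dot target p_0 = (0, 0, 1, 0):

```latex
\delta(p, p_0) = \lVert p - p_0 \rVert_2
  = \sqrt{(0-0)^2 + (0-0)^2 + (0.2-1)^2 + (0.8-0)^2}
  = \sqrt{1.28} \approx 1.13
```

Under the same assumption, a pure double dot region gives $\delta = \sqrt{2} \approx 1.41$ and a pure single dot region gives $\delta = 0$, so the minimization drives the sub-region towards the target configuration.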
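The coupling between the classifier and the optimizer can be sketched with SciPy's `scipy.optimize.minimize` and `method="COBYLA"`. Everything else in this sketch is an assumption made for illustration: `predict_probability` is a hypothetical smooth stand-in for the trained CNN with no physical content, the voltages and the location of the toy single dot region are invented, the Euclidean norm is assumed for the fitness, and the optimizer options are not those used for the reported results.

```python
import math

import numpy as np
from scipy.optimize import minimize

# Desired state: single dot, with p ordered as (SC, Barrier, SD, DD).
P_TARGET = np.array([0.0, 0.0, 1.0, 0.0])


def predict_probability(center):
    """Hypothetical stand-in for the trained CNN (NOT the network from the text).

    Returns a toy probability vector for the sub-region centred at `center`
    (plunger voltages in mV); the single dot weight peaks around (250, 250) mV.
    """
    dist2 = (center[0] - 250.0) ** 2 + (center[1] - 250.0) ** 2
    p_sd = math.exp(-dist2 / (2.0 * 40.0 ** 2))
    return np.array([0.0, 0.0, p_sd, 1.0 - p_sd])


def delta(center):
    """Fitness: distance between predicted and desired probability vectors."""
    return float(np.linalg.norm(predict_probability(center) - P_TARGET))


start = np.array([200.0, 210.0])  # initial sub-region centre in the toy double dot area
result = minimize(delta, start, method="COBYLA",
                  options={"rhobeg": 20.0, "maxiter": 200}, tol=1e-3)
```

After the call, `result.x` holds the centre of the tuned sub-region, `result.fun` the final fitness and `result.nfev` the number of classifier evaluations. A derivative-free optimizer is a natural fit here because each evaluation of the fitness involves a network prediction rather than an analytic function of the voltages.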

VII. WORKING WITH EXPERIMENTAL DATA
We ran the CNN with the set of weights and biases established during the training on the simulated dataset described in Sec. VI A on an experimental dataset for a 3-gate device from our group [35]; the device is as described in [36], and the measurements are very similar to those presented in [37]. The device used in the experiment had two barrier gates (B1 and B2) and one plunger gate (P). In this device, the barrier gates were also used as generic plunger gates. By choosing appropriate voltage values for the plunger (P) gate, the device could be operated as a single dot or a double dot device. The measured data consisted of 2D differential conductance maps in the space of barrier gate voltages ($V_{B1}$ and $V_{B2}$) for varied but fixed values of the plunger voltage ($V_P$). Since the qualitative features are similar for a current map and a differential conductance map, we could feed the differential conductance output into the CNN.

A. Identification of state in experimental Data
For state identification, we considered small regions in the space of barrier voltage for a fixed plunger so that in each of the maps the device was in only one of the states, single or double dot. The maps were then taken at different values of the plunger voltage, ranging from −0.76 V to −0.60 V. The barrier gates are varied from −1.44 V to −1.34 V. Fig. 8(a) shows the 2D maps for different values of the plunger voltage. A gradual transition from a single dot device to a double dot device is seen.
Since our model produces the current value only qualitatively, the experimental data had to be re-scaled (by a constant number) prior to feeding into the CNN to match the simulated data. The CNN characterizes the state present in the device through a probability vector. Results for different values of the plunger voltage are shown in Fig. 8(b). As can be seen, our CNN can effectively distinguish a single dot and a double dot state from the current maps.

B. Auto-tuning of the device to a double dot state
Since the device state could be predicted with reasonable accuracy, we considered tuning gate voltages from one state to another based on the experimental data. For this part, a dataset with a larger variation in barrier voltages was used. Figure 9 shows 2D maps of differential conductance vs the barrier gate voltages ($V_{B1}$ and $V_{B2}$) for four different values of the plunger voltage. 2D sub-regions of these maps were used as input to the CNN in the tuning procedure.
We considered the auto-tuning of all three gate voltages (two barriers and the plunger). The final tuned state was set to a double dot region. See Fig. 10 for a visualization of the auto-tuning process. Two kinds of initial regions were considered: a single dot region (Fig. 11(a)) and a region with no current (Fig. 11(c)). In both cases, it was possible to find a set of barrier and plunger gate voltages that map to a double dot state (Fig. 11(b) and Fig. 11(d)). Effectively, the CNN predicted the probability vector describing the device state (Sec. V) from maps at different plunger voltages and the optimizer tuned the probability vector to a required form (in this case, a double dot). We used the same optimizer as described in Sec. VI B. The tuning process was completed within 40 to 50 iterations, depending on the initial region. Hence, the CNN coupled with an optimizer can be used with data from actual experiments for auto-tuning the device state.

A. Neural Networks in an Experimental Setting
We describe how a generalized auto-tuner neural network can be implemented in an experiment to automatically adjust the parameters of the device to an expected state. Consider a quantum dot device with a set of gate voltages V. We showed that a neural network can be trained to predict a probability vector p describing the state of an arbitrary sub-region in the V space. This predicted vector p, together with a destination probability vector, can then be fed to an optimizer controlling the voltage-space parameters in order to obtain the desired single or double dot state.
In particular, let us assume that $p_0$ is the probability vector of the desired state. Starting in a random region of the voltage space, the trained CNN can predict a probability vector $p$ for this region. A fitness function $\delta$ is then used to compare the predicted probability vector $p$ and the destination vector $p_0$. By minimizing $\delta$, the auto-tuning of the device takes place. An optimizer determines an optimal set of parameters that leads to a new sub-region. The process is then repeated until the fitness has been minimized to a particular value.
Since the entire voltage space V does not have to be explored, this implies a saving in experimental time. Also, the process requires no human intervention at any step in the tuning of the dot, which justifies the 'auto' in our definition of the auto-tuning problem.

B. Further Problems
We have presented novel techniques towards tuning of quantum dot devices. Given that building scalable quantum computing devices is now on the horizon, we hope that such methods will present themselves as natural subroutines for construction of real devices and will do away with the need to rely on heuristics. Hence, we outline further problems that are more realistic and useful in an experimental setting and can be potentially tackled with machine learning.
Fig. 10. The idea behind auto-tuning in the three-dimensional space of two barrier and plunger gate voltages. The successive squares represent the sub-regions encountered in the tuning process which are fed as input to the CNN. The arrow represents the direction of movement in going from an initial region to a final region. See Fig. 9 for the $V_{B1}$ and $V_{B2}$ range.

Problem P4: Inductive Learning
Moving to learning and auto-tuning of multiple dots will present new challenges as a result of the higher-dimensional space of gate voltages. This curse of dimensionality might detrimentally affect the design of auto-tuning algorithms. Pattern recognition in dimensions greater than 2 has not been studied extensively. Instead, we propose a different solution that can be generalized based on an inductive strategy. We refer to it as inductive learning.
In Inductive Learning, we make use of the fact that gates which are spatially far apart are likely to be loosely coupled to each other. Hence, a strategy emerges in which we use the auto-tuning algorithm to tune the first two barrier gates. A second type of neural network will be used to tune the plunger gate. This will be repeated until all the single dots formed by 2 barriers and a plunger are tuned to the required configurations.
Problem P5: Charge Tuning
The capacitance matrix is an effective model of the device and it determines the quantitative size of features in the current output. For instance, in the case of a single dot, the capacitance matrix can be directly related to the charging energy of the device. For the double dot, the capacitance matrix elements determine the size of the honeycomb hexagons. Hence, establishing a learning algorithm for the capacitance matrix is the next logical step. A capacitance matrix along with the voltage values of the gates can be used to estimate the charge on the device. Estimation of the charge can then be coupled with an optimizer to tune the device to required charge values, exactly like tuning the state as described in this paper. We refer to this process of learning the capacitance matrix and tuning the charge on the device as Charge Tuning.
We remark here that these further problems, and any others that might arise, may require different types of machine learning algorithms beyond the deep and convolutional neural networks described in this paper.

IX. CONCLUSION
We have described a bare-bones physical model to calculate the capacitance matrix for a linear array of gate-defined quantum dots. We used a Markov chain model over the charge configurations to simulate transport characteristics under infinitesimal bias. Our model can qualitatively reproduce the current vs gate voltage characteristics observed in experiments.
This model was used to train deep neural networks to learn the charge and state of single quantum dots from their current characteristics. We used a convolutional neural network to identify the state of a double quantum dot device from two-dimensional current maps in the space of gate voltages. We defined the auto-tuning problem for quantum dot devices and described strategies for tuning single and double dot devices. The trained networks were tested on experimental data and successfully distinguished the single and double dot device states. We also demonstrated auto-tuning in a three-dimensional space of barrier and plunger gate voltages on an experimental dataset.
Finally, we described how an auto-tuner network might be incorporated in an experiment and outlined further problems in tuning of quantum dot devices. Moreover, our work presents an example of machine learning techniques, specifically convolutional neural networks, fruitfully applied to experiments, thereby paving a path for similar approaches to a wide range of experiments in physics.

ACKNOWLEDGMENTS
We thank Eric Shirley and Michael Gullans of NIST for helpful discussions. SSK acknowledges financial support from the S. N. Bose Fellowship. We acknowledge funding from the NSF Physics Frontier Center at the JQI and the Army Research Laboratory funded CDQI. Any mention of commercial products is for information only; it does not imply recommendation or endorsement by NIST.
We model the electron density as an inhomogeneous electron gas, following the approach originally used in the statistical theory of Thomas and Fermi for atoms [38]. In this theory, properties of a homogeneous electron gas are applied locally to the inhomogeneous electron gas. This assumption is referred to as the Thomas-Fermi (TF) approximation and is justified when the electron density and the potential acting on it do not change appreciably over a characteristic electron wavelength.
An externally created potential $V(x)$, e.g. from gates, is assumed to be given. The electron density $n(x)$ is treated as the dynamical variable to be found in the theory. A Fermi level $\mu_F$ is given for the electron gas. For the purposes of the simulations presented in this paper, we assume an electron density $n(x)$ on a finite one-dimensional grid (Fig. 1(b)).

Consider a Fermi sea with Fermi energy $\mu_F$. Let the bottom of the conduction band be at energy $\epsilon_0$. In the absence of an external potential, the electron density in the conduction band can be calculated as

$$n = \int_{\epsilon_0}^{\infty} \frac{g(\epsilon)}{1 + e^{\beta(\epsilon - \mu_F)}}\, d\epsilon, \tag{A1}$$

where $g(\epsilon)$ is the density of states in the conduction band and $\beta$ is the inverse temperature. Due to the presence of an external potential, the conduction band minimum shifts in energy. Moreover, the electron density produces an effective potential due to the Coulomb self-interaction. As a result, the band minimum is modified as

$$\epsilon_0(x) = V(x) + \int K(x, x')\, n(x')\, dx', \tag{A2}$$

where $\epsilon_0(x)$ is the new spatially varying band minimum, $V(x)$ is the externally applied potential and $K(x, x') = K_0 / \sqrt{(x - x')^2 + \sigma^2}$ gives the Coulomb energy between points $x$ and $x'$. $K_0$ sets the energy scale of the interaction. A softening parameter $\sigma$ has been added to the denominator and serves a twofold purpose: it models the effective one-dimensional interaction for a higher-dimensional gas of electrons as would be present in the device, and it prevents a numerical singularity at $x = x'$.
The integral $\int K(x, x')\, n(x')\, dx'$ gives the effective Coulomb potential created as a result of the electron density $n(x)$. Since the effects of the electron density on the conduction band minimum are also included, equation A1 with the modified band minimum, equation A2, provides a self-consistent calculation of the electron density $n(x)$.
In our calculations, we assume a two-dimensional electron gas (2DEG) to model the electron density of states. The density of states for a 2DEG, $g(\epsilon) = g_0 = m^* / (\pi \hbar^2)$, is a constant. Equation A1 was solved iteratively. The starting solution was taken as $n(x) = 0$, which was plugged into A2. The modified band minimum was then used to calculate $n(x)$ using A1. This iteration was repeated until the density $n(x)$ converged. The strength of the Coulomb interaction was increased linearly to its required strength over a fixed initial number of iterations to avoid pathologies associated with numerical convergence in the self-consistent calculation.
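The iteration described above can be sketched as follows. This is a minimal sketch, not the code used for the paper. It assumes the band minimum takes the form $\epsilon_0(x) = V(x) + \int K(x,x')\,n(x')\,dx'$, uses the closed form of the Fermi integral for a constant density of states, $n = (g_0/\beta)\ln\left(1 + e^{\beta(\mu_F - \epsilon_0)}\right)$, and works in arbitrary units. All default parameter values and the function name `thomas_fermi_density` are illustrative choices.

```python
import numpy as np

def thomas_fermi_density(V, x, mu=0.1, K0=0.01, sigma=1.0, g0=1.0, beta=100.0,
                         n_ramp=20, max_iter=500, tol=1e-8):
    """Self-consistent Thomas-Fermi density on a uniform 1D grid (arbitrary units).

    Assumes eps0(x) = V(x) + sum_x' K(x, x') n(x') dx and a constant DOS g0.
    """
    V = np.asarray(V, dtype=float)
    x = np.asarray(x, dtype=float)
    dx = x[1] - x[0]
    # Softened Coulomb kernel K(x, x') = K0 / sqrt((x - x')^2 + sigma^2)
    K = K0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + sigma ** 2)
    n = np.zeros_like(x)  # starting solution n(x) = 0
    for it in range(max_iter):
        # Ramp the interaction strength linearly over the first n_ramp iterations
        strength = min(1.0, (it + 1) / n_ramp)
        eps0 = V + strength * (K @ n) * dx
        # Closed form of the Fermi integral for constant DOS:
        # n = (g0 / beta) * ln(1 + exp(beta * (mu - eps0)))
        n_new = (g0 / beta) * np.logaddexp(0.0, beta * (mu - eps0))
        converged = it >= n_ramp and np.max(np.abs(n_new - n)) < tol
        n = n_new
        if converged:
            break
    return n
```

For example, a single Gaussian barrier `V = 0.2 * np.exp(-x**2 / 50.0)` on `x = np.linspace(-50, 50, 101)` yields a density that is strongly suppressed under the barrier and finite on either side. For stronger interactions a plain fixed-point iteration may oscillate; damping the update is a common remedy, but it is not part of the procedure described in the text.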
The device is assumed to be connected to large reservoirs of electrons that act as the contacts. The contacts are assumed to be kept at a fixed and equal chemical potential $\mu = \mu_F$. As an approximation, in the absence of gate potentials the conduction band minimum of the entire one-dimensional device is assumed to be a constant function of $x$, equal to the chemical potential of the contacts. Intuitively, the points where $V(x) = \mu_F$ are the classical turning points for the electrons and separate the island regions from the barrier regions. The regions where $\mu > V(x)$ constitute islands of electrons, and the remaining regions where $\mu < V(x)$ (classically forbidden regions) form barriers between islands.
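The classification into islands and barriers amounts to finding contiguous runs of grid points with $\mu > V(x)$. A small sketch of this step is given below. The function name `find_islands` is hypothetical, and in this sketch a run that touches either end of the grid would have to be interpreted separately (for instance as part of a lead), which is an assumption not taken from the text.

```python
import numpy as np

def find_islands(V, mu):
    """Return (start, stop) index pairs, stop exclusive, of contiguous grid
    regions where mu > V(x), i.e. the classically allowed regions."""
    allowed = np.asarray(V, dtype=float) < mu
    # Pad with False so that runs touching the grid ends are also closed
    padded = np.concatenate(([False], allowed, [False]))
    # Indices where the allowed/forbidden status changes (rising, falling, ...)
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))
```

The grid points between two consecutive pairs are the barrier regions over which a tunneling integral would be evaluated.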

Calculation of a capacitance model
Consider a system of $m$ conductors. A capacitance can be defined between each conductor and every other conductor, as well as a capacitance from each conductor to ground. The relation between the charges on the conductors and their electrostatic potentials can then be conveniently expressed with a capacitance matrix $C$ of size $m \times m$:

$$Q = C V.$$
Q is the vector of charges on each conductor and V is a vector with the voltage on each conductor with respect to a ground potential. The conductors are coupled capacitively to fixed voltages which act as gates in the actual device. The electrostatic energy E of the system of conductors can be expressed as: where Z is the vector of induced charges due to the gates. The physics of transport in electrostatically coupled quantum dots with negligible inter-dot tunnel conductance can be described by an orthodox Coulomb blockade theory [27]. We work with a purely classical description of the electron density islands in our one dimensional system without the inclusion of discrete quantum states. We regard them as a system of conductors having a discrete number of electrons and influencing the charges on each other via a capacitance matrix.
A capacitance model of the system is defined as a tuple $(C, Z)$, where $C$ is the capacitance matrix of the islands and $Z$ is the vector of induced charges. We wish to establish a procedure to calculate a capacitance model for the islands formed in our system.
The electron density $n(x)$ is calculated using equation A1. Assume that the electron density is such that it is nonzero in certain regions (the islands) and zero between the islands (Fig. 1(b)). $Z$ is then calculated by integrating the electron density over each island and is treated as the charge induced by the gate potentials in the capacitance model.
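As a small illustration of this step, the induced charges can be obtained by a Riemann sum of the density over each island. The sketch assumes a uniform grid, islands given as hypothetical `(start, stop)` index pairs with `stop` exclusive, and charges expressed in units of the electron charge.

```python
import numpy as np

def induced_charges(n, x, islands):
    """Integrate the density n(x) over each island (simple Riemann sum).

    islands: list of (start, stop) grid-index pairs, stop exclusive.
    Returns the vector Z in units of the electron charge.
    """
    n = np.asarray(n, dtype=float)
    dx = x[1] - x[0]  # uniform grid spacing assumed
    return np.array([np.sum(n[a:b]) * dx for a, b in islands])
```

The entries of the returned vector are real numbers in general, in contrast to the integer electron numbers used for the charge configurations.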
Let $Q$ be the vector of charges on each island. Since the number of electrons on each island is assumed to be an integer, each element of $Q$ is an integer times the electronic charge, as opposed to the elements of $Z$, which can take arbitrary real values. The energy E of a charge configuration is given as: The energy calculated using this capacitance model is a manifestation of the kinetic energy of the Fermi sea in each island and the Coulomb interaction between the islands. We can use this correspondence to calculate the inverse capacitance matrix elements $E_{i,j}$ using the Coulomb interaction potential $K(x, x')$ and the electron density $n(x)$.
where $\delta_{i,j}$ is the Kronecker delta and $c_k$ is the coefficient which sets the scale for the kinetic energy of the Fermi sea in each island. The integration subscript $i$ denotes that the integration is to be performed only over the extent of the $i$th island. The denominator has been added to normalize with the total number of electrons on each island. Determination of the elements $E_{i,j}$ and $Z$ amounts to determination of the capacitance model for the islands.

Calculation of equilibrium charge distribution
Once the capacitance model has been calculated, we calculate the energies of the charge configurations closest to the induced charge values, $Z$, while constraining the number of electrons on each island to be an integer. The equilibrium charge configuration is set to the one with the lowest energy.

In order to simulate transport characteristics and calculate a current given a potential profile $V(x)$, we introduce a Markov chain model. The actual physics included in this abstract model is calculated using the Thomas-Fermi approximation defined in appendix A. We assume that the contacts are kept under an infinitesimal bias so that the current flow can be modeled by an elastic tunneling Hamiltonian. The tunnel rates are estimated using the WKB approximation.
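The equilibrium charge search described at the start of this subsection can be sketched as a small brute-force enumeration. The sketch assumes a quadratic energy of the form $E(N) = \tfrac{1}{2}(N - Z)^T E_{\mathrm{inv}} (N - Z)$ with charges in units of the electron charge. The prefactor and the name `E_inv` for the inverse capacitance matrix are assumptions made for illustration, and "closest" is interpreted as the floor and ceiling of each $Z_i$.

```python
import itertools
import numpy as np

def energy(N, Z, E_inv):
    """Assumed quadratic capacitance-model energy, 0.5 * (N - Z)^T E_inv (N - Z)."""
    d = np.asarray(N, dtype=float) - np.asarray(Z, dtype=float)
    return 0.5 * float(d @ np.asarray(E_inv, dtype=float) @ d)

def equilibrium_configuration(Z, E_inv):
    """Lowest-energy integer configuration among those closest to Z.

    For each island the candidates are floor(Z_i) and ceil(Z_i), never negative.
    """
    options = [sorted({max(0, int(np.floor(z))), max(0, int(np.ceil(z)))})
               for z in Z]
    return min(itertools.product(*options), key=lambda N: energy(N, Z, E_inv))
```

The number of candidates grows as $2^k$ with the number of islands $k$, which is harmless for the single and double dots considered here.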

Graph definitions and construction
Consider a system with $k$ islands, where island $i$ is assumed to have $N_{d_i}$ electrons ($i = 1, 2, \ldots, k$). $N_d = (N_{d_1}, \ldots, N_{d_k})$ is referred to as a charge configuration of the islands. Let $G = (V, E)$ be a directed graph. Each node $v \in V$ is a charge configuration as defined above. An edge exists between two nodes if they are connected by an electron tunneling event, either between adjacent islands or through the leads. We introduce an order $p$ of the graph model, defined such that $|N_{d_i} - Z_i| \le p$ for all $i = 1, \ldots, k$.
Each graph is constructed in a breadth-first fashion from a starting node. The charge configuration $Z$ as defined in appendix A is used as the starting node for constructing the graph. All charge configurations which can be reached from this state by a single electron tunneling event are found and added to the set of nodes. This procedure is repeated for the new nodes added to the graph until no new nodes can be added under the order $p$ of the graph model.
In this work, all Markov chain graphs are constructed to order $p = 1$, implying only single electron tunneling events. In future work, tunneling of multiple electrons, i.e., co-tunneling, can be incorporated by going to higher orders in the graph model. Fig. 12 shows an example of a simple state diagram for the Markov chain in the case of a double dot, with arrows representing the possible state transitions.
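The breadth-first construction can be sketched as below. The sketch makes a few assumptions that are not spelled out in the text: islands form a linear chain, only the first and last islands exchange electrons with the leads, electron numbers cannot be negative, and the caller supplies an integer starting configuration (for example the equilibrium configuration). The function names are hypothetical.

```python
from collections import deque

def neighbours(N):
    """Configurations reachable from N by one single-electron tunneling event."""
    k = len(N)
    out = []
    # Lead <-> end island: add or remove one electron on the first or last island
    for i in sorted({0, k - 1}):
        for s in (+1, -1):
            M = list(N)
            M[i] += s
            out.append(tuple(M))
    # Adjacent islands: move one electron between islands i and i + 1
    for i in range(k - 1):
        for s in (+1, -1):
            M = list(N)
            M[i] += s
            M[i + 1] -= s
            out.append(tuple(M))
    return out

def build_graph(start, Z, p=1):
    """Breadth-first construction of the charge-configuration graph of order p."""
    def allowed(N):
        return all(n >= 0 and abs(n - z) <= p for n, z in zip(N, Z))

    start = tuple(start)
    nodes, edges, queue = {start}, set(), deque([start])
    while queue:
        N = queue.popleft()
        for M in neighbours(N):
            if not allowed(M):
                continue
            edges.add((N, M))  # directed edge N -> M
            if M not in nodes:
                nodes.add(M)
                queue.append(M)
    return nodes, edges
```

For a double dot with `Z = (1.5, 1.5)` and `start = (1, 1)`, the construction yields the four configurations with one or two electrons per dot, connected by ten directed edges.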

Calculation of edge weights
For two adjacent nodes, the rate of going from one node to the other is modeled as a product of two factors: a selection rule set by the capacitance model energies and a WKB tunnel rate. The rate for going from node 1 to 2 (arb. units) is given as: where $f_T(E) = 1 / \left(1 + \exp(E / kT)\right)$ is the Fermi function at temperature $T$, $E_2$ and $E_1$ are the capacitance model energies calculated for the charge configurations of nodes 2 and 1 respectively, $p_{WKB}$ is the WKB tunnel probability and $\tau = l_{dot} / v_e$ is the classical travel timescale inside a dot for an electron ($l_{dot}$ is the dot size and $v_e$ is the classical electron velocity).
The WKB tunnel probability $p_{WKB}$ is calculated by treating the electron as a free particle of energy equal to the Fermi level $\mu_F$ moving in an effective potential including inter-electron repulsion, $V_e(x) = V(x) + \int K(x, x')\, n(x')\, dx'$. The tunneling probability is then computed as follows: where the range of integration extends over the barrier region adjacent to the two locations through which the electron travels. The classical travel time scale, $\tau$, is calculated by treating the electron as a non-relativistic particle with kinetic energy $\mu_F$ and calculating the time it would take for the electron to traverse the extent of each island.
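The ingredients of the edge weights can be illustrated with a short sketch. It is written under explicit assumptions rather than as the expressions used in the paper: the WKB probability is taken in its textbook form $p_{WKB} = \exp\left(-2\int \sqrt{2m(V_e(x) - \mu_F)}/\hbar\, dx\right)$, the electron velocity is $v_e = \sqrt{2\mu_F/m}$, and the rate is assumed to combine the factors as $f_T(E_2 - E_1)\, p_{WKB} / \tau$. Units with $m = \hbar = 1$ and all function names are illustrative.

```python
import numpy as np

def wkb_probability(V_e, x, mu, barrier, m=1.0, hbar=1.0):
    """Textbook WKB transmission through the barrier given by grid indices
    (start, stop), stop exclusive; the integrand is zero where V_e <= mu."""
    a, b = barrier
    dx = x[1] - x[0]  # uniform grid spacing assumed
    excess = np.clip(np.asarray(V_e, dtype=float)[a:b] - mu, 0.0, None)
    kappa = np.sqrt(2.0 * m * excess) / hbar
    return float(np.exp(-2.0 * np.sum(kappa) * dx))

def travel_time(l_dot, mu, m=1.0):
    """Classical time to cross a dot of size l_dot with kinetic energy mu."""
    v_e = np.sqrt(2.0 * mu / m)
    return l_dot / v_e

def fermi(E, kT):
    """Fermi function f_T(E) = 1 / (1 + exp(E / kT))."""
    return 1.0 / (1.0 + np.exp(E / kT))

def tunnel_rate(E1, E2, p_wkb, tau, kT):
    """Assumed combination of the factors: f_T(E2 - E1) * p_WKB / tau."""
    return fermi(E2 - E1, kT) * p_wkb / tau
```

For a square barrier of height 1.5 and width 2 with `mu = 1`, the decay constant is 1 and the sketch gives a probability of $e^{-4}$. Transitions that raise the capacitance model energy by much more than $kT$ are suppressed by the Fermi factor.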