Wafer-scale functional circuits based on two dimensional semiconductors with fabrication optimized by machine learning

Triggered by the pioneering research on graphene, the family of two-dimensional layered materials (2DLMs) has been investigated for more than a decade, and appealing functionalities have been demonstrated. However, there are still challenges inhibiting high-quality growth and circuit-level integration, and results from previous studies are still far from complying with industrial standards. Here, we overcome these challenges by utilizing machine-learning (ML) algorithms to evaluate key process parameters that impact the electrical characteristics of MoS2 top-gated field-effect transistors (FETs). The wafer-scale fabrication processes are then guided by ML combined with grid searching to co-optimize device performance, including mobility, threshold voltage and subthreshold swing. A 62-level SPICE modeling was implemented for MoS2 FETs and further used to construct functional digital, analog, and photodetection circuits. Finally, we present wafer-scale test FET arrays and a 4-bit full adder employing industry-standard design flows and processes. Taken together, these results experimentally validate the application potential of ML-assisted fabrication optimization for beyond-silicon electronic materials.

1. In Fig. 2 a., the MoS2 in the process is marked as 2-3nm (~3-5 layers), but the authors claim a monolayer MoS2 film in the Fig. S1. Thickness of the grown sample, such as monolayer or few layer MoS2 film, is significant to fabrication and performance. It is needed to confirm this issue and provide essential data, such as PL and Raman mapping of the representative devices. If the few layer MoS2 is adopted for most demonstrations, please explain the reasons.
2. In Fig. 1.b, the seeding layer is partially deposited on the active region of the FET. Please explain why not fully deposited on the active region? The asymmetric FET design might cause some issues.
3. In Fig. 2, the device is optimized with mobility with the Vth of ~2.1V. It would be ideal to include more discussion to explain how to realize the Vth tuning in loading transistor and keep the optimized mobility.
4. In this study, the authors mainly focus on mobility and Vth but more significant properties of the device are essential for real application, such as speed and power consumption.
5. Fabrication of top gate dielectrics on the surface of 2D materials is significant to device performances. It would be ideal to include more discussion on this issue and more details on ALD process of the high k dielectrics.
6. It seems that the devices are directly fabricated on sapphire wafer. Is it required to avoid damage in the transfer process for better electronic performances? It would be ideal to include more discussion on the issue because further fabrication or integration with the sapphire wafer might be issues.
7. All measurements of the various logic circuit units are plotted on a time scale of seconds. It would be ideal to show high frequency output characteristics.
8. In Fig. S17, the photoresponse time of the device, such as the rise and fall times, is on the scale of seconds. The performance might be an issue for real application. Is it due to an interface issue or any other possible reasons?
9. In Fig. 3, the optimized performances are demonstrated with specific aspect ratio. This design of the device might raise issues on the speed or operations. Please explain this issue.
10. In most reported papers on the grown MoS2, overall performances are usually determined by many issues, such as grain size, interface, crystallinity, defect density, variation in the batch synthesis and more process details. The issues might be highly coupled. It might be a bit difficult for readers to understand how machine learning could work for the optimization.
11. It is necessary to include detailed information on the fabricated devices, such as length/width and geometry of the FETs, material and size of the seeding layers, and thickness of the top dielectric.

THIS REPORT WAS WRITTEN IN COLLABORATION WITH REFEREE #2
The manuscript "Wafer-Scale Functional Circuits Based on Two Dimensional Semiconductors with Fabrication Optimized by Machine Learning" by Xinyu Cheng and co-authors describes a Machine Learning based process optimization method. This method was used to optimize the performance of MoS2 channel field effect transistors. After the authors identified the optimized process, it was used to fabricate different circuits and to realize wafer scale MoS2 device fabrication. While it is an interesting approach, the paper does not convincingly describe the advantage of the approach compared to conventional process development. In particular, it is not clear where the ML-based pattern recognition provides an advantage over classic design of experiment guidelines. I therefore think this manuscript is not appropriate for publication in Nature Communications.
Below are my detailed comments.
1. What is the advantage of the process optimization based on ML, over the traditional way (in which we optimize each step and then combine whatever we would like to have for a specific application). The traditional way of designing an experiment even appears to be more efficient, since one may not need to go through all fabrication process possibilities. In a clean DOE, one can run specific process combinations and just by looking at the data identify the most promising route. In the current form of the manuscript, this aspect is not discussed at all.
4. Figure 2f. First, grouping the data, as was done here, does not require machine learning. It is obvious to the naked eye what is a favourable outcome and what is not. Second, the figure leaves open the question about the other gray dots. Are they all from the same process flow? This would mean that the authors have in total only 5 process flows, which is not in agreement with the main text. The data points should be made distinguishable (if applicable).
5. The manuscript treats the device aspects as a black box, i.e. there is no correlation between measured values and the underlying physics. However, in device operation and optimization, this understanding is extremely important. It would be very interesting to understand how ML could be used to gain understanding in these intricate details?
6. Assuming that the main message of the paper is to establish Machine Learning for process optimization, I am wondering how the approach would work for a mature technology. In MoS2, where there are extremely large differences, a simple algorithm can easily work. This has been shown in the past with statistical analysis for graphene and MoS2 devices (refs. 1-4). It is easy, because the differences in devices and circuits are very large from process parameter to process parameter. It would be much more interesting to understand if the method can also work to optimize a mature technology, where the changes from run to run are minuscule.
7. In summary, the authors show some circuit examples and wafer scale fabrication. These represent in themselves nice results. However, these results are not discussed in detail and no insight is provided on the relationship between process, underlying physics phenomena and device / circuit performance. As such, this aspect of the paper does not contribute to the state of the art. Thus, the remaining aspect is the focus on machine learning. Here, the device / circuit results are not discussed in detail in respect to the machine learning process development, which arguably is the focus of the paper. Hence, the relevance of the ML procedures, also compared to previous statistical analysis of wafer scale 2D materials (refs. 1-4), is not entirely clear.
The hysteresis originates from the complex interfaces and is widely attributed to trapping/de-trapping processes at the gate-oxide and oxide-channel interfaces [1]. These traps are mainly caused by adsorbed impurities (water and gas molecules, etc.) on the channel surface [2] and by trapped charges in the channel, in the dielectric, and at the channel/dielectric interfaces [3][4][5]. The TG-FET structure adopted in our work excludes the influence of most impurity molecules, so the hysteresis of our devices is mainly affected by the latter.
During the preparation of this reply, we measured previously fabricated devices to obtain their hysteresis characteristics, and the following has been added in the revised supplementary materials:
Figure (Fig. S8 in SI): [...] with different processes (a-d in Table R1) at VDS = 0.1 V.

Impact of seeding layer on the hysteresis characteristics for MoS2 TG-FETs
In this group of comparison experiments, the hysteresis characteristics of MoS2 TG-FETs prepared with various seeding layers (SLs) are investigated. The fabrication recipes and the hysteresis characteristics of the top-gated MoS2 FETs are shown in Table S7 and Fig. S8. The process using 2 nm SiO2 as the seeding layer shows the smallest hysteresis, which indicates that the border trap density of the SiO2/HfO2 interface is lower than that of the other conditions (ref. 7). The hysteresis voltage is determined by the VT difference between the forward and backward branches of the dual-sweep transfer characteristic curves.
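The hysteresis-voltage definition above (the VT difference between the two branches of a dual-sweep transfer curve) can be sketched in a few lines. This is only an illustration under stated assumptions: VT is taken with a constant-current criterion, and the threshold current `i_crit`, the helper names, and the synthetic curves are hypothetical, not taken from the manuscript or its SI.

```python
import numpy as np

def vt_constant_current(vgs, ids, i_crit):
    """Return the gate voltage where the drain current crosses i_crit.

    vgs must be in ascending order and ids must rise monotonically with vgs
    (n-type device). Interpolation is done on log10(ids).
    """
    log_i = np.log10(np.abs(ids))
    return float(np.interp(np.log10(i_crit), log_i, vgs))

def hysteresis_voltage(vgs_fwd, ids_fwd, vgs_bwd, ids_bwd, i_crit):
    """|VT(backward) - VT(forward)| for one dual-sweep transfer curve."""
    vt_fwd = vt_constant_current(vgs_fwd, ids_fwd, i_crit)
    # The backward branch is recorded from high to low VGS; flip it to ascending order.
    vt_bwd = vt_constant_current(vgs_bwd[::-1], ids_bwd[::-1], i_crit)
    return abs(vt_bwd - vt_fwd)

def toy_transfer(vgs, vt, i_on=1e-5, slope=0.1):
    """Synthetic, smooth n-FET transfer curve (illustrative data only)."""
    return i_on / (1.0 + np.exp(-(vgs - vt) / slope))

# Hypothetical dual sweep: the backward branch is shifted by 0.3 V.
vgs_up = np.linspace(-1.0, 5.0, 301)
vgs_down = vgs_up[::-1]
ids_up = toy_transfer(vgs_up, vt=2.0)
ids_down = toy_transfer(vgs_down, vt=2.3)

delta_vt = hysteresis_voltage(vgs_up, ids_up, vgs_down, ids_down, i_crit=1e-7)
print(f"hysteresis voltage = {delta_vt:.3f} V")
```

Other VT criteria (for example linear extrapolation of the transfer curve) would work the same way, as long as the same criterion is applied to both sweep directions.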
Meanwhile, the border trap density can be extracted by measuring the low-frequency noise, as shown in Fig. R2 (Fig. S10 in SI). A relatively small border trap density of 3.5994×10¹⁹ cm⁻³ eV⁻¹ is estimated from the device with the smallest hysteresis (SiO2/HfO2 interface), lower than the values stated in previous reports [6][7][8].
In our previous work [9], we also systematically investigated the hysteresis of 2D FETs based on micro-scale mechanically exfoliated MoS2 sheets by optimizing the TG material and structure, as shown in Fig. R3. A graphene TG gives a clean interface between the TG and the dielectric layer, giving rise to a small FET hysteresis. Nevertheless, it was not adopted in this work because it is impractical and too complex for wafer-scale fabrication.

Q2: Throughout the report, the authors investigate a few key parameters for the performance of the transistors, such as mobility and threshold voltage. One of the important factors especially for extremely scaled transistors is contact resistance. Did the authors investigate the influence of the process parameters on the contact resistance?

Reply to the reviewer:
We thank the reviewer for the valuable suggestion. We strongly agree that the contact resistance (Rc) is a key parameter for the practical application of 2D FETs. A low Rc is critical to improving on-state current and high-frequency performance. Previous studies have shown that the high contact resistance stems from the unique interface between 2D semiconductors and 3D metals [10,11]. In modern Si CMOS technology, heavy doping through ion implantation or alloying of the contact region is usually adopted to achieve an Ohmic contact, but this is not applicable to 2D materials because their lattices are easily destroyed owing to their atomic thickness. [...] Unfortunately, the advantages of these methods displayed in mechanically exfoliated samples are no longer evident for wafer-scale fabrication because of their uncontrollability, complexity, and incompatibility with industrial equipment.
To obtain a method suitable for large-scale preparation, we simply deposited four types of metals (Ti/Au, Au, In/Au and Ag/Au) as the source and drain electrodes, as shown in Fig. R4 (Fig. S2 in SI). The extracted Rc results in Fig. R5 (Fig. S3 in SI) show that, in this comparison group, the devices with Au or Ti/Au electrodes exhibit a better contact and a larger on-state current. We did not further examine the physical origin of these results because this work focuses on the optimization methodology of wafer-scale fabrication.

Q3: [...] give values in the MHz region, which is two orders of magnitude larger than obtained in this work (https://doi.org/10.1021/nl302015v). The investigation of the low operating frequency may give insights into further improvement of the circuits.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion.
The frequency is indeed a significant index for the ring oscillator (RO) device. The self-oscillation frequency is mainly influenced by the driving current, gate capacitance, parasitic capacitance, number of stages, supply voltage, etc. We agree with the reviewer that the self-oscillation frequency of our ROs is indeed lower than that obtained in the early work [16], which is mainly caused by the following reasons: (1) In most of the previous results, mechanically exfoliated single-crystalline MoS2 sheets were adopted to fabricate RO devices, and their device performance is much higher than that of wafer-scale polycrystalline MoS2 films synthesized by the CVD method; (2) The channel length of the previously demonstrated devices (< 1 micron) is much smaller than that in our work (tens of microns), which gives rise to a larger drive current as well as a smaller gate capacitance; (3) In our MoS2 transistors, a high parasitic capacitance further reduces the RO frequency due to a relatively large overlap region between the gate and source/drain electrodes.
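How the factors listed above combine can be illustrated with the common first-order ring-oscillator estimate f = 1/(2·N·t_pd), where the per-stage delay is approximated as t_pd ≈ C·VDD/I. This is a rough textbook model, not the authors' analysis, and every number below (stage count, capacitance, supply voltage, drive current) is a hypothetical placeholder rather than a measured value.

```python
def stage_delay(c_stage, v_dd, i_drive):
    """First-order propagation delay of one inverter stage: t_pd ~ C * VDD / I."""
    return c_stage * v_dd / i_drive

def ring_oscillator_frequency(n_stages, c_stage, v_dd, i_drive):
    """First-order self-oscillation frequency: f = 1 / (2 * N * t_pd)."""
    return 1.0 / (2.0 * n_stages * stage_delay(c_stage, v_dd, i_drive))

# Hypothetical example values (not taken from the manuscript).
f_base = ring_oscillator_frequency(n_stages=5, c_stage=5e-12, v_dd=3.0, i_drive=1e-6)
# A shorter channel raises the drive current and lowers the gate capacitance.
f_scaled = ring_oscillator_frequency(n_stages=5, c_stage=0.5e-12, v_dd=3.0, i_drive=10e-6)

print(f"baseline: {f_base:.3e} Hz")
print(f"scaled:   {f_scaled:.3e} Hz")
```

In this simple model, a tenfold larger drive current together with a tenfold smaller stage capacitance raises the frequency by two orders of magnitude, which is the kind of scaling argument made in reasons (2) and (3).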
Considering the above reasons, there is considerable room for future improvement of our MoS2 circuits through optimization of the fabrication techniques, such as fine alignment during successive lithography steps and further scaling down through electron-beam lithography (in this work a laser direct-writer is used to perform lithography, which is fast but limited by a low resolution). However, at this moment, these are not the key issues of this work. To make this clear, we have modified the manuscript as follows: The self-oscillation frequency of our ROs is relatively low compared with previous results (ref. 11), but there is considerable room for future improvement via down-scaling of the device size.
Q4: Most of the measurements were performed in either DC mode or at a very low frequency. It seems important to discuss the possible operating frequencies and the limiting factors.

Reply to the reviewer:
We thank the reviewer for the constructive comment.
The following discussion has been added in the revised manuscript: [...]
Q5: [...] On the other hand, sensing applications require high-frequency operation and thus it is important to explicitly mention that the measurements were acquired in low-frequency mode.

Reply to the reviewer:
We thank the reviewer for suggesting this useful reference [21], in which the operating modes of the MoS2 photodetectors were investigated systematically. We also cite this work in the revised manuscript.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion. This is indeed a milestone paper, demonstrating fundamental building blocks of analog electronic devices [23]. We have added this paper to the revised manuscript and to the revised Table S8 in the supplementary information. The mobility in the table is now also noted as the maximum mobility, according to the suggestion.
Q7: The yield of a 1-bit full-adder with 39 n-FETs is 50%. It is worth discussing the reasons for the failures in non-working devices and possible routes to overcome these issues.

Reply to the reviewer:
We thank the reviewer for the constructive suggestion.
As is well known, many obstacles remain for the wafer-scale application of 2D materials, including scalable synthesis and device processing [24][25][26][27]. In this work, the failed 1-bit full-adder circuits were all tested, and the failures were attributed to gate leakage, high-barrier contacts, and unstable VT drift, all of which can result in incorrect logical output in the adder circuit. The following explanation is added in the revised supplementary materials: We attribute the low yield to two main issues: 1) The poor uniformity of the MoS2 film, such as grain boundaries and local defects, is detrimental to the yield of wafer-scale integrated circuits.
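To put the 50% circuit yield quoted in Q7 into perspective, a back-of-the-envelope estimate can relate it to a per-transistor yield. The sketch below assumes that a circuit works only if all of its FETs work and that FET failures are independent; this independence is an assumption made here for illustration and is not claimed in the reply. The function names are hypothetical.

```python
def per_device_yield(circuit_yield, n_devices):
    """Per-device yield implied by a circuit yield, assuming independent failures."""
    return circuit_yield ** (1.0 / n_devices)

def circuit_yield_from_device(device_yield, n_devices):
    """Circuit yield when every one of n_devices must work (independent failures)."""
    return device_yield ** n_devices

# Values from Q7: a 1-bit full adder with 39 n-FETs and a 50% circuit yield.
implied = per_device_yield(0.5, 39)
print(f"implied per-FET yield: {implied:.4f}")

# How the circuit yield would respond to a better per-FET yield (illustrative).
for y in (0.98, 0.99, 0.999):
    print(f"per-FET yield {y:.3f} -> 39-FET circuit yield {circuit_yield_from_device(y, 39):.3f}")
```

Under these assumptions the implied per-FET yield is roughly 98%, which shows how quickly even a small fraction of failing transistors limits the yield of a 39-FET circuit.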

Reviewer#2&4
The manuscript "Wafer-Scale Functional Circuits Based on Two Dimensional Semiconductors with Fabrication Optimized by Machine Learning" by Xinyu Cheng and coauthors describes a Machine Learning based process optimization method. This method was used to optimize the performance of MoS2 channel field effect transistors. After the authors identified the optimized process, it was used to fabricate different circuits and to realize wafer scale MoS2 device fabrication. While it is an interesting approach, the paper does not convincingly describe the advantage of the approach compared to conventional process development. In particular, it is not clear where the ML-based pattern recognition provides an advantage over classic design of experiment guidelines. I therefore think this manuscript is not appropriate for publication in Nature Communications.

Reply to the reviewer:
We sincerely thank the two reviewers for the careful reading and constructive feedback, and we have made the following revisions to our manuscript according to your valuable suggestions and comments.
Q1: What is the advantage of the process optimization based on ML, over the traditional way (in which we optimize each step and then combine whatever we would like to have for a specific application). The traditional way of designing an experiment even appears to be more efficient, since one may not need to go through all fabrication process possibilities. In a clean DOE, one can run specific process combinations and just by looking at the data identify the most promising route. In the current form of the manuscript, this aspect is not discussed at all.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion. The advantage of the ML is indeed the core point of our work. Although we have tried to clarify it in our original manuscript, we agree that it was not clear enough and requires additional illustration. The following is a more detailed explanation: (sentences with black italic are added in the revised manuscript) The most promising property of 2D semiconductors is the ultimate confinement in the perpendicular dimension, which is approximately several atoms thick. Such intrinsic nature makes it extremely sensitive to the exterior environments and fabrication processing. For example, various processing recipes, exterior temperature, humidity, exposed atmosphere, etc., can influence the final device performance, especially for the top gate (TG) structure. In our research group, a detailed experimental record table, shown as below, has been used for years to fabricate TG MoS2-FETs. Actually, all our data were collected by more than ten graduate students during the past five years.
We can optimize each step independently; this is what we have been doing for the above list for years, and we have indeed accumulated a lot of useful data. Nevertheless, we noticed that one cannot simply "combine whatever we would like to have for a specific application" for the TG FETs, because successive processing steps are coupled together through the 2D channel interface and an ultrathin top dielectric layer and thus can strongly influence each other, as shown below (Fig. 1b in the manuscript). For example, a contact fabrication step that has already been optimized can still be influenced by the subsequent growth and annealing of the dielectric layer, as well as by the top gate fabrication.
The most critical interface is the top surface of the 2D channel. Although it is directly contacted only by the seeding layer, the ALD growth of the high-k dielectric and the TG fabrication can also influence the performance of the 2D channel, for example through a drift of the VT, because the TG dielectric is only tens of nanometers thick. Thus, all individual processing steps are highly coupled, because any subsequent processing step influences the previous ones, making the processing optimization of 2D semiconductors more complicated than that of bulk semiconductors such as Si and Ge.
Another example is that annealing the TG dielectric not only improves the contact interface between the 2D semiconductor and the 3D metal electrodes but also helps to repair the oxygen defects in the dielectric layer. Thus, the interface between the 2D channel and the dielectric layer can also be improved. We could list many more similar examples, and one can hardly verify them one by one via careful characterization and comparison experiments.
On the other hand, since the gate-last architecture (TG FET) is our goal, the traditional step-by-step optimization is not practical, since it requires careful characterization after each step [28][29][30]. But in TG FETs, we can only measure the device after the TG is finalized (see more details in the next question). Therefore, if the traditional design of experiment (DOE) method is adopted, a large number of combinations are needed for comparison, which dramatically increases the research workload and reduces the optimization work efficiency.
As far as we know, fabrication optimization methodology is also a key R&D topic for industrial IC foundries, which already pay close attention to the so-called "yield ramp-up". In advanced technology nodes, such as the 7-nm node, which includes more than 3000 processing steps, it has become more challenging to carry out optimization and the "search of key factors" by human experience alone. Before we started this research project, we collaborated with Samsung to optimize fabrication processes through a machine learning algorithm, which indicates the recognition of such methodology by the industrial community. However, we cannot provide more details here due to our previous commercial agreement. Besides, one of our corresponding authors (Jing Wan) has worked at Globalfoundries Inc. for years as a process integration engineer, and another co-author, Ye Lu, has also worked at Intel Inc. for years as a process integration engineer (10-nm node). According to their experience, high-throughput data mining techniques have already been used extensively in industry to analyze data and optimize processes. Prof. Lu also collaborated with a well-known semiconductor consulting company, "PDF Solutions Inc.", which also uses similar machine learning algorithms ("expert system") to provide its services for foundries, as shown below. The following has been added to the revised manuscript: [...]
Q2: [...] for understanding material quality issues, failure modes, contact resistance issues and so on. These details appear to be ignored in the approach.

Reply to the reviewer:
We thank the reviewer for the constructive suggestion. We apologize for the unclear description in the previous version, and the following is a detailed explanation:

Reply to the reviewer:
We thank the reviewer for the insightful suggestion. We agree that our previous illustration was inaccurate and a little misleading. Here we provide a more explicit discussion: As mentioned by the referee, intrinsic mobility is related only to the properties of the material and not to the contact. The mobility extraction methodology adopted in our work is the so-called "Y-function method", which has been widely used for Si FETs and is also suitable for 2D FETs [...]
Through the Y-function, the extracted mobility can theoretically rule out the impact of contact resistance. Although, as discussed in the recently published paper [34], μY still depends on the contact resistance Rc, it is more accurate than that of [...]. The "4-probe method" is indeed more accurate since the contact resistance can be excluded entirely. However, the "Y-function" method is more convenient since it is performed directly on the fabricated MOSFET, and it is thus widely used for 2D FETs [37-38].
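As an illustration of why the Y-function suppresses first-order series-resistance effects, the sketch below uses the textbook definition Y = I_D/√g_m = √(μ0·Cox·(W/L)·V_DS)·(V_GS − V_T), so that a straight-line fit of Y against V_GS gives μ0 from the slope and V_T from the intercept. This is the standard form of the method, not necessarily the exact expression used in the manuscript, and the device parameters, the attenuation factor `theta`, and the fit window are hypothetical values chosen for the demonstration.

```python
import numpy as np

def y_function_extract(vgs, ids, vds, cox, w_over_l, fit_window):
    """Extract low-field mobility and threshold voltage via the Y-function.

    Y = I_D / sqrt(g_m) is fitted with a straight line inside fit_window;
    slope**2 = mu0 * cox * (W/L) * vds, and the V_GS-axis intercept gives V_T.
    All quantities are in SI units (cox in F/m^2, mobility in m^2/(V s)).
    """
    gm = np.gradient(ids, vgs)  # numerical transconductance dI_D/dV_GS
    mask = (vgs >= fit_window[0]) & (vgs <= fit_window[1]) & (gm > 0)
    y = ids[mask] / np.sqrt(gm[mask])
    slope, intercept = np.polyfit(vgs[mask], y, 1)
    mu0 = slope ** 2 / (cox * w_over_l * vds)
    vt = -intercept / slope
    return mu0, vt

# Synthetic linear-regime transfer curve with a first-order attenuation factor
# theta that lumps contact resistance and mobility degradation (hypothetical).
mu0_true = 75e-4   # 75 cm^2/(V s), expressed in m^2/(V s)
vt_true = 2.1      # V
cox = 1.0e-3       # F/m^2, assumed gate capacitance per unit area
w_over_l = 2.0     # assumed W/L
vds = 0.1          # V
theta = 0.5        # 1/V, assumed

vgs = np.linspace(2.5, 6.0, 351)
overdrive = vgs - vt_true
ids = mu0_true * cox * w_over_l * vds * overdrive / (1.0 + theta * overdrive)

mu0_fit, vt_fit = y_function_extract(vgs, ids, vds, cox, w_over_l, fit_window=(3.0, 5.5))
print(f"mu0 = {mu0_fit * 1e4:.1f} cm^2/(V s), VT = {vt_fit:.2f} V")
```

In this model the attenuation factor cancels in I_D/√g_m, which is why the fit recovers the input mobility even though the simulated current is strongly degraded at high gate voltage. Real devices deviate from this idealization, consistent with the remark above that μY still depends on Rc.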
In order to avoid misunderstanding, we have modified the manuscript:

Reply to the reviewer:
We thank the reviewer for the constructive suggestion. We also apologize for the unclear explanation of the data in Fig. 2f.
To clarify this, the original gray dots are replaced by colored dots in the revised Fig. 2f. Each color corresponds to one processing combination. Most processing combinations were designed from our experience, based on step-by-step optimization. The main text is revised as: "We then demonstrate that ML can also be used to co-optimize all process parameters, as shown in Fig. 2d. After the EL training, a score predictor can predict the results from a specific processing combination (i.e., one process recipe). All possible process recipes are then sorted using a grid search method, as shown in Fig. 2e. To demonstrate this, we fabricated more than 500 MoS2 FETs, which are summarized in the μ-VT plot in Fig. 2f [...] Fig. 2f) following the suggestion of the sorting result (red arrow in Fig. 2e). This recipe combination (for details, see Supplementary Table 6) also gives rise to an average μ of about 75 cm²/V·s and a VT of 2.1 V, as well as a high wafer-scale uniformity that is important for large-scale circuits, as shown in Fig. 2g."

Fig. R8 (Fig. 2f). More than 500 MoS2 TG-FETs summarized in a μ-VT plot. Each color corresponds to one process recipe. The red stars are the results of the process recipe in e pointed to by the red arrow.
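The "score predictor plus grid search" step described in the revised text can be sketched as follows. This is a minimal illustration, not the authors' implementation: the process options, the `toy_score` function standing in for the trained predictor, and all numeric weights are hypothetical. Only the four contact metals are taken from the reply to Reviewer #1.

```python
from itertools import product

# Hypothetical process options. Only the contact metals come from the text;
# the other steps and their values are placeholders.
PROCESS_GRID = {
    "contact_metal": ["Ti/Au", "Au", "In/Au", "Ag/Au"],
    "seeding_layer": ["SiO2_2nm", "seed_B"],
    "post_anneal": [False, True],
}

def enumerate_recipes(grid):
    """Yield every combination of process options as a dict (one recipe each)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def toy_score(recipe):
    """Stand-in for a trained score predictor (arbitrary illustrative weights)."""
    score = {"Ti/Au": 0.8, "Au": 1.0, "In/Au": 0.4, "Ag/Au": 0.3}[recipe["contact_metal"]]
    score += {"SiO2_2nm": 1.0, "seed_B": 0.5}[recipe["seeding_layer"]]
    score += 0.5 if recipe["post_anneal"] else 0.0
    # Interaction term: mimics coupled steps, where the best choice for one
    # step depends on another step.
    if recipe["contact_metal"] == "Ti/Au" and recipe["post_anneal"]:
        score += 0.4
    return score

def rank_recipes(grid, score_fn):
    """Score every recipe in the grid and sort from best to worst."""
    scored = [(score_fn(recipe), recipe) for recipe in enumerate_recipes(grid)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

ranking = rank_recipes(PROCESS_GRID, toy_score)
for score, recipe in ranking[:3]:
    print(f"{score:.2f}  {recipe}")
```

The interaction term is the point of the example: optimizing each step on its own would pick Au contacts, while ranking whole recipes picks Ti/Au with annealing. In practice the scoring function would be a model trained on measured devices rather than a hand-written table.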
Q5: The manuscript treats the device aspects as a black box, i.e. there is no correlation between measured values and the underlying physics. However, in device operation and optimization, this understanding is extremely important. It would be very interesting to understand how ML could be used to gain understanding in these intricate details?

Reply to the reviewer:
We thank the reviewer for the kind suggestion. We strongly agree with the referee that the [...] guided by ML also reveals some underlying physics that can explain our experimental data. For example, in Table R2 (Table S6 in [...]
Q6: Assuming that the main message of the paper is to establish Machine Learning for process optimization, I am wondering how the approach would work for a mature technology. In MoS2, where there are extremely large differences, a simple algorithm can easily work. This has been shown in the past with statistical analysis for graphene and MoS2 devices (refs. 1-4). It is easy, because the differences in devices and circuits are very large from process parameter to process parameter. It would be much more interesting to understand if the method can also work to optimize a mature technology, where the changes from run to run are minuscule.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion.
Since machine learning is a data analytics technique that teaches computers to learn from accumulated experimental data, this method can be adopted in most areas as long as the data set is sufficient. In fact, the ML optimization method is even more suitable for mature technologies since the stable process can generate a large amount of data with less variability and noise.
As mentioned in the answer to Q1, one of our corresponding authors (Jing Wan) has worked at Globalfoundries Inc. for years as a process integration engineer, and another co-author, Ye Lu, has also worked at Intel Inc. for years as a process integration engineer (10-nm node). According to their experience, data mining techniques have already been used extensively in industry to analyze data and optimize processes.
Q7: In summary, the authors show some circuit examples and wafer scale fabrication. These represent in themselves nice results. However, these results are not discussed in detail and no insight is provided on the relationship between process, underlying physics phenomena and device / circuit performance. As such, this aspect of the paper does not contribute to the state of the art. Thus, the remaining aspect is the focus on machine learning. Here, the device / circuit results are not discussed in detail in respect to the machine learning process development, which arguably is the focus of the paper. Hence, the relevance of the ML procedures, also compared to previous statistical analysis of wafer scale 2D materials (refs. 1-4), is not entirely clear.

Reply to the reviewer:
Again, we sincerely thank Reviewers #2 and #4 for their constructive suggestions. We hope the above replies address the questions they raised.
To summarize, the focus of this work is to optimize the fabrication process efficiently to obtain device performance that meets the requirements of MoS2 circuits. We do not focus on a single parameter, such as mobility or subthreshold swing, and we agree that the performance of our devices is not "state of the art". However, unlike other research [40-43] that focuses on optimizing one individual factor, we consider many variables and process combinations comprehensively with the assistance of a machine learning algorithm.
Therefore, we focus on a more comprehensive optimization of the MoS2 FETs for wafer-scale application, which is quite challenging, as already explained above. As far as we know, no previous results show working TG-structured enhancement-mode MoS2 FETs with a satisfactory wafer-scale uniformity.
Regarding "the device/circuit results are not discussed in detail in respect to the machine learning process development", this has been added in Supplementary Fig. S9 and discussed in section 18, where we can see that both processing and material uniformity are critical to the yield of circuits: We attribute the relatively low wafer-scale yield to three main issues:
1) Poor quality and uniformity of the MoS2 films, such as grain boundaries and local defects, are detrimental to the yield of wafer-scale integrated circuits. This can be improved by a further upgrade of synthesis methods and facilities.
2) As shown in Fig. S9, the uniformity of MoS2 FETs still depends on the processing recipes. The transistor size might also influence the yield since the length and width of the MoS2 FETs are not variable parameters in our ML algorithm.
3) The quality of processing tools and cleanroom grade (10 4

Reviewer#3
This manuscript presents optimized fabrication of CVD-grown MoS2 for scalable circuits by involving the idea of machine learning. In this study, the MoS2 is grown on sapphire with solid precursors of MoO3 and S powders and the enhancement-mode FET are fabricated with gate-last process. The authors tend to conclude that device fabrication and performances are optimized by machine learning. Representative devices on digital, analog, and optoelectrical circuits are presented. Overall, this study is helpful for following research. The referee would further consider acceptance for publication if the authors could carefully address following issues:

Reply to the reviewer:
We thank the reviewer for carefully reviewing our manuscript and raising many insightful comments. We also appreciate the reviewer for recognizing that our study "is helpful for following research" along with a positive recommendation.
Q1: In Fig. 2 a., the MoS2 in the process is marked as 2-3nm (~3-5 layers), but the authors claim a monolayer MoS2 film in the Fig. S1. Thickness of the grown sample, such as monolayer or few layer MoS2 film, is significant to fabrication and performance. It is needed to confirm this issue and provide essential data, such as PL and Raman mapping of the representative devices. If the few layer MoS2 is adopted for most demonstrations, please explain the reasons.

Reply to the reviewer:
We greatly appreciate the reviewer for pointing out this problem in Fig. 2a. The MoS2 thickness noted as "2-3nm (~3-5 layers)" was indeed our mistake; it should be corrected to 0.8 nm (~ monolayer).
To confirm this, Raman and PL spectra were measured for our MoS2 film, as shown in Fig. R9(a-b). An additional AFM image is also shown in Fig. R9(c).
Q2: In Fig. 1.b, the seeding layer is partially deposited on the active region of the FET. Please explain why not fully deposited on the active region? The asymmetric FET design might cause some issues.

Reply to the reviewer:
We thank the reviewer for the careful reading.
In the actual device fabrication, the seeding layer fully covers the entire channel region for the subsequent growth of the high-κ dielectric layer, which means that the device structure is symmetric, as in conventional transistors. Fig. 1b is merely a schematic view, drawn to show more clearly the cross-section of the gate stack with its several different interfaces. The seeding layer acts as a buffer layer between the MoS2 channel and the high-κ dielectric layer (HfO2).
In the revised manuscript, we add an explanation in the Figure caption: "… the seeding layer is actually fully deposited on the complete channel region…."

Q3: In fig. 2, the device is optimized with mobility with the Vth of ~2.1V. It would be ideal to include more discussion to explain how to realize the Vth tuning in loading transistor and keep the optimized mobility.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion.
Multiple factors can influence the VT of the FETs, including 1) the material type, thickness and deposition method of the seeding layer; 2) the material type, thickness and growth temperature of the high-κ dielectric; and 3) the material type of the top gate electrode. Because these factors interact, an overall optimization is extremely difficult, which is also the motivation for applying the machine learning method.
In the revised supplementary material, one more section is added for illustrating the influence of gate metal:

Q4: In this study, the authors mainly focus on mobility and Vth but more significant properties of the device are essential for real application, such as speed and power consumption.

Reply to the reviewer:
We thank the reviewer for the constructive comment.
Actually, one of the determinants of the speed is the RC delay which is mainly affected by the gate capacitance, parasitic capacitance, and equivalent resistance of the transistors. It is analyzed in detail in the following questions.
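As a minimal sketch of the speed estimate discussed in this reply, the snippet below evaluates the transistor cut-off frequency f_T = g_m / (2π C_G) from the measured transconductance (about 3.8 μS) and gate capacitance (about 4.5 pF) reported here. The function and variable names are illustrative assumptions and are not taken from the manuscript.

```python
import math


def cutoff_frequency_hz(gm_siemens: float, cg_farads: float) -> float:
    """Return the cut-off frequency f_T = g_m / (2 * pi * C_G) in hertz."""
    if cg_farads <= 0:
        raise ValueError("gate capacitance must be positive")
    return gm_siemens / (2.0 * math.pi * cg_farads)


# Rounded values quoted in this reply (assumed exact for this sketch).
GM = 3.8e-6   # transconductance, siemens
CG = 4.5e-12  # equivalent gate capacitance, farads

f_t = cutoff_frequency_hz(GM, CG)
print(f"f_T = {f_t / 1e3:.1f} kHz")
```

With these rounded inputs the result is roughly 134 kHz, in line with the reference value of about 134.5 kHz quoted in this reply; the small difference comes from rounding of the inputs.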
Regarding device speed, RC delay is the main factor limiting the operating frequency of the circuit. The RC delay is mainly affected by the load capacitance and resistance, the gate capacitance, the parasitic capacitance, and the equivalent resistance of the transistors in the circuit. To simplify the analysis, we exclude the influence of additional load capacitance and resistance, so the operating speed is mainly determined by the cut-off frequency $f_T = g_m / (2\pi C_G)$ of the MoS2 transistor, where $g_m$ is the transconductance of the channel and $C_G$ is the equivalent gate capacitance. In Fig. R12 (Fig. S15 added in the revised SI), we measured the gate capacitance $C_G$, which is approximately 4.5 pF for $V_g \in [1.5, 3.0]$ V, and the transconductance $g_m \approx 3.8$ μS over the same range. Therefore, the maximum value of $f_T$ is approximately 134.5 kHz, which can be regarded as a reference value for the achievable circuit operating frequency. In general, it is necessary to tune all of the FET parameters, including load capacitance, on-state current, VDD, VT, Ion/Ioff, SS, and the operating frequency, to optimize the overall speed and power consumption, as shown in Fig. R13 (Fig. 1c in the main text).

Q5: Fabrication of top gate dielectrics on the surface of 2D materials is significant to device performances. It would be ideal to include more discussion on this issue and more details on ALD process of the high k dielectrics.

Reply to the reviewer:
We thank the reviewer for the beneficial suggestion.
Methods for depositing large-scale, uniform high-κ insulating materials via ALD are also critical in Si CMOS devices and have been developed for years. Those methods can also be adopted in 2D FETs [49][50][51]. ALD is a self-limiting growth technique in which saturation can be achieved at each step of the reaction process under appropriate conditions. However, because the surface of 2D materials lacks dangling bonds, the direct growth of a high-κ dielectric layer such as HfO2 is difficult. Therefore, seeding layers such as SiO2, Al2O3 and Y2O3 are commonly adopted as a buffer layer between the high-κ dielectric and the channel to ensure the quality of the high-κ dielectric layer and good device performance [52,53].
To include more details on the ALD process of the high-κ dielectrics, we also added the following detailed recipe description in the revised Supplementary Materials:

Q6: It seems that the devices are directly fabricated on sapphire wafer. Is it required to avoid damage in the transfer process for better electronic performances? It would be ideal to include more discussion on the issue because further fabrication or integration with the sapphire wafer might be issues.

Reply to the reviewer:
We thank the reviewer for the kind suggestion. Further fabrication or integration on the sapphire wafer is not a major problem, because all subsequent device processing steps are compatible with current CMOS technologies. Moreover, the sapphire substrate can be processed at temperatures above 1000 ℃. The only issue is that sapphire substrates are more expensive than Si wafers, but the cost of sapphire is decreasing, and 6-inch wafers are already widely available for industrial applications such as GaN technology.

Reply to the reviewer:
We thank the reviewer for the constructive suggestion.
Actually, there is a technical problem with high-frequency measurement. Compared with the impedance of a regular oscilloscope, the equivalent output impedance of our MoS2 pseudo-NMOS inverter is too high to allow the detection of the output signal through an oscilloscope.
Thus, a standard oscilloscope measurement is not possible, and more professional RF

Reply to the reviewer:
We thank the reviewer for the insightful suggestion. We acknowledge that the response speed of our photodetector based on 2D MoS2 FETs is relatively slow, mainly due to the photogating effect [60], which is the dominant mechanism in our device.

Reply to the reviewer:
We thank the reviewer for the constructive suggestion. can be adopted as the top gate electrode to adjust the VT of the transistor, so that the pull-up transistor becomes a depletion-type device, the resistance of which is much smaller than that of the enhancement-type device at a particular voltage [65]. In this way, the equivalent impedance of the pull-up network can be reduced without increasing the size of the device, which is beneficial to improve the operating speed and power consumption of the integrated circuits.

Q10: In most reported papers on the grown MoS2, overall performances are usually determined with many issues, such as grain size, interface, crystallinity, defect density, variation in the batch synthesis and more process details. The issues might be highly coupled. It might be a bit difficult for readers to understand how machine learning could work for the optimization.

Reply to the reviewer:
We thank the reviewer for the insightful suggestion.
We completely agree with the referee that the material characteristics of wafer-scale MoS2, such as crystallinity, grain size, grain boundary properties and defect density, play an important role in device performance. These material characteristics depend on the synthesis recipe of MoS2, which involves precursor types, growth temperature, carrier gas and other factors. If the material synthesis were optimized together with the device processing, the experimental workload would increase greatly and exceed the capability of a research lab. The optimization of material synthesis is therefore not the focus of this paper. The synthesis recipe adopted here has been investigated in detail before [66]; it is relatively stable and is capable of synthesizing large-area films with high quality.
Therefore, the focus of this work is to optimize the fabrication process efficiently for a fixed material. The ML method behaves much like a black box, which is inherent to how ML works (simply speaking, it is a statistical classification algorithm for large data sets), and ML methods have already been applied to assist the search for low-dimensional materials [67]. In our case, ML represents a computer-aided learning process that draws on large amounts of device data and on human experience of device optimization.
Although ML is a data-driven learning process, the optimization with ML also reveals some of the physical mechanisms behind the experimental data. For example, in Table S6

Reply to the reviewer:
We thank the reviewer for the beneficial suggestion.
Following this suggestion, we provide a complete version of the optimized fabrication process in the supplementary information, shown below. We hope this will help other researchers reproduce our results.

Reply to the reviewer:
We sincerely appreciate the reviewer's thorough reading of the reply letter and affirmation of our work. It is very encouraging to receive your positive comments. Thanks again.

Reviewer #2 (Remarks to the Author):
The impressive. Hence, I am not convinced by the manuscript that machine learning is the best approach towards demonstrating a fledgling technology.

Reply to the reviewer:
We sincerely thank the reviewer for carefully reading our reply letter and revised manuscript. The additional insightful comments put forward by the reviewer are of great help to us. We have followed these comments to strengthen the discussion of machine learning in 2D device processing, and we hope that this version meets the publication standard of Nature Communications.
First, we entirely agree that the optimization can also be obtained by "selection the metal of choice, dielectric of choice etc. based on careful design of experiment and literature study".
For example, much effort has been devoted to single-step optimizations, such as compatible contact [1,2] and dielectric [3,4] recipes for 2D semiconductors. However, it is extremely difficult to co-optimize the complete process because the individual processing steps are highly coupled (as we explained in the previous reply letter, 2D semiconductors, especially those with a small band gap, are extremely sensitive to the external environment and to fabrication processing, so any subsequent processing step will influence the previous ones). This makes process optimization for 2D semiconductors more complicated than for bulk semiconductors such as Si and Ge. Comprehensively improving device performance therefore requires numerous experiments for comparison and verification, especially under a conventional full-factorial design of experiments (DOE), which is highly time-consuming and labor-intensive. As far as we know, no experimental results have been reported before on a comprehensive optimization of 2D semiconductor device fabrication.
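To make the cost of a full-factorial DOE concrete, the following sketch counts the runs needed when every level of every factor is combined. The factor names loosely follow the process knobs mentioned in this letter, but the specific levels (thicknesses, temperatures, `metal_A` and so on) are hypothetical placeholders chosen only for illustration, not the values used in our experiments.

```python
from itertools import product
from math import prod

# Hypothetical factor levels, for illustration only.
factors = {
    "seed_material": ["SiO2", "Al2O3"],
    "seed_thickness_nm": [1, 2, 3],
    "highk_thickness_nm": [10, 20, 30],
    "highk_growth_temp_C": [150, 200, 250],
    "gate_metal": ["metal_A", "metal_B", "metal_C"],
    "anneal_temp_C": [200, 300, 400],
}

# A full-factorial design enumerates every combination of levels.
runs = list(product(*factors.values()))
n_runs = len(runs)

# The run count is the product of the number of levels per factor.
assert n_runs == prod(len(levels) for levels in factors.values())
print(f"{len(factors)} factors -> {n_runs} fabrication runs")
```

Even this small illustrative grid of six factors requires 486 fabrication runs, and each additional three-level factor triples the count.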
Moreover, unlike most previously reported large-scale 2D devices and circuits, which use bottom-gate (BG) structures [5,7,8,10,11], the structure adopted in this paper is top-gate (TG). A BG architecture avoids the TG doping problem, but it requires a large-scale transfer step, which introduces extra impurities from the transfer tape and defects into the 2D semiconductor. The uniformity also degrades after the transfer of 2D films, which is detrimental to the yield of large-scale circuits. Thus, it is unfair to compare our work directly with those based on BG-FETs. For TG-structured 2D devices, it is difficult to perform a "step-by-step" optimization. For example, a contact process that has already been optimized can still be affected by the subsequent growth and annealing of the dielectric layer, as well as by the top gate electrode deposition. Another example is that annealing the TG dielectric not only improves the contact interface between the 2D semiconductor and the 3D metal electrodes but also repairs oxygen defects in the dielectric layer, so that the interface between the 2D channel and the dielectric layer is improved as well.
Regarding device performance comparison, we also agree with the statement by the reviewer, "There is really no category in which the individual performance clearly outperforms the state-of-the-art". However, previous results focus only on individual factors, such as mobility [8,9], threshold voltage [11], and subthreshold swing [4]. In our work, we tackle a more comprehensive optimization and wafer-scale device fabrication with a TG architecture, rather than aiming for individual "state-of-the-art" performance. As far as we know, no reported results show working TG-structured enhancement-mode 2D-FETs with satisfactory wafer-scale uniformity. Most of the reported TG 2D-FETs with an atomic-layer-deposited high-κ dielectric layer suffer from severe n-doping, which limits the cascading of large-scale circuits. It is also challenging to obtain a uniform and high-quality dielectric layer on wafer-scale 2D semiconductors, mainly because they lack dangling bonds on the surface for a homogeneous reaction.
Lastly, in the early development stage of devices based on emerging advanced materials, material quality variation is relatively large, which hinders the subsequent device optimization.
Therefore, the lightweight machine learning process used in this work can quickly locate the crucial aspects, and predictive scoring and grid search were adopted to recommend possible working combinations. Combined with the knowledge and experience of experts, this can significantly speed up device optimization and lets research effort concentrate on device applications. Machine learning could thus be a powerful tool to help researchers reduce the investigation burden.
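The idea of combining predictive scoring with a grid search can be sketched as follows. This is only a toy illustration under stated assumptions: `toy_predictor` is a hypothetical stand-in for a trained model, and its coefficients, the candidate levels, the `metal_A`/`metal_B`/`metal_C` labels and the scoring weights are all invented for the example and do not reproduce the model or data used in our work.

```python
from itertools import product

# Hypothetical gate-metal effect on threshold voltage (arbitrary units).
METAL_SHIFT = {"metal_A": 0.0, "metal_B": 0.5, "metal_C": 1.0}


def toy_predictor(seed_nm, highk_temp_c, gate_metal):
    """Hypothetical surrogate returning (mobility, vth, ss) for one recipe."""
    mobility = 30.0 - 2.0 * abs(seed_nm - 2) - 0.02 * abs(highk_temp_c - 200)
    vth = 0.5 + 1.5 * METAL_SHIFT[gate_metal] - 0.1 * seed_nm
    ss = 100.0 + 10.0 * abs(seed_nm - 2) + 0.1 * abs(highk_temp_c - 200)
    return mobility, vth, ss


def score(mobility, vth, ss, vth_target=1.0):
    """Reward high mobility, a VT close to the target, and a low SS."""
    return mobility / 30.0 - abs(vth - vth_target) - ss / 100.0


# Grid search: score every candidate recipe and rank from best to worst.
grid = product([1, 2, 3], [150, 200, 250], ["metal_A", "metal_B", "metal_C"])
ranked = sorted(grid, key=lambda p: score(*toy_predictor(*p)), reverse=True)

best = ranked[0]
print("recommended recipe:", best)
```

In practice the top few ranked recipes, rather than only the single best one, would be passed to experts for review before fabrication, which is where human experience enters the loop.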
At the end of the supplementary material, we also compare the device performance and the circuit scale with previous works, as shown in Table R1. Our MoS2 TG-FETs exhibit satisfactory comprehensive performance and the highest transistor count in a functional circuit among the works compared.