CLEAR: A Holistic Figure-of-Merit for Post- and Predicting Electronic and Photonic-based Compute-system Evolution

Continuing demands for increased computing efficiency and communication bandwidth have pushed current semiconductor technology to its limit. This has led to novel technologies with the potential to outperform conventional electronic solutions, such as photonic pre-processors or accelerators, electronic-photonic hybrid circuits, and neural networks. However, existing efforts to describe and predict the evolution of compute performance fail to explain the development pace actually observed over time; that is, all proposed metrics eventually deviate from their development trajectory several years after they were originally proposed. This discrepancy demands a figure-of-merit that includes a holistic set of driving forces of compute-system evolution. Here we introduce the Capability-to-Latency-Energy-Amount-Resistance (CLEAR) metric, encompassing synchronizing speed, energy efficiency, physical machine size scaling, and economic cost. We show that CLEAR is the only metric to accurately describe the historical compute-system development. We find that even across different technology options CLEAR matches the observed constant rate-of-growth (post-diction), and also fits proposed future compute-systems (prediction). Therefore, we propose CLEAR as a guide to quantitatively predict required compute-system demands at a given time in the future.


System level CLEAR breakdown
Our system-level figure of merit (FOM), Capability-to-Latency-Energy-Amount-Resistance (CLEAR), consists of the following five factors:
1) Capability: The product of million instructions per second (MIPS, in [million instruction/second]) and the instruction length (in [bit/instruction]). It thus represents the data-handling performance of a compute system. Although floating point operations per second (FLOPS) is also commonly used, MIPS is better suited to a performance comparison that includes both historical systems of the 20th century and modern systems. Since the instruction length varies among computer systems, we use the product of MIPS and instruction length, in units of bit-per-second, as the general capability of any compute system to process data.
2) Latency: Clock speed, expressed as the clock period in [second]. While clock-less or asynchronously clocked systems are being explored, here we focus our discussion on regularly clocked systems only. Clock speed is one basic metric for comparing the operating speed of different computer systems, since it represents the minimum time delay with which any bit of information can traverse the system (time of flight).
3) Energy: Energy consumption of the compute system in [watt].
4) Amount: Volume of the system in [mm³]. The volume used for each compute system includes the associated accessories and cooling infrastructure. This is critical to obtain an accurate volumetric 'cost' when comparing different types of compute systems. For example, while supercomputers deploy a large number of cores to achieve high performance, they require enormous amounts of power. In fact, modern datacenters do not have a better (Performance/Cost)-ratio than modern personal laptops. On the other hand, portable computers like smartphones sacrifice performance for size and energy efficiency. Therefore, the Amount should not be limited to the areal footprint, but should include the volume of the entire compute system.
5) Resistance: The economic cost in [$], modeled with the Boston Consulting Group (BCG) experience curve, which states that "each time the cumulative production doubles, the unit cost falls by a constant percentage" [Ref. 30 in the manuscript]. This model reveals the relation between learning curve effects and the economic phenomenon (which relates to labor efficiency, the shared experience effect, use-cost reduction, etc.) and is valid across a broad range of industries. Based on this, we derive a log(unit price) vs. time relationship and verify it using the historical learning curve of a semiconductor device [R1]. Note that this relation is confirmed by the historical data of transistor cost, which shows a linear relationship between time and the logarithm of the price.
Using this model, we derive a prediction of the future cost of silicon photonic chips and devices based on their recent fabrication cost. Although the unit transistor cost has started to rise in recent years, deviating from its original BCG model, we believe that the BCG model will remain valid for silicon photonics for a few decades, since this novel technology is still at its very beginning and there is still plenty of room left before it reaches the flat bottom of its learning curve.
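The step from the experience curve to a log(unit price) vs. time relationship can be sketched as follows. The sketch assumes that cumulative production grows exponentially with a fixed doubling time, which is what makes log(price) linear in time. The function names, the 20% cost decline per doubling, the first-unit cost and the two-year production doubling time are illustrative assumptions, not values from the manuscript.

```python
import math

def unit_cost(cumulative_units, first_unit_cost, decline_per_doubling):
    """BCG experience curve: the unit cost falls by a constant
    fraction each time cumulative production doubles."""
    doublings = math.log2(cumulative_units)
    return first_unit_cost * (1.0 - decline_per_doubling) ** doublings

def unit_cost_over_time(years, first_unit_cost, decline_per_doubling,
                        doubling_time_years):
    """Assumption: cumulative production doubles every `doubling_time_years`."""
    cumulative_units = 2.0 ** (years / doubling_time_years)
    return unit_cost(cumulative_units, first_unit_cost, decline_per_doubling)

# Hypothetical parameters, for illustration only.
costs = [unit_cost_over_time(t, 100.0, 0.20, 2.0) for t in range(0, 11, 2)]
log_costs = [math.log10(c) for c in costs]

# Equal time steps give equal steps in log10(cost): a straight line in time.
slopes = [later - earlier for earlier, later in zip(log_costs, log_costs[1:])]
print(slopes)
```

Under these assumptions every entry of `slopes` is the same negative number, which is the linear relationship between time and logarithmic price described above.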

Other Dominating FOMs in Compute Systems
Here we provide a brief introduction and summary of the mathematical framework of our approach resulting in Figure 1 from the main manuscript, i.e. for the conventional system level FOMs. Note, some of the conventional FOMs were introduced later than 1955, but we could still reproduce those FOMs based on the historical compute system data.

Moore's law:
It is the observation that the number of device elements doubles every 12 months; it was first introduced by Gordon Moore in 1965. As the semiconductor industry developed, the device element was later specified as the transistor count and the doubling period was revised to 24 months. This simple empirical observation serves the industry as a roadmap for the pace of semiconductor evolution. The following equation is used to reproduce the Moore's law curve in Fig. 1:

Koomey's law:
This is an energy efficiency metric, which assumes the doubling time of the computation per Joule to be approximately 1.57 years. However, since "computation" is not defined consistently across different computers, we use the number of bits as a general quantifier for the various compute systems. Thus, the Koomey's law curve shown in Figure 1 has units of [bit/Joule], based on the following equation:
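Both laws are stated as fixed doubling times, so under that reading each curve is an exponential in time. A minimal sketch follows; the function name, the reference year and the reference values are placeholder assumptions, not the anchor points used for Figure 1.

```python
def doubling_curve(year, reference_year, reference_value, doubling_time_years):
    """Value of a quantity that doubles every `doubling_time_years`,
    anchored at `reference_value` in `reference_year`."""
    elapsed = year - reference_year
    return reference_value * 2.0 ** (elapsed / doubling_time_years)

MOORE_DOUBLING_YEARS = 2.0    # 24-month doubling, as stated in the text
KOOMEY_DOUBLING_YEARS = 1.57  # doubling time stated in the text

# Hypothetical anchor: 1 unit in 1965 (placeholder, not from the data set).
years = range(1965, 2016, 10)
moore = [doubling_curve(y, 1965, 1.0, MOORE_DOUBLING_YEARS) for y in years]
koomey = [doubling_curve(y, 1965, 1.0, KOOMEY_DOUBLING_YEARS) for y in years]

for y, m, k in zip(years, moore, koomey):
    print(f"{y}: Moore x{m:.3e}   Koomey x{k:.3e} [bit/Joule, relative]")
```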

Makimoto's FOM:
This metric is related to CLEAR; the difference is that it uses only intelligence, size, cost and power to describe compute systems and does not consider latency. This shortcoming is critical, however, because latency is a significant factor in the performance and energy trade-offs. Moreover, the data bandwidth, MIPS (million instructions per second), is a subjective metric that may vary from one computer system to another. Makimoto's FOM is given by:
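Makimoto's figure of merit is commonly written as intelligence divided by the product of size, cost and power. A minimal sketch under that assumption, with MIPS standing in for intelligence as the text suggests; the function name, the units chosen for each argument and the example numbers are hypothetical.

```python
def makimoto_fom(intelligence_mips, size_mm3, cost_usd, power_w):
    """Assumed form: FOM = intelligence / (size * cost * power).
    Latency does not appear, which is the gap CLEAR is meant to close."""
    return intelligence_mips / (size_mm3 * cost_usd * power_w)

# Two hypothetical systems with identical intelligence, size, cost and power
# receive the same score regardless of how their latencies differ.
system_a = makimoto_fom(1000.0, 10.0, 10.0, 10.0)
system_b = makimoto_fom(1000.0, 10.0, 10.0, 10.0)
print(system_a, system_b)
```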

Historical Compute System Data
All the data (performance, cost, size, power and other CLEAR related parameters) of the compute systems from the 1940s to 2010s were collected online from papers, computer manuals and webpages. A Microsoft Excel sheet (file name: compute system data.xlsx) containing all the data can be found in the attachment accompanying this supplementary file.

Compute System Evolution Model
To prove the main assumption of the manuscript, that the driving force of the compute system changes over time, it is important to identify the main driving force for each period and compare it with the history of compute-system evolution. Moreover, since CLEAR has been shown to be the only FOM that tracks the evolution with a linear (in log-log scale) growth rate, all the dominant driving forces can be considered covered by CLEAR. Thus, a factor breakdown analysis is the key to finding the driving force for each period.
For this factor breakdown model, the goal is to first isolate a single factor from CLEAR to see when the evolution rate starts to deviate from its original speed, and then add the next factor to it. Following the history of the compute system, the linear (in log-log scale) growth region is expected to become wider and wider until it is fully linear when all five factors (i.e. CLEAR) are considered.
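The breakdown procedure can be sketched as below. It assumes that CLEAR is the ratio C/(L·E·A·R), as the name Capability-to-Latency-Energy-Amount-Resistance suggests, and that factors are added in the order they appear in the name; both are assumptions of this sketch. The function name and the example system are hypothetical and not taken from the data set.

```python
def factor_breakdown(capability_bps, latency_s, energy_w, amount_mm3,
                     resistance_usd):
    """Return the cumulative figure of merit after each factor is added.

    Assumption: CLEAR = C / (L * E * A * R), factors added in name order.
    """
    divisors = [("L", latency_s), ("E", energy_w),
                ("A", amount_mm3), ("R", resistance_usd)]
    stages = [("C", capability_bps)]
    value = capability_bps
    included = []
    for symbol, divisor in divisors:
        value /= divisor
        included.append(symbol)
        stages.append(("C/(" + "*".join(included) + ")", value))
    return stages

# Hypothetical system, for illustration only.
stages = factor_breakdown(1e9, 1e-9, 10.0, 1e6, 1000.0)
for label, value in stages:
    print(f"{label:<14s} {value:.3e}")
```

Evaluating each stage for every system in the data set and plotting it against the year would show where each partial FOM leaves its original growth rate.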

Technology Substitution Model Analysis
Considering the actual technology substitution and making predictions for compute systems is extremely complicated. However, recent studies comparing data computing and data communication conclude that communication scaling is orders of magnitude more efficient and meaningful than logic scaling, since the logic building blocks are already approaching fundamental physical limits at the quantum level [R2]. Therefore, we assume that interconnects will eventually dominate the overall system performance, and all the prediction models are made for interconnects only.
Figure S1. The factor breakdown of the CLEAR for compute systems.

Link level CLEAR model
The five-component link-level CLEAR FOM is composed as follows:
1) Capability: The capability of a link, in units of Gbps, is calculated from the Shannon equation for a noisy channel, which relates it to the bandwidth of the entire channel and the signal-to-noise ratio.
2) Latency: The latency of a link is the point-to-point latency in units of picoseconds. It is given by the time of flight from the light source to the photodetector and is a function of each waveguide's and device's modal group index.
3) Energy: Energy consumption of a link in units of femtojoule per bit. It includes the energy consumed by all active devices and passive data routing components.
4) Amount: The area of a link, in units of μm², is the sum of all device areas, including the light source, waveguides, modulators, detectors, splitters, rings, etc. It further incorporates the spacing required to prevent crosstalk between adjacent waveguides, based on our previous work [R3].
5) Resistance: For the link level, we use the economic resistance model from BCG discussed above. The optical link fabrication cost is estimated from the total cost of an optical wafer (~$50,000) and the number of links that can be fabricated on a single chip (~70,000 mm²).
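The capability and latency terms above can be illustrated with the Shannon capacity C = B·log2(1 + SNR) and a time of flight summed over link segments as n_g·L/c. This is a sketch under stated assumptions: the function names, the segment list and all numerical values are hypothetical, and the actual device parameters are those of [R3].

```python
import math

C_VACUUM_M_PER_S = 3.0e8  # vacuum speed of light

def shannon_capacity_gbps(bandwidth_ghz, snr_linear):
    """Shannon capacity of a noisy channel: C = B * log2(1 + SNR)."""
    return bandwidth_ghz * math.log2(1.0 + snr_linear)

def time_of_flight_ps(length_um, group_index):
    """Propagation delay through one segment: t = n_g * L / c."""
    seconds = group_index * (length_um * 1e-6) / C_VACUUM_M_PER_S
    return seconds * 1e12

def link_latency_ps(segments):
    """Point-to-point latency as the sum over (length_um, group_index) pairs."""
    return sum(time_of_flight_ps(length, n_g) for length, n_g in segments)

# Hypothetical link: a 10 um modulator section followed by 1 mm of waveguide.
segments = [(10.0, 3.0), (1000.0, 3.0)]
print(shannon_capacity_gbps(10.0, 3.0), "Gbps")
print(link_latency_ps(segments), "ps")
```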

Device Parameters for Link Comparison
All the device parameters are taken from our previous work [R3]. The numbers are relisted in Table S1-S3. Note that only the hybrid link (i.e. HyPPI) is shown in the comparison with electronic links in the manuscript. The reason is that a compute system needs both short (μm ~ mm) and long (mm ~ cm) distance communication. A photonic interconnect favors long-distance data transmission thanks to its low propagation loss, but its on-chip footprint makes it infeasible for short connections because of the diffraction limit and weak light-matter interaction. Plasmonic links, with their ultra-fast operating frequency and sub-wavelength device scale, are well suited to short-distance communication, but their high ohmic loss prohibits scaling to longer distances. Neither of these two interconnect options is able to surpass the electronic link over the full communication range (μm ~ cm), and thus we show only the comparison between the electronic link and the hybrid plasmon-photonic link.
Table S1. Device latency (ps) for different interconnect options with link length L in μm. Shaded data are used in Fig. 2.
Equations used in Table S1-S3 are:

Link Parameters
All the optical links include three major components: light source, modulator and detector (Figure S2). The light is first generated by the light source and then transmitted to the modulator, which is controlled by an electronic driver. After the light has been modulated, it propagates through the next segment of the waveguide, is detected by the detector and is converted into the electrical domain. The devices and waveguides may vary for each technology option, but the fundamental principles are similar, as shown in Figure S2.
The parameters in Table S5 are:

Fundamental physical limits of CLEAR factors
• Vacuum light speed c = 3 × 10⁸ m/s
• Refractive index of waveguide n

Device level CLEAR model
The five-component device-level CLEAR FOM is composed as follows:
1) Capability: The operating frequency of a device, in units of [GHz], is regarded as the capability of the device.
2) Latency: We replace the latency with the critical length, L, of the device in units of [nm]. The latency of each individual device is small; the critical length, which represents the length of the functional part of the device (e.g. the gate length of a transistor, the perimeter of a photonic ring modulator, and the modulation length of the plasmonic and HyPPI EOMs), is nevertheless related to the latency through the group index of the optical mode.
3) Energy: Energy consumption of a device in units of [fJ/bit], calculated as ½CV², where C and V are the capacitance and the driving voltage of the device, respectively.
4) Amount: Since all the on-chip devices are arranged in the same plane, the amount of a device only needs to consider its 2D area in units of [μm²].
5) Resistance: The economic resistance at the device level is the cost to fabricate the device in [$]. For transistors, the historical fabrication cost is known. For optical devices, however, no standardized fabrication cost is readily available, since even silicon photonics foundries cannot yet rely on high-volume data. Therefore, we estimate the optical device fabrication cost from the total cost of an optical wafer (~$50,000) and the number of devices that can be fabricated on a single chip (~70,000 mm²).
All the device data are taken from our previous work [R3].
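The energy and resistance terms above reduce to short arithmetic, sketched here. The energy follows ½CV² as stated. The cost estimate assumes ideal packing of devices over the quoted ~70,000 mm² with no yield loss or spacing overhead, which is a simplification introduced for this sketch. Function names and the example capacitance, voltage and device area are hypothetical.

```python
def switching_energy_fj(capacitance_ff, voltage_v):
    """E = 1/2 * C * V^2; with C in fF and V in volts the result is in fJ."""
    return 0.5 * capacitance_ff * voltage_v ** 2

def device_cost_usd(device_area_um2, wafer_cost_usd=50_000.0,
                    usable_area_mm2=70_000.0):
    """Cost per device, assuming devices tile the usable area perfectly
    (no yield loss, no spacing): a simplifying assumption of this sketch."""
    usable_area_um2 = usable_area_mm2 * 1e6  # 1 mm^2 = 1e6 um^2
    devices_per_wafer = usable_area_um2 / device_area_um2
    return wafer_cost_usd / devices_per_wafer

# Hypothetical device: 1 fF capacitance, 2 V drive, 100 um^2 footprint.
print(switching_energy_fj(1.0, 2.0), "fJ/bit")
print(device_cost_usd(100.0), "USD per device")
```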