Introduction

The Internet of Things1,2,3 has the potential to transform our daily lives, reduce global energy consumption and cut waste. This vision involves embedding electronic intelligence into billions of everyday objects. Large-area printing techniques are an attractive option for manufacturing these electronics, as they allow circuits to be produced in very high volumes on very thin, flexible plastic substrates. This reduces cost and simplifies integration into everyday objects.

However, large-area electronics and Internet of Things devices are only just emerging and are caught in the classic ‘chicken-and-egg’ situation of new technologies. They need to be manufactured in high volumes to become affordable, but they need to be affordable to be sold in high volumes.

In the traditional silicon-based semiconductor industry, technology cost evolution has been successfully described by Moore's Law4 for over five decades. Today, this famous prediction is usually stated as “the number of transistors on a single chip approximately doubles every two years.” Moore's original argument was economic rather than technological, based on the transistor density that enabled the minimum cost per transistor. Hence, Moore's Law can also be formulated as “the cost of a unit of computing power falls exponentially over time.”

However, Moore's Law is not the only mathematical relationship for describing technology cost evolution. Arising from the aeronautical industry in the 1930s, Wright's Law5 postulates that the cost of manufacturing a product falls at a rate that depends on cumulative production. In other words, the more of a product we have made, the more we know about how to make it efficiently and hence cheaply. Wright's Law has been found to be slightly more successful than Moore's Law at predicting cost evolution for a range of technologies6.
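As a minimal illustration of Wright's Law, the sketch below uses its usual power-law form, in which unit cost falls by a fixed fraction with every doubling of cumulative production. The function name, the first-unit cost and the 20% learning rate are hypothetical values chosen for illustration only; they are not figures from this work.

```python
import math


def wright_unit_cost(cumulative_units, first_unit_cost, learning_rate):
    """Unit cost after `cumulative_units` have been produced (Wright's Law).

    learning_rate is the assumed fractional cost reduction per doubling of
    cumulative production, e.g. 0.2 for a 20% drop per doubling.
    """
    # Power-law exponent b such that cost(2x) = (1 - learning_rate) * cost(x).
    exponent = -math.log2(1.0 - learning_rate)
    return first_unit_cost * cumulative_units ** (-exponent)


# Hypothetical example: cost relative to the first unit as volume grows.
for units in (1, 10, 1_000, 1_000_000):
    print(units, round(wright_unit_cost(units, 1.0, 0.2), 4))
```

In this form the cost depends only on how many units have been made in total, not on elapsed time, which is why pooling many applications onto one generic product accelerates the cost reduction.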

Moreover, as large-area printed electronics has limited scope for the transistor scaling that drives Moore's Law, we can expect that Wright's Law will govern cost evolution in this field. However, the Internet of Things covers thousands of different device types. Hence, to accelerate Wright's Law evolution, industry needs a way to tie all these applications together to maximize the learning from cumulative production.

By its generic nature, a microprocessor offers the possibility to do just this. Flexible microprocessors were first produced by transferring poly-silicon thin-film transistor (TFT) circuits onto plastic film7,8,9. In 2012, we reported the first microprocessors processed directly on plastic films with organic TFTs10 and subsequently with hybrid organic-oxide technology at process temperatures compatible with plastic film11. In this work, we expand the discussion of the microprocessor in two directions: a presentation of its generic blocks, architecture and processor core chip, and a detailed discussion of the memory architecture needed to allow any code to be programmed faultlessly by printing.

As in silicon, our thin-film microprocessor employs complementary logic for maximum robustness. As in11,12, we achieve this by using organic p-type and oxide n-type thin-film transistors. Both types of transistor are processed at low temperatures and so are suitable for production on any substrate. Differentiation for different product categories is achieved through the inclusion of a write-once, read-many (WORM) instruction generator that can be programmed via inkjet printing in a post-manufacturing step.

To optimize yield, we use a microprocessor architecture that minimizes transistor count. Our microprocessor measures 1.20 cm × 1.88 cm and includes 3504 transistors. It operates at 2.1 kHz. The WORM memory block is a 16-input unipolar NOR gate where 16 drive transistors can be added through inkjet printed jumpers and the load can be adjusted to optimize the gate's noise margin. Finally, to demonstrate the full functionality and programmability of our microprocessor, we implement the instruction code for an exponential running averager.

Results

Low-temperature thin-film electronics are based on organic or metal-oxide semiconductors13,14,15,16. They can be produced at low cost and in massive volumes on thin plastic films using print-like processes. This makes them an attractive alternative to silicon-based electronics. Low-temperature thin-film electronics are expected to find use in foil-based products such as smart food labels (RFID) and wearable electronics, and in large-area electronic devices such as solar cells, flexible displays and flexible OLED lighting panels.

Fabrication process

The thin-film computer of this work is implemented using complementary thin-film transistors. The solution-processable metal-oxide semiconductor iXsenic S (from Evonik Industries) is used to fabricate the n-type transistors. These devices are post-annealed at 250 °C and have an electron mobility of 2 cm²/Vs. The p-type transistors are based on pentacene, an organic semiconductor with representative performance. After completion of the process flow, they have a hole mobility17 of 0.15 cm²/Vs. The gate dielectric is 100 nm of high-k Al₂O₃ deposited by atomic layer deposition (ALD). The source-drain contacts are formed by photolithographic patterning of a bilayer of 2 nm Ti and 30 nm Au after the metal-oxide semiconductor has been defined. The oxide n-type transistors thus have a bottom-gate, top-contact geometry with Ti as the injecting metal. The organic semiconductor is deposited after the source-drain contacts are formed, so the p-type organic transistors have a bottom-gate, bottom-contact geometry and use Au as the hole-injecting metal. This is depicted in the cross-section in Fig. 1(d). The individual organic islands are isolated by an integrated shadow mask of 2 µm SU-8 2002. In this work, the microprocessor was processed on a rigid substrate; however, the technology is compatible with polyimide films for fully flexible thin-film circuits, as demonstrated by Rockelé et al.12.

Figure 1

Typical transfer characteristics of (a) the organic p-type TFT and (b) the oxide n-type TFT, measured in the saturation regime, and (c) the output characteristics of both, with a step of 2.5 V. (d) Cross-section of the organic/oxide TFT stack, showing the top source-drain n-type TFT and the bottom source-drain p-type TFT. (e) Voltage transfer characteristics of the complementary organic/oxide inverters, with a 3:1 p:n ratio, as indicated in the inset, and (f) extracted noise margin and gain of the inverters.

The transfer characteristics of these p-TFTs and n-TFTs are shown in Fig. 1 (a) and (b). Both devices have a channel width (W) of 140 µm and length (L) of 5 µm. Due to the strong depletion of the p-TFTs, a p:n ratio of 3:1 was chosen for the logic gates, resulting in optimal current matching between the p- and n-type devices. The minimal W/L of the n-TFT is 50/5 µm/µm. Fig. 1 (e) shows a typical inverter transfer curve at different supply voltages. The extracted gain and noise margins are plotted in Fig. 1 (f), exhibiting a maximum noise margin of 1.54 V and a gain of 2.96 at 10 V supply voltage, sufficiently high to enable robust logic gates. These figures of merit can be improved further by reducing the contact resistance of the organic p-TFT, as indicated by the TFT output characteristics displayed in Fig. 1 (c). The critical dimensions used for minimal line widths, overlay accuracy, line separation, etc. are defined by the photolithography steps in the fabrication process and are chosen to be 5 µm. This is within range of future printing processes18.

Computer architecture

Our thin-film computer design comprises a datapath and dedicated control unit with a few internal registers to cache data. We use a Harvard architecture, where the data bus and program bus are not shared (as shown in Fig. 2 (a)). In this architecture, data and program code are stored in physically separate memories.

Figure 2

(a) Architecture of our thin-film computer, based on the Harvard architecture. (b) Die picture of the thin-film processor core chip and (c) block diagram of the 8-bit processor core chip.

The internal data memory, or cache, consists of 8-bit registers based on D flip-flops. We use three registers and a hardwired decimal “1” to ease the increment (INC) and decrement (DEC) operations. The program memory is based on print-programmable read-only memory (P2ROM), which can be configured using inkjet printing. The datapath, or arithmetic and logic unit (ALU), is an 8-bit ALU designed as a conventional 8-bit ripple-carry adder/subtractor. It can execute addition, subtraction and bit-shift commands.

We realized the computer as two chips: a processor core chip that combines the datapath and data memory and a separate print-programmable instruction generator chip for the program memory. The processor core chip also includes an 8-bit accumulator to store results from the ALU. Off-chip communication is performed via an 8-bit output register and an input multiplexer which can store the data in the cache memory. At the core of the ripple carry adder are eight full adder cells based on the mirror adder implementation19. This topology was selected for speed and to reduce transistor count. Only 24 transistors are required to calculate the sum of two 1-bit numbers. Compared to the conventional combinatorial approach of a full adder cell, this minimal transistor count implementation improves area consumption and hence hard yield. The datapath implementation also uses sufficient buffer cells to optimize the critical path delay. The full architecture of the processor core chip is shown in Fig. 2 (c).
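The ripple-carry structure described above can be sketched at the logic level as follows. This is a behavioural illustration only: the full-adder equations are the textbook mirror-adder form (inverted carry generated first, inverted sum derived from it), and subtraction is assumed to use the conventional two's-complement trick of inverting the second operand and setting the carry-in. Neither detail, nor the function names, is taken from the chip's actual netlist.

```python
def mirror_full_adder(a, b, carry_in):
    """One-bit full adder in mirror-adder form; returns (sum, carry_out)."""
    # The mirror adder first produces the inverted carry ...
    carry_out_n = 1 - ((a & b) | (carry_in & (a | b)))
    # ... and reuses it to produce the inverted sum.
    sum_n = 1 - ((a & b & carry_in) | (carry_out_n & (a | b | carry_in)))
    return 1 - sum_n, 1 - carry_out_n


def ripple_add_sub(a, b, subtract=False, width=8):
    """Add or subtract two `width`-bit values; returns (result, carry_out)."""
    invert = 1 if subtract else 0
    carry = invert  # carry-in of 1 completes the two's complement of b
    result = 0
    for bit in range(width):
        a_bit = (a >> bit) & 1
        b_bit = ((b >> bit) & 1) ^ invert
        sum_bit, carry = mirror_full_adder(a_bit, b_bit, carry)
        result |= sum_bit << bit
    return result, carry


print(ripple_add_sub(5, 3))
print(ripple_add_sub(5, 3, subtract=True))
```

Because the carry must propagate through all eight cells in sequence, the carry chain sets the critical path of the datapath, which is consistent with the use of buffer cells mentioned above.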

The complementary hybrid organic/metal-oxide technology enables robust logic gates, allowing the use of a quasi-complete logic gate cell library. The digital standard cell library consists of 8 standard cells, including inverters, NAND and NOR gates, inverting buffer cells and a mirror adder cell. To avoid glitches and ensure good setup and hold times, we implement the internal registers as standard master-slave flip-flops. The typical standard cell height is 451.5 µm and the mirror adder cell is 1315 µm wide. The processor core chip employs 3504 complementary TFTs and measures 1.20 cm × 1.88 cm. A photograph of this chip is shown in Fig. 2 (b).

Performance of the processor core chip

To evaluate the behavior of the processor core chip, a PIC microcontroller and a testbench containing all instructions were used. The microcontroller was programmed to provide instructions, input data and clock signals to the processor core chip. It also generated the expected outputs for direct comparison with the outputs of the processor core chip. A typical measurement for a supply voltage of 12 V is shown in Fig. 3 (b). The maximum clock frequency is 2.1 kHz, which translates to 2.1 kIPS (thousand instructions per second) for our architecture. This is 52 times faster than our p-type only thin-film processor10 and outperforms recent carbon nanotube computers20. Fig. 3 (a) shows the number of instructions per second at different supply voltages. The chip is fully operational from 6.5 V onwards. This low supply voltage is a key merit of the complementary organic/oxide technology. As the power consumption scales with the square of the supply voltage, reducing the supply voltage from 10 V in the p-type only architecture to 6.5 V in the more robust complementary architecture results in a very substantial reduction in power consumption. Furthermore, even at equal supply voltage, a complementary architecture has a much smaller static power consumption than a unipolar one as a consequence of its reduced static leakage current. The static power consumption in our case can be improved further by shifting the onset voltage of the p-type transistor towards 0 V.
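The quadratic dependence on supply voltage can be made concrete with a short calculation. The sketch below assumes the standard dynamic-power relation, in which power is proportional to the square of the supply voltage with switched capacitance and clock frequency held fixed. This is a simplification that ignores static leakage, and the helper name is hypothetical.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power at v_new to that at v_old, assuming P ~ V^2."""
    return (v_new / v_old) ** 2


# Supply voltages quoted in the text: 10 V (p-type only) and 6.5 V (complementary).
ratio = dynamic_power_ratio(6.5, 10.0)
print(f"relative dynamic power: {ratio:.4f}")
print(f"reduction: {(1.0 - ratio) * 100:.1f}%")
```

Under these assumptions, operating at 6.5 V instead of 10 V leaves roughly 42% of the dynamic power, before any additional saving from the lower static leakage of complementary logic is counted.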

Figure 3

(a) The measured instructions per second of the processor core chip for different supply voltages. (b) A zoom of the general test bench examining the processor core at a supply voltage of 12 V.

Print-programmable instruction generator

The block diagram of the print-programmable instruction generator chip is shown in Fig. 4 (a). The instruction generator features a 4-bit program counter which uses the same clock as the datapath. A 16-line address decoder selects a row in the printable write-once, read-many (WORM) memory, and a 9-bit register subsequently stores the selected instruction and provides it on the program bus. Instructions can be programmed in the memory via a simple inkjet printing step. The layout of this configurable instruction generator chip is depicted in Fig. 4 (b). Fig. 4 (c) shows an expanded view of the layout. The area for the printed ink drop is 50 µm × 55 µm. The lines to be shorted by the ink drop are interdigitated, with 5 µm spacing. This design guarantees 100% reliable shorting, even when inkjet equipment of modest accuracy is used.
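The fetch path described above (counter, decoder, memory, register) can be modelled behaviourally as in the sketch below. The class and method names are hypothetical, and the timing (latch the selected word, then increment the counter) is an assumption made for illustration; only the 4-bit counter, 16 rows and 9-bit word width come from the text.

```python
class InstructionGenerator:
    """Behavioural model: 4-bit counter -> 16-line decoder -> WORM -> 9-bit register."""

    ROWS = 16       # 4-bit program counter addresses 16 rows
    WORD_BITS = 9   # each row holds one 9-bit instruction

    def __init__(self, program):
        if len(program) != self.ROWS:
            raise ValueError("program must contain exactly 16 words")
        if any(not 0 <= word < (1 << self.WORD_BITS) for word in program):
            raise ValueError("each word must fit in 9 bits")
        self.program = list(program)
        self.counter = 0
        self.register = 0

    def decode(self):
        """One-hot select lines produced by the address decoder."""
        return [1 if row == self.counter else 0 for row in range(self.ROWS)]

    def clock(self):
        """Latch the selected row into the register, then advance the counter."""
        selected_row = self.decode().index(1)
        self.register = self.program[selected_row]
        self.counter = (self.counter + 1) % self.ROWS  # 4-bit counter wraps
        return self.register


generator = InstructionGenerator(list(range(16)))
print([generator.clock() for _ in range(4)])
```

In this model the program simply repeats every 16 clock cycles, since a free-running 4-bit counter has no branch or jump mechanism.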

Figure 4

(a) The block diagram of the print-programmable instruction generator, (b) die picture of the instruction generator and (c) a micrograph of the inkjet-printed area of the instruction generator. Three bits have been printed with Ag ink to logic zeros.

The printable WORM memory block is implemented as a 16-input unipolar NOR gate as shown in Fig. 5 (a). The load transistor is connected as depletion-load. In other words, the gate is connected to its source. This unipolar topology was selected because the TFTs exhibit a normally-on behavior. Up to 16 drive transistors (Sel0, Sel1, ...) can be connected to the NOR gate by inkjet printing a droplet in the well. Fig. 5 (b) plots the simulated voltage transfer curves of the NOR gate for varying numbers of inputs. These simulations were performed by sweeping the selected drive transistor, in this case Sel0 ( = Vin), between 0 V and 10 V while biasing all remaining unselected transistors with 0 V VGS.
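At the logic level, each bit of the WORM memory behaves as a NOR of the select lines whose drive transistors have been connected by a printed drop. The sketch below models this, assuming active-high select lines and that an unprinted position reads as logic one while a printed position reads as logic zero when its row is selected. The function names and the mapping of columns to bit positions are hypothetical.

```python
def nor_bit(select_lines, printed_rows):
    """Output of one NOR column given the rows connected by printing."""
    # A connected drive transistor whose select line is high pulls the output low.
    pulled_low = any(select_lines[row] for row in printed_rows)
    return 0 if pulled_low else 1


def read_word(selected_row, printed_rows_per_bit, rows=16):
    """Read one word; bit i is taken from column i (assumed ordering)."""
    select_lines = [1 if row == selected_row else 0 for row in range(rows)]
    word = 0
    for bit, printed_rows in enumerate(printed_rows_per_bit):
        word |= nor_bit(select_lines, printed_rows) << bit
    return word


# Hypothetical print pattern for a 9-bit wide memory.
columns = [set() for _ in range(9)]
columns[0] = {3}        # row 3 prints a zero in bit 0
columns[8] = {3, 5}     # rows 3 and 5 print a zero in bit 8
print(format(read_word(3, columns), "09b"))
print(format(read_word(5, columns), "09b"))
```

This ideal model ignores the leakage of unselected transistors, which in the real circuit erodes the noise margin as more drive transistors are connected.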

Figure 5

(a) The transistor schematic of the printable WORM memory. The voltage transfer characteristics and the extracted noise margins are plotted for (b) a unipolar NOR gate with 1 to 16 inputs and only 1 load transistor and (c) a 16-input NOR gate with the number of load transistors increased from 1 to 6.

Increasing the number of drive transistors decreases the noise margin, due to the additional leakage from the unselected TFTs at a gate-source voltage bias of 0 V. To compensate for this, additional load transistors can be added through inkjet printed connections. The simulated voltage transfer characteristics of the 16-input NOR gate with increasing number of load transistors are shown in Fig. 5 (c). The noise margin can be recovered when more load transistors are added. For this configuration, the optimal balance is observed to be 5 load TFTs for 16 inputs. This optimum ratio changes as the threshold voltage of the oxide TFTs varies, for example due to process corners or variability. Another optimum is found when the bias voltage at the gates of the unselected lines is larger than 0 V, in case the complementary logic is not fully rail-to-rail. Even though the printable WORM memory is designed in a unipolar technology, it still offers good robustness due to the ability to add multiple load transistors in post-processing.

In total, the print-programmable instruction generator employs 403 organic p-TFTs and 412 oxide n-TFTs in its unprogrammed state and measures 9.0 mm × 6.9 mm. Printing the connections for the drive and load transistors in the unipolar oxide NOR gate can add up to 189 oxide n-TFTs. This number will differ according to the desired program and routines.

The thin-film P2ROM chip can store 16 lines of 9-bit instructions. As an example, we have implemented an exponential running averager algorithm, configured by inkjet printing the P2ROM chip. The algorithm is executed twice in one cycle and the output register is enabled after the second loop, prior to the second LSR function. This increases the accuracy by one bit. The algorithm is executed by 12 consecutive instructions. The remaining 4 available commands are configured as NOOP (no operation). Fig. 6 (a) plots the measured outputs of the P2ROM chip at a supply voltage of 10 V and a clock speed of 650 Hz. The corresponding instructions are shown in the graph. Configured as a first-order averager, the P2ROM chip employs 852 TFTs. Fig. 6 (b) shows the measurement results for the full computer when the processor core and P2ROM chips are combined. The input stream changes from 0 (000000, 6-bit) to 7 (000111, 6-bit). The 7-bit accurate outputs are subsequently 0, 7, C and E, in hexadecimal notation. The graph also details the instruction from the P2ROM chip which enables the output register.
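The core idea of an exponential running averager built from an addition and a logical shift right can be sketched as below. This is a generic illustration that assumes a smoothing factor of one half (new average = (old average + sample) / 2) and an 8-bit accumulator. It does not reproduce the printed 12-instruction program, its double execution per cycle, or the measured output values, and the function name is hypothetical.

```python
def running_average_step(accumulator, sample, width=8):
    """One averaging step: add the sample, then halve with a logical shift right."""
    mask = (1 << width) - 1
    total = (accumulator + sample) & mask  # ADD, truncated to the register width
    return total >> 1                      # LSR divides by two, dropping the LSB


# Step response for a constant input of 7, starting from 0.
average = 0
history = []
for _ in range(4):
    average = running_average_step(average, 7)
    history.append(average)
print(history)
```

Each shift discards the least significant bit, which is why reading the sum before the final shift, as the text describes, retains one extra bit of accuracy.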

Figure 6

(a) Measured signals of the P2ROM instruction generator when configured (printed) to execute the running averager algorithm. It consists of 12 instructions and 4 NOOP commands. (b) Measured signals of both the P2ROM and processor core chips while executing a running averager algorithm. The pulses in the top part of the figure correspond to the command “store in output register”.

Discussion

In this paper, we propose a concept for a microprocessor that can exploit the power of Wright's Law to reach the economic scale required for ubiquitous computing. It uses a thin-film transistor technology that is compatible with plastic substrates and can be programmed using a post-manufacturing digital print step. Generic versions of such a processor could be produced in large volumes and then tailored to specific applications at the user's site.

The use of a complementary transistor technology with metal-oxide n-type transistors and organic p-type transistors and an architecture with a minimum transistor count enable robust hardware. Programming the instruction generator is achieved by inkjet printing connections between drive transistors and a NOR gate. Load transistors can be added in a similar way to balance the load.

In the current state of the technology and with a minimum transistor size of 5 µm, the processor is shown to run at up to 2100 instructions per second. We implemented a print-programmable memory with 16 lines of code, each line containing a 9-bit instruction. These specific performance metrics are expected to improve further as the performance of organic and oxide semiconductors continues to evolve. However, as the concept relies on Wright's Law, its economic evolution does not depend on the scaling of transistor size and so can speed the emergence of ubiquitous computing and the Internet of Things.