Parallel Fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 Eyes)

We report the implementation of a parallel microscopy system (96 Eyes) capable of simultaneously imaging all wells of a 96-well plate. The optical system consists of 96 microscopy units, where each unit comprises an injection-molded four-element objective and a low-cost CMOS camera chip. By illuminating the sample with angle-varied light and applying Fourier ptychography, we improve the effective brightfield imaging numerical aperture of the objectives from 0.23 to 0.3, and extend the depth of field from ±5 μm to ±15 μm. Fourier ptychography additionally allows us to computationally correct the objectives' aberrations in the rendered images, and provides the ability to render phase images. The 96 Eyes system acquires raw data at a rate of 0.7 frames per second (all wells), and the data are processed with 4 GPU cores (GK210, Nvidia Tesla K80, USA). The system is also capable of fluorescence imaging (excitation = 465 nm, emission = 510 nm) at the native resolution of the objectives. We demonstrate the capability of this system by imaging S1P1-eGFP human bone osteosarcoma epithelial (U2OS) cells.

For each unique illumination pattern, the incoming image data from all image sensors are simultaneously sorted, indexed, and segmented online by the file system. The individual chunks of the hypercube are then written to the hard drive in a linear layout, which facilitates the image-segment loading and restoration in the next step. In short, sorting and segmenting the incoming image data on the fly saves precious data bandwidth.
Enabled by the data alignment of the image chunks and the identical illumination pattern across all image sensors, multiple image segments can be restored by the graphical processor in a massively parallel manner. The corresponding image segments for all image sensors (i.e., at identical locations in the image FOV) can be processed simultaneously because they share an identical set of illumination conditions. This substantially reduces GPU idling time: a chunk in the data "hypercube" requires only one set of function calls and data reads/writes instead of 96 (one per image sensor), reducing the processing overhead.
If only a single 96-well plate is imaged and analyzed, the back-to-back data acquisition (one layer of phase image plus 10 z-layers of fluorescence image) and processing pipeline requires (90 + 30 + 120) s = 240 s ≈ 4 min to complete. However, if multiple plates are involved in one batch of study, the acquisition stages and the reconstruction stages can be performed simultaneously (Fig. S1), reducing the overall imaging time to around 120 seconds per plate.
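The pipelined schedule above can be sketched with simple bookkeeping. This is a minimal sketch, assuming the stage durations quoted in the text (90 + 30 = 120 s of acquisition, 120 s of reconstruction) and that plate i+1 is imaged while plate i is being reconstructed; the function name and decomposition are illustrative, not the authors' code.

```python
# Hedged sketch: per-batch wall-clock time under the two-stage pipeline of
# Fig. S1. Stage durations are taken from the text; everything else is an
# illustrative assumption.
def batch_time(n_plates, t_acquire=120.0, t_reconstruct=120.0):
    """Total time (s) when plate i+1 is acquired while plate i reconstructs."""
    if n_plates == 0:
        return 0.0
    # The first plate must finish acquiring before any overlap is possible;
    # afterwards each plate costs only the slower of the two stages, and the
    # last reconstruction drains the pipeline.
    return t_acquire + max(t_acquire, t_reconstruct) * (n_plates - 1) + t_reconstruct

print(batch_time(1))   # back-to-back: 240 s, as stated in the text
print(batch_time(10))  # 1320 s, i.e. ~132 s per plate, approaching 120 s/plate
```

For large batches the amortized cost per plate approaches the slower stage (120 s), which matches the ~120 s/plate figure quoted above.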

LED position calibration
The Fourier ptychographic algorithm requires accurate illumination angles from different LEDs in order to register the raw images in the Fourier domain. Because of the presence of the liquid meniscus in the 96-well culture plate, the LEDs appear to be much closer to the object than they physically are, altering the incident angles of the light rays on the object. Here, we present a ray-tracing method to estimate the incident angles after refraction.

Supplementary Figure S1. Parallel FPM acquisition and reconstruction process. (a) Timeline of the plate image acquisition and reconstruction processes for two consecutive plates. Since the reconstruction process can be done offline, the second plate can be loaded and imaged while the workstation is reconstructing the images of the first plate. (b) and (c) Four (4) high-throughput frame grabbers stream raw images to the internal memory buffers of the workstation through high-speed links. (d) Front view of the 96 Eyes hardware.
First, we consider the case where the liquid interface is devoid of a meniscus. Let us denote the vertical distance between the object and the light source by h_a, and the height of the liquid medium (refractive index n) above the object by h_b. For a light ray from a single LED passing through the flat air-to-liquid interface [inset of Supp. Fig. S2], the angle of illumination on the sample, θ, is governed by Snell's law (S1). A closed-form solution for sin θ exists, but it involves finding the root of a fourth-order polynomial derived from the above equations. Instead, we numerically solve for δ with a root-finding algorithm that is guaranteed to converge for |x_a − x_b| < h_a − h_b. The illumination angle can then be evaluated by substituting δ_K into Eq. S2 for a large iteration count K.
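The iterative solve can be illustrated as a fixed-point iteration. This is a minimal sketch, not the paper's Eqs. S1–S2: it assumes the flat-interface geometry described above (LED at height h_a, liquid of height h_b and index n, lateral LED-to-object offset dx) and alternates the Snell and geometry relations; the function name and iteration count are illustrative.

```python
import math

# Hedged sketch of the refraction solve for a flat air-to-liquid interface.
# The fixed-point form below is an assumption consistent with Snell's law
# and with the stated convergence condition |x_a - x_b| < h_a - h_b.
def illumination_angle(dx, h_a=33.0, h_b=3.0, n=1.33, iters=50):
    """Illumination angle theta (rad) on the sample for an LED laterally
    offset by dx (mm) from the object point."""
    theta = 0.0  # in-liquid angle, initial guess: normal incidence
    for _ in range(iters):
        # geometry: the air segment must cover the lateral distance left
        # over after the in-liquid segment (h_b * tan(theta))
        theta_air = math.atan2(dx - h_b * math.tan(theta), h_a - h_b)
        # Snell's law at the interface: sin(theta_air) = n * sin(theta)
        theta = math.asin(math.sin(theta_air) / n)
    return theta
```

At convergence the two relations hold simultaneously, so the refracted angle on the sample is smaller than the naive angle atan(dx / h_a) computed without the liquid, consistent with the LEDs "appearing closer" than they physically are.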
Next, we analyze the changes to the optical light path in the presence of a meniscus. The meniscus introduces a tilted air-to-liquid interface at an angle α(x_b), which is a function of the lateral position x_b measured from the center of the well on the culture plate. Incorporating this variable into the ray-tracing model would add unnecessary complexity to Eq. S1. Therefore, we linearize the meniscus effect by introducing a parallax shift x_p, for some constant c > 0. The meniscus-compensated illumination angle θ is then approximated by modifying Eq. S4 accordingly.

Supplementary Figure S2. Detailed illustration of the parallel illumination scheme of 96 Eyes. The source-to-source separation is chosen to maximize the effective acquisition rate as well as to avoid interference. This is made possible by ensuring that only one single LED provides brightfield illumination for any camera at any time instance of ptychographic image acquisition. Inset: definition of symbols for LED position calibration.

Speed improvement factor and the design criteria of the parallel illumination scheme

Without parallel illumination, only a single camera is active at any instance of image acquisition. Let f be the effective frame rate of a single camera. For the 96-well plate, the total acquisition time required equals f⁻¹ × (number of cameras) × (number of illuminations) = 4704 f⁻¹. The parallel illumination scheme instead utilizes a 2D lattice illumination pattern with a source-to-source separation of m LEDs and an LED-to-LED separation of Δx. The total number of illuminations is thereby reduced to m². Hence, the effective acquisition time equals f⁻¹ × m² × (number of cameras) × (number of frame grabber cards)⁻¹ = 24 m² f⁻¹ for four frame grabber cards. The speed improvement is therefore 4704 f⁻¹ / (24 m² f⁻¹) = 196/m². In the following paragraph, we compute the lower limit of the value m.

At all time instances, we allow only one LED to fulfill the brightfield illumination condition with respect to the object (Fig. S2). Let us denote the vertical distance between the object and the light source by h_a, and the height of the liquid medium (refractive index n) by h_b. The following conditions have to be fulfilled in addition to Eqs. (S1) and (S2), for a given numerical aperture (NA) of the microscope objective and some integer M. Here a power of two is preferred because it simplifies the electronic design of the LED matrix. For our system with h_a = 33 mm, h_b = 3 mm, and Δx = 3 mm, we picked m = 8. This implies a conservative speed improvement of at least 3 times. Compared to mechanical scanning systems, which have a much lower effective frame rate f, the speed improvement can be up to 8 times relative to commercially available instruments.
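The acquisition-time bookkeeping above can be checked numerically. This is a minimal sketch, assuming the figures stated in the text: 96 cameras, 4704 f⁻¹ total sequential acquisition time (i.e., 49 illuminations per camera), and four frame-grabber cards; the function name and defaults are illustrative.

```python
# Hedged sketch of the acquisition-time arithmetic for the parallel
# illumination scheme. Values follow the text; f is the effective
# single-camera frame rate in frames per second.
def acquisition_time(f, m=None, n_cameras=96, n_illum=49, n_grabbers=4):
    """Total acquisition time in seconds."""
    if m is None:                        # sequential: one camera at a time
        return n_cameras * n_illum / f   # = 4704 / f
    # parallel lattice illumination: m^2 patterns, cameras served in
    # parallel by the frame grabbers
    return m**2 * n_cameras / n_grabbers / f  # = 24 m^2 / f

speedup = acquisition_time(1.0) / acquisition_time(1.0, m=8)
print(speedup)  # 196 / 8^2 = 3.0625, the "at least 3x" quoted in the text
```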
Modification to the Fourier ptychography phase retrieval algorithm

Forward model for our imaging system

Let us denote a segment of the object to be reconstructed by u ∈ Cⁿ, a two-dimensional image with n^(1/2) × n^(1/2) pixels. We also denote the j-th illuminated low-resolution intensity image of the object by I_j ∈ R^m₊, with m^(1/2) × m^(1/2) pixels (u and I_j are both written as vectors in lexicographical order). It can be shown that I_j = |F^H diag(p) Q_j F u|², where the superscript H denotes the Hermitian conjugate. The pupil function p ∈ C^m can be considered the circular aperture at the back aperture plane of the imaging system. The binary matrix Q_j ∈ R^(m×n) describes the downsampling of the object u by cropping a region of m pixels in Fourier space corresponding to the j-th position of the light source. What we measure is a stack of low-resolution intensity images I_j, j = 1, . . . , k. The operation diag(a)b represents the element-by-element multiplication between two vectors a and b.
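The forward model I_j = |F^H diag(p) Q_j F u|² can be simulated in a few lines. This is a minimal sketch under stated assumptions: the crop offset, grid sizes, aperture radius, and the test object below are illustrative, not the system's calibration.

```python
import numpy as np

# Hedged sketch of the Fourier ptychography forward model. Q_j is realized
# as an m x m crop of the DC-centred object spectrum at an LED-dependent
# offset; p is a circular pupil at the back aperture plane.
def forward(u, pupil, offset):
    """One low-resolution intensity image for one illumination angle."""
    m = pupil.shape[0]
    U = np.fft.fftshift(np.fft.fft2(u))          # F u, DC at centre
    cy, cx = offset
    crop = U[cy:cy + m, cx:cx + m]               # Q_j: sub-spectrum selection
    field = np.fft.ifft2(np.fft.ifftshift(crop * pupil))  # F^H diag(p) ...
    return np.abs(field) ** 2                    # intensity on the sensor

n, m = 64, 32
rng = np.random.default_rng(0)
u = np.exp(1j * 0.1 * rng.random((n, n)))        # weak phase object
yy, xx = np.mgrid[:m, :m] - m // 2
pupil = (yy**2 + xx**2 <= (m // 4)**2).astype(complex)  # circular aperture
I = forward(u, pupil, offset=(n // 2 - m // 2, n // 2 - m // 2))  # on-axis LED
```

Off-axis LEDs correspond to shifting `offset`, which is exactly the role of Q_j in the model above.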
In reality, the measured sequence of low-resolution images is corrupted by (i) the ambient light level I_b > 0; (ii) the angular dependency of the LED intensity, w_j > 0; (iii) the background interference of suspended particulates in the liquid, φ_dust ∈ Rⁿ; and (iv) the dark current and readout noise of the sensor, n_j ∈ R^m. Therefore, we modify the forward model to

I_j = |F^H diag(w_j p) Q_j F (u ⊙ e^(iφ_dust))|² + I_b + n_j.    (S11)

Minimizer with partial spatial coherence constraint
Since the lens aberration is almost completely unknown, one has to solve a blind ptychographic phase retrieval problem with an amplitude constraint. Because of the limited number of low-resolution images (21) in the measurement, the estimated pupil function p_est cannot be efficiently separated from the estimated object spectrum F u_est in the Fourier domain. This shortcoming is compounded by the fact that the target biological specimen is a weak phase object, where most of the information in the Fourier domain is concentrated in that of the un-diffracted light^(3,4). To suppress the crosstalk between the two, we utilize a finite number (L > 0) of overlapping segments of the object u and the corresponding local pupils p_ℓ to enforce the partial spatial coherence constraint. That is, the above minimizer is further subject to the local pupils staying within a tolerance tol > 0 of a "global" average pupil function p̄ = (1/L) Σ_(ℓ=1)^L p_ℓ.
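One way to realize the constraint is to project each per-segment pupil back toward the global average after every iteration. This is a minimal sketch under stated assumptions: the paper specifies only the constraint (local pupils near the average pupil p̄ within tol), so the tol-ball projection below, the vectorized pupil layout, and the function name are all illustrative choices.

```python
import numpy as np

# Hedged sketch: enforce the partial-spatial-coherence constraint by pulling
# each of the L local pupil estimates onto a ball of radius tol around their
# average. The projection form is an assumption, not the paper's update rule.
def enforce_pupil_constraint(pupils, tol):
    """pupils: (L, m) complex array of per-segment pupil estimates."""
    p_bar = pupils.mean(axis=0)          # "global" average pupil
    out = pupils.copy()
    for l in range(pupils.shape[0]):
        d = pupils[l] - p_bar
        norm = np.linalg.norm(d)
        if norm > tol:                   # too far from the average: project
            out[l] = p_bar + d * (tol / norm)
    return out
```

This keeps per-segment flexibility (each p_ℓ can still differ) while preventing the object spectrum from leaking into any single pupil estimate.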
Background estimation

To recover the average ambient light level I_b, we capture images with all light sources switched off. The value of I_b for a particular CMOS sensor is then set to the pixel average of the captured dark image.

Separation of the non-uniform illumination profile of LEDs and the pupil function

From Eq. (S11), it is known that the pupil function p cannot be efficiently separated from the factor w_j. Therefore, the factor w_j is optimized only for the first three iterations^5, while the recovery of p is postponed until the fourth iteration.

Separation of cells and background interference
The out-of-focus suspended particulates show up as blurred shadows in the sequence of low-resolution images [Supp. Fig. S3]. We exploit this property to estimate φ_dust by applying a Gaussian blur to the recovered object phase. The morphological information of the cells can then be extracted from the phase difference between the recovered field u_est and e^(iφ_dust).
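The blur-and-subtract step can be sketched directly. This is a minimal sketch under stated assumptions: the blur width `sigma` and the FFT-based Gaussian filter are illustrative (the paper does not specify the kernel size), and `separate_cells` is a hypothetical helper name.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """FFT-based Gaussian blur (NumPy only); sigma in pixels."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    # Fourier transform of a Gaussian of std sigma: exp(-2 pi^2 sigma^2 f^2)
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fy**2 + fx**2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * kernel))

# Hedged sketch of the dust/cell separation: the slowly varying blurred
# shadows survive a heavy blur, the sharp cell morphology does not.
def separate_cells(u_est, sigma=25.0):
    phase = np.angle(u_est)
    phi_dust = gaussian_blur(phase, sigma)   # estimate of the dust background
    return phase - phi_dust, phi_dust        # (cell phase, background phase)
```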
It is noted that there are existing algorithms that specialize in separating the object from out-of-focus noise^6.

Algorithm 1, Step 4(b) (excerpt):
ii. Object update: solve u^(k+1) = argmin_u f_j(w_j^k, p^k, u).
iii. Weighting update: when k ≤ 3N, solve w_j^(k+1) = argmin_w f_j(w, p^k, u^(k+1)); otherwise, w_j^(k+1) = w_j^k.
iv. Pupil update: when k > 3N, solve p^(k+1) = argmin_p f_j(w_j^(k+1), p, u^(k+1)); otherwise, p^(k+1) = p^k.
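The staged schedule (refine the LED weights w_j early, defer the pupil p) can be sketched as follows. This is a minimal sketch: the solver calls are placeholders, and only the switching logic, k ≤ 3N versus k > 3N with N images per sweep, follows the text.

```python
# Hedged sketch of the update schedule in Algorithm 1: the weighting factors
# w_j are refined only while k <= 3N; afterwards the pupil update takes over.
# The object u is updated at every iteration k regardless.
def schedule(n_sweeps, N):
    """Return, per iteration k = 1..n_sweeps*N, which auxiliary variable
    (besides the object u) gets updated: 'w' early, 'p' late."""
    plan = []
    for k in range(1, n_sweeps * N + 1):
        plan.append("w" if k <= 3 * N else "p")
    return plan

plan = schedule(n_sweeps=5, N=21)  # e.g. 21 low-resolution images per sweep
```

Deferring the pupil update this way matches the rationale given earlier: p and w_j cannot be efficiently separated, so they are never optimized in the same phase of the iteration.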

Choice of adaptive step size for pupil recovery

While the object update in Step 4(b)ii of Algorithm 1 is solved by the time-honored Gauss–Newton algorithm^7, the pupil update in Step 4(b)iv of Algorithm 1 is instead solved by the gradient descent method^8 with a step size γ ∈ R^m₊, where s^k = F u^k and g_j(w^k, p^k, s^k) = diag(p^k w_j^k) Q_j s^k. Because of the choice of parallel illumination in our 96 Eyes system, all of the captured data are brightfield images. For a weak phase object, most of the incoming light remains un-diffracted, which results in a strong peak in the Fourier domain^(3,4). If the step size γ is a constant, the recovered pupil is corrupted with a constellation-like artifact [Fig. S4(a)]. Therefore, we heuristically normalize the step size, where ‖s‖_∞ denotes the maximum amplitude of the complex-valued signal s. Effectively, the step size γ normalizes the value of diag(s^k). The non-dimensional number β ∈ [0, 1] adjusts the relative strength of the normalization of the signal s^k. When β = 0, the Fourier domain of the object, s^k = F u^k, is completely normalized. In the main text, the value is set to β = 10⁻⁶, around twice the order of magnitude of an 8-bit image. This helps smooth the pupil function [Fig. S4(b)] as well as reduce the reconstruction residual [Fig. S4(c)], defined in Eq. (S15).

Supplementary Figure S4. Adaptive step size improves pupil function recovery. (a) Recovered phase component of the pupil function with a constant step size, i.e. at β = 1, compared to (b) at β = 10⁻⁶. Symbols (ρ_x, ρ_y) are the local coordinates of the pupil function. (c) Comparison of reconstruction residuals obtained by applying phase retrieval to all segments (L = 80) of the cell sample captured from one camera. With our method, the residual reduces by around one-third (after k/N = 200 iterations) with a much smaller spread, demonstrating a more robust object and pupil co-recovery.
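The β-controlled normalization can be sketched as a per-pixel step size. This is a minimal sketch, not the paper's Eq. S14 (which is lost in this extraction): the form below is an assumption chosen to reproduce the two stated limits, β = 1 giving a constant step (normalized only by ‖s‖_∞²) and β = 0 fully normalizing by |s^k|².

```python
import numpy as np

# Hedged sketch of the adaptive step size for the pupil gradient step. The
# interpolation between constant (beta=1) and fully normalized (beta=0)
# step sizes is an assumed form consistent with the text.
def pupil_step(p, grad, s, beta=1e-6):
    """One gradient-descent update of the pupil with a per-pixel step size
    gamma that suppresses the strong un-diffracted (DC) peak in s = F u."""
    s2 = np.abs(s) ** 2
    gamma = 1.0 / (beta * s2.max() + (1.0 - beta) * s2 + 1e-30)
    return p - gamma * grad
```

With β small but nonzero, pixels near the strong un-diffracted peak get a much smaller step, which is what suppresses the constellation-like artifact in Fig. S4(a).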

Improving the dynamic range of fluorescence images with two-stage digital averaging

Because of the limited photosensitivity and bit depth of our choice of consumer-grade CMOS sensor, we adopted a digital averaging approach to enhance the signal-to-noise ratio of the sensor. The digital averaging technique is also known as dithering in the audio digitization community^(9-11), where the band-limited signal of interest is mixed with an artificial out-of-band noise on the input side of the analog-to-digital conversion circuit to reduce the quantization error. Our technique is also very similar to halftoning of digital images^12, where an artificial pepper noise is added to simulate grayscale images on a black-and-white display device. In contrast, the noise source for the CMOS sensors in the 96 Eyes system cannot be precisely controlled. Notably, similar digital averaging approaches have been proposed before for radiometry studies^13; however, the underlying principle is poorly understood. Here, we provide a theoretical framework to explain the dynamic-range improvement of our fluorescence images with digital averaging.
Forward model

For a fluorophore concentration c(x, y) illuminated by a uniform intensity I₀, the imaging system in the fluorescence channel is empirically modeled as

I(x, y, t) = ⌊g_amp η c(x, y) I₀ + g_amp n_dark(x, y, t) + n_amp(t)⌉,

where the non-dimensional factor η is the product of (i) the quantum efficiency of the fluorophore, (ii) the photon collection efficiency of the microscope objectives, and (iii) the quantum efficiency of the photosensing circuit in the CMOS sensor. The amplifier with gain g_amp > 0 naturally comes with an additive power-line noise n_amp(t). The round-off operator ⌊·⌉ denotes the quantization process, which in turn can be modeled as an additive quantization error ε(x, y, t) ∈ [−0.5, +0.5).
Here, the photon noise is assumed to be negligible compared to dark current noise.
In rolling-shutter mode, rows of pixels are read out at a traversal rate v, so the amplifier noise is mapped onto the vertical axis of the j-th image I_j(x, y):

I_j(x, y) = g_amp η c(x, y) I₀ + n_amp(y, j) + g_amp n_dark(x, y, j) + ε_j(x, y),    (S17)

where n_dark(x, y, j) = n_dark(x, y, t)|_(t = t_j + y/v) and n_amp(y, j) = n_amp(t = y/v + jH/v) for H rows of pixels in the CMOS sensor.
Suppressing both dark current noise and quantization error with digital averaging

Consumer-grade CMOS sensors, designed for daylight applications, have a much higher quantization error than dark current noise. For an amplifier gain g_amp at unity, the dark current noise component is almost always rounded off to zero. In other words, direct digital averaging of multiple frames I_j at unity gain usually does not reduce the quantization error. However, typical biological specimens are known to have a low fluorophore concentration, and concerns about photobleaching also limit the illumination intensity I₀. Therefore, the amplifier gain g_amp must be boosted sufficiently to utilize the full quantization range of the CMOS sensor. The dark current noise is known to follow a Gaussian distribution^14, i.e. n_dark(t) ∼ N(0, σ²_dark) (symbols x, y are omitted for clarity). The same applies to the power-line noise, with n_amp(t) ∼ N(0, σ²_amp). For a sufficient gain with g_amp σ_dark ≫ 0.5, the probability density function of I_j is given by Eq. (S18), which is nonzero only when a is an integer; the scaling factor A is defined such that ∫_(−∞)^(+∞) P(I_j = a) da = 1. By averaging a sufficient number of frames, both the dark noise and the quantization error can be reduced. For instance, it can be shown that

lim_(N→∞) Ī₁(x, y) = ∫_(−∞)^(+∞) a P(I_j = a) da = g_amp η c(x, y) I₀,

which is independent of both noise terms.

Supplementary Figure S5. Improving the dynamic range of fluorescence images with digital averaging. (b) Single frame at gain g_amp = 8; (c) averaging 10 frames at gain g_amp = 8. Suppressing the band-like pattern noise for (c) a single frame, and (d) the digital average of 10 frames. All images are contrast-stretched to highlight the background noise and artifacts. Scale bar: 20 µm.
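The dithering argument above is easy to demonstrate numerically. This is a minimal sketch under stated assumptions: a scalar "pixel" with a non-integer true level (in LSB units), Gaussian noise of one LSB standard in place of the amplified dark current (g_amp σ_dark ≫ 0.5), and simple rounding as the quantizer; the variable names are illustrative.

```python
import numpy as np

# Hedged sketch of the digital-averaging (dithering) argument: with enough
# analog gain, the Gaussian dark noise dithers the quantizer, so averaging
# many rounded frames recovers the sub-LSB true level.
rng = np.random.default_rng(0)
signal = 10.3                        # true sensor level, in LSBs (non-integer)
frames = np.round(signal + rng.normal(0.0, 1.0, size=10000))  # quantized frames

single_error = abs(frames[0] - signal)       # one frame: at least 0.3 LSB off
averaged_error = abs(frames.mean() - signal)  # average: well below 1 LSB
```

Without the noise every frame would round to the same integer and averaging would gain nothing; the noise is what makes the mean of the rounded frames converge to the true level.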
Suppressing the power-line noise

In our digitally averaged fluorescence signal captured by the 96 Eyes system, we can still observe a row-wise intensity fluctuation [Supp. Fig. S5(b)] that originates from power-line noise above the quantization level, modeled as

Ī₁(x, y) ≈ g_amp η c(x, y) I₀ + n_amp(y).

This is caused by the power-line noise in the 96-in-1 camera board: the size constraint of the 96-well culture plate limits the available real estate on the printed circuit board for electronic filters, especially the decoupling capacitors. To further suppress such row-wise fluctuations, we apply the same digital averaging technique to isolate them from the fluorescence signal c(x, y). Here, we assume that the fluorophore concentration follows a Gaussian distribution, c(x, y) ∼ N(μ_c, σ²_c), with μ_c > 0 and σ_c ≪ μ_c. By taking a row-wise average of Ī₁(x, y) over the W pixels along each row of the image, we have

Ī₂(y) ≈ g_amp η μ_c I₀ + n_amp(y).    (S20)

Since we only care about the morphology of the biological cells stained with the fluorophore, the average fluorophore concentration μ_c can be eliminated as well. Hence, the recovered fluorescence image is given as

c_est(x, y) := c(x, y) − μ_c ≈ (Ī₁(x, y) − Ī₂(y)) / (g_amp η I₀).    (S21)

In practice, the signal c(x, y) does not fit the Gaussian assumption well, so the row-wise averaging operation of Eq. S20 is replaced with a row-wise median to reduce sensitivity to extreme values in c(x, y).
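The row-wise background removal of Eqs. S20–S21 (with the median variant used in practice) can be sketched directly. This is a minimal sketch: the function name is illustrative, and the synthetic banding image below stands in for the averaged fluorescence frame Ī₁.

```python
import numpy as np

# Hedged sketch of the power-line suppression step: the row-wise median
# estimates the banding term n_amp(y) plus the mean fluorophore level, and
# is subtracted from every pixel in that row (median variant of Eq. S20).
def remove_row_banding(img):
    row_bg = np.median(img, axis=1, keepdims=True)  # one estimate per row
    return img - row_bg

# synthetic check: an image that is pure row-wise banding is removed exactly
banding = np.outer(np.sin(np.arange(32)), np.ones(32))
clean = remove_row_banding(banding)
```

The median, unlike the mean, is insensitive to a few bright cells in a row, which is exactly why the text swaps it in when c(x, y) departs from the Gaussian assumption.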