Generalized multi-channel scheme for secure image encryption

The ability of metamaterials to manipulate optical waves in both the spatial and spectral domains has provided new opportunities for image encoding. Combined with the recent advances in hyperspectral imaging, this suggests exciting new possibilities for the development of secure communication systems. While traditional image encryption approaches perform a 1-to-1 transformation on a plain image to form a cipher image, we propose a 1-to-n transformation scheme. Plain image data is dispersed across n seemingly random cipher images, each transmitted on a separate spectral channel. We show that the size of our key space increases as a double exponential with the number of channels used, ensuring security against both brute-force attacks and more sophisticated attacks based on statistical sampling. Moreover, our multichannel scheme can be cascaded with a traditional 1-to-1 transformation scheme, effectively squaring the size of the key space. Our results suggest exciting new possibilities for secure transmission in multi-wavelength imaging channels.

www.nature.com/scientificreports/

Figure 1. Potential implementation of our encryption scheme. A controllable metasurface (or alternatively, a spatial-light modulator or dynamic hologram) is used to imprint spatial and spectral information onto a broadband light source. To decrypt the image, a recipient must use a multispectral imager to record the individual wavelength channels (the cipher images) and apply a decryption key. A CMOS array will, for example, provide three channels (RGB), while advanced multispectral or hyperspectral imagers can increase the number of channels to 128 (ref. 43).

Figure 2. Encryption and decryption algorithms. A 9 × 9 pixel binary image of the letter 'D' is input to the encryption algorithm that uses a key to convert it into a set of 5 seemingly random cipher images. The decryption algorithm uses the same key to retrieve the letter 'D' image.

decryption as encryption, the output image is the same as the plain image. In general, the number of possible keys (size of the key space) should be large enough that an attacker is unlikely to guess the correct key at random.
Our key space is implicitly defined by a set of mathematical decryption functions, where each function is written as a sum of products (SOP) of cipher images. Decryption is performed by operating the SOP function on the cipher images. For images C_1 through C_n, the output image is a sum of n_s terms, where each term is a product of n_p non-repeating cipher images. For instance, consider the SOP function shown in the green box in Fig. 2, {C_1 C_4 C_5 + C_1 C_3 C_4 + C_2 C_3 C_5 + C_2 C_3 C_4 + C_2 C_4 C_5}. All operations are performed in a pixel-wise fashion. In this case, n = 5, n_p = 3, and n_s = 5. The corresponding key is a sequence of integers that represents the SOP function. The integer before the colon indicates the number of cipher images (here, n = 5), and the integers following the colon represent the product terms. At a given pixel location, a product term contributes a value of 1 to the output image if and only if all the cipher images in the product term have a 1 at that pixel location. For example, the product term C_1 C_4 C_5 at the pixel location (4,7) indicated by the green box in Fig. 2 produces the white pixel indicated in the output image.
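The pixel-wise evaluation of an SOP key can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and a key stored as a list of 1-based index tuples; the function name `decrypt`, the array layout, and the two-pixel test stack are hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np

def decrypt(ciphers, key_terms):
    """Evaluate a sum-of-products key on a stack of binary cipher images.

    ciphers:   array of shape (n, H, W) with entries 0 or 1.
    key_terms: tuples of 1-based cipher indices, one tuple per product term.
    """
    ciphers = np.asarray(ciphers)
    out = np.zeros(ciphers.shape[1:], dtype=int)
    for term in key_terms:
        idx = [i - 1 for i in term]           # convert to 0-based indices
        out += np.prod(ciphers[idx], axis=0)  # pixel-wise product of the term
    return out

# Product terms of the example key in Fig. 2 (n = 5, n_p = 3, n_s = 5).
KEY = [(1, 4, 5), (1, 3, 4), (2, 3, 5), (2, 3, 4), (2, 4, 5)]

# Hypothetical 1 x 2 pixel cipher stack used only to exercise the function.
ciphers = np.zeros((5, 1, 2), dtype=int)
ciphers[[0, 3, 4], 0, 0] = 1   # pixel 0 holds the key triplet (1, 4, 5)
ciphers[[0, 1, 4], 0, 1] = 1   # pixel 1 holds the non-key triplet (1, 2, 5)

print(decrypt(ciphers, KEY))   # [[1 0]]
```

Only the pixel whose three set channels match a product term of the key decrypts to 1; the pixel carrying a non-key triplet decrypts to 0.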
The first step of the encryption algorithm can be understood as the converse of decryption (blue box in Fig. 2). For ease of reference, we call the five product terms in the example 'key triplets'. For n = 5, one can have at most 10 triplets of non-repeating integers (excluding permutations). The remaining five triplets that do not appear in the key are referred to as 'non-key triplets'. Each white pixel (value 1) of the plain image is randomly assigned to a key triplet, and a 1 is stored at the same location in its constituent cipher images. For example, the white pixel highlighted by the green box in the letter 'D' image is assigned to the triplet (1,4,5), so that cipher images C_1, C_4 and C_5 each have a 1 at their (4,7) pixel locations. This approach uniformly divides the '1' pixels of a plain image among its cipher images, visually disguising the information in the plain image.
The second step of the encryption algorithm introduces "red herring" pixels in each cipher image. This is accomplished by assigning each black pixel (value 0) of the plain image to a randomly chosen non-key triplet and storing a 1 at the same location in its constituent cipher images. For example, C_1, C_2 and C_5 each have a 1 at their (6,6) pixel locations. The triplet (1,2,5) does not appear in the key, and the plain image has a 0 at this location (highlighted by the red box). The red herring pixels prevent an attacker from deducing the non-zero pixels of the plain image simply by noting the locations of any non-zero pixels in the cipher images. Moreover, since all pixels of the plain image are mapped to either a key triplet or a non-key triplet, a simple sum of all cipher images yields a uniform image, devoid of information. This point is illustrated with an example in the next section.
We note that repeated applications of the encryption algorithm to the same plain image can yield a different set of cipher images, even when the key is held fixed. This is due to the element of randomness in the algorithm that occurs when assigning pixels to key and non-key product terms.
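The two encryption steps and the round trip through decryption can be sketched as follows. This is a minimal sketch, assuming NumPy, a key given as 1-based index tuples, and a simple rectangular test pattern in place of the letter 'D'; the names `encrypt` and `decrypt` are hypothetical and the code is not the authors' implementation.

```python
from itertools import combinations
import numpy as np

def encrypt(plain, n, key_terms, rng):
    """Spread a binary image over n binary cipher images.

    White pixels are assigned to a random key term and black pixels to a
    random non-key term; every cipher image in the chosen term gets a 1.
    """
    n_p = len(key_terms[0])
    key = [tuple(sorted(t)) for t in key_terms]
    key_set = set(key)
    non_key = [t for t in combinations(range(1, n + 1), n_p) if t not in key_set]
    ciphers = np.zeros((n,) + plain.shape, dtype=int)
    for pos in np.ndindex(plain.shape):
        pool = key if plain[pos] else non_key
        term = pool[rng.integers(len(pool))]
        for i in term:
            ciphers[(i - 1,) + pos] = 1
    return ciphers

def decrypt(ciphers, key_terms):
    """Pixel-wise sum of products of the cipher images named in the key."""
    out = np.zeros(ciphers.shape[1:], dtype=int)
    for term in key_terms:
        out += np.prod(ciphers[[i - 1 for i in term]], axis=0)
    return out

KEY = [(1, 4, 5), (1, 3, 4), (2, 3, 5), (2, 3, 4), (2, 4, 5)]
plain = np.zeros((9, 9), dtype=int)
plain[2:7, 3:6] = 1                       # placeholder pattern, not the letter 'D'

c_a = encrypt(plain, 5, KEY, np.random.default_rng(0))
c_b = encrypt(plain, 5, KEY, np.random.default_rng(1))

print(np.array_equal(decrypt(c_a, KEY), plain))   # True
print(np.unique(c_a.sum(axis=0)))                 # [3]
```

Because every pixel is assigned to exactly one triplet, the sum of the cipher images is uniformly equal to n_p = 3, and two runs with different random seeds give different cipher images that decrypt to the same plain image.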
Enabling security against a brute-force attack.
For the encryption algorithm to be resistant to a brute-force attack, the key space should be large enough that an attacker cannot feasibly test all the possible keys. Given the number of cipher images, we can determine the total number of possible keys, N_keys, by combinatorics. This calculation is presented in the Methods section. Figure 3a shows a plot of log_2(N_keys) versus n. For reference, the maximum key space size of 2^256 for practical AES symmetric encryption is shown by the solid red line. It can be observed that the total number of keys increases double-exponentially with n and becomes nearly equal to the AES limit for n = 10. This indicates that encrypting a plain image into more than 10 cipher images would render a brute-force attack infeasible.
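The size of the key space can be checked numerically from the count given in the Methods section, where the number of keys for fixed n, n_p and n_s is C(C(n, n_p), n_s) and the total is the sum over all n_p and n_s. The sketch below uses only the Python standard library; the function name `total_keys` is a hypothetical choice for this example.

```python
from math import comb, log2

def total_keys(n):
    """Total number of SOP keys for n cipher images."""
    total = 0
    for n_p in range(1, n + 1):
        m = comb(n, n_p)   # number of distinct product terms of size n_p
        # Summing C(m, n_s) over n_s = 1..m counts every non-empty subset
        # of the m product terms, which equals 2**m - 1.
        total += sum(comb(m, n_s) for n_s in range(1, m + 1))
    return total

print(total_keys(5))                        # 2109
print(252 <= log2(total_keys(10)) < 253)    # True
print(total_keys(10) < 2**256 < total_keys(11))   # True
```

The inner sum collapses to 2^C(n, n_p) − 1, which makes the double-exponential growth explicit: the dominant term for n = 10 is 2^252, just below the 2^256 reference, and n = 11 already exceeds it.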
We denote the number of keys for fixed n, n_p and n_s as n_keys. Figure 3b shows the variation of log_2(n_keys) with n_p and n_s for n = 10. Here n_s ranges from 1 to C(10, n_p) and n_p ranges from 1 to 10. It can be observed that n_keys is maximum for n_p = ⌊10/2⌋ = 5 and n_s = ⌊C(10, 5)/2⌋ = 126, where ⌊x⌋ denotes the greatest integer less than or equal to x. In general, one should choose n_p = ⌊n/2⌋ and n_s = ⌊C(n, n_p)/2⌋, which maximizes the size of the key space (see Fig. 4 for a numerical example with n = 10).

To illustrate the security of the optimized system against brute-force attacks, consider a 50 × 40 pixel binary image of the letter 'A' (Fig. 4a) that has been encrypted using a key with n = 10, n_p = 5 and n_s = 126. The resulting cipher images are displayed in Fig. 4b. As can be seen, simple visual inspection of the cipher images does not reveal any meaningful information about the plain image. Moreover, a sum of all cipher images results in a uniform intensity image with all pixel values equal to 5 (Fig. 4c).
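The parameter choice n_p = ⌊n/2⌋ and n_s = ⌊C(n, n_p)/2⌋ can be checked by exhaustive search over the count C(C(n, n_p), n_s) from the Methods section. The sketch below uses only the Python standard library; `best_parameters` is a hypothetical helper name, and ties (which occur for odd n or an odd number of product terms) are resolved in favor of the smallest n_p and n_s.

```python
from math import comb

def best_parameters(n):
    """Return the (n_p, n_s) pair that maximizes C(C(n, n_p), n_s)."""
    best_count, best_pair = 0, None
    for n_p in range(1, n + 1):
        m = comb(n, n_p)
        for n_s in range(1, m + 1):
            count = comb(m, n_s)
            if count > best_count:        # strict '>' keeps the first maximum
                best_count, best_pair = count, (n_p, n_s)
    return best_pair

print(best_parameters(10))                          # (5, 126)
print(10**74 < comb(comb(10, 5), 126) < 10**75)     # True
```

For n = 10 the maximum is reached at n_p = 5 and n_s = 126, and the corresponding number of keys lies between 10^74 and 10^75.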
We assume that the attacker has access to the cipher images and the system parameters that were used for encryption (n, n_p, and n_s). We further assume that the attacker has knowledge of the encryption and decryption algorithms but does not have access to the encryption key. So, he resorts to randomly trying out a small number of keys with n = 10, n_p = 5 and n_s = 126 and visually inspecting the output images to guess the encrypted plain image. Figure 4d shows the output images for three such keys, none of them being the original key used for encryption. Here again, it is difficult to gather any information about the plain image by simply looking at the output images. Therefore, one would expect that an attacker relying solely on visual inspection would have a very low probability of recovering a plain image encrypted using our scheme.
Enabling security against more sophisticated attacks.
Next, we evaluate the security of our encryption scheme in a scenario where an attack more sophisticated than simple visual inspection is used to recover a plain image. In the example of Fig. 4, none of the incorrect keys tried yielded an output image that obviously resembled the plain image (Fig. 4d). However, one might ask whether there is more subtle information contained in the output images obtained from incorrect keys. A more sophisticated attacker might therefore go beyond visual inspection to calculate a similarity score with a known set of possible plain images.
We consider the problem of transmitting messages written using a four-letter alphabet. The alphabet comprises 50 × 40 pixel binary images of the letters 'A', 'B', 'C' and 'D'. Suppose that the letter 'A' needs to be transmitted and has been encrypted using a key with n = 10, n_p = 5 and n_s = 126. An attacker intercepts the transmission channel and gains access to the cipher images. We assume that the attacker is familiar with the encryption and decryption algorithms but does not have access to the key that was used. Since there are ten cipher images, he guesses that n = 10.
As discussed previously, the size of the key space for n = 10 is large enough to make it infeasible for the attacker to try out all possible keys. To get around this problem, the attacker constructs a randomly selected sample set of 1000 keys for each n_p and n_s. He uses these keys to generate 1000 output images and computes their mean similarity score, S_mean, with respect to the four letters. The definition of similarity score is presented in the Methods section. The score S_mean with respect to the letters 'A' through 'D' is displayed as a function of n_p and n_s in Fig. 5a through d. We note that in a practical situation, the attacker would stop traversing the key space as soon as he hits the right key. We assume that this does not happen in this situation as the probability of finding the right key is very low (~ 1/10^74).
From Fig. 5, one can observe that for n_p ≤ 5, the mean scores with respect to letter 'A' are in general higher than those for all the other letters. In particular, S_mean for letter 'A' takes its maximum value close to n_p = 5 and n_s = 126, which are the parameters used for encryption. Therefore, the attacker will be able to guess the encryption parameters and the encrypted letter by simply looking at the mean score colormaps for the four letters.

In addition, it can be observed that the mean scores for all keys with n_p > 5 are equal to zero. Since the letter 'A' image was encrypted using a key with n_p = 5 and n_s = 126, each of its '1' pixels is stored in one of the 126 groups of five channels. This implies that only five of the ten cipher images can store a '1' at any given pixel location. Multiplying more than five cipher images would result in a complete cancelation of pixel values and generate an image with all pixels equal to 0. This happens when evaluating the decryption function for keys with n_p > 5. Decryption using such keys results in all-zero images that have a similarity score of 0 with respect to all the four letters.
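The vanishing of every product of more than n_p cipher images can be reproduced on a small synthetic example. The sketch below assumes NumPy and uses hypothetical parameters (n = 6, n_p = 3, an 8 × 8 image); since both key and non-key terms contain exactly n_p channels, each pixel is simply given a random group of n_p channels, which is sufficient for this argument.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, n_p, shape = 6, 3, (8, 8)   # illustrative values, smaller than in the text

# Encryption gives every pixel exactly one group of n_p channels
# (key or non-key), so exactly n_p cipher images hold a 1 at each pixel.
terms = list(combinations(range(n), n_p))
ciphers = np.zeros((n,) + shape, dtype=int)
for pos in np.ndindex(shape):
    for i in terms[rng.integers(len(terms))]:
        ciphers[(i,) + pos] = 1

def max_product(k):
    """Largest pixel value over all products of k distinct cipher images."""
    return max(
        np.prod(ciphers[list(t)], axis=0).max()
        for t in combinations(range(n), k)
    )

print(max_product(n_p))       # 1
print(max_product(n_p + 1))   # 0
```

Products of n_p images can reach 1 at pixels whose assigned group matches, while any product of n_p + 1 or more images must include a channel that is 0 at every given pixel.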
In order to make it difficult for the attacker to identify the encrypted letter, we must decrease the difference in S_mean calculated with respect to the four letters. One way to accomplish this is to increase n. Figure 6a presents the variation of S_mean with n for letter 'A' encrypted using n_p = ⌊n/2⌋ and n_s = ⌊C(n, n_p)/2⌋. Since S_mean tends to be higher close to the encryption parameters, we only present its value at n_p = ⌊n/2⌋ and n_s = ⌊C(n, n_p)/2⌋ for each n. The solid lines represent the average S_mean computed over 1000 trials with 1000 samples each, while the colored bands represent the corresponding error bounds.
One can notice that for small values of n, the scores with respect to 'A' are significantly larger than those with respect to the other letters. As n increases, S_mean with respect to 'A' decreases while it remains nearly constant for the other letters. For n ≥ 14, the error bounds on S_mean for 'A' start to overlap with those for 'C' and 'D'. This implies that encrypting 'A' into 14 or more cipher images will make it difficult for the attacker to identify it on the basis of similarity scores. One may note that the value of n needed to ensure security in this case is higher than that required to prevent a brute-force attack (n = 10). In general, the number of cipher images (n) required to defend against a sophisticated attacker who uses a randomly selected key set is larger than for a brute-force attacker. However, provided n is chosen large enough, the system will remain secure.
Even though the analysis presented thus far is for encrypted letter 'A', the conclusion remains the same for all letters in the alphabet. To validate this, we present the variation of S_mean with n for encrypted letter 'B' in Fig. 6b. Here again, the scores with respect to letter 'B' are higher for small values of n and decrease as n increases. One can also note that the scores with respect to letters 'C' and 'D' in Fig. 6b are close to those with respect to letter 'B' due to the similarity in the shapes of these three letters. For n ≥ 13, the error bounds on S_mean for 'B' start to overlap with those for the other letters. Therefore, from Fig. 6a,b, one can conclude that choosing an n ≥ 14 makes it difficult for the attacker to identify messages written using a combination of 'A' and 'B'. A similar calculation can be done for the letters 'C' and 'D' to determine a lower bound on n for the entire system. Similar conclusions are obtained for the case in which an attacker decides to use maximum scores instead of mean scores.

Experimental demonstration.
To illustrate the utility of our encryption scheme, we conduct a table-top demonstration using a color display and a color camera (Fig. 7a). While a true n-channel encryption scheme (as described above) requires independent control over transmission and detection at n distinct wavelengths, we can emulate the full behavior using a simple RGB system (Fig. 7b) with calibration (Fig. 7c). We demonstrate our encryption algorithm within a cascaded encryption scheme (see 'Methods' for details). A plain image is first encrypted using a standard scheme to produce an apparently random image. Using the key, the white and black pixels are randomly assigned to the key triplet and non-key triplet colors, respectively (Fig. 7d), to produce the display image. We capture the display image on the camera and use the color lookup table to recover the output image, as shown. The output image shows high fidelity with respect to the original plain image with a similarity score of 0.98. This simple demonstration shows the robustness of our encryption system to noise.

Discussion
The demonstration of Fig. 7 illustrates how our multi-spectral scheme can be used to add an "extra dimension" of security to image encryption. In the example above, we cascaded standard and multi-spectral encryption schemes. To successfully decrypt the resultant image, an unauthorized receiver must correctly guess both the standard key K_1 and the multi-spectral key K_2. Suppose the standard scheme uses a 256-bit key chosen from a key space of size 2^256. For a multi-spectral scheme with n > 10, the multi-spectral key space alone exceeds 2^256, so the cascaded key space is larger than (2^256)^2. The multi-spectral scheme we describe thus effectively squares the key space size. We conclude that our approach can provide an extra dimension for secure encryption, one which can leverage emerging technologies for multi-wavelength transmission and imaging.

In summary, we proposed an encryption scheme based on pixel multiplexing for transmission of images across multiple wavelength channels. Our encryption algorithm divides the pixels of a given plain image into multiple, seemingly random cipher images. Decryption by the intended recipient is performed by using a key to convert the cipher images into meaningful information. We considered a generalized key space based on mathematical decryption functions, each written as a sum of products of cipher images. Using combinatorics, we showed that encryption of a given plain image into more than 10 channels ensures security against a brute-force attack. We also considered a more sophisticated attack, one in which an attacker uses mean similarity scores of randomly chosen samples of the key space to extract information about the plain image. For a 50 × 40 pixel image, we showed that encryption remains secure as long as 14 or more channels are used.
While this work uses RGB display and imaging to emulate a 5-channel scheme, we find that increasing the number of channels leads to increased noise levels and decryption errors. True implementation of n-channel encryption and decryption will require independent control of transmission and detection on n separate wavelength channels. To this end, the continued development of tunable metasurfaces, paired with multi- and hyperspectral imagers, will illustrate the true potential of the proposed encryption method.

Methods
Calculation of total number of keys for fixed n.
For a given n, the number of cipher images in each product term (n_p) can range from 1 to n. The total number of distinct product terms of n_p cipher images is given by

$$C(n, n_p) \equiv \frac{n!}{n_p!\,(n - n_p)!}.$$

Therefore, for fixed n and n_p, the number of product terms in the decryption function (n_s) can vary from 1 to C(n, n_p). The number of keys for fixed n, n_p and n_s is then equal to

$$n_{\text{keys}}(n, n_p, n_s) = C\big(C(n, n_p),\, n_s\big).$$

Summing over all n_p and n_s, the total number of keys for fixed n is given by:

$$N_{\text{keys}}(n) = \sum_{n_p = 1}^{n} \; \sum_{n_s = 1}^{C(n, n_p)} C\big(C(n, n_p),\, n_s\big).$$

Definition of similarity score.
The similarity score S for an image M' with respect to an image M is defined in terms of cov(M', M), the covariance of M' and M. In our analysis, M is a binary image depicting a letter while M' is an output image obtained by operating a key on the cipher images possessed by the attacker. We normalize M' by its maximum value to ensure that none of its pixels take a value greater than 1. As a result, the similarity score takes a value between 0 (totally dissimilar images) and 1 (M' = M).

Experimental demonstration.
The original plain image is first converted to a random image by performing an XOR operation with a 50 × 50 pixel one-time pad. The resulting image is converted into a display image using the color lookup table created in Fig. 7b. The white pixels of the image are assigned randomly to one of the key triplet colors while the black pixels are assigned to the non-key triplet colors. The resulting display image is captured by a color camera and resized to the original resolution of 50 × 50 pixels. We use the calibrated color lookup table (Fig. 7c) to determine the product term corresponding to each pixel of the display image and retrieve the cipher images. These cipher images are operated upon by the decryption key to retrieve the intermediate image, which is converted to an output image by performing an XOR with the one-time pad.
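The outer (standard) layer of the cascaded scheme described in the experimental demonstration can be sketched as follows. This is a minimal sketch, assuming NumPy and a placeholder 50 × 50 test pattern; the variable names are hypothetical, and the multi-channel stage, display, and camera capture are deliberately left out and treated as lossless.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder 50 x 50 binary plain image (not the image used in Fig. 7).
plain = np.zeros((50, 50), dtype=np.uint8)
plain[10:40, 20:30] = 1

# One-time pad: a random binary image of the same size, shared by both parties.
pad = rng.integers(0, 2, size=plain.shape, dtype=np.uint8)

# Sender: XOR with the pad gives the apparently random image that is then
# passed to the multi-channel encryption stage.
intermediate = plain ^ pad

# (Multi-channel encryption, display, capture and SOP decryption would act
# on `intermediate` here; this sketch assumes they return it unchanged.)

# Receiver: a second XOR with the same pad restores the plain image.
output = intermediate ^ pad

print(np.array_equal(output, plain))   # True
```

Since XOR with the same pad is its own inverse, any pixel errors introduced by the omitted optical stage would pass through to the output image unchanged in position, which is why the fidelity of the output reflects the noise of the multi-channel stage alone.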

Data availability
The data used in this study is available from the authors upon reasonable request.