Virtual Recovery of Content from X-Ray Micro-Tomography Scans of Damaged Historic Scrolls

There is a large body of historical documents that are too fragile to be opened or unrolled, making their contents inaccessible. Recent improvements in X-ray scanning technology and computer vision techniques make it possible to perform a “virtual” unrolling of such documents. We describe a novel technique to process a stack of 3D X-ray images to identify the surface of parchment scrolls, unroll them, and create a visualization of their written contents. Unlike existing techniques, we can handle even challenging cases with minimal manual interaction. Our novel approach was deployed on two 15th and 16th century damaged historic scrolls from the manors of Bressingham and Diss Heywood. The former has become fused, probably due to exposure to moisture, and cannot be fully unrolled. The latter was severely burnt several hundred years ago, becoming thoroughly charred, heat-shrunken, and distorted, with all the sheets now brittle and fused together. Our virtual unrolling revealed text that has been hidden for centuries.

where skeleton(.) represents the morphological skeleton extraction, andS i−1 denotes the background of S i−1 . The obtained curves in C i are superimposed upon S 0 i . A pair of junction sections which should be matched are close to a curve in C i . Therefore, the curves in C i can be used to match the pairs of junction sections which should be connected, and will provide an initial guess for the final segmentation.
However, the narrow fused region in S 0 i may obstruct the junction section matching. Fig. 1(a) presents the segmentation S i−1 of slice i − 1 as well as its foreground skeleton (red curve) and background skeleton (black curve), and Fig. 1(b) shows the initial segmentation S 0 i of slice i. The background skeleton of S i−1 contained in the foreground of S 0 i is displayed in Fig. 1(c). In this subfigure, the two junction sections between the two fused layers cannot be matched, because the background skeleton of S i−1 between these two junction sections cannot be entirely contained in the narrow fused region, and is thereby broken by Eq. (1). To tackle this problem, we exploit the foreground skeleton of S i−1 to identify which two curves in C i should be connected together. In Fig. 1(d), the red curve c with p 1 and p 2 as endpoints is a foreground skeleton of S i−1 included in the background of S 0 i . We link p 1 and p 2 by the shortest path t (red dotted line in Fig. 1(e)) along the layer boundary. If t only touches two endpoints of two different curves in C i and also satisfies 1 α ≤ length o f t length o f c ≤ α, these two curves will be linked by the shortest path along the layer boundary, as illustrated in Fig. 1(f). In our work, α is set as 5.
In reality, not all the curves in C i are useful for the segmentation process, so we will remove those extraneous curves from C i . For each endpoint of a curve in C i , we try to find in S 0 i the junction section which meets the following three conditions: (i) the junction section is within l 1 pixels away from the curve; (ii) the distance between the junction section and the endpoint is less than l 2 pixels, and (iii) the junction section is nearest to the endpoint among all the junction sections meeting conditions (i) and (ii). In our work, l 1 and l 2 are set as 3 and 20 respectively. The pair of junction sections corresponding to the same curve is considered to be matched by this curve. From C i will be deleted the curves which cannot match two junction sections and are shorter than a threshold d (100 in our work), because these curves are too short and useless.
The obtained useful curves in C i will be finally preprocessed as the initial guess for the further optimization. If the endpoint e of the curve has a corresponding junction section, whose middle point m is closest to the point e ′ of the curve (Fig. 2(a)), then we will cut out the segment ee ′ and link m and e ′ together by a straight line segment. On the other hand, if e has no corresponding junction section, as shown in Fig. 2(b), then let the boundary point q be the nearest to e, and we will link q and e by a straight line segment.

Optimization Method for Even Segmentation
The main body of the paper formulated the goal of optimizing a set of boundary curves in order to separate a fused region into several layers as evenly as possible. Details are provided in the SI of how Eq. (4) is solved, and the method is then demonstrated on a simple synthetic example.

Optimization Method
The necessary condition of Eq. (4) requires p to satisfy the Euler-Lagrange equation 1 2 The traditional method to solve Eq. (2) first treats p as a function of k as well as time t, and then sets the negative partial derivative of p with respect to t equal to the left side of Eq. (2) and finally solves Eq. (3) by finite difference (FD) 2 . In the practical application of FD, if △k is fixed, in order to stabilize the solution p(k,t), the temporal step △t has to be carefully chosen to make the numerical scheme satisfy the Courant-Friedrichs-Lewy condition 3 . However, it is difficult to find a △t suitable for all different kinds of parchment scrolls, due to the fact that these scrolls are quite different in the shape of layer, image size, and gray level and so on. Therefore, for the adaptability of the algorithm, we will treat Eq. (4) as nonlinear optimization. Discretizing Eq. (4) yields s.t.
in which, the point p j corresponds to the point p(( j − 1)△k), and p h = p(m). Because △k is fixed, Eq. (4) is equivalent to the following minimization s.t.   where β ′ = β △k 4 . The two additional points p 0 and p h+1 in Eq. (5) can be constructed according to the shape of the junction sections where p(0) and p(m) are respectively located. We take p 0 as an example to explain the construction method. As demonstrated in Fig. 3, given that p(0) is an endpoint of curve r, and q 1 and q 2 are two boundary points which are w pixels away from and at both sides of p(0), p 0 can be determined by where l is the distance between p 0 and p(0). Larger values of l will make the curve r smoother at the endpoint p(0). We set l as 10 and w as 7 in our work. p h+1 can be decided in the same way. By convention, Eq. (5) can be further rewritten into matrix form where , and P = G 1 G 2 T with G 1 and G 2 as We use the steepest descent method to optimize Eq. (7). The gradient of f (X) can be produced by Thus X (t+1) in tth iteration can be updated by where I denotes identity matrix, and λ is the step-size along the direction of the gradient, automatically chosen by Armijo backtracking, and With the fourth order central difference, ∇ x j E(x j , y j ) and ∇ y j E(x j , y j ) can be respectively calculated by It can be easily proved that the update formula (9) has the same form as the explicit finite difference discretization of Eq. (3) has. However because of dynamically choosing the step-size λ , the proposed method does not suffer from the aforementioned drawback of the FD. As mentioned in the main body of this paper, all n curves in C i will be sequentially optimized for three cycles by this method, since we find that this provides sufficient convergence. The pseudo-code of our optimization method is described in ALGORITHM 1 as follows.
ALGORITHM 1: Optimization Input: n curves propagated from previous segmentation results. For cycle=1 to 3 For r=1 to n Get segmentation result by all the curves except curve r.
Generate the shape energy map E by Eq. (2). Calculate p 0 and p h+1 by Eq. (6). Optimize the curve r by Eq (7). Replace the old curve r by the updated one. End End Output: n optimized curves.

Experiment on Synthetic Data
An experiment is performed to verify the effectiveness of the proposed optimization algorithm. In this experiment, we set β ′ as 300. Fig. 4(a) illustrates a fused region which is comprised of 4 layers. Fig. 4(b) demonstrates the initial guess of the three curves which link the matched junction sections to separate the fused region. Obviously, the quality of the initial guess is fairly bad since not only are these curves quite irregular but they do not evenly separate the fused region at all. In order to update one curve, we first use the other curves to segment the fused region, getting segmentation T, as shown in Fig. 4(c). The shape energy map E of T is shown in Fig. 4(d), in which the brighter pixels correspond to higher energy. It can be noticed that the local minima in E are located in the middle of each region. The optimization result of that curve is displayed in Fig. 4(e). It can be seen that the optimized curve (blue) almost lies on the middle of the region and is much smoother than its initial guess (red). The final segmentation result is presented in Fig. 4(f). As compared with Fig. 4(b), it evident that in spite of the bad initial guess of these curves, the optimization algorithm still provides a high-quality segmentation result, which demonstrates the effectiveness and robustness of our method.

Broken Skeleton Connection and Final Segmentation
We now detail how to connect the broken skeletons in U i and the growth process used to get the final segmentation. Let f i be the layer boundary of B i , then the endpoints of the skeleton segments in U i must lie on the boundary f i , as shown in Fig. 5(a). Therefore, we can first link the broken skeletons in U i along the boundary f i . Supposing that a sheet skeleton of B i−1 has been broken into m parts in U i , which gives rise to 2m endpoints, we number these endpoints in order along the skeleton, as exhibited in Fig. 5(a). It can be seen that a point with an even number may only be connected to a point with an odd number larger than that even number, for example, the point 2 can only be linked with point 3 or point 5. Accordingly, given an even number point r and an odd number point s, where r = 2, 4, . . . , 2(m − 1) and s = r + 1, r + 3, . . . , 2m − 1, we respectively find in f i and K i−1 the shortest paths p f rs and p K rs between these two points by Dijkstra's algorithm. If p f rs does not intersect with any of the skeleton part in U i , and p f rs and p K rs meet the relationship 1 ρ ≤ length o f p f rs length o f p K rs ≤ ρ, we will delete the pixels of p K rs in U i , and then link the endpoints r and s by p f rs in U i ; we subsequently proceed to check the even number point s + 1. Otherwise, the point r + 2 will be checked by the same way. ρ is set to 3 in our work. The connection result is illustrated in Fig. 5(b).
Note that the skeleton segments contained in the three longer sheets have been correctly connected. In order to finally link the skeleton segments in the shortest broken sheet, we first calculate the convex hull H i of B i , and then obtain the Boolean union L i of H i , the background skeleton of B i−1 , and U i . L i is provided in Fig. 5(c), where the convex hull H i and the background skeleton of B i−1 are respectively represented by the brown and black curves. Fig. 5(c) indicates that L i gives the borders to the gap between the broken skeleton parts. On the constraint of such borders, the curves which link the broken parts will be regular 4/10 and prevented from intersecting with other skeletons. For a sheet skeleton broken into v parts in U i , we number these endpoints in order along the skeleton, as illustrated in Fig. 5(c). It can be observed that each even number point r should be linked with point r + 1, r = 2, 4, . . . , 2(v − 1). Therefore we first obtain the shortest path p K r(r+1) between points r and r + 1 from K i ; then fix the endpoints of p K r(r+1) , and meanwhile push p K r(r+1) away from the borders by the optimization method (see Section C in the main manuscript) with the shape energy map of the non-border region in L i+1 . The final linking result is illustrated in Fig. 5(d).
After getting the complete skeletons, we treat each skeleton as a set of seeds and grow them to generate the final segmentation result. Let F i be the foreground of B i and R j be the set of seed points initially formed by the skeleton j in U i , 1 ≤ j ≤ s, if a point p which is 8-connected to R j satisfies the condition: p ⊆ F i , p ⊆ R j and p is not adjacent to any other sets of seeds except R j , then this point will be added to R j . We add all of R j 's adjacent points that satisfy the condition to R j , and take the grown set R j as a new set of seeds. We repeatedly grow all the sets of seeds one by one until no point should be added to the seed sets. Fig. 6 demonstrates some intermediate results during growing ( Fig. 6(a)-(c)) and the final segmentation result (Fig. 6(d)). Accordingly, each single sheet can be obtained from Fig. 6(d).

Experiment on Real Scroll Data
Several experiments are performed to test the accuracy of the approach. The first experiment is conducted to deal with the Bressingham scroll. β ′ in Eq. (5) is set as 200 here. Fig. 7(a) illustrates a typical XMT image of the parchment, whose initial segmentation is shown in Fig. 7(b). It can been seen from Fig. 7(a) and (b) that there are a large number of missing parts of the layers, and furthermore, the fused region consisting of 4 layers is compressed, and is only around 10 pixels wide. We use our algorithm to separate the fused regions into several layers and connect the broken layers together. The result is shown in Fig. 7(c), where the red box highlights the segmentation of a fused region consisting of 4 layers. Although the narrowest part of this region is only about 10 pixels wide, our method still evenly separates it into four layers, which strongly testifies to the effectiveness of our method. In addition, it should also be noticed that curves linking the broken layers are equally spaced in the background of the image, thus avoiding intersecting with each other. The segmentation obtained by Liu et al.'s method 4 is illustrated in Fig. 7(d). As compared to our result (Fig. 7(c)), there exist some long and thin fused regions which have not been segmented.
The next experiment is to segment the X-ray image of the Diss Heywood burnt scroll which consists of four sheets. β ′ in Eq. (5) is set as 300 in this trial. All the sheets are fused together in this scroll, and it is therefore a great challenge for our method to segment the four different sheets out of the XMT image. Fig. 8(a)  chunks of fused regions can be clearly observed in the initial segmentation ( Fig. 8(b)). In spite of this big difficulty, our method can still generate the correct segmentation for the scroll, as shown in Fig. 8(c). The red and green boxes at the top left corner and the bottom right corner show the corresponding portions of the segmentation result. It can be seen that the fused region has been evenly separated into as many as fifteen regular layers. Fig. 8(d) gives the segmentation result of Liu et al.'s method. As observed, some long fused regions are not successfully separated into several layers. Fig. 9 displays the four sheets segmented from the Fig. 8(b), which clearly demonstrates the correctness and effectiveness of our segmentation method.

Correcting Striping Artifacts
Since the scrolls were scanned as a series of separate volumes it was necessary to perform intensity correction, as otherwise striping artifacts are clearly visible from the multiple volumes being merged together. This was carried out by normalising each row, where a row corresponds to the reconstruction from a single image. Since the majority of the surface of a scroll does not contain writing, and consequently makes up the parchment surface in the reconstruction, the median of the intensities along a row provides an estimate of the typical parchment intensity value. The median absolute deviation (mad) of the row intensities provides an indication of the contrast. It is calculated as the median of the differences of the intensity values from that median: where a i denotes the set of intensities in a row. In practice, for the Bressingham scroll, to improve robustness, only 25% of the pixels were used to estimate the median and mad, namely those towards the top of the scroll, since that portion of the scroll was less damaged and contained less artifacts. The median and median absolute deviation respectively provided the offset and scaling for the intensity normalisation.