Drone-based displacement measurement of infrastructures utilizing phase information

Drone-based inspections provide an efficient and flexible approach to assessing aging infrastructures while prioritizing safety. Here, we present a pioneering framework that employs drone cameras for high-precision displacement measurement and achieves sub-millimeter accuracy, meeting the requirements for on-site inspections. Inspired by the principles of human auditory equilibrium, we have developed an effective scheme using a group of strategical reference markers on the bridge girders to measure structural displacements in the bridge. Our approach integrates the phase-based sampling moiré technique with four degrees-of-freedom geometric modeling to accurately delineate the desired bridge displacements from camera motion-induced displacements. The proposed scheme demonstrates favorable precision with accuracy reaching up to 1/100th of a pixel. Real-world validations further confirmed the reliability and efficiency of this technique, making it a practical tool for bridge displacement measurement. Beyond its current applications, this methodology holds promise as a foundational element in shaping the landscape of future autonomous infrastructure inspection systems.

Reviewer #1 (Remarks to the Author): The paper proposes a framework that uses drone cameras to achieve highly accurate displacement measurements. The idea - and actual deployment - of drones for this purpose is not new (as acknowledged by the table of such deployments) so the added value/novelty is naturally in the specific methodology. The challenge (of drone-based displacement measurement) lies in the algorithm for compensating for errors induced by drone camera movement during hovering. This is where the novelty lies. The sampling moiré technique, previously developed by the authors, in the proposed framework has been published in other journals (this paper has cited the relevant literatures).
The application is indeed very impressive and the resolution good. This is the first time I've seen drone-based displacement measurement used for concrete bridges with such a large field of view; the camera covers a span of a bridge that extends over 60 meters. I believe that the proposed measurement framework is revolutionary, and its effectiveness has been validated in a field test.
I don't think the proposed drone-based method has significant superiority over the ground camera-based method using the sampling moiré technique. This is because the proposed method requires the installation of (rather large) artificial targets and a large field of view from the camera (which is also a benefit) to cover the two reference markers, which are placed separately at the two ends of the girder. These requirements diminish the advantages of drone-based measurement. But if you have to use drones (because you can't locate on ground) but you can place markers/targets this is a demonstrably powerful methodology.

Some specifics:
Because this is a Nature journal I'm being more pedantic than I would for more specialist journals. First three sentences of the abstract are a very clunky introduction with much redundancy. I suggest blending a couple of phrases from it into the 4th sentence, while 5th sentence has redundant 'the's. Later "utilize efficacious and cost-effective" is a starkly contrasting style that might fit in a verbose powerpoint. The abstract should do a better job.
Issue of aging infrastructure … recent years. Well, yes, the more recent, the more aged the structure! It doesn't automatically mean the bridge is deficient, however natural or manmade actions may lead to 'damage' or degradation. It would be better to focus on e.g. ASCE infrastructure report card or similar EU/Asian exercises that specifically look at condition, then the need becomes more stark. The point about consequent economic loss is lack of resilience which is not just that bridges fail but that they are taken out of service for a length of time, and efficient and effective inspection can inform structural intervention solutions to minimise total loss. It's not just about spotting 'damage' by inspection but looking for performance anomalies that signify deep problems you can't see (think of human internal or external bleeding). So your point about furnishing indispensable insights is spot on. "Recent studies have focused on vision-based structural health monitoring methods for assessing civil infrastructure". True, but better to say that it is a relatively hot and fast growing research area in structural health monitoring (SHM). Reliance on stationary cameras is not quite right as it's well recognised that there is no such thing in the real world as a fixed camera; even total stations use fixed references to compensate for instrument movement and some of your references should (if not already) point to methods for removing camera shake effects from vision system data. The challenge is to translate the compensation strategies to a bigger scale.
The explanation of the compensation methodology is not entirely clear. The compensation process, which is critical for readers, should be described using formulae. I'm uncertain whether the authors incorporate the parameters (4DOF) of the two reference markers for image compensation, or if they are compensating each reference marker and measuring target separately.
The proposed compensation methodology requires two reference markers to be attached separately at two stationary areas, and the measuring targets should be within the range of these two references. In the field test, these two reference markers were installed at the ends of the girder, close to the piers. This could pose a significant limitation to its application.
Exactly how hostile are real-world conditions of aerial photography? So on page 3 can you define the 'standard deflection measurement system/conventional deflection measurement sensor' and how reliable that is as the 'gold standard'? 'Credible alternative' is indeed the yardstick, but how to define this? - this is rather a vague statement.
In Fig 2b will your drone always be looking directly at the pattern i.e. not oblique with missing DOFs assumed irrelevant? The further away the less oblique (i.e. more perpendicular) the angle but the poorer the resolution; what is the tradeoff here? You don't show 'delta-alpha/beta' in figure 3, or 'LINE' (why caps? As opposed to 'as a line' ….) and how are mk-A/B shown as 'A' and 'B' on the figure? You're not describing the figure too well. 'Normalised Cross-Correlation' is not a proper noun and so only the acronym should use capitals, look for similar misuse elsewhere e.g. with 'ABC'.
So a 'marker' is an assembly of black squares on a grid; this is often referred to as a 'target'; the black squares could individually be targets or 'salient features'. There is no standard terminology but perhaps a look at what's conventionally used vs what you used would be helpful to you and readers. It seems the periodic pattern is essential; are there other target/marker/feature patterns that could work or is this fundamental, so limiting to specific artificial markers/targets? This, and the size of the marker/target warrant some discussion.
'20m from the river' means above the river? For consistency, as you use 'mm' to quantify pitch and 'km' for speed, why not use 'm' throughout to quantify length and distance? Also 'ton' is not an SI unit, use metric ton (tonne or 't'). Figure 4 use non-breaking spaces where appropriate in the captions, although type-setting this may be fixed.
So with the 35 m bridge, it's not about bridge displacement but a precision-controlled movement relative to a deck that was (reasonably) assumed not to move? 'As a vision-based system, our proposed approach differs from conventional methods, relying on intensity information' - does this mean …. methods that rely on …. ?
So is this a revolutionary approach? I think there is enough to say 'yes'. One drawback is the need for artificial targets (or markers), whereas natural targets allow tracking at inaccessible locations - a common feature of a lot of infrastructure that would be ideal for this type of technology - see general comments above/ Supp info. You mean tilt not tile angle? You use it twice and maybe the collection of squares is a 'tile' - indeed it looks like one, but you don't use the term in the main document.
Reviewer #2 (Remarks to the Author): The paper presents a pipeline for drone-based displacement measurements using phase-based sub-pixel corrections. The paper is well written, and the supplementary results are impressive. However, the paper makes key assumptions which should be clearly articulated. The paper also fails to reference several highly-cited and related topics and as a result makes inaccurate claims.
The following pieces of research on using drones for displacement measurement have not been cited. This is just a sample list, there are other papers too that have been missed. Details about how the initial angle correction is applied should be provided - is this applied to each reference marker separately? If so, what algorithm is used to automatically detect the marker? Additionally, how are different rotations of the markers reconciled? Is this only applied on the first frame?
The algorithm seems to have an assumption that two stationary points must be available for applicability. This should be emphasized in the abstract and introduction. If this is not the case this should be clarified. Will the algorithm work if the reference markers are out of plane with the measurement marker? How is the proposed method affected by out-of-plane rotation of the drone (i) at the start of the data capture, and (ii) during the data capture? Similarly, how does the in-plane rotation of the gimbal affect the algorithm? Some clarification on this would be helpful.
Reviewer #3 (Remarks to the Author): In the manuscript, the authors proposed a drone-based displacement measurement method. The obtained results should be significant and have a good potential for promoting the application of the traditional sampling moiré methods. However, before the manuscript can be accepted for publication, the authors should address the following items:
1. In the sampling moiré method, the obtained phase information is a relative value, which requires manually specifying the zero point of the phase. However, marker C moves with the bridge as a whole and there is no zero point of displacement (phase) in this area. How to choose the zero point of the phase?
2. In the part of "Methods, Coarse-compensation with pixel level precision", similarity transform is implemented to shift the markers in the video sequence to be within half the marker pitch distance from the marker template in the initial frame. The step is crucial since the subsequent sampling moiré method utilizes relative phase, which means any displacement greater than half the marker pitch distance may cause multiplicity. However, how can the authors guarantee the markers will be shifted within the distance in half marker pitch (from the marker template in the initial frame)?
3. How can the authors fabricate the marker on the bridge structure? Is there any influence from the deviation of the specimen marker from the desired position on the measurement results?

Dear Reviewers
Thank you for your thorough review and valuable feedback on our manuscript. We greatly appreciate the supportive comments, such as "The proposed measurement framework is revolutionary (Reviewer #1)", "The paper is well written, and the supplementary results are impressive (Reviewer #2)", "The obtained results should be significant and have a good potential (Reviewer #3)", and the reviewers' constructive comments for further enhancements regarding the importance of infrastructure inspection and its prospects, the validity of the 4 DoF model, and details of the phase analysis techniques.
Consequently, we have carefully considered all comments and made the necessary revisions to address them comprehensively. Below, you'll find our point-by-point responses to the comments, with the Referees' remarks presented verbatim and our responses in blue text.

Reviewer #1 (Remarks to the Authors)
General comments: The paper proposes a framework that uses drone cameras to achieve highly accurate displacement measurements. The idea - and actual deployment - of drones for this purpose is not new (as acknowledged by the table of such deployments) so the added value/novelty is naturally in the specific methodology. The challenge (of drone-based displacement measurement) lies in the algorithm for compensating for errors induced by drone camera movement during hovering. This is where the novelty lies. The sampling moiré technique, previously developed by the authors, in the proposed framework has been published in other journals (this paper has cited the relevant literatures).
The application is indeed very impressive and the resolution good. This is the first time I've seen drone-based displacement measurement used for concrete bridges with such a large field of view; the camera covers a span of a bridge that extends over 60 meters. I believe that the proposed measurement framework is revolutionary, and its effectiveness has been validated in a field test.

Reply for general comments:
We are grateful for the constructive suggestions and insightful comments. During the revision process, we have carefully addressed all comments to further improve the quality of our manuscript. As acknowledged by the reviewer, our major contribution in this research is an efficient approach to separating out camera motion so as to achieve sub-millimeter precision in displacement measurement. Fundamentally, it involves two supporting techniques. First, from the viewpoint of optical imaging measurement, we adopted the sampling moiré phase analysis technique, which enables sub-millimeter precision measurement with 1/1000 pitch (or 1/100 pixel) accuracy. This high-precision input underpins all further processing, including both camera motion estimation and genuine bridge displacement measurement. Second, to estimate camera motion, we employed a 4 degree-of-freedom (4 DoF) formulation, which has proved to be simple yet sufficient for this application. Theoretically, a drone is subject to 6 DoF non-stationary motion, and the 6 DoF model has therefore been extensively investigated in previous studies. However, we argue that a 4 DoF approximation is applicable to the case of drone hovering.
Our approach and design are supported by theoretical analysis, including computer simulation (see Supplementary Figure 4), by validation experiments conducted in the laboratory (see Supplementary Figures 5 and 6), and by in-field experiments (see Supplementary Figure 12). Combining the two ideas, we achieved sub-millimeter precision displacement measurement using drone photography. Following the successful validation of this technique through rigorous verification experiments on real bridges, this paper shows for the first time that vision-based UAV inspection can measure bridge displacement with accuracy comparable to that of the commonly used Doppler sensor method.
Additional general comment: I don't think the proposed drone-based method has significant superiority over the ground camera-based method using the sampling moiré technique. This is because the proposed method requires the installation of (rather large) artificial targets and a large field of view from the camera (which is also a benefit) to cover the two reference markers, which are placed separately at the two ends of the girder. These requirements diminish the advantages of drone-based measurement. But if you have to use drones (because you can't locate on ground) but you can place markers/targets this is a demonstrably powerful methodology.

Reply for additional general comment:
We thank the reviewer for the comments regarding the drone-based method and ground camera-based method. These comments align with the specific application scenarios that the technique addresses. We would like to clarify that our intention is not to replace the fixed camera-based methods with the proposed drone-based measurement technique.
Instead, it serves as a complementary tool to compensate for situations where finding a static position to fix the camera is not feasible, such as bridges spanning over rivers and ravines.These scenarios present challenges in terms of camera placement, and our technique aims to fill in those gaps.
Moreover, compared to the fixed camera-based method, drone-based bridge inspection could be advantageous in flexibility and cost efficiency, enabling future autonomous inspection.
As highlighted in the comments, the laborious aspect of the proposed approach lies in setting the markers. Nevertheless, considering that other displacement sensor methods necessitate the attachment of reflective sheets or the installation of accelerometers, the labor demands on field workers remain similar to those of conventional methods. Fundamentally, this arrangement is premised on a geometric abstraction of the typical structural design of bridges, encompassing the pier, girder, and deck. To measure the displacement of the entire bridge structure, it is necessary to place two reference markers at both ends of the girder, with a testing marker set near the center. Future work will consider how to simplify this process for practical applications.
Comment (1): First three sentences of the abstract are a very clunky introduction with much redundancy. I suggest blending a couple of phrases from it into the 4th sentence, while 5th sentence has redundant 'the's. Later "utilize efficacious and cost-effective" is a starkly contrasting style that might fit in a verbose powerpoint. The abstract should do a better job.
Reply for comment (1): This comment was very helpful and informative. Based on the reviewer's advice, we revised the Abstract significantly to make it more concise, to remove redundancy, and to convey the importance of social infrastructure and the significance of this research. We also checked for other redundant uses of "the" and for awkward wording, and made the Abstract as readable as possible.
[Response] We have revised the abstract following the reviewer's suggestion. The current version is more concise and clearer in conveying the intended information.

Comment (2):
Issue of aging infrastructure … recent years. Well, yes, the more recent, the more aged the structure! It doesn't automatically mean the bridge is deficient, however natural or manmade actions may lead to 'damage' or degradation. It would be better to focus on e.g. ASCE infrastructure report card or similar EU/Asian exercises that specifically look at condition, then the need becomes more stark.
The point about consequent economic loss is lack of resilience which is not just that bridges fail but that they are taken out of service for a length of time, and efficient and effective inspection can inform structural intervention solutions to minimise total loss. It's not just about spotting 'damage' by inspection but looking for performance anomalies that signify deep problems you can't see (think of human internal or external bleeding). So your point about furnishing indispensable insights is spot on. "Recent studies have focused on vision-based structural health monitoring methods for assessing civil infrastructure". True, but better to say that it is a relatively hot and fast growing research area in structural health monitoring (SHM).
Reply for comment (2): Thank you for raising the important point regarding a more precise description of the current status of aging infrastructure. We strongly agree and have incorporated your suggestions in the manuscript. In detail, the revision has two parts.
Part-1. In the introduction section, we explain that the challenge of aging infrastructure globally is two-fold: bridges nearing the end of their service period, and bridges experiencing condition degradation due to heavy usage or environmental factors. Drawing on information from the ASCE, as well as EU/Asian report cards, we provide a comparative review of these two categories in the introduction section.
Part-2. We present an in-depth discussion of the multifaceted negative impacts of aging infrastructure, which include safety risks for inhabitants, the time and cost of repair or replacement, economic losses due to out-of-service periods, and broader social concerns.
There is a crucial need for high-precision and efficient inspection techniques to detect subtle signs of deterioration, thus enabling early remedial action to effectively manage the increasing number of aging structures.
[Response] In response to the reviewer's comments, we have elaborated on the survey portion in the Introduction, highlighting the societal challenges brought about by aging and deficient infrastructures. We have also underscored the critical need for efficient and effective inspection techniques to address the pressing issue of worldwide infrastructure assessment.

Comment (3):
Reliance on stationary cameras is not quite right as it's well recognised that there is no such thing in the real world as a fixed camera; even total stations use fixed references to compensate for instrument movement and some of your references should (if not already) point to methods for removing camera shake effects from vision system data. The challenge is to translate the compensation strategies to a bigger scale.

Reply for comment (3):
As the reviewer points out, even a camera fixed on a tripod is affected by ground vibrations in outdoor field experiments, resulting in slight image blurring. Therefore, almost all sensor-based or image-based measurements use a fixed reference point and calculate the displacement relative to it, thereby reducing measurement errors due to camera vibration. In fact, in our previous study [Ref. 46], when we used the sampling moiré method with a fixed camera to measure the deflection of a concrete viaduct under a Japanese bullet train (i.e., Shinkansen) travelling at 320 km/h, a single reference marker was placed on the bridge girder, and the relative displacement between the reference marker and the measurement marker at the center of the bridge was measured.
In this research, instead of finding a "reference point", we establish a "reference line" by using two reference markers placed at the edges of the bridge. The displacement can subsequently be measured by calculating the deviation between the horizontal coordinates of a measurement marker attached near the bridge center and the "reference line". Taking advantage of the sampling moiré method, efficient motion compensation can be achieved with a similarity transformation that aligns these reference lines in the images before and after deformation with an accuracy of 1/100 of a camera pixel. This is the fundamental model of the developed UAV displacement measurement system; the details are shown in Fig. 3.
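The reference-line idea in this reply can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the deviation is taken as the signed perpendicular distance from the measurement marker to the line through the two reference markers, and the function names, marker coordinates and `mm_per_pixel` scale are all hypothetical.

```python
import numpy as np

def offset_from_reference_line(ref_a, ref_b, target):
    """Signed perpendicular distance (pixels) from target to the line through ref_a and ref_b."""
    a, b, c = (np.asarray(p, dtype=float) for p in (ref_a, ref_b, target))
    ab = b - a
    ac = c - a
    cross = ab[0] * ac[1] - ab[1] * ac[0]  # 2-D cross product of AB and AC
    return cross / np.hypot(ab[0], ab[1])

def displacement_mm(before, after, mm_per_pixel):
    """Change in offset between two frames; each frame is (ref_a, ref_b, target)."""
    delta_px = offset_from_reference_line(*after) - offset_from_reference_line(*before)
    return delta_px * mm_per_pixel

# Hypothetical sub-pixel marker centres (x, y) in pixels.
before = ((0.0, 0.0), (1000.0, 0.0), (500.0, 2.00))
# The camera drifted by (3, 1) px and the measurement marker moved 0.05 px off the line.
after = ((3.0, 1.0), (1003.0, 1.0), (503.0, 3.05))
print(displacement_mm(before, after, mm_per_pixel=10.0))  # approximately 0.5 mm
```

Because the offset is measured relative to the line, a pure translation or in-plane rotation of the camera leaves it unchanged; a change in image scale does not cancel and would have to be handled by the scale factor or by aligning the frames first.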

Comment (4):
The explanation of the compensation methodology is not entirely clear. The compensation process, which is critical for readers, should be described using formulae. I'm uncertain whether the authors incorporate the parameters (4DOF) of the two reference markers for image compensation, or if they are compensating each reference marker and measuring target separately.

Reply for comment (4):
We sincerely appreciate the reviewer's valuable feedback on this matter. We apologize for the insufficient explanation of the motion blur compensation method, which is indeed a critical component of our proposed system. In this revision, we have added detailed explanations to clarify the why and how of the motion compensation method based on 4 DoF modeling in this application. First, we present theoretical and experimental analysis results for the 4 DoF model in Supplementary Fig. 4, Fig. 5 and Fig. 6, showing that 4 DoF is sufficient for displacement investigation using drone hovering video. Subsequently, we present the detailed process: a similarity transformation is employed to align the entire drone inspection video to the initial frame, utilizing the center coordinates extracted from multiple tracked markers. Through this process, we ensure that the acquired marker images align within half the marker pitch distance from the reference frame. This alignment is vital for conducting phase-based moiré analysis and accurately recovering the sub-pixel coordinates of the marker centers. It serves as a fundamental step to counteract displacements caused by camera motion, enabling precise measurement of structural displacements. The detailed formulas and calculations are provided in Supplementary Note 1 for further reference.
[Response] Theoretical and experimental analysis of 4 DoF modeling is presented in Supplementary Fig. 4 and Fig. 5, respectively. We added explanations of the image blur compensation method in the "Methods - Coarse-compensation with pixel-level precision" section, and the detailed 4 DoF modeling approach, including the mathematical formulas, is presented in Supplementary Note 1.
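As a rough illustration of aligning a frame to the initial frame with a similarity transformation, the sketch below fits a 2-D similarity transform (scale, rotation and two translations, i.e. four parameters) to tracked marker centres by least squares. It is only a sketch under stated assumptions: the authors' actual formulation is the one in Supplementary Note 1, and the function names, marker coordinates and simulated camera motion here are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points onto dst points.

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    with a = s*cos(theta) and b = s*sin(theta).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved x0, y0, x1, y1, ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return a, b, tx, ty

def apply_similarity(params, pts):
    """Apply the fitted transform to an (n, 2) array of points."""
    a, b, tx, ty = params
    pts = np.asarray(pts, dtype=float)
    x = a * pts[:, 0] - b * pts[:, 1] + tx
    y = b * pts[:, 0] + a * pts[:, 1] + ty
    return np.column_stack([x, y])

# Hypothetical marker centres in the initial frame (pixels).
refs_initial = np.array([[100.0, 500.0], [1900.0, 520.0]])
target_initial = np.array([[1000.0, 512.0]])

# Simulated camera motion: 0.2% zoom, 0.5 degree roll, (4, -3) px drift.
theta, s = np.deg2rad(0.5), 1.002
motion = (s * np.cos(theta), s * np.sin(theta), 4.0, -3.0)
refs_current = apply_similarity(motion, refs_initial)
target_current = apply_similarity(motion, target_initial)

# Fit current -> initial on the reference markers, then align the measurement marker.
params = fit_similarity(refs_current, refs_initial)
target_aligned = apply_similarity(params, target_current)
print(target_aligned - target_initial)  # residual is near zero when the target did not move
```

With exactly two reference markers the four parameters are determined exactly; with more tracked markers the same least-squares fit averages out tracking noise.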

Comment (5):
The proposed compensation methodology requires two reference markers to be attached separately at two stationary areas, and the measuring targets should be within the range of these two references. In the field test, these two reference markers were installed at the ends of the girder, close to the piers. This could pose a significant limitation to its application.

Reply for comment (5):
We appreciate your feedback and agree with your point: placing the two reference markers requires a broader field of view for image recording. Nevertheless, to accomplish the extremely challenging task of detecting sub-millimeter deflection values from aerial photography, our current conclusion is that two reference markers, which provide a sense of balance equivalent to that of two human ears, are indispensable. Thanks to this idea, as shown in the results of the applied experiments in this study, we obtained results in good agreement with conventional sensors for deflections of a few millimeters, even when shooting from 85 and 100 meters. These field test results confirm that the proposed system is viable for use on major bridges.

Comment (6):
Exactly how hostile are real-world conditions of aerial photography? So on page 3 can you define the 'standard deflection measurement system/conventional deflection measurement sensor' and how reliable that is as the 'gold standard'? 'Credible alternative' is indeed the yardstick, but how to define this? - this is rather a vague statement.

Reply for comment (6):
Thank you for pointing out the practical issue regarding the usage restrictions of the proposed drone-based displacement measurement method. As a measurement system based on optical sensors, our system is vulnerable to factors such as fog and strong winds, which can increase measurement variability. Therefore, it is desirable to conduct experiments in clear, low-wind conditions to minimize these effects. Regarding other constraints, such as adverse weather conditions and safety concerns, our technique is subject to the existing regulations governing common drone-based inspection methods. We apologize for the ambiguous expression 'standard deflection measurement system/conventional deflection measurement sensor' on page 3. We have rephrased it as "most extensively applied deflection measurement technique" to remove the ambiguity.
We found that the expression 'credible alternative' in the original manuscript was overstated.Therefore, the term 'credible alternative' in the manuscript was revised to 'an effective method' for accuracy.
[Response] We have revised the expression 'standard deflection measurement system' on page 3 to 'the most extensively applied deflection measurement technique', ensuring clarity and accuracy. We also changed 'credible alternative' to 'an effective method' to describe the proposed approach accurately.

Comment (7):
In Fig 2b will your drone always be looking directly at the pattern i.e. not oblique with missing DOFs assumed irrelevant? The further away the less oblique (i.e. more perpendicular) the angle but the poorer the resolution; what is the tradeoff here? You don't show 'delta-alpha/beta' in figure 3, or 'LINE' (why caps? As opposed to 'as a line' ….) and how are mk-A/B shown as 'A' and 'B' on the figure? You're not describing the figure too well. 'Normalised Cross-Correlation' is not a proper noun and so only the acronym should use capitals, look for similar misuse elsewhere e.g. with 'ABC'.

Reply for comment (7):
We thank the reviewer for pointing out this critical issue regarding the relative position of the drone and the bridge, and for the constructive comments that follow. The question relates directly to the fundamental design of this displacement measurement system. We are sorry that we did not present the content clearly, and we take this chance to explain further. In the basic setting of our drone-based system, the drone hovers while facing the testing marker (marker-C, centrally positioned on a bridge). In practice, obliqueness can arise due to wind, slight movement during hovering and gimbal functionalities. Fortunately, the sampling moiré (SM) method is not confined to capturing images exclusively from a frontal perspective of the bridge. Previous research has validated the method's capability to deliver accurate deflection measurements even when capturing images obliquely (e.g., from beneath a bridge or at a diagonal across it). As such, there is no substantial concern that a slight oblique angle will significantly impact measurement accuracy.
To demonstrate our method's robustness against these oblique angles, theoretical analysis and experimental results have been added as Supplementary Figs. 4, 5, and 6. These results show that the oblique angle, represented by a composition of alpha/beta angles, does not introduce significant errors into our displacement measurements. In addition, as the drone moves further away from the bridge, the image resolution of the markers decreases, which can lead to poor measurement performance. We addressed this marker image resolution issue in our prior research [pixel limit], which showed that if the marker pitch in the image exceeds 10 pixels, our phase-based measurement operates without difficulty.
We want to express our gratitude for the feedback on Fig. 3 and the inquiry about the "LINE". In light of these comments, we have revised Fig. 3 to make it clearer and easier to understand. Regarding the "LINE", it is indeed a fundamental aspect of our measurement system. Specifically, the two reference markers fixed on either side of the bridge establish a reference line. This reference line enables the calculation of the relative displacement between the line and the testing marker, which is fixed near the center of the bridge. The capitalization of LINE signifies its significance as the foundational model of our measurement system. Following the suggestions provided in the comments, we have made revisions to ensure consistency in the use of the terms 'A' and 'Mk-A' throughout the manuscript. Also, we have corrected the expression of the 'maximized cross-correlation (MCC)' method in Supplementary Fig. 2.
[Response] We added Supplementary Figs. 4, 5, and 6 to investigate the effect of the oblique angle in detail.
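To make the role of the marker pitch in phase-based measurement concrete, here is a minimal one-dimensional sampling moiré sketch. It is an illustration under simplifying assumptions, not the authors' code: the grating is an ideal cosine, the sampling pitch equals the grating pitch exactly (10 pixels), linear interpolation is used, and all names and values are hypothetical.

```python
import numpy as np

def moire_phase_1d(intensity, sampling_pitch):
    """Phase of the sampling moire fringe obtained by down-sampling and interpolating."""
    n = intensity.size
    x = np.arange(n)
    cos_sum = np.zeros(n)
    sin_sum = np.zeros(n)
    for k in range(sampling_pitch):
        # Keep every sampling_pitch-th pixel starting at k, then interpolate back to full length.
        moire_k = np.interp(x, x[k::sampling_pitch], intensity[k::sampling_pitch])
        w = 2.0 * np.pi * k / sampling_pitch
        cos_sum += moire_k * np.cos(w)
        sin_sum += moire_k * np.sin(w)
    return -np.arctan2(sin_sum, cos_sum)

def shift_from_phase(phase_before, phase_after, pitch):
    """Shift in pixels from the wrapped phase difference (ambiguous beyond half a pitch)."""
    dphi = np.angle(np.exp(1j * (phase_after - phase_before)))  # wrap to (-pi, pi]
    return float(np.mean(-dphi * pitch / (2.0 * np.pi)))

def grating(n, pitch, shift):
    """Ideal cosine grating of the given pitch, shifted by `shift` pixels."""
    x = np.arange(n)
    return 128.0 + 100.0 * np.cos(2.0 * np.pi * (x - shift) / pitch)

PITCH = 10  # pixels per grating period in the image (hypothetical)
ref = moire_phase_1d(grating(200, PITCH, 0.0), PITCH)
small = shift_from_phase(ref, moire_phase_1d(grating(200, PITCH, 0.03), PITCH), PITCH)
large = shift_from_phase(ref, moire_phase_1d(grating(200, PITCH, 6.0), PITCH), PITCH)
print(small, large)
```

In this idealized, noise-free setting the 0.03-pixel shift is recovered, while the 6-pixel shift, which exceeds half the 10-pixel pitch, aliases to about -4 pixels. This is the ambiguity that a coarse pixel-level alignment step has to remove before the phase analysis.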
Comment (8): So a 'marker' is an assembly of black squares on a grid; this is often referred to as a 'target'; the black squares could individually be targets or 'salient features'. There is no standard terminology but perhaps a look at what's conventionally used vs what you used would be helpful to you and readers. It seems the periodic pattern is essential; are there other target/marker/feature patterns that could work or is this fundamental, so limiting to specific artificial markers/targets? This, and the size of the marker/target warrant some discussion.

Reply for comment (8):
We thank the reviewer for the constructive comments regarding the "marker" used in this research. As the reviewer pointed out, it is necessary to clarify the terminology used here, which differs from that used in previous studies. To clarify the difference, we added Supplementary Fig. 8a, which shows a photograph of the markers designed and fabricated for the experiment. It is noteworthy that the pattern can be either square or circular, as long as the pitch remains regular. The proposed approach can work with artificial markers as long as a periodic pattern exists on the markers.
We used a square design because it is simple to fabricate. In addition, since the sampling moiré method can detect minute displacements with a precision of 1/1000 of the grid pitch, the marker size (grid pitch) can be chosen according to the displacement precision required for the target bridge.
[Response] We added details of the markers used in our system in Supplementary Fig. 8. In the explanation, we also discussed the relationship between marker design and displacement measurement precision.
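The relationship between grid pitch and precision mentioned in this reply amounts to simple arithmetic, sketched below. The 1/1000 ratio is the figure quoted in the reply and is treated here as an assumed constant; the precision reached in practice would also depend on imaging conditions. The function names and sample values are hypothetical.

```python
# Precision as a fraction of the grid pitch, as quoted in the reply (assumed constant).
PRECISION_FRACTION_OF_PITCH = 1 / 1000


def max_grid_pitch_mm(required_precision_mm, fraction=PRECISION_FRACTION_OF_PITCH):
    """Largest grid pitch (mm) that still meets the required displacement precision."""
    if required_precision_mm <= 0 or fraction <= 0:
        raise ValueError("precision and fraction must be positive")
    return required_precision_mm / fraction


def expected_precision_mm(grid_pitch_mm, fraction=PRECISION_FRACTION_OF_PITCH):
    """Displacement precision (mm) expected from a given grid pitch."""
    if grid_pitch_mm <= 0 or fraction <= 0:
        raise ValueError("pitch and fraction must be positive")
    return grid_pitch_mm * fraction


# Hypothetical requirement: 0.1 mm precision calls for a pitch of about 100 mm or less.
print(max_grid_pitch_mm(0.1))
# Hypothetical marker: a 50 mm pitch would correspond to about 0.05 mm precision.
print(expected_precision_mm(50.0))
```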
Comment (9): '20m from the river' means above the river? For consistency, as you use 'mm' to quantify pitch and 'km' for speed, why not use 'm' throughout to quantify length and distance? Also 'ton' is not an SI unit, use metric ton (tonne or 't'). Figure 4: use non-breaking spaces where appropriate in the captions, although type-setting this may be fixed.
So with the 35 m bridge, it's not about bridge displacement but a precision-controlled movement relative to a deck that was (reasonably) assumed not to move? 'As a vision-based system, our proposed approach differs from conventional methods, relying on intensity information' - does this mean …. methods that rely on …. ?
Reply for comment (9): We thank the reviewer for the careful review of the manuscript and the constructive remarks concerning the consistency of units. The "20m" mentioned in the text represents the distance between the river and the bridge. For clarity, we have now included this measurement in Figure 4. We agree that consistent units are preferable, such as using "meter" throughout. However, considering the significant disparity in scale between the bridge and the displacements, we have retained "m" for larger structures and "mm" for the precise displacements. Thank you for pointing out the usage of the SI unit 't' and the caption typesetting in Figure 4. We have made the necessary revisions to rectify these issues. In the accuracy verification experiment for the 35-meter bridge, as you correctly described, we attached the marker to a precise linear moving stage to control the displacements accurately in the y-direction; details can be seen in Supplementary Fig. 13. This experimental setup was designed to validate the precision of our proposed approach by simulating various bridge deflections. Since the experiment did not involve any external loads or vehicles, we assumed that the bridge remained stationary, with the displacement at its center equal to zero. We have also rephrased certain statements to enhance clarity throughout the manuscript.
[Response] We have incorporated the reviewer's comments and made the necessary revisions to the manuscript. To make the updated portions easy to identify, we have highlighted them in the revised version.

Comment (10):
So is this a revolutionary approach? I think there is enough to say 'yes'. One drawback is the need for artificial targets (or markers), whereas natural targets allow tracking at inaccessible locations - a common feature of a lot of infrastructure that would be ideal for this type of technology - see general comments above.

Reply for comment (10):
We would like to express our sincere appreciation for the reviewer's recognition of our research as an innovative technological advancement. In contrast to marker-free methodologies, our technique does involve the use of artificial markers, which may be viewed as a drawback. However, this apparent limitation is offset by the inherent benefit of the approach: its reliance on cost-effective markers with a repeating pattern. This characteristic makes marker tracking and displacement determination more reliable, and thereby supports measurement accuracy.
Furthermore, the technique has the potential to become an effective inspection method for truss bridges and railroad bridges. The feasibility of this concept rests on the availability of regular patterns already present on the structure, such as rivet arrangements in railroad bridges or triangular truss structures in truss bridges. By harnessing these existing patterns and forgoing marker attachment, the technique could rely solely on aerial drone photography for bridge deflection measurement. In anticipation of such scenarios, we have already introduced a methodology capable of measuring displacement using arbitrary repetitive patterns. This approach can be applied directly when the surface of the structure being evaluated carries an existing repetitive pattern.
Comment (11): Supp info. You mean tilt not tile angle? You use it twice and maybe the collection of squares is a 'tile' - indeed it looks like one, but you don't use the term in the main document.

Reply for comment (11):
The typographical error in using "tile" angle instead of the correct term "tilt" angle has been rectified in the revised manuscript. We greatly appreciate your attentive review.

Reviewer #2 (Remarks to the Authors)
General comments: The paper presents a pipeline for drone-based displacement measurements using phase-based sub-pixel corrections. The paper is well written, and the supplementary results are impressive. However, the paper makes key assumptions which should be clearly articulated. The paper also fails to reference several highly-cited and related topics and as a result makes inaccurate claims.
Reply for general comments: We express our gratitude to the reviewer for thoroughly examining the manuscript and providing valuable feedback and constructive critiques. We have carefully considered the comments to enhance and clarify the manuscript, particularly in presenting the key assumptions related to the proposed displacement measurement system and in improving the coverage of relevant literature. Please find below a detailed point-by-point response to all comments (reviewers' comments in black, and our replies in blue).
The following pieces of research on using drones for displacement measurement have not been cited. This is just a sample list; there are other papers too that have been missed.
[Response] Following the reviewer's suggestions, we added these recent drone-based displacement measurement papers to the revised manuscript (Refs. [35] [29] [31]).
There has also been a large amount of research on phase-based displacement measurement and modal analysis that has not been cited. E.g., The proposed research should be put into the appropriate context after a more detailed literature review.
[Response] We are grateful for the constructive suggestions of adding references to phase-based displacement measurement, which are critical to this research.We have conducted an extensive survey on this theme and included relevant previous studies in the reference list ).

Comment (1):
The statements in the manuscript should be appropriately modified to reflect the results from recently published research. For example, this statement may have to be reworded. "Although some recent investigations have ventured into outdoor verification experiments, their accuracy has been constrained, with reported root mean square errors (RMSE) of approximately 2 mm at shooting distances of roughly 5 m"
Reply for comment (1): Thank you for pointing out the issue with how the literature survey was expressed. We have revised the wording to ensure that the literature survey in the manuscript is accurate.
[Response] We reworded the following statement in the revised Supplementary Information.
Reply for comment (2): Thank you for pointing out this critical point. We agree that this part should be clearly presented in the manuscript, and we thoroughly revised Supplementary Fig. 2 to provide sufficient details regarding marker processing. In the current system, we did not incorporate a marker detection function; in practice, we manually generate the bounding boxes for each marker in the first frame, and further analysis is based on them. For subsequent frames, the center of each marker is detected automatically by a marker tracking algorithm. The marker angle is then estimated for each marker individually, based on rank minimization. The rationale behind this angle estimation is that a marker with 0 degrees of deviation from the y-axis is assumed to exhibit the simplest structure in mathematical form, that is, the minimum rank.
The details are added in Supplementary Note 1. Since the drone stays in hovering mode, we assume that the marker angles remain mostly unchanged for the whole video, and thus angle estimation is performed only on the first image frame. After marker angle estimation, we apply a counter-direction angle compensation to eliminate the angle effect, which greatly facilitates the subsequent phase analysis using the sampling moiré method.
[Response] We added the details for initial angle correction in Supplementary Note 1.

Comment (3):
The algorithm seems to have an assumption that two stationary points must be available for applicability. This should be emphasized in the abstract and introduction. If this is not the case this should be clarified. Will the algorithm work if the reference markers are out of plane with the measurement marker?
Reply for comment (3): As the reviewer pointed out, two fixed points are used in this study to compensate for image blurring with sub-pixel accuracy. We have added an explanatory note to clarify this point in the abstract and introduction. A slight out-of-plane displacement of the reference marker relative to the measurement marker does not interfere with the analysis. This is because the drone and the bridge to be measured are more than 30 m apart, and even if an out-of-plane displacement of a few millimeters occurs in a marker attached to the bridge, it will not make any difference in the captured image.
[Response] We have added an explanatory note to clarify this point in the abstract and introduction.
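The scale of the out-of-plane effect mentioned in this reply can be estimated with a simple pinhole-camera sketch, in which the imaged size of an object is inversely proportional to its distance from the camera. This is an illustrative assumption rather than the model used in the manuscript. The 30 m distance is the lower bound given in the reply, 3 mm stands in for "a few millimeters", and the function name is hypothetical.

```python
def magnification_change(distance_m, out_of_plane_m):
    """Relative change in image magnification under a pinhole model.

    A positive out_of_plane_m means the marker moves away from the camera.
    """
    if distance_m <= 0 or distance_m + out_of_plane_m <= 0:
        raise ValueError("marker must stay in front of the camera")
    # Image size scales as 1/Z, so the ratio of new to old size is Z / (Z + dZ).
    return distance_m / (distance_m + out_of_plane_m) - 1.0


# Hypothetical values: 30 m shooting distance, 3 mm out-of-plane displacement.
change = magnification_change(30.0, 0.003)
print(f"{change:.6%}")  # roughly -0.01 %
```

Under this model, the apparent shift of a marker in the image is this fraction of its pixel distance from the image center, so the effect scales with how far the marker sits from the center of the frame.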
Comment (7): In Fig 2b will your drone always be looking directly at the pattern i.e. not oblique with missing DOFs assumed irrelevant? The further away the less oblique (i.e. more perpendicular) the angle but the poorer the resolution; what is the tradeoff here? You don't show 'delta-alpha/beta' in figure 3, or 'LINE' (why caps? As opposed to 'as a line' ….) and how are mk-A/B shown as 'A' and 'B' on the figure? You're not describing the figure too well. 'Normalised Cross-Correlation' is not a proper noun and so only the acronym should use capitals, look for similar misuse elsewhere e.g. with 'ABC'.

(Before revision) "Although some recent investigations have ventured into outdoor verification experiments, their accuracy has been constrained, with reported root mean square errors (RMSE) of approximately 2 mm at shooting distances of roughly 5 m."
(After revision) "The survey table indicates that the utilization of drone cameras for structural displacement measurement has become a prominent area of research in recent years, with considerable advancements achieved. However, a significant gap remains between concept-proof demonstration experiments and real-world applications in the previous research."
Comment (2): Details about how the initial angle correction is applied should be provided - is this applied to each reference marker separately? If so, what algorithm is used to automatically detect the marker? Additionally, how are different rotations of the markers reconciled? Is this only applied on the first frame?