Modeling and verifying clustering properties in a vehicular ad hoc network protocol with Event-B

Vehicular ad hoc network (VANET) routing protocols resort to clustering in order to optimize broadcast traffic flooding. Clustering schemes usually rely on rules which apply to each vehicle in order to reach a targeted organization in a VANET. Most of the literature works which evaluate clustering for VANET focus on performance analysis. However, with autonomous vehicles coming to roadways, more rigorous relationships will be required between clustering rules and the resulting organization, so as to anticipate road safety in a better way. We propose a formal description of the properties which are expected in a VANET, while considering the rules of a given clustering scheme. Using Event-B, we first present a description of the VANET, the vehicles movement and the traffic generated by both routing and application messages. Then, based on an Event-B model of a basic routing protocol of the literature, we describe how the specific rules of a clustering scheme can be modeled along with the properties expected in the resulting organization. Finally, we propose a validation process of the model. This paper aims at showing how our proposals have been applied to the Chain-Branch-Leaf scheme, although they can be adapted to any rule-based clustering scheme for VANET.


Results
This section presents the main result of this work, which is a formal model of the Chain-Branch-Leaf clustering scheme allowing verifying its properties in a VANET. As for every clustering schemes, several rules are taken into account by the vehicles involved in the construction of both the clusters and the overall backbone. For example, the uniqueness of the branch node choice for each leaf node is a fundamental property of the Chain-Branch-Leaf organization 24 . However, without a formal description of the CBL scheme, such a requirement can only be envisioned, but not formally expressed nor verified. The assessment of the CBL clustering scheme implies two steps, which are requirement validation for verifying that the protocol specification fulfills the functional needs envisaged by the designer 25 , and consistency checking for ensuring that the clustering scheme does not introduce contradictions in the relations linking the nodes. Event-based modeling is particularly suitable for protocol engineering. Event-B is precisely an event-based formal method which has shown its capacity to master the system design complexity through successive refinements 15,26 . The stepwise refinement produces a correctby-construction model by formally proving the different properties introduced up to each step 26 . Regarding CBL, Event-B provides the tools necessary to perform an incremental verification by checking the properties and constraints defined at each execution step. The different execution steps are characterized by the introduced events. In order to guarantee the invariants preservation by these events, Event-B defines the concept of proof obligation. Therefore, the approach proposed in this paper should allow checking and proving the correctness of the CBL protocol as well as its requirements and properties. Then, once the CBL protocol has been verified, the proposed model guarantees that its execution does not face failure or inconsistency.
Formal description of CBL organization in a VANET. In this section, we first describe CBL functioning along with the related formal definitions. CBL is a distributed algorithm performed by each VANET node 27 , considering that the latter may not have a global knowledge of the ad hoc network. Thus, a node can only communicate directly only with its one-hop neighbors which are used as routers in order to reach the rest of the network. A complete description of CBL functioning is presented in Ref. 24 . Figure 1 shows the different elements of the CBL structure in a VANET, which can be summarized as follows: • Road configuration in any road configuration, CBL builds one backbone in each traffic direction. In our example, a 3-lane 2-way highway, CBL builds two separate backbones (Fig. 1). • A branch node is a cluster-head node elected by other nodes (branch or leaf). It is the only one allowed to retransmit the broadcast traffic to the entire network, through its downstream branch, its upstream branch or both. • A leaf node is an ordinary node which attaches itself to the closest branch node. If no branch node is detected, the leaf nodes perform a branch choice process in order to elect one of them. www.nature.com/scientificreports/ similar to that which should be obtained with a fixed infrastructure. It provides branch nodes with a path for more than one-hop communications.
In this work, the term Nodes refers to the set of all the nodes in the network. Without loss of generality, we will focus in this paper on two types of packets which are sufficient to describe CBL functioning. The hello packets are used by several VANET protocols for neighborhood discovery. The other type of packet refers to the applications traffic. The term Hello refers to the set of all the hello packets sent during the VANET scenario. We formally define VANET links using the following functions: An element n1 → n2 of the set links ( n1 → n2 ∈ links ) expresses that the node n1 has received a hello packet sent by the node n2. We formally define the local neighbors of nodes using the following function: For a given node n, the following property must be fulfilled: One of the main data required by CBL is node position. We assume that each node is aware of its position through a global positioning system such as GPS or Galileo. However, we avoid any terrestrial localization infrastructure since ad hoc networks should not rely on any infrastructure. The node's position is a two-dimensional vector, that the node transmits to its neighbors through hello packets. We formally model the nodes positions data using the following function: Let us consider a node n, the set positionTable(n) includes the position of n and those of its local neighbors. A node can be positioned in the downstream or the upstream side of any of its neighbors. We formally define the downstream/upstream neighbors of nodes using the following functions: To determine whether a node is upstream or downstream from another, we use the following operator: Let n1 and n2 be two neighboring nodes, and let p1 and p2 be their respective positions. We state the following rules (1) links ∈ P(Nodes × Nodes), (3) neighbors ∈ Nodes → P(Nodes).
When a node does not receive any hello message from a neighbor within a specific period of time, this neighbor is considered to be unavailable and it will be deleted from the neighbors table. This time interval called neighbor expiry time is defined as: In the CBL clustering scheme, a node can be either a leaf or a branch node. We formally define the node types as follows: A branch node n ( hasType(n)(n) = 1 ) is a cluster-head node elected by the other nodes in its one-hop neighborhood. It emits hello messages like every node. A leaf node n ( hasType(n)(n) = 0 ) is an ordinary node which has to connect to the closest branch node. A CBL chain is a sequence of branch nodes. We formally define this chain of branch nodes as follows: These two functions are semantically opposed. Hence, chainUP (resp. chainDO) function defines a local upstream (resp. downstream) chain of branch nodes. If a branch b2 is an upstream node of another branch b1, then b1 is a downstream node of b2. In a CBL organization, each leaf node shall elect its associated branch. We formally define this election as follows: For a given branch node n, branchChoice(n), chainUP(n) (resp. chainDO) refer to the branch choice (if the electing node is a leaf node) or the upstream (resp. downstream) branch nodes (if the electing node is a branch node) included in the neighbors table of n. A hello packet can contain different data about its sender, namely the node's position, the node's type, and according to the latter, the elected branch up/down, or the branch choice. To formally define all these data, we use the following functions: For a given hello packet h sent by a node n, helloPosition(h), helloBrChoice(h), helloChainUP(h), helloChainDO(h) and helloSrType(h) represent respectively the sets of the position, branch choice, elected branch up, elected branch down and the type of n and its neighbors. All these data are very useful in the construction of the CBL scheme. The connection time is the expected communication duration of two nodes according to their movement. More formally, we define the following function: CBL properties and rules. Now that CBL organization has been clarified, it is possible to express the expected properties, and the rules established to that end. The main requirements and resulting properties expected from CBL organization in a VANET are: • REQ 1 If a node does not have any local neighbor, it must be a leaf.
• REQ 2 The branch node choice is made according to leaf nodes only.
• REQ 3 The self-branch election is not possible.
• REQ 4 Each branch shall have at most one downstream branch neighbor.
• REQ 5 Each branch shall have at most one upstream branch neighbor.
• REQ 6 Each branch shall be elected by at least another node (branch or leaf). • REQ 7 A node having a downstream branch shall be of type branch.
• REQ 8 The electing node of an upstream branch shall be a branch-type one.
• REQ 9 If a node n1 is a downstream branch of a node n2, then n2 is an upstream branch of n1.
• REQ 10 An upstream branch of a node shall be one of its upstream neighbors.
• REQ 12 One of the neighbors of each leaf node shall become a branch.
Different constraints shall be satisfied while electing branch nodes, such as the following node election rules (NER): • NER1 The self-node election is not possible (i.e., the elected node and the electing one must be different).
• NER2 The elected node must be a neighbor of its electing node.
• NER3 The elected node must be an upstream neighbor of its electing branch node.
• NER4 A branch node must have at most one upstream neighbor branch.
• NER5 When the electing node is a leaf node which does not have any branch node neighbor, it must elect the leaf node neighbor having the maximum connection time as its branch choice. • NER6 When the electing node is a branch node which does not have any upstream branch node neighbor, it must elect the upstream neighbor node having the maximum connection time as its upstream branch node.
All nodes are initially leaf nodes. Some of the nodes can be turned into branch nodes, while others must be kept as leaf nodes. In addition, a branch can be turned into a leaf. The type changing rules (TCR) are the following: • TCR1 If a leaf node is elected by another node, it must be turned into a branch.
• TCR2 If the electing node is a branch, then it shall be added to the chain as a downstream branch of the elected one. • TCR3 The branch node which overtakes its downstream electing branch shall be turned into a leaf.

A correct-by-construction model of CBL (CCM4CBL). In this section, is presented the proposed
Event-B model of the CBL clustering scheme, which implements the aforementioned properties and rules (see "CBL properties and rules" section). Figure 2 illustrates the architecture of the resulting model. It shows three abstraction levels which will be detailed in the following subsections. The two first levels can be used for modeling any other VANET routing protocol.
Level 1: a basic routing protocol model (CBL_c0, CBL_m0). This model inspired from Ref. 18 is an Event-B formal model of a basic routing protocol. It includes the definitions of the set of nodes, that of the links, and also the events related to packet status and operations (Fig. 3). It can be extended and refined in order to model any other routing protocol. Although we rewrote the model (Fig. 4), a similar one is available in the literature 23 . This model has been implemented through the contex CBL_c0 and the machine CBL_m0 illustrated in Fig. 2.    CBL_m1). This second abstraction level of the CCM4CBL model formally defines the concepts related to VANET node dynamics and communications, notably neighborhood management, nodes positioning in the road traffic, and packet broadcasting. It consists in the CBL_m1 machine and the CBL_c1 context (see Fig. 5). In addition to the definitions in "Formal description of CBL organization in a VANET" section, the CBL_c1 context extends the initial CBL_c0 context by introducing the finite set of all the hello packets ( Hello ⊂ Packets ). Given the CBL_c1 context, the CBL_m1 machine refines CBL_m0 by introducing the variables, invariants and events modeling VANET node communications, vehicles movement in the road traffic, and neighborhood discovery and management.
Modeling VANET node communications. Communications in a VANET not only include the forwarding of application packets, but also the broadcasting of hello packets for neighborhood discovery and link detection. To formally model ad hoc communications in a VANET, in addition to the definitions proposed in "Formal description of CBL organization in a VANET" section, we introduce the following variables : • nextNodeToReceiver a variable determining the next receiver node of each sent packet (see inv2, Fig. 5). This variable is used for forwarding packets from its source to its destination. • lastGotHello: a variable determining the last hello a node received from another. We define this variable as follows:  • receptionTime A variable determining the reception time of a packet according to its destination node. We formally define this variable using an invariant as the following function: • neighboringTime a variable determining the time when a node becomes another's neighbor. We formally define this variable as the following function: These variables time, receptionTime and neighboringTime are mainly used in order to control the availability of a node's neighbors, as it will be detailed in the neighborhood management phase. The sendPacket, forwardPacket and receivePacket abstract events are also refined. Two refined versions are proposed for the packet sending event, namely sendPacket and broadcast. The first one is used for sending application packets, while the second allows broadcasting of some specific packets such as hello. As Fig. 6 shows, the abstract parameter destinations has disappeared from the packet broadcasting event, while new guards and actions have been added. In addition, a witness clause (with) is included in order to define the link between the abstract event and the refined one. This witness states that broadcast packets are able to be received by any node in the ad hoc network, with the exception of the sender. This requires a replacement of every occurrences of destinations in the action clause with Nodes \ {source}. Action act6 illustrates an example of such replacement. Guard grd3 ensures that the TTL of a Hello-type packet is equal to 1 during the broadcast process, in order to avoid its forwarding. Action act7 expresses that the hello packet sender shall broadcast its current position to all its neighbors. A new parameter T representing the local time at the destination node, and new guards are added in the refined version of the packet receiving event. The guard clause is extended with the following three predicates: where cls refers to the mathematical closure operator. For a relation R ∈ A ↔ A , cls(R) is the closure of R, which we define as follows: The actions added in the refinement of the reception event, update time, receptionTime and lastGotHello variables. In the refined packet forwarding event (forwardPacket), we introduce a new parameter nextNode (nextNode ∈ Nodes), new guards and a new action. The added guards are used to define the connection links between a packet's source/destination and the forwarding node, while the action updates the nextNodeToReceiver variable as follows: Modeling the vehicle movement. An updatePosition event is defined in the CBL_m1 machine in order to model the movement of vehicles (described as members of the set Nodes in our model) on the road. As Fig. 7 shows, this event takes the following parameters as input: • node represents a vehicle whose position shall be updated.
• XY ∈ N × N refers to the new position of the node. Modeling local neighborhood management. When a node receives a hello packet, it must update its neighbors table according to its content. To model this formally, we add an updateNeighbor event in the CBL_m1 machine (see Fig. 8). This event has six parameters, three of which (newUp, newDown and H) being automatically computed through the guards, based on the first four parameters. The first two parameters are the source (neighbor) and its position XY contained in the last hello H received from it. Parameters H (the last received hello), newUp (the updated upstream nodes) and newDown (updated downstream nodes) are merely used for simplification. The other two parameters are a destination node and its current time T. The above parameters are typed through the guards. As stated in Ref. 24 , each node periodically broadcasts hello packets to declare its availability to all its neighbors. Thus, when a node does not receive a hello packet from a neighbor within a period of time which equals to the neighbor_expiry_time , this neighbor is considered to be unavailable and is deleted from the node's  www.nature.com/scientificreports/ neighbors table. More formally, we create a dropNeighbor event in the CBL_m1 machine (see Fig. 8). As input parameters, the dropNeighbor event has a node and its unavailable neighbor. These parameters are well-defined by grd1, grd2 and grd3 guards. The grd3 guard checks the unavailability precondition of the neighboring node before applying the necessary actions. (CBL_c2, CBL_m2). The last abstraction level of our CCM4CBL model introduces the specific properties and rules of the CBL clustering scheme in a VANET. As Fig. 9 shows, this implementation level consists in a CBL_m2 machine which sees a CBL_c2 context. The CBL_c2 context extends the context of the second level ( CBL_c1 ) by introducing the concept of node connection time (Ctime). The latter is axiomatically defined according to the definition proposed in "Formal description of CBL organization in a VANET" section. Figure 9 also depicts a machine which models CBL properties and rules. This machine sees the CBL_c2 context, and refines CBL_m1 machine. The presentation of the refinements is organized in four steps, which are: (1) modeling CBL properties, (2) modeling branch nodes election, (3) refining ad hoc communications in a VANET, and (4) modeling CBL chain update.

Level 3: modeling CBL properties and rules
Modeling CBL properties. New variables are introduced in the CBL_m2 machine in order to formally model the specific properties and rules of CBL: • hasType a variable determining the type of the node (Branch or Leaf).
• branchChoice a variable determining the branch choice of the node.
• chainUP a variable determining the upstream branch node.
• chainDO a variable determining the downstream branch node.
The type of each variable is defined using a typing invariant according to the definitions introduced in "Formal description of CBL organization in a VANET" section. Given the variables, constants and sets introduced, CBL properties (req1, req2, . . . , req12) can be formally expressed as invariants in Event-B. These invariants are illustrated in Table 1.
Modeling branch nodes election. This election is a key step in the construction of the CBL structure. As stated in Ref. 24 , two election types are possible: branch choice by leaf nodes, and upstream branch election by branch nodes. To model these two operations formally, we created an electBranch event, taking three parameters as input (see Fig. 10). The first two parameters, electing and elected, refer to both the electing and elected nodes. The other parameter, V, is a binary number ( V ∈ {0, 1} ) used to simplify the verification of the NERi rules while electing branch nodes. To achieve this verification, each NERi rule is defined as a guard in the electBranch event. Each  www.nature.com/scientificreports/ one of these guards is detailed below. The first rule, NER1, states that a node (leaf or branch) cannot elect itself as a branch, while the second one (NER2) stipulates that the elected node must be a neighbor of its electing one. These two rules are defined as follows : Rule NER3 states that the elected node must be an upstream neighbor of its electing node, which shall be a branch. This rule is expressed through the following predicate: Rule NER4 is used to ensure the uniqueness of the CBL chain, in order to avoid parallel chains in the same area. Hence, it expresses that a branch node must have at most one upstream neighbor branch. We define this rule as follows: Rule NER5 states that, when there is no neighboring branch node, the electing leaf must choose the neighbor having the maximum connection time to itself. We formally define this constraint through the following predicate: Rule NER6 concerns the connection time of the elected upstream branch. This rule is applied when the electing branch node does not have any upstream branch-type neighbor. It is formally defined similarly to rule NER5 as follows: Refining VANET node communications. As stated previously, CBL is a distributed algorithm which builds a backbone of branch nodes, and clusters of leaf nodes around the latter. In order to achieve this task, CBL relies only on the information exchanged through hello packets. A hello packet contains information about its sender, such as its position, its type (branch or leaf), its branch choice when it is a leaf node, its upstream and downstream branch nodes when it is a branch node, and its direct neighbors. The following variables are added: • helloBrChoice a variable determining the branch choice of the hello sender.
• helloChainUP a variable determining the upstream branch of the hello sender.
• helloChainDO a variable determining the downstream branch of the hello sender.
• helloSrType a variable determining the type of the hello sender (branch or leaf).
These variables are defined as partial functions using invariants in the CBL_m2 machine according to their definition (see "Formal description of CBL organization in a VANET" section). The packet broadcasting event is refined in order to integrate the information contained in the hello packet. A guard related to CBL property REQ12 is added in order to enforce the leaf nodes which have neighbors, so that they can choose their branch nodes before broadcasting their hello packet: elected = electing ∧ elected ∈ neighbors(electing) ∧ electing � → 0 ∈ hasType(electing).

REQ 11
∀a, n · a ∈ Nodes ∧ a � → n ∈ chainDO(n) ∧ a ∈ up(n) ⇒n � → a ∈ chainUP(n) www.nature.com/scientificreports/ New substitution actions are also introduced in the refined version of the broadcast event, in order to update the status of the helloBrChoice, helloChainUP, helloChainDO and helloSrType variables as follows: Modeling CBL chain update. Several events trigger the update of the CBL chain locally at the VANET node level, such as the unavailability of some neighbors, or the overtaking of a branch node by its upstream branch node. In this work, we consider three events, namely: a neighbor which is no longer available, changes in the nodes' positions, and a received hello packet indicating some changes in the CBL chain (election of new branch up/down, branch choice, etc.). In order to model these events formally, we refine updatePosition, dropNeighbor and updateNeighbor abstract events, and introduce two new events, turnIntoBranch and turnIntoLeaf, which are used to change the node type according to TCR rules ("CBL properties and rules" section). The updateNeighbor variable is refined as follows: where newTypes, newChainDO, newChainUP and newBrChoice refer to the parameters introduced in the event's refinement, and are the new values of functions hasType(node), chainDO(node), chainUP(node) and branchChoice(node) after updating or adding the information (type, branch choice, and so on) about the neighbor. For example, the values of newChainUP and newChainDO parameters are computed as follows: The other parameters are computed similarly. The dropNeighbor refined event allows the local update of the CBL chain in case of unavailability of a neighbor. Compared to its abstract version, it introduces the following actions: Updating the position of a node Ni requires checking whether Ni has overtaken the node Nj that elected it as its upstream branch node. In this case, the chain link between Ni and Nj must be deleted by adding an action in the updatePosition event: where newChainUP is a new parameter referring to the updated upstream branch node, automatically computed as follows: Provided that the following conditions (expressed as a guard) are satisfied: Event turnIntoBranch allows turning a leaf node into a branch node when it has been elected by at least one of its neighbors (see grd1, Fig. 11). This type changing occurs after receiving a hello packet from a neighbor. Guard grd3 checks CBL property req6 which states that self-election is not authorized (a node cannot be elected by itself). Event turnIntoLeaf allows turning a branch node into a leaf when the latter overtakes its electing downstream branch node after updating the node's position (see Fig. 11). Guard grd1 defines a precondition of this type changing according to rule TCR 3.

Discussion and methods
The methods adopted during the validation of the CCM4CBL model rely on a two-step verification approach: validation of the model by animating it using the ProB model-checker, and proving its correctness by discharging proof obligations. neighbors(source) = ∅ ∧ source � → 0 ∈ hasType(source) ⇒ 1 ∈ hasType(source)[neighbors(source)]. www.nature.com/scientificreports/ Animation-based validation. The ProB 28 animation tool allows both the validation of the requirements, and the detection of errors in order to fix them, before starting the proof phase which can be long and complex. It cannot be performed on an abstract Event-B specification, and requires a concrete model. For that reason, we created a new CBL_m3 machine by extending the CBL_m2 machine. This machine does not introduce new events or variables. It sees a concrete CBL_c3 context, which is an extension of the CBL_c2 context. New constants and axioms are defined in this extended context for the concretization of all the sets and functions introduced in the contexts of the proposed CCM4CBL model. Figure 12 depicts the ProB animation window which is composed of three main parts. The first part (1) describes event triggering and constraint checking. The second part (2) presents the status of the model. The last part (3) allows signaling potential invariant violations and specification errors. After setting up the animation context, only the initialization event is enabled. After that, updatePosition and broadcast events are successfully activated. Once a hello packet is broadcast by a node, the receivePacket and losePacket events are enabled. Event updateNeighbor is triggered several times after executing receivePacket. Regarding the animation scenario Fig. 12 shows, we note that broadcast and turnIntoBranch events are disabled after executing both events updateNeighbor and electBranch. No node will be turned into a branch and the clustering scheme cannot proceed. This situation suggests that a given node is not able to broadcast hello  www.nature.com/scientificreports/ packets to inform the neighbors of its branch choice. It is caused by guard grd7 (see Fig. 12), which states that one of the neighbors of each leaf shall be a branch. In order to solve this issue, we modified the related guard: We used the ProB counter-examples as guides to rectify our model and trace back the specification errors, which could have caused the prover failures. The performed rectifications concern invariant violations and both guard and action alterations. We validated our model based on representative scenarios, including several cases which had not been treated previously.

Correctness of the model by discharging proof obligations.
In order to verify the model during its design process, proof obligations (POs) are discharged in a way guaranteeing that: • Model initialization leads to a state where the invariant is valid.
• When the machine is in a state where the invariant is valid, every enabled event leads to a state preserving this validity. • The concrete events can only occur in the circumstances in which the abstract events occur.
• The occurrence of any concrete event implies an occurrence of the related abstract event in such a way that the state verifies all related invariants.
POs refer to the proofs applicable to an Event-B model. Figure 13 illustrates examples of proof obligations. Those discharged are marked with , while those undischarged can be recognized thanks to the symbol . The POs which are dischared automatically also show an "A" letter. Some proof obligations need interactive discharging by users, when the automatic prover cannot perform it (Fig. 13c shows the repartition between automatically and manually dischared POs in our model). For example, using the cardinal operator implies a finite set as operand, thus making more difficult the discharging of proof obligations. As well, the universal ( ∀ ) and existential ( ∃ ) quantifiers may be sources of concerns for the instantiation of the quantified hypothesis. In the proposed model, an example of such a PO is turnIntoBranch/req5/INV, which ensures that the turnIntoBranch event preserves the req5 invariant of the CBL_m2 machine. As depicted in Fig. 13a, the sequent of such a PO is unproved due to a lack of hypothesis. For that reason, we modified the turnIntoBranch event, shown in Fig. 11, by adding the following new guard, which let to success:

Conclusion and prospective work
In this paper, we have proposed an approach using Event-B to validate the properties and rules of a VANET routing protocol. Targeting the CBL clustering scheme, a correct-by-construction model CCM4CBL is proposed for that purpose as a proof of concept. This model includes three abstraction levels. The first one is an initial specification containing the basic functions of any network routing protocol. The second level introduces specific concepts of the VANET environment such as spatial relations between the vehicles according to their positions, vehicle movement, and management of both routing tables and vehicle communications. At this step, the proposed model can be reused for modeling any VANET routing protocol. The last level formally defines the specific properties and rules of the CBL clustering scheme regarding VANET organization. The proposed model is gradually verified using proof obligation mechanisms offered by the event-B method and finally validated using the ProB animator to repair several behavioral errors. These processes are illustrated thanks to concrete examples. In our future work, we will target the coupling of this formal approach to others, such as discrete neighbors(source) = ∅ ∧ source � → 0 ∈ hasType(source) ⇒ {source} ⊳ branchChoice(source) = ∅.