High performance microservice communication technology based on modified remote procedure call

Microservice architecture is an approach that decomposes a single application into multiple smaller services and runs them independently. However, this approach introduces new challenges in communication between services because the services differ in their data structures and technology types. Interprocess communication (IPC) between services has therefore become one of the major challenges facing microservice architecture, and the choice of IPC technology is an important decision that can affect the nonfunctional requirements of the entire architecture. To address this problem, this study proposes RPCX, a microservice communication technology based on remote procedure calls (RPC), to improve the communication performance between services. RPCX uses the nonblocking IO communication model and the Protobuf data serialization standard, and it realizes RPC communication at the client and server ends using dynamic proxy and annotation configuration technology. We benchmark RPCX against two traditional service communication technologies under stress and evaluate its performance using two indicators: the time consumed to process the requests and transactions per second (TPS). The results show that RPCX performs better overall than the other two technologies under different thread and request counts.


Related work
The performances of various IPC communication technologies and their impact on the overall performance of microservices were compared and analyzed in the extant literature. For example, Kumar et al. 2 , Shafabakhsh et al. 17 , and Hong et al. 18 discussed and compared the performance indicators of various communication technologies, such as Google Remote Procedure Call (gRPC), Thrift, REST, and RabbitMQ, and proposed the best application scenarios for them. Gan and Delimitrou 19 established a microservice system for streaming media services to evaluate various indicators in microservices, including the performance impact of RPC communication between microservices on the entire system. Georgiou and Spinellis 20 discussed the energy consumption of various IPC communication technologies under different programming languages.
Gan et al. 21 , Sriraman and Wenisch 22 , Ueda et al. 23 , and others focused on the impact of communication technology on the performance of microservice architectures. Gan et al. 21 also analyzed the time taken to process a communication request with respect to the time taken by the entire application. Sriraman and Wenisch 22 developed a suite of microservices to analyze the influence of the operating system and communication requests on the overall latency of microservices. Ueda et al. 23 tested the same on a monolithic application and a microservice architecture, respectively, and concluded that the optimization of communication between services can improve the overall performance.
To summarize, the performance of the communication technology contributes to the overall performance of the microservice architecture. Therefore, we propose an efficient microservice communication technology called RPCX. It uses the nonblocking IO network model and the Protobuf data transmission format as the underlying communication mechanism, and it uses dynamic proxy technology and annotation configuration rules to allow developers to call remote methods as if they were local methods. It also uses buffer pool technology: the results of time-consuming operations are placed in a buffer pool so that the program can read the data at high speed during the running phase. We expect RPCX to play a positive role in promoting the development of microservice IPC communication technology.

RPCX
First, the structure of the high-performance remote communication technology (RPCX) is described. Next, the four key components of the technology, namely dynamic proxy, annotation configuration rules, network communication model, and transmitted data format, are described.
Overall structure of RPCX technology. Our objective for RPCX is to enable local applications to call services at remote servers with more ease and efficiency. Accordingly, we describe the four components mentioned in the previous paragraph: dynamic proxy, annotation configuration rules, network communication model, and transmitted data format.
As illustrated in Fig. 1, RPCX is based on the principle of RPC remote communication and is divided into two parts: the client process and the server process. In the client part, the remote proxy object is implemented first: RPCX uses dynamic proxy technology to present the remote service as a local service. The information required to call remote services locally, such as the server IP address, service name, parameter names, and parameter values, is declared locally through annotations, which disguise the service name and other information and locate the service. Developers can obtain the corresponding information directly through the provided annotation classes. The communication model is implemented using the nonblocking IO (NIO) communication model, and the communication data are encoded using Protobuf. The required information is encoded into binary data, which are transmitted to the server through the network communication model for decoding. The binary result data returned by the server are decoded locally and returned to the "method handler" for processing.
The server part is divided into the service provider and the service center. The service provider provides the service interface definition and the service implementation. The service center, on the one hand, publishes the local services of the server as remote services available to the client; on the other hand, it manages the remote services and can perform operations such as starting them, stopping them, and obtaining their port numbers. Developers can disguise service names through annotations and then publish the disguised service names for unified management at the service center. Once the binary data are received through the NIO network service on the server, the data are decoded to obtain the information required by the service. After the service is executed, the result is encoded, and the binary data of the result are returned to the client.
Key components. Dynamic proxy. The purpose of the dynamic proxy is to let remote methods on the server be called in the same way as local methods on the client, so that users can call remote methods locally without being aware of the remote call. When calling the local method, the user only needs the information of the remote server, namely the server address, port number, service name, and the name of the server method. RPCX uses the Java SE Development Kit (JDK) Proxy to create a proxy object, access the target server, and transmit the remote method name, the local method parameter values, and the parameter types to the server. Upon receiving the method execution result from the server, the proxy returns it as the return value of the local method. The effect is that, at runtime, calling the local method dynamically yields the return value of the remote server method (Fig. 2).
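The forwarding behavior described above can be illustrated with a minimal Java sketch built on the JDK's java.lang.reflect.Proxy. It is not the RPCX implementation: GreetingService, Transport, and ProxySketch are hypothetical names introduced for illustration, and the Transport interface stands in for the real network layer so that the example runs without a server.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical service interface, assumed to be known to client and server.
interface GreetingService {
    String greet(String name);
}

// Hypothetical stand-in for the network layer. It receives the items the
// text says are transmitted: method name, parameter types, parameter values.
interface Transport {
    Object send(String methodName, Class<?>[] paramTypes, Object[] paramValues);
}

final class ProxySketch {
    // Builds a local object whose interface calls are forwarded to the transport.
    @SuppressWarnings("unchecked")
    static <T> T createProxy(Class<T> iface, Transport transport) {
        InvocationHandler handler = (proxy, method, params) -> {
            // toString/hashCode/equals are answered locally, never sent remotely.
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(transport, params);
            }
            return transport.send(method.getName(), method.getParameterTypes(), params);
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        // A fake transport that echoes the call instead of using the network.
        Transport fake = (name, types, values) -> "called " + name + "(" + values[0] + ")";
        GreetingService service = createProxy(GreetingService.class, fake);
        System.out.println(service.greet("Ada"));
    }
}
```

The caller only sees the GreetingService interface; swapping the fake transport for a real network client would not change the calling code, which is the property the dynamic proxy is meant to provide.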
Annotation configuration rules. To improve user friendliness and secure the application programming interface (API) exposed by the server, RPCX uses annotation configuration rules to disguise the server information and configure it on the client side. For example, the @RPCXClient annotation class is applied to the local interface class being called. This annotation class has three attributes, RemoteServerName, RemoteServerDomain, and RemoteServerAddress, which represent the remote server service name, remote server port number, and remote server address, respectively. Through this annotation, the information of the remote server to be called can be marked directly on the local interface class, which removes the step of entering this information in a configuration file.
In the local interface of the client, the @Method annotation class can be used on a local method to mark the name of the remote server method to be accessed, which enables the remote server method to be called locally. On the server side, the service and method names to be published can be annotated through the @RPCXServer and @Method annotation classes. The published name need not be the same as the actual name; RPCX takes the annotated name as the standard. Thus, the actual name is disguised and the actual information is masked, which enhances the security of server-side information.
For better performance, in the initialization phase, RPCX recursively traverses all the annotation classes in this project and converts them into a unified structure. It can directly use this structure to obtain the local and remote server information during runtime (Fig. 3).
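As a rough illustration of the annotation rules and the initialization-time cache described above, the following Java sketch declares a client annotation with the three attributes named in the text and resolves it once into a lookup map. Several details are assumptions: the attribute types (String), the annotation name RemoteMethod (used in place of @Method to avoid clashing with java.lang.reflect.Method), the UserClient interface, and the explicit register call, which replaces the recursive project scan that RPCX is described as performing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Attribute names follow the text; the String types are an assumption.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RPCXClient {
    String RemoteServerName();    // remote service name
    String RemoteServerDomain();  // remote port number
    String RemoteServerAddress(); // remote server address
}

// Hypothetical stand-in for the @Method annotation described in the text.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RemoteMethod {
    String value(); // published (disguised) remote method name
}

// Hypothetical client interface carrying the configuration.
@RPCXClient(RemoteServerName = "userService",
            RemoteServerDomain = "8080",
            RemoteServerAddress = "10.0.0.2")
interface UserClient {
    @RemoteMethod("findUser")
    String lookup(String id);
}

final class AnnotationCacheSketch {
    // "Interface#method" -> resolved remote target, filled once at startup.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Reads the annotations of one interface and stores the resolved targets.
    static void register(Class<?> iface) {
        RPCXClient client = iface.getAnnotation(RPCXClient.class);
        if (client == null) {
            return; // not an annotated client interface
        }
        for (Method m : iface.getDeclaredMethods()) {
            RemoteMethod rm = m.getAnnotation(RemoteMethod.class);
            String remoteName = (rm != null) ? rm.value() : m.getName();
            CACHE.put(iface.getSimpleName() + "#" + m.getName(),
                    client.RemoteServerAddress() + ":" + client.RemoteServerDomain()
                            + "/" + client.RemoteServerName() + "." + remoteName);
        }
    }

    // Runtime lookup: no reflection, only a map read.
    static String resolve(Class<?> iface, String methodName) {
        return CACHE.get(iface.getSimpleName() + "#" + methodName);
    }

    public static void main(String[] args) {
        register(UserClient.class);
        System.out.println(resolve(UserClient.class, "lookup"));
    }
}
```

Doing the reflective work once in register and serving later calls from the map mirrors the stated design goal of moving annotation parsing out of the request path.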
Network communication model. The network communication model of RPCX adopted Netty's asynchronous NIO design (Fig. 4). For this, we created NioEventLoop threads on the client and server sides, configured each component through Bootstrap and ServerBootstrap to start and guide, and executed data I/O operations through the channel pipeline.
The client transfers data to the server in binary format. Upon receiving the data, the server decodes and deserializes it and then returns the result to the client in the same manner. However, because of the multithreaded asynchronous nature of Netty's I/O operations, when the server returns a result, the client cannot tell which requesting thread the result belongs to. To solve this problem, the client adds a unique identifier to each request when transferring data and establishes a key-value pair in the client cache pool, with the request identifier as the key and an RPCXFuture structure as the value. RPCXFuture is used to save the serialized data returned by the server. The result is stored in the key-value pair of the client's cache pool under the request ID, so that the caller receives the unique result corresponding to its request.
Transmitted data format. In the RPCXRequest message body, (1) is the unique identifier of the request. As described in the network communication model, the identifier is used to match each result to its request when messages are sent and received by multiple threads. This identifier is a random long number.
(2) and (3) are the service and method names sent to the server after information disguise, respectively. (4) is an array of message bodies containing (5) the method parameter types and (6) the method parameter values. The RPCXReply message body is the data structure returned by the server upon receiving an RPCXRequest. The server adds the unique identifier (7) from the request and the result data (8) returned by the method to the RPCXReply and then relays it to the client as binary data. After the client receives the result, it structures the data according to the return type of the method, which completes the data path for locally calling the remote method.
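The request and reply bodies and the client cache pool described above can be sketched in plain Java as follows. This is an illustration under assumptions, not the RPCX code: the field names are invented to match items (1)-(8), Protobuf encoding is omitted, and the JDK's CompletableFuture stands in for the RPCXFuture structure.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Request body; field names are assumptions mapped to items (1)-(6).
final class RPCXRequest {
    final long requestId;       // (1) unique identifier, a random long
    final String serviceName;   // (2) disguised service name
    final String methodName;    // (3) disguised method name
    final String[] paramTypes;  // (5) parameter types, inside array (4)
    final Object[] paramValues; // (6) parameter values, inside array (4)

    RPCXRequest(String serviceName, String methodName,
                String[] paramTypes, Object[] paramValues) {
        this.requestId = ThreadLocalRandom.current().nextLong();
        this.serviceName = serviceName;
        this.methodName = methodName;
        this.paramTypes = paramTypes;
        this.paramValues = paramValues;
    }
}

// Reply body; field names are assumptions mapped to items (7) and (8).
final class RPCXReply {
    final long requestId; // (7) identifier copied from the request
    final Object result;  // (8) result returned by the method

    RPCXReply(long requestId, Object result) {
        this.requestId = requestId;
        this.result = result;
    }
}

// Client cache pool: request identifier -> future awaiting the reply.
final class PendingCallsSketch {
    private final Map<Long, CompletableFuture<RPCXReply>> pending = new ConcurrentHashMap<>();

    // Called by the sending thread before the request goes on the wire.
    CompletableFuture<RPCXReply> register(RPCXRequest request) {
        CompletableFuture<RPCXReply> future = new CompletableFuture<>();
        pending.put(request.requestId, future);
        return future;
    }

    // Called by an I/O thread when a reply arrives; returns false if no
    // caller is waiting for this identifier.
    boolean complete(RPCXReply reply) {
        CompletableFuture<RPCXReply> future = pending.remove(reply.requestId);
        return future != null && future.complete(reply);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because the reply carries the same identifier as the request, replies may arrive in any order and on any I/O thread, and each caller still receives only its own result.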

Experiment
Here, we compared the proposed RPCX with two service communication technologies in terms of stress performance, collated the experimental results, and interpreted them.
Experimental environment. Experimental platform. The experimental platform consists of two cloud servers with identical configurations. The specifications of each cloud server are as follows: 1 vCPU, 2 GB of memory, 40 GB of cloud storage, and 1 Mbps of bandwidth. The cloud servers are equipped with the CentOS 8.2 64-bit operating system, and the JDK 1.8 runtime environment has been installed.
Service communication technology. REST and gRPC were selected as the communication technologies for comparison. REST is a stateless architectural style for distributed systems that is widely used to provide globally accessible APIs. In microservices, developers commonly use Spring's OpenFeign, a REST technology 20 . OpenFeign is a declarative web service client whose core function is to provide simple and efficient RPC-style calls over REST in the form of HTTP methods. gRPC is a cross-platform, open-source, high-performance RPC framework developed by Google. It uses Protocol Buffers 3 and HTTP/2 to boost its speed and interoperability between services. We selected gRPC because several companies using microservices (e.g., Netflix, Cisco, CoreOS) are adopting it in their production lifecycle 20,24 .
Experimental architecture. In this experiment, we set up two cloud servers with identical configurations: cloud server A and cloud server B. Cloud server A hosts the client side of RPCX, gRPC, and OpenFeign and sends the requests, whereas cloud server B hosts the server side, receives the requests sent from cloud server A, and returns the results. The architecture of the performance testing platform for these three technologies is shown in Fig. 6.
To ensure the fairness of the experiment, the data transmitted by RPCX, gRPC, and OpenFeign in the experiment had the same strings. Both gRPC and OpenFeign used the example method given on the official website 24,25 .
The key software versions used in the experiments are presented in the following table (Table 1).

Performance experiment method.
To evaluate the performance of the RPCX communication technology, we conducted a performance stress benchmark test comparing it with the gRPC and OpenFeign technologies. The stress benchmark test evaluates the performance of the technologies by simulating multi-threaded, multi-request scenarios and measuring program runtime and transactions per second (TPS). To test these two indicators, we developed a client-side testing program.
Pseudocode for the client-side testing program.
To test the performance of each communication technology in a fair and accurate manner, and to avoid the impact of exceptional conditions, such as server or communication failures, on the experimental data of a single run, which may affect the overall accuracy and reliability of the experimental results, the test program is run 10 times for each thread count and request count combination. The test program takes thread count and request count as input variables. In Eqs. (1) and (2), we define the thresholds for thread count and request count, where α and β are integers.
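A client-side test program of the kind described above might be sketched as follows in Java. It is an illustration rather than the authors' program: the class and method names are hypothetical, the remote call is abstracted as a Runnable, and it is an assumption that the request count is the total number of calls shared by the worker threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class StressHarnessSketch {
    // Issues requestCount calls on threadCount worker threads and returns
    // the elapsed wall-clock time in milliseconds.
    static long runOnce(int threadCount, int requestCount, Runnable call) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<Future<?>> futures = new ArrayList<>(requestCount);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < requestCount; i++) {
                futures.add(pool.submit(call));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for completion; surfaces a failed call
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // The counter stands in for a real remote call.
        AtomicInteger handled = new AtomicInteger();
        long millis = runOnce(10, 10, handled::incrementAndGet);
        System.out.println(handled.get() + " requests in " + millis + " ms");
    }
}
```

Calling runOnce repeatedly for the same thread and request counts yields the per-run times from which an average can later be taken.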
As shown in line 19 of the testing program, TPS is an important metric for measuring the processing capability of a system in stress performance testing. In this experiment, it is calculated from the request count and the program runtime as follows:
$$\mathrm{TPS} = \frac{\mathit{request}}{\mathit{time} / 1000} \qquad (3)$$
In Eq. (3), the program runtime (expressed as time) is in milliseconds; hence, when calculating the TPS, we divide the runtime by 1000 to convert it to seconds.
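The TPS calculation described above amounts to a one-line conversion, shown here as a small Java helper. The class and method names are hypothetical, and the guard against a non-positive runtime is an addition for the sketch rather than something stated in the text.

```java
final class TpsSketch {
    // TPS = request count / (runtime in ms / 1000), i.e. requests per second.
    static double tps(long requestCount, double runtimeMillis) {
        if (runtimeMillis <= 0) {
            throw new IllegalArgumentException("runtime must be positive");
        }
        return requestCount / (runtimeMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 500 requests completed in 250 ms.
        System.out.println(tps(500, 250.0));
    }
}
```

For a fixed request count, halving the runtime doubles the TPS, which is why the runtime and TPS charts move in opposite directions.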
The performance stress tests of the three communication technologies simulate data communication between the client-side program and a standardized server-side program under different thread and request counts. The server-side testing program therefore only needs to start up and exchange data with the client side. Accordingly, the server-side testing programs for gRPC and OpenFeign are the official server-side startup examples. In contrast, the server-side testing program for RPCX has the structure shown in Section "Overall structure of RPCX technology".
Pseudocode of the server-side test program.
To facilitate the performance experiments on cloud servers, we packaged the testing programs of RPCX, gRPC, and OpenFeign into component packages with a file extension of .jar, which we refer to as jar packages. The jar packages of the three communication technologies are divided into client-side and server-side packages and uploaded to cloud servers A and B, respectively, with irrelevant threads closed to maximize the server resources available for executing the testing programs.
To stress RPCX, gRPC, and OpenFeign as fully as possible on the cloud servers, we developed a performance testing plan that covers the full range of the two input parameters of the testing program: thread count and request count. That is, the thread count and request count start at 10 and keep increasing until their values are large enough to exhaust server resources, at which point the communication time grows without bound and communication becomes impossible. Based on this strategy, the range of α and β in Eqs. (1) and (2) is from 10 to +∞. For each thread count, the request count runs once through a loop from 10 to +∞, with both the thread count and the request count increasing by 10 each time. To implement this strategy, we wrote 115 lines of shell commands to loop and run the testing program on the cloud server.
Pseudocode for running a test program in the shell.
In our experiments, we found that when server resources are exhausted and communication cannot be established, the runtime of a single-cycle program is generally not more than 10 min. Additionally, communication exceeding 10 min is often meaningless in practical applications. Therefore, we set the waiting time for program execution on line 6 of the test shell to 600 s. If the single-cycle communication time exceeds 10 min, we stop waiting for the result and consider the communication to have failed.
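The sweep-with-timeout strategy described above was implemented by the authors as shell commands; the Java sketch below only mirrors its logic in-process. The names are hypothetical, the trial is an abstract callback instead of a launched jar package, and stopping the request sweep at the first failed trial is an assumption about how the loop terminates.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SweepSketch {
    // One test run for a given thread count and request count.
    interface Trial {
        void run(int threads, int requests) throws Exception;
    }

    // Returns true if the trial finishes within the timeout; a timeout or
    // an error is treated as a failed communication.
    static boolean finishesInTime(Trial trial, int threads, int requests, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(() -> {
            trial.run(threads, requests);
            return null;
        });
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupts a trial that is still running
        }
    }

    // Raises the request count in steps of 10 for one thread count and
    // returns the last request count that completed in time (0 if none).
    static int lastSuccessfulRequestCount(Trial trial, int threads,
                                          int maxRequests, long timeoutMillis) {
        int last = 0;
        for (int requests = 10; requests <= maxRequests; requests += 10) {
            if (!finishesInTime(trial, threads, requests, timeoutMillis)) {
                break;
            }
            last = requests;
        }
        return last;
    }

    public static void main(String[] args) {
        Trial instant = (threads, requests) -> { };
        // 600 s matches the waiting time stated in the text.
        System.out.println(lastSuccessfulRequestCount(instant, 10, 100, 600_000L));
    }
}
```

The maxRequests bound exists only so that the sketch terminates; in the described experiment the sweep is bounded by resource exhaustion instead.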
Experimental data analysis. During the experiment, to save server resources used for data storage, the performance stress test results of the three communication technologies were stored in real time on the server as files organized by thread and request count, totaling 186,860 data records. After the experiment, we stored the experimental data in a database for further data analysis.
First, we established an original data model and developed a data cleaning program. The original data model is shown in Fig. 7a, where (1) "id" represents the primary key of the database table, an auto-incrementing integer; (2) "type" represents the communication technology as an integer, where type = 1 represents RPCX, 2 represents gRPC, and 3 represents OpenFeign; (3) "threadNum" represents the thread count; (4) "requestNum" represents the request count; (5) "timeTotal" represents the total running time of the test program; (6) "TPS" represents the transactions processed per second; and (7) "order" represents the index of the run among the 10 runs of the test program for each thread and request count combination. Therefore, the range of "order" is 0 ≤ order ≤ 9. The methods of the original data model get and set the value of each attribute. The original data cleaning program traverses all the original data files, reads the data from each file one by one, transforms the data into the original data model, and maps the original data model to the corresponding database fields. Using the original data model, the program inserts the original data into the database, which completes the task of reading the original data from the files and storing them in the database.
Analysis of the original data showed that, for each combination of thread count and request count, each technology performed one round of 10 runs. In each round, the first run time was significantly higher than the other nine run times, on average 20 times higher. Taking the RPCX technology as an example, Table 2 shows a part of the original data, with field meanings as described above. Table 2 presents the run times of two rounds, each consisting of 10 runs, for thread count 10 and request counts 10 and 20, respectively. Order 0 corresponds to the first run of each round. From Table 2, it can be seen that the first run times for request counts 10 and 20 are 944 and 947 ms, respectively, which are 20 times higher than the other nine run times. Analysis showed that when the test program starts each round of testing, it needs to load various additional component packages, which extends the first run time. Therefore, we consider the first run time of each round to have no reference value for measuring the time performance of the communication technologies in this experiment. Before analyzing the test results, we therefore cleaned the original result data by removing the first run of each round and calculating the average of the remaining nine run times as the final experimental result of that round.
According to this data analysis strategy, after completing the data cleaning, we established a cleaned data model, as shown in Fig. 7b, to store the cleaned data in the database. Items (8)-(11) in the cleaned data model have the same meaning as items (1)-(4) in the original data model shown in Fig. 7a. Item (12) in Fig. 7b, "timeTotalAvg", represents the average program runtime as a double-precision floating-point value, and item (13), "tpsAvg", represents the average TPS as a double-precision floating-point value.
The methods for the cleaned data model are used to get and set the values for each attribute. By using the cleaned data model, we can map the cleaned data model to the new table fields in the database and store the data in the database, thus completing the data storage work after cleaning.
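The cleaning rule described above (discard the run with order 0, then average the remaining runs of the round) can be sketched in Java as follows. The field names follow the data models in the text, but the classes themselves, the double type chosen for timeTotal, and the omission of the database "id" field and of getters and setters are simplifications made for this illustration.

```java
import java.util.List;

// One raw measurement, mirroring the original data model (without "id").
final class RawRecord {
    final int type, threadNum, requestNum, order;
    final double timeTotal, tps;

    RawRecord(int type, int threadNum, int requestNum, int order,
              double timeTotal, double tps) {
        this.type = type;
        this.threadNum = threadNum;
        this.requestNum = requestNum;
        this.order = order;
        this.timeTotal = timeTotal;
        this.tps = tps;
    }
}

// One cleaned result per round, mirroring the cleaned data model.
final class CleanedRecord {
    final int type, threadNum, requestNum;
    final double timeTotalAvg, tpsAvg;

    CleanedRecord(int type, int threadNum, int requestNum,
                  double timeTotalAvg, double tpsAvg) {
        this.type = type;
        this.threadNum = threadNum;
        this.requestNum = requestNum;
        this.timeTotalAvg = timeTotalAvg;
        this.tpsAvg = tpsAvg;
    }
}

final class CleaningSketch {
    // Expects the runs of a single round (same type, thread count, and
    // request count); skips order 0 and averages the remaining runs.
    static CleanedRecord clean(List<RawRecord> round) {
        double timeSum = 0.0;
        double tpsSum = 0.0;
        int kept = 0;
        RawRecord sample = null;
        for (RawRecord r : round) {
            if (r.order == 0) {
                continue; // warm-up run: includes component loading time
            }
            timeSum += r.timeTotal;
            tpsSum += r.tps;
            kept++;
            sample = r;
        }
        if (kept == 0) {
            throw new IllegalArgumentException("no runs left after dropping order 0");
        }
        return new CleanedRecord(sample.type, sample.threadNum, sample.requestNum,
                timeSum / kept, tpsSum / kept);
    }
}
```

Averaging over the kept runs rather than a fixed nine keeps the helper usable if a round happens to contain fewer records.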
Due to the large volume of data, to present the performance of the three communication technologies more clearly, we present the thread counts for the three technologies from low to high, starting from 10 threads and increasing by 100 threads until reaching 990 threads. For each thread count, the request count starts at 10 and increases by 200 requests until reaching 900 requests. Figure 8 shows the performance of the RPCX communication technology in terms of average program runtime and TPS, from a low thread count of 10 to a high thread count of 990. As stated for Eqs. (1) and (2), we set the range of the thread and request count thresholds, α and β, from 10 to +∞. However, the experimental results revealed that RPCX shows no measurable difference in average program runtime in milliseconds when α < 10, so this range has no experimental significance. On the other hand, when β > 1000, the consumption of cloud server resources by RPCX communication reaches the limit, resulting in either a long execution time or the stopping of RPCX. Therefore, Fig. 8a displays the bar chart of the average program runtime of RPCX from a low thread count of 10 to a high thread count of 990. It can be observed that as the thread count increases, the program runtime of RPCX also increases, and within the same thread count, the execution time of RPCX increases as the request count increases. This trend conforms to the expectation that the running time required by a communication technology increases as the testing pressure increases. Figure 8b shows the bar chart of the average TPS of RPCX from a low thread count of 10 to a high thread count of 990. As indicated by Eq. (3), when the request volume is constant, the higher the execution speed, the larger the TPS.
From the figure, it can be seen that from a low thread count with a short execution time to a high thread count with a long execution time, the TPS decreases as the request volume increases. This fully conforms to the rule that a communication technology has a high TPS under low response times.
Figure 9a shows the bar chart of the average program running time of the gRPC communication technology, with thread counts from 10 to 990 in increments of 100 and request counts from 10 to 500 in increments of 200. According to the experimental results, the thread count thresholds in Eq. (1) are α = 10 and β = 990, and the request count thresholds in Eq. (2) are α = 10 and β = 500. In the performance test of the gRPC communication technology, the program cannot report a running time when the thread count is greater than 300 and the request count is above 500. Therefore, to display the results more clearly, the request count in the running time bar chart of gRPC is uniformly set from 10 to 500 in increments of 200. Figure 9b shows the bar chart of the average TPS with thread counts ranging from 10 to 990 and request counts ranging from 10 to 500. Like RPCX, gRPC follows the expected pattern that the program running time increases and the TPS gradually decreases as the pressure increases during the performance test.
Figure 10a shows the bar chart of the average program running time of the OpenFeign communication technology, with thread counts from 10 to 990 in increments of 100 and request counts from 10 to 900 in increments of 200. The experimental results show that the thread count thresholds in Eq. (1) are α = 10 and β = 990, and the request count thresholds in Eq. (2) are α = 10 and β = 900.
Figure 10b shows the TPS situation with thread counts ranging from 10 to 990 and request counts ranging from 10 to 900. Similar to RPCX and gRPC, OpenFeign follows the objective law that the program running time increases gradually and TPS decreases as the performance pressure continuously increases in performance stress tests.
According to the experimental data shown in Figs. 8, 9 and 10, we have compiled a comparison of the average program runtime and average TPS for the three communication technologies in Tables 3 and 4, respectively. Table 3 presents the comparison of the average program runtime for the three communication technologies from 10 threads with request counts of 100, 300, and 500, respectively, up to 990 threads. RPCX employs a caching mechanism to store target servers, remote services, and other information locally during program initialization, as described in the annotation configuration rules of the RPCX section. To further enhance the performance of RPCX, time-consuming operations such as traversing and parsing annotation classes are performed during program initialization, and the information of annotation classes is stored in the local cache pool for rapid data access during program execution. RPCX uses the non-blocking IO Netty network model and the binary data model protobuf for data transmission in the network communication model, and asynchronously transmits requests and response results, which are stored in the local cache pool in key-value format for local asynchronous calls. These design approaches greatly improve the performance of RPCX. As shown in Table 3, RPCX outperforms gRPC and OpenFeign by 55.9-88.9% in terms of time performance from low threads to high threads. Correspondingly, Table 4 presents the comparison of TPS for the three communication technologies from 10 threads with request counts of 100, 300, and 500, respectively, up to 990 threads, and RPCX outperforms gRPC and OpenFeign by 126.9-802.8% in terms of TPS from low threads to high threads.

Conclusions
We designed a new microservice communication technology called RPCX and compared it with gRPC and OpenFeign in terms of stress performance. According to the results, RPCX exhibits good service communication time and TPS performance. The method proposed in this study can improve the performance of service communication in the microservice architecture, and it can help future researchers further improve the communication performance and ease of use of microservices and promote development in the field of microservice IPC technology. In the future, experiments should be extended to multiple cloud hosts and across hosts, and more complex experimental plans should be developed.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.