A new advance on dimensional-aware scalar, vector and matrix operations in C++

We review the dimensional check problem of the high-level programming languages, discuss the existing solutions, and come up with a new solution suited for scientific and engineering computations. Then, we introduce Univec, our C++ library designed to make scalar, vector, and matrix operations using units of measurement. Moreover, Univec supports dimensional-aware operations for complex numbers, quaternions, octonions, and sedenions. We provide tables of the relevant functions and operators implemented. Our library was compared with several existing solutions, and the results are shown in the performance section. Finally, we present our future plans for improving the current implementation.


Introduction
Dimensional analysis is one of the most powerful tools available in physics to verify the correctness of mathematical formulas describing physical processes.However, this tool alone is not able to spot all possible mistakes.It allows only a fast check, highlights faulty operations, and provides a consistent examination of the whole process.
Nowadays, complex simulations and derivations are performed using computing algorithms taking advantage of high-speed calculation and numerical correctness due to the advances in computing technology.However, when using complex algorithms spanning thousands of source code lines divided into tens of files, we tend not to automatically trust the results or, much worse, accept them blindly.Moreover, because there is no bug-free software implementation, we have to mitigate the impact of possible errors using countless checks in code and proper testing procedures, which are often as time-consuming as the development itself.
Therefore, an appropriate approach is to use development techniques that could help reduce the overall cost of software development 1 .Unfortunately, due to historical reasons, most high-level programming languages lack a dimensional check as a language feature; they work on floats and integers, leaving to the programmer's judgement the responsibility to keep track of their semantic meaning, leading to many potential mistakes.
Several libraries provide dimensional analysis check at compile time or at run-time.However, as shown by Mc.Keever 2 , there is an evident reticence to adopt such solutions due to the following reasons: • Difficulty in using unfamiliar and complex libraries • Fear of adding external dependencies • Concerns about the performance impact • Possible limitations on using external libraries or complex data structures such as matrices and vectors.
In section 2, we review the existing solutions available in the most common programming languages, we describe our proposed solution and present concrete examples in section 3, we discuss the testing approach in section 4.1 and present a performance analysis in section 4.2.Limitations of the current development and plans for improvements are discussed in section 5 and conclusions are drawn in section 6.

Existing solutions
A comprehensive review of several existing solutions for dimensional analysis in programming languages can be found in Preussner's work 3 .Apple's Swift 4 and F# 5 are two widespread programming languages that provide native support for UoM-aware (units of measure) quantities.Promoting UoM analysis to a language feature has the benefit of immediate access to meaningful error messages.Unfortunately, only a few programming languages offer native support for this important topic.Instead, most languages have an external library that relies on generic programming techniques.
The most popular solution for C++ is Boost.Units 6 , which makes possible the UoM validation at compile time.Its main advantage is that it is included in the Boost framework, a comprehensive set of libraries that are used as a launchpad for features that are later included in the language's standard library.Unfortunately, Boost is quite often seen as a heavy dependency.For the FORTRAN language there is the PHYSUNITS module 7 which provides support for dimensional-aware routine creation.While this solution is versatile, allowing working with both F77 and F90 code, because the dimension information is carried alongside numerical values, PHYSUNITS introduces a run-time execution overhead in terms of speed and memory usage.Dimensional analysis is not limited to compiled languages, Python supports run-time dimensional analysis checks thanks to libraries like Pint 8 .
Almost all existing libraries do not provide native support for vectorial computation.However, for vectorial operations in high-energy physics there are two solutions: CLHEP 9 and Eigen 10 .They provide representations for both 2D and 3D vectorial quantities and n-dimensional matrices, but none of them has built-in support for dimensional-aware units, relying only on primitive numeric types available in C++.
We have to mention that our approach is not the first attempt to solve this specific issue.For example, the CORSIKA 8 simulation tool 11 , a C++ rewrite of CORSIKA 12 , uses a similar solution to address the same problem of combining unit systems with vector arithmetic.The approach adopted there was to combine Eigen v3 10 with the PhysUnits library 13 , allowing them to perform dimensional-aware vector arithmetic without any performance penalty 11 .

Our Implementation
To improve the accuracy and lower the run-time overhead of the scientific and engineering computations, we developed Univec, our C++ library that allows scalar, vector, matrix, complex, quaternion, octonions, and sedenion operations integrating the Boost.Units 6 library features.
A summary of the functions and aliases available for vector and matrix types can be found in tables 1 to 7, while the complete source code, released under LGPLv3 license, can be found at the address https://gitlab.com/micrenda/univecAll the methods implemented return a copy of the original vector.However, for methods for which the result is compatible with the calling class type, we created an in-place version of the method, which modifies the object instance.These methods start with the prefix do (e.g.doConj()) to be easily differentiated from the copy version of the same method.These methods are usually faster than those that copy data because they do not allocate additional memory for their operations.
Our implementation heavily uses generic template programming, shifting the cost of the dimensional check at compiling time and removing, in principle, any run-time overhead.However, the drawback is an increased usage complexity, which may discourage non-experienced C++ developers.We tried to mitigate this issue by providing custom descriptive error messages and a set of default quantity aliases covering the most common quantities used in physics.This approach allows easy-to-remember type names when defining a variable, such as: dfpe::QtySiVelocity my_var; which is a much shorter name than its canonical name: boost::units::quantity<boost::units::si::velocity> my_var; We decided to keep the set of aliases in SI, GCS and Gauss units, isolated from the main library development and were published as a separated project, at the address https://gitlab.com/micrenda/qtydefDuring the design, we realized that complex numbers, quaternions, octonions, and sedenions methods were a super-set of the methods available in Cartesian −dimensional vectors (see table 2).Therefore, we used the same classes for these entities, marking complex quantities as vectors with a negative dimensional number.In this way, we can represent complex numbers as VectorC<-2>, quaternions as VectorC<-4>, etc., enabling and disabling some methods using the  template parameter.However, for better readability, a set of alias for common entities is provided, as shown in table 5.This solution allowed a high degree of code reuse and painless conversion between the entities without requiring inheritance and virtual method resolutions at runtime.

Matrix operations
During the development of the vectorial operations, we realized that matrix operations would be needed soon, which shall also benefit from the same UoM validation.The main difficulty was that not all operations were available for any matrices.For example, an operation like the determinant of a matrix is defined only for a square matrix.For this reason, we made heavy use of the C++ concept requires.Although this severely limits its usage only within projects that use at least C++20, it allows a clear definition of the function usage constraints, and produces meaningful error messages when these conditions are not met.
As a design decision, matrix dimensions can be set using integer templates, as shown in table 5: this means that matrix dimension can not be changed at runtime, allowing non-negligible performance optimizations by the compiler.As we can observe in tables 6 to 8, we decided to limit the method's implementations to the one which have a deterministic and non-iterative solution, omitting methods requiring matrices decomposition.In a future release, we may add support for this class of methods, allowing the calculation of eigenvalues, eigenvectors, matrix  2 norms and similar quantities.

Floating point limitations and tolerances
Representation of floating-point number is a tricky point in any software package that handle non-integer quantities.The IEEE Standard for Floating-Point Arithmetic (IEEE 754) 14 , is the de facto standard in modern computers, and C++ supports it using the float and double variable types.This format allows storing any number inside a fixed size mantissa and exponent, allowing the representation of up to 1 × 10 308 or down to 1 × 10 −308 , with around 15-digits precision.
One first issue is that some rational numbers that are perfectly representable on a base-10 numeric system can not be expressed on a base-2 numeric system and inevitably introduce a rounding error.
Another issue is that trigonometric and transcendental functions in C++ are calculated using a series sum and are truncated after a specific precision is achieved.This is usually not a problem in the engineering and scientific fields because we are interested in the approximate values of our calculation, given that we do not fall into known floating-point pitfalls.However, our library is quite sensitive about this issue because often, we have to check if two vectors are parallel or perpendicular, or if a matrix is diagonal: these actions do require to check the result against the exact values, which is tricky when working with floating-point quantities.
To manage this last issue, we introduced the concept of tolerance, represented by the class Tolerance.This class can be passed as a template parameter to any function that requires comparing two values for equality (usually, but not always, the methods starting with the "is" prefix).We provide a default implementation which compares two values as equals if they equal to the 10th decimal digit.While this approach is not quite robust, it has the advantage of being relatively fast and easy to understand.If a different approach is required, the user can provide a custom class for all the methods or just for some specific calls.

Frame transformations
While we were using Univec we found that some formulas become much simpler when we use specific coordinates transformation.One good example comes from the simulation of microscopic electron-molecule non-relativistic interactions in the center-of-momentum reference frame.A transformation-matrix can represent any linear transformation: complex transformations can be easily represented and computed by chaining multiple simpler transformations such as translations, rotations, and scaling operations.
One can make a frame transformation using a concrete implementation of the abstract class BaseFrameC3D (in table 4, we present ready-to-use implementations).
A vector can be easily transformed into a new reference frame:

3/18
Operation Method Note Ratio between parallel vectors v.rotate(a,ax) Rotate around z-axis (2) or an arbitrary axis (3) Are parallel and have same directions v ↑↓ u v.isOppositeDirection(u) Are parallel and have opposite directions

Has any infinite component
Table 1.List of functions implemented for each type of vector.Cross product is only implemented for 3 vectors, while the angle is implemented only for 2 and 3 vectors.The * marks the disponibility of an extra operation, which starts with do prefix (e.g.doVersor), that works inplace on the vector.

Vector conversion
Another valuable feature of Univec is changing the coordinate system of a vector.The conversion from a type to another is made via a constructor, which is explicit when the conversion is computationally expensive (e.g.VectorP2D to VectorC<2>), and implicit otherwise (e.g.VectorC<4> to Quaternion).These methods can be useful for executing operations on vectors represented in different coordinate systems.For example, for converting a VectorP2D into a VectorC2D, we can use the following code: and for complex and quaternions, we have: // 4x4 matrix Matrix<4,4,QtySiLength> m2 = q.matrix();

Usage examples
In this subsection, we will present two realistic examples of using Univec.For the first example we will show a simple implementation of the Bethe-Bloch equation, a handy formula for evaluating the energy deposit of ionizing charged particles.
For the second example we will show how our library can be used in elementary particle physics, evaluating the decay rate of the  − boson.It is important to notice that we perform the calculation using SI units and not the natural units that are normally used.This is because there is currently no support for natural units in Boost::Units, but we plan to address this issue in the near future (see section 5).All the examples presented in this article are available in a public repository 15 .

Bethe-Bloch
The well-known Bethe-Bloch formula [16][17][18] is used to calculate the mean energy loss of a charged particle.For example, if we have a particle with velocity , charge  (in a unit of elementary charges), traveling a medium with electron number density  and mean excitation energy , we can calculate the mean energy-loss using the formula: where   is the electron rest mass,  is the speed of light and  = / as well as  = 1/ √︁ 1 −  2 are the Lorentz factors.Using Univec, we can encode this formula into a compact expression, enjoying a fully dimensional analysis validation: The previous example was helpful for presenting the dimensional analysis validation, but the equation itself only contained scalar operation.We will show here how vector, spinor, and matrix operations can be combined in a compact syntax to perform advanced calculations.In this example we reproduce a derivation from [19, ch.15] which describes the properties of the  and  bosons.To calculate the decay rate of a  − boson into an electron,  − , and an electronic anti-neutrino,   (shown in fig.1), Figure 1.Lowest-order Feynman diagram for  − boson decay into an electron and an anti-neutrino 19 .
we calculate the decay matrix elements using where   are the -matrices in Dirac-Pauli representation and    represents the three possible polarization states and  (  4 ), ū(  3 ) are, respectively, the adjoint particle spinor and antiparticle spinor, defined as with  = √  + .Finally, having M   , we can calculate the average decay rate using and access the provided CI/CD feature.For each commit or push GitLab triggers the build of code, runs all unit tests implemented with GoogleTest and extracts the annotated C++ files to generate the online Univec documentation using Doxygen 25 .Furthermore, at the end of each test procedure on the pipeline, several report files are generated: two files with the test results (one for each compiler: junit_gcc.xmland junit_clang.xml)and one file that briefly presents code coverage results for the tests run (coverage_gcc.xml).
This combined approach, CI/CD and unit tests, assures us of the Univec's functionality and maintainability.

Performance study
Performance comparison against existing solutions is a crucial aspect in the introduction of a new library.We compare Univec methods against four different approaches of performing the same operations: • Raw: using only primitive C++ types such as double.
When the operation is too complex to implement (e.g.determinant of  ×  matrix, where  > 4 for Raw and Semi approaches) or unavailable in a given library (e.g.cofactors of a 3 × 3 matrix for BLAS approach), the result is omitted.We compare a subset of operations, as shown in figs. 2 and 3.The procedure of performance testing is performed using this scheme: 1. Input data are generated for Raw operation, using  • 10 000 random values between −1000 to 1000, with  being the number of operands, and then stored in contiguous memory locations.
2. Input data from Raw are copied to Semi, Eigen, Blas, and Univec.We use meter [m] unit for solutions with UoM validation.
3. The timer is started.
4. The operation is performed 10 000 times, without multi-threading, and results are stored in a contiguous memory location.
5. The timer is stopped, and the time spent for a single operation is calculated dividing the total time by 10 000.
6.The results for each approach are compared for correctness against the result of Raw implementation.
7. All these steps are repeated 1000 times in order to calculate the mean value and the stdev.
The source code for the performance analysis is available at the following locations: https://gitlab.com/edystan/univec-performance-testhttps://gitlab.com/edystan/univec-scaling-testThe first repository contains the performance tests for commonly used methods for both Matrix and VectorC classes, results being displayed in figs. 2 and 3.In contrast, the second one contains the performance analysis for only two methods (vector dot product and matrix determinant) but for vector dimensions between 2 and 10 and matrices sizes between 2 × 2 and 10 × 10.For each of the two performance projects, the test results are collected and saved in .csvfiles, one for each operating system, and available at the previously mentioned addresses.As well, results for these tests are shown in figs.4 and 5.
The performance validation was performed on two laptops to simulate real scenarios in two operating systems.The configurations are the following:

13/18
We can conclude that Univec performs in line with the other solutions, without any significant overhead, but with the addition of compile-time validation and a reasonable interface.In the performance analysis, we can see that Blas often shows poor performance, but we assume two factors cause this: 1.The overhead of calling a Fortran function from C++ code (when applicable).
2. The fact that we perform a BLAS call for each vector: we realize that BLAS can be used for some operations, but not all, in a SIMD (single instruction multiple data) fashion, which could mitigate the function calling overhead.
In addition to this, we noticed an anomaly on the calculation of dot product for 8 vectors: as shown in fig.4(a), we can see that we have a performance hit on Linux/GCC platform, which seems to only appear at this specific dimension.We do not have a definitive explanation for this phenomenon, however we suspect that is a bug on the GCC optimization algorithm, because the same issue is not present in macOS/Clang environment, as show in fig.4(b).
Another anomaly we found in fig.5(b), is that, for macOS, the timing of determinant do not scale well with the increasing of dimensionality.We believe that the causes of this behavior are: 1.The use the Gaussian elimination method for matrix bigger than 4 × 4, which is not the most efficient method available (BLAS uses the more efficient  decomposition).
2. Some different optimization settings applied by the Apple Clang compiler.
We plan to analyze and fix these optimization issues in the near future, in order to have performance in line with other implementations.

Limitations and future plans
Our strongest limitation is that, currently, our solution only works with Clang of version higher or equal to 14.This is caused by our decision to use the latest C++20 features like requires and constexpr / consteval.Compatibility with GCC is limited at this moment because we use the __declspec(property) attribute to give access to the vector components.Univec was not tested under the MSVC or icc compiler, however we do not expect blocking issues as long as C++20 is supported.
The use of Boost.Units as an underlying library for Univec has several non-negligible weak points that, unfortunately, seem to have no solution in the near future.The first limitation is that this library is available only as part of a more comprehensive framework, Boost, which many software developers often see as a heavy dependency.The second limitation is that the current implementation is based on heavy use of Boost Metaprogramming Library (MPL) and template meta-programming techniques, which produce error messages challenging to understand for the non-experienced software developers.
In section 3.6.2,we present an example of an elementary particle physics calculus implementation.However, the use of SI units is not encouraged and is not widespread in this field.We plan in a near future to add support to Boost::Units 6 for natural units, allowing more easy adoption of our solution by the HEP community.
There is a proposal to introduce the support for UoM-aware units directly in the standard language (see Mateusz Pusz's talk at CppCon 2019 31 ).This approach will solve the problems presented above and will provide a more reliable solution with a less steep learning curve.We plan to create another implementation which uses a standard library implementation when it will become available.However, it will take several years until such implementation will find its way into the C++ standard library.Last, but not least, we plan to introduce in the near future a SIMD optimization (e.g.SSE 2 or the newer AVX-512), which can increase the performance by an order of magnitude when working with vectors or matrices.

Conclusions
In this paper, we presented Univec, our solution for UoM-validation in software development, aiming to improve the use of vector and matrix calculations in C++ development.After a brief overview of the existing UoM solutions, we discussed their limitations and presented our solution, with several specific user-case scenarios.
The elaboration of Univec started as an internal project during the designing phase of our Betaboltz 32 project.While developing this project, we realized the benefits of integrating the UoM analysis directly into our source code, virtually removing the most common causes of mistakes.Even if this solution does not remove all the sources of error, Univec allowed us to focus on the main development workflow, increasing confidence in our implementation.
We plan to also create to a native implementation when a UoM-aware implementation will be published in the C++ standard library and we hope that similar solutions will be widely accepted in the development of advanced computational calculus for the scientific community.

Figure 2 . 18 (a) 2 × 2
Figure 2. Time needed to perform the specified vector operations, using five different code implementations.The left side refers to Linux setup, while the right side refer to macOS setup.In this plot, lower values mean better performance.

Figure 3 .
Figure 3.Time needed to perform the specified matrix operations, using five different code implementations.The left side refers to Linux setup, while the right side refer to macOS setup.In this plot, lower values mean better performance.

Figure 4 .
Figure 4. Time needed to perform the vector dot product for different vector sizes.The left side refers to Linux setup, while the right side refer to macOS setup.In this plot, lower values mean better performance.

Figure 5 .
Figure 5.Time needed to perform the matrix determinant for different matrix sizes.The left side refers to Linux setup, while the right side refer to macOS setup.In this plot, lower values mean better performance.

Table 2 .
List of functions implemented for complex, quaternions, octonions, and sedenions, in addition to the one presented in table1.Methods marked with † are applicable to quaternions only.The * marks the availability of an extra operation, which starts with do prefix (e.g.doConj), that works inplace on the quantity.

Table 4 .
List of ready to use reference frame transformations, that can be used to describe complex frame transformations.The class CompositeFrame is rarely used directly but can be created combining the other transformations using the » operator.Multiple transformations can be combined to describe complex scenarios using the syntax: assert(frame.forward(v1).isNear(v2));assert(frame.backward(v2).isNear(v1));

Table 5 .
List of classes and aliases implemented in Univec.

Table 6 .
)  2 ← A * ,  2 + A * ,  1 a.colAdd(j1, j2) Column addition and store * A * ,  2 ← A * ,  2 − A * ,  1 a.colSub(j1, j2) List of functions implemented for the Matrix<M,N> class.The † symbol marks operations available for square matrix only, while ‡ symbols is available for row or column matrix only.The * marks the disponibility of an extra operation, which starts with do prefix (e.g.doColSwap), that works inplace on the matrix.Matrix indices are zero-based.

Table 7 .
List of functions implemented for the Matrix<M,N> class.The † symbol marks operations available for square matrix only, while ‡ symbols is available for row or column matrix only.