ν-Improved nonparallel support vector machine

In this paper, a ν-improved nonparallel support vector machine (ν-IMNPSVM) is proposed to solve binary classification problems. In this model, we use related ideas from the ν-support vector machine (ν-SVM): the parameter ν is introduced to control bounds on the fraction of support vectors. In the objective function, the parameter ε is added to ensure that the ε-band is kept as small as possible, which plays an important role in the classification of unbalanced data sets. On the basis of maximizing the interval between the two classes, ν-IMNPSVM can fully fit the distribution of data points within each class by minimizing the ε-band, which enhances the generalization ability of the model. Results on benchmark datasets testify that the proposed model improves classification accuracy.

Fengmin Sun & Shujun Lian* — School of Management Science, Qufu Normal University, Rizhao, China. *email: lsjsd2003@126.com
Support vector machines (SVMs), introduced by Vapnik in the early 1990s, are a helpful tool for pattern recognition and regression [1][2][3][4]. Due to its three elements, interval maximization, duality theory, and kernel techniques, SVM has been used in diversified fields, including face recognition, text classification and bioinformatics [5][6][7][8][9]. SVM adopts the principle of structural risk minimization, which not only maximizes the classification interval but also keeps the error small. The traditional support vector machine constructs two parallel support hyperplanes through a quadratic programming problem (QPP); it ensures a maximum interval between the hyperplanes and takes the intermediate hyperplane as the final decision hyperplane.
In recent years, in order to obtain higher classification accuracy in shorter training time, scholars have made considerable efforts in this regard. Research on support vector machines generally proceeds along two lines. From the perspective of the algorithm, training speed is improved by optimizing the solver, as in sequential minimal optimization (SMO) 10, SSSR 11, incremental learning 12 and so on. From the perspective of the model, the original problem is partially adjusted to obtain higher classification accuracy and faster classification speed, as in MVSVM 13, PPSVC 14, NHSVM 15, and so on [16][17][18][19][20][21][22][23][24][25]. Most of this research is based on the concept of a nonparallel classifier.
So far, nonparallel hyperplane classifiers have attracted wide attention among scholars. Mangasarian and Wild 16 first proposed the generalized eigenvalue proximal support vector machine (GEPSVM). Two hyperplanes are constructed by solving for the eigenvectors corresponding to the minimum eigenvalues of two related generalized eigenvalue problems, so that each hyperplane is as close as possible to one class of the classification samples. Jayadeva et al. 17 then proposed the twin support vector machine (TWSVM). It follows the idea of GEPSVM and obtains two nonparallel hyperplanes by solving two smaller quadratic programming problems (QPPs), which makes training about 4 times faster than SVM. Based on TWSVM, the twin bounded support vector machine (TBSVM) was presented in Ref. 18 and further developed in Refs. [19][20][21][22][23]. In order to facilitate the application of kernel techniques in the dual problems, Tian et al. 24 proposed the nonparallel support vector machine (NPSVM), which further improved the nonparallel hyperplane classifier. NPSVM also seeks two nonparallel hyperplanes: an ε-band is constructed on either side of each hyperplane, each ε-band is required to contain the points of one class as much as possible, and each class of points is kept at least a certain distance from the other hyperplane. Because the penalty parameter C lacks practical significance, the sparsity of the model cannot be estimated even when the value of C is given. Therefore the ν-nonparallel support vector machine (ν-NPSVM) was further proposed in Ref. 25. It combines ν-SVM and ν-support vector regression (ν-SVR) to obtain two nonparallel hyperplanes; the parameter ν replaces C, and the number of support vectors can be controlled by adjusting ν. To address the difficulty of processing large-scale data sets, DC-νNPSVM was subsequently proposed in Ref. 26. It gives ν-NPSVM faster convergence and even higher classification accuracy when processing large-scale data sets.
Based on the previous models, in this paper we propose a new support vector machine, called the ν-improved nonparallel support vector machine (ν-IMNPSVM). It optimizes the NPSVM model and minimizes the constructed ε-band. To overcome the lack of quantitative meaning of the parameters, the parameter ν is introduced, in combination with the characteristics of ν-SVM. It makes the sparsity of the model explicit, thus inheriting the advantages of ν-SVM and making the number of support vectors easy to control. Meanwhile, ν-IMNPSVM inherits the advantages of previous nonparallel classifiers and improves the running speed.

C-SVM. Consider the binary classification problem with the training set

T = {(x₁, y₁), ..., (x_l, y_l)},  (1)

where l is the number of samples, x_i ∈ Rⁿ are the inputs, and y_i ∈ {1, −1}, i = 1, ..., l are the labels. The standard C-SVM formulates the problem as the convex QPP

min_{ω,b,ξ}  (1/2)‖ω‖² + C Σ_{i=1}^{l} ξ_i
s.t.  y_i((ω · x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., l,

where C > 0 is a penalty parameter and ξ = (ξ₁, ..., ξ_l)ᵀ. The importance of minimizing the training error and maximizing the interval can be balanced by adjusting the value of C.
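The QPP above is usually solved through its (equally standard) dual. Below is a minimal sketch in Python using the cvxopt QP solver; the function name, the solver choice, and the small ridge added for numerical stability are our own assumptions, not part of the paper.

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_linear_csvm(X, y, C=1.0):
    """Solve the standard C-SVM dual: min (1/2) a'Qa - 1'a
    s.t. y'a = 0, 0 <= a_i <= C, with Q_ij = y_i y_j (x_i . x_j)."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    l = len(y)
    Yx = y[:, None] * X                              # rows y_i * x_i
    P = matrix(Yx @ Yx.T + 1e-8 * np.eye(l))         # Q, plus tiny ridge
    q = matrix(-np.ones(l))
    G = matrix(np.vstack([-np.eye(l), np.eye(l)]))   # 0 <= a and a <= C
    h = matrix(np.hstack([np.zeros(l), C * np.ones(l)]))
    A = matrix(y.reshape(1, -1))                     # equality y'a = 0
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = Yx.T @ alpha                                 # w = sum_i alpha_i y_i x_i
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)  # margin support vectors
    b0 = float(np.mean(y[on_margin] - X[on_margin] @ w))  # y_k = (w.x_k)+b
    return w, b0, alpha
```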
The separating hyperplane of C-SVM is

(ω · x) + b = 0,

which lies midway between (ω · x) + b = −1 and (ω · x) + b = 1. For a new input x ∈ Rⁿ, we can determine which category it belongs to by using the decision function

f(x) = sgn((ω · x) + b).

ν-SVM. Consider the binary classification problem with the training set (1). ν-SVM formulates the problem as the convex QPP

min_{ω,b,ξ,ρ}  (1/2)‖ω‖² − νρ + (1/l) Σ_{i=1}^{l} ξ_i
s.t.  y_i((ω · x_i) + b) ≥ ρ − ξ_i,  ξ_i ≥ 0,  i = 1, ..., l,  ρ ≥ 0,

where ξ = (ξ₁, ..., ξ_l)ᵀ and ν ∈ (0, 1]. The value of ν has a real meaning: if the number of support vectors is q and the total number of sample points is l, then q/l ≥ ν always holds.
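Continuing the sketch above, the decision function f(x) = sgn((ω · x) + b) can be applied as follows; the toy data and the fit_linear_csvm helper from the previous sketch are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs as a toy binary problem
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

w, b, alpha = fit_linear_csvm(X, y, C=10.0)
pred = np.sign(X @ w + b)                    # decision function f(x)
print("training accuracy:", (pred == y).mean())
print("support vectors:", int((alpha > 1e-6).sum()), "of", len(y))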

ν-IMNPSVM
In this section, we propose a new nonparallel support vector machine, called the ν-improved nonparallel support vector machine (ν-IMNPSVM).
Linear ν-IMNPSVM. Primal problems. We seek the two nonparallel hyperplanes

(ω₊ · x) + b₊ = 0  and  (ω₋ · x) + b₋ = 0  (7)

by solving two convex QPPs, (8) and (9), where x_i, i = 1, ..., p are the positive inputs and x_j, j = p + 1, ..., p + q are the negative inputs. There are four parts in the objective function of (12) or (13). The first part is the regularization term that minimizes the structural risk. The second part adds the variable ε to the constructed ε-band to keep it as small as possible. The third and fourth parts ensure that the training error is as small as possible and that the distance from each hyperplane to the points of the other class is as large as possible. We now discuss the primal problem (12) geometrically in R² (see Fig. 1). First, we hope that the positive class points (+) fall as much as possible within the ε-band between the hyperplanes (ω₊ · x) + b₊ = ε and (ω₊ · x) + b₊ = −ε (red thin solid lines), and that the ε-band is as small as possible. Second, we want the negative class points (*) to lie as far as possible beyond the hyperplane (ω₊ · x) + b₊ = −ρ₊ (red dotted line).
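To make the structure of such problems concrete, the following is a sketch of the closely related NPSVM QPP for the positive hyperplane with a fixed ε (ν-IMNPSVM additionally minimizes ε in the objective and lets ν take over the role of C); the formulation shown is the standard NPSVM one, and the function name, cvxopt solver and stability ridge are illustrative assumptions, not the paper's exact problem (12).

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_npsvm_plus(Xp, Xn, C1=1.0, C2=1.0, eps=0.1):
    """NPSVM-style QPP for the positive hyperplane:
    min (1/2)||w||^2 + C1*sum(eta + eta*) + C2*sum(xi)
    s.t. -eps - eta*_i <= (w.x_i) + b <= eps + eta_i   (positive points in the eps-band)
         (w.x_j) + b <= -1 + xi_j                      (negative points pushed away)
         eta, eta*, xi >= 0."""
    Xp = np.asarray(Xp, float); Xn = np.asarray(Xn, float)
    p, n = Xp.shape; qn = Xn.shape[0]
    m = n + 1 + 2 * p + qn          # variables z = [w, b, eta, eta*, xi]
    P = np.zeros((m, m)); P[:n, :n] = np.eye(n)
    P += 1e-8 * np.eye(m)           # tiny ridge for numerical stability
    c = np.hstack([np.zeros(n + 1), C1 * np.ones(2 * p), C2 * np.ones(qn)])
    ones_p = np.ones((p, 1)); ones_q = np.ones((qn, 1))
    # (w.x_i) + b - eta_i <= eps
    G1 = np.hstack([Xp, ones_p, -np.eye(p), np.zeros((p, p)), np.zeros((p, qn))])
    # -(w.x_i) - b - eta*_i <= eps
    G2 = np.hstack([-Xp, -ones_p, np.zeros((p, p)), -np.eye(p), np.zeros((p, qn))])
    # (w.x_j) + b - xi_j <= -1
    G3 = np.hstack([Xn, ones_q, np.zeros((qn, 2 * p)), -np.eye(qn)])
    # slack nonnegativity: -eta, -eta*, -xi <= 0
    G4 = np.hstack([np.zeros((2 * p + qn, n + 1)), -np.eye(2 * p + qn)])
    G = np.vstack([G1, G2, G3, G4])
    h = np.hstack([eps * np.ones(2 * p), -np.ones(qn), np.zeros(2 * p + qn)])
    solvers.options['show_progress'] = False
    z = np.ravel(solvers.qp(matrix(P), matrix(c), matrix(G), matrix(h))['x'])
    return z[:n], z[n]               # w_plus, b_plus
```

The negative hyperplane is obtained from the symmetric problem with the roles of the two classes exchanged.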
Nonlinear ν-IMNPSVM. We now extend the linear ν-IMNPSVM to the nonlinear case. As in NPSVM, we only need to apply the kernel function directly to the above dual problem. For problem (34), the nonlinear case takes the same form, where K(·, ·) is the kernel function. The corresponding constructions are similar to the linear case except that the inner product (·, ·) is replaced by the kernel function K(·, ·).
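Since the kernel trick amounts to replacing every inner product with K(·, ·), a minimal RBF Gram-matrix helper suffices to kernelize the linear sketches above; the σ parameterization below is one common convention, assumed rather than taken from the paper.

```python
import numpy as np

def rbf_gram(X, Z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / sigma^2), computed for all pairs of rows."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-sq / sigma**2)
```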
The decision function is given in Step 5 of the algorithm: for a new input sample x ∈ Rⁿ, the label of the sample is determined by the arg min of the normalized distances to the two hyperplanes, where ∆₊ = π₊ᵀΛπ₊ and ∆₋ = π₋ᵀΛπ₋.
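A hedged sketch of this decision rule: the point is assigned to the class whose hyperplane is nearer in the kernel-induced metric, with ∆₊ and ∆₋ normalizing the two distances. All names below (pi_plus, pi_minus, b_plus, b_minus, Lam) are assumptions standing in for the dual quantities of the two solved QPPs.

```python
import numpy as np

def predict(kx, pi_plus, b_plus, pi_minus, b_minus, Lam):
    """kx: vector of kernel values K(x_i, x) over the training points x_i.
    Returns +1 or -1 according to the nearer hyperplane."""
    d_plus  = abs(kx @ pi_plus  + b_plus)  / np.sqrt(pi_plus  @ Lam @ pi_plus)
    d_minus = abs(kx @ pi_minus + b_minus) / np.sqrt(pi_minus @ Lam @ pi_minus)
    return 1 if d_plus <= d_minus else -1
```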
Significance of the parameter ν. The parameter ν has a real-value meaning: if the number of support vectors is q and the total number of sample points is l, then q/l ≥ ν always holds. We use Fig. 2 to illustrate this. In the figure, we can clearly see how the ratio of the number of support vectors to the total number of sample points (SVs) changes with the value of ν: the red line represents the SVs ratio and the blue line represents the value of the parameter ν. The parameter ν varies in steps of 0.1 over the range (0, 1]. The blue line always lies below the red line, which means that the parameter ν is a lower bound on the SVs ratio. We can adjust the sparsity of the model by adjusting the value of ν, so as to obtain a better classification effect.
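The experiment behind Fig. 2 can be mimicked with scikit-learn's NuSVC as a stand-in for ν-IMNPSVM (an assumption; the paper's own MATLAB code is not available): sweep ν in steps of 0.1 and record the support-vector fraction, which stays at or above ν.

```python
import numpy as np
from sklearn.svm import NuSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=1)
for nu in np.arange(0.1, 1.01, 0.1):
    try:
        frac = NuSVC(nu=nu).fit(X, y).n_support_.sum() / len(y)
        print(f"nu={nu:.1f}  SVs fraction={frac:.2f}")   # always >= nu
    except ValueError:      # nu can be infeasible for some class ratios
        print(f"nu={nu:.1f}  infeasible")
```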

Experimental results
In this section, in order to better verify the performance of the proposed ν-IMNPSVM model, we compare it with C-SVM, ν-SVM, NPSVM and ν-NPSVM on different data sets. All methods are implemented in MATLAB 2016b 28 on a PC with an Intel Core i5 processor and 4 GB RAM. All training sample points are normalized before training so that the features lie in [0, 1]. The measurement index is the classification accuracy, computed by standard tenfold cross-validation. For all of the methods, linear and RBF kernel functions are used.

Iris data set. First, we validate the ν-IMNPSVM model on the Iris data set 29, which contains 150 samples, one per row. Each row contains the four features of a sample together with its category. The samples fall into three categories (Setosa, Versicolor and Virginica); here we use only two categories (Versicolor and Virginica) and two features (petal length and petal width). The distribution of the data is shown in Fig. 3, where + and * represent the categories Versicolor and Virginica, respectively.
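A sketch of this protocol in Python with scikit-learn (a stand-in for the paper's MATLAB implementation): min-max scaling to [0, 1], the Versicolor/Virginica subset of Iris with petal length and petal width, and tenfold cross-validation; NuSVC and its nu value are assumed stand-ins for ν-IMNPSVM.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import NuSVC

iris = load_iris()
mask = iris.target > 0                 # keep classes 1 and 2 (Versicolor, Virginica)
X = iris.data[mask][:, 2:4]            # petal length, petal width
y = iris.target[mask]

model = make_pipeline(MinMaxScaler(), NuSVC(nu=0.3, kernel='linear'))
scores = cross_val_score(model, X, y, cv=10)   # tenfold cross-validation
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```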
Here we take the linear kernel as an example and set the parameters C₁ = C₃ = 20, C₂ = C₄ = 15, with ν₁ = ν₂ = ν varying in the range [0.1, 0.7] with step 0.2. The experimental results are shown in Fig. 3(d).

UCI data sets. Second, we tested all the methods on several public benchmark sets 29. All training samples were normalized before training to ensure that the features lie in [0, 1]. For each data set, the same number of samples is randomly selected from each class to form a balanced data set, and the above methods are tested on it.
For all of these methods, the RBF kernel function is used. The parameters C_i, i = 1, 2, 3, 4 are tuned for the best classification accuracy in the range 2⁻⁸ to 2⁸. The parameters ν_i, i = 1, 2 are selected in the range (0, 1] with step 0.1. The experimental process was repeated 5 times. Table 1 shows the average cross-validation results of all methods on the different data sets in the linear case, and Table 2 shows the corresponding results in the nonlinear case. The experimental results show that the classification accuracy of the proposed ν-IMNPSVM model is improved compared with the other models. To reflect the experimental effect more clearly, we further compare the 2D scatter plots of NPSVM, ν-NPSVM and ν-IMNPSVM obtained from partial test data points of Australian, BUPA-liver and Heart-Statlog. For each data set we randomly selected 200 data points, comprising 100 positive and 100 negative points. Each scatter plot is obtained by plotting, for every input data point x, its perpendicular distances to the two hyperplanes as coordinates. Figures 4, 5 and 6 depict a comparison of the three methods across the three data sets, from which it can be seen that the classification effect of ν-IMNPSVM is quite obvious.
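The tuning protocol can be sketched as a grid search (scikit-learn shown as an illustrative stand-in; X_train and y_train are assumed to hold a normalized training split):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVC, SVC

param_grid_C  = {'C':  2.0 ** np.arange(-8, 9)}       # 2^-8 ... 2^8, for C-type models
param_grid_nu = {'nu': np.arange(0.1, 1.01, 0.1)}     # (0, 1] in steps of 0.1

search = GridSearchCV(NuSVC(kernel='rbf'), param_grid_nu, cv=10)
# search.fit(X_train, y_train)                        # assumed training split
# search.best_params_, search.best_score_             # best nu and CV accuracy
```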

Conclusion
In this paper, we propose a new support vector classifier, termed ν-IMNPSVM. It inherits the advantages of NPSVM and further improves the accuracy of data classification on that basis. We replace the parameter C in NPSVM with the parameter ν. The parameter ν has the following meanings: (1) The parameter ν balances the two objectives of the support vector machine model, i.e. maximization of the interval and minimization of the training error; in other words, it minimizes the training error while maximizing the interval. (2) The parameter ν differs from the parameter C in existing models and has a real-value meaning. It represents an upper bound on the ratio between the number of margin-error sample points and the total number of sample points in the training set, and a lower bound on the ratio between the number of support vectors and the total number of sample points. The rate of SVs can therefore be controlled by adjusting the value of ν. (3) On different data sets, different values of ν are selected, reflecting the different sparsity of the data sets, so the model can be better applied to the processing of unbalanced data sets. ν-IMNPSVM differs from ν-NPSVM in that it only applies the idea of ν-SVM: ν-IMNPSVM retains the free parameter ε and achieves a better classification effect by adjusting the parameters. In addition, the added regularization terms make the decision function unique and improve the classification accuracy. However, for large-scale data sets, the computational speed of this model needs to be improved. We hope to draw on Ref. 26 and apply the divide-and-conquer idea to a large-scale algorithm in the future; reasonable and effective solvers should be selected to improve the classification speed. In addition, we hope to derive a new idea for support vector regression learning 30, which is also under further consideration.