Manifold-based sparse representation for opinion mining

What consumers think about an organization's products, services, and events is a crucial performance indicator for businesses. Brief opinion pieces are published rapidly on websites and social media platforms and have been analyzed with machine learning methods. Classical text feature representation methods suffer from high dimensionality, sparsity, and noisy, irrelevant, and redundant information. This paper focuses on how to enhance feature representation for opinion mining. Some nonlinear feature selection methods based on the manifold assumption have been exploited to resolve these problems. The inherent manifold structure is commonly ascertained through a nearest neighbor graph, but in current techniques the neighbors may exhibit diverse polarities. To alleviate this problem, it is proposed to exploit both the manifold assumption and the sparsity property as prior knowledge for opinion representation, learning the intrinsic structure from the data. First, a graph representation of user reviews based on the mentioned prior knowledge is learned. Then, the spectral properties of the learned graph are exploited to present the data in a new feature space. The proposed algorithm is applied to four common input features on two benchmark datasets, the Internet Movie Database (IMDB) and the Amazon review dataset. Our experiments reveal that the proposed algorithm yields considerable enhancements in terms of F-measure, accuracy, and other standard performance measures compared with combinations of state-of-the-art features and various classifiers. The highest classification accuracies of 99.15% and 91.97% are obtained by the proposed method on IMDB and Amazon, respectively, using a linear SVM classifier. The impact of the parameters of the proposed algorithm is also investigated in this paper. The incorporation of the sparse manifold-based representation leads to noteworthy advancements over the baseline, and this success validates the underlying assumptions.

Lines of research demonstrate that high-dimensional data lie on some intersecting manifolds rather than on only one manifold 16,19. This means that opinions that are close under common distance measures may have different attitudes, so additional prior knowledge must be imposed to resolve the problem. Therefore, an algorithm based on the sparse representation technique, i.e., Sparse Manifold Sentiment Representation (SMSR), is developed and studied to represent sentiment data based on the two following properties of high-dimensional data: (1) the self-expressiveness property of the data, which reflects that each data point in a union of subspaces can be expressed as a linear combination of other data points 26, and (2) the sparse representation of a data point corresponds to a combination of a few data points located in its own subspace 20.
Sparse representation has led to promising results in a wide range of applications, including visual recognition 27, image synthesis 28, animation 29, denoising 30, etc. Recently, sparse representation has been applied to sentiment analysis 31,32. l2,1-norm sparsity has been exploited to represent micro-videos with the aim of finding the main frame containing sentiment 33. Singular value decomposition has also been used for the sparse representation of image sentiments 31. The success of sparse learning in sentiment analysis on images and video, and the importance of graph representation in manifold-based feature extraction, are the author's motivations for incorporating them together.
The proposed algorithm learns the graph representing the manifolds from the data by using both sparse and locality representations, and then exploits spectral properties to generate a new data representation.
The main contributions of this paper are described as follows:

Figure 1. The Euclidean distance between the TF-IDF vectors of sentences 1 and 2, which have dissimilar polarities (after removing stop words), is less than the Euclidean distance between the TF-IDF vectors of sentences 1 and 3, which have similar polarities.
(1) Feature extraction on sentiment data is performed by considering that the sentiment data lie on multiple manifolds.
(2) The manifold structure of opinions is discovered using both local information and the sparsity properties of the data.
(3) The impact of sparse and manifold-based representation on some common feature extraction methods of sentiment analysis is explored to provide insights into various vector representations. Our experiments confirm the improvement of the proposed method over all of them.
(4) The effect of the proposed method's parameters is investigated, and the obtained results demonstrate the validity of the proposed method over a wide range of values.
(5) The linear SVM classifier, well suited to text data, is applied to the extracted features on two benchmark datasets. To the best of our knowledge, a wide range of research has confirmed the performance of SVM classifiers compared to other classifiers. This paper exploits SVM with a linear kernel, which has fewer parameters than SVM with either a radial basis function or a polynomial kernel. Numerous experiments reveal that the extracted features are highly suitable for this classifier.
It is worth mentioning that the last three items above highlight the main aspects of this research in comparison with our previous work 34. This paper combines the key ideas of sparse representation, manifold learning, and sentiment analysis.
The rest of this paper is organized as follows. The "Literature review" section examines the related literature. Then, the "Proposed method" section describes the details of the proposed algorithm. The details of the experiments and results are given in the fourth section, followed by the conclusion at the end.

Literature review
In recent years, opinion mining has become an active research area 35,36. In this section, research on enhancing sentiment analysis performance through feature representation methods, including feature selection and feature extraction, is examined.
Feature selection methods in the opinion mining literature fall into three categories: natural language processing (NLP) based, machine learning based, and combinations of the two. NLP-based techniques exploit linguistic characteristics such as nouns, noun phrases, adjectives, and adverbs as features 37,38. NLP-based techniques have achieved high precision but low recall, depending on the accuracy of part-of-speech tagging.
Feature selection methods based on machine learning are divided into three categories: filter approaches, wrapper approaches, and hybrid methods. In filter approaches, a score is assigned to each feature by an evaluation function, and features with low scores are eliminated. This approach has low computational cost and is easy to implement. Feature filtering methods such as information gain (IG), chi-square (Chi2), occurrence frequency, Z-score, log likelihood, Fisher discriminant ratio, and minimum frequency thresholds consider attributes separately 39-45. The document frequency (DF) feature selection method is commonly used in general text classification and picks the most common terms in the corpus. Mutual information, IG, Chi2, and DF were applied with five machine learning algorithms, consisting of support vector machine, k-nearest neighbor, centroid classifier, naïve Bayes, and winnow classifier, for sentiment analysis of Chinese documents 39. Cekik et al. proposed a feature selection method using rough set theory to handle sparse opinion data 46. Wang et al. exploited an improved Fisher discriminant ratio for feature selection of sentiments 41. Filter-based techniques are computationally efficient, but they neglect the interactions between features 47.
In wrapper methods, a feature is either added or removed at each step to select an optimal feature subset in a greedy manner. When a new feature set is selected, the classifier is retrained with the new features and assessed on a validation set. These methods suffer from high computational cost. Some important wrapper methods are decision tree models, recursive feature elimination, and heuristic-based algorithms 48. Wrapper methods are more expensive than filter models in terms of computational efficiency, since they evaluate attribute interactions. Gokalp et al. proposed a wrapper feature selection method using an iterated greedy metaheuristic algorithm for opinion mining 49. Hybrid techniques combine the other groups to exploit their desirable properties and achieve both accuracy and efficiency 43,50,51. Hybrid or embedded methods find a subset of features within the machine learning method itself.
Feature extraction methods can be grouped into linear and nonlinear methods. Linear methods are based on the simple assumption of linearity. Latent semantic analysis and principal component analysis are two popular methods applied to opinion mining 52,53. Linear dimensionality reduction is easy and fast to implement, but it cannot represent complex nonlinear data appropriately. Few studies on opinion mining consider the nonlinear structure of the data; these are local methods that apply the manifold assumption by enforcing that data points near each other in the input space remain close in the new feature space. Kim et al. exploited semi-supervised Laplacian eigenmaps to reduce the dimensionality of data and visualize sentiments 24. This method is based on manifold regularization utilizing unlabeled data in the training step. Ma et al. proposed another semi-supervised nonlinear feature extraction method based on the Laplacian eigenmap, where the initial graph is obtained using a fuzzy similarity relation 54. A semi-supervised dimensionality reduction combined with feature weighting is also proposed in 25. It attempts to maximize between-class distance and minimize within-class distance, while exploiting the local structure of unlabeled data at the same time.
A summary of the literature is given in Table 1. The focus of this paper is on the representation of opinion data using a nonlinear feature extraction method, by considering that the data lie on multiple manifolds and by discovering this structure through the sparsity properties of the data, which has been neglected in the literature. The significance of graph construction in manifold-based methods has been studied and demonstrated in a number of research studies. Some methods exploit label information to build a discriminant graph for classification. The discriminant neighborhood embedding (DNE) method introduced an optimization problem for learning a discriminant embedding by considering both intra-class and inter-class adjacency weights, where these adjacency weights are simply in {0, 1} 55. Other research improved DNE by extending both the weight values and the neighborhood definition 21,56. A theoretical framework for improving the learning of graph weights has also been proposed in 21,22. The mentioned methods have been evaluated on image datasets. Similar to existing nonlinear methods 24, a graph is constructed from the data; however, instead of using k-nearest neighbor (kNN) or ε-nearest neighbor with some heuristics 24,54, in this paper a graph is learned from the data in order to detect close data points by imposing a sparsity constraint. Sparse representation has been successfully applied to various applications and has recently become an active research area in sentiment analysis. l2,1-norm sparsity, incorporated in a regularization term based on the Laplacian matrix, is exploited to extract sentiment key frames in user-generated micro-videos 33. The cosine distance between video frame vectors is used to compute the graph weights. Sparse representation via singular value decomposition is used to enhance the efficiency of deep learning in image sentiment analysis; it ignores the local smoothness assumption as an effective prior knowledge 31.
Compared with the above sparse-representation research on sentiment analysis, the proposed method focuses on short text data and attempts to learn a suitable data-dependent graph structure.

Problem formulation
Let R = {r_i}, i = 1, ..., N, be a collection of N reviews, where r_i is the i-th opinion. A sentiment label y_i is assigned to each review r_i in the labeled training set, where y_i can be +1 (positive) or -1 (negative); a neutral label (0) may also be included, but it is not considered in this research. Table 2 lists the main notations of this paper. The ultimate aim of opinion mining is to predict the sentiment label of reviews with unknown attitudes. The essential steps of the proposed method are given in Fig. 2.

Table 2. Notations.

D — the dimension of the input data
N — the number of input reviews
d — the dimensionality of the data in the learned feature space
X — a set of N feature vectors, where vector x_i represents the features of the i-th review
λ — the coefficient of the sparsity term in the optimization problem
k — the maximum neighborhood size from which the sparse neighbors are selected
X_i — the matrix of normalized vectors

Research motivation
Let X = {x_i ∈ R^D}, i = 1, ..., N, be a set of review vectors, where x_i is the feature vector representing the i-th review r_i; it can be extracted by various methods, including statistical, lexicon-based, and combined approaches. The details of the feature extraction methods used in this paper are given in the "Extract feature vectors from unstructured text" subsection. The features generated from unstructured reviews contain redundant, sparse, and noisy information and should be enhanced for the opinion mining task. Some studies exploit the manifold assumption and reduce the dimensionality of opinion data by constructing a locality graph from the data. Nonetheless, studying various methods of nonlinear dimensionality reduction yields the following three observations: (1) High-dimensional data lie on some intersecting manifolds.
(2) Accurate graph construction of the underlying manifolds is very important, especially in the intersecting regions. (3) Sparse representation can find the representation of one data point based on other data points in its subspace (local tangent space). Therefore, it is assumed that some nonlinear manifolds of opinions are embedded in the high-dimensional input space, and that a few nearest neighbors can reconstruct each data point within its manifold. The weights of this reconstruction for all data are learned by imposing the mentioned assumptions as prior knowledge. It is proposed to exploit a sparse manifold-based representation to analyze high-dimensional data efficiently and accurately by restoring its low-dimensional structure. After learning the graph structure from the data, spectral methods are exploited to extract the new representation. The details are explained in the "Sparse manifold representation algorithm" subsection.
Finally, a classifier is applied to learn a model that discriminates positive attitudes from negative ones in the feature space and predicts the overall attitude of reviews with unknown labels. The architecture of the proposed method is shown in Fig. 3.

Extract feature vectors from unstructured text
Some preprocessing steps are carried out at the first stage, including removing numbers and punctuation, normalizing, tokenizing, and eliminating stop words. Then, the commonly used features are extracted in the feature extraction phase. The popular features in the literature are n-grams, part-of-speech (POS) tags, term frequency-inverse document frequency (tf-idf), and their combinations 39,57,58. The mentioned features are used in our experiments as input features and are described as follows:

- n-gram: An n-gram is defined as a subsequence of n words from a given sequence, obtained after eliminating extra spaces and noisy characters between words. The unigram, or bag-of-words, model is the simplest and consists of all the individual words present in the text. The bigram model is defined as a pair of adjacent words and can include some contextual information. Higher-order n-grams are more effective at capturing context, as they provide a better understanding of word position, but their generation is inefficient. The unigram (n = 1), bigram (n = 2), and trigram (n = 3) are the commonly used n-gram models.
- POS tagging: POS tagging is a linguistic technique that assigns a tag to each word specifying its morphological category, such as noun, verb, adjective, etc.
- Lexicon-based features: A sentiment lexicon provides a list of positive and negative words with their scores. An n-gram is included in the features if it appears in the list with an acceptable score.
The terms are specified based on one or more of the features above and weighted according to the tf-idf weighting scheme. Intuitively, tf-idf specifies how relevant a given term is in a particular document. It consists of two parts: the frequency of the term in a specific review, and the inverse proportion of reviews containing that term. Equation (1) is the classical tf-idf formula for the i-th term in the j-th document (review):

tf-idf_{i,j} = tf_{i,j} × log(N / df_i),   (1)

where tf_{i,j} indicates the term frequency of term i in the j-th review and df_i is the document frequency of the i-th term over all reviews. Terms that occur in one or a small set of documents are given more tf-idf weight than terms that occur in many documents. Finally, the i-th review is represented by x_i = (x_{i1}, ..., x_{iD}), where x_{ij} is the tf-idf weight of the j-th term in the i-th review.
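A minimal sketch of the weighting in Eq. (1), computed directly from token lists. The toy reviews are invented, and the logarithm base (natural log here) is an assumption, since the paper does not specify it.

```python
import math
from collections import Counter

# Classical tf-idf of Eq. (1): weight = tf * log(N / df).
# The reviews below are illustrative placeholders.
reviews = [
    ["good", "movie", "good", "plot"],
    ["bad", "movie"],
    ["good", "acting"],
]
N = len(reviews)

# Document frequency: number of reviews containing each term.
df = Counter()
for tokens in reviews:
    for term in set(tokens):
        df[term] += 1

def tf_idf(term, tokens):
    tf = tokens.count(term)             # term frequency in this review
    return tf * math.log(N / df[term])  # idf = log(N / df), natural log assumed

# "good" occurs twice in review 0 and appears in 2 of the 3 reviews.
w = tf_idf("good", reviews[0])
print(round(w, 4))  # -> 0.8109, i.e. 2 * ln(3/2)
```

A term appearing in every review gets weight zero (log(N/N) = 0), matching the intuition that ubiquitous terms carry no discriminative information.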
There are alternative techniques for encoding opinion texts into vectors. One such approach is Doc2Vec 23, which employs a neural network to generate a fixed-length vector representation of each opinion. The learning process is designed to map similar documents to nearby points in the vector space. The resulting vectors can be effectively leveraged within the proposed method.

Sparse manifold representation algorithm
After the step of representing reviews by feature vectors, there is a set of N data points assumed to lie on multiple manifolds {M_l}, and the aim is to discover a new representation of the data by the sparse manifold embedding method under these two assumptions 20: (1) each data point x_i ∈ M_l can be reconstructed by a linear combination of its neighbors, and (2) a minimal set of these neighbors from the same manifold forms the sparse representation of x_i. In other words, among the infinitely many possible representations of a data point in terms of other data points, a sparse representation finds a few points from the same subspace. These assumptions are illustrated in Fig. 4, where points x_2 and x_3 are farther from x_1 than x_4, x_5, and x_6, which lie on a different manifold from x_1, namely M_2; moreover, x_4, x_5, and x_6 are located in any small ball centered at x_1 that contains x_2 and x_3. Nevertheless, only x_2 and x_3 span a one-dimensional subspace around x_1 that is close to the tangent space of M_1 at x_1, and they can be found by the sparse constraint.
This idea has been studied in both the machine learning and image classification literature, but, to the best of the author's knowledge, little attention has so far been paid to the sentiment analysis application 20,59,60. Manifolds are represented by a graph structure in which each data point x_i is a node of the graph, and the weights of the connections between the i-th node and the other nodes form the vector c_i ∈ R^{N-1}; c_ij is the weight between x_i and the j-th element of X excluding x_i. The vectors c_i are obtained by reconstructing each data point according to the two mentioned properties of proximity and sparsity, via the following optimization problem 20:

min over c_i of λ ‖Q_i c_i‖_1 + (1/2) ‖X_i c_i‖_2^2   subject to   1^T c_i = 1,   (2)

where the l1-norm, ‖·‖_1, imposes the sparsity constraint, and λ > 0 is a given parameter that trades off sparsity against reconstruction error. Q_i is a positive-definite diagonal matrix whose smaller values correspond to points near x_i and whose larger values correspond to points farther from x_i, favoring zero coefficients for the latter. A common definition of Q_i is

(Q_i)_jj = ‖x_j − x_i‖_2 / Σ_{t≠i} ‖x_t − x_i‖_2.

Furthermore, other alternatives for Q_i, for example exponential weights, may be exploited. X_i is the matrix of normalized vectors x_j − x_i, j ≠ i:

X_i = [ (x_j − x_i) / ‖x_j − x_i‖_2 ]_{j≠i} ∈ R^{D×(N−1)}.

Figure 4. x_2 and x_3 are far from x_1 compared with x_4, x_5, and x_6, which lie on a different manifold, whereas all of them are among the nearest neighbors of x_1. The sparse optimization selects x_2 and x_3 to reconstruct x_1; they span a 1-dimensional affine subspace passing near x_1 that approximates its manifold (adapted from 20).
The optimization problem (2) is a Lasso optimization problem with an additional affine constraint 61. The coefficient vectors {c_i}, i = 1, ..., N, obtained from the optimization problem (2) are translated into a graph edge weight matrix W = [w_ij] capturing the manifold structure of the data. Let c_i = [c_{i1} ... c_{i,N−1}]; then w_ij is obtained via the following relation:

w_ij = c_{ij} for j < i,   w_ij = c_{i,j−1} for j > i,   w_ii = 0,

which places the learned weights for the i-th data point, c_i, at the appropriate indices of the weight matrix. To describe this precisely, each node i, representing the i-th data point, is connected to all other nodes using the elements of the sparse solution c_i. It is expected that the nonzero elements of c_i originate from the same manifold. Consequently, the resulting graph should ideally exhibit multiple connected components, since points within the same manifold are connected to each other while no connections exist between points on different manifolds (in reality, some weak connections exist between manifolds). The obtained graph edge weights connect each data point x_i to the specified neighbors. In practice, for efficiency, the learning of the weights of each x_i is restricted to its nearest neighbors, and a parameter k specifies the maximum neighborhood size from which the sparse neighbors are selected. Finally, a symmetrization step makes the obtained weight matrix symmetric. Then, the spectral properties of the obtained weighted graph are utilized to learn a novel representation of the data.
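The graph-weight learning step can be sketched as follows: each point is regressed on the remaining points with an l1 penalty (scikit-learn's Lasso), and the absolute coefficients are symmetrized into W. For brevity this drops the affine constraint 1^T c_i = 1 and the proximity weighting Q_i of problem (2), so it is a simplified approximation, not the authors' implementation; the toy data lie on two one-dimensional subspaces.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_graph_weights(X, alpha=0.001):
    """Learn a sparse affinity matrix: each point is expressed as a sparse
    linear combination of the other points (self-expressiveness). Simplified
    stand-in for problem (2): the affine constraint and Q_i are omitted."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=50000)
        model.fit(X[others].T, X[i])        # columns are the other points
        W[i, others] = np.abs(model.coef_)  # |c_ij| used as edge weights
    return (W + W.T) / 2                    # symmetrization step

# Toy data: two 1-D subspaces ("manifolds") in R^3
rng = np.random.default_rng(0)
t = np.linspace(1.0, 2.0, 5)
A = np.outer(t, [1.0, 0.0, 0.0])            # points along e1
B = np.outer(t, [0.0, 1.0, 0.0])            # points along e2
X = np.vstack([A, B]) + rng.normal(0.0, 1e-3, (10, 3))
W = sparse_graph_weights(X)

within = W[:5, :5].sum() + W[5:, 5:].sum()
between = W[:5, 5:].sum() + W[5:, :5].sum()
print(within > between)  # sparse neighbors come from the same subspace
```

Because points on the second subspace are (nearly) orthogonal to points on the first, the l1 penalty drives cross-subspace coefficients to zero, so the learned graph splits into two components, exactly the behavior the text describes.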
The eigenvectors of the affinity matrix are utilized to reveal the intrinsic data structure via three standard steps: (1) Compute the normalized symmetric graph Laplacian 62

L = D^{−1/2} (D − W) D^{−1/2} = I − D^{−1/2} W D^{−1/2},

where W = [w_ij] is the weight matrix and D is the diagonal degree matrix with D_ii = Σ_j w_ij. Other versions of the Laplacian can also be used. (2) Compute the d eigenvectors v_1, ..., v_d of L with the smallest eigenvalues. (3) Form the matrix V with v_1, ..., v_d as columns. The i-th row of V is the projection of x_i onto the smallest eigenvectors and is used as the new representation of x_i. The steps of SMSR are given in Alg. 1.
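The three spectral steps above can be sketched as follows; the toy weight matrix with two connected components stands in for a learned graph.

```python
import numpy as np

def spectral_embedding(W, d):
    """New data representation from the d eigenvectors of the normalized
    symmetric graph Laplacian L = I - D^{-1/2} W D^{-1/2} with the
    smallest eigenvalues (steps 1-3 in the text)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard isolated nodes
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    return eigvecs[:, :d]                  # rows of V = new d-dim features

# Toy graph with two connected components (two "manifolds")
W = np.zeros((6, 6))
W[0, 1] = W[1, 2] = W[0, 2] = 1.0          # component {0, 1, 2}
W[3, 4] = W[4, 5] = W[3, 5] = 1.0          # component {3, 4, 5}
W = W + W.T
V = spectral_embedding(W, d=2)
```

With two connected components, the Laplacian has a two-dimensional nullspace, so all points of one component map to the same 2-D coordinates, distinct from those of the other component, which is why the learned graph's component structure translates directly into separable features.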
The ultimate goal of sentiment analysis is to predict the attitude of sentiments. Conventional classifiers can be applied to the new representation of the data. However, as will be shown in the next section, if only a few eigenvectors are chosen, a wide range of useful information is lost; therefore, suitable results are achieved by exploiting a classifier that supports high-dimensional data. SVM is a successful classifier that has achieved good results in the opinion mining literature 13,63-65. In this study, the linear SVM is applied, which is appropriate for text applications and has fewer parameters than SVM with other kernels 66,67.

Algorithm 1. SMSR.
1. For each data point x_i, solve the optimization problem (2) to obtain the sparse coefficient vector c_i.
2. Form the symmetric weight matrix, W, by combining the vectors {c_i}.
3. Compute the eigenvalue decomposition of L and form the new representation of the data in d dimensions.

Datasets
The proposed method is trained and tested on two well-known datasets: (1) the Internet Movie Database (IMDB), including 50,000 movie reviews, consisting of 25,000 positively and 25,000 negatively labeled reviews (available at http://www.cs.cornell.edu/people/pabo/movie-review-data/) 68; (2) the Amazon review dataset, released in 2014 and updated in 2018, containing 233.1 million reviews and their metadata (the subset of the original dataset used here was downloaded from https://drive.google.com/file/d/1EW-2ZiC2Df8PufsuNMPqIrdx6_zo29D1/view?usp=sharing, last access date: 9/21/2021). Subsets of the datasets are randomly selected, with equal numbers of positive and negative reviews, under various feature settings. Then, the SMSR algorithm is applied to widely used features, including unigrams, bigrams, tagged terms, and sentiment terms, in the following four combinations: (1) Unigram + lexicon + tag: the combination of unigrams and POS tags is used as features, and their tf-idf values are computed. (2) Unigram: the tf-idf of unigrams is computed. (3) Bigram: the tf-idf of bigrams is computed. (4) Unigram + lexicon: SentiWordNet 69, a popular sentiment lexicon that includes positive and negative terms scored according to their part-of-speech tags, is used in this setting; it is frequently exploited in the opinion mining literature 70. In our experiments, unigrams are filtered using the SentiWordNet lexicon: words for which the difference between the sum of positive scores and the sum of negative scores is less than a given threshold are eliminated, and the tf-idf of the remaining unigrams is computed. The threshold is set to 0.3.
In all the above settings, unigrams and bigrams that appear fewer than five times are eliminated. Other features or combinations of features can also be used as input; the proposed method places no constraints on the input features. In this study, 2000 and 4000 reviews are selected randomly from the IMDB and Amazon datasets, respectively.
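The lexicon-based filtering step of setting (4) can be sketched as below; the lexicon scores are invented placeholders, not actual SentiWordNet values.

```python
# Sketch of the lexicon filtering: unigrams whose positive/negative score
# difference falls below the threshold (0.3 in the paper) are dropped.
# The scores below are illustrative, not real SentiWordNet entries.
THRESHOLD = 0.3

lexicon = {               # term -> (sum of positive, sum of negative scores)
    "excellent": (0.85, 0.00),
    "bad":       (0.05, 0.65),
    "movie":     (0.10, 0.10),   # near-neutral -> filtered out
    "plot":      (0.15, 0.05),   # weak polarity -> filtered out
}

def keep_term(term):
    pos, neg = lexicon.get(term, (0.0, 0.0))  # unknown terms score 0
    return abs(pos - neg) >= THRESHOLD

unigrams = ["excellent", "movie", "bad", "plot", "unknown"]
filtered = [t for t in unigrams if keep_term(t)]
print(filtered)  # -> ['excellent', 'bad']
```

Only strongly polarized terms survive, so the subsequent tf-idf matrix is both smaller and more sentiment-bearing.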

Evaluation measures
The proposed method is evaluated based on precision, recall, accuracy, specificity, and F-measure 71. Accuracy gives the percentage of reviews whose attitudes are predicted correctly. Precision gives the percentage of reviews correctly estimated as positive among all reviews predicted as positive. Recall is the fraction of correctly estimated positive attitudes among all actual positive reviews, and F-measure is the harmonic mean of precision and recall 72. Specificity is the percentage of negative reviews that are predicted correctly.
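These measures follow directly from the binary confusion matrix; the sketch below uses illustrative counts.

```python
# Evaluation measures from true/false positive/negative counts
# (tp, fp, tn, fn values below are illustrative only).
def measures(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)                       # a.k.a. sensitivity
    specificity = tn / (tn + fp)                       # correct negatives
    f_measure   = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, specificity, f_measure

acc, prec, rec, spec, f1 = measures(tp=90, fp=10, tn=85, fn=15)
print(round(acc, 3), round(f1, 3))  # -> 0.875 0.878
```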

Results
For a fair comparison, tenfold cross-validation is applied in all experiments. The impact of SMSR is investigated by applying linear SVM to the above-mentioned features and comparing the obtained results with the following cases in numerous experiments:

1. Linear SVM with all input features, where the regularization parameter, γ, was set by a grid search over 11 equally logarithmically spaced points between 10^-6 and 10^-0.5.
2. Seven classifiers applied to sentiment analysis in the literature, including kNN, naïve Bayes (NB), Bagging, SVM with radial basis function kernel (SVM(RBF)), Random Subspace, J48, and Random Forest, under two situations: first, without applying any feature selection method, and second, after selecting suitable features using information gain, where the number of selected features is set to {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}. Thus, eleven experiments were performed for each classifier; for clarity, the best result among these experiments is reported for each classifier below. The classifiers of WEKA (Waikato Environment for Knowledge Analysis, version 3.9.2, The University of Waikato, New Zealand) were used 73, with parameters tuned according to Table 3.
3. It is worth mentioning that Random Subspace, J48, and Random Forest select features in a nonlinear manner while training their models; they are therefore suitable baselines for assessing the proposed method against nonlinear feature selection models.
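The grid of case 1 can be sketched as follows; note that scikit-learn names the SVM regularization parameter C rather than γ, and pairing the grid with GridSearchCV is an assumption about the tuning setup, not the authors' exact code.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# 11 equally logarithmically spaced values between 10^-6 and 10^-0.5,
# as described in case 1 above.
c_grid = np.logspace(-6, -0.5, num=11)
search = GridSearchCV(LinearSVC(), {"C": c_grid}, cv=10, scoring="accuracy")
# search.fit(features, labels)  # ten-fold cross-validation over the grid
print(len(c_grid), c_grid[0], c_grid[-1])
```

Logarithmic spacing is the usual choice for SVM regularization because classifier behavior changes roughly per decade of the parameter, not per additive step.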
The SMSR parameters are set according to Table 4. The best values of λ and k for the two datasets are 10 and 50, respectively. The best values of d are 1500 and 2200 for IMDB and Amazon, respectively.
The accuracy of forecasting the attitude of sentiments by the various methods on the IMDB dataset is summarized in Fig. 5, considering unigram + lexicon + tag, unigram, bigram, and unigram + lexicon features, respectively. The performance measures obtained on the Amazon dataset are illustrated in Fig. 6 for the same four feature settings; the lowest error there, 8.03%, is obtained with unigram features. The precision, recall, specificity, and F1 of the various methods on the four input vectors are given in Table 5. The proposed method outperforms the other methods in all performance measures. The best result on the IMDB dataset is achieved by exploiting both unigram + lexicon (F-measure = 99.15, recall = 99.60) and bigrams (precision = 99.68 and specificity = 99.69). Table 6 shows the precision, recall, specificity, and F1 on the Amazon dataset. The best result on the Amazon dataset is obtained with unigram (F-measure = 92.16 and recall = 94.12) and bigram (precision = 99.45 and specificity = 99.55). The obtained results are better than those of the other methods in all performance measures on the unigram + lexicon features. Although several methods achieve better results on some measures, the best F1 and accuracy are attained using the proposed feature extraction method on all four types of input features. For further comparison, the proposed method is also compared with a manifold-based dimensionality reduction in which the manifold is represented by sentiment and semantic analysis 23. There, the graph is built by combining semantic and sentiment distances between reviews. The semantic distance is computed by a cosine similarity metric between opinion vectors, where the Doc2Vec model 74 is utilized to represent opinions as vectors. The sentiment distance is computed by the valence aware dictionary for sentiment reasoning 75. Isomap is applied for dimensionality reduction. Finally, linear SVM with tenfold cross-validation is applied for comparison with the proposed method. The number of Doc2Vec dimensions is selected from {300, 400, 500}, and the best results are reported in Table 7. As the results show, the proposed method outperforms the method presented in 23. Although the Doc2Vec model does not generate sparse vectors, its output vectors are still high-dimensional. The proposed method can also discover the underlying structure of these data, since it learns the neighborhood from the data, as opposed to 23, which combines semantic and sentiment distances using fixed weights.

Discussion
For a more detailed analysis, the parameter sensitivity of the proposed method is assessed in this section. Three parameters may influence SMSR: λ, k, and d. λ indicates the importance of the sparsity of the solution relative to the reconstruction error; larger values of λ yield sparser solutions. d is the dimension of the output of SMSR and is bounded by the number of data points, since the number of eigenvalues of the Laplacian matrix is equal to the data size. k specifies the number of nearest data points in the input space among which neighboring points are searched. To examine the effect of the parameter values on the obtained results, experiments were conducted by varying them over a wide range for different features. The initial values of λ and k are set to 10 and 50, respectively. The initial value of d is set to 1900 for the IMDB dataset and 2500 for the Amazon dataset. The initial settings are varied one at a time to observe their effect on performance. Figures 7-11 illustrate the impact of varying d, λ, and k on the F-measure and accuracy on the two considered datasets.
Regarding Figs. 7 and 8, it is observed that models with large d values achieve high performance. Consequently, linear SVM is exploited to classify the data, since it deals with high-dimensional data appropriately and has fewer parameters than SVM with polynomial or RBF kernels. As illustrated in Figs. 9 and 10, λ does not significantly affect the performance, which means that a sparse solution for each point from neighbors on the same manifold was successfully found over a wide range of λ values in the optimization problem. The standard deviation of accuracy in Fig. 9 is between 0.07% and 2.30%. As shown in Fig. 10, accuracy has a standard deviation of 0.75% and 1.71% for features learned from unigram + lexicon + tags and bigram, respectively. Figures 11 and 12

Conclusion
People's opinions posted on social media platforms influence many users. Automatically categorizing opinions into negative and positive sentiment, utilizing machine learning and natural language processing techniques, is gaining significant importance. Short opinion texts are converted into sparse high-dimensional vectors, which lie on an intrinsic manifold structure. Discovering this manifold structure has a great effect on the accuracy of the obtained results. This paper investigated a sparse manifold representation of user-generated reviews by assuming that the data lie on some manifolds. The solution of the optimization problem formulating these assumptions yields a graph capturing the geometrical structure of the data, in which nearby points from the same manifold are connected with higher weights than points from different manifolds. The spectral properties of the learned graph are exploited to derive a new representation of the data. The proposed representation method was applied to four features extracted from short unstructured opinion texts on two benchmark datasets. Many experiments have been conducted to explore the effectiveness of the manifold assumption and the sparsity property as prior knowledge in opinion mining, in terms of accuracy, precision, recall, specificity, and F-measure. The results revealed that the sentiment classification performance is improved by the proposed feature extraction method. Specifically, combining the manifold assumption and the sparsity property as prior knowledge leads to a better representation of opinions and a more accurate classification of sentiments. In addition, the impact of the parameters is assessed in our study.

13:15904 | https://doi.org/10.1038/s41598-023-43088-9
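The pipeline summarized above (sparse reconstruction of each point from its neighbors, graph construction, spectral embedding) can be sketched on toy data as follows; this is a minimal illustrative implementation in which ISTA soft-thresholding stands in for the paper's sparse solver, and the data, sizes, and parameter values are assumptions, not the authors' exact formulation:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator used by ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_graph(X, k=5, lam=0.1, iters=200):
    """Reconstruct each point sparsely from its k nearest neighbours."""
    n = len(X)
    W = np.zeros((n, n))
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    for i in range(n):
        idx = np.argsort(D[i])[1:k + 1]      # k nearest neighbours of x_i
        A = X[idx].T                         # dictionary built from neighbours
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
        w = np.zeros(k)
        for _ in range(iters):               # ISTA iterations for the lasso
            w = soft(w - step * (A.T @ (A @ w - X[i])), step * lam)
        W[i, idx] = np.abs(w)
    return np.maximum(W, W.T)                # symmetrise the learned graph

# Two well-separated toy clusters play the role of two manifolds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (15, 4)), rng.normal(3, 0.2, (15, 4))])
W = sparse_graph(X)
L = np.diag(W.sum(axis=1)) - W               # graph Laplacian of learned weights
_, vecs = np.linalg.eigh(L)
Z = vecs[:, :2]                              # new 2-D spectral representation
print(Z.shape)                               # (30, 2)
```

In the actual method, the learned sparse weights feed the spectral step in the same way, with the output dimension d chosen as discussed in the parameter analysis.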

Figure 2. The steps of the proposed method.

Figure 3. The architecture of the proposed method.

Figure 5 shows the accuracy of the various methods on the IMDB dataset by considering unigram + lexicon + tag, unigram, bigram, and unigram + lexicon features. As shown in the figures, the accuracy of the proposed model with various input features is much higher than that of the other methods. The new representation on unigram + lexicon input features yields only 0.85% error on the IMDB dataset. The obtained performance measures on the Amazon dataset are illustrated in Fig. 6 by considering the same features.

Table 3.
The parameter values of the classifiers. Random Forest: the number of trees = 100, the number of features to consider for training trees = log(the number of input features) + 1. Random Subspace: the size of each subspace = half of all attributes. Bagging: the size of each bag = the number of input data. J48: the minimum number of instances in each leaf = 2, the confidence threshold for pruning = 0.
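For reference, the Weka-style classifier settings above correspond roughly to the following scikit-learn configuration; the mapping is approximate and is our assumption (e.g., Weka's log2(M) + 1 feature rule is approximated by max_features="log2", and J48's pruning switch has no exact scikit-learn equivalent):

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Approximate scikit-learn counterparts of the Weka-style settings above.
rf = RandomForestClassifier(n_estimators=100, max_features="log2")
random_subspace = BaggingClassifier(          # subspace = half of all attributes
    max_features=0.5, bootstrap=False)
bagging = BaggingClassifier(max_samples=1.0)  # bag size = number of input data
j48_like = DecisionTreeClassifier(min_samples_leaf=2, ccp_alpha=0.0)
print(rf.n_estimators, j48_like.min_samples_leaf)  # 100 2
```

The default base estimator of BaggingClassifier is a decision tree, so only the sampling behavior (features vs. instances) distinguishes the Random Subspace and Bagging configurations here.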

Figure 5. The accuracy of various methods on the IMDB dataset on (a) unigram + lexicon + tag features, (b) unigram features, (c) bigram features, and (d) unigram + lexicon features.

Figure 6. The accuracy of various methods on the Amazon dataset on (a) unigram + lexicon + tag features, (b) unigram features, (c) bigram features, and (d) unigram + lexicon features.
illustrate the stability of the proposed method under varying k, although both bigram and unigram + lexicon features may be more affected by this parameter than the other features on the IMDB dataset. Finally, the evaluated results reveal the robustness of SMSR for different values of k and λ and for a sufficiently large value of d. The effects of these parameters on the other aforementioned performance measures are similar to those on F-measure and accuracy.
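The one-parameter-at-a-time sensitivity protocol used in these figures can be sketched as follows; the data, parameter ranges, and labels here are hypothetical toy assumptions, not the paper's experiments:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Toy one-at-a-time sweep: fix all other settings, vary a single parameter
# (here the kept output dimension d), and record mean cross-validated accuracy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 30] + X[:, 45] > 0).astype(int)   # label depends on later features

accs = {}
for d in (5, 20, 50):                       # keep only the first d dimensions
    accs[d] = cross_val_score(LinearSVC(), X[:, :d], y, cv=5).mean()
    print(f"d={d:2d}  accuracy={accs[d]:.2f}")
```

As in the figures, accuracy is stable once d is large enough to retain the informative directions, which is the pattern the sweep above reproduces on toy data.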

Figure 7. The impact of the parameter d on (a) F-measure and (b) accuracy of the proposed method on the IMDB dataset.

Figure 8.

Figure 9. The impact of the parameter λ on (a) F-measure and (b) accuracy of the proposed method on the IMDB dataset.

Figure 10.
Figure 11.
Figure 12.

Table 1. A summary of related dimensionality reduction methods in sentiment analysis.

Table 4. The parameter values of SMSR.

Table 5. The precision, recall, specificity and F1 of various methods for different input features on the IMDB dataset. Significance values are in bold.

Table 6. The precision, recall, specificity and F1 of various methods for different input features on the Amazon dataset. Significance values are in bold.

Table 7. The result of the comparison of the proposed method with 23.