Fast data-driven learning of parallel MRI sampling patterns for large scale problems

In this study, a fast data-driven optimization approach, named bias-accelerated subset selection (BASS), is proposed for learning efficacious sampling patterns (SPs) with the purpose of reducing scan time in large-dimensional parallel MRI. BASS is applicable when Cartesian fully-sampled k-space measurements of specific anatomy are available for training and the reconstruction method for undersampled measurements is specified; such information is used to define the efficacy of any SP for recovering the values at the non-sampled k-space points. BASS produces a sequence of SPs with the aim of finding one of a specified size with (near) optimal efficacy. BASS was tested with five reconstruction methods for parallel MRI based on low-rankness and sparsity that allow a free choice of the SP. Three datasets were used for testing: two of high-resolution brain images (T_2-weighted and T_1ρ-weighted images, respectively) and one of knee images for quantitative mapping of the cartilage. The proposed approach has low computational cost and fast convergence; in the tested cases it obtained SPs up to 50 times faster than the currently best greedy approach. Reconstruction quality increased by up to 45% over that provided by variable density and Poisson disk SPs, for the same scan time. Alternatively, the scan time can be nearly halved without loss of reconstruction quality. Quantitative MRI and prospective accelerated MRI results show improvements.
Compared with greedy approaches, BASS rapidly learns effective SPs for various reconstruction methods, using larger SPs and larger datasets, enabling better selection of sampling-reconstruction pairs for specific MRI problems.

www.nature.com/scientificreports/

Approaches such as the low-rank Hankel matrix approach (ALOHA) 10 , among others, can be used. We tested the proposed optimization approach with P-LORAKS 11 and three different multi-coil CS approaches with different priors 12 . The contribution of this work is a new learning algorithm, named bias-accelerated subset selection (BASS), that can optimize large sampling patterns using large datasets while requiring significantly less processing time than previous approaches. Moreover, the SPs optimized by BASS can achieve good image quality with short acquisition times, benefiting clinical tasks. A very preliminary presentation of our approach is in 13 .

Background and purpose. Fast magnetic resonance (MR) pulse sequences for measurement acquisition 1,2,14 , parallel imaging (PI) using multichannel receive radio frequency arrays [15][16][17] , and CS 3-6 are examples of advancements towards rapid MRI. PI uses multiple receivers with different spatial coil sensitivities to capture samples in parallel 18 , increasing the number of measurements acquired in the same scan time. Further, undersampling can be used to reduce the overall scan time [15][16][17] . CS relies on incoherent sampling and sparse reconstruction. With incoherence, the sparse signals spread almost uniformly in the sampling domain, and random-like patterns can be used to undersample the k-space [3][4][5]19,20 . Successful reconstructions from undersampled measurements, such as PI and CS, use prior knowledge about the true signal to remove the artifacts of undersampling while preserving most of the desired signal. Essentially, the true signal is redundant and can be compactly represented in a certain domain, subspace, or manifold of much smaller dimensionality 21,22 . Low-rank signal representation 23 and sparse representation 24 are two examples of this kind.
Deep learning-based reconstructions have shown that undersampling artifacts can also be separated from true signals by learning the parameters of a neural network from sampled datasets 23,25,26 .
The quality of image reconstruction depends on the sampling process. CS is an example of how the SP can be modified [27][28][29] , compared to standard uniform sampling 30 , so as to be effective for a specific signal recovery strategy 29,31 . According to pioneering theoretical results 27,32,33 , restricted isometry properties (RIP) and incoherence are key for CS. In MRI, however, RIP and incoherence serve more as guidelines for designing random sampling 3,5,29 than as target properties. Newer theoretical results 34,35 revisited the effectiveness of CS in MRI, in particular elucidating that incoherence is not a strict requirement. Also, studies 36,37 show that SPs with optimally incoherent measurements 3 do not achieve the best reconstruction quality, leaving room for effective empirical designs. SPs such as variable density [38][39][40] , Poisson disk 41,42 , or a combination of both 43,44 show good results in MRI reconstruction without relying on optimal incoherence properties.
In many CS-MRI methods, image quality improves when the SP is learned using a fully sampled k-space of similar images of a particular anatomy as a reference [45][46][47][48][49] . Such adaptive sampling approaches adjust the sampling probability of the k-space points of a variable density SP according to the k-space energy of the reference images [45][46][47][48][49][50] . Such SP design methods have been developed for CS reconstructions, but they generally do not take into account the reconstruction method to be used.
Statistical methods for optimized experimental design can also be used to find good sampling patterns 51,52 . Experimental design methods, especially those optimizing Cramér-Rao bounds, are general and focus on improving the signal-to-noise ratio (SNR). These approaches have been used for fingerprinting 53 , PI 54 , and sparse reconstructions 51 . They do not consider the specific capabilities of the reconstruction algorithm in the design of the SP, even though some general formulation is usually assumed.
In DDO approaches, the SP is optimized for reference images or datasets containing several images of a particular anatomy, using a specific method for image reconstruction [55][56][57][58][59] . The main premise is that the optimized SP should perform well with other images of the same anatomy when the same reconstruction method is used. These approaches can be extended to jointly learning the reconstruction and the sampling pattern, as shown in [60][61][62] . DDO is applicable to any reconstruction method that accepts various SPs. In 56 , DDO for PI and CS-MRI is proposed; the selection of the SP is formulated as a subset selection problem 63,64 , which is solved using greedy optimization of an image-domain criterion (an extension of 55 for single-coil MRI); see also 57 .

Figure 1. Illustration of (a) the 3D + time data collection scheme considered in this work. The sampling pattern is in 2D + time; it comprises the time-varying phase and partition encoding positions, for each of which data are to be collected by the MRI scanner for all frequency encoding positions. Our method can also be applied to (b) 2D data collection with fully-sampled lines in the frequency-encoding direction and a 1D sampling pattern denoting phase encoding positions.
Finding an optimal SP via subset selection is, in general, an NP-hard problem. Moreover, each candidate SP needs to be evaluated on a large set of images, which may involve reconstructions with high computational cost. Effective minimization algorithms are therefore fundamental for the applicability of these DDO approaches to large sampling patterns.
Existing subset selection approaches for SP optimization. Greedy approaches are commonly used in prior works; they are classified as forward 29,55,65 (increasing the number of sampled points in the SP, starting from the empty set), backward 51,65 (reducing the number of points in the SP, starting from fully sampled), or hybrid 63 . Given the current SP, greedy approaches test candidate SPs that differ by one k-space element, to be added (or removed). After testing, they add (or remove) the k-space element that provides the best improvement in the cost function 64 .
Greedy approaches have a disadvantage regarding computational cost because of the large number of evaluations or reconstructions. Assuming that the fully-sampled k-space measurements are of size N, the undersampled measurements are of size M < N , and there are N_i images, or data items, used for the learning process, a greedy approach will take N × N_i reconstructions just to find the best first sampling element of the SP (not considering the M − 1 k-space elements that still have to be chosen). This makes greedy approaches computationally infeasible for large-scale MRI problems. In contrast, the approach proposed in this work can obtain a good SP using 50N_i to 500N_i image reconstructions (for all M k-space elements of the SP).
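The count above can be made concrete with a little arithmetic (a sketch; the function names are ours, and the BASS figure simply multiplies the 50-500 epochs quoted above by N_i):

```python
def greedy_forward_recons(N, M, N_i):
    """Reconstructions needed by forward greedy selection: at step j,
    all N - j remaining candidate elements are evaluated on N_i images."""
    return sum((N - j) * N_i for j in range(M))

def bass_recons(epochs, N_i):
    """BASS reconstructs each training item once per epoch, so the cost
    is epochs * N_i regardless of the SP size M."""
    return epochs * N_i

# Example: a 128 x 64 grid (N = 8192), M = 512 (AF = 16), 30 training items.
greedy = greedy_forward_recons(128 * 64, 512, 30)   # 121,904,640 reconstructions
bass = bass_recons(500, 30)                         # 15,000 reconstructions
```

Even for this modest 2D grid, the greedy count is four orders of magnitude larger, and it grows with N while the BASS count does not.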
The approach in 55 is only feasible because it was applied to one-dimensional (1D) undersampling, as in Fig. 1b, with a small number of images in the dataset and single-coil reconstructions. The approach was extended to 1D+time dynamic sequences 57 and to parallel imaging 56 , but it requires too many evaluations, making it practically prohibitive for large datasets and large sampling patterns.
A different class of learning algorithms for subset selection 64 , not yet exploited for SP learning, uses bit-wise mutation; an example is the Pareto optimization for subset selection (POSS) algorithm 64,66,67 . These learning approaches are less costly per iteration, since they evaluate only one candidate SP and accept it if the cost function is improved. POSS is not designed for fast convergence, but for achieving good final results. However, such machine learning approaches can be accelerated if the changes are made smartly and effectively instead of randomly.
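The accept-if-improved skeleton shared by such bit-wise mutation methods can be sketched as follows (a simplified, single-objective caricature, not the actual multi-objective POSS; the toy cost and all names are ours):

```python
import random

def mutate_and_select(cost, omega, universe, iters, seed=0):
    """Bit-wise mutation subset selection: flip the membership of one random
    element per iteration; keep the candidate only if the cost improves."""
    rng = random.Random(seed)
    best = set(omega)
    best_cost = cost(best)
    for _ in range(iters):
        cand = set(best)
        k = rng.choice(sorted(universe))
        cand.symmetric_difference_update({k})  # flip membership of k
        c = cost(cand)
        if c < best_cost:                      # accept only on improvement
            best, best_cost = cand, c
    return best, best_cost

# Toy cost: size of the symmetric difference from a hypothetical target SP.
target = {0, 1, 2, 7}
cost = lambda s: len(target.symmetric_difference(s))
sp, c = mutate_and_select(cost, set(), set(range(10)), iters=200)
```

Because each iteration evaluates a single candidate, the per-iteration cost is one pass over the training items, rather than one pass per candidate element as in greedy search.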
Other fast approaches for DDO of SP. Besides the formulation of DDO of the SP as a subset selection problem, other approaches have been investigated. The use of deep learning for image reconstruction 23,25,26,68 has been extended to learning the SP. In 60 , a probabilistic sampling mask is learned inside the neural network, followed by random generation of SPs. In 62 , twice continuously differentiable models are used to find the SP for variational problems. While these approaches are also faster than 55 at learning the SP, they are less flexible: the parallel MRI methods cited in the Section "The specific content of this paper" cannot be used, and only quadratic cost functions can be optimized. In 61,69,70 , parametric formulations of non-Cartesian k-space trajectories are optimized. While interesting, these approaches cannot be applied to our Cartesian 3D problem described in "The specific content of this paper". Another approach for improving image quality through better sampling is active sampling [71][72][73][74] , in which the next k-space sampling positions are estimated during the acquisition using the data captured so far. While promising, this approach requires significant changes within the MRI scanning sequence that are not widely available. In contrast, our current approach, which finds the best (optimized) fixed 3D Cartesian SP for a given anatomy, contrast, and pre-determined reconstruction method, can be included in an accelerated (compressed sensing and parallel) MRI scanning protocol by simply replacing an existing non-optimized SP. For this task, the subset selection formulation of DDO of the SP seems to be the most effective approach for our applications of interest.

Theory
Specification of our aim. Referring to Fig. 1, we use Ŵ to denote the set of size N comprising, on the Cartesian grid, all possible (a) time-varying phase and partition encoding positions in the 3D + time data collection scheme or (b) phase encoding positions in the 2D data collection scheme. Our instrument (a multicoil MRI scanner) can provide measurements related to these sampling positions. Each such "measurement" comprises a fixed number (we denote it by N_s ) of measurement values for k-space points, i.e. the points on a line in the frequency-encoding direction for all coils. The measurements for the N positions of Ŵ are represented as the N_s N-dimensional complex-valued vector m ; these are referred to as the fully-sampled measurements.
Let Ω be any subset (of size M ≤ N ) of Ŵ ; it is referred to as a sampling pattern (SP). The undersampled measurements of m , restricted to the M positions in Ω , are represented as the N_s M-dimensional complex-valued vector

m_Ω = S_Ω m, (1)

where S_Ω is an N_s M × N_s N matrix referred to as the sampling function. Such an m_Ω is referred to as the undersampled measurements for the SP Ω . The acceleration factor (AF) is defined as N/M. Note that, in practice, the reduction of the scan time depends on the pulse sequence used 2 . We assume here that the acquisition of the N_s measurement values for any element of Ŵ requires the same scan time.
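In code, S_Ω need not be stored as an explicit N_s M × N_s N matrix; it can act by index selection on the stacked measurement vector (a sketch under the ordering assumption stated in the comments):

```python
import numpy as np

def sampling_function(m, omega, N_s):
    """Apply S_Omega by indexing: keep the N_s measurement values of each
    sampled position, assuming position k occupies entries
    [k*N_s, (k+1)*N_s) of the stacked vector m."""
    idx = np.concatenate([np.arange(k * N_s, (k + 1) * N_s) for k in omega])
    return m[idx]

N, N_s = 6, 2
m = np.arange(N * N_s, dtype=complex)    # toy fully-sampled measurements
omega = [0, 3, 5]                        # an SP with M = 3, so AF = N/M = 2
m_omega = sampling_function(m, omega, N_s)
# m_omega -> [0, 1, 6, 7, 10, 11]
```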
It is assumed that we have a defined recovery algorithm R that, for any SP Ω and any undersampled measurements m_Ω for that SP, provides an estimate, denoted by R(Ω, m_Ω) , of the fully-sampled measurements. A method for finding an efficacious choice of Ω in a particular application area is our subject matter. Efficacy may be measured in the following data-driven manner. Let N_i be the number of images and also the number of fully-sampled measurement items (denoted by m_1 , ..., m_{N_i} , called the training measurements) used in the learning process to obtain an efficacious Ω . Intuitively, we wish to find an SP Ω such that all the measurements m_i , for 1 ≤ i ≤ N_i , are "near" to their respective recovered versions R(Ω, S_Ω m_i) from the undersampled measurements. Using f(m, n) to denote the "distance" between two fully-sampled measurements m and n , we define the efficacy of an SP Ω as

F(Ω) = Σ_{i=1}^{N_i} f(m_i, R(Ω, S_Ω m_i)). (2)

Then the sought-after optimal sampling pattern of size M is

Ω̂ = argmin_{Ω ⊆ Ŵ, |Ω| = M} F(Ω). (3)

Models used. Parallel MRI methods that directly reconstruct the images, such as the sensitivity encoding method (SENSE) 16,75 and many CS approaches 76 , are based on an image-to-k-space forward model, such as

m = F C x = E x,

where x is a vector that represents a 2D+time image of size N_y × N_z × N_t ( N_y and N_z are the horizontal and vertical dimensions, N_t is the number of time frames), C denotes the coil sensitivities transform mapping x into multi-coil-weighted images of size N_y × N_z × N_t × N_c , with N_c coils, F represents the spatial Fourier transforms (FT), comprising N_t × N_c repetitions of the 2D-FT, and m is the fully-sampled measurements, of size N_y × N_z × N_t × N_c . The two transforms combine into the encoding matrix E . In 2D+time problems N = N_y N_z N_t and N_s = N_c , while in 1D problems N = N_y , N_s = N_z N_c , and N_t = 1 .
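Stepping back to the efficacy definition, (2) and (3) translate directly into code; the sketch below uses a trivial zero-filling stand-in for the recovery algorithm R, purely for illustration (here with one measurement value per position):

```python
import numpy as np

def f(m, n):
    """Normalized squared k-space error between measurements m and n."""
    return np.linalg.norm(m - n) ** 2 / np.linalg.norm(m) ** 2

def efficacy(recover, omega, training):
    """F(Omega): sum over training items of the distance between m_i and
    its recovery from the undersampled measurements S_Omega m_i."""
    return sum(f(m_i, recover(omega, m_i[omega])) for m_i in training)

def zero_fill(omega, m_omega, N=8):
    """Hypothetical stand-in for R: put the samples back, zeros elsewhere."""
    m_hat = np.zeros(N, dtype=complex)
    m_hat[omega] = m_omega
    return m_hat

training = [np.ones(8, dtype=complex)]
F = efficacy(zero_fill, [0, 1, 2, 3], training)   # -> 0.5 (half the energy missed)
```

The search (3) then amounts to minimizing `efficacy` over all subsets of the given size, which is exactly the subset selection problem discussed above.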
In this work, all vectors, such as m and x , are represented by bold lowercase letters, and all matrices or linear operators, such as C or F , are represented by bold uppercase letters.
When accelerated MRI by undersampling is used, the sampling pattern is included in the model as

m_Ω = S_Ω E x, (4)

where S_Ω is the sampling function using SP Ω (the same for all coils) and m_Ω is the undersampled multi-coil k-space measurements (or k-t-space measurements when N_t > 1 ), with N_s M elements, recalling that the AF is N/M. For reconstructions based on this model, we assume that a central area of the k-space is fully sampled (such an area is used to compute coil sensitivities with auto-calibration methods, as in 77 ).
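The composite map in this model (coil weighting, Fourier transform, then sampling) can be sketched as an operator pipeline; the coil maps and mask below are illustrative placeholders, not a real calibration:

```python
import numpy as np

def encode(x, coil_maps, mask):
    """m_Omega = S_Omega F C x: coil-weight the image, 2D-FFT each coil,
    then keep only the k-space positions where mask is True."""
    coil_images = coil_maps * x[None, :, :]           # C x, shape (N_c, Ny, Nz)
    kspace = np.fft.fft2(coil_images, axes=(-2, -1))  # F C x
    return kspace[:, mask]                            # S_Omega, same mask per coil

Ny, Nz, Nc = 8, 8, 4
x = np.random.default_rng(0).standard_normal((Ny, Nz)) + 0j
coil_maps = np.ones((Nc, Ny, Nz), dtype=complex) / np.sqrt(Nc)  # hypothetical maps
mask = np.zeros((Ny, Nz), dtype=bool)
mask[::2, :] = True                                   # sample every other row
m_omega = encode(x, coil_maps, mask)                  # shape (Nc, Ny*Nz // 2)
```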
In parallel MRI methods that recover the multi-coil k-space directly, the undersampling formulation is given by (1) and the image-to-k-space forward model is not used, since one is interested in recovering missing k-space samples using e.g. structured low-rank models 23 . For this, the multi-coil k-space is lifted into a matrix H = H(m) , assumed to be a low-rank structured matrix. Lifting operators H(m) are slightly different across PI methods, exploiting different kinds of low-rank structure [7][8][9][10][11]23 .
Once all the samples of the k-space are recovered, the image can be computed by any coil combination 78,79 , such as

x_n = Σ_{c=1}^{N_c} w_{n,c} [F_c^{-1} m_c]_n ,

where m_c is the measurements from coil c, F_c^{-1} is the inverse 2D-FT for one coil, and w_{n,c} is the weight for spatial position n and coil c.
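A minimal sketch of this weighted combination (uniform weights here, purely illustrative; real implementations derive w_{n,c} from the coil sensitivities):

```python
import numpy as np

def combine_coils(kspace, weights):
    """x_n = sum_c w_{n,c} [F_c^{-1} m_c]_n: inverse-FFT each coil's
    k-space, then take a weighted per-pixel sum over the coils."""
    coil_images = np.fft.ifft2(kspace, axes=(-2, -1))   # F_c^{-1} m_c
    return np.sum(weights * coil_images, axis=0)

rng = np.random.default_rng(1)
Nc, Ny, Nz = 4, 8, 8
kspace = rng.standard_normal((Nc, Ny, Nz)) + 1j * rng.standard_normal((Nc, Ny, Nz))
weights = np.full((Nc, Ny, Nz), 1.0 / Nc)   # uniform weights, illustrative only
x = combine_coils(kspace, weights)          # combined image of shape (Ny, Nz)
```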
Reconstruction methods tested. We tested our proposed approach on five different reconstruction methods: three one-frame parallel MRI methods (SENSE 75 , P-LORAKS 11 , and PI-CS with anisotropic TV 80,81 ) and two multi-frame low-rank and PI-CS methods for quantitative MRI 12 .
In P-LORAKS 11,82 the recovery from m_Ω produces

R(Ω, m_Ω) = argmin_m ‖H_s(m) − H_{s,r}(m)‖²_F subject to S_Ω m = m_Ω,

where the operator H_s(m) produces a low-rank matrix and H_{s,r}(m) produces a hard-thresholded version of the same matrix. P-LORAKS exploits consistency between the sampled k-space measurements and the reconstructed measurements; it does not require a regularization parameter. Further, it needs neither pre-computed coil sensitivities nor fully sampled k-space areas for auto-calibration. SENSE, CS, or low-rank (LR) reconstruction 12 is given by

R_x(Ω, m_Ω) ≈ argmin_x ‖S_Ω E x − m_Ω‖²₂ + λ P(x), (8)

where λ is a regularization parameter. For SENSE, λ = 0 and no regularization is used. For CS and LR, we considered the regularizations P(x) = ‖Tx‖₁ , with T the spatial finite differences (SFD), and the low-rank (LR) prior, using the nuclear norm of x reordered as a Casorati matrix, P(x) = ‖M(x)‖_* 83 .
CS approaches using redundant dictionaries D in the synthesis model 24,84 , given by x = Du , can be written as

R_x(Ω, m_Ω) ≈ D û, with û = argmin_u ‖S_Ω E D u − m_Ω‖²₂ + λ ‖u‖₁ . (9)

A dictionary to model exponential relaxation processes, like T_2 and T_1ρ , in MR relaxometry problems is the multi-exponential dictionary 12,85 . It generates a multicomponent relaxation decomposition 86 . The approximately-equal symbol ≈ is used in (8) and (9), since the iterative algorithm for producing R_x(Ω, m_Ω) , MFISTA-VA 87 in this paper, may stop before reaching the minimum.

Criteria utilized in this paper. We work primarily with a criterion defined in the multi-coil k-space; see (2) and (3). This criterion is used by parallel MRI methods that recover the k-space components directly, in a k-space interpolation fashion (and not in the image space), such as P-LORAKS 11 and others 7,23,25 . Unless stated otherwise, the f(m, n) in (2) is

f(m, n) = ‖m − n‖²₂ / ‖m‖²₂ .

The term ‖m‖²₂ normalizes the error, so that the cost function is not dominated by datasets with a strong signal.
For image-based reconstruction methods (e.g., SENSE and multi-coil CS) using the model in (4), the R(Ω, S_Ω m_i) in (2) is replaced by E R_x(Ω, S_Ω m_i) , with R_x as defined, e.g., in (8) and (9). The approach used to obtain the coil sensitivities is part of the method.
Note that (3) can be modified for image-domain criteria as well, such as

F_x(Ω) = Σ_{i=1}^{N_i} g(x_i, R_x(Ω, S_Ω m_i)),

where g(x, y) is a measure of the distance between images x and y . In this case, the fully-sampled reference must be computed using a reconstruction algorithm, such as x_i = R_x(Ŵ, m_i) , and so it is dependent on the parameters used in that algorithm.
Proposed data-driven optimization. Due to the high computational cost of greedy approaches for large SPs and the relatively low cost of predicting points that are good next candidates, we propose a new learning approach, similar to POSS 64,66,67 , but with new heuristics that significantly accelerate the subset selection. For a general description of POSS see 64 , Algorithm 14.2. In our proposed method, similarly to POSS, there is a sequential random selection of the elements to be changed. Differently from POSS, two heuristic rules, named the measure of importance (MI) and the positional constraints (PCs), are used to bias the selection of the elements with the intent of accelerating convergence. This is why our algorithm is named bias-accelerated subset selection (BASS). The MI (defined explicitly in (16) below) is a weight assigned to each element, indicating how likely it is to contribute to decreasing the cost function. The PCs are rules for avoiding the selection of undesirable elements, and may be of two types: fixed or relative. Fixed positional constraints rule out the selection of an element because there is some prior reason for fixing its value (for example, elements used for auto-calibration are often considered to be such; an area of such elements is illustrated in Fig. 2, top right). Relative positional constraints are inspired by those used in the general combinatorial optimization approach called tabu search (TS) 88 , which has been demonstrated to be an effective optimization approach; there, the selection of some elements results in forbidding some otherwise legitimate selections in the same iteration. The rules that we have found efficacious in our application are that if an element with high MI is selected, then an adjacent element, and also elements in complex-conjugated positions, should not be selected in the same iteration. However, this does not prevent them from being selected in future iterations.
BASS, which aims at finding (an approximation of) the Ω̂ of (3), is described in Algorithm 1. It uses the following user-defined items:

• Ω_init is the initial SP for the algorithm. It may be any SP (a Poisson disk, a variable density, or even an empty SP).
• L is the number of iterations in the training process.
• N is the number of positions in the fully-sampled set Ŵ.
• M is the desired size of the SP ( M < N).
• K_init is the maximum (initial) number of elements to be added/removed per iteration ( K_init < min(M, N − M) ).
• select-add(Ω, K, ρ_a(K, M, N, l)) selects a subset of Ŵ\Ω, as specified below.
• F is the efficacy function; see (2), which involves the following:
- N_i is the number of items in the training set.
- R is the recovery algorithm from undersampled measurements.
• α is a reduction factor for the number of elements to be added/removed per iteration ( 0 < α < 1).
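Putting the user-defined items together, the outer loop of Algorithm 1 can be paraphrased as below. This is our paraphrase, not the authors' pseudocode: the biased select-add/select-remove functions are replaced by plain random choices, so only the accept-if-improved structure and the α-reduction of K are illustrated:

```python
import random

def bass_outer_loop(cost, omega_init, universe, M, L, K_init, alpha, seed=0):
    """Paraphrased BASS outer loop: per iteration, add up to K new elements
    and remove enough old ones to keep |omega| = M; keep the candidate SP
    only if the cost improves, otherwise shrink K by alpha towards 1."""
    rng = random.Random(seed)
    universe = set(universe)
    omega, best = set(omega_init), None
    K = K_init
    for _ in range(L):
        add = rng.sample(sorted(universe - omega), min(K, len(universe - omega)))
        n_remove = max(0, len(omega) + len(add) - M)      # keep size on target
        rem = rng.sample(sorted(omega), min(n_remove, len(omega)))
        cand = (omega - set(rem)) | set(add)
        c = cost(cand)
        if best is None or c < best:        # accept only on improvement
            omega, best = cand, c
        else:
            K = max(1, int(alpha * K))      # reduce K towards single swaps
    return omega, best

# Toy run: recover a hypothetical 4-element "ideal" SP among 12 positions,
# with the cost counting the mismatch against that ideal set.
ideal = {0, 1, 2, 3}
omega, best = bass_outer_loop(lambda s: len(s ^ ideal), set(), range(12),
                              M=4, L=2000, K_init=4, alpha=0.5)
```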
Selection of elements to be added to or removed from the SP. The elements of Ω_a and Ω_r are selected by the functions select-add and select-remove in similar ways, described in the following paragraphs. We now define (and illustrate in Fig. 2) select-add(Ω, K, ρ_a(K, M, N, l)) and select-remove(Ω, K, ρ_r(K, M, l)) of Algorithm 1; they are used in steps 5, 6, and 7. Intuitively, the definitions should be such that the SP Ω′ after step 7, whose size is

|Ω′| = |Ω| + |Ω_a| − |Ω_r|, (14)

is an improved choice as compared to the SP Ω . The number K of elements to be added/removed varies with the iteration.
For select-add, we define the measure of importance (MI) used in this work, for 1 ≤ k ≤ N , as

ε_k = Σ_{i=1}^{N_i} ‖[m_i − R(Ω, S_Ω m_i)]_k‖²₂ , (16)

referred to as the ε-map, where [·]_k denotes the N_s measurement values at position k. The purpose of select-add is to select K elements from Ŵ\Ω in the following randomly-biased manner. First, approximately ρ_a × (N − M) elements are randomly preselected by Bernoulli trials with probability ρ_a , whose value is the user-provided ρ_a(K, M, N, l) (recall that K/(N − M) < ρ_a(K, M, N, l) ≤ 1 ). To have more than K preselected points, we need ρ_a > K/(N − M) . The K best points among the randomly preselected ones are then chosen. The selection starts with the element with the largest MI (the largest ε_k ). Once this element is chosen, any other element identified as undesirable by the PC rules is excluded from the randomly preselected group, and the selection continues with the element with the next largest MI. The chosen K elements are likely to be useful for the aim of (3). The probability ρ_a indirectly controls the bias applied to the selected set: a larger probability implies less randomness and more bias. The probability varies with the iteration l.

For select-remove, a sequence with the number of elements specified in (13), all of which are in Ω , is generated in the same way, but using r_k as the MI instead of ε_k :

r_k = 1 / (ε_k + δ), for 1 ≤ k ≤ N,

with δ a small constant that avoids division by zero; r_k is referred to as the r-map. The idea of this MI is that a small reconstruction error ε_k at a sampled k-space position k indicates that the position contributes little to the reconstruction, making it a good candidate for removal.

The expensive part of select-add and select-remove is the computation of the recoveries R(Ω, S_Ω m_i) , but this is done only once per iteration, for all N_i images. These recoveries are also reused to calculate the cost F in line 10 of Algorithm 1. Figure 2 illustrates the steps of these functions using K = 50.
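The randomly-biased selection of select-add can be sketched as follows. This is our paraphrase: the ε-map is passed in precomputed, and the only PC rule implemented is the exclusion of grid-adjacent elements (the conjugate-position rule is omitted):

```python
import random

def select_add(eps_map, omega, K, rho_a, adjacent, seed=0):
    """Pick K elements of the complement of omega: Bernoulli-preselect with
    probability rho_a, then repeatedly take the preselected element with the
    largest measure of importance, dropping elements that the positional
    constraints (here: adjacency to a chosen element) rule out."""
    rng = random.Random(seed)
    pool = [k for k in range(len(eps_map)) if k not in omega and rng.random() < rho_a]
    pool.sort(key=lambda k: eps_map[k], reverse=True)     # largest MI first
    chosen = []
    while pool and len(chosen) < K:
        k = pool.pop(0)
        chosen.append(k)
        pool = [p for p in pool if p not in adjacent(k)]  # relative PC rule
    return chosen

# Toy 1D example: MI concentrated around position 10.
eps = [1.0 / (1 + abs(k - 10)) for k in range(32)]
adj = lambda k: {k - 1, k + 1}
picked = select_add(eps, omega=set(), K=3, rho_a=1.0, adjacent=adj)
# picked -> [10, 8, 12] (10 first; 9 and 11 excluded by the PC rule)
```

Lowering `rho_a` shrinks the preselected pool and adds randomness, which is how the bias of the selection is controlled.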

Datasets.
In our experiments we utilized three datasets. One dataset, denoted T_2-brain, contains 40 brain T_2-weighted images and k-space measurements from the fastMRI dataset of 89 . Of these, N_i = 30 were used for training and N_v = 10 for validation. The k-space measurements are of size N_y × N_z × N_t × N_c = 320 × 320 × 1 × 16 , and the reconstructed images are of size N_y × N_z × N_t = 320 × 320 × 1 .
With this dataset, we tested 2D SPs, of size N = 320 × 320 , and 1D SPs, of size N = 320 (see Fig. 1). We used the 1D SPs in experiments with large numbers of iterations to compare BASS against POSS and greedy approaches. The fastMRI dataset is a public dataset composed of images and k-space data obtained with different acquisitions, not all of which are 3D acquisitions; in this sense, the experiments with 2D SPs on the T_2-brain dataset are merely illustrative. The second dataset, T_1ρ-brain, contains T_1ρ-weighted k-space measurements of the brain, of size N_y × N_z × N_t × N_c = 128 × 148 × 1 × 20 , with reconstructed images of size N_y × N_z × N_t = 128 × 148 × 1 . Unless otherwise stated, N_i = 65 were used for training and N_v = 16 for validation. This dataset and the next one were acquired with the Cartesian 3D acquisitions described in "The specific content of this paper"; the training and validation sets are composed of data from different individuals.
The third dataset, denoted T_1ρ-knee, contains T_1ρ-weighted knee images and k-space measurements for quantitative T_1ρ mapping, of size N_y × N_z × N_t × N_c = 128 × 64 × 10 × 15 ; the reconstructed images, of size N_y × N_z × N_t = 128 × 64 × 10 , represent cross-sections of the human knee, and the 2D+time SPs are of size N = 128 × 64 × 10 . Unless otherwise stated, N_i = 30 were used for training and N_v = 10 for validation. The k-space measurements for all images are normalized by the largest component. A reduced-size knee dataset uses only part of the T_1ρ-knee dataset; its images are of size N_y × N_z × N_t = 128 × 64 × 1 , with N_i = 5 and N_v = 5 . This dataset is used in experiments with a large number of iterations to compare BASS against POSS and greedy approaches for 2D SPs.
Reconstruction methods. For the T_2-brain and T_1ρ-brain datasets, three reconstruction methods were used:

• SENSE 75 : multi-coil reconstruction, following Eq. (8) with λ = 0, minimized with conjugate gradient.
• P-LORAKS 11 : multi-coil k-space reconstruction exploiting low-rank structure, as described in the Theory section.
• CS-SFD 80,81 : multi-coil CS following Eq. (8) with P(x) = ‖Tx‖₁ , T being the spatial finite differences, minimized with MFISTA-VA.

SENSE was used only for 1D SP comparisons between BASS, POSS and greedy approaches.
For the T_1ρ-weighted knee dataset, we used different methods:

• CS-LR 12 : multi-coil CS using the nuclear norm of the vector x reordered as a Casorati matrix, P(x) = ‖M(x)‖_* , minimized with MFISTA-VA.
• CS-DIC 12 : multi-coil CS using the synthesis approach of Eq. (9), with D a multi-exponential dictionary 85 , minimized with MFISTA-VA.
CS-SFD, CS-LR, and CS-DIC need a fully-sampled area for auto-calibration of coil sensitivities using ESPIRiT 77 . P-LORAKS does not use auto-calibration. All experiments were performed in Matlab; the code used in this manuscript is available at https://cai2r.net/resources/software/data-driven-learning-sampling-pattern. The regularization parameter (the λ in (8) and (9)) required by CS-SFD, CS-LR, and CS-DIC was optimized independently for each type of SP (Poisson disk, variable density, combined variable density and Poisson disk, adaptive SP, or optimized) and each AF, using the same training data. The parameters of the recovery method R are assumed to be fixed during the learning process of the SP.
Optimizing parameters of Poisson disk, variable density, and adaptive SPs. Grid optimization with 50 realizations of each SP was performed, varying the parameters used to generate these SPs, to obtain the best realization of each SP, i.e., the one that minimizes F(Ω) . This is the approach used in 56 for minimizing F(Ω) . The Poisson disk and variable density codes used in the experiments are available at https://github.com/mohakpatel/Poisson-Disc-Sampling and http://mrsrl.stanford.edu/~jycheng/software.html. The combined Poisson disk and variable density SP from 44 and the adaptive SPs from 45 were also used for comparison. The spectrum template obtained from the same training data was used with the adaptive SPs. All these approaches can be considered data-driven, because optimization of the parameters used to generate the SP was performed. They all have a fixed computational cost of 50 N_i image reconstructions (nearly the same computational cost as BASS).
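This baseline tuning amounts to a small search over generator parameters and random realizations (a sketch; `make_sp` is a hypothetical SP generator taking a parameter and a seed):

```python
def best_realization(make_sp, efficacy, params, n_real=50):
    """Grid optimization of a randomized SP generator: evaluate n_real
    realizations for each parameter value and keep the SP minimizing F."""
    best_sp, best_cost = None, float("inf")
    for p in params:
        for seed in range(n_real):
            sp = make_sp(p, seed)
            c = efficacy(sp)
            if c < best_cost:
                best_sp, best_cost = sp, c
    return best_sp, best_cost

# Toy example: the "SP" is just a number; the cost prefers values near 0.3.
make_sp = lambda p, seed: p + 0.01 * seed          # hypothetical generator
cost = lambda sp: abs(sp - 0.3)
sp, c = best_realization(make_sp, cost, params=[0.0, 0.25, 0.5], n_real=10)
# sp -> approximately 0.3 (p = 0.25, seed = 5)
```

With a fixed number of realizations per parameter value, the total cost is a fixed multiple of N_i reconstructions, as stated above.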
Evaluation of the error. The quality of the results obtained with the SP was evaluated using the normalized root mean squared error (NRMSE):

NRMSE(m̂, m) = ‖m̂ − m‖₂ / ‖m‖₂ ,

where m̂ denotes the recovered measurements and m the fully-sampled reference. When not specified, the NRMSE shown was obtained in k-space on the validation set; results using the image domain and the training set are also provided, as is the structural similarity (SSIM) 90 in some cases.
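For reference, the NRMSE computation is straightforward (a sketch; it applies unchanged to k-space or image-domain vectors):

```python
import numpy as np

def nrmse(m_hat, m_ref):
    """Normalized root mean squared error of a recovery against a
    fully-sampled reference (k-space or image domain alike)."""
    return np.linalg.norm(m_hat - m_ref) / np.linalg.norm(m_ref)

ref = np.array([3.0, 4.0])   # reference with norm 5
rec = np.array([3.0, 3.0])   # recovery with error norm 1
err = nrmse(rec, ref)        # -> 0.2
```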

Results
Illustration of the convergence and choice of parameters. In Fig. 3a-c we compare BASS against POSS 66 and the greedy approach "learning-based lazy" (LB-L) 56 , adapted to the cost function in (2). The resulting NRMSEs are re-normalized by their initial values and show the difference in computational cost and quality between the approaches. Plots are scaled logarithmically in epochs (in each "epoch" all the images are reconstructed once). Fig. 3a shows the performance of the learning methods with a 1D SP, using the T_2-brain dataset and SENSE reconstruction, with AF = 4. In this example, BASS was faster than POSS and LB-L; also, BASS and POSS obtained nearly the same quality results, superior to LB-L. In Fig. 3b, the performance of the learning methods was tested in the same setup, but using CS-SFD reconstruction, with AF = 4. In this example, BASS was faster than POSS and LB-L, but all methods obtained nearly the same quality results. In Fig. 3c, the methods were compared with CS-SFD on the reduced-size knee dataset and 2D SPs, starting with the auto-calibration area and AF = 15. In this example, BASS found a solution of the same quality on the training set using only 433 epochs, around 50 times faster than LB-L (~ 21,000 epochs). Also, BASS and POSS can go on minimizing the cost function beyond the stopping point of LB-L, finding even better SPs.

Figure 3d shows the training and validation errors for the same setup as in Fig. 3c, with K_init = 50. The improvement observable in the validation set ends quickly, at iteration 50 in this example. An arrow in the figure points to an efficient solution. Such a solution is obtained after relatively few iterations, during which most of the significant improvement observable with validation data has already happened. Iterating beyond this point leads essentially to marginal improvement, observable only with the training data.
In Fig. 3e we see the results of the learning process on the training data as a function of the parameter K_init for CS-LR, AF = 20, using the knee dataset, with N_i = 30 and N_v = 10 . Note that a large K_init performs better than a small K_init in terms of speed of convergence at the beginning of the learning process. Over time, K is reduced from K_init towards K = 1.
The importance of large and diverse datasets for generating a learned sampling pattern for a specific class of images is illustrated in Fig. 3f, which shows the convergence of the learning process on the validation set, in NRMSE. We used training sets of 1, 3, 10, 30, and 90 images. The validation sets were composed of the same 20 images, not used in any of the training sets.
The robustness of an efficient solution with respect to the initial SP is illustrated in Fig. 4. Figure 4a-d show four initial SPs: variable density (VD), Poisson disk (PD), empty except for a small central area (CA), and adaptive SP. Using 200 iterations of BASS for P-LORAKS with these initial SPs, the corresponding efficient SPs shown in Fig. 4e-h were obtained. There are minor differences among them (around 1% difference in NRMSE), but the central parts of the SPs are very similar.

Performance with various reconstruction methods. BASS improves NRMSE in image space for fixed AFs when compared with the other SPs for the four reconstruction methods tested with 2D+time SPs. Figure 5a shows the NRMSE obtained by P-LORAKS on the T2-brain dataset, comparing variable density SPs, Poisson disk SPs, adaptive SPs, combined variable density with Poisson disk SPs, and the optimized SPs. Figure 5b shows the NRMSE obtained by CS-SFD on the T2-brain dataset. Figure 5c,d show P-LORAKS and CS-SFD on the T1ρ-brain dataset. Figure 5e shows the NRMSE obtained by CS-LR on the T1ρ-knee dataset, and Fig. 5f the NRMSE obtained by CS-DIC on the same dataset. All SPs had their parameters optimized for each reconstruction method, dataset, and AF. Figure 6 illustrates, on the T2-brain dataset, how the optimized SPs improve the reconstructed images with P-LORAKS (at AF = 9) and CS-SFD (at AF = 16) compared with combined variable density and Poisson disk (VD + PD) SPs. With P-LORAKS the optimized SP visibly improved the SNR; with CS-SFD the reconstructions became less smooth, with some structures more detailed. However, some structured error can still be seen in the error maps. Figure 6 also shows that the optimized SPs differ between the two reconstruction methods, even when the same images are used for training. Figure 7 illustrates the images obtained with the T1ρ-brain dataset with P-LORAKS (at AF = 5) and CS-SFD (at AF = 6), comparing the optimized SP with variable density and adaptive SPs. In Fig. 8, visual results with the T1ρ-knee dataset illustrate the improvement due to using an optimized SP rather than a combined variable density and Poisson disk SP, for both CS-LR and CS-DIC. We again see that the optimized SPs differ between the two reconstruction methods. Note that both optimized k-t-space SPs have a sampling density that varies over time (first, middle, and last time frames are shown), being more densely sampled at the beginning of the relaxation process. The auto-calibration region is in the first frame.
BASS with a different criterion. We illustrate that our proposed optimization approach is also efficacious with different criteria. In some applications, one may desire the best possible image quality, regardless of k-space measurements. Here we discuss the use of BASS to optimize the SSIM 90 , an image-domain criterion. For that, the task in (3) of finding the minimizer of F(Ω) in (2), used in line 10 of Algorithm 1, is replaced by finding the minimizer in (11), with g(x, y) the negative of the SSIM. In Fig. 9 we compare the optimization of SSIM with that of NRMSE, using P-LORAKS on the T2-brain dataset at AF = 10, starting from the Poisson disk SP.
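Conceptually, the two criteria are interchangeable cost callbacks handed to the optimizer. The following sketch illustrates this; note that the global-statistics SSIM below is a simplified stand-in for the standard windowed SSIM used in the paper, and the constants c1 and c2 are illustrative assumptions.

```python
import numpy as np

def nrmse(ref, rec):
    """Normalized root-mean-square error between two arrays."""
    return np.linalg.norm(rec - ref) / np.linalg.norm(ref)

def neg_ssim(ref, rec, c1=1e-4, c2=9e-4):
    """Negative SSIM computed from global image statistics
    (a simplified stand-in for the standard windowed SSIM)."""
    mu_x, mu_y = ref.mean(), rec.mean()
    var_x = ((ref - mu_x) ** 2).mean()
    var_y = ((rec - mu_y) ** 2).mean()
    cov = ((ref - mu_x) * (rec - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return -ssim  # negated so that lower is better, as with NRMSE

# The optimizer only evaluates g(x, y); swapping the criterion changes
# which sampling patterns it favors, without any other code change.
```

Because both functions map a (reference, reconstruction) pair to a scalar to be minimized, replacing one with the other in line 10 of Algorithm 1 is the only change needed.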
T1ρ mapping. We illustrate the performance of the optimized SPs for T1ρ mapping, comparing the optimized SP against the Poisson disk SP previously used in 12 , for CS-LR. The SP and reconstructed images correspond to the cross-section of the knee, of size N_y × N_z × N_t = 128 × 64 × 10; the T1ρ mapping is performed in the cartilage region on the longitudinal plane (in-plane) of the recovered 3D volume. The 3D+time volume has N_x × N_y × N_z × N_t = 256 × 128 × 64 × 10 voxels, where N_x = 256 corresponds to the samples in the frequency-encoding direction, with a field-of-view of 130 mm × 130 mm × 130 mm, in-plane (rounded) resolution of 0.5 mm × 1 mm, slice thickness of 2 mm, and 10 frames. In Fig. 10 we illustrate the results of T1ρ mapping in the knee, particularly around the cartilage region; Fig. 10a-c show some illustrative T1ρ maps.

Prospective accelerated scans. We tested the optimized SP obtained with BASS in prospective CS scans, shown in Fig. 11. For an explanation of the usage of the word "prospective" in MRI, see 6 . We used the knee dataset to train the SP for CS-SFD at AF = 4. The images of size N_y × N_z × N_t = 128 × 64 × 1 used for training compose the cross-section of the 3D volumes. Displayed images correspond to the longitudinal view of one slice of a 3D volume (of size N_x × N_y × N_z = 256 × 128 × 64), with in-plane resolution of 0.5 mm × 1 mm and slice thickness of 2 mm. The 15-channel coil measurements were obtained with the T1ρ pulse sequence used in 12 , which is a T1ρ magnetization-prepared fast gradient-echo sequence 2 .

Discussion
The proposed approach delivers efficacious sampling patterns for high-resolution or quantitative parallel MRI problems. Compared to previous greedy approaches for parallel MRI, as in 56,57 , BASS is able to optimize much larger SPs using larger datasets, while spending less computational time (Fig. 3a-d). The proposed approach for subset selection is effective because it uses a smart selection of new elements in the SP updating process: candidates that are most likely to reduce the cost function are tried first. The obtained efficient solution may have minor differences depending on the initial SP (Fig. 4), but the optimized SPs tend to reach the same final quality if more iterations are used (Fig. 3d). Adding and removing multiple elements at each iteration is beneficial for fast convergence in the initial iterations (Fig. 3e).
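The "candidates tried first" idea can be sketched as a ranking of the non-sampled k-space points by their current reconstruction error at those points. The function name and the per-point error input below are hypothetical placeholders for illustration, not the paper's actual code.

```python
import numpy as np

def rank_candidates(sp, err_per_point, n_best):
    """Order the non-sampled points by current k-space error, largest
    first, so the candidates most likely to reduce the cost are tried
    before the rest (the 'bias' in bias-accelerated subset selection).

    sp            : boolean mask over k-space points (True = sampled)
    err_per_point : per-point error of the current reconstruction
    n_best        : how many top candidates to return
    """
    outside = np.flatnonzero(~sp)              # non-sampled points
    order = np.argsort(-err_per_point[outside])  # descending error
    return outside[order][:n_best]
```

Trying high-error points first means most accepted swaps happen early in each pass, which is one way a subset-selection loop can avoid the exhaustive candidate evaluation that makes greedy approaches expensive.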
The cost function in (2) evaluates the error in k-space, not in the image domain. This may not be sufficiently flexible, because it does not allow the specification of regions of interest in the image domain. Nevertheless, improvements measured by the image-domain criterion NRMSE were observed (Fig. 5). In different MRI applications, criteria other than (2) may be desired; the proposed algorithm can also be used with other criteria, such as the SSIM (Fig. 9).
The optimized SP varies with the reconstruction method (Figs. 6, 7 and 8) and with the optimization criterion (Fig. 9): thus sampling and reconstruction should be matched. This concept of matched sampling-reconstruction implies that comparing different reconstruction methods with the same SP is not a fair approach; instead, each MRI reconstruction method should be compared using its best possible SP. Note that the optimized SP improved the NRMSE by up to 45% in some cases (Fig. 5).
The experiments also show that optimizing the SP is more important at higher AFs. As seen in Fig. 5, optimizing the SP flattened the curves of error over AF, achieving a lower error at the same AF. For example, P-LORAKS with the optimized SP at AF = 20 obtained the same level of NRMSE as with the variable density SP at AF = 6, while CS-LR with the optimized SP at AF = 30 obtained the same level as with the Poisson disk SP at AF = 16, even after optimizing the parameters used to generate the Poisson disk SP. Thus it is possible to double the AF by optimizing the SP. A variable sampling rate over time is advantageous for T1ρ mapping, as seen in 91 ; it is interesting that the algorithm learned this, as shown in Figs. 8 and 10. It is also important to clarify that the results shown for the variable density, Poisson disk, combined variable density and Poisson disk, and adaptive SPs are the best obtained from a parameter optimization process spanning 50 epochs. With a simple guess of the parameters for these SPs, their performance can be poor. In contrast, BASS found efficient SPs at the same computational cost or less (10-50 epochs in Fig. 3a-d).
The lower computational cost and rapid convergence of BASS make it feasible to learn the optimal SP for various reconstruction methods on the same anatomy. Thus one can better decide which matched sampling and reconstruction combination is most effective for a specific anatomy and contrast at the desired AF. Many questions regarding the best way to sample in accelerated MRI can be answered with the help of machine learning algorithms such as BASS. Learned SPs are key elements in making higher AFs available in clinical scanners for translational research.

Conclusion
We proposed a data-driven approach for learning the sampling pattern in parallel MRI. It has a low computational cost and converges quickly, enabling the use of large datasets to optimize large sampling patterns, which is important for high-resolution Cartesian 3D-MRI and for quantitative and dynamic MRI applications. The approach considers measurements of specific anatomy and assumes a specific reconstruction method. Our experiments show that the optimized SPs differ across reconstruction methods, suggesting that matching the sampling to the reconstruction method is important. The approach improves the acceleration factor and helps find the best SP for reconstruction methods in various applications of parallel MRI.