Main

Steatosis is a common finding in liver biopsies, both clinically and in experimental studies, and pathologists are often asked to report the extent of fat accumulation. The degree of steatosis is commonly assessed by visually and semiquantitatively estimating the area of the section of a needle biopsy that is occupied by fat vacuoles.1 The result is often expressed as ‘the percent of hepatocytes in the biopsy involved’ and graded 0–3, where 0 is none, 1 is up to 33%, 2 is 33–66% and 3 is >66%.2 Such grading is also recommended by the American Gastroenterological Association for assessing nonalcoholic fatty liver disease.3 It should be pointed out, however, that hepatocytes with fat vacuoles are not actually counted; the grading is based on the estimated area of fat vacuoles (Brunt EM, 2003, personal communication).
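The grading scheme above amounts to a threshold mapping from an estimated fat area to a grade. The Python sketch below (the function name steatosis_grade is hypothetical) encodes the cut-offs quoted in the text; placing values of exactly 33% and 66% in the lower grade is an assumption, since the scheme does not say how boundaries are handled.

```python
def steatosis_grade(fat_area_percent: float) -> int:
    """Map an estimated fat-vacuole area (%) to the 0-3 grade quoted in the text.

    Cut-offs: 0 = none, 1 = up to 33%, 2 = 33-66%, 3 = >66%.
    Assumption: values exactly at 33% or 66% are put in the lower grade.
    """
    if not 0.0 <= fat_area_percent <= 100.0:
        raise ValueError("fat_area_percent must lie between 0 and 100")
    if fat_area_percent == 0.0:
        return 0
    if fat_area_percent <= 33.0:
        return 1
    if fat_area_percent <= 66.0:
        return 2
    return 3


# Hypothetical usage: grade a few estimated area percentages.
for pct in (0.0, 12.0, 45.3, 80.0):
    print(pct, steatosis_grade(pct))
```

Applied literally to a measured area fraction, this mapping would send 23.1%, the mean point counting value of the score 3 group reported in the Discussion, to grade 1, which illustrates the gap between estimated and measured fat area that this study examines.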

In several studies, objective computer-assisted methods4, 5 and stereological point counting6, 7 have been used to quantify the degree of steatosis. These studies have shown that fat vacuoles usually occupy no more than 10–20% of the sectional area (volume density).4, 5 Semiquantitative grading therefore appears to greatly overestimate the amount of fat in the liver. Furthermore, semiquantitative scoring of steatosis shows substantial inter- and intraobserver variation.4, 8, 9, 10, 11

The aim of the present study was to examine in further detail the discrepancies between semiquantitative scoring of liver steatosis and morphometric/stereological assessment of the area fraction of steatosis, which equals the volume fraction of steatosis. The reproducibility of the two techniques was also determined, using kappa statistics for the scoring and the intraclass correlation coefficient for point counting.
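The equality of area and volume fractions rests on the Delesse principle, and point counting estimates the area fraction by the fraction of grid points that hit the phase of interest. Written out, with P_fat the number of points hitting fat vacuoles and P_ref the number hitting the reference space in image i of m images (notation introduced here, not taken from the study), the estimator reads as follows; the equalities hold in expectation for randomly positioned sections and grids.

```latex
\[
V_V \;=\; \mathbb{E}\!\left[A_A\right] \;=\; \mathbb{E}\!\left[P_P\right],
\qquad
\widehat{V_V}\,(\%) \;=\; 100 \times
\frac{\sum_{i=1}^{m} P_{\mathrm{fat},i}}{\sum_{i=1}^{m} P_{\mathrm{ref},i}}
\]
```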

Materials and methods

A total of 75 liver biopsies from archived slides stained with hematoxylin–eosin were used. They were selected according to the originally diagnosed grade of steatosis: 25 each with slight, moderate and severe steatosis (grades 1, 2 and 3). The local ethics committee in Örebro, Sweden approved the study.

The specimens were blinded and graded twice for steatosis (both macro- and microvesicular) by one pathologist (LF), with an interval of 2 months between the evaluations. The degree of steatosis was graded 0–3 based on the area of the needle biopsy section occupied by fat vacuoles.1 Biopsies without steatosis were not included, so no specimen was assigned score 0.

A Leica DMRXA 2 microscope with a Leica DC 200 digital camera was used for image capture. Ten images were captured from each biopsy and stored on a computer using Adobe Photoshop 6.0. The first field of view was chosen at the end of the biopsy nearest the end of the microscope slide. After the first image had been captured, the next field of view was reached by moving 1.25 fields of view along the long axis of the biopsy, so that the images did not overlap. This procedure was repeated until 10 images had been captured. A point grid of 100 crosses, 35 μm apart, was superimposed on each image, and the final magnification on the computer screen during counting was ×400. The number of grid points hitting fat vacuoles in hepatocytes (both macro- and microvesicular) and hitting normal hepatocytes was counted. Ballooned hepatocytes were omitted; they can almost always be distinguished from macrovesicular steatotic cells by their centrally placed nuclei and by fragmented membranes or organelles in the cytoplasm. Hits on damaged tissue and on larger areas of connective tissue were excluded. The results are given as the percentage of biopsy area with fat deposition. Images from 20 randomly chosen specimens were recounted to assess the reproducibility of point counting on the same images.
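A minimal Python sketch of how the hit counts could be turned into the reported percentage and into the between-image coefficient of variation shown in Table 1. Two assumptions are not stated in the text: the reference space is taken as hits on fat vacuoles plus hits on non-steatotic hepatocytes, and the biopsy value is the ratio of pooled counts over all 10 images rather than the mean of per-image percentages. The function names and counts are hypothetical.

```python
from statistics import mean, stdev


def fat_percentage(fat_hits, hepatocyte_hits):
    """Percentage of the reference area occupied by fat, pooled over all images.

    Assumption: reference space = hits on fat vacuoles + hits on non-steatotic
    hepatocytes; ballooned cells, damaged tissue and larger areas of
    connective tissue are simply left out of both counts.
    """
    if len(fat_hits) != len(hepatocyte_hits):
        raise ValueError("need one pair of counts per image")
    total_fat = sum(fat_hits)
    total_reference = total_fat + sum(hepatocyte_hits)
    if total_reference == 0:
        raise ValueError("no grid points hit the reference space")
    return 100.0 * total_fat / total_reference


def between_image_cv(fat_hits, hepatocyte_hits):
    """Coefficient of variation (%) of per-image fat percentages within one biopsy.

    Requires at least two images, each with at least one reference hit,
    and a non-zero mean fat percentage.
    """
    per_image = [100.0 * f / (f + h) for f, h in zip(fat_hits, hepatocyte_hits)]
    return 100.0 * stdev(per_image) / mean(per_image)


# Hypothetical counts for the 10 images of one biopsy (100-point grid each).
fat = [4, 9, 15, 2, 11, 7, 20, 5, 13, 6]
hep = [88, 85, 80, 90, 84, 86, 74, 89, 81, 88]
print(round(fat_percentage(fat, hep), 1), round(between_image_cv(fat, hep), 1))
```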

Finally, 20 specimens were selected at random and a new set of 10 images was captured from each. These images were counted as above, and the results were used to assess the reproducibility of point counting when new fields are sampled.

Statistics

Agreement between the two scoring sessions was analysed with the kappa coefficient, in both its unweighted and its weighted form, using quadratic weights.12 The percentage of absolute agreement was also calculated. All estimates of agreement were supplemented with 95% confidence intervals (95% CI) to account for sampling variability.
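For readers who want to reproduce these statistics, here is a plain-Python sketch of percent agreement and Cohen's kappa, unweighted and with quadratic weights, written in terms of observed and chance-expected disagreement. Quadratic weights assume ordered, equally spaced categories, as grades 1–3 are. The function names are illustrative, and confidence intervals are not computed here.

```python
def percent_agreement(r1, r2):
    """Percentage of specimens given the same score in both sessions."""
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2, categories, weights="none"):
    """Cohen's kappa; weights is "none" (unweighted) or "quadratic"."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    if weights not in ("none", "quadratic"):
        raise ValueError("weights must be 'none' or 'quadratic'")
    k, n = len(categories), len(r1)
    index = {c: i for i, c in enumerate(categories)}
    # Joint proportions of (session 1 score, session 2 score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Quadratic: squared distance between grades; unweighted: 0/1.
        if weights == "quadratic":
            return float((i - j) ** 2)
        return 0.0 if i == j else 1.0

    obs_dis = sum(disagreement(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - obs_dis / exp_dis
```

For instance, two sessions scoring three biopsies as 1, 2, 3 and 1, 3, 3 give an unweighted kappa of 0.5 but a quadratically weighted kappa of 0.8, because the single disagreement is only one grade apart.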

For the point counting technique, which yields a continuous quantitative variable, agreement was estimated with the intraclass correlation coefficient (ICC),13 also supplemented with 95% CI. As described by Fleiss,12 the weighted kappa with quadratic weights and the ICC have similar formulas and interpretations, and values close to or above 0.75 signify very good to excellent agreement.
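The text does not state which ICC form was used. As an assumption, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1) after Shrout and Fleiss) for two counting sessions from ANOVA mean squares; other forms (one-way, or consistency rather than agreement) give different values. The function name is hypothetical.

```python
def icc_agreement(session1, session2):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Rows are specimens; the two columns are the two counting sessions.
    """
    x, y = list(session1), list(session2)
    n, k = len(x), 2
    if n != len(y) or n < 2:
        raise ValueError("need paired values for at least two specimens")
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between specimens
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because absolute agreement penalizes systematic offsets, sessions of 1, 2, 3 and 2, 3, 4 give an ICC of about 0.67 even though the two sessions are perfectly correlated.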

Results

Semiquantitative evaluation of the degree of steatosis gave the following scores: 21 specimens obtained score 1, 20 specimens score 2 and 34 specimens score 3. This compares with the original scoring, which had 25 specimens in each group.

The mean, minimum and maximum point counting values for each scoring group are shown in Table 1. The coefficients of variation of the fat vacuole density among images from the same biopsy were high (Table 1), indicating an uneven distribution of steatosis within individual liver specimens. Because these coefficients of variation decreased with increasing steatosis score, the uneven distribution was most pronounced in low-grade steatosis. When the scoring results were related to those obtained with point counting (Figure 1), a second-degree polynomial gave a good fit to the data.
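The second-degree polynomial fit can be sketched with NumPy's polyfit. The orientation (score regressed on point counting percentage) is an assumption, since the text does not say which variable was treated as the response; the function name is hypothetical, and it also returns R² as a simple goodness-of-fit measure.

```python
import numpy as np


def fit_second_degree(fat_percent, score):
    """Least-squares quadratic score = a2*x**2 + a1*x + a0, plus R**2."""
    x = np.asarray(fat_percent, dtype=float)
    y = np.asarray(score, dtype=float)
    coeffs = np.polyfit(x, y, deg=2)  # returned highest power first: a2, a1, a0
    fitted = np.polyval(coeffs, x)
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot
```

Called with per-specimen point counting percentages and scores, it returns the three coefficients and R²; with only three score levels, a strongly curved fit mainly reflects how quickly the score saturates as fat density increases.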

Table 1 Basic characteristics of the point counting technique, stratified by semiquantitative score (grades 1–3)
Figure 1 Scatterplot of point counting results vs semiquantitative scoring results. A second-degree polynomial fitted to the observations is shown as a dashed line.

There was substantial overlap (n=18) between score groups 2 and 3 (Figure 1). The overlapping specimens were re-examined and compared to search for a morphological characteristic that could explain their assignment to one group or the other irrespective of the amount of fat measured by point counting. No such feature was found. The degree of fibrosis did not differ between groups 2 and 3 or between the overlapping specimens (data not shown). Nor did the coefficients of variation, that is, the unevenness of the fat distribution, differ among the overlapping specimens. No specimens overlapped between scores 1 and 2 (Figure 1).

The semiquantitative scoring was performed twice, 2 months apart; the agreement was 81% (95% CI 72–90%), the unweighted kappa was 0.71 (95% CI 0.58–0.85) and the weighted kappa with quadratic weights was 0.87 (95% CI 0.81–0.94). When the images of 20 randomly chosen specimens were recounted and compared with the initial counting, the ICC was 0.99 (95% CI 0.98–1.00). When a new set of 10 images was captured from each of 20 randomly chosen specimens, the ICC between the original and the new counts was 0.95 (95% CI 0.87–1.00).
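The text does not say how the confidence interval for percent agreement was obtained. One common choice is the normal-approximation (Wald) interval for a proportion, sketched below under the assumption that 61 of the 75 paired scores agreed, the count closest to the reported 81%. The function name is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) CI, clipped to [0, 1]."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed count: 61 of 75 paired scores in agreement.
p, low, high = wald_interval(61, 75)
print(f"{100 * p:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

With these inputs the sketch prints 81.3% (95% CI 72.5-90.2%), close to but not identical with the reported 72–90%, so a different interval method or rounding may have been used.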

Discussion

Steatosis is a frequent finding in liver biopsies, and the extent of involvement is often requested. Traditionally, the degree of steatosis is assessed by visual, semiquantitative estimation of the area of the section occupied by fat globules in hepatocytes1 or of the ‘percent of hepatocytes’ in the biopsy involved.2 An alternative approach is to measure the area fraction of the section involved with steatosis, either by stereological point counting or by image analysis with thresholding of areas containing fat globules. In the present study, we examined the discrepancies between semiquantitative grading and point counting, and the reproducibility of the two methods.

A few earlier studies have compared the semiquantitative and the quantitative approaches to assessing the degree of steatosis in liver biopsies. Auger et al4 used 2 μm thick sections from plastic-embedded tissue stained with hematoxylin–eosin. Their automated analysis was based on thresholding of unstained areas, automatic omission of empty sinusoidal spaces using a ‘form factor’ and manual exclusion of vessels containing red blood cells. The semiquantitative grading was performed by two pathologists, who estimated the percentage of fatty hepatocytes on a 10-grade scale. The pathologists' scores ranged from 0 to 80%, whereas the automatically calculated densities were much lower, ranging from 0 to 15%. Kumar et al5 studied biopsies from patients with hepatitis C, in which the highest semiquantitative score was 2. They found that the maximum area fraction of fat in the score 2 group was approximately 11%, and the correlation between percent steatosis and histological grade was high (r=0.87). In the present study, the score 3 group had a mean fat density of 23.1% and a maximum of 45.3%. These studies thus show that semiquantitative estimates greatly overestimate the true density of fat accumulation in liver tissue.

The quantitative methods used to estimate fat accumulation in the liver have been image analysis with thresholding and stereological point counting. Thresholding demarcates the unstained areas of the section, most of which are fat globules. One drawback of this approach is that sinusoids and vessel and bile duct lumina also remain unstained. Sinusoids can be excluded automatically because their ‘form factor’ in image analysis differs markedly from that of fat globules, but vessel and bile duct lumina have to be excluded from the measuring area manually.4, 5 One way to circumvent this problem would be to stain fat specifically with osmium tetroxide, which stains it black, and to embed the tissue in resin.14 However, this requires the biopsies to be handled separately from routine biopsy material, and osmium tetroxide is hazardous to handle.
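The ‘form factor’ criterion can be illustrated with circularity, 4πA/P², which is 1 for a circle and approaches 0 for long, thin profiles such as sinusoids. Whether the cited studies used exactly this definition is not stated, and the cut-off of 0.6 below is a hypothetical value; object areas and perimeters are assumed to come from a prior thresholding and labelling step that is not shown. Function names are illustrative.

```python
import math


def form_factor(area, perimeter):
    """Circularity 4*pi*A/P**2: 1.0 for a circle, near 0 for elongated shapes."""
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


def fat_area_fraction(objects, measured_area, min_form_factor=0.6):
    """Fraction of measured_area covered by unstained objects round enough to count as fat.

    objects: iterable of (area, perimeter) pairs for thresholded unstained regions.
    min_form_factor: hypothetical cut-off separating round fat globules from sinusoids.
    """
    fat = sum(a for a, p in objects if form_factor(a, p) >= min_form_factor)
    return fat / measured_area
```

A round globule of radius 5 has a form factor of 1.0 and is kept, whereas a 100 × 2 slit-like profile scores about 0.06 and is rejected; vessel and bile duct lumina, which can be round, would still pass such a filter, which is why they need manual exclusion.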

Point counting has also been used to quantify liver steatosis.6, 7, 15 The technique is simple and can be performed either by feeding images into a computer and using an overlay lattice, as in this study, or by fitting one eyepiece of the microscope with a graticule carrying a point lattice and counting directly in the microscope. Different structures in the liver are identified immediately and can be included in or excluded from the count. Another advantage of point counting is that the area of fibrosis can be assessed at the same time by counting points hitting connective tissue. We therefore prefer the point counting technique: it is simpler, it needs no camera or computer when performed with an eyepiece graticule, it measures the area/volume fraction of fat directly and it does not require interventions such as manual exclusion of structures.

Reproducibility is of great importance in all quantification. Semiquantitative scoring methods generally show moderate or low reproducibility and high inter- and intraobserver variability. In one study, two pathologists evaluated 41 morphological characteristics in liver biopsies from 362 alcoholic patients.8 The kappa values ranged from 0.13 (size of liver cell) to 1 (presence of hepatocellular carcinoma). The degree of steatosis was agreed upon in 48% of the cases, with a kappa value of 0.35. The interobserver kappa for steatosis was 0.63 in a study by the French METAVIR study group,9 and others have reported interobserver kappa values for steatosis of 0.508 and 0.64.10, 11 Intraobserver kappa values for the degree of steatosis are also low: 0.42 and 0.63–0.65.9, 11 In the present study, the agreement between the two semiquantitative evaluations was 81% and the unweighted kappa was 0.71, which is similar to or slightly higher than previously reported values. The ICC, which is comparable to the weighted kappa with quadratic weights, was 0.99 when the images from 20 specimens were recounted with point counting. When new sets of images were captured from 20 randomly chosen specimens and the point counting results were compared with those of the first evaluation, the ICC was still very high, 0.95. These results show that point counting has a very high reproducibility and is superior to semiquantitative scoring.

A very precise measurement of the degree of liver steatosis is mostly of interest in experimental studies and is usually not needed in clinical practice. However, in orthotopic liver transplantation, the degree of steatosis of the donor liver significantly influences the outcome,16, 17 possibly owing to impaired hepatic microcirculation.18 Marsman et al19 recently performed a retrospective study of steatosis in cadaveric liver transplants, comparing automated measurement of steatosis with the semiquantitative grading of a pathologist. As shown earlier and discussed above, the pathologist consistently overestimated the degree of steatosis. They concluded that an automated analysis system can be used to determine the fat content of liver biopsies, but that further studies are needed to define the role of such a technique in evaluating donor livers for transplantation. They used specimens stained with hematoxylin–eosin, as in earlier studies with automated analysis,4, 5 but did not state how hollow structures with lumina, such as sinusoids, vessels and bile ducts, were omitted from the measurements.

In conclusion, the present study highlights the difference in reproducibility between scoring methods and point counting when assessing the degree of steatosis in liver biopsies. The point counting technique is simple, superior to scoring techniques and also preferable to image analysis with thresholding. Point counting can be used when accurate measurements of liver steatosis are required, such as in biopsies from prospective donor livers and in studies involving histological evaluation of liver pathology.