Significant difference between three observers in the assessment of intraepidermal nerve fiber density in skin biopsy
- Sigrid Wöpking†1,
- Andrea Scherens†1,
- Ida S Haußleiter2,
- Helmut Richter1,
- Julia Schüning1,
- Sabrina Klauenberg1 and
- Christoph Maier1
© Wöpking et al; licensee BioMed Central Ltd. 2009
Received: 08 July 2008
Accepted: 31 March 2009
Published: 31 March 2009
The determination of intraepidermal nerve fiber density (IENFD) in skin biopsy is a useful method for evaluating different types of peripheral neuropathy. For the method to be used reliably, its interobserver reliability must be determined. Previous studies on this topic used statistical methods of limited suitability.
In the present study, three observers determined the IENFD and rated the staining quality of the basement membrane in 120 skin biopsies (stained with an indirect immunofluorescence technique) from 68 patients. More appropriate statistical methods, namely the intraclass correlation coefficient and the Bland-Altman plot, were chosen to estimate interobserver reliability.
We found an unexpected significant difference in IENFD between the observers (p < 0.05); interobserver reliability (intraclass correlation coefficient: 0.73) was therefore not in line with the high values reported previously. The Bland-Altman plot showed a variance that increased with the mean. The difference in IENFD between the observers and the resulting low interobserver reliability are most likely caused by different interpretations of the standard counting rules. There was no significant difference in IENFD between observers for biopsies with a well-defined basement membrane. Skin biopsies with an inexactly defined basement membrane should therefore not be used diagnostically for the determination of IENFD.
These results emphasise that standardisation of the method is extremely important and that at least two observers should analyse skin biopsies with critical IENFD values near the cut-off.
Although numerous patients in pain or neurology departments present with typical neuropathic symptoms such as paraesthesia and dysaesthesia, conventional diagnostic methods such as nerve conduction studies and electromyography often show no pathological findings [1–4]. Immunohistochemical visualisation of intraepidermal nerve fibers (IENF) in skin biopsy and quantitative sensory testing (QST) are two newer diagnostic methods that can objectively document the disorders of some of these patients. In 2005, Lauria et al. published the guidelines of the European Federation of Neurological Societies (EFNS) on the use of skin biopsy and the determination of IENF density (IENFD) in the diagnosis of peripheral neuropathy. For this method to be used reliably, its methodological quality criteria must be checked. In particular, reliability, as one aspect of methodological accuracy, has to be determined, e.g. by calculating the interobserver reliability: two or more observers perform the same assessment, and their agreement is subsequently analysed.
A few previous studies have addressed interobserver reliability [7–10], some of them by calculating correlation coefficients [7, 10]. Such correlation coefficients are of limited value for determining interobserver reliability, since systematic level differences between observers go unnoticed and extreme values can suggest a spuriously high reliability. Smith et al. calculated the intraclass correlation coefficient and the relative intertrial variability (RIV) to determine interobserver reliability. The RIV is calculated as $\mathrm{RIV} = \frac{\mathrm{IENFD}_1 - \mathrm{IENFD}_2}{\overline{\mathrm{IENFD}}} \times 100\,\%$, where $\overline{\mathrm{IENFD}}$ is the mean IENFD, and values below 10% indicate a high degree of reproducibility. Because the difference is scaled by the mean, small absolute differences at low IENFD values appear as high percentages, whereas equivalent absolute differences at high IENFD values appear as lower percentages. This approach can lead to an incorrect estimation of reliability. Gøransson et al. estimated interobserver reliability by calculating the absolute difference between the IENFD results of two observers.
Given the limited suitability of the statistical methods applied so far, the interobserver reliability of IENFD determination in skin biopsy still needs to be demonstrated adequately. To this end, three independent observers analysed a sufficiently large number of biopsies, and more appropriate statistical methods were chosen to assess whether skin biopsy can be used reliably in clinical diagnostics.
Table 1: Demographic characteristics of patients with polyneuropathy, nerve injury at the lower limb, fibromyalgia and arthritis
 | Polyneuropathy | Nerve injury | Fibromyalgia | Arthritis | Total
Number of subjects (n) | | | | |
Age (years), mean ± SD | 58.96 ± 10.62 | 41 ± 13.29 | 49.06 ± 10.25 | 57.23 ± 10.19 | 52.31 ± 13.07
Age (years), range | | | | |
Sex, male (n) | | | | |
2.2 Skin biopsy
The skin biopsy procedure followed the protocol of Vlckova-Moravcova et al., a modified version of the original EFNS guidelines. An indirect immunofluorescence technique was used. Two samples were taken from each patient, one from an affected and one from an unaffected skin area. In patients with polyneuropathy, fibromyalgia and arthritis, biopsies were therefore taken from the dorsolateral foot and from the back (dermatome L4). The very distal biopsy site at the foot was chosen because all patients had complaints in this area, whereas not all had complaints at the lower leg, the standard biopsy site recommended by the EFNS guidelines. As a level A recommendation, those guidelines also suggest taking an additional biopsy from an unaffected site in patients with generalised diseases to provide information about a length-dependent process. The L4 dermatome, which was the least affected area in most patients, was assessed as the second site. In patients with nerve injury, biopsies were taken bilaterally from the foot (dorsolateral or dorsomedial) or the lateral thigh.
After local injection of 2% lidocaine, biopsies were taken under sterile conditions with a 3 mm biopsy punch (Stiefel GmbH, Offenbach, Germany). Tissue was fixed in 4% phosphate-buffered paraformaldehyde for 3–4 hours and cryoprotected in 10% sucrose at 4°C overnight. The skin samples were then embedded in TissueTek®, frozen in 2-methylbutane cooled in liquid nitrogen and stored at −70°C until further processing. Sections of 40 μm thickness were cut on a sliding microtome and immunostained with rabbit polyclonal antibodies to human PGP 9.5 (Ultraclone, UK, 1:800) as the primary antibody, which was detected with a Cyanine 3 (Cy3)-labelled secondary antibody (Jackson ImmunoResearch, USA).
The intraepidermal nerve fibers were counted manually in two sections of approximately 3 mm length each by three independent observers (MF, ISH, SW), who had been trained at an approved skin biopsy laboratory (Department of Neurology, University of Würzburg, Germany). To determine interobserver reliability, counting was performed blinded at 400× magnification with a Zeiss Axiophot 2 microscope, following the standard counting rules agreed on in the 2005 European guidelines. Samples were evaluated only if all observers judged the staining quality of both sections to be satisfactory (e.g. distinct discrimination of dermis and epidermis, clearly visualised nerve fibers); samples that at least one observer judged to be of poor quality for counting (e.g. poorly stained nerve fibers or basement membrane) were excluded from the determination of interobserver reliability. The epidermal length was measured with Image Pro Plus 4.0 software (Media Cybernetics, Leiden, The Netherlands), and the average IENFD was then calculated as the number of fibers per mm of epidermal length. IENFD results from foot biopsies were compared with published control data, as in a previous study, and classified as pathological if IENFD was below 9 fibers/mm.
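The per-biopsy calculation and the foot cut-off can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not say whether per-section densities are averaged or counts and lengths are pooled, so averaging per-section densities is assumed here. The counts, lengths and the names `Section`, `biopsy_ienfd` and `is_pathological_foot` are hypothetical.

```python
"""Sketch: IENFD per biopsy and classification against the foot cut-off.

Assumption: the biopsy IENFD is the mean of the per-section densities.
All counts and lengths are hypothetical.
"""
from dataclasses import dataclass
from statistics import mean

FOOT_CUTOFF_FIBERS_PER_MM = 9.0  # cut-off applied to foot biopsies in this study


@dataclass
class Section:
    fiber_count: int            # intraepidermal nerve fibers counted manually
    epidermal_length_mm: float  # epidermal length measured by image analysis


def section_density(section: Section) -> float:
    """Fibers per mm of epidermal length for one section."""
    return section.fiber_count / section.epidermal_length_mm


def biopsy_ienfd(sections: list[Section]) -> float:
    """Mean IENFD (fibers/mm) over the counted sections of one biopsy."""
    return mean(section_density(s) for s in sections)


def is_pathological_foot(ienfd: float, cutoff: float = FOOT_CUTOFF_FIBERS_PER_MM) -> bool:
    """Foot biopsies with IENFD below the cut-off are classified as pathological."""
    return ienfd < cutoff


biopsy = [Section(12, 3.0), Section(15, 2.5)]  # 4.0 and 6.0 fibers/mm
value = biopsy_ienfd(biopsy)
print(f"IENFD = {value:.2f} fibers/mm, pathological: {is_pathological_foot(value)}")
```

Pooling instead (27 fibers over 5.5 mm) would give about 4.91 fibers/mm rather than 5.00, so the choice matters somewhat when section lengths differ.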
In addition, every observer rated the definition of the basement membrane in each biopsy as 'well', 'moderately' or 'inexactly' defined. Overall, a basement membrane was considered 'well defined' if at least two observers rated it so.
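The consensus rule amounts to a majority vote among the three observers. A minimal Python sketch follows; the function name `consensus_well_defined` and the rating labels are illustrative, not taken from the study's software.

```python
"""Sketch of the basement-membrane consensus rule (names are illustrative)."""
from collections import Counter

RATINGS = ("well", "moderately", "inexactly")


def consensus_well_defined(ratings: list[str], min_votes: int = 2) -> bool:
    """True if at least `min_votes` observers rated the membrane 'well' defined."""
    unknown = set(ratings) - set(RATINGS)
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")
    return Counter(ratings)["well"] >= min_votes


print(consensus_well_defined(["well", "well", "inexactly"]))        # two 'well' votes
print(consensus_well_defined(["well", "moderately", "inexactly"]))  # one 'well' vote
```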
2.3 Data analysis
All statistical analyses were performed with Statistica, release 7.1 for Windows (StatSoft Inc., USA), and the Statistical Package for the Social Sciences (SPSS 12). Differences between observers were analysed using a one-way analysis of variance (ANOVA). Because homogeneity of variance could not be established, post hoc comparisons were performed with Dunnett's T3 test. P values < 0.05 were considered significant. Since IENFD values from adjacent sections of one biopsy show a high degree of association, the accuracy of each observer was estimated by calculating the standard deviation between the two sections of each biopsy (intersection variability) and the relative standard deviation (SD/mean). To show how the variance increases with the mean, the results were presented as Bland-Altman plots. Interobserver reliability was measured by calculating the intraclass correlation coefficient using the absolute-agreement definition. To allow comparison with previous studies, correlation coefficients and the RIV were also calculated; RIV values below 10% indicate a high degree of reproducibility.
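The agreement statistics named above can be sketched in plain Python as follows. This is not the Statistica or SPSS routine the study used, and it rests on two assumptions: the ICC is taken as the two-way, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), because the text does not say whether single or average measures were reported, and the Bland-Altman limits of agreement are bias ± 1.96 SD. The function names are illustrative. As a check, the worked example from Shrout and Fleiss (6 targets, 4 judges) should reproduce their published ICC(2,1) of about 0.29.

```python
"""Sketch of intersection SD, ICC(2,1) absolute agreement and Bland-Altman limits."""
from statistics import mean, stdev


def intersection_sd(section_1: float, section_2: float) -> tuple[float, float]:
    """SD between the two section IENFDs of one biopsy, and SD/mean."""
    sd = stdev([section_1, section_2])
    m = mean([section_1, section_2])
    return sd, (sd / m if m else float("nan"))  # relative SD undefined at mean 0


def icc_absolute_agreement(data: list[list[float]]) -> float:
    """ICC(2,1); rows are biopsies (targets), columns are observers."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # The ms_cols term penalises systematic (level) differences between observers.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(a: list[float], b: list[float]):
    """Pairwise means (x-axis), differences (y-axis), bias and 95% limits."""
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Published worked example (Shrout and Fleiss, 1979): 6 targets rated by 4 judges.
shrout_fleiss = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_absolute_agreement(shrout_fleiss):.2f}")
```

The absolute-agreement form is what separates the ICC from a correlation coefficient: for the rows [[4, 2], [6, 4], [9, 7]], where the second observer counts 2 fibers/mm fewer on every biopsy, the function returns 0.76, whereas Pearson's r for the same pairs is 1.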
A total of 120 biopsies from 68 patients were analysed (number of biopsies: polyneuropathy, n = 44; nerve injury at the lower limb, n = 25; fibromyalgia, n = 30; arthritis, n = 21). Sixteen biopsies had to be excluded due to poor quality.
Table 2: IENFD (fibers/mm) by observer for the skin biopsy sites foot, back and thigh, and for the total data
Skin biopsy site | Observer 1 | Observer 2 | Observer 3
Foot (n = 71), mean ± SD | 4.88 ± 4.1 | 5.16 ± 4.00 | 3.57 ± 2.56
Foot, intersection SD | | |
Foot, relative intersection SD | | |
Back (n = 45), mean ± SD | 15.97 ± 11.63 | 17.74 ± 11.73 | 12.20 ± 4.78
Back, intersection SD | | |
Back, relative intersection SD | | |
Thigh (n = 4), mean ± SD | 4.15 ± 2.49 | 5.69 ± 4.21 | 2.12 ± 2.68
Total (n = 120), mean ± SD | 9.01 ± 9.48 | 9.89 ± 9.93 | 6.76 ± 5.52
Total, intersection SD | | |
Total, relative intersection SD | | |
The intersection variability differed significantly between the observers for both the foot data and the complete data, with observer 3 showing the lowest values and observer 2 the highest (Table 2). For the foot data, post hoc tests revealed a significant difference between observer 3 and each of the other two observers, whereas for the complete data the intersection variability differed significantly between observers 2 and 3.
Comparing the IENFD results of the 71 foot biopsies with published control data showed that the significant interobserver difference would produce different rates of pathological results: observer 3 would classify 68 biopsies as pathological, compared with 62 and 63 for the other observers. Because the control data were obtained from the distal calf, the accuracy of this comparison might be limited.
IENFD (fibers/mm) by observer for punches with a well-defined dermal-epidermal basement membrane (n = 35)
 | Observer 1 | Observer 2 | Observer 3
Mean ± SD | 8.74 ± 7.72 | 9.82 ± 7.93 | 6.85 ± 5.2
Our results revealed an unexpected significant difference in IENFD between the three observers. Despite having received the same training, the observers most likely interpreted the standard counting rules differently.
Because the observer reporting the lowest IENFD values also showed the lowest intersection variability, and hence the highest accuracy, a strict interpretation of the counting rules might be more reliable.
Other groups reported higher interobserver reliability, with correlation coefficients of 0.86–0.96 [7, 10]. In further studies, the RIV was 9.6%, the intraclass correlation coefficient 0.98, and the mean difference between the IENFD results of two observers 0.4 ± 1.5 fibers/mm. The low interobserver reliability in our study was probably caused by the significant interobserver difference in IENFD described above. Counting three sections, as recommended by the EFNS guidelines, might have yielded higher interobserver reliability; given the pronounced difference between the observers, however, the results would probably have been similar. Furthermore, an additional analysis of intraobserver reliability would allow a more accurate interpretation of the interobserver reliability.
Evaluating the quality of the basement membrane before counting the intraepidermal nerve fibers could be one approach to improving methodological accuracy. Our results suggest that interobserver reliability is higher if the basement membrane is well defined. Consequently, skin biopsies with an inexactly visualised basement membrane should not be used for the determination of IENFD in clinical diagnostics or scientific studies. However, the number of biopsies with a well-defined basement membrane was quite small in our study, and interobserver reliability improved only slightly.
Another way to avoid inaccurate IENF counting caused by an inexactly defined basement membrane might be to visualise the basement membrane with antibodies against collagen IV and confocal microscopy.
In summary, the determination of IENFD by skin biopsy is a useful method for investigating different types of peripheral neuropathy, but our results show that standardisation of the method is extremely important. However, the number of biopsies in our study was quite small, and we used a modified version of the original EFNS guidelines. Our results are therefore based on a limited number of patients, but they lead us to the following conclusions. To avoid inaccurate IENFD counting, clear inclusion and exclusion criteria for skin biopsy samples should be defined. The EFNS guidelines recommend applying the counting protocol described by Kennedy et al. Our results show that consensus is needed on how the counting rules are interpreted in biopsies in which the skin innervation is less clearly visualised. We recommend that observers undergo thorough training and that intraobserver reliability be demonstrated by intra-laboratory assessment, so that individuals do not interpret the counting rules differently. Nevertheless, IENFD counting may remain partly subjective. Skin biopsies with critical IENFD values (near the cut-off values) should therefore be analysed jointly by at least two observers. Furthermore, mandatory external quality control of skin biopsy laboratories, e.g. by interlaboratory comparison, should be enforced. Although interobserver reliability may not be an issue in experienced laboratories, consensus data applicable to all laboratories are still needed.
This work is part of the doctoral thesis of SW.
The authors would like to thank Michaela Fey for preparing and staining the biopsies, and Lars Eichler, Andreas Engelhardt and Kathryn Paterson for reviewing the English text.
- Holland NR, Crawford TO, Hauer P, Cornblath DR, Griffin JW, McArthur JC: Small-fiber sensory neuropathies: clinical course and neuropathology of idiopathic cases. Ann Neurol. 1998, 44: 47-59. 10.1002/ana.410440111.
- Gibbons CH, Griffin JW, Polydefkis M, Bonyhay I, Brown A, Hauer PE, Mc Arthur JC: The utility of skin biopsy for prediction of progression in suspected small fiber neuropathy. Neurology. 2006, 66: 256-8. 10.1212/01.wnl.0000194314.86486.a2.
- Fink E, Oaklander AL: Small-fiber neuropathy: answering the burning questions. Sci Aging Knowledge Environ. 2006, 2006 (6): pe7. 10.1126/sageke.2006.6.pe7.
- Scherens A, Maier C, Haussleiter IS, Schwenkreis P, Vlckova-Moravcova E, Baron R, Sommer C: Painful or painless lower limb dysesthesias are highly predictive of peripheral neuropathy: comparison of different diagnostic modalities. Eur J Pain. 2008, 10.1016/j.ejpain.2008.07.014.
- Løseth S, Lindal S, Stalberg E, Mellgren SI: Intraepidermal nerve fibre density, quantitative sensory testing and nerve conduction studies in a patient material with symptoms and signs of sensory polyneuropathy. Eur J Neurol. 2006, 13: 105-11. 10.1111/j.1468-1331.2006.01232.x.
- Lauria G, Cornblath DR, Johansson O, McArthur JC, Mellgren SI, Nolano M, Rosenberg N, Sommer C, European Federation of Neurological Societies: EFNS guidelines on the use of skin biopsy in the diagnosis of peripheral neuropathy. Eur J Neurol. 2005, 12: 747-58. 10.1111/j.1468-1331.2005.01260.x.
- McArthur JC, Stocks EA, Hauer P, Cornblath DR, Griffin JW: Epidermal nerve fiber density: normative reference range and diagnostic efficiency. Arch Neurol. 1998, 55: 1513-20. 10.1001/archneur.55.12.1513.
- Gøransson LG, Mellgren SI, Lindal S, Omdal R: The effect of age and gender on epidermal nerve fiber density. Neurology. 2004, 62: 774-7.
- Smith AG, Howard JR, Kroll JR, Ramachandran P, Hauer P, Singleton JR, McArthur J: The reliability of skin biopsy with measurement of intraepidermal nerve fiber density. J Neurol Sci. 2005, 228: 65-9. 10.1016/j.jns.2004.09.032.
- Koskinen M, Hietaharju A, Kyläniemi M, Peltola J, Rantala I, Udd B, Haapasalo H: A quantitative method for the assessment of intraepidermal nerve fibers in small-fiber neuropathy. J Neurol. 2005, 252: 789-94. 10.1007/s00415-005-0743-x.
- Weiß C: Die Korrelationsanalyse. Basiswissen Medizinische Statistik. Edited by: Weiß C. 2002, Germany: Springer Verlag, 79-82. 2nd edition.
- Vlcková-Moravcová E, Bednarik J, Dusek L, Toyka KV, Sommer C: Diagnostic validity of epidermal nerve fiber densities in painful sensory neuropathies. Muscle Nerve. 2008, 37: 50-60. 10.1002/mus.20889.
- Kennedy WR, Wendelschafter-Crabb G, Polydefkis M, Mc Arthur J: Pathology and quantitation of cutaneous nerves. Peripheral Neuropathy. Edited by: Dyck PJ, Thomas PK. 2005, Philadelphia: Saunders, 869-896.
- Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986, 1: 307-10.
- Shrout PE, Fleiss JL: Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin. 1979, 86: 420-428. 10.1037/0033-2909.86.2.420.
- Sommer C, Lauria G: Skin biopsy in the management of peripheral neuropathy. Lancet Neurol. 2007, 6: 632-42. 10.1016/S1474-4422(07)70172-2.
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2377/9/13/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.