Introduction
Manipulation of body composition is a common goal in a variety of pursuits, but how do we accurately and objectively measure and track changes over time? Body composition devices and techniques offer a solution here. However, people often place undue confidence in both the accuracy and precision of body composition values that they get from techniques that are regarded as high-quality.
The most significant challenge when it comes to evaluating body composition assessment devices or techniques is determining their validity, that is, how close to the “true” value they get us. Unfortunately, a person’s true body composition is a practically unknowable value outside of post-mortem dissection, which is a particularly unpleasant technique for the living. Instead, we have to rely on comparisons to reference techniques (usually DXA, MRI, or hydrostatic weighing) to evaluate validity. When we talk about typical error rates for a particular device or technique, most people think of the error rate as how far the measurement might diverge from their “true” body composition. But as we’ll see, every assessment technique, even the “gold standard” reference techniques, makes certain assumptions that may introduce error. This is important to keep in mind as we go through the different assessment techniques, because the standard that a technique is compared to will influence its perceived accuracy. To complicate matters further, the aspects of someone’s physiology (other than body fat %) that can influence the accuracy of a measurement (body water, tissue density, and others) are not static. This means that the same measurement method on the same person on a different day may be closer to or further from their “true” values, depending on how these physiological factors present on that day.
In addition to validity (accuracy), another important consideration to keep in mind is reliability (precision). Essentially, we want to know whether a technique will give us consistent results, regardless of how close those results are to reality. For example, if subject A has a true body fat % of 25% and an assessment technique gives us 19.5%, 19.7%, and 19.6% on three separate measuring occasions, that technique has good precision but poor accuracy. Depending on the technique, several factors can influence both accuracy and precision, including shifts in hydration status, the technician performing the assessment, how recently exercise was performed, the prediction equation used, and others.
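To make the accuracy/precision distinction concrete, here is a minimal Python sketch using the numbers for subject A above. It summarizes accuracy as the average reading's distance from the true value (bias) and precision as the standard deviation of the repeat readings. These are common summaries, not the only ways to quantify either property.

```python
from statistics import mean, stdev

# Subject A from the example: true body fat % and three repeat readings.
true_bf = 25.0
readings = [19.5, 19.7, 19.6]

# Accuracy (validity): how far the average reading sits from the true value.
bias = mean(readings) - true_bf

# Precision (reliability): how tightly the readings cluster around their own mean.
spread = stdev(readings)

print(f"bias:   {bias:+.1f} percentage points")
print(f"spread: {spread:.2f} percentage points (SD)")
```

A large bias combined with a tiny spread is exactly the "precise but not accurate" pattern described above.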
To recap, for each of the techniques we cover here we should be thinking of the following things:
- Validity: How close are body composition predictions when compared to criterion or reference techniques? What is the measurement method being compared to in order to determine error rates?
- Reliability: Are we getting similar values for the same person under the same conditions? How small of a change can we consistently detect?
- What are the possible sources of error and how can they be controlled?
- How does the device/technique transform the raw data into predictions of body composition?
Body Composition Models
Every assessment method we cover here can be categorized broadly by how many components it divides the body into.
Two-component models divide the body into fat mass (FM) and fat-free mass (FFM). This is the most basic model and gives us the least insight into overall body composition, as the FFM value from this model does not differentiate between muscle mass, bone mass, or total body water.
Two-component model: Body mass = FFM + FM
Three-component models improve on this lack of granularity by breaking FFM into two additional components, total body water (TBW) and fat-free dry mass (FFDM). FFDM includes protein, glycogen, and mineral in bone and soft tissue.
Three-component model: Body mass = TBW + FFDM + FM
Four-component models are simply an extension of three-component models, in that they further divide FFDM into bone mineral (BM) and the residual (whatever is left of FFDM after BM is measured).
Four-component model: Body mass = TBW + BM + FM + residual
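A minimal sketch of how the three models relate, using hypothetical component masses (in kg) for an 80 kg person; these are illustrative numbers, not reference values, and the class name is made up for this example. Each simpler model is obtained by collapsing components of the four-component model, and every version must still add up to the same body mass.

```python
from dataclasses import dataclass


@dataclass
class FourComponent:
    """Hypothetical four-component breakdown; all masses in kg."""
    tbw: float       # total body water (1 L of water treated as ~1 kg)
    bm: float        # bone mineral
    fm: float        # fat mass
    residual: float  # rest of fat-free dry mass (protein, glycogen, soft-tissue mineral)

    def body_mass(self) -> float:
        return self.tbw + self.bm + self.fm + self.residual

    def three_component(self) -> tuple:
        # Bone mineral + residual collapse back into fat-free dry mass (FFDM).
        ffdm = self.bm + self.residual
        return self.tbw, ffdm, self.fm

    def two_component(self) -> tuple:
        # Water + FFDM collapse into fat-free mass (FFM).
        ffm = self.tbw + self.bm + self.residual
        return ffm, self.fm


person = FourComponent(tbw=44.0, bm=3.2, fm=16.0, residual=16.8)
tbw, ffdm, fm = person.three_component()
ffm, _ = person.two_component()
print(f"4C: {person.body_mass():.1f} kg")
print(f"3C: TBW {tbw:.1f} + FFDM {ffdm:.1f} + FM {fm:.1f} kg")
print(f"2C: FFM {ffm:.1f} + FM {fm:.1f} kg")
```

The two-component FFM value is a single number, which is why that model cannot tell you how much of it is water, bone, or everything else.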
Four-component models require the use of more than one device to measure the different components: for example, using DXA to obtain bone mineral values and bioelectrical impedance to obtain total body water values, then combining these in a more robust prediction equation. Even more techniques can be combined to build models with up to six components. This approach can help mitigate the limitations of any one technique by reducing the number of assumptions that go into the final body composition readings. However, for most people this is not a practical way of monitoring body composition over time, so we will limit our review today to the techniques we most often see in the real world.
Most Common Assessment Methods
Skinfold Anthropometry
Overview
Skinfold anthropometry is a two-component model (it gives predictions of fat mass and fat-free mass) that uses a caliper to measure a gripped double fold of skin at several different sites to establish an overall measurement of subcutaneous adiposity. The sum of those skinfold measurements is then plugged into one of over 100 equations to obtain an estimate of fat mass and fat-free mass. This method is relatively common in non-lab settings because it requires minimal equipment and has a low financial cost.
Validity
As we mentioned above, there are so many equations available for skinfold measurements that the same initial measurements can produce fairly large variation in predicted body fat %. Using an equation that is specific to the population you are working with increases validity, but validity is still limited. Overall, we can expect errors of up to ~5 percentage points of body fat when compared to DXA, although error rates vary widely depending on the equation used and the experience of the measurer.
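To show the general pipeline (sum of skinfolds, then body density, then body fat %), here is a sketch using one widely cited equation, the Jackson-Pollock three-site equation for men, with its coefficients as they are commonly reported. Treat those coefficients as something to verify against the original publication, and the 60 mm sum and age of 30 as hypothetical inputs. The last step shows two common density-to-body-fat conversions (Siri and Brozek): even from the same density they return slightly different numbers, and every equation choice adds its own assumptions.

```python
def jackson_pollock_3_men(sum_mm: float, age: float) -> float:
    """Body density (g/mL) from the chest + abdomen + thigh skinfold sum (mm).

    Coefficients are the commonly reported Jackson-Pollock three-site values
    for men; check them against the original paper before relying on them.
    """
    return 1.10938 - 0.0008267 * sum_mm + 0.0000016 * sum_mm**2 - 0.0002574 * age


def siri(density: float) -> float:
    # Two-component conversion assuming fat = 0.9 g/mL and FFM = 1.1 g/mL.
    return 495 / density - 450


def brozek(density: float) -> float:
    # An alternative two-component conversion built on slightly different assumptions.
    return 457 / density - 414.2


# Hypothetical measurement: 60 mm skinfold sum, 30-year-old man.
density = jackson_pollock_3_men(60, 30)
print(f"density: {density:.4f} g/mL")
print(f"Siri:    {siri(density):.1f}% body fat")
print(f"Brozek:  {brozek(density):.1f}% body fat")
```

Swapping in a different skinfold equation (different sites, sex-specific or population-specific coefficients) changes the density step itself, which is where most of the between-equation variation comes from.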
Reliability
If you are just looking for changes over time, as opposed to an accurate body fat % value, one way to make skinfold measurements more reliable is to simply track the raw sum of skinfolds. By skipping the extra step of entering that sum into an equation (which gives you the body fat % prediction), you bypass many of the potential sources of error and assumptions built into those equations. If you can also have the same trained person (preferably ISAK certified) take the measurements every time, skinfold anthropometry becomes a pretty reliable method of monitoring changes in body fat over time. In fact, when a trained measurer takes skinfolds of an individual on two consecutive days, reliability is even greater than for most other methods, including DXA and BODPOD. This is because skinfold anthropometry is the method least affected by intra-individual variation, such as when exercise was last performed, recent meals, and changes in hydration status.
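Tracking the raw sum instead of a predicted body fat % can be as simple as the sketch below. The function names, site names, and readings are hypothetical; the one rule the code enforces is that both sessions must use the same sites, since otherwise the sums are not comparable.

```python
def sum_of_skinfolds(sites: dict) -> float:
    """Raw sum of skinfold readings in mm; no prediction equation applied."""
    return sum(sites.values())


def skinfold_change(before: dict, after: dict) -> float:
    """Change in the raw sum (mm) between two sessions using the same sites."""
    if set(before) != set(after):
        raise ValueError("both sessions must measure the same skinfold sites")
    return sum_of_skinfolds(after) - sum_of_skinfolds(before)


# Hypothetical readings (mm), both taken by the same trained measurer.
week_0 = {"chest": 12.0, "abdomen": 24.0, "thigh": 18.0}
week_8 = {"chest": 10.5, "abdomen": 20.0, "thigh": 16.5}

change = skinfold_change(week_0, week_8)
print(f"sum of skinfolds: {sum_of_skinfolds(week_0):.1f} mm -> "
      f"{sum_of_skinfolds(week_8):.1f} mm ({change:+.1f} mm)")
```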
Due to the excellent reliability when using the sum of skinfolds, skinfold anthropometry remains a useful method for measuring changes over time, especially when those changes are expected to be small, such as in well-trained populations.
Bioelectrical Impedance Analysis/Spectroscopy (BIA/BIS)
Overview
This is probably the method that most people are familiar with, as there are very affordable products using it on the market today. The wide range in quality of those products makes it difficult to discuss in a broad sense, so we will just cover the overall concept and limitations of this method. This is a three-component model, and devices can be further categorized by the number of frequencies used for analysis: single-frequency devices (BIA) measure at one frequency, while multiple-frequency devices (including bioimpedance spectroscopy, BIS) measure across a range of frequencies. Devices also differ in electrode placement, from hand-only contact to combined hand and foot contact. Multiple-frequency devices are generally considered higher quality.
As a reminder, three-component models estimate fat-free mass, fat mass, and total body water. To estimate total body water, electrodes pass a small electrical current through the body, and estimates of total body water are derived from the body’s resistance (impedance) to the flow of that current. Fat-free mass contains more water and therefore offers less resistance than fat mass, so by measuring how strongly the body impedes the current, prediction equations can provide estimates of fat mass and fat-free mass.
Validity
Again, giving specific error rates for this technique is difficult given the wide range of devices that use it, but it is not uncommon to see errors of ~5 percentage points when compared to hydrostatic weighing or DXA. That means if DXA puts you at 20% body fat, BIA/BIS could typically put you anywhere from 15% to 25%, although BIA/BIS usually underestimates body fat compared to DXA.
One possible source of error in this method is that it assumes fat-free mass has a constant hydration of 73%, so any deviation of someone’s actual fat-free mass hydration from that 73% will noticeably alter the results.
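The effect of the 73% hydration assumption is easy to see with a little arithmetic. This sketch starts from a total body water estimate (assumed to come from the device; the 43.8 L value, the 80 kg body mass, and the function name are hypothetical), converts it to fat-free mass by dividing by the assumed hydration fraction, and treats the remainder of body mass as fat mass. It also treats 1 L of water as roughly 1 kg.

```python
HYDRATION_FFM = 0.73  # assumed water fraction of fat-free mass, per the text


def composition_from_tbw(body_mass_kg: float, tbw_l: float,
                         hydration: float = HYDRATION_FFM) -> tuple:
    """Fat-free mass (kg), fat mass (kg), and body fat % from a total body water estimate.

    Treats 1 L of water as ~1 kg and assumes FFM holds a fixed fraction of water.
    """
    ffm = tbw_l / hydration
    fm = body_mass_kg - ffm
    return ffm, fm, 100 * fm / body_mass_kg


# Hypothetical person: 80 kg, device estimates 43.8 L of total body water.
for hydration in (0.73, 0.75):
    ffm, fm, bf = composition_from_tbw(80.0, 43.8, hydration)
    print(f"hydration {hydration:.0%}: FFM {ffm:.1f} kg, FM {fm:.1f} kg, {bf:.1f}% fat")
```

With the default 73% assumption this person comes out at 25% body fat; if their fat-free mass were actually 75% water, the same water estimate would correspond to about 27% body fat.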
Reliability
BIA/BIS has some of the lowest test-retest reliability among the methods covered here, likely due to its susceptibility to changes in hydration status. Anything that affects total body water, such as prior food and fluid intake, physical activity, or medical conditions, can reduce the reliability of BIA/BIS. Despite this, BIA/BIS may still be useful for measuring larger changes over time.
Air Displacement Plethysmography (ADP) (Bodpod)
Overview
Despite being the hardest to pronounce, ADP is often touted as a “gold standard” for body composition assessment. This technique uses a 450L or 500L pod of air (pictured above) that measures how much air is displaced when the person sits inside it, to determine the body’s total volume. Combined with the person’s mass, this provides the overall density of the body.
Validity
Being a two-component model, ADP can only differentiate between fat mass and fat-free mass. This means that it must make certain assumptions about the density of certain tissues in order to make predictions. As we mentioned earlier, FFM includes muscle mass, bone mass, and total body water, none of which ADP measures directly. Most notably, this method assumes a fat-free mass density of 1.1 g/mL. So if someone’s actual fat-free mass density is higher than that assumption (if they have a high bone mineral content, for example), their body appears denser than the model expects, which causes an overestimation of their FFM and an underestimation of their body fat. Any deviation from the assumed value of 1.1 g/mL will introduce error into this measurement. Another possible source of error is the constant value used to account for skin surface area, which may cause inaccuracies if a subject is particularly hairy or is wearing inappropriate clothing for the test.
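The density arithmetic behind ADP can be sketched as follows. Body density is mass divided by volume (this sketch skips the corrections real ADP software applies to the measured volume, such as the adjustment for air in the lungs). Body fat % then comes from a two-component equation with an assumed fat density of 0.9 g/mL and an assumed fat-free mass density of 1.1 g/mL; with those two values the formula reduces to the familiar Siri equation, 495/density - 450. The function names are illustrative, and the example person is hypothetical: truly 20% fat, with a fat-free mass density of 1.11 g/mL.

```python
FAT_DENSITY = 0.9  # g/mL, assumed density of fat mass
FFM_DENSITY = 1.1  # g/mL, assumed density of fat-free mass (the ADP assumption)


def body_density(mass_kg: float, volume_l: float) -> float:
    # kg/L is numerically equal to g/mL. Real ADP software also corrects the
    # measured volume (e.g. for air in the lungs); that step is omitted here.
    return mass_kg / volume_l


def percent_fat(density: float, ffm_density: float = FFM_DENSITY,
                fat_density: float = FAT_DENSITY) -> float:
    """Two-component model: 1/D = f/d_fat + (1 - f)/d_ffm, solved for fat fraction f."""
    f = (1 / density - 1 / ffm_density) / (1 / fat_density - 1 / ffm_density)
    return 100 * f


def density_from_fat_fraction(f: float, ffm_density: float,
                              fat_density: float = FAT_DENSITY) -> float:
    # Inverse of the model above: the density a body with fat fraction f would have.
    return 1 / (f / fat_density + (1 - f) / ffm_density)


# Hypothetical person: truly 20% fat, but with denser-than-assumed FFM (1.11 g/mL).
measured_density = density_from_fat_fraction(0.20, ffm_density=1.11)
print(f"body density: {measured_density:.4f} g/mL")
print(f"estimated with 1.1 g/mL assumption: {percent_fat(measured_density):.1f}% fat")
```

Because this person’s fat-free mass is denser than assumed, the 1.1 g/mL model reads their body fat at roughly 17% instead of 20%: it overestimates FFM and underestimates fat.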
When compared to DXA, ADP has an average error rate that is usually within 2%. While that sounds comforting, I want to show you an example of a data set that produces a 2% average error rate:
On the x-axis we have the measured body fat % from the BODPOD (ADP), and on the y-axis is the % difference between that measurement and the same person’s measurement with DXA. Sure, at the group level the average agreement between ADP and DXA in this study is within 2%, but for how many of the people in the study would that 2% figure actually be meaningful? Some individuals in this study had almost 15% disagreement between BODPOD and DXA. Let’s say, for example, that you did a BODPOD assessment and it gave you a body fat % reading of 20%. If you know that BODPOD has an average error rate of ~2%, you may feel unwarranted confidence that your true body fat % is somewhere in the 18-22% range. The error for you as an individual may be much higher.
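The gap between a small group-average error and large individual errors is what a Bland-Altman analysis is designed to show. Here is a minimal sketch with hypothetical paired readings (made up for illustration, not data from the study above). It computes the mean difference between methods (the group-level bias) and the 95% limits of agreement (bias plus or minus 1.96 standard deviations of the differences), which describe how far an individual's reading might plausibly fall from the reference.

```python
from statistics import mean, stdev


def bland_altman(method_a: list, method_b: list) -> tuple:
    """Mean bias and 95% limits of agreement for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical body fat % readings for the same six people (illustrative only).
adp = [18.0, 22.5, 30.1, 12.4, 25.0, 35.2]
dxa = [20.5, 21.0, 33.9, 16.8, 24.1, 31.0]

bias, lower, upper = bland_altman(adp, dxa)
print(f"group bias: {bias:+.1f} percentage points")
print(f"95% limits of agreement: {lower:+.1f} to {upper:+.1f} percentage points")
```

With these made-up numbers the average bias is under 1 percentage point, yet the limits of agreement run from about -7.4 to +6.0 percentage points.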
For individuals at the extreme ends of the BMI spectrum (very lean or very obese individuals), error rates tend to be even higher, with ADP overestimating body fat % in thinner individuals and underestimating it in heavier individuals.
Reliability
The strength of ADP is its reliability. When all the appropriate protocols are followed, ADP can detect changes in body fat as small as 1.7 percentage points. So if the body composition changes you are after are larger than that, ADP could be a useful measuring tool.
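A figure like "changes as small as 1.7%" is usually a minimal detectable change (MDC). One common formulation, sketched below, derives it from the standard error of measurement (SEM) as MDC = 1.96 * sqrt(2) * SEM, where sqrt(2) accounts for measurement error in both the before and after tests; SEM itself is often estimated as SD * sqrt(1 - ICC) from a test-retest study. The SEM of 0.8 percentage points used here is a hypothetical value, not one reported for any particular device, and the function names are illustrative.

```python
import math


def sem_from_icc(between_subject_sd: float, icc: float) -> float:
    # Standard error of measurement from test-retest reliability (ICC).
    return between_subject_sd * math.sqrt(1 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    # sqrt(2) because both the "before" and "after" tests carry measurement error.
    return z * math.sqrt(2) * sem


def change_is_detectable(before: float, after: float, mdc: float) -> bool:
    return abs(after - before) > mdc


# Hypothetical SEM of 0.8 percentage points of body fat.
mdc = minimal_detectable_change(0.8)
print(f"MDC: {mdc:.1f} percentage points")
print(change_is_detectable(20.0, 18.5, mdc))  # 1.5-point drop
print(change_is_detectable(20.0, 17.5, mdc))  # 2.5-point drop
```

With that hypothetical SEM the MDC is about 2.2 points, so a drop from 20.0% to 18.5% cannot be distinguished from measurement noise, while a drop to 17.5% can.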
Dual-Energy X-Ray Absorptiometry (DXA)
Overview
I saved DXA for last, because there is a perception that this technique is infallible and that readings from DXA can be taken at face value. Indeed, most of the other methods we’ve covered here utilize either DXA or hydrostatic weighing as the reference standard to evaluate their validity. However, just like any other technique, there are plenty of ways error can be introduced both in validity and reliability.
DXA is a three-component model that measures fat mass, lean soft tissue, and bone mass using a dual-photon-energy low-dose X-ray beam. Proprietary software, which varies based on the manufacturer of the DXA device, is then used to calculate estimations of body composition.
Validity
Validity of DXA values relies in large part on accurate algorithms that estimate soft tissue in the body regions containing bone. These algorithms assume that soft tissue is distributed uniformly in the limb and trunk regions. For this reason, the validity of DXA values varies based on the population the algorithms were validated against, as different racial and ethnic populations tend to differ in fat distribution. Algorithms are also proprietary and differ between manufacturers, so some of their details are not published. This means that a DXA scan conducted on one device may give you a different reading than a scan on a device produced by a different manufacturer, even if the measurements are done back-to-back.
When DXA is compared to five- or six-component models, which combine multiple techniques, the estimated error for prediction of body fat is usually 2-3 percentage points. That means if someone’s body fat as measured by a five- or six-component model is 20%, DXA may provide a value, on average, between 17% and 23%. I want to emphasize that this is a typical GROUP average error rate; error rates at the individual level can be higher. For example, error rates are known to be larger in individuals who are very small, very large, or very lean.
Reliability
DXA can be prone to biological variation due to changes in tissue hydration status, which can be influenced by how recently the last training session took place, menstrual status, and differences in sleep. If proper standardization protocols are followed, these sources of variation can be limited, and DXA can provide reliable measurements that enable the detection of changes in body fat as small as 1-2%.
An often overlooked application of DXA is monitoring energy availability in athletes. Declining bone mineral content is a symptom of low energy availability, which underpins the female athlete triad and relative energy deficiency in sport (RED-S), so DXA can be a useful tool for early identification of and intervention in these conditions.
Should You Even Measure Body Fat?
For this section of the article, we will be stepping firmly out of the “objective scientific information” realm and into the “Chris’ personal opinion and approach with most of his clients” realm.
As we can see, obtaining “true” values of body composition with a high degree of accuracy is very difficult and often unachievable with the techniques that are available to the average consumer. However, some of these techniques do have acceptable reliability which means that we could potentially use them to assess changes in body composition over time. But should we?
I have a bit of a hot take on this one. For most general health goals, body fat % is usually just a proxy measurement for the things we actually care about. If we are losing body fat for aesthetic reasons, it is much easier to just directly measure that by looking in the mirror. Given how inaccurate measurements can be, if you are in a situation where you see improvements in the mirror but the BF % number does not reflect that, it can really mess with your head even though it’s just from the measurement being off. If you are losing body fat to improve health markers, then let's just look at those health markers directly (blood pressure, lipids, etc.), or even use circumference measurements of the stomach and hips which are less prone to assumption errors, and easier to measure on your own.
If you insist on measuring body fat %, ask yourself this: are you confident that the body fat % change you are after is large enough to be detected accurately by the method you are using, and is it simultaneously small enough that you would not notice it without that method?
There are, however, situations where I think monitoring body fat % (assuming you are using a method with good reliability) makes sense. If you compete in a sport that requires you to get very lean on a regular basis (bodybuilding, for example), monitoring body fat may be useful because you can note the BF % at which you start to experience certain symptoms of being very lean, so that in future cutting cycles you can anticipate those effects and plan accordingly.
If you are going to monitor your body composition over time, make sure that you stick to one method and one manufacturer (in the case of DXA), and adhere to the testing protocol as closely as possible to control for as much error as you can.
If you somehow have access to any of the above methods and want help deciding which one to go with, consider the decision tree below from this paper.