Once Again, With Feeling…
Or, How Do We REALLY Know That There Are Two Types of Transwomen?
One would think that with years of both clinical and scientific evidence to support the Two Type Taxonomy of MTF transsexuality, we would no longer need essays that explain how we know this to be true, but no… sillyolme, nothing is so obvious as to be truly self-evident. So, once again, it’s time to write a clear, concise, yet complete explication of how we know that there are two and only two types of transwomen.
First, we need to know a bit about epidemiological research into etiology. In medical science we often recognize that a given medical entity exists because of its pattern of symptoms, which collectively we call a syndrome. After recognizing a syndrome, science then attempts to determine an etiology, if it can. Here it is important to recognize that the existence of a given symptom does not in itself define a syndrome. Consider fever as a symptom. Today, after much research, we know that it is typically caused by our immune system attempting to fight off an infection. But that infection may come from any of literally millions of different entities, from eukaryotic parasites to bacteria to viruses. One would not say that just because two individuals both have fevers, or because a given medicine helps reduce both individuals’ fevers, they have the same etiology. Yet, when it comes to transsexuals, this seems to be the assumption made by both transsexuals and the public at large. As I will show, this is just not the case.
We also need to know a bit about statistics, most critically, about the concept of “effect size” and what it means. Effect size is a measure of how different two populations are from one another, expressed as the difference between their means (averages) relative to their variance (how much spread in a given measure exists within each population). If two populations have the same average, they have by definition an effect size between them of exactly zero, no matter the variance within the populations. Even if they do not have the same average, if the spread within each is so large that it dwarfs the difference in averages, the effect size is small and not very important. But if two populations differ in their averages and their distributions barely overlap, then there is a large effect size. We calculate the effect size using a standard formula called “Cohen’s d”.
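For reference, a common textbook form of Cohen’s d is sketched below. The symbols are introduced here for illustration only (M for a group mean, s for its standard deviation, n for its size), and this essay does not state which pooled-SD variant was used for the figures it quotes.

$$
d = \frac{M_1 - M_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

When the two groups are the same size, the pooled term reduces to $s_p = \sqrt{(s_1^2 + s_2^2)/2}$. By Cohen’s rough convention, a d near 0.2 is considered small, near 0.5 medium, and 0.8 or above large.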
Why is this important? Because to determine whether there are in fact two (and only two) types, we must show that the Null Hypothesis, the assumption that there is only one type, is wrong, by demonstrating that we consistently find large enough effect sizes in a number of measures that consistently cluster together. In science we never “prove” a hypothesis… we only disprove one. If the null hypothesis holds, there should be no such effect sizes. So, in this essay, I’m going to review some of the evidence, demonstrating that there are respectable effect sizes and that they consistently cluster together. Here’s the key: we DON’T have to show that there are characteristics that give 100% vs. 0%… only that there ARE differences, with respectably large effect sizes, in order to disprove the null hypothesis.
Further Reading on Effect Size
Having prefaced our discussion, let’s describe our hypothetical two types, as described by experienced clinicians:
One group is exclusively attracted to men, transitions quite young, passes as girls/women with relative ease, was noted to be feminine (sissy boys) by parents and teachers as children, preferred female playmates, avoided rough’n’tumble play, and is unlikely to report finding wearing women’s clothing to be sexually arousing.
The other group is sexually attracted to women (as evidenced by extensive sexual experience with women, marriage, and siring children) but may identify as bisexual or asexual, transitions later in life, rarely passes successfully as women, was considered to have been typical boys (“boyish”) by parents and teachers, and is very likely to report that wearing women’s clothes is, or once was, sexually arousing.
But what is the evidence and how large are the effect sizes?
Let’s look at some data. In a study conducted by Lawrence in 2005 among those who had had SRS performed by Toby Meltzer, she describes three groups: those who had always been exclusively into men (androphilic, the M/M column), those who had always been exclusively into women (gynephilic, the F/F column), and those who claimed that their sexuality had switched from women to men (bisexual, the F/M column).
| Participant characteristic (attraction before SRS / attraction after SRS) | F/M (n = 30) | F/F (n = 50) | M/M (n = 17) |
|---|---|---|---|
| Mean age at SRS, years (SD) | 45 (8.4) | 44 (9.1) | 34 (9.2) |
| Mean age at living full-time in female role, years (SD) | 42 (11.3) | 42 (9.6) | 28 (8.8) |
| Very or somewhat feminine as a child, in own opinion | 41% | 45% | 76% |
| Very or somewhat feminine as a child, in others’ probable opinion | 21% | 24% | 76% |
| Autogynephilic arousal hundreds of times or more before SRS | 52% | 58% | 18% |
So, let’s look at the effect sizes for age at SRS and age at social transition. When we compare those who had been consistently gynephilic to those who would best be described as bisexual (having claimed sexual attraction to both men and women), we see that Cohen’s d for age at SRS is only 0.11, so tiny as to be essentially zero. For age at social transition, Cohen’s d is exactly zero, since the two means are identical. Thus, we would have to say that, for these characteristics and these two populations, the null hypothesis is not disproven. Again, this does not mean that the null hypothesis is proven… only that it is not disproven. Gynephilic and bisexual transwomen could share the same underlying etiology… or not.
Oh… but let’s look at the androphilic group compared to these other two groups, shall we? Comparing age at SRS between the bisexual and androphilic groups, Cohen’s d = 1.25, a very large effect size. Comparing their ages at social transition, Cohen’s d = 1.48, also a very large difference. Finally, looking at age at SRS and age at social transition between the gynephilic and androphilic groups, Cohen’s d = 1.09 and 1.44 respectively. This very powerfully disproves the null hypothesis. Sexual orientation is definitely important, and these differences support the two type hypothesis.
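The comparisons above can be sketched in a few lines of Python from the means and SDs in the table. This is a minimal sketch under stated assumptions: the function and variable names are made up for illustration, and the pooled SD is taken as the root-mean-square of the two group SDs, which is one common choice (the essay does not say which pooling formula was used). Because the table values are rounded, results should land close to the figures quoted above without necessarily matching them to the second decimal.

```python
from math import sqrt

# (mean, SD) in years, copied from the table above.
AGE_AT_SRS = {"bisexual": (45, 8.4), "gynephilic": (44, 9.1), "androphilic": (34, 9.2)}
AGE_FULL_TIME = {"bisexual": (42, 11.3), "gynephilic": (42, 9.6), "androphilic": (28, 8.8)}

# The three pairwise comparisons discussed in the text.
PAIRS = [
    ("gynephilic", "bisexual"),
    ("bisexual", "androphilic"),
    ("gynephilic", "androphilic"),
]


def cohens_d(group_a, group_b):
    """Signed Cohen's d for two (mean, SD) pairs.

    Assumption: the pooled SD is the root-mean-square of the two SDs,
    i.e. the equal-group-size simplification of the pooled formula.
    """
    (mean_a, sd_a), (mean_b, sd_b) = group_a, group_b
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


def compare_all(measure):
    """Return the absolute effect size for each pair in PAIRS."""
    return {
        (a, b): abs(cohens_d(measure[a], measure[b]))
        for a, b in PAIRS
    }


if __name__ == "__main__":
    for label, measure in [("Age at SRS", AGE_AT_SRS),
                           ("Age at full-time", AGE_FULL_TIME)]:
        print(label)
        for (a, b), d in compare_all(measure).items():
            print(f"  {a} vs {b}: |d| = {d:.2f}")
```

Under either pooling choice the pattern is the same: the gynephilic and bisexual groups are nearly indistinguishable on both ages, while each comparison against the androphilic group gives a d above 1.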
Lest you think this result is from only one study, consider the even larger Nuttbrock study, in which, of those who had started HRT, fully one half of the androphilic group had done so before they turned 20, while only one gynephilic individual had done so.
Our description of the two types also mentioned other characteristics, such as gender atypicality and autogynephilia. Now here, we have a small problem in that our measures have neither a continuous value nor a variance. They are binary (yes/no) answers. However, interestingly, because people don’t always answer perfectly, we can use the proportion of people who answer a given way as a pseudo-continuous stand-in for the real continuous value. That is to say, if only a small number say yes to a question, it’s likely that the real value is very small. If a large number answer yes to a question, it’s likely that the real value is very large. So, let’s look at the values for self-image and likely impression on others of being gender atypical. Oh look, consistent with our earlier conclusion that the gynephilic and bisexual groups were in fact not really different groups, their answers are very similar: 45% vs. 41% for their own opinion and 24% vs. 21% for others’ probable opinion. These are so close that we might as well agree that they are identical. And once again, we see that the androphilic group’s scores are quite different, at 76% on both questions. So, consistent with our earlier conclusion, the null hypothesis that there is only one group is very much disproven.
Before we leave Lawrence’s study, let’s look at the issue of autogynephilia. Again, we have a binary question: whether one had experienced hundreds (or more) of episodes of autogynephilic arousal to wearing women’s clothing. As before, we see that the gynephilic and bisexual groups are very similar at 58% vs. 52%, while the androphilic group had only 18%. So, once again, consistent with our earlier conclusion, the null hypothesis that there is only one group is very, very much disproven.
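For readers who want a number attached to these percentage gaps, one conventional option is Cohen’s h, an effect size for the difference between two proportions. To be clear, this is not a calculation the essay itself performs; it is a swapped-in illustration, and the function and variable names below are hypothetical. Cohen’s usual rough benchmarks (0.2 small, 0.5 medium, 0.8 large) are applied to h as well.

```python
from math import asin, sqrt


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Proportions from the table, in the order (bisexual, gynephilic, androphilic).
ITEMS = {
    "feminine as a child (own opinion)": (0.41, 0.45, 0.76),
    "feminine as a child (others' probable opinion)": (0.21, 0.24, 0.76),
    "autogynephilic arousal hundreds of times or more": (0.52, 0.58, 0.18),
}


def summarize(items):
    """Absolute Cohen's h for each pairwise comparison on each item."""
    rows = {}
    for label, (bi, gyne, andro) in items.items():
        rows[label] = {
            "gynephilic vs bisexual": abs(cohens_h(gyne, bi)),
            "androphilic vs gynephilic": abs(cohens_h(andro, gyne)),
            "androphilic vs bisexual": abs(cohens_h(andro, bi)),
        }
    return rows


if __name__ == "__main__":
    for label, comparisons in summarize(ITEMS).items():
        print(label)
        for pair, h in comparisons.items():
            print(f"  {pair}: |h| = {h:.2f}")
```

On all three items the gynephilic vs. bisexual comparison stays below the “small” benchmark, while both comparisons against the androphilic group exceed the “medium” one, which mirrors the pattern seen in the ages.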
Again, lest you think this result is restricted to only this study, we have seen this replicated by Buhrich (1977), Freund (1982), Blanchard (1985), Doorn (1994), Smith (2005), and Nuttbrock (2009), in separate studies spanning four decades, collectively involving over a thousand transsexuals to date. In fact, this is one of the most repeated and reconfirmed scientific findings regarding transsexuality.
Another characteristic difference mentioned about the two types was passability. Fortunately, we have a clinical study from the Netherlands which showed a robust effect size of d = 0.7 between androphilic and non-androphilic transwomen. The graph above shows the data. The higher the score, the more ‘readable’ (less passable) the individual. From the graph, we see that the most passable non-androphilic (gynephilic and bisexual) individual is just average for the androphilic population.
When we add in the growing evidence that there is a distinct difference between the brains of androphilic vs. gynephilic and bisexual transwomen, the null hypothesis that there is only one type is not just merely dead, but most sincerely dead.