The Splintered Mind: April 2021

Friday, April 30, 2021

Are 15 UCLA Anthropology Graduate Students Representative of the Los Angeles Population? More Thoughts on Henrich's The WEIRDest People in the World

Earlier this month, I complained about Joseph Henrich's somewhat loose summaries of scientific research in his recent, influential book The WEIRDest People in the World. At the time of the post, I had read through Chapter 6.

One of my complaints was that in explaining how research on economic games works, Henrich's paradigmatic fictional example described giving each research participant $20 to $30 per game over the course of ten games, totaling $200-$300 per participant. However, economic games rarely have stakes that large. More typically, the stakes are about a tenth of that. Overstating the amount typically at stake illegitimately prevents the naive reader from forming the skeptical thought that people might behave differently with small amounts of laboratory money than with the larger amounts commonly at stake in real-world situations.

Despite my concerns, I find Henrich's book fascinating, and I am finding much of value in it. So I kept reading. Last week, I hit Chapter 9 and, given my complaints about Chapter 6, I was struck by the following paragraph:

These interviews contrasted with those I did in Los Angeles after administering an Ultimatum Game that put $160 on the line. It was a sum that was calculated to match the Matsigenka stakes. [Matsigenka live in small farming hamlets in the Amazon.] In this immense urban metropolis, people said they'd feel guilty if they gave less than half. They conveyed the sense that offering half was the "right" thing to do in this situation. The one person who made a low offer (25 percent) deliberated for a long time and was clearly worried about rejection.

Wait, $160 per participant per game?!

(In the Ultimatum Game, Person A is given a sum of money to split with Person B. Person A proposes a split -- say, 50/50 or 80/20 -- and then Person B has the choice either to accept the resulting split or reject the offer, in which case neither player gets any money.)
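For readers who prefer to see the payoff rule spelled out, here is a minimal sketch of the game as just described. The function name and its parameters are my own illustrative choices, not anything from Henrich's study; the $160 stake and the 25 percent offer are the figures from the quoted paragraph.

```python
def ultimatum_payoffs(total, offer_fraction, accepted):
    """Return (person_a, person_b) payoffs for one Ultimatum Game.

    total          -- the sum Person A is given to split
    offer_fraction -- share of the total that Person A offers to Person B
    accepted       -- whether Person B accepts the proposed split
    """
    if not 0 <= offer_fraction <= 1:
        raise ValueError("offer_fraction must be between 0 and 1")
    if not accepted:
        # A rejected offer leaves both players with nothing.
        return (0.0, 0.0)
    to_b = total * offer_fraction
    return (total - to_b, to_b)


# The low offer from the quoted passage: 25 percent of $160.
print(ultimatum_payoffs(160, 0.25, accepted=True))
print(ultimatum_payoffs(160, 0.25, accepted=False))
```

The second call shows why a low offer is risky for Person A: a rejection costs Person A the $120 they had hoped to keep, while costing Person B only $40.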

I had to look up the study. Indeed, Henrich did offer $160 to participants. But -- understandably given the amounts at stake -- the sample size was very small: only 15 (that is, 15 people in the Person A role, whose offers provided the main data). And those 15 people were all graduate students in the Anthropology Department at UCLA, paired with 15 other anthropology students.

While it's not exactly wrong for Henrich to summarize the data as he did, his presentation omits details that seem to me quite relevant and which might fuel a skeptical interpretation. Should we consider 15 UCLA Anthro grad students representative of the Los Angeles population? Henrich treats their behavior as representative without explicitly flagging for the reader how unusual a group they are.

In the original article, Henrich does make a case for choosing this population. It's a group of acquaintances, like the Matsigenka population was a group of acquaintances. Like the Matsigenka participants, the graduate students all personally knew the experimenter, Henrich himself. That could potentially control for any inclination to be more generous in order to create a favorable impression on a high-status, high-resource acquaintance. In the original article and in at least one later re-presentation of the work, Henrich explicitly acknowledges some of the potential concerns with taking these students as representative of the larger U.S. urban population.

But of course all of this is hidden beneath Henrich's description in his book of the participants as merely being from "the immense urban metropolis" of Los Angeles. Given only that description of them, you might reasonably guess that the L.A. participants were strangers recruited off the streets.

Yes, readers can't be told every detail, especially in a book of such sweeping scope as Henrich's. This creates a situation in which the reader must trust the author. As an author, part of your job is to warrant that trust. As a critical reader, part of your job is to assess as best you can whether the author in fact warrants trust. One tool the reader can use is spot checking, especially when the author enters areas where you have some independent sources of knowledge.

If you're inclined to trust Henrich's judgment that these 15 anthropology students were a well-chosen representative sample of Angelenos, then his omission is one you should feel comfortable enough with. You should think, "I'm in good hands. He's not distracting me with irrelevant details." But my own sense is different. Henrich omits crucial details about his population that I would want to know, and that I think readers in general should want to know so that they can think critically about the presented research.

-----------------------------------------------------------

By the way, Henrich replied on Facebook to my earlier blog post about the book. If you're curious, check it out. My sense is that his characterization of my post is inaccurate and that he did not correct that mischaracterization when given an opportunity to do so. Please feel free to read my earlier post to judge whether I'm being fair in my complaint.


Friday, April 23, 2021

The Leaky Pipeline into Academic Philosophy for Black Students in the U.S.

Liam Kofi Bright, Carolyn Dicey Jennings, Morgan Thompson, Eric Winsberg and I have a paper forthcoming in The Philosophers' Magazine on the racial, ethnic, and gender diversity of philosophy students and professors, and how it has changed since the year 2000. We look at data from first-year intention to major through entry into the professoriate, drawing on several large data sources.

Below is a teaser, with graphs on the percent of philosophy majors who are non-Hispanic Black at three educational levels: first-year intention to major, completed bachelor's degree, and completed PhD, from 2000-2001 through the most recent available year. In each figure, the heavy black line represents the percent of philosophy majors who identify as non-Hispanic Black and the gray line represents the percentage of non-Hispanic Black students in all other majors combined. If the figures don't display correctly, click to enlarge and clarify.

In this first figure (drawing on the HERI CIRP database), note the sharp increase from 2000 to 2016 in the percentage of non-Hispanic Black students among first-years intending to major in philosophy. By the end of the period, first-year students intending to major in philosophy are about 10% non-Hispanic Black, similar to their overall representation in the undergraduate population.

How about bachelor's degrees awarded?

Here (drawing on the NCES IPEDS database), things are quite different. It does appear that the percent of philosophy bachelor's recipients identifying as non-Hispanic Black is rising -- from about 4% in the early 2000s to about 6% more recently. But it still remains far below the 10% non-Hispanic Black among bachelor's recipients overall. This might partly reflect a lag in the data. Students entering in 2015 or 2016 (the final two years of the first figure) wouldn't normally be receiving their bachelor's degrees by 2019 (the final year of the second figure). Other possible explanations include a tendency for Black students disproportionately to exit (or not enter) philosophy, differences in the questionnaire items or methods, or sampling problems in the HERI database.

We see further falloff at the PhD level (drawing on the NSF SED database):

Non-Hispanic Black students are currently receiving only 1-4% of PhDs, with a weakly increasing trend at best. Temporal offset might again play a partial role, but it can't be the whole story. Even if we take bachelor's recipients from 2010 and 2011 as the approximate cohort to receive PhDs in 2018 and 2019, there's a falloff from about 5% to about 3%. It's unlikely that sampling problems could explain the difference, since both datasets capture the large majority of degree recipients.

The most natural explanation is a "leaky pipeline". Philosophy is increasingly drawing Black students' initial interest. However, for whatever reason, as their education proceeds from first-year to bachelor's to PhD, Black students are disproportionately likely to exit.

Thursday, April 15, 2021

Why Are We Such Bad Introspectors?

Sahar Joakim interviewed me earlier this week for her philosophy video series. Thanks, Sahar! We discussed the nature of consciousness, the unreliability of introspective reports of conscious experience, and how sparse or abundant consciousness is, both across animal species (e.g., are snails conscious?) and in your own mind (e.g., do you have constant tactile experience of your feet in your shoes?).

At one point Sahar asked, if it's true that people are bad introspectors, why is that so? What makes introspection so difficult? I have a theory about this, but I've never published an article directly on the topic. So here's a brief discussion (some of the below is adapted from Chapter 3.3 of my 2007 book with Russ Hurlburt).

Before getting into the details, let me briefly sketch my target. The principal target of my skepticism about introspection concerns large-to-medium-sized structural features of currently ongoing conscious experience. An example of what I'm not skeptical about is this: If you think you're thinking of a banana, probably it's true that you are thinking of a banana. The introspective judgment "I'm thinking of a banana" might even contain within it the thought of a banana, making it automatically self-fulfilling.

However, what you don't know so well, I've argued, are the structural features of that thought, for example, whether it is in inner speech (words you silently speak to yourself), or inner hearing (auditory imagery that is experienced as being more passive than inner speech), or whether it is to some extent also, or instead, an imageless/wordless "unsymbolized thought", or.... Even if you might rightly be confident about the coarsest-level distinction here -- for example, that it's a thought specifically in inner speech as opposed to some other modality -- the basic structural features of that inner speech experience can be difficult to know. Is the speech at a normal pace relative to ordinary speech or does it transpire more quickly? Is the speech experienced as located somewhere, e.g., in the center of your head, versus somewhere else or nowhere? Is there some distinctive feeling of understanding that accompanies the inner speech? (See Chapter 4 of my book with Hurlburt for discussion of these issues in the context of an actual sampled instance of reported inner speech by an introspective research participant.)

Alternatively, consider your current visual experience. How stable is it? How clear and distinct are shapes and colors in the periphery? Is it in some respect flat, like a photograph, or does it have some real depth beyond what is possible in a photograph? If the latter, does the depth of it change dramatically when you close one eye? (See here, here, and here for more discussion of these issues.)

You might not be convinced that introspection of the large-to-medium-sized structural features of your ongoing conscious experience is as difficult as I suggest. But hopefully you at least have a sense of what my position is. Now, to Sahar's question. Why is it so difficult? In the interview, I listed three reasons. Here, I'll expand those reasons into five. None of these reasons, by itself, needs to make introspection super difficult. But combined, they create quite a set of obstacles to good self-knowledge.

First, experience is fleeting and changeable – or so it seems to me right now as I reflect, introspectively, upon it. The screen of text before me, as I reread these paragraphs, is relatively steady; but my visual experience as I look at the text is in constant flux. As my eyes move, the portion that’s clear, the portion that’s hazy, constantly changes. I blink, I glance away, I change my focus, and my experience shifts. My eyes slowly adapt to the black and white of the screen, to the contrast with the surrounding desk, to the changing light as the sun goes behind a cloud. I parse some bit of the page into familiar words as my eye scans down it; I form a visual image, reflecting the content of the discussion; my attention wanders. All this, it seems, affects my visual experience.

Consider your own experience as you read this paragraph. The text in your hands changes not a whit, but your visual phenomenology won’t stay still a second, will it? (Or will it?) The same is true, I’m inclined to think, for our auditory experience, emotional experience, somatic experience, conscious thought and imagery, taste, and so on: Even when the outside environment is relatively steady, the stream of experience flies swiftly. It won’t hold still to be examined.

Second, we’re not in the habit of attending introspectively to experience. Generally, we care more about physical objects in the world around us, and about our and others’ situation and prospects, than about our conscious experience, except when that experience is acutely negative, as with the onset of severe pain. This may seem strange, given the importance we sometimes claim for “happiness,” which we generally construe as bound up with, or even reducible to, emotional experience – but despite the lip service, few people make a real study of their phenomenology. We spend much more time thinking about, and have much subtler an appreciation of, our outward occupations and hobbies. And when we do “introspect,” we tend to think about such things as our motives for past actions, our personality traits and character, our desires for the future. This is not the sort of introspective attention to currently ongoing (or immediately past) conscious experience around which my skepticism turns (though I am also skeptical of much of this purported knowledge, on different grounds). Introspective attention to experience is hardly a habitual practice for most, perhaps any, of us, except maybe a few dedicated meditators of a certain sort.

If accurate introspection requires a degree of skill, as I suspect it does, in most people the skill is uncultivated. Furthermore, relatedly, experience is difficult to remember: Generally what we remember are outward objects and events – or, rather, outward objects and events as interpreted, and possibly misperceived, by us – not our stream of experience as we witness those objects and events. We remember, usually, that the boss said the work wasn’t up to snuff, not that our visual experience as he said it was such-and-such or that we felt some particular sinking feeling in the stomach afterward. These conscious experiences fade like dreams in the morning unless, as with dreams, we fix them in mind with deliberate attention within a very short space.

Third, in part due to our lack of interest in conscious experience, the concepts and categories available to characterize conscious experience are limited and derivative. Most language for sensory experience is adapted from the language we use to describe outward objects of sensation. Objects are red or square or salty or rough, and usually when we use the words “red” and “square” and “salty” and “rough,” we are referring to the properties of outward objects; but derivatively we also use those words to describe the sensory experiences normally produced by such objects. That’s fine as far as it goes, but it’s prone to invite confusion between the properties of objects and the properties of experiences of those objects. The practitioners of certain specialties – for example, wine tasting and sound engineering – have refined language to discuss sensory experience, but even here our conceptual categories are only rough tools for describing the overall experience. And, anyway, isn’t the gustatory experience of eating a burrito as complex as that of tasting a mature wine, and the auditory experience of sitting in a restaurant as complex as that of hearing a well-played violin? We almost completely lack the concepts and competencies that would allow us to parse and think about, talk about and remember, this complexity.

Fourth, the introspection of current experience requires attention to (or thought about) that experience, at least in the methodologically central case of deliberately introspecting with the aim of producing an accurate report. Problematic interference between the conscious experience and the introspective activity thus threatens. Philosophers and psychologists going back at least to Auguste Comte have complained that the act of introspection either alters or destroys the target experience, making accurate report impossible. Much of experience is skittish – as soon as we think about it, it flits away. Suppose you reflect on the emotional experience of simple, reactive anger, or the auditory experience of hearing someone speak. Mightn’t the self-reflective versions of those experiences – those experiences as they present themselves to concurrent introspection – be quite different from those experiences as they normally occur in the unselfconscious flow of daily life? A number of psychologists have attempted to remedy this difficulty by recommending immediate retrospection, or recall, of past experience rather than concurrent introspection as the primary method (e.g., James, 1890/1981). However, deliberately poising oneself in advance to report something retrospectively may also interfere with the process to be reported; and if one only reports experiences sufficiently salient and interesting to produce immediate spontaneous retrospection, one will get a very biased sample. Furthermore, retrospection is likely to aggravate the fifth problem, namely:

Fifth, reports of experience are apt to be considerably influenced, and distorted, by pre-existing theories, opinions, and biases, both cultural and personal, as well as situational demands. The gravity of this problem is difficult to estimate, but in my opinion it is extreme (and considerably larger than the influence of bias and preconception now generally recognized to permeate science as a whole). Given the changeability and skittishness of experience, and our poor tools and limited practice in conceptualizing and remembering it, we lean especially heavily on implicit assumptions and indirect evidence in reaching our introspective and immediately retrospective judgments. One major source of such error is what the introspective psychologist E. B. Titchener called “stimulus error”: We know what the world, or a particular stimulus, is like (we know for example that we are seeing a uniformly colored red object), and we are apt to infer that our experience has the properties one might naively expect such a stimulus to produce (e.g., a visual experience of uniform “redness”). We’re much better accustomed to attend to the world than to our experience, and the difference between sensory attention to outside objects and introspective attention to the sensory experience of those objects is a subtle one; so the former is apt to substitute for the latter. Even when experience isn’t so easily traceable to an outside object, I’m inclined to think our theories can profoundly affect our reports. If we think images must be like pictures, we’re more apt to instill reports of imagery with picture-like qualities than if we don’t hold that view. If we think cognition takes place in the brain, we’re more apt to locate our cognitive phenomenology there than if we think it takes place in the heart. If we think that memories must be imagistic, we’re more apt than those who don’t think so to report memory images.

Experience is a fast-moving chaos -- a tornado whipping through a magnet factory. If we approach it without practice and skills, with derivative, second-hand concepts, and with our attention split between the experience itself and the act of trying to figure out what that experience is, no wonder we make a mess of it. We end up describing at least as much what we expect to be there as what is actually there.

Wednesday, April 07, 2021

On Scientific Trust, Loose Summaries, and Henrich's WEIRDest People in the World

Joseph Henrich's ambitious tome, The WEIRDest People in the World, is driving me nuts. It's good enough and interesting enough that I want to read it. Henrich's general idea is that people in Western, Educated, Industrialized, Rich, Democratic (WEIRD) societies differ psychologically from people in more traditionally structured societies, and that the family policies of the Catholic Church in medieval Europe lie at the historical root of this difference. It's very cool and I'm almost convinced!

Despite my fascination with his argument, I find that when Henrich touches on topics I know something about, he tends to distort and simplify things. Maybe this is inevitable in a book of such sweeping scope. However, it does lead me to mistrust his judgment and wonder how accurate his presentation is on topics where I have no expertise.

Early in reading, I was struck by Henrich's presentation of the famous / notorious "marshmallow test". Here's his description:

To measure self-control in children, researchers sit them in front of a single marshmallow and explain that if they wait until the experimenter returns to the room, they can have two marshmallows instead of just the one. The experimenter departs and then secretly watches to see how long it takes for the kid to cave and eat the marshmallow. Some kids eat the lone marshmallow right away. A few wait 15 or more minutes until the experimenter gives up and returns with the second marshmallow. The remainder of the children cave somewhere in between. A child's self-control is measured by the number of seconds they wait.

Psychological tasks like these are often powerful predictors of real-life behavior (p. 40).

It's a cute test! However, I have a graduate student who is currently writing a dissertation chapter on problems with this test. Maybe the test is a measure of self-control, but it could also be a measure of how much the child trusts the experimenter to actually deliver on the promise, or how much the child desires the social approval of the experimenter, or how comfortable the child is with strange laboratory experiments of this sort, or how hungry they are, how much they want to end the situation so as to reunite with their waiting parent, etc. Indeed, a recent conceptual replication of the experiment mostly does not find the types of predictive value that were claimed in early studies, after statistical controls are introduced to account for race, gender, home background, parents' education, vocabulary, and other possible covariates.[1]

In general, if you've been influenced, as I have, by the "replication crisis" and other recent methodological critiques of social science and medicine, this might be the kind of result that should set off your skeptical tinglers. The idea that how long a four-year-old waits before eating a marshmallow reveals how much self-control they have, which then "powerfully predicts" real-life behavior outside of the laboratory (e.g., college admission test scores over a decade later, as is sometimes claimed) -- well, it could be true. I'm not saying it's not. But I don't think I'd have written it up as Henrich does, without skeptical caveats, as though there's general consensus among psychologists that a child's behavior with a single marshmallow in this peculiar laboratory situation is a valid, powerful measure of self-control with excellent predictive value. Its prominent placement near the beginning of the book furthermore suggests that Henrich regards this test as part of the general theoretical foundation on which psychological work like his appropriately builds.

In this matter, my knowledgeable judgment and Henrich's differ. That's fine. Researchers can differently weigh the considerations. But if I hadn't had the background knowledge I did, his quick presentation might have led me into a much more optimistic assessment of the value of the marshmallow test than I would have arrived at from a more thorough presentation that acknowledged the caveats. So there's a sense in which Henrich's presentation is a bad fit for my theoretical inclinations.

Here's another passage that bothered me:

Upon entering the economics laboratory, you are greeted by a friendly student assistant who takes you to a private cubicle. There, via a computer terminal, you are given $20 and placed into a group with three strangers. Then, all four of you are given an opportunity to contribute any portion of your endowment -- from nothing at all to $20 -- to a "group project." After everyone has had an opportunity to contribute, all contributions to the group project are increased by 50 percent and then divided equally among all four group members. Since players get to keep any money that they don't contribute to the group project, it's obvious that players always make the most money if they give nothing to the project. But, since any money contributed to the project increases ($20 becomes $30), the group as a whole makes more money when people contribute more of their endowment. Your group will repeat the interaction for 10 rounds, and you'll receive all of your earnings in cash at the end. Each round, you'll see the anonymous contributions made by others and your own total income. If you were a player in this game, how much would you contribute in the first round with this group of strangers?

This is the Public Goods Game (PGG). It's an experiment designed to capture the basic economic trade-offs faced by individuals when they decide to act in the interest of their broader communities.... societies with more intensive kin-based institutions contribute less on average to the group project in the first round (p. 210-211).

This describes a study in which participants will receive $200-$300 each. Of course, it's rare to award research participants such large amounts of money. If you want, say, 200 participants, you'll need a budget of up to $60,000! Henrich's endnotes cite two general books, one brief commentary without empirical data, two classic articles in which participants exited the experiment having earned about $30 each on average, and two cross-cultural studies whose payout amounts weren't readily discoverable by me from looking at the materials. Also in the notes, Henrich says that one study "increased contributions to the group project by 40 percent, not 50 percent. I'm simplifying" (p. 543). However, the majority of the cited studies in fact used 40 percent increases, not just the one study to which this caveat was attached.
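To see where the $200-$300 range comes from, here is a small sketch of one round of the game exactly as Henrich's passage describes it: four players, a $20 endowment each, and a 50 percent increase on contributions. The function and variable names are hypothetical, chosen only for illustration.

```python
def pgg_round_payoffs(contributions, endowment=20.0, multiplier=1.5):
    """Payoff to each player for one round of a Public Goods Game.

    Each player keeps whatever they did not contribute, plus an equal
    share of the multiplied group project.
    """
    share = sum(contributions) * multiplier / len(contributions)
    return [endowment - c + share for c in contributions]


rounds = 10

# Nobody contributes: everyone simply keeps the $20 endowment.
nobody = pgg_round_payoffs([0, 0, 0, 0])
# Everybody contributes everything: $80 becomes $120, or $30 each.
everybody = pgg_round_payoffs([20, 20, 20, 20])
# One free rider among three full contributors.
free_rider = pgg_round_payoffs([0, 20, 20, 20])

print(nobody[0] * rounds, everybody[0] * rounds)
print(free_rider)
```

The two symmetric cases give $20 and $30 per round, hence $200 to $300 over ten rounds. The free-rider case shows the incentive problem: each dollar contributed returns only 1.5 / 4 = 37.5 cents to the contributor, so the player who gives nothing comes out ahead of the players who give everything.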

I'm not seeing why 50% is "simpler" than the more accurate 40%. This seems to be a gratuitous inaccuracy. Characterizing the experiment as ten rounds with payoffs of $20-$30 per round is potentially a more serious distortion. Really, these experiments are run with units that are later exchanged for small amounts of real money. This is important for at least two reasons: First, these experimental monetary units might be psychologically different from real money, possibly encouraging a more game-like attitude. And second, when the actual amounts of money at stake are small, the costs of cooperating (and also the benefits) are less, which should amplify concerns about how representative this game-like laboratory behavior is of how the participants would behave in the real world, with more serious stakes.

Suppose that instead of exaggerating the stakes upward by a factor of about 10, Henrich had exaggerated the stakes down by a factor of about 10. What if, instead of saying that there was $20-$30 at stake per turn, when it's typically more like $2-$3, he had said that $0.20 was at stake per turn? I suspect this would make an intuitive difference to most ordinary readers of the book. The leap from "here's how cooperatively research subjects act with $20" to "here's how cooperative people in that culture are with strangers in general" is more attractive than the leap from "here's how cooperatively research subjects act with $0.20" to the same broad conclusion.

In general, I tend to be wary of quick inferences from laboratory behavior to real-world behavior outside the laboratory. Laboratories are strange social situations and differently familiar to people from different backgrounds. This is the problem of ecological validity or external validity, and concerns of this sort are why most of my own research on social behavior uses real-world measures. Other researchers, such as Henrich, might not be as worried about the external validity of laboratory/internet studies. There's room for legitimate debate. But in order for us readers to get a sense of whether external validity might be an issue in the studies he cites, at the very least we need an accurate description of what the studies involve. Henrich's presentation does not provide that, and simplification is a poor motive for this distortion, since $2 is no less simple than $20.

Henrich does not, in my mind, cross over into bald misrepresentation. He doesn't, for example, say of any particular study that it involves $20 per round. Rather, the presentation seems to be loose. He's trying to give the general flavor. He's writing for a moderately broad audience and aiming to synthesize a huge range of work, unavoidably simplifying and idealizing along the way. He could respond to my concerns by saying that his best judgment of the conflicting evidence about the marshmallow test is that it's a valid and highly predictive measure of self-control and that his simplified presentation of the material conveys that effectively by avoiding concerns and apparent replication failures that would just (in his judgment) be distracting. He could say that his best reading of the literature on external validity is that the difference between $2 and $20 doesn't matter and that the quick leap to general conclusions about cooperativeness is justified because we can reasonably expect laboratory studies of this sort to be diagnostic. He could say that the reader ought to trust that he's done his homework behind the scenes.

We must always trust, to some extent, the scientists we're reading -- that they are reporting their data correctly, that there aren't big problems with the study's execution that they're failing to reveal, and so on. Part of this involves relying on their inevitably simplified summaries of material with which we are unfamiliar. We trust the researcher to have digested the material well and fairly, and not to be hiding worries that might legitimately undermine the central claims. The looser the presentation, the more trust is required.

This invites the question of whether there are conditions under which more versus less trust is justified. How much, as a reader, ought you be willing to glide through on trust?

I'd recommend reducing trust under the following three conditions:

(1.) The author has a prior agenda or a big picture theory that might motivate them to interpret and digest results in a biased way. Most scientists have agendas and theories, of course, and certainly Henrich does. But there is more and less agenda-driven work, and adversarial collaboration offers the opportunity for bias to be balanced through scientists' opposing agendas.

(2.) The author is not as skeptical as you the reader are about some of the relevant types of research. If the author is less skeptical than you are, they might be interpreting that research more naively or more at face value than you would if you had read the same research.

(3.) Where the author makes contact with the issues you know best, they seem to be distorting, misinterpreting, or leaping too quickly to broad conclusions. This might indicate a general bias and sloppiness that might be present but harder for you to see regarding issues about which you know less.

On all three grounds, my trust in Henrich is impaired.

--------------------------------------------------

Update, April 30: See my continuing thoughts about the book here. See also Henrich's reply to my post here.

--------------------------------------------------

[1] Deep in an endnote, Henrich acknowledges this last concern. He responds that "it's easy to weaken the relationship between measures of patience and later academic performance by statistically removing all the factors that create variation in patience in the first place" (p. 515). It's a reasonable, though disputable point. Regardless, few readers are likely to pick up on something buried in the back half of one among hundreds of endnotes.

Friday, April 02, 2021

Gender Disparity in Philosophy, by Race and Ethnicity

The National Center for Education Statistics keeps a database of bachelor's degree recipients at accredited colleges in the U.S., currently running through the 2018-2019 academic year. Search "NCES" on The Splintered Mind and you'll see my many posts drawing on this database.

Here's something I noticed today, in the course of preparing a new paper on demographic trends in academic philosophy for The Philosophers' Magazine: 33% of non-Hispanic White bachelor's degree recipients in philosophy are women (averaging over the most recent three years), while 46% of non-Hispanic Black bachelor's degree recipients are women. That is, if you look just at non-Hispanic White students, the gender ratio in philosophy is 2:1 men to women, while if you look just at non-Hispanic Black students, it's nearly 1:1. The result is highly statistically significant: non-Hispanic White 4674/14032 vs. non-Hispanic Black 579/1264, z = 8.6, p < .001.
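For anyone who wants to check the arithmetic, the sketch below runs a two-proportion z-test on the counts just given. The post doesn't say which version of the test was used, so the unpooled standard error here is an assumption on my part; it happens to reproduce the reported z to one decimal place, whereas a pooled standard error gives a somewhat larger value. The function name is illustrative only.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test with an unpooled standard error (assumed).

    Returns the z statistic and the two-sided p-value under a normal
    approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p2 - p1) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Women / total philosophy BA recipients, from the counts above.
z, p_value = two_proportion_z(4674, 14032, 579, 1264)
print(round(z, 1), p_value < 0.001)
```

With samples this large, the normal approximation should be unproblematic, and the conclusion doesn't hinge on which standard error is used.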

I find this interesting and surprising. I welcome conjectures about the possible explanation in the comments. It is definitely not the case, as I have sometimes heard suggested, that non-Hispanic White women are proportionately represented in philosophy, at least at this level. Non-Hispanic White women constitute 32% of bachelor's degree recipients across all majors, and 30% of the U.S. general population, but only 20% of bachelor's degree recipients in philosophy.

Of course, as these numbers also suggest, non-Hispanic Black students remain underrepresented among philosophy majors overall (6%, excluding students who aren't permanent residents or whose race/ethnicity is unknown), compared to bachelor's degree recipients across all majors (10%) and to the U.S. general population (13%).

Looking at the other race/ethnicity categories that NCES makes available, non-Hispanic Asian, non-Hispanic multiracial, Hispanic (any race), and nonresident alien students show a similar tendency toward greater gender parity in philosophy than non-Hispanic White students (all p values < .001):

non-Hispanic Asian philosophy BA recipients 44% women (708/1598);
non-Hispanic multiracial philosophy BA recipients 40% women (441/1097);
Hispanic philosophy BA recipients 39% women (1234/3132);
nonresident alien philosophy BA recipients 44% women (545/1239).

However, Native American / Alaska Native and Native Hawaiian / Other Pacific Islander (non-Hispanic) showed proportions closer to those for non-Hispanic White students, though the numbers are too small for any confident conclusions: 36% (32/88) and 30% (13/44), respectively.
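To give a rough sense of how much uncertainty those small counts carry, here is a sketch that computes 95% confidence intervals for the two proportions. The Wilson score interval is my choice for the illustration, not something used in the analysis above, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Women / total philosophy BA recipients in the two smallest groups.
native_american = wilson_interval(32, 88)
pacific_islander = wilson_interval(13, 44)
# For comparison, the much larger non-Hispanic White group.
white = wilson_interval(4674, 14032)

for label, (lo, hi) in [("Native American / Alaska Native", native_american),
                        ("Native Hawaiian / Other Pacific Islander", pacific_islander),
                        ("non-Hispanic White", white)]:
    print(f"{label}: {lo:.2f} to {hi:.2f}")
```

On these assumptions the intervals run from roughly 27% to 47% and from roughly 18% to 44%, wide enough to be compatible both with the 33% seen among non-Hispanic White students and with the higher proportions in the other groups.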

Image: Angela Davis mural in Boston [source]
