Whodunit... and to Whom? Subjects, Objects, and Actions in Research Articles on American Labor Unions

This paper investigates whether sentence structure analysis (examining who appears in subject versus object position) can illuminate whom academic articles portray as having agency in labor relations. We extract subjects and objects from a corpus of 3,800 academic articles and compare both the relative occurrence of different groups (workers, women, employers) in each position and the verbs that most commonly attach to each group. We conclude that agency, while elusive, can potentially be modeled by sentence structure analysis.


Introduction
In scholarship on grassroots movements and nonelite groups, the question of "agency" often looms large (Johnson, 2003). Who exactly do we, as scholars, portray as taking action, accomplishing historical change, doing rather than being done to?
With regard to scholarship on American labor unions, the main fault lines along which "doers" and "done-to" are split involve not only employers versus unions, but also the union leadership versus the rank and file, and unionized workers versus unaffiliated workers. This question has, indeed, informed some of the major shifts in the writing of labor history, as scholars have moved away from the "Wisconsin school" of John Commons, which focused on the institutional and organizational history of unions, and toward a more inclusive and bottom-up social history of workers, affiliated or not (Isserman, 1976; Fink, 1988). More recently, perhaps spurred by the sorry state of American labor unions, interest in unions as institutions and organized movements has resurfaced (Currarino, 2011; Taillon, 2009). As many scholars have noted, however, there often seems to be an excess of attention to the articulate leadership and the actions of the union as an institution, even if the rank and file (let alone unaffiliated workers) may not share those views or endorse those actions (Pierce, 2010).
The question of who gets to speak for social movements is hardly limited to the history of organized labor. Similar questions about whose actions command attention (as well as about who does the hard work and who gets the credit) have also been raised about the Civil Rights Movement (Hall, 2005; Ransby, 2003). More recently, the efforts of the Black Lives Matter movement to remain multipolar and avoid focusing attention on "leaders" have raised both the question of whether that is a useful strategy vis-à-vis the media or the public's perceptions of the significance or the message of the organization, and the question of the risks of one or a handful of "charismatic leaders" (Harris, 2015). This paper investigates whether these problems of agency (fundamentally, who exercises some measure of power) can be perceived in scholarly writing using natural language processing (NLP) tools. A syntactic analysis has the potential to go beyond bag-of-words models like topic modeling in illuminating power relations, as well as to capture more clearly who exactly is at the center of the analysis. Analysis of subjects and objects can also easily be combined with analysis of which actions are related to which subjects/objects, revealing interesting patterns about the ways different groups of actors are represented in the literature. In future work, we hope to expand the analysis by experimenting with Semantic Role Labeling in addition to syntactic analysis as well as with using FrameNet (Baker, 2008; Palmer, 2009) and VerbNet (Kipper et al., 2008) to discover patterns in the actions.
In what follows, we offer a preliminary analysis focusing on noun phrases (NPs) that appear in either a subject (passive or active) or an object (indirect or direct) position, and on the actions they most commonly perform or are subjected to. Does this grammatical representation of the doers and the done-to reproduce the splits usually emphasized in scholarship? Who, in academic writing, appears as a doer, grammatically speaking? Do the actions associated with doers and done-to modify assumptions about who has agency in this corpus?

Dataset
The texts examined in this paper consist of the set of English-language research articles longer than 9 pages in the JSTOR article database that answer the query ("american federation of labor"). The query was selected to weight attention toward "mainstream" organized labor rather than, e.g., working-class culture or the Socialist movement, though naturally the dataset also contains articles on, e.g., the radical Industrial Workers of the World (IWW). This query produces a set of 4,183 articles, of which about 70 percent were published after 1945. The final set consists of a subset of 3,807 of these articles successfully processed using the Stanford CoreNLP parser (Manning et al., 2014).

Extracting subjects and objects
Extracting subjects and objects from the parsed articles was performed using the Stanford Tregex utility (Levy and Andrew, 2006).
The expressions used to extract subjects (active and passive) and objects (direct and indirect) are listed in table 1. The copula "to be" was excluded from consideration. As the main expressions capture rather long noun phrases (NPs), a constraining expression was used to further narrow those phrases down to more useful sub-NPs.
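As a rough illustration of the kind of extraction described above, the following minimal Python sketch pulls subject and object NPs out of a Penn-style bracketed parse. It is not the Tregex expressions of table 1: the function names (`parse_tree`, `words`, `extract_roles`) are hypothetical, and the rules (an NP directly under S with a VP sibling counts as a subject, an NP directly under a VP counts as an object) are simplifying assumptions that do not distinguish active from passive subjects or direct from indirect objects, and do not narrow long NPs to sub-NPs.

```python
import re

def parse_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                          # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())   # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                          # skip the closing ")"
        return (label, children)

    return read()

def words(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]

def extract_roles(node, found=None):
    """Collect (role, np_text) pairs under the simplified rules stated above."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    label, children = node
    child_labels = [c[0] for c in children if isinstance(c, tuple)]
    for child in children:
        if isinstance(child, tuple) and child[0] == "NP":
            if label == "S" and "VP" in child_labels:
                found.append(("subject", " ".join(words(child)).lower()))
            elif label == "VP":
                found.append(("object", " ".join(words(child)).lower()))
        extract_roles(child, found)
    return found

# Hand-written example parse (not taken from the corpus).
sentence = ("(ROOT (S (NP (DT The) (NNS workers)) "
            "(VP (VBD joined) (NP (DT the) (NN union))) (. .)))")
roles = extract_roles(parse_tree(sentence))
print(roles)  # [('subject', 'the workers'), ('object', 'the union')]
```

A pattern language such as Tregex expresses the same structural constraints declaratively, which makes it easier to add the finer distinctions (passive subjects, indirect objects, sub-NPs) that this sketch leaves out.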

Most common entities
Disregarding for the moment whether an entity (NP) appears as subject, direct object, or indirect object, the list of most-frequent animate entities in the corpus reads like the cast of main characters and issues of industrial relations, with e.g. workers, employers, and the american federation (of labor) as well as legislation and wages clearly represented (see table 2). 2 Some trends can be extracted even from this basic count of subject/object NPs: for example, as figure 1 shows, women's involvement in the labor movement has been of shifting scholarly interest, with the first peak coinciding roughly with the suffrage movement and the second upward trend beginning around the rise of second-wave feminism in the 1970s. Although the topic model 3 depicted in figure 2 finds a similar pattern in the data, the NP-based graph offers a much more fine-grained and more easily interpreted view.

Subjects and objects
But what about the question of agency? Is there any pattern in who appears as a subject and who appears as an object?
There is, though the results should be taken with some caution. Table 3 and figure 3 show selections of the most frequent human or human-like entities according to the entity's degree of "subjectness." The table and figure were constructed by first selecting the 1,000 most frequent NPs in the data and then calculating for each the ratio of how many times it appeared as a subject (passive or active) versus as an object (indirect or direct). From this was then deducted the overall ratio of subjects to objects in the dataset, and the resulting figure was used as a proxy for "subjectness." Thus, negative ratios in table 3 indicate that the NP is found in object position more commonly than the average NP in the data (the count reflects the sum of mentions, each position being counted once per article). Of the 1,000 most frequent terms, few were of this "more-object-than-average" character; however, the spread of "subjectness" allows some preliminary conclusions. 4
On the whole, workers (even strikers) appear quite commonly in an object position, whereas the government, unions as organizations, and employers appear clearly more commonly than average in a subject position.
Partly the results are explainable by specificity: the AFL-CIO and the well-known AFL leader Samuel Gompers are more likely to appear as subjects, whereas "workers" is only barely above average in its "subjectness." However, it is worth noting that "employers" and "manufacturers" are significantly above "workers" and variants thereof in subjectness. Even as "strikers," workers' subjectness is quite low, although as "unionists" their subjectness is slightly higher than that of manufacturers.
Figure 1: NP "women" (obj/subj) in the corpus
Figure 2: Topic "women" in the corpus (women women's work men woman female family male children gender working workers equal sexual sex suffrage)
Figure 3: Selected NPs ordered from "subjectness" to "objectness."
2 The count is the sum of the times the NP appeared as indirect object, direct object, passive subject, and active subject.
3 Topic model created using MALLET (McCallum, 2002), 50 topics, 1000 iterations, optimize-interval 20.
4 We did not perform coreference resolution, and thus have no way of capturing repeated references to the same entity with different words. To mitigate this, we have used a count of how many articles an NP appears in as subject/object rather than allowing multiple counts per article. The order of the NPs in terms of subjectness if multiple instances per article are considered is very nearly the same as presented here.
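The "subjectness" calculation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used for table 3: the function name `subjectness`, the role labels, and the toy observations are hypothetical; the "ratio" is read as subject count divided by object count; each of the four positions is counted at most once per article; the overall ratio is computed from the same NPs; and NPs never seen as objects are skipped because their ratio is undefined.

```python
from collections import defaultdict

# Hypothetical role labels; the source names the four positions but not their encoding.
SUBJECT_ROLES = {"active_subject", "passive_subject"}
OBJECT_ROLES = {"direct_object", "indirect_object"}

def subjectness(observations):
    """Map each NP to (its subject/object ratio) minus (the overall ratio).

    observations: iterable of (article_id, np, role) tuples.
    Each (article, role) pair is counted once per NP, so repeated
    mentions in the same position within one article do not add up.
    """
    subj = defaultdict(set)
    obj = defaultdict(set)
    for article_id, np, role in observations:
        if role in SUBJECT_ROLES:
            subj[np].add((article_id, role))
        elif role in OBJECT_ROLES:
            obj[np].add((article_id, role))

    total_subj = sum(len(pairs) for pairs in subj.values())
    total_obj = sum(len(pairs) for pairs in obj.values())
    if total_obj == 0:
        raise ValueError("no object occurrences; overall ratio is undefined")
    overall = total_subj / total_obj

    scores = {}
    for np, obj_pairs in obj.items():  # assumption: skip NPs never seen as objects
        ratio = len(subj.get(np, ())) / len(obj_pairs)
        scores[np] = ratio - overall
    return scores

# Toy data for illustration only; not drawn from the corpus.
observations = [
    ("a1", "employers", "active_subject"),
    ("a1", "employers", "active_subject"),   # repeat within article: counted once
    ("a2", "employers", "active_subject"),
    ("a2", "employers", "direct_object"),
    ("a1", "workers", "active_subject"),
    ("a1", "workers", "direct_object"),
    ("a2", "workers", "direct_object"),
    ("a2", "workers", "indirect_object"),
]
scores = subjectness(observations)
for np, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{np}: {score:+.2f}")
```

In this toy example "employers" ends up above the overall ratio and "workers" below it, so the sign of the score carries the same reading as in table 3: negative values mark NPs that sit in object position more often than the average NP.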

The actions of doers and done-to
To further investigate the meaning of a word appearing in a subject versus object position, the most common verbs for each word and each category (direct object, indirect object, subject) were extracted. The indirect object category was mostly too ambiguous to draw conclusions from, involving verbs like send, provide, give, distribute; the analysis below therefore focuses on the subject versus direct object categories. 5
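The verb comparison can be illustrated with a short Python sketch that ranks verbs by frequency for a given NP and grammatical category. The function names (`verb_ranking`, `rank_of`), the triple format, and the toy data are hypothetical, and the stoplist contains only the two generic verbs named in the text (do, make); the full list of ignored verbs is not given in the source.

```python
from collections import Counter

# Only the generic verbs named in the text; the full stoplist is an unknown.
GENERIC_VERBS = {"do", "make"}

def verb_ranking(triples, np, role):
    """Return the verbs attached to `np` in `role`, most frequent first.

    triples: iterable of (np, role, verb_lemma) tuples.
    """
    counts = Counter(
        verb
        for t_np, t_role, verb in triples
        if t_np == np and t_role == role and verb not in GENERIC_VERBS
    )
    return [verb for verb, _ in counts.most_common()]

def rank_of(verb, ranking):
    """1-based position of `verb` in `ranking`, or None if it is absent."""
    return ranking.index(verb) + 1 if verb in ranking else None

# Toy data for illustration only; not drawn from the corpus.
triples = [
    ("workers", "subject", "organize"),
    ("workers", "subject", "organize"),
    ("workers", "subject", "organize"),
    ("workers", "subject", "strike"),
    ("workers", "subject", "strike"),
    ("workers", "subject", "make"),        # generic verb, ignored
    ("unions", "subject", "represent"),
    ("unions", "subject", "represent"),
    ("unions", "subject", "strike"),
    ("unions", "direct_object", "join"),
]
workers_as_subject = verb_ranking(triples, "workers", "subject")
unions_as_object = verb_ranking(triples, "unions", "direct_object")
print(workers_as_subject, rank_of("strike", workers_as_subject))
print(unions_as_object)
```

Comparing `rank_of` for the same verb across two NPs gives the kind of rank contrast reported in the subsections that follow.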

"Unions"
The verbs associated with unions as actors (subjects) are the bread and butter of union activity: they affiliate, represent, organize, seek, agree, refuse, grow, demand, and, encouragingly, win rather more often than lose.
As acted-upon (direct objects), unions seem to mainly reflect worker activity: join is by far the most common verb, followed by verbs like form, organize, and build. The third most common verb here is recognize, i.e., achieving union recognition by the employer. However, high on the list are also destroy, break, and prohibit, reflecting the contested history of labor.

"Workers," "members," and "strikers"
High on the list of verbs associated with workers as actors are organize and strike; interestingly, strike comes much higher on the list for "workers" than for "unions" (10th versus 73rd place).
5 Excessively generic verbs like do or make are ignored in the analysis.
"Members" as actors are clearly tied to the bureaucracy and process of union activity: they appoint, vote, elect, represent, and participate. Intriguingly, they also grapple and adapt.
As acted-upon, both members and workers are organized, represented, recruited, and mobilized, as well as employed and hired. However, workers are also excluded and divided, reflecting the divisions among workers and the not-always-inclusive nature of American labor unions. Meanwhile, members get disciplined, presumably reflecting conflicts between leadership and rank and file, and forbidden, possibly by police or courts. Both workers and members are the targets of someone's efforts to educate.
When they appear as "strikers," the main thing workers do is return (to work, presumably). They also demand, remain (on strike?), refuse, vote, and win or lose. As acted-upon, strikers most commonly get replaced. But they are also supported, urged, aided, rehired, and reinstated, as well as restrained, arrested, and intimidated.

"Employers"
Employers are not primarily the initiators of action in this corpus: rather, the two most common verbs for "employers" as actor are refuse and agree. In the top 25 are also violate (presumably agreements) and resist (presumably unions).
As acted-upon, employers in this corpus find themselves the target of efforts to require, force, compel, prevent, prohibit, and coerce, though also to allow, permit, and induce.

"Women"
The main thing that women do in this corpus is work; it seems that the main news about women as workers is that they exist. High on the list is also enter, probably from a phrase like "enter the workforce." However, women also participate, want, organize and negotiate.
As acted-upon, women are given, organized, employed, and, bafflingly, ordained. They are also encouraged and excluded (7th and 8th position).

Discussion
As the above analysis demonstrates, grammatical subjects and objects function as a rough proxy for examining agency, illuminating who tends to be the doer and who the done-to: the broad lines of which NPs have high "subjectness" coincide with one's intuition of the prevailing power relations. At least as interesting, however, is that the verbs attached to each further demonstrate their different roles. Juxtaposing the subjectness and the common verbs is particularly interesting: for instance, it is intriguing that in a corpus where employers appear in a not-so-favorable light (as resisting, refusing, and violating, among other things), they are nevertheless as a group more likely than workers to occupy a position of agency as subjects. On the other hand, the tensions between union leadership and rank-and-file are also revealed in, for example, the fact that "members" find themselves the object of verbs like discipline.

Future research
In the future, we hope to investigate whether SRL analysis would offer greater clarity in distinguishing agents from non-agents. We also hope to refine the preliminary verb analysis presented here by using verb categories as defined in VerbNet and FrameNet. In addition, we plan to combine the type of analysis presented here with an analysis of named entities; this might allow us to investigate not only the prominence of well-known figures, but possibly also questions like whether the rise of bottom-up approaches in the 1970s or the cultural turn of the 1990s resulted in a greater variety of named entities.