In this notebook I reproduce all the calculations, tables and figures from the paper “Human Rights and the Grammar of Corporate Social Responsibility.” The notebook does not reproduce the paper as a whole, but it should allow the paper’s readers to understand how the corpus was manipulated and how the results were produced.
R is designed to draw on a large ecosystem of libraries, each built around a specific purpose. This paper relies heavily on libraries in the Tidyverse, which are designed to promote a particular approach to programming in R.1 I also rely on the Quanteda and Tidytext libraries, whose authors have been at the forefront of developing R’s potential for text mining.2
The collocateR library was written to support the research contained in this paper. It will be available on GitHub once the anonymous review process has finished. For now, you can download it as a zip file from the supplementary materials, unzip it and install it using devtools.
First, the following script will load the required libraries, installing any that are not already installed on your system. The devtools library is used for installing local libraries; knitr and kableExtra are used for publishing tables; cowplot is used to prepare figures.
# Install the following libraries if not already installed.
sapply(c("tidyverse", "tidytext", "quanteda", "devtools", "knitr", "kableExtra", "cowplot", "cleanNLP", "gridExtra"),
function(x)
if(!(x %in% rownames(installed.packages()))){install.packages(x)})
lapply(c("tidyverse", "tidytext", "quanteda", "devtools", "knitr", "kableExtra", "cowplot", "cleanNLP", "gridExtra"),
require,
character.only = TRUE)
if(!("collocateR" %in% rownames(installed.packages()))){ # To save time, check if installed (downloading to install locally)
url <- "https://github.com/cokelly/collocateR/archive/master.zip"
download.file(url = url, destfile = "collocater.zip")
unzip("collocater.zip", exdir = "collocateR")
devtools::install("collocateR/collocateR-master")
}
library(collocateR)
In this section I load two versions of the data: one complete, and one with ‘stopwords’ removed and the remaining words reduced to their lemmas. To load the data from scratch, please refer to the ‘GetDocs.R’ file. Otherwise both corpora are contained here in ‘RDS’ files, which act as containers for R data. I then draw up some summary statistics. The remainder of this workbook draws on R scripts as functions in order to reproduce all the tables and figures in the paper.
docs <- readRDS("../data/docs.RDS")
docs_nostops <- readRDS("../data/docs_nostops.RDS") # This loads a version of the reports with 'stopwords' removed and words lemmatised.
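For reference, the following sketch shows one way the PDF reports could be converted into a named character vector of this shape using the pdftools package. This is an illustration only: pdftools and the file paths are assumptions here, and the actual pipeline is contained in ‘GetDocs.R’.
# Sketch only (not run): build a named character vector of report texts from a folder of PDFs.
# library(pdftools)
# pdf_files <- list.files("reports", pattern = "\\.pdf$", full.names = TRUE)
# docs_raw <- vapply(pdf_files,
#                    function(f) paste(pdftools::pdf_text(f), collapse = " "),
#                    character(1))
# names(docs_raw) <- basename(pdf_files)
# saveRDS(docs_raw, "docs.RDS")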
The filename for each report contains relevant details about that report. The file ‘metadata.RDS’ contains the same data in tabular form. The first six objects/rows of each are printed below:
head(names(docs))
## [1] "Energy_BRA_Petrobas_2007.pdf" "Energy_BRA_Petrobas_2008.pdf"
## [3] "Energy_BRA_Petrobas_2009.pdf" "Energy_BRA_Petrobas_2010.pdf"
## [5] "Energy_BRA_Petrobas_2011.pdf" "Energy_BRA_Petrobas_2012.pdf"
meta <- readRDS("../data/metadata.RDS")
head(meta)
## # A tibble: 6 x 6
## year country company sector filename `corpus positio~
## <dbl> <chr> <chr> <chr> <chr> <int>
## 1 2007 BRA Petrobas Energy Energy_BRA_Petrobas_2007.~ 1
## 2 2008 BRA Petrobas Energy Energy_BRA_Petrobas_2008.~ 2
## 3 2009 BRA Petrobas Energy Energy_BRA_Petrobas_2009.~ 3
## 4 2010 BRA Petrobas Energy Energy_BRA_Petrobas_2010.~ 4
## 5 2011 BRA Petrobas Energy Energy_BRA_Petrobas_2011.~ 5
## 6 2012 BRA Petrobas Energy Energy_BRA_Petrobas_2012.~ 6
# To export a table, run
# kable(head(meta), format = "html")
# Calculate the total word count
wordcount <- docs %>%
as_tibble %>%
unnest_tokens(., word, value) %>%
nrow
wordcount_nostops <- docs_nostops %>%
as_tibble %>%
unnest_tokens(., word, value) %>%
nrow
# Calculate the number of reports, the number of companies and the earliest and latest years.
numreps <- nrow(meta)
numcomps <- length(unique(meta$company))
numcountries <- length(unique(meta$country))
firstyear <- min(meta$year)
lastyear <- max(meta$year)
The corpus is 13,073,963 words long. With stopwords removed it is 8,602,056 words long.
The dataset consists of 347 corporate social responsibility reports published by 36 global mining and energy firms from 10 states between 1998 and 2017.
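For convenience, the summary statistics reported above can be collected into a single tibble (a small addition to the original script):
# Collect the corpus summary statistics computed above into one tibble
tibble(reports = numreps,
       companies = numcomps,
       countries = numcountries,
       `first year` = firstyear,
       `last year` = lastyear,
       words = wordcount,
       `words (no stopwords)` = wordcount_nostops)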
Table 1 is a ‘keyword in context’ table from BG Group’s 2015 report. I use the version that includes stopwords here and publish rows 6-16 (where more narrative passages begin). The table is formulated using Quanteda’s ‘kwic’ function:
# Isolate BG Group's 2015 report
bg_2015_meta <- meta %>% filter(company == "BG" & year == 2015)
bg_2015 <- docs[bg_2015_meta$`corpus position`]
# Extract kwics
table1 <- kwic(x = bg_2015, pattern = "rights", window = 6) %>%
as_tibble %>%
select(pre, keyword, post) %>%
slice(6:16)
# To generate a publication-ready table
#kable(table1, format = "html", caption = "KWICs for Rights, BG Group Sustainability Report 2015") %>%
# kable_styling("striped", full_width = TRUE)
table1
## # A tibble: 11 x 3
## pre keyword post
## <chr> <chr> <chr>
## 1 has a commitment to respect hum~ rights in line with the un guiding
## 2 guiding principles on business ~ rights we have an integrated approach~
## 3 have an integrated approach to ~ rights across our business to reflect~
## 4 priorities in 2015 managed pote~ rights issues related maintained our ~
## 5 voluntary principles on securit~ rights in myanmar we conducted a human
## 6 in myanmar we conducted a human rights bg published a public position~
## 7 our approach impact assessment ~ rights activities and performance int~
## 8 performance integrated approach~ rights across the business and manage~
## 9 the business and manage potenti~ rights issues through our existing in~
## 10 security and social performance~ rights taskforce made up of senior re~
## 11 to provide assurance that poten~ rights impacts were identified and ad~
In Table 2 I use the corpus with stopwords removed and words lemmatised. I isolate BG Group reports published between 2010 and 2015, calculate frequencies for the words directly neighbouring “human right” (trigrams), calculate normalised pointwise mutual information (NPMI) scores, combine the two tables and sort by NPMI.
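For readers unfamiliar with the measure: NPMI rescales pointwise mutual information by the negative log of the joint probability, so that scores fall between -1 and 1, with 1 indicating that the two words only ever occur together. The generic illustration below shows the calculation; collocateR’s internal estimator, which works from window-based co-occurrence counts, may differ in detail.
# Illustration only: NPMI for a node word w and a candidate collocate c,
# given estimated probabilities p(w), p(c) and the joint probability p(w, c).
npmi_score <- function(p_wc, p_w, p_c) {
  pmi <- log2(p_wc / (p_w * p_c)) # pointwise mutual information
  pmi / -log2(p_wc)               # normalise: 1 = perfect co-occurrence, 0 = independence
}
npmi_score(p_wc = 0.0005, p_w = 0.002, p_c = 0.01) # approximately 0.42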
# Isolate BG Group reports 2010-2015
bg_2010_2015_meta <- meta %>% filter(company == "BG" & year > 2009 & year < 2016)
bg_2010_2015 <- docs_nostops[bg_2010_2015_meta$`corpus position`]
# Note below "human right": "rights" has been lemmatied. Note also 'window = 1' gives us trigrams (as in words directly neighbouring "human right"). In the final table I slice out row one because it is 'human right' itself.
bg_trigram_freqs <- collocateR::get_freqs(bg_2010_2015, "human right", window = 1)
bg_trigram_npmi <- collocateR::get_npmi(bg_2010_2015, "human right", window = 1)
table2 <- bg_trigram_freqs %>%
full_join(., bg_trigram_npmi, by = "ngram") %>%
arrange(desc(npmi)) %>%
slice(2:11) %>%
add_column(rank = 1:10) %>%
select(rank, trigram = ngram,
`trigram recurrence` = `Collocate Frequency`,
`word recurrence` = `Document Frequency`,
npmi)
# To generate a publication-ready table
#kable(table2, format = "html", caption = "Top 10 Word frequencies for trigrams with human rights, BG sustainability reports 2010-2015, with lemmas and stopwords removed, sorted by npmi") %>%
# kable_styling("striped", full_width = TRUE)
table2
## # A tibble: 10 x 5
## rank trigram `trigram recurrence` `word recurrence` npmi
## <int> <chr> <int> <int> <dbl>
## 1 1 longstanding 2 3 0.534
## 2 2 vpshr 12 48 0.531
## 3 3 security 30 282 0.487
## 4 4 respect 7 36 0.476
## 5 5 institute 8 65 0.435
## 6 6 hsse 9 80 0.431
## 7 7 integrate 5 45 0.404
## 8 8 business 33 714 0.390
## 9 9 risk 27 547 0.389
## 10 10 hr 2 17 0.376
Table 3 compares the collocates of ‘human rights’ with the collocates of three other sets of key terms, using the corpus with stopwords removed and words lemmatised. In order to do this I convert longer word forms (“environmental”, “safety”) into shorter ones (“environment”, “safe”) and count each pair as one term.
# Combine longer word forms into their shorter equivalents
docs2 <- docs_nostops %>%
str_replace_all(., "environmental", "environment") %>%
str_replace_all(., "safety", "safe")
# Get npmis for all the key terms
hr <- get_npmi(docs2, "human right") %>% slice(2:11) %>% select(`human rights` = ngram)
env <- get_npmi(docs2, "environment") %>% slice (2:11) %>% select(`environment/environmental` = ngram)
health <- get_npmi(docs2, "health") %>% slice(2:11) %>% select(`health` = ngram)
saf <- get_npmi(docs2, "safe") %>% slice(2:11) %>% select(`safe/safety` = ngram)
# Combine into table 3
table3 <- tibble(rank = 1:10) %>% bind_cols(hr) %>% bind_cols(env) %>% bind_cols(health) %>% bind_cols(saf)
# To generate a publication-ready table
#kable(table3, format = "html", caption = "Table 3: Words neighbouring “human rights” compared to words neighbouring either “environmental” or “environment,” to words neighbouring either “safety” or “safe”, and to words neighbouring “health”, full corpus.") %>%
# kable_styling("striped", full_width = TRUE)
table3
## # A tibble: 10 x 5
## rank `human rights` `environment/environmenta~ health `safe/safety`
## <int> <chr> <chr> <chr> <chr>
## 1 1 security protection safe health
## 2 2 universal impact occupatio~ occupational
## 3 3 respect social resort environment
## 4 4 declaration safe hygiene intrinsic
## 5 5 salient health environme~ workplace
## 6 6 voluntary stewardship care culture
## 7 7 principle performance wellbeing healthy
## 8 8 proclaim friendly wellness performance
## 9 9 complicit beyond zapolyarye hse
## 10 10 vpshr management workplace road
Figures 1 and 2 are more or less ordinary bar charts. Note that I filter out a number of companies and group by period rather than by year. The code below also prepares the data for Figure 3.
# Filter docs by year and remove late-comer companies
meta_fig123 <- meta %>% filter(company != "EDF" & company != "Freeport-McMoran" & company != "Ecopetrol" & company != "Glencore" & company != "Randgold" & company != "Mosaic" & company != "Newmont" & company != "Occidental" & company != "Schlumberger" & company != "China Coal Energy")
docs_fig123 <- docs[meta_fig123$`corpus position`]
# Count UNGC, UNGPs, VPSHR mentions
mentions <- meta_fig123 %>%
# get UNGC and UNGP mentions
add_column(`UNGC mentions` = (str_count(docs_fig123, coll("global compact")) + str_count(docs_fig123, coll("united nations global compact"))), .before = 1) %>%
add_column(`UNGP mentions` = (str_count(docs_fig123, coll("united nations guiding principles")) + str_count(docs_fig123, coll("guiding principles on business"))), .before = 2) %>%
add_column(`VPSHR mentions` = (str_count(docs_fig123, coll("vpshr")) + str_count(docs_fig123, coll("voluntary principles on security and human rights"))), .before = 2) %>%
# Get wordcount
add_column(`wordcount` = str_count(docs_fig123, "\\S+")) %>%
# Years to periods
mutate(period = case_when(year >= 1998 & year <= 2010 ~ "1998-2010", year >= 2011 ~ "2011-2015")) %>%
#filter(country == "UK") %>%
# Group by company and period
group_by(company, period) %>%
# Add up mentions and word count cols
summarise_at(vars(`UNGC mentions`, `UNGP mentions`, `VPSHR mentions`, wordcount), sum) %>%
# Normalised mentions
mutate(`Normalised_UNGC_mentions` = `UNGC mentions`/wordcount) %>%
mutate(`Normalised_UNGP_mentions` = `UNGP mentions`/wordcount) %>%
mutate(`Normalised_VPSHR_mentions` = `VPSHR mentions`/wordcount) %>%
ungroup %>%
mutate_if(is.character, as.factor)
# Count overall mentions
mentions_of_ungc <- sum(mentions$`UNGC mentions`)
mentions_of_ungps <- sum(mentions$`UNGP mentions`)
mentions_of_vpshr <- sum(mentions$`VPSHR mentions`)
# Plot by company and period
ungc_mentions.p <- ggplot(mentions, aes(company, Normalised_UNGC_mentions)) +
geom_col() +
facet_grid(period ~ .) +
background_grid(major = "xy", minor = "none") +
#labs(title = "Figure 1: Mentions of the UN Global Compact", subtitle = "by company and period") +
ylab("Mentions \n normalised by document lengths") +
theme(axis.text.x = element_text(angle = 90, vjust = -0.01),
axis.text.y = element_blank(),
axis.title.y = element_text(size = 10),
axis.title.x = element_blank())
ungp_mentions.p <- ggplot(mentions, aes(company, Normalised_UNGP_mentions)) +
geom_col() +
facet_grid(period ~ .) +
background_grid(major = "xy", minor = "none") +
#labs(title = "Figure 2: Mentions of the UN Guiding Principles", subtitle = "by company and period") +
ylab("Mentions \n normalised by document lengths") +
theme(axis.text.x = element_text(angle = 90, vjust = -0.01),
axis.text.y = element_blank(),
axis.title.y = element_text(size = 10),
axis.title.x = element_blank())
ungc_mentions.p
ungp_mentions.p
Cowplot provides a simple function for saving plots.
ggsave("UNGC_mentions.png", ungc_mentions.p, dpi = 600, width = 14.82, height = 10.19, units = c("cm"))
ggsave("UNGPs_mentions.png", ungp_mentions.p, dpi = 600, width = 14.82, height = 10.19, units = c("cm"))
Table 4 simply lists the top 20 collocates of “global compact” within BG Group’s reports, ranked by NPMI.
bg_locs <- meta %>% filter(company == "BG")
table4 <- unlist(docs_nostops[bg_locs$`corpus position`]) %>%
get_npmi(., "global compact", ngram = 1) %>%
slice(2:21) %>% select(collocate = ngram) %>% add_column(rank = 1:20, .before = 1)
# To generate a publication-ready table
#kable(table4, format = "html", caption = "Table 4: Top 20 collocates with “Global Compact”, BG Group, ranked by npmi.") %>%
# kable_styling("striped", full_width = TRUE)
table4
## # A tibble: 20 x 2
## rank collocate
## <int> <chr>
## 1 1 un
## 2 2 ungc
## 3 3 reaffirm
## 4 4 ten
## 5 5 iogp
## 6 6 unite
## 7 7 signatory
## 8 8 sign
## 9 9 nation
## 10 10 gri
## 11 11 declaration
## 12 12 universal
## 13 13 gl
## 14 14 fulfil
## 15 15 toxic
## 16 16 ipieca
## 17 17 remain
## 18 18 dnv
## 19 19 principle
## 20 20 commit
Figure 3 follows from Figures 1 and 2 above, using the same ‘mentions’ data prepared earlier.
vpshr_mentions.p <- ggplot(mentions, aes(company, Normalised_VPSHR_mentions)) +
geom_col() +
facet_grid(period ~ .) +
background_grid(major = "xy", minor = "none") +
#labs(title = "Figure 3: Mentions of the Voluntary Principles on Security and Human Rights", subtitle = "by company and period") +
ylab("Mentions \n normalised by document lengths") +
theme(axis.text.x = element_text(angle = 90, vjust = -0.01),
axis.text.y = element_blank(),
axis.title.y = element_text(size = 10),
axis.title.x = element_blank())
vpshr_mentions.p
Table 5 is perhaps the most computationally complex table in the paper. I draw on spaCy (Arnold, 2017; Honnibal and Johnson, 2015), a Python package, through the cleanNLP R library in order to isolate verbs within the ranking of collocates for the UN Global Compact. From there the table is straightforward, but the process is time-consuming. On a Windows PC, the Visual Studio C++ build tools are also required.
# Merge "ungc" and "global compact" and "united nations global compact" and "un global compact"
docs_verbs <- docs_nostops %>%
str_replace_all(., "united nations global compact", "ungc") %>%
str_replace_all(., "un global compact", "ungc") %>%
str_replace_all(., "global compact", "ungc") %>%
str_replace_all(., "unglobalcompact", "ungc") %>%
str_replace_all(., "voluntary principles on security and human rights", "vpshr")
ungc_npmi <- get_npmi(docs_verbs, "ungc")
vpshr_npmi <- get_npmi(docs_verbs, "vpshr")
ungc_words <- ungc_npmi %>% # create a character vector of ngrams to categorise through CleanNLP
select(ngram) %>%
unlist %>%
as.character %>%
paste(., sep = " ", collapse = " ")
vpshr_words <- vpshr_npmi %>%
select(ngram) %>%
unlist %>%
as.character %>%
paste(., sep = " ", collapse = " ")
# First, initialise spaCy for use with cleanNLP (you may need to install spaCy using "pip install spacy"; on Windows, run this from an administrator console. More information here: https://spacy.io/).
cnlp_init_spacy()
## Loading required namespace: reticulate
# Generate tokens
writeLines(ungc_words, tfungc <- tempfile())
anno_ungc <- cnlp_annotate(tfungc)
# Generate table by creating tibble of categorised tokens, isolating verbs and left joining with npmi scores
table5_ungc_raw <- cnlp_get_token(anno_ungc) %>% # Tokenise
filter(upos == "VERB") %>% # Isolate "universal parts of speech" classification as verbs
select(ngram = word) %>% # Select the ngrams. The cleanNLP package also delivers lemmas but the corpus had already been lemmatised
left_join(., ungc_npmi, by = "ngram") # (Re)join with npmi score
# And again for the VPSHR
writeLines(vpshr_words, tfvpshr <- tempfile())
anno_vpshr <- cnlp_annotate(tfvpshr)
# Generate table by creating tibble of categorised tokens, isolating verbs and left joining with npmi scores
table5_vpshr_raw <- cnlp_get_token(anno_vpshr) %>% # Tokenise
filter(upos == "VERB") %>% # Isolate "universal parts of speech" classification as verbs
select(ngram = word) %>% # Select the ngrams. The cleanNLP package also delivers lemmas but the corpus had already been lemmatised
left_join(., vpshr_npmi, by = "ngram") # (Re)join with npmi score
Note that the resulting table is not very tidy. The NLP tagger returns a number of acronyms (and words like ‘danish’) as verbs. I have to select complete or near-complete words manually in order to finish Table 5.
# Verbs retained for the UNGC side of Table 5 (selected manually from the tagger output)
ungc_verbs <- c("disappear", "join", "disclo", "found", "assure", "participate", "submit",
                "rotate", "reflected", "indica", "disclose", "invite", "illustrate", "lead",
                "appear", "observe", "articulate", "promote", "harmonize", "initiate", "link",
                "stockpile", "serve", "find", "belong", "continue", "become", "describe",
                "steer", "set", "hope", "remain", "include", "draw", "seek", "declare", "do",
                "communicate", "ask", "accept", "verify", "stimulate", "keep", "reflect",
                "call", "govern", "expand", "strive", "embed", "implement", "adopt", "believe",
                "break", "look", "stand", "intend", "prepare", "indicate", "realize", "involve",
                "obtain", "rely", "relate", "live", "represent", "make", "gather", "provide",
                "emerge", "establish", "take", "pursue", "expect", "achieve", "adapt", "apply",
                "read", "see", "strengthen", "put", "identify", "protect", "learn", "surround",
                "vary", "ensure", "develop", "build", "deliver", "carry", "finance", "enter",
                "add", "affect", "certify", "invest", "enable", "reach", "assess", "centre",
                "begin", "undertake", "enhance", "need", "distribute", "generate", "agree",
                "go", "hold", "may", "project", "maintain", "collect", "improve", "create",
                "consider", "operate", "come", "manage", "give", "be", "know", "mean", "grow",
                "allow", "exclude", "start", "perform", "cost", "have", "receive", "must",
                "offer", "reduce")
table5_left <- table5_ungc_raw %>%
  filter(ngram %in% ungc_verbs)
# Verbs retained for the VPSHR side of Table 5 (selected manually from the tagger output)
vpshr_verbs <- c("implement", "read", "promote", "join", "enter", "seek", "submit", "agree",
                 "expect", "protect", "undertake", "avoid", "identify", "carry", "set",
                 "become", "contain", "reflect", "introduce", "develop", "come", "engage",
                 "provide", "include", "prepare", "establish", "find", "begin", "ensure",
                 "support", "apply", "maintain", "integrate", "require", "understand", "see",
                 "use", "can", "aim", "create", "continue", "operate", "follow", "manage",
                 "improve", "increase")
table5_right <- table5_vpshr_raw %>%
  filter(ngram %in% vpshr_verbs)
This forms the basis of Table 5, with some manual editing to complete truncated words (so “disclo” becomes “disclose”), and so on.
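The completion of truncated forms could also be scripted, for instance with dplyr’s recode (a sketch; for the paper these edits were made by hand):
# Sketch only (not run): complete truncated tokens returned by the tagger
# table5_left <- table5_left %>%
#   mutate(ngram = recode(ngram, disclo = "disclose", indica = "indicate"))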
To view the words that were removed, try:
table5_left_removed <- table5_ungc_raw %>%
  filter(!ngram %in% ungc_verbs)
table5_right_removed <- table5_vpshr_raw %>%
  filter(!ngram %in% vpshr_verbs)
To complete Table 5:
table5 <- table5_left[1:10,1] %>% bind_cols(table5_right[1:10,1]) %>% rename(UNGC = ngram) %>% rename(VSHR = ngram1) %>% add_column(rank = 1:10, .before = 1) %>% add_column(rank1 = 1:10, .before = 3)
table5
## # A tibble: 10 x 4
## rank UNGC rank1 VSHR
## <int> <chr> <int> <chr>
## 1 1 disappear 1 implement
## 2 2 join 2 read
## 3 3 disclo 3 promote
## 4 4 found 4 join
## 5 5 assure 5 enter
## 6 6 participate 6 seek
## 7 7 submit 7 submit
## 8 8 rotate 8 agree
## 9 9 reflected 9 expect
## 10 10 indica 10 protect
Figure 4 is a ‘lexical dispersion plot.’ Such plots are used to illustrate how keywords are distributed across bodies of literature. Each row represents a specific document in the corpus, normalised for length. In this case I aim to show how the patterns of ‘risk’ and ‘human rights’ mentions differ.
meta_BG <- meta %>% filter(company == "BG")
docs_BG <- docs[meta_BG$`corpus position`]
names(docs_BG) <- paste("BG", meta_BG$year)
BG_hr_plot <- textplot_xray(kwic(unlist(docs_BG), phrase("human rights"))) + # xray plot (lexical dispersion) from Quanteda
theme(axis.title.y=element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title = element_blank(),
plot.title = element_blank(),
strip.text.y = element_blank()) +
background_grid(major = "xy", size.major = 1)
BG_risk_plot <- textplot_xray(kwic(unlist(docs_BG), "risk")) +
theme(axis.title.y=element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title = element_blank(),
plot.title = element_blank()) +
background_grid(major = "xy", size.major = 1)
# bg_docnames <- gridExtra::tableGrob(names(bg_words))
lexical_disp.p <- gridExtra::grid.arrange(BG_hr_plot, BG_risk_plot, ncol = 2, widths = 2:3)
lexical_disp.p
## TableGrob (1 x 2) "arrange": 2 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
# To add caption use:
#lexical_disp <- gridExtra::grid.arrange(BG_hr_plot, BG_risk_plot, ncol = 2, widths = 2:3, bottom = grid::textGrob("Human Rights and Risk compared\nBG reports 2001-2015", gp = grid::gpar(fontsize = 12, fontface = "bold")))
We can save this plot with:
ggsave(filename = "Human_Rights_Risk_Lexical_Dispersion_Plot.png", plot = lexical_disp.p, dpi = 600, width = 14.82, height = 10.19, units = c("cm"))
The following library versions were used to compile this document on the 18th of December 2018:
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 collocateR_0.8 gridExtra_2.3 cleanNLP_2.3.0
## [5] cowplot_0.9.3 kableExtra_0.9.0 knitr_1.21 usethis_1.4.0
## [9] devtools_2.0.1 quanteda_1.3.14 tidytext_0.2.0 forcats_0.3.0
## [13] stringr_1.3.1 dplyr_0.7.8 purrr_0.2.5 readr_1.3.0
## [17] tidyr_0.8.2 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-137 fs_1.2.6 lubridate_1.7.4
## [4] httr_1.4.0 rprojroot_1.3-2 SnowballC_0.5.1
## [7] tools_3.5.1 backports_1.1.3 utf8_1.1.4
## [10] R6_2.3.0 lazyeval_0.2.1 colorspace_1.3-2
## [13] withr_2.1.2 tidyselect_0.2.5 prettyunits_1.0.2
## [16] processx_3.2.1 compiler_3.5.1 cli_1.0.1
## [19] rvest_0.3.2 xml2_1.2.0 desc_1.2.0
## [22] labeling_0.3 scales_1.0.0 callr_3.1.0
## [25] digest_0.6.18 rmarkdown_1.11 pkgconfig_2.0.2
## [28] htmltools_0.3.6 sessioninfo_1.1.1 rlang_0.3.0.1
## [31] readxl_1.1.0 rstudioapi_0.8 bindr_0.1.1
## [34] generics_0.0.2 jsonlite_1.6 tokenizers_0.2.1
## [37] magrittr_1.5 Matrix_1.2-15 Rcpp_1.0.0
## [40] munsell_0.5.0 fansi_0.4.0 reticulate_1.10
## [43] stringi_1.2.4 yaml_2.2.0 pkgbuild_1.0.2
## [46] plyr_1.8.4 grid_3.5.1 crayon_1.3.4
## [49] lattice_0.20-38 haven_2.0.0 hms_0.4.2
## [52] ps_1.2.1 pillar_1.3.1 reshape2_1.4.3
## [55] stopwords_0.9.0 pkgload_1.0.2 fastmatch_1.1-0
## [58] glue_1.3.0 evaluate_0.12 data.table_1.11.8
## [61] remotes_2.0.2 RcppParallel_4.4.2 modelr_0.1.2
## [64] cellranger_1.1.0 gtable_0.2.0 assertthat_0.2.0
## [67] xfun_0.4 broom_0.5.1 janeaustenr_0.1.5
## [70] viridisLite_0.3.0 memoise_1.1.0 spacyr_1.0
Wickham H, Tidyverse: Easily Install and Load ‘Tidyverse’ Packages (2017), https://www.tidyverse.org.
Benoit K, Quanteda: Quantitative Analysis of Textual Data (2018), https://quanteda.io; Silge J and Robinson D, Tidytext: Text Mining and Analysis Using Tidy Data Principles in R (2018), https://cran.r-project.org/web/packages/tidytext/index.html.