Chapter 6 Working with Strings

6.1 Remove a pattern from a string

library(tidyverse) # provides tribble(), the stringr functions, and filter()

price_table=tribble(~car, ~price,
        "Corvette", "$65,000",
        "Mustang GT", "$40,000")

# BASE R METHOD (sub by replacing something with nothing)
gsub("\\$", "", price_table$price) # (pattern, replace with, object$column); "\\$" escapes the dollar sign so it is matched literally
## [1] "65,000" "40,000"
# TIDYVERSE METHOD
str_remove(price_table$price, pattern = "\\$")
## [1] "65,000" "40,000"

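You can remove digits with the pattern "[:digit:]". Note that str_remove() removes only the first match in each string; use str_remove_all() to remove every digit.

# removes the first digit found in each value of cgi_sev
panss_sem_data$cgi_sev=str_remove(panss_sem_data$cgi_sev, pattern = "[:digit:]")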
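
To make the difference between removing the first digit and removing every digit concrete, here is a minimal sketch. The values in `x` are invented for illustration and are not taken from panss_sem_data.

```r
library(stringr)

# Hypothetical example values, made up for this illustration
x <- c("4 - Moderately ill", "Room 101")

# str_remove() drops only the first match in each string
first_only <- str_remove(x, pattern = "[:digit:]")
first_only

# str_remove_all() drops every match in each string
all_digits <- str_remove_all(x, pattern = "[:digit:]")
all_digits

# Base R equivalent; base R regex needs the doubled brackets
base_all <- gsub("[[:digit:]]", "", x)
base_all
```

With these inputs, `first_only` still contains "Room 01", while `all_digits` and `base_all` contain no digits at all.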

6.2 Replace one pattern in a string with another

Tidyverse command: str_replace() or str_replace_all()

Base R command: gsub()

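# base R (gsub() replaces every match; arguments are pattern, replacement, vector to search)
gsub(pattern = "[ea]", replacement = "ZZZZ", x = iris$Species) |> 
  head()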

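# tidyverse
# str_replace_all() replaces every match; the pattern "[ea]" matches either "e" or "a"
str_replace_all(iris$Species, pattern = "[ea]", replacement = "ZZZZ") |> 
  head()

# str_replace() replaces only the first match in each string
str_replace(iris$Species, pattern = "[ea]", replacement = "ZZZZ") |> 
  head()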
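
The difference between the two functions is easier to see on a short vector. The sketch below uses the three species names from iris typed out by hand. The last call shows the named-vector form of str_replace_all(), which applies a different replacement to each pattern; the replacement values "E" and "A" are arbitrary choices for illustration.

```r
library(stringr)

species <- c("setosa", "versicolor", "virginica")

# Only the first "e" or "a" in each string is replaced
first_match <- str_replace(species, pattern = "[ea]", replacement = "Z")
first_match

# Every "e" and "a" in each string is replaced
all_matches <- str_replace_all(species, pattern = "[ea]", replacement = "Z")
all_matches

# A named vector maps each pattern (name) to its own replacement (value)
per_pattern <- str_replace_all(species, c("e" = "E", "a" = "A"))
per_pattern
```

Note that "setosa" is the only one of the three where the results differ, because it is the only name containing more than one "e" or "a".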

6.3 Find (i.e., filter for) all instances of a string

This is useful for finding specific values inside a column (e.g., one particular person’s name in a roster, or everyone with a particular last name).

Tidyverse command: str_detect()

Base R command: grepl()

Both functions return a logical vector (TRUE where the string matches), so to keep the matching rows of a data frame they must be nested inside filter().

cars_df=rownames_to_column(mtcars, var = "car")

# base R
cars_df |> filter(grepl("Firebird", car))

# tidyverse
cars_df |> filter(str_detect(car, "Firebird"))

You can also search for multiple strings simultaneously by separating them with the regular-expression “or” operator (|) inside the quotes.

cars_df |> filter(str_detect(car, "Firebird|Fiat"))
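
As a minimal sketch of what filter() receives, the example below runs both functions on a plain character vector. The three names are row names that appear in mtcars; the vector `cars` itself is only an illustration.

```r
library(stringr)

# Three car names that appear as row names in mtcars
cars <- c("Pontiac Firebird", "Fiat 128", "Mazda RX4")

# Both functions return one TRUE/FALSE per element
tidy_hits <- str_detect(cars, "Firebird|Fiat")
base_hits <- grepl("Firebird|Fiat", cars)
tidy_hits

# The logical vector can subset the original vector directly
matched <- cars[tidy_hits]
matched
```

Note the argument order: str_detect() takes the string first and the pattern second, while grepl() takes the pattern first.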

You can also use the negation operator (!) to keep every row except those containing the specified string.

# base R
cars_df |> filter(!(grepl("Pontiac", car)))

# tidyverse
cars_df |> filter(!(str_detect(car, "Pontiac")))

6.4 Drop all rows from a data set that contain a certain string

# Tidyverse method
cars_df |> 
  filter(str_detect(car, "Merc", negate = TRUE)) # negate = TRUE returns TRUE for rows that do NOT match, so rows containing "Merc" are dropped

# base R
cars_df[!grepl("Merc", cars_df$car),]
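
As a quick sanity check, the sketch below rebuilds cars_df and confirms that the two approaches drop the same rows. The object names `tidy_result` and `base_result` are introduced here for illustration only.

```r
library(dplyr)
library(stringr)
library(tibble)

cars_df <- rownames_to_column(mtcars, var = "car")

# Tidyverse: keep rows whose car name does NOT contain "Merc"
tidy_result <- cars_df |> 
  filter(str_detect(car, "Merc", negate = TRUE))

# Base R: same idea, using a negated logical index on the rows
base_result <- cars_df[!grepl("Merc", cars_df$car), ]

# Both results should list the same cars, none of them a Merc
same_cars <- identical(tidy_result$car, base_result$car)
same_cars
any(grepl("Merc", tidy_result$car))
```

The comparison is made on the car column rather than the whole data frame because the base R subset keeps the original row numbers as row names, whereas filter() renumbers them.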

6.5 Force all letters to lower case

Use stringr::str_to_lower()

blah=tribble(~A, ~B,
             "A","X",
             "A","X")

blah
## # A tibble: 2 x 2
##   A     B    
##   <chr> <chr>
## 1 A     X    
## 2 A     X
blah$A=str_to_lower(blah$A)

blah
## # A tibble: 2 x 2
##   A     B    
##   <chr> <chr>
## 1 a     X    
## 2 a     X
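
For parity with the earlier sections, base R offers tolower() for the same job. A minimal sketch on a made-up vector (the values in `x` are illustrative only):

```r
library(stringr)

# Illustrative values, not from a real data set
x <- c("A", "Mustang GT")

tidy_lower <- str_to_lower(x) # tidyverse
base_lower <- tolower(x)      # base R

tidy_lower
base_lower
```

For plain English text like this, the two functions give the same result.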