E Querying a relational database
Join operators allow you to merge different dataset using the same logic that you familiarized with learning MySql. In fact, you can actually use dplyr to connect to a database and query it using dplyr function that are translated in mysql syntax. To open a connection to the hotelreviews database:
con <- src_mysql(
host = "ba-isdsclass-programdev.lsu.edu",
port = 3306,
user = "",
password = "",
dbname = "hotelreviews")
src_tbls(con)
## [1] "author" "brand" "employee" "hotel"
## [5] "interaction" "lodgingchain" "response" "review"
- List all chain names and their respective brands. Order the list by chain and brand name.
tbl(con, 'lodgingchain') %>%
right_join(tbl(con, 'brand', by = c('chainId'))) %>%
select(chainName, brandName) %>% arrange(chainName, brandName)
## # Source: lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Ordered by: chainName, brandName
## chainName brandName
## <chr> <chr>
## 1 Accor Adagio
## 2 Accor Citigate
## 3 Accor Grand Mercure
## 4 Accor hotelF1
## 5 Accor ibis
## 6 Accor ibis Budget
## 7 Accor ibis Styles
## 8 Accor Mama Shelter
## 9 Accor Mercure
## 10 Accor MGallery
## # ... with more rows
- Report the name and segment of all the Hilton brands.
tbl(con, 'lodgingchain') %>%
inner_join(tbl(con, 'brand')) %>%
filter(chainName == 'Hilton Worldwide') %>%
select(brandName, brandSegment)
## # Source: lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## brandName brandSegment
## <chr> <chr>
## 1 Hilton Upper Upscale
## 2 Conrad Luxury
## 3 Home 2 Suites Upscale
## 4 Embassy Suites Upscale
## 5 Curio Luxury
## 6 Hampton Inn & Suites Upper Midscale
## 7 Homewood Suites Upscale
## 8 DoubleTree Upper Upscale
## 9 Waldorf Astoria Hotels & Resorts Luxury
## 10 Hampton Inn Upper Midscale
## # ... with more rows
- Report the authors’ names of those who reviewed a single hotel more than once. For each author, count the reviews written for each hotel. Order the output by author name and review by hotel.
tbl(con, 'author') %>%
inner_join(tbl(con, 'review')) %>%
inner_join(tbl(con, 'hotel')) %>% count(authorId, authorNickname, hotelId, sort = T)
## # Source: lazy query [?? x 4]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Groups: authorId, authorNickname
## # Ordered by: desc(n)
## authorId authorNickname hotelId n
## <dbl> <chr> <dbl> <dbl>
## 1 66772 PJBlue 84453 3
## 2 11379 doc038 75711 3
## 3 140650 poppop60 124839 3
## 4 30236 pepstar27 99288 3
## 5 104947 ralphunlv 81241 3
## 6 30971 TMCNEWS 208454 3
## 7 54188 Medalert 90957 3
## 8 148811 insurancesalesman 84095 3
## 9 100973 QueenRB 88460 3
## 10 124085 allaboutcomfort 223742 3
## # ... with more rows
- Report the count of the hotels that received no reviews.
tbl(con, 'hotel') %>%
select(hotelId, hotelName) %>%
anti_join(tbl(con, 'review')) %>%
summarise(totalNOreviews = n())
## # Source: lazy query [?? x 1]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## totalNOreviews
## <dbl>
## 1 935
- Report the number of hotels affiliated with each brand. Order (descending) by number of hotels.
tbl(con, 'lodgingchain') %>%
left_join(tbl(con, 'brand')) %>%
count(chainName, sort = T)
## # Source: lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Ordered by: desc(n)
## chainName n
## <chr> <dbl>
## 1 Marriott International 18
## 2 Accor 17
## 3 Wyndham Worldwide 16
## 4 Choice Hotels 12
## 5 Hilton Worldwide 11
## 6 Starwood Hotels and Resorts Worldwide 9
## 7 InterContinental Hotels Group (IHG) 9
## 8 Hyatt Hotels Corporation 7
## 9 Carlson Rezidor Hotel Group 7
## 10 Dusit Thani Group 6
## # ... with more rows
Report the hotel name and city location of all hotels that garnered exactly 100 reviews.
tbl(con, 'review') %>%
count(hotelId) %>% filter(n == 100) %>%
left_join(tbl(con, 'hotel')) %>%
select(hotelName, hotelCity)
## # Source: lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## # [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## hotelName hotelCity
## <chr> <chr>
## 1 Hotel Vertigo San Francisco
## 2 Omni Hotel at Independence Park Philadelphia
## 3 Bristol Hotel San Diego
## 4 Bahia Resort Hotel San Diego
## 5 Gramercy Park Hotel New York City
## 6 Hotel Rex, a Joie de Vivre hotel San Francisco
## 7 Crowne Plaza Austin Austin