E Querying a relational database

Join operators allow you to merge different dataset using the same logic that you familiarized with learning MySql. In fact, you can actually use dplyr to connect to a database and query it using dplyr function that are translated in mysql syntax. To open a connection to the hotelreviews database:

con <- src_mysql(
            host = "ba-isdsclass-programdev.lsu.edu", 
            port = 3306, 
            user = "", 
            password = "", 
            dbname = "hotelreviews")
src_tbls(con)
## [1] "author"       "brand"        "employee"     "hotel"       
## [5] "interaction"  "lodgingchain" "response"     "review"
  1. List all chain names and their respective brands. Order the list by chain and brand name.
tbl(con, 'lodgingchain') %>% 
  right_join(tbl(con, 'brand', by = c('chainId'))) %>% 
  select(chainName, brandName) %>% arrange(chainName, brandName)
## # Source:     lazy query [?? x 2]
## # Database:   mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Ordered by: chainName, brandName
##    chainName     brandName
##        <chr>         <chr>
##  1     Accor        Adagio
##  2     Accor      Citigate
##  3     Accor Grand Mercure
##  4     Accor       hotelF1
##  5     Accor          ibis
##  6     Accor   ibis Budget
##  7     Accor   ibis Styles
##  8     Accor  Mama Shelter
##  9     Accor       Mercure
## 10     Accor      MGallery
## # ... with more rows
  1. Report the name and segment of all the Hilton brands.
tbl(con, 'lodgingchain') %>% 
  inner_join(tbl(con, 'brand')) %>% 
  filter(chainName == 'Hilton Worldwide') %>% 
    select(brandName, brandSegment)
## # Source:   lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
##                           brandName   brandSegment
##                               <chr>          <chr>
##  1                           Hilton  Upper Upscale
##  2                           Conrad         Luxury
##  3                    Home 2 Suites        Upscale
##  4                   Embassy Suites        Upscale
##  5                            Curio         Luxury
##  6             Hampton Inn & Suites Upper Midscale
##  7                  Homewood Suites        Upscale
##  8                       DoubleTree  Upper Upscale
##  9 Waldorf Astoria Hotels & Resorts         Luxury
## 10                      Hampton Inn Upper Midscale
## # ... with more rows
  1. Report the authors’ names of those who reviewed a single hotel more than once. For each author, count the reviews written for each hotel. Order the output by author name and review by hotel.
tbl(con, 'author') %>% 
  inner_join(tbl(con, 'review')) %>% 
  inner_join(tbl(con, 'hotel')) %>% count(authorId, authorNickname, hotelId, sort = T)
## # Source:     lazy query [?? x 4]
## # Database:   mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Groups:     authorId, authorNickname
## # Ordered by: desc(n)
##    authorId    authorNickname hotelId     n
##       <dbl>             <chr>   <dbl> <dbl>
##  1    66772            PJBlue   84453     3
##  2    11379            doc038   75711     3
##  3   140650          poppop60  124839     3
##  4    30236         pepstar27   99288     3
##  5   104947         ralphunlv   81241     3
##  6    30971           TMCNEWS  208454     3
##  7    54188          Medalert   90957     3
##  8   148811 insurancesalesman   84095     3
##  9   100973           QueenRB   88460     3
## 10   124085   allaboutcomfort  223742     3
## # ... with more rows
  1. Report the count of the hotels that received no reviews.
tbl(con, 'hotel') %>% 
  select(hotelId, hotelName) %>% 
  anti_join(tbl(con, 'review')) %>% 
    summarise(totalNOreviews = n())
## # Source:   lazy query [?? x 1]
## # Database: mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
##   totalNOreviews
##            <dbl>
## 1            935
  1. Report the number of hotels affiliated with each brand. Order (descending) by number of hotels.
tbl(con, 'lodgingchain') %>% 
  left_join(tbl(con, 'brand')) %>% 
  count(chainName, sort = T)
## # Source:     lazy query [?? x 2]
## # Database:   mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
## # Ordered by: desc(n)
##                                chainName     n
##                                    <chr> <dbl>
##  1                Marriott International    18
##  2                                 Accor    17
##  3                     Wyndham Worldwide    16
##  4                         Choice Hotels    12
##  5                      Hilton Worldwide    11
##  6 Starwood Hotels and Resorts Worldwide     9
##  7   InterContinental Hotels Group (IHG)     9
##  8              Hyatt Hotels Corporation     7
##  9           Carlson Rezidor Hotel Group     7
## 10                     Dusit Thani Group     6
## # ... with more rows

Report the hotel name and city location of all hotels that garnered exactly 100 reviews.

tbl(con, 'review') %>% 
    count(hotelId) %>% filter(n == 100) %>% 
    left_join(tbl(con, 'hotel')) %>% 
      select(hotelName, hotelCity)
## # Source:   lazy query [?? x 2]
## # Database: mysql 5.6.19-log
## #   [dario@ba-isdsclass-programdev.lsu.edu:/hotelreviews]
##                          hotelName     hotelCity
##                              <chr>         <chr>
## 1                    Hotel Vertigo San Francisco
## 2  Omni Hotel at Independence Park  Philadelphia
## 3                    Bristol Hotel     San Diego
## 4               Bahia Resort Hotel     San Diego
## 5              Gramercy Park Hotel New York City
## 6 Hotel Rex, a Joie de Vivre hotel San Francisco
## 7              Crowne Plaza Austin        Austin