My new favorite way to scrape a wiki table

A short example using bow() and scrape() from the polite package.

Patryk Soika
10-30-2018

Dmytro Perepolkin’s package “polite” has the goal of promoting responsible web etiquette.
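In practice, bow() is the step that introduces your scraper to the host and consults robots.txt before anything is downloaded. A quick sketch of just that step (my own illustration, separate from the main example below):

library(polite)

# bow() only negotiates the session: it reads robots.txt, records the
# crawl delay, and reports whether the path is scrapable for this
# user-agent. Nothing is fetched until scrape() is called.
session <- bow("https://en.wikipedia.org/wiki/Comparison_of_UPnP_AV_media_servers")
session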

I’m posting this very short example as a template of sorts, mostly for my own benefit, mostly so I don’t lose it.


if("polite" %in% installed.packages() == F) {
  devtools::install_github("dmi3kno/polite")
}
library(polite)
library(rvest)
library(tidyverse)

url   <- "https://en.wikipedia.org/wiki/Comparison_of_UPnP_AV_media_servers"
xpath <- '//*[@id="mw-content-text"]/div/table'

dframe <-
  url %>%
  bow() %>%     # introduce ourselves to the host and check robots.txt
  scrape() %>%  # fetch the page, respecting the crawl delay
  html_node(xpath = xpath) %>%
  html_table() %>%
  as_tibble()


dframe %>%
  filter(`Unix-like` == "Yes",
         License != "Prop.",
         `Still Supported` == "Yes") %>%
  select(-Windows, -Audio, -Images, -`OS X`, -`Multilingual[1]`) %>%
  DT::datatable(options = list(
    pageLength = -1,
    lengthChange = FALSE,
    searching = FALSE,
    paging = FALSE,
    ordering = FALSE
  ))

The package does a lot more, but this is my most basic template for using it. Once you bow() to a host, you don’t have to do it again in the same session; you can simply nod() to a new path and then scrape() again.
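As a rough sketch of what that looks like (my own addition, not from the original post; the second path is just a placeholder):

# bow() once to the host, then nod() to each path before scrape().
session <- bow("https://en.wikipedia.org")

servers <- session %>%
  nod(path = "wiki/Comparison_of_UPnP_AV_media_servers") %>%
  scrape() %>%
  html_node(xpath = xpath) %>%
  html_table()

# "wiki/Some_other_article" is a placeholder, not a real page.
other <- session %>%
  nod(path = "wiki/Some_other_article") %>%
  scrape()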

You can read about the rest of polite’s features at Dmytro Perepolkin’s GitHub repository: https://github.com/dmi3kno/polite.



Citation

For attribution, please cite this work as

Soika (2018, Oct. 30). The Exhaust Pipe: My new favorite way to scrape a wiki table. Retrieved from https://friendimaginary.github.io/posts/2018-10-30-my-new-favorite-way-to-scrape-a-wiki-table/

BibTeX citation

@misc{soika2018my,
  author = {Soika, Patryk},
  title = {The Exhaust Pipe: My new favorite way to scrape a wiki table},
  url = {https://friendimaginary.github.io/posts/2018-10-30-my-new-favorite-way-to-scrape-a-wiki-table/},
  year = {2018}
}