A short example using bow() and scrape() from polite.
Dmytro Perepolkin’s package “polite” has the goal of promoting responsible web etiquette.
I’m posting this very short example as a template of sorts, mostly for my own benefit, so I don’t lose it.
# Install polite from GitHub if it is not already available
if (!requireNamespace("polite", quietly = TRUE)) {
  devtools::install_github("dmi3kno/polite")
}
library(polite)
library(rvest)
library(tidyverse)
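# Page to scrape, and an XPath that matches the tables in the article body
url <- "https://en.wikipedia.org/wiki/Comparison_of_UPnP_AV_media_servers"
xpath <- '//*[@id="mw-content-text"]/div/table'

dframe <-
  url %>%
  bow() %>%                      # introduce ourselves and check robots.txt
  scrape() %>%                   # fetch the page, honoring the crawl delay
  html_node(xpath = xpath) %>%   # html_node() keeps only the first match
  html_table() %>%
  as_tibble()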
dframe %>%
  filter(`Unix-like` == "Yes",
         License != "Prop.",
         `Still Supported` == "Yes") %>%
  select(-Windows, -Audio, -Images, -`OS X`, -`Multilingual[1]`) %>%
  DT::datatable(options = list(
    pageLength = -1,
    lengthChange = FALSE,
    searching = FALSE,
    paging = FALSE,
    ordering = FALSE
  ))
The package does a lot more, but this is my most basic template for using it. I know that once you bow() to a host, you don’t have to do it again in the same session. You can simply nod() to another path on that host and then scrape() again.
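Here is a rough sketch of that bow-once, nod-many pattern. The helper name scrape_first_table() and the "table.wikitable" CSS selector are my own assumptions for illustration, not something polite provides, and the network calls only run in an interactive session.

```r
library(polite)
library(rvest)

# Hypothetical helper (not part of polite): nod to a path on a host we have
# already bowed to, scrape that page, and return the first matching table.
scrape_first_table <- function(session, path, css = "table.wikitable") {
  session %>%
    nod(path = path) %>%   # reuse the session, just point it at a new path
    scrape() %>%
    html_node(css) %>%     # assumes the page marks its tables with this class
    html_table()
}

if (interactive()) {
  # Bow once to the host...
  session <- bow("https://en.wikipedia.org")
  # ...then nod to whichever pages are needed, without bowing again.
  upnp <- scrape_first_table(session, "wiki/Comparison_of_UPnP_AV_media_servers")
}
```

Any other page on the same host can go through the same session by passing a different path to the helper.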
You can read about the rest of the “polite” features at Dmytro Perepolkin’s git repository, at https://github.com/dmi3kno/polite.
Perepolkin, Dmytro. Be Nice on the Web. R, 2018. https://github.com/dmi3kno/polite. (accessed October 30, 2018).
Wikipedia contributors, “Comparison of UPnP AV media servers,” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Comparison_of_UPnP_AV_media_servers&oldid=837290833 (accessed October 30, 2018).
For attribution, please cite this work as
Soika (2018, Oct. 30). The Exhaust Pipe: My new favorite way to scrape a wiki table. Retrieved from https://friendimaginary.github.io/posts/2018-10-30-my-new-favorite-way-to-scrape-a-wiki-table/
BibTeX citation
@misc{soika2018my,
  author = {Soika, Patryk},
  title = {The Exhaust Pipe: My new favorite way to scrape a wiki table},
  url = {https://friendimaginary.github.io/posts/2018-10-30-my-new-favorite-way-to-scrape-a-wiki-table/},
  year = {2018}
}