This month we are looking at Data Scraping (collecting data from the web itself). The web is full of data in all shapes and forms, just waiting to be ‘caught’ by you: from the page listing all the waterproof jackets on the Berghaus website to the list of explorers on Wikipedia, and from TripAdvisor reviews of the best places to stay in the Amazon to tweets about the Northern Lights. Yes, this month we are dusting off our hiking boots, busting out our binoculars, and heading off in search of #DataInTheWild!
What is Data Scraping?
Web pages contain a wealth of information, but it is usually locked up in text-based markup languages (such as HTML) that are not very human-friendly. Scraping lets you extract the data from web pages (whether it is in HTML, JSON, or a range of other formats) and transform it into something a human can understand. “But isn’t scraping hard?” I hear you cry. The answer is “no, it doesn’t have to be”. Just as Tableau makes visualizing your data a breeze, there are many tools (like Import.io) that make data scraping only a click away. Even when you do need to code, there are now plenty of great resources to help you learn (check out our January post, “Learn to be a Python Charmer”), and sites like the coding collaboration platform GitHub where you can find code to help you get started.
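To make the “extract and transform” idea a little more concrete, here is a minimal sketch in Python that uses only the standard library’s `html.parser` module to pull the rows out of an HTML table and turn them into CSV, a format Tableau can open directly. The sample HTML and the names `TableParser` and `html_table_to_csv` are hypothetical, made up for illustration; real web pages are much messier, which is exactly the work tools like Import.io take off your hands.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page: a tiny table of explorers (illustration only).
SAMPLE_HTML = """
<table>
  <tr><th>Explorer</th><th>Born</th></tr>
  <tr><td>Roald Amundsen</td><td>1872</td></tr>
  <tr><td>Ernest Shackleton</td><td>1874</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Hypothetical helper: collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built (None when outside a <tr>)
        self._cell = None  # text fragments of the currently open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell; ignore everything else.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html):
    """Hypothetical helper: turn the first-level table rows in `html` into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

Saved to a `.csv` file, that output is ready to drag into Tableau. For data that arrives as JSON, the standard library’s `json.loads` already hands you Python lists and dictionaries, so the “extract” step is usually even simpler.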
Be like Ash Ketchum from Pokémon and catch all the data!
- Guest blog post by Import.io about how to scrape data using Import.io
- Live Q&A webinar with Alex Gimson, Community Manager at Import.io, to answer all your Import.io data scraping questions!
- Blog post about using Python to scrape Twitter data
- Blog post by Jewel Loree about getting data out of Wikipedia
- Be sure to check out the Viz of the Day page on our amazing new website for great viz inspiration!
Let us know if you are making, or have made, a great Tableau viz using data you scraped from the web by tweeting #DataInTheWild. Who knows, we may highlight the viz as Viz of the Day.
Also, tune in for our live Q&A with Alex Gimson from Import.io to learn more about web scraping with Import.io. It’s your chance to have your Import.io questions answered; no question is too big or small!