Basic Scrapy tutorial
Have you tried to scrape a website using Python? If so, you have probably used Beautiful Soup. In this tutorial I will cover Scrapy instead. Scrapy is an open source and collaborative framework for extracting the data you need from websites.
Let's start the tutorial
Step 1 : Make a virtual environment in Python
python -m venv virtualenvname
Step 2 : Activate the virtual environment (the command below is for Windows; on Linux or macOS use source virtualenvname/bin/activate)
virtualenvname\Scripts\activate
Step 3 : Install scrapy
pip install Scrapy
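Before moving on, it can help to confirm that the install landed in the virtual environment rather than in the system Python. The snippet below is a minimal sketch using only the standard library. The helper names in_virtualenv and installed_version are made up for this example and are not part of Scrapy.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    # Return the installed version string, or None if the package is missing.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("virtual environment active:", in_virtualenv())
print("Scrapy version:", installed_version("Scrapy"))
```

If the first line prints False, or the second prints None, go back to Step 2 and Step 3 before continuing.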
Step 4 : Make a scrapy project
scrapy startproject tutorial
Step 5 : Go to the project
cd tutorial
Step 6 : Make a spider. The command format is scrapy genspider spidername websiteurl
scrapy genspider pythonspider "https://en.wikipedia.org/wiki/Python_(programming_language)"
You will see a file named pythonspider.py inside the spiders folder.
If you open that file, you will see some generated code:
import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/Python_(programming_language)"]

    def parse(self, response):
        pass
We have to write our scraping code inside the parse function. Let's begin.
Go to https://en.wikipedia.org/wiki/Python_(programming_language) and notice the headline "Python (programming language)" at the top of the page. We are going to extract this string from the page.
Update the parse function like this:
import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/Python_(programming_language)"]

    def parse(self, response):
        # Select the text inside the page title span
        headline = response.css('span.mw-page-title-main::text').get()
        print('------------------Output-------------------')
        print(headline)
        print('------------------Output-------------------')
Now run the spider with this command:
scrapy crawl pythonspider
You will see the headline printed between the two Output lines, in the middle of Scrapy's log messages.