Basic Scrapy tutorial
Have you tried to scrape a website using Python? If so, you have probably used BeautifulSoup. In this tutorial I will cover Scrapy for website scraping instead. Scrapy is an open source and collaborative framework for extracting the data you need from websites.
Let's start the tutorial
Step 1 : Make a virtual environment in Python
python -m venv virtualenvname
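If you prefer to create the environment from a Python script, the standard library venv module can do the same job. This is only a minimal sketch: the helper name create_env is made up for illustration, and virtualenvname is the same placeholder directory name used in the command above.

```python
import venv


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` (hypothetical helper).

    Roughly what `python -m venv path` does; with_pip=True also
    bootstraps pip into the new environment.
    """
    venv.create(path, with_pip=with_pip)
    return path
```

Calling create_env("virtualenvname") should leave you with the same folder that Step 1 creates from the command line.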
Step 2 : Activate the virtual environment
source virtualenvname/bin/activate
On Windows, run virtualenvname\Scripts\activate instead.
Step 3 : Install Scrapy
pip install Scrapy
Step 4 : Make a Scrapy project
scrapy startproject tutorial
Step 5 : Go to the project
cd tutorial
Step 6 : Make a spider
The general form of the command is scrapy genspider spidername websiteurl. For this tutorial:
scrapy genspider pythonspider
You will see a file named pythonspider.py inside the spiders folder.
If you open that file, you will see some generated code:
import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = [""]
    start_urls = [""]

    def parse(self, response):
        pass
We have to write our scraping code inside the parse function. Let's begin.
Go to this URL and notice the headline Python (programming language). We are going to extract this string from the page.
Inside the parse function:
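import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = [""]
    start_urls = [""]

    def parse(self, response):
        # Select the headline with a CSS selector; get() returns the first match or None
        headline = response.css('').get()
        # Yield the result so Scrapy reports it as a scraped item
        yield {"headline": headline}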
Now run this command:
scrapy crawl pythonspider
You will see the output.