Basic Scrapy tutorial

Have you ever tried to scrape a website using Python? If so, you have probably used Beautiful Soup. In this tutorial I will cover Scrapy for website scraping instead. Scrapy is an open source and collaborative framework for extracting the data you need from websites.

Let's start the tutorial

Step 1 : Create a virtual environment in Python
```
  python -m venv virtualenvname
```
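Step 2 : Activate the virtual environment. The command below is for Windows; on macOS or Linux, run source virtualenvname/bin/activate instead.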
```
  virtualenvname\Scripts\activate
```
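
Since the activation command differs between operating systems, here is a minimal sketch of how the right command could be picked. The helper name activate_command is hypothetical and only for illustration; it relies on the standard layout that python -m venv creates (a Scripts folder on Windows, a bin folder elsewhere).

```python
import os


def activate_command(env_name: str, os_name: str = os.name) -> str:
    """Return the command that activates a venv created with `python -m venv`.

    `activate_command` is an illustrative helper, not part of Scrapy or venv.
    """
    if os_name == "nt":
        # Windows: venv places its scripts in a "Scripts" folder.
        return f"{env_name}\\Scripts\\activate"
    # macOS and Linux: the scripts live in "bin" and must be sourced.
    return f"source {env_name}/bin/activate"


if __name__ == "__main__":
    # Prints the command for the operating system this script runs on.
    print(activate_command("virtualenvname"))
```

Run the printed command in your terminal; the script itself cannot activate the environment for you, because activation changes the current shell session.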

Step 3 : Install Scrapy
```
  pip install Scrapy
```

Step 4 : Create a Scrapy project
```
  scrapy startproject tutorial
```

Step 5 : Go to the project folder with cd tutorial, then create a spider. The general form of the command is scrapy genspider spidername websiteurl
```
  scrapy genspider pythonspider "https://en.wikipedia.org/wiki/Python_(programming_language)"
```
You will see a file named pythonspider.py inside the spiders folder.

If you open that file, you will see some generated code:

```
import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/Python_(programming_language)"]

    def parse(self, response):
        pass
```

We have to write our scraping code inside the parse function. Scrapy calls it with the downloaded response for each URL in start_urls. Let's begin.

Go to https://en.wikipedia.org/wiki/Python_(programming_language) and notice the headline Python (programming language) at the top of the page. We are going to extract this string from the website.

Update the parse function like this:

```
import scrapy


class PythonspiderSpider(scrapy.Spider):
    name = "pythonspider"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/Python_(programming_language)"]

    def parse(self, response):
        # Select the text of the <span> that holds the page title.
        # .get() returns the first match, or None if nothing matches.
        headline = response.css('span.mw-page-title-main::text').get()
        # The marker lines make the result easy to spot in Scrapy's log output.
        print('------------------Output-------------------')
        print(headline)
        print('------------------Output-------------------')
```

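Now run the spider from inside the project folder:
```
  scrapy crawl pythonspider
```

If the selector matches, you will see the headline, Python (programming language), printed between the two Output marker lines, in the middle of Scrapy's log messages.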
