I am trying to store the data I scrape with Scrapy in a MySQL database, but my code does not write anything, and no error is reported when I run it.
I am using mysql.connector because I could not manage to install MySQL-python. My MySQL server seems to be running fine, and when I run the code its traffic (KB/s) rises. Please find my pipelines.py code below.
import mysql.connector
from mysql.connector import errorcode

class CleaningPipeline(object):

class DatabasePipeline(object):

    def _init_(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host = 'localhost',
            user = 'root',
            passwd = '********',
            database = 'lecturesinparis_db'
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("""DROP TABLE IF EXISTS mdl""")
        self.curr.execute("""create table mdl(
            title text,
            location text,
            startdatetime text,
            lenght text,
            description text,
            )""")

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self, item):
        self.curr.execute("""insert into mdl values (%s,%s,%s,%s,%s)""", (
            item['title'][0],
            item['location'][0],
            item['startdatetime'][0],
            item['lenght'][0],
            item['description'][0],
        ))
        self.conn.commit()
Answer
You need to add the class to ITEM_PIPELINES first, so that Scrapy knows you want to use this pipeline.
In your settings.py file, update the lines below with your class names, as follows.
# https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'projectname.pipelines.CleaningPipeline': 700,
    'projectname.pipelines.DatabasePipeline': 800,
}
The numbers 700 and 800 determine the order in which the pipelines process items. By convention they are integers in the 0-1000 range. Items pass through the pipelines from the lowest number to the highest, so the pipeline with 700 processes each item before the pipeline with 800.
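The ordering rule can be illustrated with a small sketch. This is not Scrapy's internal code, only a plain-Python illustration of "lower number runs first"; the helper name pipeline_order is made up for this example, and projectname is the same placeholder used in the settings.

```python
# Placeholder settings; the dict is deliberately written "out of order"
# to show that only the numbers matter, not the position in the dict.
ITEM_PIPELINES = {
    "projectname.pipelines.DatabasePipeline": 800,
    "projectname.pipelines.CleaningPipeline": 700,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by ascending priority number."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


order = pipeline_order(ITEM_PIPELINES)
print(order)
```

Here CleaningPipeline comes out first because 700 is lower than 800.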
Note: Replace projectname in 'projectname.pipelines.CleaningPipeline' with your actual project name.
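Once the pipeline is enabled, two details in the question's code look likely to raise errors (unless they are just formatting damage from the post): the constructor is spelled _init_ with single underscores, which Python does not call automatically, and the create table statement has a comma after the last column. Below is a minimal sketch of the same class with both points changed. To keep it runnable without a MySQL server, it swaps in the standard-library sqlite3 module as a stand-in for mysql.connector; that swap, the in-memory database, and the sample item are assumptions made for illustration only.

```python
import sqlite3  # stand-in for mysql.connector so the sketch runs without a server


class DatabasePipeline(object):

    def __init__(self):  # double underscores; _init_ would never be called
        self.create_connection()
        self.create_table()

    def create_connection(self):
        # Assumption: in-memory SQLite. With MySQL, keep the
        # mysql.connector.connect(...) call from the question instead.
        self.conn = sqlite3.connect(":memory:")
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("DROP TABLE IF EXISTS mdl")
        # No comma after the last column definition.
        self.curr.execute("""create table mdl(
            title text,
            location text,
            startdatetime text,
            lenght text,
            description text
        )""")

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self, item):
        # sqlite3 uses ? placeholders; mysql.connector uses %s as in the question.
        self.curr.execute("insert into mdl values (?,?,?,?,?)", (
            item['title'][0],
            item['location'][0],
            item['startdatetime'][0],
            item['lenght'][0],
            item['description'][0],
        ))
        self.conn.commit()


# Hypothetical item shaped like the question's: every field holds a list.
fields = ("title", "location", "startdatetime", "lenght", "description")
item = {name: ["x"] for name in fields}

pipeline = DatabasePipeline()
pipeline.process_item(item, spider=None)
rows = pipeline.curr.execute("select * from mdl").fetchall()
print(rows)
```

With mysql.connector the structure stays the same; only the connect call and the placeholder style go back to what the question uses.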