I am trying to store the data I scrape with Scrapy in a SQL database, but my code does not write anything, and no error is reported when I run it.
I am using mysql.connector because I could not install MySQL-python. My SQL database seems to be running fine, and when I run the code the traffic (KB/s) rises. Please find my pipelines.py code below.
    import mysql.connector
    from mysql.connector import errorcode

    class CleaningPipeline(object):
        ...

    class DatabasePipeline(object):

        def _init_(self):
            self.create_connection()
            self.create_table()

        def create_connection(self):
            self.conn = mysql.connector.connect(
                host = 'localhost',
                user = 'root',
                passwd = '********',
                database = 'lecturesinparis_db'
            )
            self.curr = self.conn.cursor()

        def create_table(self):
            self.curr.execute("""DROP TABLE IF EXISTS mdl""")
            self.curr.execute("""create table mdl(
                title text,
                location text,
                startdatetime text,
                lenght text,
                description text,
                )""")

        def process_item(self, item, spider):
            self.store_db(item)
            return item

        def store_db(self, item):
            self.curr.execute("""insert into mdl values (%s,%s,%s,%s,%s)""", (
                item['title'][0],
                item['location'][0],
                item['startdatetime'][0],
                item['lenght'][0],
                item['description'][0],
            ))
            self.conn.commit()
Answer
You first need to add the class to ITEM_PIPELINES so that Scrapy knows you want to use this pipeline.
In your settings.py file, update the lines below with your class names as follows.
    # https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
        'projectname.pipelines.CleaningPipeline': 700,
        'projectname.pipelines.DatabasePipeline': 800,
    }
The numbers 700 and 800 determine the order in which the pipelines run: items pass through the pipelines from the lowest number to the highest, so the pipeline with 700 processes each item before the pipeline with 800. By convention these values are integers in the 0-1000 range.
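To make the ordering rule concrete, here is a small standalone sketch. It does not use Scrapy's own code; it only sorts the same kind of dictionary by its priority values to show which pipeline would see an item first. The name run_order is made up for this illustration.

```python
# Same shape as the ITEM_PIPELINES setting; the key order in the dict is irrelevant.
ITEM_PIPELINES = {
    "projectname.pipelines.DatabasePipeline": 800,
    "projectname.pipelines.CleaningPipeline": 700,
}

# Sort the class paths by their priority value, lowest first.
run_order = [
    path
    for path, priority in sorted(ITEM_PIPELINES.items(), key=lambda pair: pair[1])
]

for position, path in enumerate(run_order, start=1):
    print(position, path)
```

With these values CleaningPipeline is listed before DatabasePipeline, so the item is cleaned before it is stored.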
Note: Replace projectname in 'projectname.pipelines.CleaningPipeline' with your actual project name.
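Apart from the settings change, two details in the posted pipeline look likely to cause trouble once it is enabled: the constructor is spelled _init_ (single underscores), which Python never calls automatically, and the CREATE TABLE statement ends its column list with a trailing comma, which MySQL rejects. Below is a hedged sketch of DatabasePipeline with those two points changed. The close_spider method and the import placed inside create_connection are my additions for illustration, not part of the original code; the credentials and table layout are copied from the question.

```python
class DatabasePipeline:

    def __init__(self):  # double underscores, so Python runs this on creation
        self.create_connection()
        self.create_table()

    def create_connection(self):
        # Imported here (an illustrative choice) so the module still loads
        # on machines where the connector is not installed.
        import mysql.connector

        self.conn = mysql.connector.connect(
            host="localhost",
            user="root",
            passwd="********",
            database="lecturesinparis_db",
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("DROP TABLE IF EXISTS mdl")
        # No comma after the last column definition.
        self.curr.execute(
            """CREATE TABLE mdl(
                title TEXT,
                location TEXT,
                startdatetime TEXT,
                lenght TEXT,
                description TEXT
            )"""
        )

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self, item):
        # Each field is assumed to be a list, as in the question's code.
        self.curr.execute(
            "INSERT INTO mdl VALUES (%s, %s, %s, %s, %s)",
            (
                item["title"][0],
                item["location"][0],
                item["startdatetime"][0],
                item["lenght"][0],
                item["description"][0],
            ),
        )
        self.conn.commit()

    def close_spider(self, spider):
        # Added for illustration: release the connection when the spider ends.
        self.curr.close()
        self.conn.close()
```

The column name lenght is kept as posted because the spider's items use that key; renaming it would require changing the item definition as well.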