Our customers often ask us what the best workflow is for working with Scrapy projects. A popular approach we have seen and used in the past is to split the spiders folder (typically project/spiders) into two folders: project/spiders_prod and project/spiders_dev, and use the SPIDER_MODULES setting to control which spiders are loaded in each environment. This works reasonably well until you have to make changes to common code used by many spiders (i.e., code outside the spiders folders), for example, common base spiders.
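As a rough sketch, that setup could be driven by an environment variable; SCRAPY_ENV and the module paths below are our own illustrative names, not something Scrapy defines:

# settings.py -- sketch of the spiders_prod/spiders_dev split;
# SCRAPY_ENV is a hypothetical variable used to pick the environment
import os

if os.environ.get('SCRAPY_ENV') == 'dev':
    SPIDER_MODULES = ['project.spiders_dev']
else:
    SPIDER_MODULES = ['project.spiders_prod']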
Nowadays, DVCSs (distributed version control systems; in particular, git) have become more popular and people are quite used to branching, so we recommend a simple git workflow (similar to the GitHub flow) where you create a branch for every change you make. You keep all changes in the branch while they're being tested, and finally merge to master when they're finished. This means that the master branch is always stable and contains only "production-ready" spiders.
If you are using our Scrapy Cloud platform, you can have two projects (myproject-dev, myproject-prod) and use myproject-dev to test the changes in your branch. As of Scrapy 0.17, scrapy deploy adds the branch name to the version name (when using version=GIT or version=HG), so you can see which branch you are going to run directly in the panel. This is particularly useful for large teams working on a single Scrapy project, to avoid stepping on each other's toes when making changes to common code.
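For reference, the dev and prod deploy targets are just entries in scrapy.cfg; something along these lines, where the URL and project IDs are placeholders for your own Scrapy Cloud projects:

# scrapy.cfg -- sketch of two deploy targets; the project IDs below
# are placeholders for your myproject-dev and myproject-prod projects
[deploy:dev]
url = https://dash.scrapinghub.com/api/scrapyd/
project = 1111
version = GIT

[deploy:prod]
url = https://dash.scrapinghub.com/api/scrapyd/
project = 2222
version = GIT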
Here is a concrete example to illustrate how this workflow works:
git checkout -b issue123
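# ... make your changes in the branch and test them locally ...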
scrapy deploy dev
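# ... verify the spiders on the dev project, then merge and release ...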
git checkout master
git merge issue123
git pull # make sure to pull latest code before deploying
scrapy deploy prod
We recommend you keep your common base spiders well-tested and use Spider Contracts extensively to test your final spiders. Otherwise, experience tells us that base spiders end up being copied (instead of reused) out of fear of breaking the old spiders that depend on them, turning their maintenance into a nightmare.
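For example, a contract-annotated callback looks roughly like this (the spider, URL and field names are made up for illustration); running the scrapy check command will fetch the sample URL and validate the callback's output against the contracts:

from scrapy.spider import BaseSpider

class ProductsSpider(BaseSpider):
    # hypothetical spider, shown only to illustrate Spider Contracts
    name = 'products'

    def parse(self, response):
        """Parse a product listing page.

        @url http://www.example.com/products
        @returns items 1
        @returns requests 0 0
        @scrapes name price
        """
        # ... extraction code ...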