Scrapebase + Permit.io: Web Scraping with Authorization
5 min read • 5/5/2025

This is a submission for the Permit.io Authorization Challenge: API-First Authorization Reimagined
What I Built
I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.
In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.
Screenshot: Demo page
Key Features
- Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
- API Key Authentication: Simple authentication using API keys
- Role-Based Access Control: Permissions managed through Permit.io
- Domain Blacklist System: Resource-level restrictions for sensitive domains
- Text Processing: Basic and advanced text processing with role-based restrictions
Role-Based Capabilities
Feature | Free User | Pro User | Admin |
---|---|---|---|
Basic Scraping | ✅ | ✅ | ✅ |
Advanced Scraping | ❌ | ✅ | ✅ |
Text Cleaning | ✅ | ✅ | ✅ |
AI Summarization | ❌ | ✅ | ✅ |
View Blacklist | ✅ | ✅ | ✅ |
Manage Blacklist | ❌ | ❌ | ✅ |
Access Blacklisted Domains | ❌ | ❌ | ✅ |
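The matrix above can also be written down as plain data, which is handy for UI toggles or unit tests. This is a minimal sketch and not code from the repository: the feature keys and the `canUse` helper are hypothetical names chosen for illustration, and in Scrapebase itself the actual decision comes from Permit.io.

```javascript
// Hypothetical feature keys mirroring the table rows; not taken from the repo.
const CAPABILITIES = {
  basic_scraping: ['free_user', 'pro_user', 'admin'],
  advanced_scraping: ['pro_user', 'admin'],
  text_cleaning: ['free_user', 'pro_user', 'admin'],
  ai_summarization: ['pro_user', 'admin'],
  view_blacklist: ['free_user', 'pro_user', 'admin'],
  manage_blacklist: ['admin'],
  access_blacklisted: ['admin'],
};

// Returns true when the role appears in the row for the feature.
// Unknown features or roles are denied by default.
function canUse(role, feature) {
  const allowed = CAPABILITIES[feature];
  return Array.isArray(allowed) && allowed.includes(role);
}
```

For example, `canUse('pro_user', 'ai_summarization')` is allowed while `canUse('pro_user', 'manage_blacklist')` is not, matching the table.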
Demo
Try it live at: https://scrapebase-permit.up.railway.app/
Test Credentials:
- Free User: newuser / 2025DEVChallenge
- Admin: admin / 2025DEVChallenge
Project Repo
Step 1: Clone the repository
Repository: github.com/0xtamizh/scrapebase-permit-IO
git clone https://github.com/0xtamizh/scrapebase-permit-IO.git
cd scrapebase-permit-IO
Step 2: Set up Permit.io
- Create a free account at Permit.io
- Create a new project
- Set up:
  - Resource type: website
  - Actions: scrape_basic, scrape_advanced
  - Roles: free_user, pro_user, admin
- Configure role permissions as described above
- Generate an Environment API key from the dashboard
Step 3: Configure environment variables
Create a .env file in the project root:
# Permit.io
PERMIT_API_KEY=permit_env_YOUR_ENVIRONMENT_KEY
# API Keys for different user tiers
FREE_API_KEY=2025DEVChallenge_free
PRO_API_KEY=2025DEVChallenge_pro
ADMIN_API_KEY=2025DEVChallenge_admin
# Optional: For AI summarization
DEEPINFRA_API_KEY=your_deepinfra_key
# Server configuration
PORT=8080
NODE_ENV=development
# Browser manager settings
MAX_CONCURRENT_REQUESTS=50
REQUEST_TIMEOUT=60000
QUEUE_TIMEOUT=120000
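A missing key in this file tends to surface later as a confusing authorization failure, so it can help to validate the environment once at startup. The following is a minimal sketch under assumptions: `loadConfig` and `REQUIRED_VARS` are hypothetical names, the variable names come from the .env example above, and which variables are treated as mandatory is my own guess rather than something the repository states.

```javascript
// Assumption: these four variables are required; DEEPINFRA_API_KEY is optional.
const REQUIRED_VARS = ['PERMIT_API_KEY', 'FREE_API_KEY', 'PRO_API_KEY', 'ADMIN_API_KEY'];

// Reads settings from an env-like object and fails fast when a required one is missing.
function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    permitApiKey: env.PERMIT_API_KEY,
    apiKeys: { free: env.FREE_API_KEY, pro: env.PRO_API_KEY, admin: env.ADMIN_API_KEY },
    deepinfraApiKey: env.DEEPINFRA_API_KEY || null, // optional, AI summarization only
    port: Number(env.PORT || 8080),
    maxConcurrentRequests: Number(env.MAX_CONCURRENT_REQUESTS || 50),
    requestTimeout: Number(env.REQUEST_TIMEOUT || 60000),
    queueTimeout: Number(env.QUEUE_TIMEOUT || 120000),
  };
}
```

In the application this would be called once as `loadConfig(process.env)` before the server starts listening. The fallback values repeat the defaults shown in the example file.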
Step 4: Install dependencies and run
# Install dependencies
npm install
# In src/utils/browserManager, comment out the following line so the default Chromium browser on your device is used:
#   executablePath: process.env.CHROMIUM_PATH || '/usr/bin/chromium-browser',
# Run in development mode
npm run dev
The server will start on http://localhost:8080
Step 5: Test the application
Using the UI:
- Open http://localhost:8080 in your browser
- "Log in" using the provided credentials
  - User credentials: newuser / 2025DEVChallenge
  - Admin credentials: admin / 2025DEVChallenge
- Toggle between Basic (Free) and Pro plans
- Enter a domain to scrape (e.g., example.com)
Using the API directly:
# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
-H "Content-Type: application/json" \
-H "x-api-key: 2025DEVChallenge_free" \
-d '{"url": "https://example.com"}'
# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
-H "Content-Type: application/json" \
-H "x-api-key: 2025DEVChallenge_admin" \
-d '{"url": "https://example.com", "advanced": true}'
# Get blacklist
curl http://localhost:8080/api/blacklist \
-H "x-api-key: 2025DEVChallenge_free"
# Add domain to blacklist (admin only)
curl -X POST http://localhost:8080/api/blacklist \
-H "Content-Type: application/json" \
-H "x-api-key: 2025DEVChallenge_admin" \
-d '{"domain": "example.com"}'
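The same scrape call can be made from Node.js. The sketch below only builds the arguments for `fetch`, so it can be checked without a running server. The endpoint, header name, and body fields are copied from the curl examples; `buildScrapeRequest` is a hypothetical helper name, and since the response format is not documented here, the usage comment simply prints the raw text.

```javascript
// Builds the URL and options for a POST to /api/processLinks.
// `advanced` is only included in the body when requested, as in the admin curl example.
function buildScrapeRequest(baseUrl, apiKey, targetUrl, advanced = false) {
  const body = { url: targetUrl };
  if (advanced) body.advanced = true;
  return {
    url: `${baseUrl}/api/processLinks`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Usage (needs the server running and Node 18+ for the global fetch):
// const { url, init } = buildScrapeRequest('http://localhost:8080', '2025DEVChallenge_free', 'https://example.com');
// const response = await fetch(url, init);
// console.log(response.status, await response.text());
```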
API-First Authorization
Core Authorization Flow
- User sends request with x-api-key header
- permitAuth middleware intercepts the request
- Middleware maps API key to user role
- User is synced to Permit.io
- Permission check runs against Permit.io cloud PDP
- Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│ permitAuth │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
      │                                                        ▲
      │                                                        │
      └────────────────────────────────────────────────────────┘
             Permission policies defined in Permit.io dashboard
Implementation
The permitAuth middleware handles both role assignment and permission enforcement:
// Role assignment based on API key
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}
// User sync and permission check
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});
const permissionCheck = await permit.check(user.key, action, 'website');
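To show how these pieces might fit together end to end, here is a minimal sketch of an Express-style middleware. It is not the repository's implementation. The `permit` object is injected so the flow can be exercised with a stub, and it only needs the two calls used in the excerpt (`permit.api.syncUser` and `permit.check`). Several details are assumptions: the `createPermitAuth` name, the `demo_` user key format, the rule that `"advanced": true` in the body selects `scrape_advanced`, and the HTTP status codes returned.

```javascript
function createPermitAuth(permit, env) {
  // Map each configured API key to the role it represents.
  const rolesByKey = new Map([
    [env.FREE_API_KEY, 'free_user'],
    [env.PRO_API_KEY, 'pro_user'],
    [env.ADMIN_API_KEY, 'admin'],
  ]);

  return async function permitAuth(req, res, next) {
    const apiKey = req.headers['x-api-key'];
    // The explicit truthiness check keeps a missing header from matching an unset env key.
    const tier = apiKey ? rolesByKey.get(apiKey) : undefined;
    if (!tier) {
      return res.status(401).json({ error: 'Invalid or missing API key' });
    }

    const userKey = `demo_${tier}`; // hypothetical key format
    const action = req.body && req.body.advanced ? 'scrape_advanced' : 'scrape_basic';

    let allowed;
    try {
      await permit.api.syncUser({
        key: userKey,
        email: `${userKey}@scrapebase.xyz`,
        attributes: { tier, roles: [tier] },
      });
      allowed = await permit.check(userKey, action, 'website');
    } catch (err) {
      // Fail closed: an unreachable PDP must not grant access.
      return res.status(503).json({ error: 'Authorization service unavailable' });
    }

    if (!allowed) {
      return res.status(403).json({ error: `Not permitted: ${action}` });
    }
    req.user = { key: userKey, tier };
    return next();
  };
}
```

Injecting `permit` rather than importing it inside the middleware keeps the authorization flow testable without network access, which fits the goal of keeping authorization separate from business logic.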
Dashboard Configuration
For permissions to work correctly, you must configure roles and their allowed actions in the Permit.io dashboard:
- Create resource type website
- Create actions scrape_basic and scrape_advanced
- Create roles free_user, pro_user, and admin
- Assign permissions to roles:
  - free_user: Can scrape_basic on website
  - pro_user: Can scrape_basic and scrape_advanced on website
  - admin: Can do everything on website
Screenshot: Configuring resource types and actions in the Permit.io dashboard
Screenshot: Setting up role-based permissions for different user tiers
Screenshot: Managing users and their role assignments
Troubleshooting: see the repo README.
Challenges Faced
Cloud PDP Limitations
Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:
// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};
const permissionCheck = await permit.check(user.key, action, resource);
The cloud PDP returned 501 (Not Implemented) errors because it only supports basic RBAC. I had to simplify to a pure RBAC approach:
// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
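Because resource attributes could not be sent to the cloud PDP, one way to keep the blacklist rule is to evaluate it in application code next to the RBAC check. The sketch below is an assumption about how that could look, not the repository's implementation. `authorizeScrape`, its parameters, and the `reason` strings are hypothetical. The rule that only the `admin` role may reach blacklisted domains is taken from the capability table, and matching subdomains of a blacklisted domain is my own choice.

```javascript
// `check` is any async function with the shape of permit.check(user, action, resource).
async function authorizeScrape({ userKey, role, action, url, blacklist, check }) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return { allowed: false, reason: 'invalid_url' };
  }

  // Blacklist rule enforced locally: match the domain itself or any subdomain.
  const isBlacklisted = blacklist.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
  );
  if (isBlacklisted && role !== 'admin') {
    return { allowed: false, reason: 'blacklisted_domain' };
  }

  // Plain RBAC check against the resource type, which the cloud PDP supports.
  const permitted = await check(userKey, action, 'website');
  return permitted ? { allowed: true } : { allowed: false, reason: 'role_not_permitted' };
}
```

The trade-off is that the blacklist rule now lives in code rather than in a Permit.io policy, which is exactly what moving to ABAC on a local PDP would undo.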
My Journey
Why I Built This
Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. I created Scrapebase to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.
Scrapebase isn't just another CRUD app – it tackles a real-world use case (web scraping) with meaningful access control requirements:
- Tiered service levels that mirror SaaS subscription models
- Administrative functions that require elevated permissions
- Resource-based restrictions through the domain blacklist system
What I Learned
Building Scrapebase with Permit.io showed me the benefits of externalized authorization in three areas:
- Technical Benefits
  - Separation of authorization from business logic
  - External policy management without code changes
  - Scalable from RBAC to ABAC
- Business Benefits
  - Non-developers can manage permissions
  - Centralized policy management
  - Better security through consistent enforcement
- Developer Experience
  - Cleaner codebase
  - Focus on core features
  - Better maintainability
Why Permit.io Works for SaaS
Permit.io is ideal for SaaS applications because it:
- Centralizes policy management outside your codebase
- Provides a dashboard for non-developers to configure permissions
- Scales from simple RBAC to complex ABAC as your needs grow
- Offers audit logs for compliance and debugging
This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.
Future Improvements
With more time, I would:
- Set up a local PDP to enable ABAC with resource attributes
- Implement tenant isolation for multi-tenant support
- Add UI components in the admin dashboard to view permission audit logs
- Create more granular roles and permissions beyond the three tiers
- Add a user management section to assign roles through the UI
By implementing these controls through Permit.io rather than hardcoding them, Scrapebase demonstrates how authorization can be managed through declarative policies instead of imperative code – fulfilling the promise of truly API-first authorization.