Scrapebase + Permit.io: Web Scraping with Authorization

5 min read • 5/5/2025

This is a submission for the Permit.io Authorization Challenge: API-First Authorization Reimagined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Demo
Screenshot: Demo page

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

Role-Based Capabilities

Feature | Free User | Pro User | Admin
Basic Scraping
Advanced Scraping
Text Cleaning
AI Summarization
View Blacklist
Manage Blacklist
Access Blacklisted Domains

Demo

Try it live at: https://scrapebase-permit.up.railway.app/

Test Credentials:

  • Free User: newuser / 2025DEVChallenge
  • Admin: admin / 2025DEVChallenge

Project Repo

Step 1: Clone the repository

Repository: github.com/0xtamizh/scrapebase-permit-IO

git clone https://github.com/0xtamizh/scrapebase-permit-IO.git
cd scrapebase-permit-IO

Step 2: Set up Permit.io

  1. Create a free account at Permit.io
  2. Create a new project
  3. Set up:
    • Resource type: website
    • Actions: scrape_basic, scrape_advanced
    • Roles: free_user, pro_user, admin
  4. Configure role permissions as described in the Dashboard Configuration section
  5. Generate an Environment API key from the dashboard

Step 3: Configure environment variables

Create a .env file in the project root:

# Permit.io
PERMIT_API_KEY=permit_env_YOUR_ENVIRONMENT_KEY

# API Keys for different user tiers
FREE_API_KEY=2025DEVChallenge_free
PRO_API_KEY=2025DEVChallenge_pro
ADMIN_API_KEY=2025DEVChallenge_admin

# Optional: For AI summarization
DEEPINFRA_API_KEY=your_deepinfra_key

# Server configuration
PORT=8080
NODE_ENV=development

# Browser manager settings
MAX_CONCURRENT_REQUESTS=50
REQUEST_TIMEOUT=60000
QUEUE_TIMEOUT=120000
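
A missing key in this file would otherwise only show up later as a failed authorization call, so it can help to validate the configuration at startup. The following is a minimal sketch, not code from the repository: the `loadConfig` name, the returned object shape, and the choice of which variables count as required are all assumptions.

```javascript
// Hypothetical startup check (not taken from the Scrapebase repo).
// Assumption: the Permit.io key and the three tier keys are required;
// DEEPINFRA_API_KEY is optional, as the .env comments indicate.
const REQUIRED_VARS = ['PERMIT_API_KEY', 'FREE_API_KEY', 'PRO_API_KEY', 'ADMIN_API_KEY'];

function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    permitApiKey: env.PERMIT_API_KEY,
    apiKeys: { free: env.FREE_API_KEY, pro: env.PRO_API_KEY, admin: env.ADMIN_API_KEY },
    // Numeric settings fall back to the values shown in the .env example.
    port: Number(env.PORT || 8080),
    maxConcurrentRequests: Number(env.MAX_CONCURRENT_REQUESTS || 50),
    requestTimeoutMs: Number(env.REQUEST_TIMEOUT || 60000),
    queueTimeoutMs: Number(env.QUEUE_TIMEOUT || 120000),
  };
}
```

Calling `loadConfig(process.env)` once at startup would then fail fast with a message naming the missing variables.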


Step 4: Install dependencies and run

# Install dependencies
npm install

# In src/utils/browserManager, comment out the following line so that
# the default Chromium browser on your device is used:
# executablePath: process.env.CHROMIUM_PATH || '/usr/bin/chromium-browser',

# Run in development mode
npm run dev


The server will start on http://localhost:8080

Step 5: Test the application

Using the UI:

  1. Open http://localhost:8080 in your browser
  2. "Log in" using the provided credentials
    • User credentials: newuser / 2025DEVChallenge
    • Admin credentials: admin / 2025DEVChallenge
  3. Toggle between Basic (Free) and Pro plans
  4. Enter a domain to scrape (e.g., example.com)

Using the API directly:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

# Get blacklist
curl http://localhost:8080/api/blacklist \
  -H "x-api-key: 2025DEVChallenge_free"

# Add domain to blacklist (admin only)
curl -X POST http://localhost:8080/api/blacklist \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"domain": "example.com"}'
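
The same scrape request can be sent from Node. This is a rough sketch assuming Node 18+ (for the global `fetch`) and the endpoint, header, and body fields used in the curl commands. The helper names `buildScrapeRequest` and `scrape` are hypothetical, and the response format is not documented in this post, so the sketch returns only the status and raw text.

```javascript
// Hypothetical client helpers; assumes Node 18+ (global fetch) and the
// endpoint, header, and body fields shown in the curl examples.
function buildScrapeRequest(baseUrl, apiKey, targetUrl, advanced = false) {
  const body = { url: targetUrl };
  if (advanced) body.advanced = true;
  return {
    url: `${baseUrl}/api/processLinks`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(body),
    },
  };
}

async function scrape(baseUrl, apiKey, targetUrl, advanced = false) {
  const { url, options } = buildScrapeRequest(baseUrl, apiKey, targetUrl, advanced);
  const response = await fetch(url, options);
  // The response shape is an unknown here, so return status and raw text only.
  return { status: response.status, text: await response.text() };
}
```

For example, `scrape('http://localhost:8080', '2025DEVChallenge_free', 'https://example.com')` corresponds to the first curl command.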


API-First Authorization

Core Authorization Flow

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
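
The six steps can be sketched as Express-style middleware. This is an illustration under stated assumptions, not the repository's `permitAuth`: `syncUser` and `check` are injected stand-ins for the Permit.io SDK calls, `keyToUser` is a hypothetical lookup table, and the 401/403/503 status codes are illustrative choices.

```javascript
// Sketch of the authorization flow as Express-style middleware.
// Assumptions: `syncUser` and `check` stand in for the Permit.io SDK calls,
// `keyToUser` is a Map from API key to { key, role }, and the status codes
// are illustrative rather than taken from the repo.
function createPermitAuth({ keyToUser, syncUser, check, action }) {
  return async function permitAuth(req, res, next) {
    // Steps 1-3: read the x-api-key header and map it to a user and role.
    const user = keyToUser.get(req.headers['x-api-key']);
    if (!user) {
      return res.status(401).json({ error: 'Unknown or missing API key' });
    }

    let allowed;
    try {
      await syncUser(user);                                // step 4
      allowed = await check(user.key, action, 'website');  // step 5
    } catch (err) {
      // Fail closed: an error from the PDP must never become an allow.
      return res.status(503).json({ error: 'Authorization service unavailable' });
    }

    // Step 6: enforce the policy decision.
    if (!allowed) {
      return res.status(403).json({ error: 'Forbidden' });
    }
    req.user = user;
    return next();
  };
}
```

Injecting the two functions keeps the flow testable without network access. In the real service they would wrap `permit.api.syncUser` and `permit.check`.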

Implementation

The permitAuth middleware handles both role assignment and permission enforcement:

// Role assignment based on API key
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// User sync and permission check
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});

const permissionCheck = await permit.check(user.key, action, 'website');


Dashboard Configuration

For permissions to work correctly, you must configure roles and their allowed actions in the Permit.io dashboard:

  1. Create resource type website
  2. Create actions scrape_basic and scrape_advanced
  3. Create roles free_user, pro_user, and admin
  4. Assign permissions to roles:
    • free_user: Can scrape_basic on website
    • pro_user: Can scrape_basic and scrape_advanced on website
    • admin: Can do everything on website
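
For local unit tests or documentation, the role-to-action assignments can be mirrored as a small lookup table. This is only a sketch: the `POLICY` table and `isAllowed` helper are hypothetical names, the real decision still comes from Permit.io, and "everything" for admin is interpreted here as every action defined on `website`.

```javascript
// Local mirror of the dashboard configuration described above.
// Hypothetical helper for tests; the real decision comes from Permit.io.
const ACTIONS = ['scrape_basic', 'scrape_advanced'];

const POLICY = new Map([
  ['free_user', new Set(['scrape_basic'])],
  ['pro_user', new Set(['scrape_basic', 'scrape_advanced'])],
  ['admin', new Set(ACTIONS)], // "everything" = every action defined on website
]);

function isAllowed(role, action, resourceType = 'website') {
  // Only one resource type is configured in this setup.
  if (resourceType !== 'website') return false;
  const allowed = POLICY.get(role);
  return allowed !== undefined && allowed.has(action);
}
```

A table like this can also serve as a checklist when clicking through the dashboard, since any role or action missing there will cause checks to be denied.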

Screenshot (Dashboard: Resource): Configuring resource types and actions in the Permit.io dashboard

Screenshot (Dashboard: Roles): Setting up role-based permissions for different user tiers

Screenshot (Dashboard: Users): Managing users and their role assignments

Troubleshooting: see the README in the repository.

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);


The cloud PDP returned 501 errors because it only supports basic RBAC. I had to simplify to a pure RBAC approach:

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
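
With the check reduced to pure RBAC, an attribute-style rule such as the domain blacklist has to be evaluated somewhere else. This post does not show how Scrapebase handles that, so the sketch below is only one possible arrangement. It assumes an in-memory blacklist and a hypothetical rule that only the `admin` role may reach blacklisted domains. `rbacCheck` is a stand-in for the `permit.check` call, and `authorizeScrape` and `hostnameOf` are made-up names.

```javascript
// One possible way to layer a blacklist on top of an RBAC-only check.
// Assumptions: the blacklist is an in-memory Set of hostnames, only `admin`
// may access blacklisted domains, and `rbacCheck` stands in for permit.check.
function hostnameOf(url) {
  // new URL() throws a TypeError on malformed input; callers should handle it.
  return new URL(url).hostname.toLowerCase();
}

async function authorizeScrape({ user, action, url, blacklist, rbacCheck }) {
  // RBAC decision first: the role-to-action policy lives in Permit.io.
  const allowedByRole = await rbacCheck(user.key, action, 'website');
  if (!allowedByRole) return { allowed: false, reason: 'role' };

  // Resource-level rule evaluated in application code.
  if (blacklist.has(hostnameOf(url)) && user.role !== 'admin') {
    return { allowed: false, reason: 'blacklisted' };
  }
  return { allowed: true, reason: null };
}
```

The trade-off is that the blacklist rule lives in application code again. Moving it into policy would require a PDP that evaluates resource attributes.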

My Journey

Why I Built This

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. I created Scrapebase to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

Scrapebase isn't just another CRUD app – it tackles a real-world use case (web scraping) with meaningful access control requirements:

  1. Tiered service levels that mirror SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through the domain blacklist system

What I Learned

Building Scrapebase with Permit.io showed me the benefits of externalized authorization in three areas:

  1. Technical Benefits

    • Separation of authorization from business logic
    • External policy management without code changes
    • Scalable from RBAC to ABAC
  2. Business Benefits

    • Non-developers can manage permissions
    • Centralized policy management
    • Better security through consistent enforcement
  3. Developer Experience

    • Cleaner codebase
    • Focus on core features
    • Better maintainability

Why Permit.io Works for SaaS

Permit.io is ideal for SaaS applications because it:

  1. Centralizes policy management outside your codebase
  2. Provides a dashboard for non-developers to configure permissions
  3. Scales from simple RBAC to complex ABAC as your needs grow
  4. Offers audit logs for compliance and debugging

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

By implementing these controls through Permit.io rather than hardcoding them, Scrapebase demonstrates how authorization can be managed through declarative policies instead of imperative code – fulfilling the promise of truly API-first authorization.