⚠️ Active Development Notice: TimeTiles is under active development. Information may be placeholder content or not up-to-date.

Scrapers

Write short Python or Node.js scripts that fetch data from websites and APIs, produce CSV output, and feed it into the TimeTiles import pipeline. Events appear on the map without manual file handling.

Scraper functionality requires the enableScrapers feature flag to be enabled by an admin. See Self-Hosting > Configuration for setup.

How It Works

Your script (Python or Node.js) → Runs in an isolated Podman container → Produces CSV output → Feeds into the standard import pipeline → Events appear on the map
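The CSV handed to the import pipeline is ordinary tabular data. As a sketch of what a script's output might look like on disk, here is a standard-library example; the column names are borrowed from the Quick Start example below and are not a fixed schema:

```python
import csv
import io

# Build the CSV in memory the way a scraper's output helper might;
# column names follow the Quick Start example, not a fixed schema.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "date", "location", "description"])
writer.writeheader()
writer.writerow({
    "title": "New Year's Day",
    "date": "2026-01-01",
    "location": "Germany",
    "description": "Neujahr",
})

print(buffer.getvalue())
# First line: title,date,location,description
```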

The scraper system has two parts:

| Component | What it does |
| --- | --- |
| TimeScrape Runner (`apps/timescrape`) | Executes scripts in hardened containers. Stateless — no database access. |
| Scraper Management (`apps/web`) | Repos, scheduling, quotas, and import pipeline integration. |

Quick Start

1. Scaffold a scraper

```bash
npx @timetiles/scraper init my-scraper                # Python (default)
npx @timetiles/scraper init my-scraper --runtime node
```

2. Write your script

```python
import requests

from timetiles.scraper import output

response = requests.get(
    "https://date.nager.at/api/v3/PublicHolidays/2026/DE",
    timeout=30,
)
response.raise_for_status()

for holiday in response.json():
    output.write_row({
        "title": holiday["localName"],
        "date": holiday["date"],
        "location": "Germany",
        "description": holiday.get("name", ""),
    })

output.save()
```

3. Create a manifest

Add a scrapers.yml at the root of your repo:

```yaml
scrapers:
  - name: "German Holidays"
    slug: german-holidays
    runtime: python
    entrypoint: scraper.py
    output: data.csv
    schedule: "0 6 * * 1" # Every Monday at 06:00 UTC
```

4. Register the repo

In the dashboard, go to Scrapers > Scraper Repos and create a new repo. Point it at your Git repository or paste code directly.

5. Run it

Trigger manually from the dashboard, via API, or let the cron schedule handle it:

```bash
curl -X POST https://your-instance.com/api/scrapers/{id}/run \
  -H "Authorization: Bearer YOUR_TOKEN"
```

If autoImport is enabled and a targetDataset is configured, the CSV flows through the standard import pipeline automatically.
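The same trigger can be scripted. Here is a minimal sketch using only the Python standard library, with the endpoint taken from the curl example above; the instance URL, scraper id, and token are placeholders:

```python
import json
import urllib.request


def build_run_request(base_url: str, scraper_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that triggers a scraper run."""
    return urllib.request.Request(
        f"{base_url}/api/scrapers/{scraper_id}/run",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger_run(base_url: str, scraper_id: str, token: str) -> dict:
    """Send the request and return the decoded JSON response (performs a network call)."""
    req = build_run_request(base_url, scraper_id, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (placeholders, not called here):
# trigger_run("https://your-instance.com", "abc123", "YOUR_TOKEN")
```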

The scrapers.yml Manifest

```yaml
scrapers:
  - name: "My Scraper"
    slug: my-scraper
    runtime: python # or "node"
    entrypoint: scraper.py
    output: data.csv # default
    schedule: "0 6 * * *" # optional cron expression
    limits:
      timeout: 120 # seconds (10-3600, default 300)
      memory: 256 # MB (64-4096, default 512)

defaults: # optional, applied to all scrapers
  runtime: python
  limits:
    timeout: 120
    memory: 256
```

Helper Libraries

Python (timetiles.scraper)

```python
from timetiles.scraper import output

output.write_row({"title": "Event", "date": "2026-01-01", "location": "Berlin"})
output.write_rows([...])  # or multiple at once
output.save()  # required — writes CSV
print(output.row_count)
```

Pre-installed: requests, beautifulsoup4, lxml, pandas, cssselect.
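As a sketch of how those libraries combine with the output helper, here is a hypothetical scraper that parses an HTML listing with beautifulsoup4. The HTML is inlined for illustration; a real script would fetch it with requests, and the parsed rows would go to `output.write_rows`:

```python
from bs4 import BeautifulSoup

# Inlined for illustration; a real scraper would fetch this with requests.
html = """
<ul class="events">
  <li><span class="title">Open Day</span><time datetime="2026-03-14">14 March</time></li>
  <li><span class="title">City Run</span><time datetime="2026-05-02">2 May</time></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "title": item.select_one(".title").get_text(strip=True),
        "date": item.select_one("time")["datetime"],
        "location": "Berlin",  # hypothetical fixed value for this sketch
    }
    for item in soup.select("ul.events li")
]

# In a real scraper: output.write_rows(rows); output.save()
print(rows)
```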

Node.js (@timetiles/scraper)

```js
import { output } from "@timetiles/scraper";

output.writeRow({ title: "Event", date: "2026-01-01", location: "Berlin" });
output.save();
console.log(output.rowCount);
```

Pre-installed: cheerio, axios.

Scheduling

Set the schedule field in scrapers.yml to a standard five-field cron expression:

| Expression | Meaning |
| --- | --- |
| `0 6 * * *` | Every day at 06:00 UTC |
| `0 6 * * 1` | Every Monday at 06:00 UTC |
| `0 */6 * * *` | Every 6 hours |
| `0 0 1 * *` | First of every month at midnight |

Scrapers can also be triggered via webhook — enable it in scraper settings to get a unique URL.

Source Types

| Type | How code is provided | Storage |
| --- | --- | --- |
| Git | HTTPS URL + branch | External hosting (GitHub, GitLab) |
| Upload | JSON map of filenames to content | Payload database |

For Git repos, the runner does a shallow clone at execution time. For uploads, code is sent directly to the runner.
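For illustration, an uploaded scraper's code might be sent as a map like the following. This is only a sketch of a filename-to-content map; the exact upload format is not specified here:

```json
{
  "scrapers.yml": "scrapers:\n  - name: \"My Scraper\"\n    slug: my-scraper\n    entrypoint: scraper.py",
  "scraper.py": "from timetiles.scraper import output\noutput.write_row({\"title\": \"Event\", \"date\": \"2026-01-01\"})\noutput.save()\n"
}
```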

Quotas

Scraper access requires trust level 3 (Trusted) or higher:

| Trust Level | Repos | Runs/Day |
| --- | --- | --- |
| Trusted (3) | 3 | 10 |
| Power User (4) | 10 | 50 |
| Unlimited (5) | Unlimited | Unlimited |

Management

Users manage scrapers at /account/scrapers:

  • View repos with sync status
  • See scrapers with last run status and statistics
  • Force sync, trigger runs, delete repos
  • Expand run history with stdout/stderr logs

