# knowd 🧠

Personal knowledge base for AI agents. Save web pages, search them semantically.

No UI — just conversation. Tell your agent "save this URL" and it extracts, chunks, embeds, and stores it. Ask "what did I save about X?" and it finds the most relevant passages.

## Features

- **Multi-provider embeddings** — OpenAI, Voyage AI, Cohere, Jina, or Ollama (local/free)
- **Semantic search** — cosine similarity over embedded chunks
- **Content extraction** — trafilatura handles the messy web
- **SQLite storage** — single file, no external services
- **Provider consistency check** — the DB records which embedding provider was used, so incompatible embeddings can't be mixed
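
The provider check in the list above can be pictured with a small sketch. This is not knowd's actual schema: the `meta` table, the `ensure_provider` helper, and the `provider/model` string format are all hypothetical, chosen only to illustrate how a single SQLite file might remember which embedding provider built it and reject a different one.

```python
import sqlite3


def ensure_provider(conn: sqlite3.Connection, provider: str, model: str) -> None:
    """Record the embedding provider on first use; refuse a different one later.

    Hypothetical helper: the table and key names are assumptions, not knowd's schema.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'embedding'").fetchone()
    wanted = f"{provider}/{model}"
    if row is None:
        # First save into this database: remember which provider produced the vectors.
        conn.execute("INSERT INTO meta (key, value) VALUES ('embedding', ?)", (wanted,))
        conn.commit()
    elif row[0] != wanted:
        # Vectors from different models are not comparable, so stop here.
        raise ValueError(f"database was built with {row[0]}, refusing to mix in {wanted}")


conn = sqlite3.connect(":memory:")
ensure_provider(conn, "openai", "text-embedding-3-small")  # first use: recorded
ensure_provider(conn, "openai", "text-embedding-3-small")  # same provider: accepted
```

Calling `ensure_provider` on the same connection with a different provider or model would raise `ValueError` in this sketch, which is the behaviour the feature describes.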

## Install

### As an OpenClaw / Clawdbot skill

```bash
# From Skills N'at
# Visit https://skills-nat.vercel.app and install via repo URL

# Or manually
git clone https://github.com/ianpcook/knowd-skill.git skills/knowd
pip3 install -r skills/knowd/requirements.txt
```

### Standalone

```bash
git clone https://github.com/ianpcook/knowd-skill.git
cd knowd-skill
pip3 install -r requirements.txt
python3 scripts/knowd.py --help
```

## Setup

Set the API key for at least one embedding provider, or use Ollama, which runs locally and needs no key:

| Provider | Env Variable | Model |
|----------|-------------|-------|
| OpenAI (default) | `OPENAI_API_KEY` | text-embedding-3-small |
| Voyage AI | `VOYAGE_API_KEY` | voyage-3-lite |
| Cohere | `COHERE_API_KEY` | embed-v4 |
| Jina | `JINA_API_KEY` | jina-embeddings-v3 |
| Ollama (local) | — | nomic-embed-text |
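
As a rough illustration of how the table above could drive provider selection, the sketch below checks the listed environment variables in order. The env variable names and models come from the table; the short provider names, the preference order, the `pick_provider` helper, and the fallback to Ollama when no key is set are assumptions, not documented knowd behaviour.

```python
import os

# Env variable names and default models are taken from the table above.
# The provider labels and the preference order are assumptions for this sketch.
PROVIDERS = [
    ("openai", "OPENAI_API_KEY", "text-embedding-3-small"),
    ("voyage", "VOYAGE_API_KEY", "voyage-3-lite"),
    ("cohere", "COHERE_API_KEY", "embed-v4"),
    ("jina", "JINA_API_KEY", "jina-embeddings-v3"),
]


def pick_provider(env=None):
    """Return (provider, model) for the first configured key, else local Ollama."""
    env = os.environ if env is None else env
    for name, var, model in PROVIDERS:
        if env.get(var):
            return name, model
    # Assumed fallback: Ollama needs no key, so it is always a candidate.
    return "ollama", "nomic-embed-text"


print(pick_provider({"VOYAGE_API_KEY": "example-key"}))  # ('voyage', 'voyage-3-lite')
```

Run `python3 scripts/knowd.py providers` to see what the tool itself reports as available.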

## Usage

```bash
# Save a URL
python3 scripts/knowd.py save "https://example.com/article"

# Search your knowledge
python3 scripts/knowd.py search "machine learning best practices" -k 5

# List saved sources
python3 scripts/knowd.py list

# Show knowledge base stats
python3 scripts/knowd.py stats

# Delete a saved source
python3 scripts/knowd.py delete "https://example.com/article"

# List available providers
python3 scripts/knowd.py providers
```

## How it works

1. **Fetch** — downloads the page, extracts readable text via trafilatura
2. **Chunk** — splits into ~500-token overlapping chunks at sentence boundaries
3. **Embed** — sends chunks to your chosen embedding provider
4. **Store** — SQLite with binary embeddings (no vector DB dependency)
5. **Search** — embeds your query, computes cosine similarity against all chunks
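
Steps 2, 4, and 5 above can be sketched with the standard library alone. This is a minimal illustration, not knowd's implementation: it counts words as a stand-in for tokens, uses a naive regex for sentence boundaries, stores vectors as packed float32 blobs, and replaces the real embedding provider with hand-written toy vectors. All function names and the `chunks` table layout are hypothetical.

```python
import math
import re
import sqlite3
import struct


def chunk_text(text, max_words=500, overlap_sentences=1):
    """Group sentences into chunks of about max_words words, repeating the
    last overlap_sentences sentences at the start of the next chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
            count = sum(len(s.split()) for s in current)
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


def pack(vector):
    """Serialize a vector as float32 bytes for a BLOB column."""
    return struct.pack(f"{len(vector)}f", *vector)


def unpack(blob):
    """Inverse of pack: 4 bytes per float32 value."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vector, k=5):
    """Brute-force scan: score every stored chunk, return the top k."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vector, unpack(blob)), text) for text, blob in rows]
    return sorted(scored, reverse=True)[:k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (text TEXT, embedding BLOB)")
# Toy 3-dimensional vectors stand in for real provider embeddings.
toy = {
    "Gradient descent tunes model weights.": [0.9, 0.1, 0.0],
    "Sourdough needs a long, slow proof.": [0.0, 0.2, 0.9],
}
for text, vector in toy.items():
    conn.execute("INSERT INTO chunks VALUES (?, ?)", (text, pack(vector)))

best_score, best_text = search(conn, [1.0, 0.0, 0.0], k=1)[0]
print(best_text)  # Gradient descent tunes model weights.
```

A full scan like this costs one similarity computation per stored chunk, which is usually acceptable for a personal knowledge base and is what lets the store stay a single SQLite file.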

## License

MIT
