I eventually want to create an MCP server that gives AI models context on everything I’ve published online so that I can easily find things that I’ve posted. I currently have a Kagi filter that does this pretty well, but since my corner of the web isn’t very popular it usually has pretty stale data, and it’s hard to extend with content that isn’t under the ansonbiggs.com domain.
I really blame search engines in general for this, but I find that I often know where to find info and never remember the actual contents. Overall this is fine; I’d just like an easy way to find previous work so I don’t have to think too hard to remember that the random Julia bug I’m hitting is one I posted about on Discourse in 2019 while I was working on Simulation Software for Rockets.
The first step is to see if I can get meaningful results off a small dataset. This website is just markdown files, which AI models handle really well, and the content is easy to grab with Python, making it simple to play with. My understanding is that the correct way to let models search is with this layout:
```mermaid
flowchart TD
    A[Input Files] --> B[Generate File Embeddings]
    B --> C[Store Embeddings in Vector DB]
    D[User Prompt] --> E[Generate Prompt Embedding]
    E --> F[Search Similar Embeddings]
    C --> F
    F --> G[Retrieve Top _n_ Matches]
    G --> H[Format Retrieved Data]
    H --> I[Send to Chat Model]
    D --> I
    I --> J[Generate Response]
    style A fill:#e1f5fe
    style D fill:#e8f5e8
    style I fill:#fff3e0
    style J fill:#f3e5f5
```
Which looks like a lot of steps, but it isn’t that bad. I think the hardest parts to understand are Embedding Models, which I explain in that note, and Vector DBs, which are just storage for the embedding results.
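To make “embedding results” concrete, here’s a minimal sketch of what one looks like, assuming Ollama is running locally with the same `mxbai-embed-large` model I use below:

```python
import ollama  # Python client for a locally running Ollama server

# Embed a single sentence; the result is just a long list of floats
response = ollama.embed(model="mxbai-embed-large", input="Rockets are neat.")
vector = response["embeddings"][0]

print(len(vector))  # mxbai-embed-large produces 1024-dimensional vectors
print(vector[:5])   # the first few numbers of the embedding
```

That list of floats is all a vector DB stores; “searching” is just finding stored vectors that are close to the prompt’s vector.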
The code isn’t too bad.
Import libraries that do all the actual work
```python
import ollama    # Runs the embedding and chat models via the local Ollama server
import chromadb  # Vector DB to store the embeddings
from pathlib import Path  # The best library for working with filesystems
import tqdm      # Status bars, since some embedding models are slow and it's nice to see progress
```
Grab all the file contents
```python
vault = Path("/Users/ansonbiggs/Library/Mobile Documents/iCloud~md~obsidian/Documents/brain/")

def get_md_contents(folder_path):
    md_files = []
    for md_file in folder_path.rglob("*.md"):
        if contents := md_file.read_text(encoding='utf-8'):
            md_files.append(contents)
    return md_files

documents = get_md_contents(vault)
```
An improvement I want to make in the final version is giving the model the source of the content so that it can add footnote links back to it, but for now I’m just doing it the straightforward way.
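A rough sketch of that improvement, assuming I stick with ChromaDB: return the path alongside the contents and store it as metadata so the chat model can cite it later. The helper name and the `source` key are just placeholders, not part of the code below.

```python
def get_md_documents(folder_path):
    """Return (relative path, contents) pairs instead of bare contents."""
    docs = []
    for md_file in folder_path.rglob("*.md"):
        if contents := md_file.read_text(encoding='utf-8'):
            docs.append((str(md_file.relative_to(folder_path)), contents))
    return docs

# Later, when adding to the collection, the path rides along as metadata:
# collection.add(ids=[str(i)], embeddings=embeddings,
#                documents=[contents], metadatas=[{"source": path}])
```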
Embed and Store
I straight up copied this code from the Ollama blog: https://ollama.com/blog/embedding-models
This embeds each of the files we read before, and stores them.
```python
client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(tqdm.tqdm(documents)):
    response = ollama.embed(model="mxbai-embed-large", input=d)
    embeddings = response["embeddings"]
    collection.add(
        ids=[str(i)],
        embeddings=embeddings,
        documents=[d]
    )
```
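One thing worth knowing: `chromadb.Client()` is in-memory, so every run re-embeds everything. If that gets slow, ChromaDB also has a persistent client that writes to disk; a minimal sketch, where the path is just an arbitrary choice of mine:

```python
import chromadb

# Persist embeddings to disk so re-runs can skip the embedding step
client = chromadb.PersistentClient(path="./chroma_db")  # example path
collection = client.get_or_create_collection(name="docs")
```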
Embed our prompt
prompt = "What does Anson currently do for work?" # Ask our question
# generate an embedding for the input and retrieve the most relevant docs
response = ollama.embed(
model="mxbai-embed-large",
input=prompt
)
results = collection.query(
query_embeddings=response["embeddings"],
n_results=3 # 3 seemed to work best, but I think you could get a lot smarter about how many you provide the chat model
)
data = results['documents'][0]
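On that `n_results` comment: one way to be smarter than a fixed count is to over-fetch and only keep documents whose distance is close to the best match. ChromaDB returns distances alongside the documents by default, so a rough variant of the query above (the over-fetch count and the 1.5× cutoff are numbers I made up, not anything I’ve tuned):

```python
# Over-fetch, then keep only documents reasonably close to the best match
results = collection.query(
    query_embeddings=response["embeddings"],
    n_results=10,
)
docs = results["documents"][0]
distances = results["distances"][0]  # smaller = more similar

cutoff = distances[0] * 1.5
data = [doc for doc, dist in zip(docs, distances) if dist <= cutoff]
data_formatted = "\n\n".join(data)
```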
Ask a chat model
This is where the Prompt Engineering comes in. Properly formatting the context and giving good instructions will be a huge part of making this project work correctly. llama3.1 really wants to complain about getting bad context and be meta (no pun intended).
```python
output = ollama.generate(
    model="llama3.1:8b",
    prompt=f"""You are an expert assistant analyzing relevant information to answer questions.

CONTEXT INFORMATION:
{data_formatted}

QUESTION: {prompt}

INSTRUCTIONS:
- Base your answer primarily on the provided context
- If the context doesn't contain enough information, clearly state this
- Cite specific details from the context when possible
- Provide a clear, well-structured response

ANSWER:"""
)

print(output['response'])
```
The responses are in these callouts:
llama3.1:8b Response
Based on the provided context, Anson currently works at Shield AI, building autonomy for a better world. This information can be found under the “DMV” section of the geospatial timeline.
Specifically, it states:
DMV Currently working at Shield AI, to build autonomy for a better world.
This directly answers the question about what Anson currently does for work.
Claude 4 Sonnet Response
Based on the provided context, Anson currently works at Shield AI, where he is building autonomy for a better world.
The context indicates he is currently located in the DMV area (Washington D.C. metropolitan region) for this role. His previous experience includes working at Blue Origin on a lunar lander project supporting NASA, and at ULA on the Vulcan rocket, demonstrating a background in aerospace engineering.
For the most up-to-date information about his current role, the context suggests checking his resume at https://ansonbiggs.com/resume.
I think Claude’s answer is orders of magnitude better, but why pay Anthropic when Ollama makes it so easy to run locally on my M4 MacBook Pro?
Future Work
I want to parse the data better so that the filename/location isn’t just part of the embedding but also gets passed to the chat model, so it can reference it for the user. I also think I just want this to be an MCP server, so that whatever models I’m using at the time can leverage the data and I don’t have to worry about whether whatever hardware ends up hosting this is capable of running a chat model quickly.
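For a sense of what the MCP side could look like, here’s a minimal sketch using the FastMCP helper from the official MCP Python SDK; the server name, tool name, and storage path are placeholders, and the tool body just wraps the same embed-and-query steps from above:

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

import chromadb
import ollama

mcp = FastMCP("anson-notes")  # server name is a placeholder

# Assumes the embeddings were already stored in a persistent collection
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="docs")

@mcp.tool()
def search_notes(query: str, n_results: int = 3) -> str:
    """Return the most relevant note snippets for a query."""
    response = ollama.embed(model="mxbai-embed-large", input=query)
    results = collection.query(
        query_embeddings=response["embeddings"],
        n_results=n_results,
    )
    return "\n\n".join(results["documents"][0])

if __name__ == "__main__":
    mcp.run()
```

The nice part of this shape is that the chat model lives wherever the MCP client lives, so the box hosting this only ever has to run the embedding model.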
I also want to set this up so that I can easily add new websites and profiles on other sites. It wouldn’t be too hard to scrape everything under ansonbiggs.com (including subdomains), but I’d also like to be able to handle my Bluesky posts, or activity on random Discourse forums.