Strategy, Design & Engineering Partner
CTC · Medical AI · March 2026
AI Knowledge Base
Last period, Rigoris conducted a detailed discovery into CTC's SharePoint environment, content structure, and workflow patterns. We reviewed the full Projects folder across multiple clients, mapped the internal folder template, analyzed the HubSpot deal history, and studied examples of real medical briefs and content outline templates used by CTC's writers.
What we found changed our thinking significantly, and in a good way. CTC's data is far more structured than it appears on the surface. Every numbered project folder follows the same nine-subfolder template across all clients. Inside each project, the Medical content folder cleanly separates outlines, drafts, and finals. That structure is what makes a precise, low-noise retrieval system possible.
ChatGPT Business was integrated with SharePoint via Microsoft's native connector. When a writer asked a question, the system searched recently connected documents and returned an answer based on a small, limited window of content, typically three to five results. Here is what that flow looked like:
Beyond the retrieval ceiling, there are three structural gaps ChatGPT cannot close. It has no concept of project relationships. It cannot filter by date, client, or therapeutic area. And it cannot generate CTC's specific output format: the Content Outline Template with program overview, section names, topic descriptions, and slide counts.
The proposed system adds two layers of intelligence before any document search happens. This is what fundamentally separates it from ChatGPT's approach. Each layer narrows the search space so the final retrieval is working on a small, highly relevant pool rather than the entire library.
CTC's HubSpot data gives us 830 projects with client name, therapeutic area, service type, and close date. This layer handles structured queries instantly, narrowing the candidate pool to only relevant projects before any document search happens. Fast, cheap, and deterministic.
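As a minimal sketch of this kind of structured pre-filter, the example below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table name, column names, and sample rows are all hypothetical; the real HubSpot export schema is not specified in this proposal.

```python
import sqlite3

# Hypothetical schema: one row per HubSpot project (all names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE projects (
           project_id TEXT PRIMARY KEY,
           client TEXT,
           therapeutic_area TEXT,
           service_type TEXT,
           close_date TEXT  -- ISO 8601 date, so string comparison orders correctly
       )"""
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?, ?)",
    [
        ("P-001", "Client A", "Oncology", "Symposium", "2023-05-10"),
        ("P-002", "Client B", "Oncology", "Advisory board", "2019-11-02"),
        ("P-003", "Client A", "Cardiology", "Symposium", "2024-01-20"),
    ],
)

def candidate_projects(conn, therapeutic_area, closed_after):
    """Return project IDs matching the structured filters, newest first."""
    rows = conn.execute(
        "SELECT project_id FROM projects "
        "WHERE therapeutic_area = ? AND close_date >= ? "
        "ORDER BY close_date DESC",
        (therapeutic_area, closed_after),
    ).fetchall()
    return [row[0] for row in rows]

candidates = candidate_projects(conn, "Oncology", "2020-01-01")
```

The query is parameterized, so a writer's filter values never get spliced into the SQL text. Only the returned project IDs would be passed on to the later layers.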
Recommended tool: PostgreSQL or equivalent
A graph database models how things connect. A therapeutic area links to multiple clients. A client links to multiple program types. This layer expands the candidate pool intelligently, surfacing relevant work from adjacent clients and related therapeutic areas that a keyword or semantic search would never find on its own. This is what gives the system genuine institutional memory.
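The expansion step can be illustrated with a one-hop traversal. This is only a sketch: plain dictionaries stand in for a graph database, and the relationship data, project IDs, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges; a real graph database would hold these relationships.
RELATED_AREAS = {"Oncology": {"Hematology"}, "Hematology": {"Oncology"}}
PROJECT_AREA = {
    "P-001": "Oncology",
    "P-002": "Hematology",
    "P-003": "Cardiology",
}

def expand_candidates(seed_projects):
    """Add projects whose therapeutic area is one hop from any seed's area."""
    by_area = defaultdict(set)
    for project, area in PROJECT_AREA.items():
        by_area[area].add(project)

    expanded = set(seed_projects)
    for project in seed_projects:
        area = PROJECT_AREA[project]
        for neighbour in RELATED_AREAS.get(area, set()):
            expanded |= by_area[neighbour]
    return expanded

pool = expand_candidates({"P-001"})
```

Here an Oncology seed project pulls in the Hematology project, while the unrelated Cardiology project stays out of the pool.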
Recommended tool: Graph database (options to be evaluated)
Only final deliverables (PPTX files inside each project's 3. Final subfolder) are embedded and searched semantically. Because Layers 1 and 2 have already narrowed the candidate set, RAG works on a small, highly relevant pool rather than the entire library. Each slide deck is processed with its project context preserved so retrieval always includes the metadata needed to understand what the content is.
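A rough sketch of retrieval restricted to the candidate pool is shown below. The three-dimensional vectors are toy values standing in for real embeddings, and the index layout and field names are assumptions; a production system would use an embedding model and a vector database instead.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical index: each chunk keeps its project context next to its vector.
INDEX = [
    {"project_id": "P-001", "slide": 4, "vector": [0.9, 0.1, 0.0]},
    {"project_id": "P-002", "slide": 2, "vector": [0.7, 0.7, 0.0]},
    {"project_id": "P-003", "slide": 1, "vector": [1.0, 0.0, 0.0]},
]

def retrieve(query_vector, candidate_ids, top_k=2):
    """Rank only chunks from candidate projects by similarity to the query."""
    pool = [chunk for chunk in INDEX if chunk["project_id"] in candidate_ids]
    ranked = sorted(
        pool, key=lambda chunk: cosine(query_vector, chunk["vector"]), reverse=True
    )
    return ranked[:top_k]

hits = retrieve([1.0, 0.0, 0.0], {"P-001", "P-002"})
```

Note that P-003 would be the closest match by similarity alone, but it is never scored because the earlier layers excluded it from the candidate set.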
A large language model sits at the end of the pipeline. It never touches the full library; it only receives the brief and the small set of retrieved documents. It generates a structured content outline in CTC's exact template format: program overview, learning objectives, section names, topic descriptions, and approximate slide counts.
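One way to keep generated outlines in a consistent shape is to validate them against a typed structure. The sketch below models the fields listed above; the class names, field names, and sample values are illustrative assumptions, not CTC's actual template.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    name: str
    topic_description: str
    approx_slides: int

@dataclass
class ContentOutline:
    program_overview: str
    learning_objectives: list[str]
    sections: list[Section] = field(default_factory=list)

    def total_slides(self) -> int:
        """Sum of the approximate slide counts across all sections."""
        return sum(s.approx_slides for s in self.sections)

    def render(self) -> str:
        """Plain-text rendering with one line per objective and per section."""
        lines = ["Program overview: " + self.program_overview, "Learning objectives:"]
        lines += [f"- {objective}" for objective in self.learning_objectives]
        for s in self.sections:
            lines.append(f"{s.name} (~{s.approx_slides} slides): {s.topic_description}")
        return "\n".join(lines)

# Placeholder content only; a real outline would come from the model's response.
outline = ContentOutline(
    program_overview="Placeholder overview",
    learning_objectives=["Placeholder objective"],
    sections=[
        Section("Introduction", "Placeholder topic", 5),
        Section("Clinical data", "Placeholder topic", 10),
    ],
)
```

Parsing the model's response into a structure like this makes a missing section name or slide count fail loudly before a writer sees the outline.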
Recommended tool: Claude Sonnet, stronger on structured document output than alternatives
This diagram shows where each data source lives, what our system creates, and how the pieces connect at runtime when a writer submits a brief.
Search scope: 3. Final only
PPTX files in the 3. Final subfolder of each candidate project are searched semantically against the brief. Top matching slides and decks are retrieved with full project context attached.
The full SharePoint library is 4.5TB. The vast majority of that is noise: logistics spreadsheets, design files, invoices, budget trackers, and archived content going back to 2010. Our folder structure analysis identified a precise ingestion path: every numbered project folder across all 75 clients follows the same nine-subfolder template. Only one subfolder is relevant: Medical content / 3. Final.
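The ingestion rule described here can be sketched as a simple path filter. The example paths, including the "2. Drafts" folder name and the project folder naming, are hypothetical; only the "Medical content" and "3. Final" names come from the analysis above.

```python
from pathlib import PurePosixPath

def is_indexable(path: str) -> bool:
    """True for PPTX files that sit under a 'Medical content/3. Final' folder."""
    p = PurePosixPath(path)
    parts = [part.strip().lower() for part in p.parts]
    in_final = any(
        parts[i] == "medical content" and parts[i + 1] == "3. final"
        for i in range(len(parts) - 1)
    )
    return in_final and p.suffix.lower() == ".pptx"

# Hypothetical SharePoint-relative paths.
paths = [
    "Projects/Client A/0123 Launch/Medical content/3. Final/deck.pptx",
    "Projects/Client A/0123 Launch/Medical content/2. Drafts/deck.pptx",
    "Projects/Client A/0123 Launch/Medical content/3. Final/budget.xlsx",
]
to_index = [p for p in paths if is_indexable(p)]
```

Matching on the folder pair rather than a fixed depth keeps the filter tolerant of clients whose project folders are nested at different levels.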
Medical content / 3. Final only
The system uses SharePoint webhooks to stay up to date in real time. Every time a file is added, updated, or deleted inside a monitored folder, SharePoint fires a signal automatically. The ingestion service picks it up and updates the index within minutes, with no manual re-indexing needed for day-to-day changes. A nightly reconciliation job runs independently as a safety net to catch anything the webhook missed.
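The nightly reconciliation idea can be sketched as a diff between two snapshots. This is an assumption about how such a job might work, not the planned implementation: the function name, the path-to-version maps, and the sample data are hypothetical, and no SharePoint API calls are shown.

```python
def reconcile(source: dict[str, str], index: dict[str, str]):
    """Compare path -> version maps and return (to_add, to_update, to_delete)."""
    to_add = sorted(source.keys() - index.keys())
    to_delete = sorted(index.keys() - source.keys())
    to_update = sorted(
        path for path in source.keys() & index.keys() if source[path] != index[path]
    )
    return to_add, to_update, to_delete

# Hypothetical snapshots: what SharePoint holds now vs. what the index holds.
sharepoint_now = {"a.pptx": "v2", "b.pptx": "v1", "d.pptx": "v1"}
indexed = {"a.pptx": "v1", "b.pptx": "v1", "c.pptx": "v1"}
to_add, to_update, to_delete = reconcile(sharepoint_now, indexed)
```

Because the diff is computed from full snapshots, it catches any change regardless of whether the corresponding webhook notification arrived.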
We have mapped out every edge case and how the system handles each one:
The system is designed to be built in two phases. Phase 1 delivers immediate value with SQL and RAG. Phase 2 adds the graph layer once the core system is proven and in use. Estimated indexable content is 5–15% of total SharePoint storage, likely 200–500GB of actual final deliverables.
| Component | What it does | Phase 1 | Phase 2 |
|---|---|---|---|
| SQL database | Stores project and client metadata copy | $0–25/mo | $0–25/mo |
| Vector database | Stores embedded content for semantic search | $50–100/mo | $50–100/mo |
| LLM API (generation) | Generates content outlines from retrieved context | $30–60/mo | $30–60/mo |
| Hosting (AWS) | App server, webhooks, logging, data transfer | $60–150/mo | $60–150/mo |
| Graph database | Relationship-aware retrieval layer | N/A | $65–100/mo |
| Estimated monthly infrastructure total | | $140–335/mo | $205–485/mo |
Both phases sit well within a $1,000/month infrastructure budget, with Phase 1 running $140–335/mo and Phase 2 at $205–485/mo. Exact figures will be confirmed once a full content inventory is available and specific tooling is selected.
Before finalizing the build plan, we want to align on a few things:
… 3. Final folders relevant content to index, or only PPTX?
… Medical content / 3. Final subfolder within each one.