Ibiyemi Abiodun
Ditto (v1)
Social media search that tells you the whole story
January 2023 to April 2023
Ditto was my first try at entrepreneurship after I graduated from university. A cofounder and I built a new search engine that used large language models and web scraping to give a more complete picture of what is happening on the web.
It would collect posts from different social media platforms, detect trends, and piece the posts together using an LLM to generate a complete explanation of what’s going on in each corner of the internet.
For this project, I built four key pieces of software: the website, written in
TypeScript with Astro; a web-scraping daemon called escraper; a data ingestion
tool called swallow; and a browser extension.
Every 8 hours, escraper would crawl the trending pages of Twitter and Reddit
(and later TikTok, thanks to my research into reversing TikTok’s APIs) and index
everything it found into Apache Parquet files. These Parquet files were dumped
into an AWS S3 bucket, where they were periodically picked up by swallow.
From there, swallow would calculate sentence embeddings and key phrases for
each post and then dump the posts into a Postgres database as fast as possible.
The Postgres database was the backend for our search, which showed results in
our browser extension whenever our users performed an internet search or a
search on ditto.fyi.
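To give a feel for how sentence embeddings can drive search, here is a minimal sketch that ranks posts by cosine similarity to a query vector. It is an illustration only: how Ditto's search actually used the embeddings is not described here, and the names `cosine_similarity` and `rank_posts` and the tiny two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_posts(query_vec, posts, top_k=3):
    """Return the ids of the top_k posts most similar to the query.

    `posts` is a list of (post_id, embedding) pairs (hypothetical shape).
    """
    scored = [(cosine_similarity(query_vec, vec), post_id) for post_id, vec in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post_id for _, post_id in scored[:top_k]]
```

Real sentence embeddings have hundreds of dimensions, and at scale the ranking would normally happen inside the database or a vector index rather than in application code.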
To load posts that quickly, I had to get very intimate with PostgreSQL and learn
about its COPY command. This is the fastest way to insert data into a Postgres
database, because it streams all of the rows in as a single command and avoids
the per-statement overhead that individual INSERTs incur.
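As a rough illustration of what feeding COPY looks like, the sketch below serializes rows into PostgreSQL's COPY text format (tab-separated fields, `\N` for NULL, backslash escapes for special characters). This is not swallow's actual code; the function names and the `posts` table in the comment are hypothetical.

```python
import io


def copy_escape(value):
    """Render one field in PostgreSQL's COPY text format."""
    if value is None:
        return r"\N"  # COPY's default marker for NULL
    text = str(value)
    # Escape backslashes first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_buffer(rows):
    """Build a file-like object holding one tab-separated line per row."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row))
        buf.write("\n")
    buf.seek(0)
    return buf


# With psycopg2, a buffer like this could be streamed in one command, e.g.
#   cursor.copy_expert("COPY posts (id, body, author) FROM STDIN", buf)
# where `posts` and its columns are assumed names for the example.
```

The whole batch travels to the server as one stream, so the database parses a single statement no matter how many rows are in the buffer.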
escraper replaced a lightweight Python scraper written by my cofounder. It was
able to process hundreds of pages in parallel and compress the results into
Parquet format in real time on tiny resource-constrained containers, which saved
us a ton of money (at first, we used Render to run our periodic tasks instead of
EC2 for ease of deployment, and Render doesn’t give out free credits to
startups).
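One common way to process many pages in parallel without overwhelming a small container is to cap the number of in-flight requests. The sketch below shows that pattern with a semaphore; it is an assumption about the general technique, not a description of how escraper was built, and `crawl_all`, `fetch`, and `max_concurrency` are hypothetical names.

```python
import asyncio


async def crawl_all(urls, fetch, max_concurrency=100):
    """Fetch every URL concurrently, with at most max_concurrency in flight.

    `fetch` is any coroutine function that takes a URL and returns a result.
    Results come back as (url, result) pairs in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:  # wait here while the limit is reached
            return url, await fetch(url)

    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

Keeping the limit explicit makes it easy to tune against the memory available on the container.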
swallow used the precise memory management features available to Rust users to
allow sustained GPU processing in a way that was not feasible in the Python
prototype that it replaced. After a while, the Python version would always run
out of GPU memory because PyTorch doesn’t free all of it until the process is
killed.
The pivot
So if all of this technology was humming along, then why is this article titled “Ditto (v1)”?
The answer is that we could not prove that this idea would make money. Would anyone be interested in paying for this premium search experience that gives them the real tea in a real digestible format? Maybe.
Failing to raise funding on this idea was my first reality check in the world of entrepreneurship. And taking months to validate the idea was a mistake! It took a while for me to transition from my previous style of making software as perfect as I could before showing it to anyone, to a style where I show off prototypes just to gather feedback.
But we still had some runway left, so we pivoted to Ditto (v2), where the story continues.