Ibiyemi Abiodun


Ditto (v1)

Social media search that tells you the whole story

January 2023 to April 2023

rust, aws, react, astro, postgresql, elasticsearch

Ditto was my first attempt at entrepreneurship after I graduated from university. A cofounder and I built a search engine that used large language models and web scraping to give a more complete picture of what’s happening on the web.

It would collect posts from different social media platforms, detect trends, and piece the posts together using an LLM to generate a complete explanation of what’s going on in each corner of the internet.

For this project, I built four key pieces of software: the website (written in TypeScript with Astro), a web-scraping daemon called escraper, a data ingestion tool called swallow, and a browser extension.

Every 8 hours, escraper would crawl the trending pages of Twitter and Reddit (and later TikTok, thanks to my research into reversing TikTok’s APIs) and index everything it found into Apache Parquet files. These Parquet files were dumped into an AWS S3 bucket, where they were periodically picked up by swallow.

From there, swallow would calculate sentence embeddings and key phrases for each post and then dump the posts into a Postgres database as fast as possible. The Postgres database was the backend for our search, which showed results in our browser extension whenever our users performed an internet search or a search on ditto.fyi.
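To illustrate what sentence embeddings make possible, here is a minimal sketch of ranking posts against a query by cosine similarity. It is an assumption for illustration only: this write-up does not say how Ditto compared embeddings at query time, and in the real system the search ran against Postgres rather than in memory. The function names and the tiny two-dimensional vectors are hypothetical.

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every (post id, embedding) pair against the query embedding and
/// return the ids with their scores, most similar first.
/// (Hypothetical helper; not taken from swallow.)
fn rank_by_similarity<'a>(
    query: &[f32],
    posts: &'a [(&'a str, Vec<f32>)],
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = posts
        .iter()
        .map(|(id, embedding)| (*id, cosine_similarity(query, embedding)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|left, right| right.1.total_cmp(&left.1));
    scored
}
```

Real sentence embeddings have hundreds of dimensions, but the comparison works the same way: posts whose vectors point in a similar direction to the query vector rank higher.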

To make that work, I had to get very intimate with PostgreSQL and learn about its COPY command. This is generally the fastest way to bulk-load data into a Postgres database: rows are streamed in as part of a single command, so the per-statement overhead associated with individual INSERTs is deferred until after you are done streaming in all of the data.
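As a rough sketch of what streaming data through COPY involves, the code below encodes rows into PostgreSQL's COPY text format: one row per line, columns separated by tabs, NULL written as `\N`, and backslash escapes for special characters. The `Post` struct and its columns are hypothetical, not swallow's actual schema, and the post does not say whether swallow used the text or binary COPY format. The resulting buffer is what a client would stream after issuing something like `COPY posts (id, platform, body) FROM STDIN`, where the table name is likewise assumed.

```rust
/// Escape one field for PostgreSQL's COPY text format.
/// `None` becomes the NULL marker `\N`.
fn escape_copy_field(field: Option<&str>) -> String {
    match field {
        None => "\\N".to_string(),
        Some(text) => {
            let mut out = String::with_capacity(text.len());
            for ch in text.chars() {
                match ch {
                    '\\' => out.push_str("\\\\"),
                    '\t' => out.push_str("\\t"),
                    '\n' => out.push_str("\\n"),
                    '\r' => out.push_str("\\r"),
                    other => out.push(other),
                }
            }
            out
        }
    }
}

/// Hypothetical row shape, for illustration only.
struct Post {
    id: i64,
    platform: String,
    body: Option<String>,
}

/// Encode rows as tab-separated, newline-terminated COPY text lines.
fn encode_copy_rows(posts: &[Post]) -> String {
    let mut buf = String::new();
    for post in posts {
        buf.push_str(&post.id.to_string());
        buf.push('\t');
        buf.push_str(&escape_copy_field(Some(&post.platform)));
        buf.push('\t');
        buf.push_str(&escape_copy_field(post.body.as_deref()));
        buf.push('\n');
    }
    buf
}
```

In practice the buffer would be written in chunks to the database driver's copy-in stream, so the whole dataset never has to sit in memory at once.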

escraper replaced a lightweight Python scraper written by my cofounder. It was able to process hundreds of pages in parallel and compress the results into Parquet format in real time on tiny resource-constrained containers, which saved us a ton of money (at first, we used Render to run our periodic tasks instead of EC2 for ease of deployment, and Render doesn’t give out free credits to startups).

swallow used the precise memory management features available to Rust users to allow sustained GPU processing in a way that was not feasible in the Python prototype that it replaced. After a while, the Python version would always run out of GPU memory, because PyTorch doesn’t free all of it until the process is killed.
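The kind of deterministic cleanup Rust offers can be sketched with the `Drop` trait. This is not swallow's code: `DeviceBuffer` is a hypothetical stand-in that only tracks a byte counter, where a real implementation would wrap an actual GPU allocation. The point is that each batch's buffer is released as soon as it goes out of scope, so memory use stays bounded by the largest batch rather than growing over time.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for a GPU allocation (hypothetical). It adds its size to a
/// shared counter when created and subtracts it when dropped.
struct DeviceBuffer {
    bytes: usize,
    live_bytes: Rc<Cell<usize>>,
}

impl DeviceBuffer {
    fn new(bytes: usize, live_bytes: Rc<Cell<usize>>) -> Self {
        live_bytes.set(live_bytes.get() + bytes);
        DeviceBuffer { bytes, live_bytes }
    }
}

impl Drop for DeviceBuffer {
    /// Runs automatically when the buffer goes out of scope.
    fn drop(&mut self) {
        self.live_bytes.set(self.live_bytes.get() - self.bytes);
    }
}

/// Process batches one at a time and return the peak number of live bytes.
fn process_batches(batch_sizes: &[usize], live_bytes: &Rc<Cell<usize>>) -> usize {
    let mut peak = 0;
    for &size in batch_sizes {
        let _buffer = DeviceBuffer::new(size, Rc::clone(live_bytes));
        peak = peak.max(live_bytes.get());
        // `_buffer` is dropped here, before the next batch is allocated.
    }
    peak
}
```

Because the release happens at a known point in the loop, a long-running process built this way does not depend on a garbage collector or a caching allocator deciding when to give memory back.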

The pivot

So if all of this technology was humming along, then why is this article titled “Ditto (v1)”?

The answer is that we could not prove that this idea would make money. Would anyone be interested in paying for this premium search experience that gives them the real tea in a real digestible format? Maybe.

Failing to raise funding on this idea was my first reality check in the world of entrepreneurship. And taking months to validate the idea was a mistake! It took a while for me to transition from my previous style of making software as perfect as I could before showing it to anyone, to a style where I show off prototypes just to gather feedback.

But we still had some runway left, so we pivoted to Ditto (v2), where the story continues.