August 24, 2021

For making a viable Google competitor, I believe that ranking is a harder problem than indexing, but even if we just look at indexing, there are individual domains that contain on the order of one trillion pages we might want to index (like Twitter), and I’d guess that we can find on the order of a trillion domains. If you try to configure any off-the-shelf search index to hold some number of trillions of items and handle a load of, say, 1/100th of Google’s load, with a latency budget of, say, 100ms (most of the latency should be for ranking, not indexing), I think you’ll find that this isn’t trivial. — I could do that in a weekend!
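
To get a rough feel for the scale in the quote, here is a back-of-envelope sketch of how many index shards a trillion items might need. The per-item size (100 bytes) and per-node capacity (1 TB) are purely illustrative assumptions, not figures from the post, and `shards_needed` is a hypothetical helper rather than part of any real search index.

```python
def shards_needed(num_items: int, bytes_per_item: int, bytes_per_node: int) -> int:
    """Return how many nodes are needed to hold the index, rounding up."""
    total_bytes = num_items * bytes_per_item
    # Ceiling division using integers only, to avoid float rounding at this scale.
    return -(-total_bytes // bytes_per_node)


# Assumed inputs (hypothetical, chosen only for illustration):
ONE_TRILLION_ITEMS = 10**12   # low end of "some number of trillions"
BYTES_PER_ITEM = 100          # assumed average index footprint per item
BYTES_PER_NODE = 10**12       # assumed 1 TB of usable index storage per node

if __name__ == "__main__":
    print(shards_needed(ONE_TRILLION_ITEMS, BYTES_PER_ITEM, BYTES_PER_NODE))  # 100
```

Even under these generous assumptions, a single trillion items already spreads across about a hundred nodes before replication, and every query then has to fan out and merge results inside the 100ms budget, which is part of why an off-the-shelf setup is unlikely to be trivial.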