Linen is a Google-searchable Slack alternative for communities. We originally started Linen because we realized that Slack and Discord were black holes of information: a lot of relevant content exists inside these technical communities, but it is effectively lost because it can't be found from the outside. It has been a bit over a year since Linen started, and we have learned a lot about the technical challenges of making something search engine friendly. We wanted to share a few of our hurdles and learnings.
The first hurdle to making a website search engine friendly is JavaScript. The same tool that makes your site realtime and interactive is also what causes problems for search engines: the more JavaScript you have on your site, the harder it is for crawlers to navigate. The internet is enormous, and even though Google seems to have unlimited resources, it still has to be selective about what it crawls and what it doesn't. If two websites have the same quality of content, but one is statically rendered and the other relies heavily on JavaScript navigation to reach the content, Google will choose the one that is cheaper to crawl. In the first version of Linen we simply rendered all the threads from a Slack community, using NextJS to handle the server rendering for us.
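As a rough illustration, a server-rendered thread page in Next.js might look something like the sketch below. The file path, the `fetchThread` helper, and the data shapes are illustrative assumptions, not Linen's actual code; the point is that the crawler receives fully rendered HTML without executing any client-side JavaScript.

```tsx
// pages/t/[threadId].tsx — a minimal sketch of a server-rendered thread page.
// fetchThread and the Thread/Message shapes are hypothetical, for illustration only.
import type { GetServerSideProps } from 'next';

type Message = { author: string; body: string; sentAt: string };
type Thread = { id: string; title: string; messages: Message[] };

// Hypothetical data-access helper; in practice this would query the database.
async function fetchThread(threadId: string): Promise<Thread | null> {
  return null; // placeholder
}

export const getServerSideProps: GetServerSideProps<{ thread: Thread }> = async ({ params }) => {
  const thread = await fetchThread(String(params?.threadId));
  if (!thread) return { notFound: true };
  // The page reaches the crawler as complete HTML, so no client-side
  // JavaScript is needed to read the conversation.
  return { props: { thread } };
};

export default function ThreadPage({ thread }: { thread: Thread }) {
  return (
    <main>
      <h1>{thread.title}</h1>
      {thread.messages.map((m, i) => (
        <article key={i}>
          <strong>{m.author}</strong>
          <p>{m.body}</p>
        </article>
      ))}
    </main>
  );
}
```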
Although the first version worked fine, our users were asking for their non-threaded conversations to be indexed as well. This ran into the second hurdle: pagination. Most chat apps are designed with infinite scrolling in mind, because pagination can arbitrarily cut off conversations, so scrolling up is a better user experience. However, you then run into issues with overlapping content for search engines; ideally, classic numbered pagination works best for them. The solution we ended up with was two separate pagination styles: one for search engines and one for users. For search engines we came up with a custom pagination style where every 50 messages are grouped into a page. For users we went with cursor/time-based pagination, where we can jump to any message or thread based on a timestamp. We then render dynamically depending on whether the browser agent is a bot or a user, as in the sketch below. The downside of this approach is that it is a lot more work to maintain, since there are two different pagination styles.
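Here is a rough sketch of what that split might look like. The naive user-agent regex and the helper names (`numberedPage`, `cursorPage`, `resolvePagination`) are assumptions for illustration, not the actual implementation.

```ts
// A minimal sketch of serving different pagination styles to crawlers and users.
// The user-agent check and helper names are illustrative, not Linen's actual code.
import type { IncomingMessage } from 'http';

const BOT_UA = /bot|crawler|spider|slurp|bingpreview/i;
const PAGE_SIZE = 50;

function isCrawler(req: IncomingMessage): boolean {
  return BOT_UA.test(req.headers['user-agent'] ?? '');
}

// Crawlers get stable, numbered pages of 50 messages each, so page URLs never shift.
function numberedPage(pageNumber: number) {
  const offset = (pageNumber - 1) * PAGE_SIZE;
  return { offset, limit: PAGE_SIZE };
}

// Users get cursor/time-based pages keyed by timestamp, which supports infinite
// scroll and lets us jump to any message without cutting a conversation at a boundary.
function cursorPage(beforeTimestamp: number) {
  return { before: beforeTimestamp, limit: PAGE_SIZE };
}

export function resolvePagination(
  req: IncomingMessage,
  query: { page?: string; before?: string }
) {
  return isCrawler(req)
    ? numberedPage(Number(query.page ?? 1))
    : cursorPage(Number(query.before ?? Date.now()));
}
```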
As Linen kept growing, we ran into the problem that our website wasn't getting indexed fast enough: we were adding content faster than Google's crawl budget allowed. A crawl budget is the amount of compute and resources a search engine will dedicate to a specific website, typically calculated by some secret Google algorithm. The way to increase your crawl budget is a long-term process of improving the content, quality, reputation, and usability of the website, most of which is difficult to improve in the short term. The one thing we could control was how efficiently we used the budget we had. The crawl budget is less about a fixed number of pages and more about how much compute a search engine is willing to allocate to your site, so if we decreased the resources it cost to crawl our website, we could increase the number of pages crawled. Ultimately we went through and optimized our bundle size, shrinking it by half (you can read more here). The result was a 40% increase in the number of pages crawled by Google within 2 weeks.
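One common tactic for this kind of bundle-size reduction, sketched below under assumptions rather than as Linen's actual approach, is lazy-loading heavy, interaction-only components with `next/dynamic` so they stay out of the JavaScript shipped to every page a crawler fetches. The `EmojiPicker` component and its path are hypothetical.

```tsx
// A minimal sketch of lazy-loading an interaction-only component with next/dynamic.
// The EmojiPicker component and import path are illustrative assumptions.
import dynamic from 'next/dynamic';

// Loaded only on the client, and only when rendered, so it is excluded from the
// server-rendered HTML and from the JavaScript cost a crawler pays per page.
const EmojiPicker = dynamic(() => import('../components/EmojiPicker'), {
  ssr: false,
  loading: () => null,
});

export default function MessageComposer() {
  return (
    <form>
      <textarea name="message" />
      <EmojiPicker />
    </form>
  );
}
```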
Beyond these changes, we have probably spent several hundred engineering hours experimenting and optimizing Linen to be search engine friendly. The most frustrating part about getting something to rank on search engines is the slow feedback cycle: we would change something and not see results for weeks. If you want to see the code behind how this all works, check out [https://github.com/Linen-dev/linen.dev](https://github.com/Linen-dev/linen.dev) or hop into this Linen community to say hello.