When I moved my blog from WordPress to Jekyll, I faced the problem of letting users search its content. I didn’t give it much thought and used Google Custom Search Engine. In this post, I’d like to go through the possible options for searching static sites and review each of them.
Google Custom Search Engine
Integrating Google Custom Search Engine into a static website is a multi-step process:
- Create a search engine item
- Configure it through the web interface
- Get the integration code.
Something like:
<script>
  (function() {
    var cx = '012662830594289748271:sgyljeirh_k';
    var gcse = document.createElement('script');
    gcse.type = 'text/javascript';
    gcse.async = true;
    gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(gcse, s);
  })();
</script>
<gcse:searchresults-only></gcse:searchresults-only>
- Embed it into the site
The good side is that the implementation process is pretty straightforward. Yet, I had some issues with GCSE:
- The worst thing is that integration options are pretty limited: only a specific set of layouts is available, and there is no CSS customization beyond colors.
- There’s no way to configure what data is indexed and how it is indexed. Google is the owner of the data.
I wanted to have the search on the homepage, and to display results like my usual posts. Time to check for alternatives.
JSON file
There is another interesting concept: store the content in a structured JSON file, and query it. In Jekyll, at least one plugin implements this idea (i.e. Jekyll Search).
I didn’t select this option because this blog includes more than 400 posts, for over 3 MB of text. While I didn’t do any performance testing on the solution, I felt it would be sluggish and/or wouldn’t scale well over time.
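To make the idea concrete, here is a minimal sketch of what querying such a JSON file on the client could look like. It is not how Jekyll Search is implemented; the record shape (`title`, `url`, `content`) and the `searchPosts` function are assumptions made up for the illustration.

```javascript
// Minimal sketch of client-side search over a pre-built JSON index.
// The record shape ({ title, url, content }) is an assumption for illustration.
const posts = [
  { title: 'Moving to Jekyll', url: '/moving-to-jekyll/', content: 'Why I left WordPress for a static site.' },
  { title: 'Search on static sites', url: '/static-search/', content: 'Comparing Google CSE, JSON files and Algolia.' },
];

// Return every post whose title or content contains all the query words.
function searchPosts(index, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  return index.filter(post => {
    const haystack = `${post.title} ${post.content}`.toLowerCase();
    return words.every(word => haystack.includes(word));
  });
}

console.log(searchPosts(posts, 'algolia').map(post => post.url));
```

Every query scans the whole index, and the browser has to download the full file first, which is why the size of the content matters for this approach.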
ElasticSearch
As a backend engineer, I’m pretty familiar with the Elastic stack in general, and more specifically with ElasticSearch. ElasticSearch is exactly what I was searching for: I can index my content offline, then send JSON search requests using JavaScript from the site.
Unfortunately, I found no ElasticSearch free tier available. Besides, looking at how token authentication works convinced me it’s meant to be used behind a server application (no simple access from the web).
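For reference, the kind of JSON search request mentioned above might look like the following sketch. The host, the `blog` index name and the `title` and `content` fields are hypothetical; only the `_search` endpoint and the `multi_match` query belong to ElasticSearch itself. The function only builds the request, so it can be read without a running cluster.

```javascript
// Build the arguments for a fetch() call against an ElasticSearch _search endpoint.
// The host, index name and field names are hypothetical.
function buildSearchRequest(query) {
  return {
    url: 'https://search.example.com/blog/_search',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        size: 10,
        query: {
          // Look for the query text in both fields.
          multi_match: { query, fields: ['title', 'content'] },
        },
      }),
    },
  };
}

const request = buildSearchRequest('static site');
console.log(request.options.body);
// In a browser: fetch(request.url, request.options).then(response => response.json())
```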
Algolia
After some more searching, I found Algolia.
Algolia works the same way as ElasticSearch: files are indexed beforehand, and it offers a REST API for searches. Unlike ElasticSearch, however, it’s designed to be used on the front-end, and provides two different authentication tokens: one dedicated to administrative tasks such as indexing, and one for searching.
Additional benefits of using Algolia include:
- Free: There’s a free tier. Though it’s pretty limited, it exists. And if you’re working on an Open Source-related project, you can benefit from the second tier.
- Jekyll plugin: Implementing the indexation of your content yourself can be quite a daunting task. Algolia helps you not reinvent the wheel by providing a Jekyll plugin to do just that. With just a few configuration snippets and an additional command in the build file, you’re good to go.
I must admit my first experience with the plugin was far from the best one. I felt like a beta tester, and I probably was. However, the developer behind the plugin is quite responsive and very helpful. Thanks Tim!
Now, the plugin is pretty stable (I like to think I played a role in that, however small it might have been).
- Highlighting: When searching full text, it’s important to highlight the search terms in the returned results. While it can be done manually on the client, that’s a tedious extra step that is easily handled server-side. Algolia does it by inserting tags before and after the search term in the results. Those tags can be configured.
Afterwards, it’s just a matter of applying the correct CSS to get the proper visual highlight.
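To show what that server-side step saves, here is a rough sketch of doing the highlighting by hand. The `highlight` and `escapeRegExp` functions are made up for the illustration and are not part of any Algolia library; the default tags mirror the `<em>` pair that Algolia inserts unless configured otherwise.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every case-insensitive occurrence of `term` in `text` with the given tags.
// Note: `text` is assumed to be plain text; HTML escaping is left out of this sketch.
function highlight(text, term, preTag = '<em>', postTag = '</em>') {
  if (!term) return text;
  const pattern = new RegExp(escapeRegExp(term), 'gi');
  return text.replace(pattern, match => `${preTag}${match}${postTag}`);
}

console.log(highlight('Jekyll search with Algolia', 'algolia'));
// prints: Jekyll search with <em>Algolia</em>
```

Even this small version has to deal with escaping and case, and it ignores typo tolerance and multi-word queries, which is a good reason to let the search engine return pre-highlighted text.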
Setup
Setting up Algolia is a multi-step process:
- Register on the Algolia site
- Create a new application
- In the context of this app, create a new index
- Write down the API keys related to the app
- Configure the plugin gem in your Gemfile:
group :jekyll_plugins do
  gem 'jekyll-asciidoc'
  gem 'jekyll-algolia'
end
- Add it:
bundle install
- Configure the gem accordingly in _config.yml:
algolia:
  application_id: <app_name>
  search_only_api_key: <search_only_api_key>
  index_name: <index_name>
  indexing_batch_size: 500
  nodes_to_index: 'p,blockquote,li,dd'
  extensions_to_index:
    - adoc
- Set the ALGOLIA_API_KEY environment variable:
export ALGOLIA_API_KEY=<admin_api_key>
Obviously, this should be part of a Continuous Integration pipeline. For this blog, I use GitLab CI.
- Index content:
jekyll algolia
The index should now be populated with some data. Let’s implement some search.
- Two client libraries are provided by Algolia: one high-level and one low-level.
Choose the fitting one depending on the level of integration required.
The former was too rigid for my taste, so I decided to go for the latter.
- Embed the following snippet into the page:
<script async src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script async src="https://cdn.jsdelivr.net/npm/algoliasearch@3/dist/algoliasearchLite.min.js"></script>
<script async src="https://cdn.jsdelivr.net/npm/algoliasearch-helper@2.24.0/dist/algoliasearch.helper.min.js"></script>
- To send a search request, use the following:
var client = algoliasearch('{{ site.algolia.application_id }}', '{{ site.algolia.search_only_api_key }}');
var helper = algoliasearchHelper(client, '{{ site.algolia.index_name }}');
// `value` is the query string, e.g. read from the search input
helper.setQuery(value);
helper.search();
- Results are returned through callbacks.
There’s one callback for when results are returned (as JSON payload) and one if an error occurs:
helper.on('result', content => {
  // Handle the content and manage DOM accordingly
});
helper.on('error', error => {
  console.error(`A search error occurred: ${error.message}`);
});
- Here's a sample of the actual JSON payload returned after a successful search.
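Once results arrive, the `result` callback has to turn them into markup. As a rough sketch of that last step, and not the code this blog actually runs, a handler could look like the following. The `renderHits` function, the `#results` element and the hit fields (`title`, `url`, `_highlightResult.content.value`) are assumptions about how the records were indexed, and the `sample` object is a made-up response, not the real payload.

```javascript
// Turn the hits of a search response into an HTML list.
// Field names (title, url, _highlightResult.content.value) are assumptions.
function renderHits(content) {
  if (!content.hits || content.hits.length === 0) {
    return '<p>No results.</p>';
  }
  const items = content.hits.map(hit => {
    // Use the pre-highlighted excerpt when the response provides one.
    const highlighted = hit._highlightResult && hit._highlightResult.content
      ? hit._highlightResult.content.value
      : '';
    return `<li><a href="${hit.url}">${hit.title}</a><p>${highlighted}</p></li>`;
  });
  return `<ul>${items.join('')}</ul>`;
}

// Hypothetical response, for illustration only.
const sample = {
  hits: [
    {
      title: 'Search on static sites',
      url: '/static-search/',
      _highlightResult: { content: { value: 'Full-text <em>search</em> with Algolia' } },
    },
  ],
};
console.log(renderHits(sample));
// In the page: helper.on('result', content => { $('#results').html(renderHits(content)); });
```

Keeping the rendering in a plain function that returns a string makes it easy to give search results the same markup as regular posts, which was the original goal.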
Conclusion
Many alternatives are available to implement full-text search on a static site. Among those, Algolia is a pretty good choice: it offers a free tier, lets you configure how data is indexed, and provides a Jekyll plugin for easier indexing as well as JavaScript libraries to help with the search.