31.08.2019 site-news static-site-generator

My solution to search on Jekyll

The problem

When you decide on a static site, you have to make compromises. You cannot simply log into an admin interface and write an article. You can't just change the position of a widget to your heart's desire. But you also gain a lot. For all the things I don't like about Jekyll, I highly appreciate what I gain, and that is security, among other things. I can also build templates a lot more easily than with e.g. WordPress. IMHO.

What you give up is simple and hardly a revelation: a searchable database.

I have searched the web a lot and always ended up with the same solution from others using Jekyll: a local search in JavaScript, with all of the site's text bundled into a JavaScript file. To me, that was just not acceptable. It means shipping the whole site to users at once: all articles, all text, if you want a full text search. I thought about indexing only the excerpts, but that was not feasible either, since it limits the results a lot.

Granted, my site isn't that big right now, but I have to think ahead, and also about mobile. Mobile users won't visit if a simple visit costs too much data. I don't want to bore you with the details, but one page load currently costs 1.78 MB (before caching) and takes 17 requests to the server, even with everything gzipped (I do have caching rules set up on Apache, just to put your mind at ease). Putting the whole site minus images into one file would add around 500 KB right now, and that number would grow with every article on this site. It would also take longer and longer to search.

So, to me, this sounds like a bad deal.

The solution

I am fairly sure that a database is the best approach to the issue, even though it means giving up some security by having dynamic scripts again. However, the setup itself is fairly simple to automate with Jekyll, and you don't need any elaborate scripts.

Here is what I did:

DROP TABLE IF EXISTS database.data;

CREATE TABLE database.data (
	id_blog INT NOT NULL AUTO_INCREMENT,
	the_date DATE NOT NULL,
	title varchar(150) NOT NULL,
	content TEXT NOT NULL,
	link varchar(100) NOT NULL,
	excerpt TEXT NOT NULL,
	image varchar(50) NOT NULL,
	CONSTRAINT blog_PK PRIMARY KEY (id_blog)
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_general_ci;
CREATE FULLTEXT INDEX blog_body_IDX ON database.data(content);

{% for post in site.posts %}
INSERT INTO database.data (the_date, title, content, link, excerpt, image)
VALUES(
    '{{ post.date | date: "%Y-%m-%d" }}',
    '{{ post.title | strip_html | replace: "'", "''" }}',
    '{{ post.content | strip_html | replace: "'", "''" }}',
    '{{ site.url }}{{ post.url }}',
    '{{ post.excerpt | strip_html | replace: "'", "''" }}',
    '{{ post.post_image | strip_html }}'
);
{% endfor %}

You may notice the DROP TABLE statement at the top. That is part of my automation process. I could make Jekyll update this file with only the latest entries, but that would require keeping track of what is already in the database, and so on. So, to me, the easiest solution is to drop the table and recreate it. It takes a few seconds; MySQL is very efficient. When the site gets bigger, I can still add more logic here.

When I push my site to Git, it gets rebuilt, everything is pushed onto my server, the SQL is fed into the database and the file is deleted afterwards. So, stop searching for it :P.
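If you are curious what that import step could look like, here is a minimal sketch in PHP. The path, database name and credentials are placeholders, and this is not my actual deployment script, just the general idea:

<?php
// Sketch of the import step only: run the generated dump against MySQL
// and remove it afterwards. Path, database name and credentials are placeholders.

$dumpFile = '/var/www/site/search.sql'; // the file Jekyll generated

if (!is_readable($dumpFile)) {
    exit("No search dump found, nothing to import.\n");
}

// Feed the dump into MySQL via the command line client.
// In practice the credentials would live in a .my.cnf rather than in the script.
$cmd = sprintf(
    'mysql --user=%s --password=%s %s < %s',
    escapeshellarg('searchuser'),
    escapeshellarg('secret'),
    escapeshellarg('database'),
    escapeshellarg($dumpFile)
);
shell_exec($cmd);

// Do not leave the dump lying around on the web server.
unlink($dumpFile);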

PHP and SQL

I am not going to go into too much detail, but it is a simple database connection via PDO and some straightforward search logic. At the end, it puts out JSON, resembling a REST API endpoint. I am sure it does not really qualify as a REST service, but it comes close, I guess. The gist of the whole thing is this:

    public function searchBody($input)
    {
        // Full text search on the article bodies. The aliases match the
        // field names the JavaScript on the search page expects. There is
        // no ORDER BY: in natural language mode MySQL already returns the
        // rows sorted by relevance.
        $stmt = $this->prepare("
                SELECT title, link AS url, excerpt,
                       the_date AS blogdate, image AS postimage
                FROM data
                WHERE MATCH (content)
                AGAINST (:searchpattern IN NATURAL LANGUAGE MODE)
                LIMIT 20;
        ");
        $stmt->bindParam(':searchpattern', $input);
        $stmt->execute();
        $result = $stmt->fetchAll(PDO::FETCH_ASSOC);
        return $result;
    }
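For a bit of context: this method lives in a small class that extends PDO, and a tiny search.php ties it together and prints the JSON. A minimal sketch of what that could look like is below; the class name, DSN and credentials are placeholders, not my actual code:

<?php
// Minimal sketch of a search.php endpoint; class name, DSN and
// credentials are placeholders.

class SearchDatabase extends PDO
{
    public function __construct()
    {
        parent::__construct(
            'mysql:host=localhost;dbname=database;charset=utf8mb4',
            'searchuser',
            'secret',
            array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
        );
    }

    // The searchBody() method shown above lives here.
    public function searchBody($input)
    {
        $stmt = $this->prepare("
                SELECT title, link AS url, excerpt,
                       the_date AS blogdate, image AS postimage
                FROM data
                WHERE MATCH (content)
                AGAINST (:searchpattern IN NATURAL LANGUAGE MODE)
                LIMIT 20;
        ");
        $stmt->bindParam(':searchpattern', $input);
        $stmt->execute();
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// The JavaScript further down sends the keyword as ?search-pattern=...
$pattern = isset($_GET['search-pattern']) ? trim($_GET['search-pattern']) : '';

header('Content-Type: application/json; charset=utf-8');

if ($pattern === '') {
    echo json_encode(array());
    exit;
}

$db = new SearchDatabase();
echo json_encode($db->searchBody($pattern));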

From what you can read on the web, this kind of query is tricky in terms of performance, but for now, a full text search is really nice to have. I am fairly certain that by limiting the number of requests I can get a good balance between performance and usability. I have also added some code to look in the titles first and only then in the full text, hoping to improve the results.
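That title-first lookup is essentially a cheaper query that runs before the full text one. Roughly, it looks like the sketch below; note that this is a simplified version rather than my exact code, and it assumes an additional FULLTEXT index on the title column:

    // Sketch of the title-first search. Assumes an extra index:
    //   CREATE FULLTEXT INDEX blog_title_IDX ON database.data(title);
    public function search($input)
    {
        // Titles are short and usually the strongest signal, so try them first.
        $stmt = $this->prepare("
                SELECT title, link AS url, excerpt,
                       the_date AS blogdate, image AS postimage
                FROM data
                WHERE MATCH (title)
                AGAINST (:searchpattern IN NATURAL LANGUAGE MODE)
                LIMIT 20;
        ");
        $stmt->bindParam(':searchpattern', $input);
        $stmt->execute();
        $result = $stmt->fetchAll(PDO::FETCH_ASSOC);

        // Fall back to the full text search when the titles turn up nothing.
        if (count($result) === 0) {
            $result = $this->searchBody($input);
        }
        return $result;
    }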

All of this runs in a separate area to improve security and is locked down as tightly as I could manage. A few .htaccess rules prevent access to anything there except what you are supposed to reach.

JavaScript

Putting it all together, I use JavaScript in the search forms to perform the actual request and to manipulate the DOM and insert the results. This way I don't need PHP to render anything:

// Store the search keyword and jump to the search page.
function save_keyword($theID) {
    var input = document.getElementById($theID);
    console.log("input.value=" + input.value);
    sessionStorage.setItem("search-text", input.value);
    location.assign(window.location.protocol + "//" + window.location.host + "/search/index.html");
    return false;
}

// On the search page: read the keyword back, query the search endpoint
// and render the results into the page.
function load_keyword() {
    var $keyword = sessionStorage.getItem("search-text");
    var $url = "https://rest.tuxstash.de/search.php?search-pattern=" + encodeURIComponent($keyword);
    console.log($url);
    fetch($url, {
        method: 'GET',
        headers: {
            Accept: 'application/json'
        }
    })
    .then(function(response) {
        return response.json();
    })
    .then(function(myJson) {
        console.log(myJson);
        document.getElementById('progress').style.display = 'none';
        if (myJson.length === 0) {
            document.getElementById('search-empty').style.display = 'block';
            return;
        }
        // Build the markup in one string and assign it once, so the row
        // div actually wraps all the result cards.
        var html = "<div class=\"row\">";
        myJson.forEach(function(obj) {
            html += "<div class=\"col s12 m6 l6 dark-primary-color\">" +
                "<a href=\"" + obj.url + "\">" +
                "<div class=\"hoverable card\"><div class=\"card-image\">" +
                "<img src=\"" + window.location.protocol + "//" + window.location.host + obj.postimage + "\">" +
                "</div><div class=\"card-content default-primary-color primary-text-color\">" +
                "<p class=\"length-limit\">" + obj.excerpt + "</p>" +
                "</div><div style=\"padding-left: 0.5vw;\" class=\"default-primary-color primary-text-color\">" +
                obj.blogdate +
                "</div><div style=\"padding-bottom: 2px;\" class=\"default-primary-color\"></div></div>" +
                "</a></div>";
        });
        html += "</div>";
        document.getElementById('search-results').innerHTML = html;
    })
    .catch(function(error) {
        console.log("Error fetching JSON :*(. Please report this to contact@tuxstash.de");
        document.getElementById('progress').style.display = 'none';
        document.getElementById('search-empty').style.display = 'block';
    });
}

The function save_keyword is simply added to the HTML form:

    <form id="search-form" onsubmit="return save_keyword('search-input-footer');">

It saves the search keyword to session storage, reads it back once the search page has loaded and then fires off a request to my RESTish service to get the results. I think it is quite fast and elegant. I don't have to do anything other than commit my blog posts and other changes to Git; everything else is taken care of by my scripts. Just like on GitHub Pages ;-).

If there is interest in my automation setup, please let me know on Twitter or by mail via contact@tuxstash.de. I am also interested in hearing about any flaws you can think of in this setup, or any other questions and input you may have. Have fun.
