|Accidental updates||2018-04-02T21:11:41+01:00||butlerx||[Docker Postgres]|
Redbrick runs a service called Hackmd, a web-based markdown editor. At the start of the year Hackmd version one came out, and in early February we decided to update it.
Unlike most updates in Redbrick this isn’t really a big thing, as it runs inside a Docker container with its only external dependency being Postgres 9. We don’t actually have a central Postgres database in Redbrick (a problem for another day), just a MySQL one. So we run Postgres inside another container.
The update process was meant to be simple: update, run docker-compose up --build, and wait....
And we waited, but hackmd didn’t come back. One tail of the logs later and the problem was obvious: Postgres had restarted. Not only that, it was trying and failing to upgrade itself to Postgres 10.
The problem was that Docker had pulled the latest version of Postgres. This was fine when hackmd was first installed the year before, when all the Docker tags pointed to Postgres 9, but wasn’t so great once the latest tag had moved on to Postgres 10.
The fix was pretty simple: add a version tag for the database image to the docker-compose.yml. A quick docker-compose up later and hackmd was back with lots of new bells and whistles.
So we went through and changed all the other compose files to have tagged versions of dependencies.
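Going through compose files by hand is easy to get wrong, so here is a rough sketch of how that check could be automated. It is not what we actually ran: the function names and the sample image names are made up for illustration, and it only looks at plain image: lines rather than parsing the YAML properly.

```python
import re

# Matches "image: <ref>" lines in a compose file; quotes around the value are optional.
IMAGE_LINE = re.compile(r"""^\s*image:\s*["']?([^\s"'#]+)["']?""")


def is_pinned(image_ref):
    """Return True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@" in image_ref:  # digest form, e.g. name@sha256:...
        return True
    # Only the last path component can carry a tag; an earlier colon
    # belongs to a registry port (e.g. registry.example:5000/postgres).
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no tag at all, so Docker falls back to "latest"
    tag = last_component.split(":", 1)[1]
    return tag not in ("", "latest")


def unpinned_images(compose_text):
    """List every image reference in the text that is not pinned to a version."""
    found = []
    for line in compose_text.splitlines():
        match = IMAGE_LINE.match(line)
        if match and not is_pinned(match.group(1)):
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Hypothetical compose file contents, not our real configuration.
    sample = """
services:
  hackmd:
    image: example/hackmd:1.0.0
  database:
    image: postgres
"""
    print(unpinned_images(sample))
```

Anything the function returns is an image that will follow whatever the latest tag points to on the next pull, which is exactly how we ended up on Postgres 10.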
Problem solved. Wash our hands and move on. Well, unfortunately not.
Over the next couple of months a couple of problems were mentioned with hackmd, but no one really looked into them. That was until it affected an admin: we couldn’t publish our roadmap for incoming admins.
So back to the logs and, yep, a database issue. It seemed that Postgres couldn’t find some of the keys. The initial thought was that we had missed a database migration way back when we upgraded. So we execed into the container, ran the migrations script and .... nothing, there weren’t any.
Back to the drawing board. We started reviewing configs and double checking against the repo, but everything seemed right. Next we decided to go to the heart of the problem, the database itself. One docker run and we had a postgresql shell. We started digging through tables, trying to find the missing key, when we got a duplicate key error on an alias.
Odd thing was when you looked up the alias there was only one entry for it. We couldn’t delete the duplicate entry as it didn’t exist and couldn’t modify the table entries as we got duplicate key errors.
A bit of googling later and we had a solution: copy the table, delete it and restore. Is this the database version of turning it off and on again?
hackmd=# SELECT DISTINCT * INTO notes from "Notes";
SELECT 268
hackmd=# DROP TABLE "Notes";
DROP TABLE
hackmd=# ALTER TABLE notes rename to "Notes";
ALTER TABLE
hackmd=# REINDEX DATABASE hackmd;
REINDEX
hackmd=# VACUUM(FULL, ANALYZE, VERBOSE);
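For anyone who wants to see the copy, drop and rename trick in isolation, here is a minimal sketch using Python's built-in sqlite3 module instead of Postgres. The table columns and rows are invented, and SQLite spells the copy step as CREATE TABLE ... AS rather than SELECT ... INTO, so treat it as an illustration of the mechanics rather than a replay of our session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the real "Notes" table; these columns and rows are made up.
cur.execute('CREATE TABLE "Notes" (id INTEGER, alias TEXT)')
cur.executemany(
    'INSERT INTO "Notes" VALUES (?, ?)',
    [(1, "roadmap"), (1, "roadmap"), (2, "minutes")],  # one exact duplicate row
)

# Copy only the distinct rows, drop the original, then rename the copy.
# SQLite table names are case-insensitive, hence notes_copy rather than notes.
cur.execute('CREATE TABLE notes_copy AS SELECT DISTINCT * FROM "Notes"')
cur.execute('DROP TABLE "Notes"')
cur.execute('ALTER TABLE notes_copy RENAME TO "Notes"')
conn.commit()

rows = cur.execute('SELECT id, alias FROM "Notes" ORDER BY id').fetchall()
print(rows)  # [(1, 'roadmap'), (2, 'minutes')]
```

One thing to watch for in Postgres: a table created with SELECT ... INTO only gets the data, so indexes, primary keys and other constraints from the original table are not carried over and have to be recreated afterwards.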
It turns out the table’s index was wrong. While Postgres’ attempt to update itself to 10 had failed, it had modified the indexes for some of the tables, and reverting the container didn’t magically fix the database inside.
So the tl;dr.