---
title: Accidental updates
date: 2018-04-02T21:11:41+01:00
author: butlerx
twitter: cianbutlerx
description: Why you need to lock versions
tags:
- Docker
- PostgreSQL
---

# Why you need to lock versions

## The Preamble

Redbrick runs a service called [HackMD](https://md.redbrick.dcu.ie). It is a
web-based markdown editor. At the start of the year HackMD version one came out,
and in early February we decided to update it.

Unlike most updates in Redbrick, this wasn't meant to be a big deal, as HackMD
runs inside a Docker container, with its only external dependency being
PostgreSQL 9. We don't actually have a central PostgreSQL database in Redbrick
(a problem for another day), just a MySQL one. So we run PostgreSQL inside
another container.

The update process was meant to be simple: update the `Dockerfile`, run
`docker-compose up --build` and wait...

## What Actually Went Wrong

And we waited, but HackMD didn't come back. One tail of the logs later and the
problem was obvious: PostgreSQL had restarted. Not only that, it was trying
and failing to upgrade itself to PostgreSQL 10.

The problem was that Docker had pulled the latest version of PostgreSQL. This
was fine when HackMD was first installed the year before, when all the Docker
tags pointed to PostgreSQL 9, but it wasn't so great once they had moved on to
PostgreSQL 10.

## The Fix

The fix was pretty simple: add a version tag for the database image to the
`docker-compose.yml`. A quick `vim` and `docker-compose up` later and HackMD was
back with lots of new bells and whistles.

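
A quick way to confirm that a pin like this took effect, sketched here rather
than copied from our setup, is to ask the running server which version it is
from a `psql` shell. Both statements are standard PostgreSQL and nothing here is
specific to HackMD.

```sql
-- Short version string of the running server.
SHOW server_version;

-- Full build string, including platform details.
SELECT version();
```

If the major version reported is not the one named in the image tag, the
container is probably not running the image you think it is.
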
So we went through and changed all the other compose files to have tagged
versions of dependencies.

Problem solved. Wash our hands and move on. Well, unfortunately not.

## The Solution

Over the next couple of months, a couple of problems were mentioned with HackMD,
but no one really looked into them. That was until one affected an admin: we
couldn't publish our roadmap for incoming admins.

So back to the logs and, yep, a database issue. It seemed that PostgreSQL
couldn't find some of the keys. Our initial thought was that we had missed a
database migration way back when we upgraded. So we `exec`ed into the container,
ran the migrations script and... nothing, there weren't any.

Back to the drawing board. We started reviewing configs and double-checking them
against the repo, but everything seemed right. Next, we decided to go to the
heart of the problem: the database itself. One `docker run` and we had a
PostgreSQL shell. We started digging through tables, trying to find the missing
key, when we got a duplicate key error that named an alias.

BINGO

The odd thing was that when you looked up the alias there was only one entry
for it. We couldn't delete the duplicate entry as it didn't exist, and we
couldn't modify the table entries as we got duplicate key errors.

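
In hindsight, one way to expose a mismatch like this is to compare a lookup
that may use the index with one that reads the table directly. The sketch below
is not what we ran at the time. The `enable_*` planner settings are standard
PostgreSQL options (`enable_indexonlyscan` needs 9.2 or later); the `id` and
`alias` column names and the `'some-alias'` value are placeholders.

```sql
-- Normal lookup: the planner is free to answer this from an index.
SELECT id, alias FROM "Notes" WHERE alias = 'some-alias';

-- Discourage index access for this session so rows come from the table itself.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;

-- Same lookup again; a different row count suggests a damaged index.
SELECT id, alias FROM "Notes" WHERE alias = 'some-alias';

-- Put the planner settings back to their defaults.
RESET enable_indexscan;
RESET enable_indexonlyscan;
RESET enable_bitmapscan;
```
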
A bit of googling later and we had a solution: copy the table, delete it and
restore it. Is this the database version of turning it off and on again?

```sql
hackmd=# SELECT DISTINCT * INTO notes FROM "Notes";
SELECT 268
hackmd=# DROP TABLE "Notes";
hackmd=# ALTER TABLE notes RENAME TO "Notes";
hackmd=# REINDEX DATABASE hackmd;
```

It turns out the table's index was wrong. Although PostgreSQL's attempt to
upgrade itself to 10 had failed, it had modified the indexes for some of the
tables, and reverting the container didn't magically fix the database inside.

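
One thing worth checking after a rebuild like the one above: `SELECT ... INTO`
copies rows but not indexes or constraints, so the renamed table may end up
with fewer of them than the original had. A minimal sketch of that check
follows. `pg_indexes` is a standard PostgreSQL view; the `alias` column is an
assumption about the `Notes` schema.

```sql
-- Indexes that currently exist on the rebuilt table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'Notes';

-- Aliases stored more than once; an empty result is what you want.
SELECT alias, count(*) AS copies
FROM "Notes"
WHERE alias IS NOT NULL
GROUP BY alias
HAVING count(*) > 1;
```

If an index or unique constraint turns out to be missing, it would need to be
recreated by hand or by rerunning whatever created the schema in the first
place.
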
So the TL;DR:

* Always lock your container versions
* Containers don't magically fix things
* Validate your database after modifying it