---
title: Accidental updates
date: 2018-04-02T21:11:41+01:00
author: butlerx
tags:
  - Docker
  - Postgres
---

# Why you need to lock versions

## The Preamble

Redbrick runs a service called [HackMD](https://md.redbrick.dcu.ie). It is a
web-based markdown editor. At the start of the year HackMD version one came out
and in early February we decided to update it.

Unlike most updates in Redbrick this isn't really a big thing, as it runs inside
a Docker container, with its only external dependency being Postgres 9. We don't
actually have a central Postgres database in Redbrick (a problem for another
day), just a MySQL one. So we run Postgres inside another container.

The update process was meant to be simple: update the `Dockerfile`, run
`docker-compose up --build`, and wait...

## What Actually Went Wrong

And we waited, but HackMD didn't come back. One tail of the logs later and the
problem was obvious: Postgres had restarted. Not only that, it was trying and
failing to upgrade itself to Postgres 10.

The problem was that Docker had pulled the latest version of the Postgres image.
This was fine when HackMD was first installed the year before, when all the
Docker tags pointed to Postgres 9, but wasn't so great once the latest image had
moved on to Postgres 10.

## The Fix

The fix was pretty simple: add a version tag for the database image to the
`docker-compose.yml`, pinning it to Postgres 9 instead of whatever the latest
image happens to be. A quick `vim` and `docker-compose up` later and HackMD was
back with lots of new bells and whistles.

So we went through and changed all the other compose files to have tagged
versions of dependencies.

Problem solved. Wash our hands and move on. Well, unfortunately not.

## The Solution

Over the next couple of months a couple of problems were mentioned with HackMD,
but no one really looked into them. That was until it affected an admin: we
couldn't publish our roadmap for incoming admins.

So back to the logs and, yep, database issue. It seemed that Postgres couldn't
find some of the keys. Our initial thought was that we had missed a database
migration way back when we upgraded. So we exec'd into the container, ran the
migrations script and... nothing, there weren't any to run.

Back to the drawing board. We started reviewing configs and double checking them
against the repo, but everything seemed right. Next we decided to go to the
heart of the problem, the database itself. One `docker run` and we had a
Postgres shell. We started digging through tables, trying to find the missing
key, when we got a duplicate key error and an alias.

BINGO

The odd thing was that when you looked up the alias there was only one entry for
it. We couldn't delete the duplicate entry as it didn't exist, and we couldn't
modify the table entries as we got duplicate key errors.

A bit of googling later and we had a solution: copy the table, delete it and
restore it. Is this the database version of turning it off and on again?

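```sql
-- Copy every distinct row into a fresh table (rows only: indexes and
-- constraints on "Notes" are not carried over).
hackmd=# SELECT DISTINCT * INTO notes FROM "Notes";
SELECT 268
-- Drop the damaged table and put the copy in its place.
hackmd=# DROP TABLE "Notes";
hackmd=# ALTER TABLE notes RENAME TO "Notes";
-- Rebuild every index in the database.
hackmd=# REINDEX DATABASE hackmd;
```
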
It turns out the table's index was wrong. While Postgres' attempt to upgrade
itself to 10 had failed, it had modified the indexes for some of the tables, and
reverting the container didn't magically fix the database inside.

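
One way to spot this kind of damage is to compare what the index says with what
is actually stored in the table. The sketch below is not what we ran at the
time. It assumes the `"Notes"` table has an `alias` column (the column name is a
guess based on the alias mentioned above) and discourages index-based plans for
one transaction, so Postgres reads the table itself rather than trusting a
possibly broken index.

```sql
-- Assumption: "Notes" has an "alias" column; adjust to the real schema.
BEGIN;

-- Discourage index-based plans for this transaction only, so rows are
-- read from the table rather than through a possibly damaged index.
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;

-- Any alias listed here is stored more than once in the table.
SELECT "alias", count(*) AS copies
FROM "Notes"
WHERE "alias" IS NOT NULL
GROUP BY "alias"
HAVING count(*) > 1;

COMMIT;
```

If this returns rows while a normal lookup by alias only ever shows one entry,
the index and the table disagree with each other.
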
So the tl;dr:

* Always lock your container versions
* Containers don't magically fix things
* And validate your database after modifying it
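
For that last point, newer Postgres releases ship a checker for B-tree indexes.
This is a minimal sketch, assuming Postgres 10 or later with the `amcheck`
contrib extension available and a superuser session. The Postgres 9 container
in this story would not have had it out of the box.

```sql
-- Assumption: Postgres 10+ with the amcheck contrib module installed.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Run a structural check on every valid B-tree index in the current
-- database; a damaged index raises an error describing the problem.
SELECT c.relname AS index_name, bt_index_check(c.oid)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'btree'
  AND c.relpersistence <> 't'   -- skip temporary tables
  AND i.indisready
  AND i.indisvalid;
```

If the query finishes without an error, none of the checked indexes failed the
structural check. It does not prove the data itself is what you expect.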