---
title: 'NFS Out Of Sync: Apache Outage'
date: '2018-12-08T21:11:41+01:00'
author: greenday
twitter: gruunday
description: NFS Out Of Sync
tags:
- Apache
- NFS
---
# Redbrick Web Server Outage 07/12/2018 - 08/12/2018
## Alert Received
* A raintank alert was received at 23:38 reporting that the website was down
* A customer reported that the site was down at 00:38
## Alert Validation
* Visiting the site confirmed that Apache was in fact returning an error
* The error was a 403: Apache could not read the files
* An interesting note is that the web server could also not read the custom Redbrick error page, another hint that the problem was bigger than a single folder
## Fix
* Error logs were investigated
* The Apache error logs showed the following
```
[Sat Dec 08 11:53:33 2018] [error] [client 66.249.81.152] (1)Operation not permitted: file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html
[Sat Dec 08 11:53:33 2018] [crit] [client 66.249.81.150] (1)Operation not permitted: /home/member/m/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Sat Dec 08 11:53:33 2018] [error] [client 66.249.81.150] (1)Operation not permitted: file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html
[Sat Dec 08 11:53:33 2018] [crit] [client 66.249.81.154] (1)Operation not permitted: /home/member/m/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Sat Dec 08 11:53:33 2018] [error] [client 66.249.81.154] (1)Operation not permitted: file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html
[Sat Dec 08 11:53:33 2018] [error] [client 66.249.75.33] PHP Warning: Unknown: failed to open stream: Operation not permitted in Unknown on line 0
[Sat Dec 08 11:53:33 2018] [crit] [client 46.229.168.140] (1)Operation not permitted: /webtree/w/wiki/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Sat Dec 08 11:53:33 2018] [error] [client 46.229.168.140] (1)Operation not permitted: file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html
[Sat Dec 08 11:53:38 2018] [crit] [client 157.55.39.210] (1)Operation not permitted: /webtree/p/pubsoc/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Sat Dec 08 11:53:38 2018] [error] [client 157.55.39.210] (1)Operation not permitted: file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html
```
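
To gauge how far the failure reached, the denied paths in a log like the one above can be grouped by top-level directory. This is a minimal Python sketch, not something the admin team ran. The regular expressions assume Apache 2.2-style error lines as shown above, and `denied_by_top_dir` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape: [time] [level] [client ip] message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] "
    r"\[client (?P<client>[^\]]+)\] (?P<message>.*)$"
)
# First absolute path mentioned in the message, if any.
PATH_RE = re.compile(r"(/[\w./-]+)")


def denied_by_top_dir(log_lines):
    """Count 'Operation not permitted' errors per top-level directory."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match or "Operation not permitted" not in match["message"]:
            continue
        path = PATH_RE.search(match["message"])
        if path:
            # '/webtree/w/wiki/.htaccess' -> '/webtree'
            counts["/" + path.group(1).split("/")[1]] += 1
    return counts


# Three lines copied from the log above.
sample = [
    "[Sat Dec 08 11:53:33 2018] [error] [client 66.249.81.152] (1)Operation not permitted: "
    "file permissions deny server access: /webtree/redbrick/rb_custom_error/403.html",
    "[Sat Dec 08 11:53:33 2018] [crit] [client 66.249.81.150] (1)Operation not permitted: "
    "/home/member/m/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable",
    "[Sat Dec 08 11:53:33 2018] [error] [client 66.249.75.33] PHP Warning: Unknown: "
    "failed to open stream: Operation not permitted in Unknown on line 0",
]

counts = denied_by_top_dir(sample)
for top_dir, n in sorted(counts.items()):
    print(top_dir, n)
```

On these three sample lines it counts one denial under `/webtree` and one under `/home`, consistent with the observation that the problem was not limited to a single folder. The PHP warning is skipped because it names no path.
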
* This led us to check the kernel log with `dmesg`
```
[36821900.601330] NFS: Server 192.168.0.24 reports our clientid is in use
[36821900.605982] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821905.612160] NFS: Server 192.168.0.24 reports our clientid is in use
[36821905.616701] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821910.622881] NFS: Server 192.168.0.24 reports our clientid is in use
[36821910.626795] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821915.633815] NFS: Server 192.168.0.24 reports our clientid is in use
[36821915.637714] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821920.644780] NFS: Server 192.168.0.24 reports our clientid is in use
[36821920.648684] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821925.655444] NFS: Server 192.168.0.24 reports our clientid is in use
[36821925.660511] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821930.666309] NFS: Server 192.168.0.24 reports our clientid is in use
[36821930.670822] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821935.677022] NFS: Server 192.168.0.24 reports our clientid is in use
[36821935.680605] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821940.687986] NFS: Server 192.168.0.24 reports our clientid is in use
[36821940.691938] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821945.698937] NFS: Server 192.168.0.24 reports our clientid is in use
[36821945.702396] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821950.709790] NFS: Server 192.168.0.24 reports our clientid is in use
[36821950.713700] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821955.720501] NFS: Server 192.168.0.24 reports our clientid is in use
[36821955.724923] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821960.731372] NFS: Server 192.168.0.24 reports our clientid is in use
[36821960.735952] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821965.742345] NFS: Server 192.168.0.24 reports our clientid is in use
[36821965.746246] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821970.753027] NFS: Server 192.168.0.24 reports our clientid is in use
[36821970.756539] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821975.763974] NFS: Server 192.168.0.24 reports our clientid is in use
[36821975.767870] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821980.774846] NFS: Server 192.168.0.24 reports our clientid is in use
[36821980.779018] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821985.785629] NFS: Server 192.168.0.24 reports our clientid is in use
[36821985.790880] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821990.796508] NFS: Server 192.168.0.24 reports our clientid is in use
[36821990.800403] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36821995.807262] NFS: Server 192.168.0.24 reports our clientid is in use
[36821995.811159] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822000.818190] NFS: Server 192.168.0.24 reports our clientid is in use
[36822000.822107] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822005.828955] NFS: Server 192.168.0.24 reports our clientid is in use
[36822005.833709] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822010.839894] NFS: Server 192.168.0.24 reports our clientid is in use
[36822010.845658] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822015.850727] NFS: Server 192.168.0.24 reports our clientid is in use
[36822015.854476] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822020.861622] NFS: Server 192.168.0.24 reports our clientid is in use
[36822020.865539] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
[36822025.872404] NFS: Server 192.168.0.24 reports our clientid is in use
[36822025.876325] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1
```
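
The repetition in this output can be summarised rather than read line by line. The Python sketch below is an illustration only: it assumes the exact message wording shown above, and `clientid_conflicts` and `mean_interval` are hypothetical helper names. It groups the "clientid is in use" messages by server and reports the average gap between them.

```python
import re

# Assumed shape of the kernel message, taken from the dmesg output above.
CONFLICT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] NFS: Server (?P<server>\S+) "
    r"reports our clientid is in use$"
)


def clientid_conflicts(lines):
    """Return {server: [timestamps]} for 'clientid is in use' messages."""
    seen = {}
    for line in lines:
        match = CONFLICT_RE.match(line.strip())
        if match:
            seen.setdefault(match["server"], []).append(float(match["ts"]))
    return seen


def mean_interval(timestamps):
    """Average gap in seconds between consecutive timestamps, or None."""
    if len(timestamps) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)


# A few lines copied from the dmesg output above.
sample = [
    "[36821900.601330] NFS: Server 192.168.0.24 reports our clientid is in use",
    "[36821900.605982] NFS: state manager: lease expired failed on NFSv4 server 192.168.0.24 with error 1",
    "[36821905.612160] NFS: Server 192.168.0.24 reports our clientid is in use",
    "[36821910.622881] NFS: Server 192.168.0.24 reports our clientid is in use",
]

conflicts = clientid_conflicts(sample)
for server, stamps in conflicts.items():
    print(server, len(stamps), "messages, mean gap", mean_interval(stamps))
```

For the sample lines the mean gap comes out at roughly five seconds, which matches the steady cadence visible in the full log.
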
* From this, an admin identified that the error "clientid is in use" can mean that the NFS (Network File System) share and the server (the web server) were out of sync
* This would explain the permission-related error messages from Apache
* The next step was to try to unmount the NFS share and remount it
* Apache was using the NFS share, which prevented it from being unmounted
* Apache was stopped
* Something was still preventing the NFS share from unmounting
* It was decided that rebooting the machine was a safer option than forcing an unmount
* The machine was rebooted and the NFS share mounted successfully
* The website and files were restored to their working state
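
When an unmount is refused because the filesystem is busy, a common next step is to find out which processes still hold files on it, which is what tools such as `lsof` and `fuser` report. The write-up does not say which tools were used, so the following Python sketch only illustrates the idea on Linux by reading `/proc`. The function name `processes_using` and the mount point `/webtree` are assumptions for the example, and kernel-level holders (for instance a nested mount) would not show up in it.

```python
import os


def processes_using(mount_point, proc_root="/proc"):
    """Map PID -> paths under mount_point that the process has open.

    Checks each process's working directory and open file descriptors.
    Only processes readable by the current user are inspected.
    """
    if not os.path.isdir(proc_root):
        return {}  # not a Linux-style /proc; nothing to inspect
    base = mount_point.rstrip("/")
    prefix = base + "/"
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        links = [os.path.join(proc_root, entry, "cwd")]
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or it belongs to another user
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == base or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    # "/webtree" is an assumed mount point, chosen because it appears in the
    # Apache log paths; substitute the real one.
    for pid, paths in sorted(processes_using("/webtree").items()):
        print(pid, paths[:3])
```

Run as root, this lists every process it can see; run as a normal user, it only sees that user's processes, so an empty result does not prove the mount is idle.
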

On behalf of the admin team, we apologise for the outage.

Regards,
greenday && The Admin Team