Maintenance underway on Saturday

Posted: Sat Oct 14, 2017 5:13 pm
by emk
Server performance has been terrible Saturday, and I'm troubleshooting. More soon, and I apologize for the inconvenience.

Re: Maintenance underway on Saturday

Posted: Sat Oct 14, 2017 5:21 pm
by emk
emk wrote: Server performance has been terrible Saturday, and I'm troubleshooting. More soon, and I apologize for the inconvenience.

Ah, there we go! I switched the entire forum to a brand-new cloud server (using our Terraform scripts, so it took about 10 minutes), and now it's moving along at a nice speedy clip again! I'm not sure what was wrong with the old server, but not even rebooting it was helping any, so it's gone. If you see bad performance again in the coming months, please feel free to mention it here. This forum is supposed to be fast, and if it's not, that's a bug.

Re: Maintenance underway on Saturday

Posted: Sat Oct 14, 2017 5:50 pm
by emk
OK, we may have another 15 minutes of downtime while I mess around with fixing this issue as well.

Re: Maintenance underway on Saturday

Posted: Sat Oct 14, 2017 6:06 pm
by emk
Looks good! The X-Forwarded-For stuff has theoretically been set up, and I'm testing it now.

Re: Maintenance underway on Saturday

Posted: Sun Oct 15, 2017 12:23 am
by emk
OK, I'm sad. :cry: The X-Forwarded-For fix didn't work. I'll need to fix it in the web server, not PHP.

Re: Maintenance underway on Saturday

Posted: Sun Oct 15, 2017 1:38 am
by Adrianslont
emk wrote: OK, I'm sad. :cry: The X-Forwarded-For fix didn't work. I'll need to fix it in the web server, not PHP.

I have no idea what that means but I appreciate that you do and that you are giving your time so that we may quibble over our opinions and share our language learning experiences. You are a legend.

Re: Maintenance underway on Saturday

Posted: Mon Oct 16, 2017 10:28 am
by emk
Somebody is hitting the site with a ton of traffic, and the site is not responding gracefully. Check this out:

Code:

[ec2-user@ip-172-31-19-62 ~]$ uptime
 10:12:52 up 1 day, 16:56,  1 user,  load average: 152.63, 153.32, 153.24


"load average: 152" means that we've got roughly 150 processes waiting for 4 CPUs, which is why everything takes forever.
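
To put a number on that, here is a rough sketch of the load-per-CPU arithmetic in Python. The function name `load_per_cpu` is made up for illustration; the figures 152.63 and 4 are the ones quoted above.

```python
import os


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Return how many processes, on average, are queued per CPU."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Figures from the uptime output above: 1-minute load of 152.63 on 4 CPUs.
print(round(load_per_cpu(152.63, 4), 1))

# The same check against the machine this script runs on (Unix only).
try:
    one_minute, _, _ = os.getloadavg()
    print(round(load_per_cpu(one_minute, os.cpu_count() or 1), 2))
except (AttributeError, OSError):
    print("load average not available on this platform")
```

A value near 1.0 means the CPUs are about fully busy; here it comes out around 38, so each CPU has dozens of processes queued behind it.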

What's happening is that:

  1. We're getting a huge number of inbound requests, so...
  2. The Apache webserver is spinning up more copies of itself to handle the traffic, but...
  3. Eventually it runs out of RAM, and so...
  4. Everything becomes hugely slow, therefore...
  5. Go to 2, and repeat until the server dies.

The solutions are some combination of:

  1. Teach Apache not to start so many copies, and just "shed" the traffic with errors instead, which would at least break the loop above. This is more annoying than it should be, because I'm using the official PHP distribution for Docker, which is apparently garbage in this regard (and several others).
  2. Figure out who's hitting our site with lots of traffic and block them.
  3. Pay for a bigger server.
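
For the first option, one common way to pick a cap is to divide the RAM Apache may use by the memory a single worker needs. The sketch below does that arithmetic. Every number in it is hypothetical (the real server's RAM and per-process memory are not given in this thread), and `max_workers_that_fit` is a name invented for the example. `MaxRequestWorkers` is the Apache 2.4 directive that sets the cap (it was called `MaxClients` in 2.2).

```python
def max_workers_that_fit(total_ram_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Largest worker count that fits in RAM after reserving some for everything else."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Always allow at least one worker, even if the reservation is too generous.
    return max(1, usable_mb // per_worker_mb)


# Hypothetical figures, not measurements from the forum server:
# 4 GB of RAM, 1 GB reserved for the OS and other containers, 64 MB per Apache+PHP process.
limit = max_workers_that_fit(total_ram_mb=4096, reserved_mb=1024, per_worker_mb=64)
print(f"MaxRequestWorkers {limit}")
```

With a cap like this in place, extra requests wait or are refused instead of spawning more processes, which is what breaks the loop described above.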

Re: Maintenance underway on Saturday

Posted: Mon Oct 16, 2017 10:54 am
by emk
emk wrote: Figure out who's hitting our site with lots of traffic and block them.

The "Yandex" search engine indexing bot is just hammering the site, with zero sense of politeness. I'm going to try blocking it using robots.txt and see if it takes a hint.
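
A minimal sketch of what such a robots.txt rule might look like, checked with Python's standard `urllib.robotparser`. The exact rules used on this forum are not shown in the thread, so the two-line file below is an assumption, and the URL is a placeholder. Note that robots.txt is only a request; a bot that ignores it has to be blocked some other way.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: ask every Yandex crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://forum.example/viewtopic.php"  # placeholder URL
print("Yandex allowed:", is_allowed(ROBOTS_TXT, "Yandex", page))
print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```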

I still need to tune the PHP Apache image to only start a small number of servers, though, as a longer-term measure.

Re: Maintenance underway on Saturday

Posted: Mon Oct 16, 2017 11:56 am
by emk
OK, Yandex is banned, which helped some. But I'm still seeing terrible performance. :-(

Working notes:
  • Performance is slow even when we're not running too many Apache instances. It's extremely fast for a few minutes after restarting Apache (and maybe for a longer period after replacing the machine last night).
  • The slow requests are (1) page rendering and (2) image fetches, but static pages are relatively fast.
  • Normally, this would point the blame at either the database (which looks really fast, however) or the EBS disk (which also looks good).
  • We're not out of magic "burst mode" credits, which is the Amazon technology we use to keep this site running on a shoestring. Specifically, we have a full set of DB and disk credits, so no problems there. CPU credits are dangerously low but not exhausted (this morning at least), so we should be running at full CPU speed.

So I'm a bit confused, to say the least. I've ruled out the obvious culprits, and I'm addressing various issues I know about. But this may take a few days to figure out.
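
As a back-of-the-envelope check on the CPU credit point, the sketch below estimates how long a credit balance lasts under sustained load. It relies on the AWS definition that one CPU credit is one vCPU running at 100% for one minute. The balance, earn rate, and utilization are hypothetical values chosen for illustration, not readings from this server, and `minutes_until_exhausted` is an invented name.

```python
from typing import Optional


def minutes_until_exhausted(
    balance: float, earned_per_hour: float, vcpus: int, utilization: float
) -> Optional[float]:
    """Estimate minutes until CPU credits run out, or None if the balance is not shrinking.

    utilization is the average per-vCPU load as a fraction between 0 and 1.
    """
    # One credit = one vCPU at 100% for one minute, so 60 credits per vCPU-hour at full load.
    spent_per_hour = vcpus * utilization * 60
    net_drain_per_hour = spent_per_hour - earned_per_hour
    if net_drain_per_hour <= 0:
        return None
    return balance / net_drain_per_hour * 60


# Hypothetical inputs: 40 credits left, 54 earned per hour, 4 vCPUs at 50% average load.
print(minutes_until_exhausted(balance=40, earned_per_hour=54, vcpus=4, utilization=0.5))
```

Once the balance hits zero the instance is throttled to its baseline speed, so "low but not exhausted" can turn into a real slowdown fairly quickly under heavy crawler traffic.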

Re: Maintenance underway on Saturday

Posted: Mon Oct 16, 2017 4:08 pm
by zenmonkey
Just for info - I got a lot of "504 Gateway Timeout" errors this morning.