We're hitting our server limits this week

Discuss technical problems and features here
User avatar
emk
Brown Belt
Posts: 1222
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723
x 3888
Contact:

We're hitting our server limits this week

Postby emk » Mon Oct 16, 2017 1:02 pm

While working on the recent maintenance issues, I realized that we've maxed out our server for the moment.

In order to keep the costs for this site low ($30/month for everything), we run on an Amazon t2.micro "burst" server. This is normally quite fast for a cheap server, but only because it has a certain number of "CPU credits". Once it runs out of CPU credits, it slows down tremendously. This model makes sense for a small site like ours: we want pages to load quickly, but we're not serving hundreds of requests per second day in and day out.

Which brings me to Yandex, which is apparently a Russian search engine or something. It was crawling our site at high speed, and it used up all of our CPU credits. Here's the graph of our balance over the last day and a half:

[Attachment: llo-cloudwatch-cpu-last-3-days.png — CPU credit balance over the last three days]


Oops. Well, I've banned Yandex, and I'm going to tell some other search engines to put the brakes on, and we'll see if that helps.

In the longer run, we could upgrade from a t2.micro to a t2.small:

[Attachment: llo-instance-cost.png — EC2 instance pricing comparison]


This would raise our current costs from US$30/month to about US$45/month:

[Attachment: llo-monthly-costs.png — monthly cost breakdown]


Or we could be really extravagant and consider an m3.medium, which doesn't use "burst mode" CPU credits at all, and so can't fall off a performance cliff. But it's more expensive, and probably massive overkill.

So I'm going to try to tweak the robots.txt file a bit to slow down the crawlers, and see if the problem goes away. Cross your fingers.
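For the curious, the kind of robots.txt tweak I have in mind looks roughly like this (a sketch; the exact bot names and delay value are illustrative, and Crawl-delay is a non-standard extension that Bing and Yandex honor but Google ignores):

```
# Ban Yandex outright.
User-agent: Yandex
Disallow: /

# Ask everyone else to fetch at most one page every 10 seconds.
User-agent: *
Crawl-delay: 10
```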
8 x


Re: We're hitting our server limits this week

Postby emk » Mon Oct 16, 2017 1:42 pm

As a temporary workaround, I've replaced our server (again), which gives us a fresh 30-credit CPU balance, and I've set an alarm to notify me if the balance drops below 10 credits, so I can hopefully catch this problem before it becomes critical. I've also asked most crawlers to slow down, because it looks like excessive crawling is the root of our problems. Users read a few pages and take their time, which is a good match for the CPU credit model, but crawlers just keep grinding away hour after hour.
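For anyone curious, the alarm is just a standard CloudWatch alarm on the CPUCreditBalance metric. It can be created with something along these lines (the instance ID and SNS topic ARN here are placeholders, not our real ones):

```
aws cloudwatch put-metric-alarm \
  --alarm-name forum-cpu-credits-low \
  --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average --period 300 --evaluation-periods 1 \
  --comparison-operator LessThanOrEqualToThreshold --threshold 10 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:NotifyMe
```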

And of course, I've put in a hard limit of 20 Apache workers, so when things start melting down we'll shed traffic instead of spinning up 150+ processes. I've also told Apache to recycle each PHP process every 1,000 requests, so if PHP is leaking memory, it won't have time to do much damage.
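Concretely (assuming the prefork MPM on Apache 2.4), those two settings are along these lines:

```
# Cap simultaneous workers so a traffic spike sheds requests
# instead of forking the server into the ground.
MaxRequestWorkers      20

# Recycle each child after 1,000 requests to contain any memory leaks.
MaxConnectionsPerChild 1000
```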

But if the site keeps getting slow again, we'll need a bigger server.
13 x


Re: We're hitting our server limits this week

Postby emk » Mon Oct 16, 2017 2:34 pm

OK, it looks like our CPU credits have stabilized nicely for the time being, since I banned Yandex crawls and slowed down Bing, etc.:

[Attachment: llo-cpu-credit-balance.png — CPU credit balance after the changes]


So we can keep running on the cheap server for now, and with any luck performance should stay reasonably high. It shouldn't take a big server to run this forum; we just need to keep an eye on bots and on tuning, I think. Not that I wouldn't love that t2.small instance: it has more RAM, which would let us host more projects like the Super Challenge bot.
14 x


Re: We're hitting our server limits this week

Postby emk » Tue Oct 17, 2017 12:06 pm

Just an update on the performance tuning. With Yandex disabled, and several other search engines forbidden to index more than 1 page every 10 seconds, our CPU credit balance seems to be accumulating nicely:

[Attachment: llo-cpu-credit-with-new-robots-txt.png — CPU credit balance with the new robots.txt]

As long as this number stays well above 10 credits or so, the forum should normally be snappy. What we can't sustain is multi-hour indexing runs where a search engine requests a page or more every second, because those eventually drain the balance.
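To put rough numbers on "eventually drain the balance": a t2.micro earns about 6 CPU credits per hour, and one credit is one minute of a full vCPU. Here's a back-of-the-envelope sketch (the 30% crawler load is a made-up illustration, not a measurement):

```python
EARN_PER_HOUR = 6.0  # t2.micro credit earn rate (credits/hour)

def hours_until_empty(balance, cpu_fraction):
    """How long a credit balance lasts under a sustained CPU load.

    One credit = one vCPU-minute, so a load of `cpu_fraction` burns
    cpu_fraction * 60 credits per hour.
    """
    net_burn = cpu_fraction * 60.0 - EARN_PER_HOUR
    if net_burn <= 0:
        return float("inf")  # earning faster than burning; balance grows
    return balance / net_burn

# A crawler pinning the CPU at 30% empties a 30-credit balance in 2.5 hours:
print(hours_until_empty(30, 0.30))
```

That's why a single aggressive crawler can take the whole forum from snappy to glacial in an afternoon, while ordinary browsing never comes close.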
2 x

User avatar
Serpent
Black Belt - 2nd Dan
Posts: 2517
Joined: Sat Jul 18, 2015 10:54 am
Location: Moskova
Languages: heritage
Russian (native); Belarusian, Polish

fluent or close: Finnish+ (certified C1), English; Portuguese, Spanish, German+, Italian+
learning: Croatian+, Ukrainian, Czech; Romanian+, Galician; Danish, Swedish
exploring: Latin, Karelian, Catalan, Dutch, Chaucer's English
+ means exploring the dialects/variants
x 3298
Contact:

Re: We're hitting our server limits this week

Postby Serpent » Tue Oct 17, 2017 5:35 pm

Um just saying that Yandex is as big a deal in Russia as Google in the US. Isn't it possible to limit its crawling capacity instead of banning it entirely?
8 x
: 40 / 40 Budva na pjenu od mora: 3rd season (Croatian/Montenegrin)
LyricsTraining now has Finnish and Polish :)

User avatar
rdearman
Site Admin
Posts: 2643
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
French (studies), Italian (studies), Mandarin (studies),
Esperanto TAC (Only god knows why), Finnish (only in it for the cookies)
Language Log: viewtopic.php?f=15&t=1836
x 5495
Contact:

Re: We're hitting our server limits this week

Postby rdearman » Wed Oct 18, 2017 8:12 am

Serpent wrote:Um just saying that Yandex is as big a deal in Russia as Google in the US. Isn't it possible to limit its crawling capacity instead of banning it entirely?

Yes, it is possible. The snag is that I need to configure it as a user (bot) so that the forum software serves pages in a nice consistent way, rather than the bot following broken links, etc. I'll configure the bot this week for Yandex, and hopefully we can turn it back on and it will play nice with the forum software.

EDIT: I found their UserAgent-ID on their website and configured a bot, just need to turn it back on and make sure it plays nice.
4 x
"Never blame on malice that which can be explained by stupidity."


Re: We're hitting our server limits this week

Postby emk » Wed Oct 18, 2017 11:11 am

rdearman wrote:
Serpent wrote:Um just saying that Yandex is as big a deal in Russia as Google in the US. Isn't it possible to limit its crawling capacity instead of banning it entirely?

Yes, it is possible. The snag is that I need to configure it as a user (bot) so that the forum software gives out the pages in a nice consistent way rather than the bot following down broken links, etc. I'll configure the bot this week for Yandex and hopefully we can turn it back on and it will play nice with the forum software.

That's probably not going to be enough for Yandex. They crawl our pages much faster than the other search engines, and they do it for many hours at a time. The problem is that this eventually exhausts our "CPU credit" balance, as you can see in the graphs above. See the two points where the line starts going almost straight down? That's mostly Yandex, from what I saw in the logs.

[Image: CPU credit balance graph showing two steep drops]
They just burn through our CPU in a totally irresponsible fashion. Google and Bing are a lot more mellow, and don't just hammer our server for hours on end.

I don't think that creating a bot account will be sufficient to slow down Yandex's crawling. I can try to tune their crawl speed with non-standard robots.txt extensions, but I'm honestly not willing to waste much of my time dealing with abusive bots that try to crawl at ridiculous rates. See this comment by the author of "badbotblocker":

The problem is that both Yandex and Baidu are rather poorly behaved - they hit your website way too fast, downloading large bandwidth files in quick succession. That's actually what led me to the bad-bot-blocker project in the first place. Baidu has also been accused of not respecting robots.txt though I have not personally observed that.
This is the reason they're blocked, not because they're new or non-English.

So once I finish getting the proxy IP address stuff fixed, I can try re-enabling Yandex with a much slower crawl rate. But if they kill our server another time, they're gone for good. Ditto for Baidu: Either they can respect robots.txt and crawl at a reasonable rate, or they're not welcome. But well-behaved bots are welcome to stay. So I'll give Yandex one more chance as soon as the other admin backlog is sorted out.
9 x

