I’ll need to keep rebooting the forum side for a bit today, so if anyone is writing anything long with lots of pictures then today isn’t a great day for it.
Ok, we’re on new hardware so I won’t be doing that annoying ‘read only mode’ again unless I preannounce.
The next bit will require that file/image uploads are disabled. It will reject anything larger than 0 bytes (that’s pretty small) but I need to move the images and don’t want to lose any on the way (or shut down the entire forum while it’s happening).
Also, if images completely disappear in old topics, that's to be expected: I'm testing with only a couple of images in the new location, so the old ones will break. It should only be for a short time.
The Boss said new hardware. But this?
This isn’t new enough?
No, not that. It’s just so dang huge.
Ok, image uploading is turned off now - starting the file migration to the new thingy. Not sure how long it will take, so we might have to use the expression 'if a thousand words painted an image, use those' for a bit.
Thanks for your patience everyone!
When you can, can we get some updated stats? How big is our slice of the internet pie these days?
Rough figures, but there are about 512,000 image uploads to move to new storage, which comes in at about 240GB once optimized down a bit. I'm trying to get the costs down and switching CDN too; one of the things that has grown uncomfortable over the last 4 years or so is the cost of the CDN/file hosting side, so a lot of today is about tweaking that. The longest part is moving everything from one AWS S3 bucket to another, which isn't a quick process given the number of objects, so it might be a day or two before we can re-enable uploads unfortunately. The underlying app doesn't store the objects in a way that's conducive to 'by date', so I can't do an overlap restore easily.
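For the curious, the bucket-to-bucket shuffle is conceptually just a list-and-copy loop, something like the sketch below (not the actual script, and the bucket names are made up):

```python
# Rough sketch of a server-side S3 bucket-to-bucket copy, not the actual
# migration script. Bucket names are placeholders.
import boto3

s3 = boto3.client("s3")
SRC, DST = "old-uploads-bucket", "new-uploads-bucket"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # copy_object is server-side, so the bytes never leave AWS, but it's
        # still one API round-trip per object - with ~512k objects that's
        # what makes this slow.
        s3.copy_object(
            Bucket=DST,
            Key=key,
            CopySource={"Bucket": SRC, "Key": key},
        )
```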
For the server relocation: we run inside Docker containers, so this was more a case of moving the host from an ancient Ubuntu 14.04 base to Ubuntu 20.04 LTS. The easiest way to do that is backup / new server / restore, rather than an in-place upgrade of the machine running the Docker containers.
Our Docker containers are: (1) nginx and the main app, (2) Postgres 12 for the relational DB, and (3) Redis, an in-memory store in its own instance, mainly used for background job syncing.
I had looked into Discourse and how it works before, so I'd seen that it's Docker based and that's the only way they build it, but I hadn't read in enough detail to know that you can move the database out externally. I suppose that makes sense as a scaling option, as does Redis.
In regards to the images how are things optimized behind the scenes? Should we try to pre-optimize our images at all in terms of format or max resolutions?
We take up to 4MB per image, which I think is about the smallest realistic limit given we're starting to get 4K screenshots etc., and we don't want to make it hard to upload. What it does is store the original (that you can click through to) and then make a thumbnail 'pyramid', so the versions shown in topics are optimized for size and fidelity. There are about 3 versions made from each original, with the aim of making browsing faster in long topics with lots of images. It does mean there are a lot of files when you decide to relocate them. It's the first time we've done so since about 2015, so it was a good run.
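If you want a feel for what the 'pyramid' means, conceptually it's along these lines (a simplified sketch using Pillow; the widths and quality here are made up, not what the forum software actually uses):

```python
# Conceptual sketch of a thumbnail "pyramid": keep the original, then write
# a few progressively smaller versions for topic pages. The widths and
# quality settings here are illustrative, not what Discourse actually uses.
from pathlib import Path
from PIL import Image

def build_pyramid(original: Path, widths=(1200, 800, 400)) -> list[Path]:
    outputs = []
    with Image.open(original) as img:
        for width in widths:
            version = img.copy()
            # thumbnail() keeps the aspect ratio and never upscales
            version.thumbnail((width, 10_000))
            out = original.with_name(f"{original.stem}_{width}w.jpg")
            version.convert("RGB").save(out, "JPEG", quality=80, optimize=True)
            outputs.append(out)
    return outputs
```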
I should share some numbers in terms of traffic as well.
Last month we got 14,989,978 web requests across the forum and articles site, with 6.3TB of bandwidth served. The total unique IP visitors came to 133,592. Our firewall/WAF blocked 2,055 attacks last month, mainly denial of service and/or broken packets that try to stop the site from serving content. There's a bot in Ukraine that's been running for over 4 years now and tries, every 3 seconds, to overload anything at the 'forums.' CNAME - people love us apparently!
Seriously though, it's crazy that Chris pays for all this out of his own pocket (with help from member donations as well, although I don't want/have visibility into any of that).
For something put together with string and bits of salvia (don’t ask), it’s actually got decent traffic for something with no ads or sponsors.
Ok, the big file/object move (mainly images) sort of didn't work, so I'm going to punt it for a month or so. We planned to use this time to (a) move servers and (b) move files/CDN; (a) has been done but (b) is only partially complete. Essentially what happened is a short network outage during the copy process, which means I need to reverify them all (so we don't lose images, even just a few), and the verify takes almost as long as the copy. C'est la vie, I'll do something better next time with retries and longer timeouts etc.
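For anyone wondering what 'reverify them all' means in practice: conceptually it's listing both sides and flagging anything missing or mismatched, roughly like this sketch (placeholder bucket names, not the real script):

```python
# Simplified verification pass: list both buckets and flag anything missing
# or size-mismatched. Placeholder bucket names; a real pass would also want
# ETag/checksum comparison, and this is where retry/timeout tuning matters.
import boto3
from botocore.config import Config

# 'adaptive' retry mode backs off automatically on throttles and timeouts
s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "adaptive"}))

def list_sizes(bucket: str) -> dict[str, int]:
    sizes = {}
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes

src = list_sizes("old-uploads-bucket")
dst = list_sizes("new-uploads-bucket")
to_recopy = [key for key, size in src.items() if dst.get(key) != size]
print(f"{len(to_recopy)} of {len(src)} objects need re-copying")
```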
I'll re-enable image uploads to the existing infrastructure we have now, so no harm done. Sorry about the duration of the maintenance 'day', but it looks like it's all over for now - thanks for your patience.
Awesome work @fearlessfrog!
Not exactly as planned but what plan ever survives contact with the enemy, right?
For all the time we've been here, if that's a "bad" maintenance window, I don't think anyone has any complaints. Perhaps just withdrawal symptoms…
We're lucky to be on such a good platform, for which credit is due to its developers of course, but the trophy goes to you Mudspike maintainers for not only keeping us online but for making such an excellent choice of platform in the first place!
Typical.
Does AWS have something like gsutil?
It does, and the defaults were a bit too low and I didn't remember to up them, plus I didn't capture the timeouts in a log. So basically I made sweet sweet love to the pooch, but I'll get another go next month. It's really just a % cost saving, so it can wait a bit.
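The rough equivalents are `aws s3 sync` / `aws s3 cp` on the CLI side, or the boto3 transfer layer if you script it. The knobs I mean are concurrency and timeouts; the values below are illustrative only, not what we actually ran with:

```python
# Illustrative tuning only, not the settings we actually used. The point is
# that timeouts, retries and transfer concurrency are all adjustable, and
# the defaults are fairly conservative for a half-million-object move.
import boto3
from botocore.config import Config
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    config=Config(
        read_timeout=300,                                  # generous for a flaky link
        retries={"max_attempts": 10, "mode": "adaptive"},  # back off and retry
    ),
)

transfer = TransferConfig(
    max_concurrency=25,                   # parallel threads per transfer
    multipart_chunksize=16 * 1024 * 1024,
)

# Managed copy between buckets (placeholder names) using that config
s3.copy(
    {"Bucket": "old-uploads-bucket", "Key": "example.jpg"},
    "new-uploads-bucket",
    "example.jpg",
    Config=transfer,
)
```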
Just want to say a big THANK YOU to all the great people who make this haven the fantastic place it has become.
Great work guys!!
Many thanks @fearlessfrog for your efforts. The down time was pretty minimal IMHO, and do I detect a wee bit snappier response?
No worries - yep, it should be a bit better. We have our hamsters at peak efficiency now.
Well done fearlessfrog!
Like I keep telling my manager when the sprint ends, "shit's complicated".