Serpent wrote:edit: it does render correctly starting from May 27, but that's only the last half a year of the challenge
unless there's a connection to the security issues i kinda don't understand why this had to be done so late in the challenge, and couldn't be put off until the next one began
The superchallenge bot has three parts:
- The database, which stores everybody's content data.
- The web app.
- The "update" task, which contacts Twitter every 5 minutes to see if there are new tweets.
Here's a rough timeline of what happened:
- (earlier this year) The old site was broken into, and all the user accounts, passwords, forum code, etc., were at risk of being compromised.
- We shut down the old site, posted a warning message, and migrated the whole site to a new dedicated web server and database (which is why we don't get constant timeouts any more).
- When we migrated the site, we also migrated the database. The old database was configured to use ISO Latin 1 (a western European character set), but it was actually storing UTF-8 data with each byte being treated as a Latin 1 character. I fixed it using a variation of this approach. Unfortunately, even though we migrated the database, we didn't have enough time to properly package the web app and the update task for the new server, so rdearman ran them someplace else.
- At some point, I think the Super Challenge database might have been imported a second time, or some other code wasn't fixed properly, and we went back to confused encodings.
- Last week, rdearman's old hosting company told him that they no longer supported the "update" task, which would leave the bot unable to poll Twitter.
- This weekend, rdearman and I spent about 5 hours on Google Hangouts moving the web app and the "update" task to the new server. This means that anybody can submit code for new features, we can review it, and if we like the changes, we can deploy them in just a few minutes.
We couldn't have just left this alone until the end of the challenge, because that would have meant either storing the database on a compromised server (in April), or leaving the bot unable to read new tweets (last week).
Ideally, what we need to do at this point is identify when the bad tweets end, and the good tweets begin, preferably to the nearest day. Then I would need to go into the database, identify the last broken tweet and the first good one, and write a whole bunch of SQL to fix the double-encoding for some database records but not others. Unfortunately, this is pretty fiddly work and it usually takes me a while to sort it out.
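For anybody curious what "fixing the double-encoding for some records but not others" looks like, here is a rough Python sketch of the idea. It is not the code I actually ran against the database, and the function name and sample strings are made up for illustration. It assumes the damage is exactly one layer of "UTF-8 bytes read as Latin 1"; if the database's "Latin 1" is really Windows-1252, the codec name would need to change.

```python
def fix_double_encoding(text: str) -> str:
    """Undo one layer of 'UTF-8 bytes read as Latin 1' damage.

    Hypothetical helper: returns the text unchanged if it does not
    look double-encoded.
    """
    try:
        # Turn each character back into the single byte it came from.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        # Contains characters outside Latin 1, so it is not this kind of damage.
        return text
    try:
        # If those bytes are valid UTF-8, they were probably the original data.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Genuine Latin 1 text such as "café" ends up here and is left alone.
        return text


def looks_double_encoded(text: str) -> bool:
    """True if fix_double_encoding would change the text."""
    return fix_double_encoding(text) != text


# Made-up sample strings, not real rows from the database.
good = "日本語を勉強しています"
bad = good.encode("utf-8").decode("latin-1")  # simulate the broken storage

print(looks_double_encoded(good))
print(looks_double_encoded(bad))
print(fix_double_encoding(bad) == good)
```

Because rows that are already good pass through unchanged, a check like this could in principle be run over every row to find where the bad tweets stop, instead of guessing the date by hand. It can be fooled by a correctly stored tweet whose Latin 1 characters happen to form valid UTF-8, so the results would still need a quick look before updating anything.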
I apologize to everybody doing the challenge! I'm sorry that it's taken so long to get the bot sorted out.
EDIT: Oh, yeah. If anybody is interested, all the code for the site is publicly available. I can also provide a few sample good and bad rows from the database if anybody wants to try their hand at this.