HTLAL Preservation

Posted: Tue Apr 25, 2023 5:26 am
by Raconteur
Is there any effort currently underway to preserve HTLAL content in full or in part?

Old HTLAL is still a goldmine of valuable resources and discussions, much of it from a time when forums in general were more lively. I do wonder what can be done to preserve that going forward. For some threads especially relevant to me, I already did an unsophisticated copy+paste backup. Is there more that can be done, however?

I just hate the idea of one day going to the site and seeing the website domain permanently gone. And it's bound to happen.

But what can be done about it? Saving "important threads" manually seems a gargantuan task. It would take a team. And moreover, who decides what to keep, how do we store it, and publish it, and ... would publishing it even be legal? (I vaguely recall some odd terms that stated that whatever got posted on HTLAL became IP of the domain, could be wrong tho).

As for more sophisticated/automated ways to back up the site wholesale, is there any way forward? We all know that the Internet Archive backs up websites. But their snapshots are often incomplete. Is there a way to nudge them to categorize the HTLAL forum as a website of some significance? Would that mean a complete (or more complete?) backup over on the Wayback Machine?
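On the "nudging" question, a minimal sketch: the Wayback Machine has a public availability lookup and a "Save Page Now" address that requests a capture of a single page. The helpers below only build those addresses and make no network calls. The function names are hypothetical, and whether requesting captures page by page gets anywhere near a complete copy of a forum is an open question, not something this sketch demonstrates.

```python
from urllib.parse import urlencode


def availability_query(page_url, timestamp=None):
    """Build the address that asks the Wayback Machine for its closest snapshot.

    timestamp is an optional YYYYMMDD string (hypothetical parameter name here;
    it is passed through as the API's "timestamp" field).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def save_request_url(page_url):
    """Build the 'Save Page Now' address that requests a fresh capture of one page."""
    return "https://web.archive.org/save/" + page_url


if __name__ == "__main__":
    # example.com stands in for a real thread address.
    print(availability_query("example.com", "20200101"))
    print(save_request_url("http://example.com/thread.html"))
```

Opening the first address returns a small JSON document describing the closest snapshot, if one exists; opening the second asks for a new capture of that one page.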

Just some musings from a longtime reader admittedly not very well versed in the technologies in question, but concerned about the resource disappearing for good.

Re: HTLAL Preservation

Posted: Tue Apr 25, 2023 10:35 am
by Iversen
I remember that there was a member who made a complete copy, but it would definitely be against the 'odd terms' to republish the lot in its current form. As for individual items - well, you gave FX the right to make use of your contributions, but it's unclear whether this rules out making partial copies of parts of the forum at the content level - at least the authors' rights aren't obliterated by the terms. I have a complete copy in MS Word of my own contributions, and I wouldn't hesitate to quote them (partly because I made them in conjunction with the upload to HTLAL and I didn't give away my rights to use my own copies) - but in practice I rarely consult my own writings.

Re: HTLAL Preservation

Posted: Tue Apr 25, 2023 10:45 am
by tastyonions
Some enterprising programmer could definitely write a scraper to simply copy all the pages, and I don’t think that would be terribly complicated. Hosting it somewhere would be another question.
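For a sense of scale, here is a minimal sketch of such a scraper using only the Python standard library. It is not the tool anyone in this thread actually used. The names (`LinkExtractor`, `extract_links`, `crawl`) and the `fetch` callable are hypothetical, and the code assumes plain HTML pages linked by ordinary `<a href>` tags on a single host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for all links found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    result = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        result.append(absolute)
    return result


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first copy of pages on the same host as start_url.

    fetch is any callable that takes a URL and returns the page's HTML
    as a string (an assumption of this sketch, so it can run offline).
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        pages[url] = fetch(url)
        for link in extract_links(pages[url], url):
            # Stay on the forum's own host and skip pages already saved.
            if urlparse(link).netloc == host and link not in pages:
                queue.append(link)
    return pages
```

In practice `fetch` would wrap something like `urllib.request.urlopen` with a pause between requests so the old server is not hammered, and the returned pages would be written to disk. Pagination, logins, and character encodings are where the real effort would go.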

Re: HTLAL Preservation

Posted: Tue Apr 25, 2023 1:27 pm
by Raconteur
Out of curiosity, these are the terms I dug up (registration page). Don't think that's what I remember from way back when, but either way maybe these "current" ones aren't as restrictive? In IP legal terms, I got no clue.
If you do post content or submit material you grant and its affiliates a nonexclusive, royalty-free, perpetual, irrevocable, and fully sublicensable right to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, and display such content throughout the world in any media. You grant the right to use the name that you submit in connection with such content. You represent and warrant that you own or otherwise control all of the rights to the content that you post; that the content is accurate; that use of the content you supply does not violate this policy and will not cause injury to any person or entity; and that you will indemnify or its affiliates for all claims resulting from content you supply. has the right but not the obligation to monitor and edit or remove any activity or content. takes no responsibility and assumes no liability for any content posted by you or any third party.

Re: HTLAL Preservation

Posted: Tue Aug 08, 2023 12:09 am
by mrwarper
Raconteur wrote: Is there any effort currently underway to preserve HTLAL content in full or in part?

"In part" being the operative word in 2023, there is. I am probably the member who Iversen remembers made a complete copy.

My backup covers all HTLAL forum posts (those that did not require moderator access) up to the semi-comatose state back in 2020, when the last sizeable number of messages was posted there. There was a little non-spam activity afterwards (up until mid-2022 IIRC), but I had noticed a few glitches in the forum by that time, so I stopped updating my 'full' backup automatically. Post-2020 backups should be merged manually to ensure nothing meaningful is lost before a 'final cut' of the archive can be produced.

Unfortunately anything like that (a few days' worth of work) is more likely to happen when I decide to archive the thing once and for all and gain back some GBs of space, rather than out of interest in language learning discussions. I would like to think it is because I have finally 'matured' as a language learner --so I would not need to keep referring to the same old (or new), tired discussions--, but maybe it is simply because I have too many higher priorities now (about time too!). One way or another, the fact is I seldom read anything any more at old HTLAL, or here.

Please don't get me wrong, it is still fun to read you guys, but my renewed interest never seems to last long enough for me to post again (it took me some conscious effort to log in and to write this), and it's not unusual for me to take months-long breaks between very short visits -- I don't think I have kept a dedicated browser window open for a couple of days just to read so many forum threads in full, in years.

That said, I will try and find a suitable home for the HTLAL backup if I ever think I am done with it for good. Meanwhile, anyone interested please feel free to contact me*, I am always open to discussing stuff ; )

* Just keep in mind that sometimes the forum does not notify me of PMs and such, so if you want me to read you before another year or more goes by, email me. If the forum won't let you, simply append -at- yahoo -dot- es to my forum user name and you have my address.

Re: HTLAL Preservation

Posted: Tue Aug 08, 2023 7:38 pm
by rdearman
I would have to speak with EMK, but we have a number of DB instances hosted on AWS, so we could possibly get you a MySQL or other DB instance to load everything onto?

Cloud storage is probably best. Or maybe stick it on Google Drive, post a link here and lots of people can take copies.