ArchiveBox or similar for shared archiving of research project

Stopwatch1986@lemmy.ml · 2 days ago

irmadlad@lemmy.world · 1 day ago

> I wonder if an authorised remote user (ie an affiliated researcher) can easily instruct ArchiveBox to store a URL and later retrieve it

Once you download the data and persist it on local storage, it’s available to whomever has access to that drive or server.

> Also, ideally a random user should be able to retrieve the archived web page or file (eg a PDF, CSV etc).

For rando access, you could put the data on a public FTP server, or even get fancier with HTML-styled pages. If I understand you correctly, you want a random user reading your report, which has citations, to click a citation and be presented with whatever you downloaded with ArchiveBox. Kind of Wikipedia style. Speaking of which, a wiki framework might be just the ticket.

Download the data, integrate it into a self-hosted wiki, and it would be available to rando users. Of course, your wiki server will need all the accoutrements of security so you don’t get hacked by a bazillion bots.

Stopwatch1986@lemmy.ml · 11 hours ago

A wiki is a good idea. Putting a SingleFile or similar all-in-one file in a repository and providing index numbers organised as a look-up table would also work for easy retrieval by a random research user. Both options require some admin and more effort from the researchers.
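To make the look-up table idea concrete, here is a rough sketch of what I have in mind, not a finished tool. It assumes the all-in-one files have already been saved by some other means; the CSV layout and the names `add_snapshot`, `find_snapshot` and `index.csv` are all made up for illustration.

```python
import csv
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical column layout for the look-up table (not any tool's format).
FIELDS = ["id", "url", "file", "sha256"]


def load_index(index_csv: Path) -> list:
    """Return all rows of the look-up table, or an empty list if it is missing."""
    if not index_csv.exists():
        return []
    with index_csv.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def add_snapshot(index_csv: Path, url: str, snapshot: Path) -> int:
    """Register an already-saved snapshot file and return its index number.

    If the URL is already listed, its existing index number is returned.
    """
    rows = load_index(index_csv)
    for row in rows:
        if row["url"] == url:
            return int(row["id"])
    new_id = len(rows) + 1
    # The checksum lets a reader verify the file has not changed since archiving.
    checksum = hashlib.sha256(snapshot.read_bytes()).hexdigest()
    needs_header = not index_csv.exists() or index_csv.stat().st_size == 0
    with index_csv.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow(
            {"id": new_id, "url": url, "file": str(snapshot), "sha256": checksum}
        )
    return new_id


def find_snapshot(index_csv: Path, ref_id: int) -> Optional[Path]:
    """Resolve an index number (as cited in a report) to the archived file path."""
    for row in load_index(index_csv):
        if int(row["id"]) == ref_id:
            return Path(row["file"])
    return None
```

A report would then cite something like "archive ref 12", and anyone with read access to the repository could resolve that number with `find_snapshot`. Keeping the table as a plain CSV means it stays readable even without the script.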

I wish there was a hostable version of archive.is for near-zero maintenance. You just submit a URL over the internet and the web page is cached once along with a screenshot. Then, anyone can access the archived version. This can be done already with archive.is but we have no control over its future, which is critical for long-term dependable archiving.
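The "cached once" part is simple enough to sketch, for what it's worth. This is only an illustration under heavy assumptions: it stores the raw response bytes, with no screenshot, no page assets and no web front end, and the names `archive_once` and `fetch_bytes` are mine, not from any existing project.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body for a URL (no rendering, no assets)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def archive_once(
    url: str, store: Path, fetch: Callable[[str], bytes] = fetch_bytes
) -> Path:
    """Save a URL the first time it is submitted and reuse the copy afterwards.

    The file name is derived from a hash of the URL, so submitting the same
    URL again never overwrites or re-downloads the stored copy.
    """
    store.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = store / f"{key}.bin"
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Serving the `store` directory read-only would cover the "anyone can access" half. The hard parts a real archive.is replacement needs (rendering, screenshots, abuse control) are exactly what this leaves out.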

irmadlad@lemmy.world · 8 hours ago

> This can be done already with archive.is but we have no control

Did a little digging this morning. I honestly can’t find a self-hosted archive.is alternative. All the solutions I came up with are either paid and online-only, or free but still online-only.

Stopwatch1986@lemmy.ml · 4 hours ago

Thanks for doing the digging. An archivist may know something more. Or the archive.is people.

irmadlad@lemmy.world · 4 hours ago

It might be worthwhile to run your scenario by the folks at https://lemmy.world/c/datahoarder