As stated in the short description,

"I will help the contributors link from their texts directly to the passages they quote, within the limits of online availability of the originals. Based on these attempts, I hope to find a way to automate the process."

The ambition of this meta project is to improve on common referencing practices within the online journal:

As I envision it, this project would put the reader one click away from the cited section. Superficially, this would save time,

It matters whether the contributors to the disrupted Journal of Media Practice are interested in hyperlinking on this scale, and whether they think this project will add anything to their own. Taking part in this project is a choice to be made. Since some contributors use their own servers to host content, it is important to consider whether they are comfortable linking to de-privatized knowledge. Even though linking to pirated content that is already available to the public is not necessarily copyright infringement under the European Copyright Directive, many may see it as a grey area best avoided in "public".

None of the affordances described above seems too difficult to implement. The point of this writing is to develop the idea, communicate it to the other contributors to the journal, gather suggestions and document the difficulties one encounters when setting something like this up. In this way, I want to move as far as possible towards automating direct linking. And since one of the goals of this volume is to test how far the form of the scholarly article can be transformed, it makes sense to think about "solutions" that can cover different formats and forms (html, doc, odt, pdf/print ...). But all of this happens under the hood, and I discuss it in greater detail further down in this post. First, let's look at the surface.

This is what a citation could look like:

"I went down yesterday to the Piraeus with Glaucon the son of Ariston (Plato 2004: 1)."

By hovering over the number 1 above, which refers to the page number in Plato's Republic, one can access multiple versions of that page. Furthermore, I have linked the year (2004) to the bibliographical entry below, from which one can follow various links to the book as a whole (not to the specific page).

Project Gutenberg (html) via Hypothesis | Aaaaarg (clip) | This server (pdf)
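The markup behind a citation like this can be generated rather than written by hand. Below is a minimal Python sketch, assuming each cited page comes with a list of (label, URL) pairs and that the bibliography entry carries an HTML id such as `plato2004`. The function name, the id scheme, the class names `cite-page` and `cite-versions` and the example URLs are all hypothetical.

```python
from html import escape


def render_citation(author, year, page, bib_id, page_links):
    """Render an author-date citation like (Plato 2004: 1) as HTML.

    The year links to the bibliography entry whose id is `bib_id`; the page
    number is followed by links to that page in several versions. The class
    names are hypothetical: CSS would have to hide `.cite-versions` until
    `.cite-page` is hovered.
    """
    links = " ".join(
        f'<a href="{escape(url)}">{escape(label)}</a>' for label, url in page_links
    )
    return (
        f"({escape(author)} "
        f'<a href="#{escape(bib_id)}">{escape(str(year))}</a>: '
        f'<span class="cite-page">{escape(str(page))}'
        f'<span class="cite-versions">{links}</span></span>)'
    )


if __name__ == "__main__":
    # Placeholder URLs standing in for the real versions of page 1.
    versions = [
        ("Project Gutenberg (html)", "https://example.org/republic.html"),
        ("This server (pdf)", "https://example.org/republic.pdf#page=36"),
    ]
    print(render_citation("Plato", 2004, 1, "plato2004", versions))
```

Two CSS rules would be enough for the hover effect, for instance `.cite-versions { display: none; }` and `.cite-page:hover .cite-versions { display: inline; }`. The bibliography entry itself would need the matching `id="plato2004"`.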

In essence, that's all there is to it. The difficult part is to make direct linking to references work across formats and to chip away at the amount of time authors have to spend looking up links. There are other issues as well, which I shall work through gradually over the course of publishing this journal and in subsequent blog posts. As it now stands, I see two steps towards furthering this project. The first is to make referencing part of the editorial work: I would go through all the references contributors make in their final platform versions for the disrupted Journal of Media Practice, search for as many of the original documents as are available online, and append all the links. The second is to attempt to automate the process, and whatever I learn from the first step will hopefully help with that.


To "make direct linking to references work across formats and to chip away at the amount of time authors have to spend looking up links", one needs to:

  1. Avoid having to look up discrepancies between page numbers in PDFs manually, and work out how to link to HTML and EPUB?
  2. Avoid having to find additional versions of files/content manually?
  3. Decide whether the code that does this is meant to be used by contributors while writing, or later on by editors and designers. Or both?
  4. Based on the decisions above, write a program capable of generating citation entries and bibliographies with multiple links.

@1: In the above example, page 36 of the PDF ("This server") corresponds to page 1 of Plato's text. In most if not all cases, the page numbers of the print journal or book (the ones we cite, here 1) will differ from the page numbers as a PDF reader "sees" them (the ones needed for linking, here 36). This is not a problem if one creates each link manually, since that involves looking up both numbers anyway. With no additional effort, one could go even further and use Aaaaarg, a public online library which lets one link to specific sections of pages. Even when linking to content in other formats, such as HTML/EPUB, video and audio, the amount of link building remains the same: one just needs to find a linkable section (in HTML, an element with an id) or create one (by highlighting text with Hypothesis). The decision to be made here is whether it makes sense to try to minimize the manual work involved at this stage or not.
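Within a single PDF the discrepancy is often a constant offset caused by front matter (here 36 - 1 = 35). If so, one manually found pair of page numbers may be enough to link every other cited page of the same file. The following is a minimal Python sketch under that assumption. It breaks down for PDFs with missing, inserted or double pages, or for front matter cited by roman numerals. It relies on the `#page=N` fragment, which many but not all PDF viewers honour. The function names and the URL are hypothetical.

```python
def page_offset(print_page, pdf_page):
    """Offset between a cited (print) page and the PDF page that shows it."""
    return pdf_page - print_page


def pdf_page_link(pdf_url, print_page, offset):
    """Link to the PDF page corresponding to a cited print page.

    Assumes the offset is constant for the whole file. The '#page=N'
    fragment is understood by many PDF viewers, but not by all.
    """
    pdf_page = print_page + offset
    if pdf_page < 1:
        raise ValueError(f"print page {print_page} maps to PDF page {pdf_page}")
    base = pdf_url.split("#", 1)[0]  # drop any fragment already present
    return f"{base}#page={pdf_page}"


if __name__ == "__main__":
    # One manual lookup: print page 1 of the Republic is page 36 of the PDF.
    offset = page_offset(print_page=1, pdf_page=36)
    # prints https://example.org/republic.pdf#page=47
    print(pdf_page_link("https://example.org/republic.pdf", 12, offset))
```

Storing a single number per file keeps the manual work at one lookup per document rather than one per citation. A file with irregular pagination would need a full page table instead.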

@2: Perhaps work-saving should only enter the picture at the next stage, when finding and linking to additional versions of the cited content? This could be achieved by automatically retrieving links based on the information usually provided in bibliographies (title, year, author, perhaps DOI) with the help of open-access databases and online libraries (DOAB, DOAJ, Library Genesis, Aaaaarg, Project Gutenberg, etc.). It might also be interesting to include versions in other languages. The "manual" work described above (such as the discrepancy found between PDF and print page numbers) would then serve as the basis for generating direct links automatically.
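At its core, finding additional versions means comparing a citation's bibliographic data with the records a repository returns. The Python sketch below assumes those records have already been fetched and reduced to dictionaries with `title`, `author`, `year` and `url` keys. It does not query any real repository API. The record format, the sample data and the matching rule are hypothetical. Under that rule, the normalized title (minus a leading article) must be identical and every word of the cited author's name must appear among the record's author words. The year is ignored because reprints and translations carry different years.

```python
import re
import unicodedata


def normalize(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def strip_article(title):
    """Drop a leading English article so 'The Republic' matches 'Republic'."""
    words = normalize(title).split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def find_versions(citation, records):
    """Return the URLs of records whose title and author match the citation."""
    want_title = strip_article(citation["title"])
    want_author = set(normalize(citation["author"]).split())
    matches = []
    for record in records:
        same_title = strip_article(record["title"]) == want_title
        same_author = want_author <= set(normalize(record["author"]).split())
        if same_title and same_author:
            matches.append(record["url"])
    return matches


if __name__ == "__main__":
    citation = {"title": "The Republic", "author": "Plato", "year": 2004}
    # Hypothetical records, as if returned by different repositories.
    records = [
        {"title": "The Republic", "author": "Plato", "year": 1998,
         "url": "https://example.org/a"},
        {"title": "Republic", "author": "Plato; Jowett, Benjamin", "year": 2016,
         "url": "https://example.org/b"},
        {"title": "The Laws", "author": "Plato", "year": 2016,
         "url": "https://example.org/c"},
    ]
    print(find_versions(citation, records))
```

Exact title matching will miss variants such as "The Republic of Plato" or translated titles. Matching on a DOI, where one exists, would be more reliable than any string comparison.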

@3: The way I see it, this is a decision about which formats the code works with and what its interface looks like. Will it only turn docx into html? Markdown into html and pdf? Is it just a script run from the terminal? A WordPress plug-in? An add-on for Zotero and Word? An extension of Pandoc, the universal document converter widely used in online publishing?

@4: In any case, there is work to be done. And opinions to be expressed?
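As a possible starting point for step 4, here is a minimal Python sketch. It finds author-date citations of the form used above, such as (Plato 2004: 1), in plain text and builds links to the cited page in every known version. It assumes a hand-made table keyed by author and year. In that table, each version records either a PDF URL with its page offset or an HTML URL with an anchor pattern. The table layout, the regular expression and the `page-N` anchor scheme are hypothetical. An online HTML edition will rarely carry anchors for a particular print edition's pages, which is presumably why the example above goes through Hypothesis. The pattern also misses many real citation forms, such as page ranges, multiple pages and "et al.".

```python
import re

# Hypothetical link table, filled in by hand during editorial work.
# "pdf" versions store the offset between print and PDF page numbers;
# "html" versions store an anchor pattern for the cited page.
SOURCES = {
    ("Plato", "2004"): [
        {"label": "This server (pdf)", "kind": "pdf",
         "url": "https://example.org/republic.pdf", "offset": 35},
        {"label": "Project Gutenberg (html)", "kind": "html",
         "url": "https://example.org/republic.html", "anchor": "page-{page}"},
    ],
}

# Matches e.g. "(Plato 2004: 1)": author, four-digit year, page number.
CITATION = re.compile(r"\((?P<author>[A-Z][\w-]+) (?P<year>\d{4}): (?P<page>\d+)\)")


def version_links(author, year, page):
    """Return (label, url) pairs pointing at one page in every known version."""
    links = []
    for version in SOURCES.get((author, year), []):
        if version["kind"] == "pdf":
            url = f'{version["url"]}#page={page + version["offset"]}'
        else:
            url = f'{version["url"]}#{version["anchor"].format(page=page)}'
        links.append((version["label"], url))
    return links


def find_citations(text):
    """Yield each citation found in `text` together with its version links."""
    for m in CITATION.finditer(text):
        yield m.group(0), version_links(m["author"], m["year"], int(m["page"]))


if __name__ == "__main__":
    sample = "I went down yesterday to the Piraeus (Plato 2004: 1)."
    for citation, links in find_citations(sample):
        print(citation)
        for label, url in links:
            print(f"  {label}: {url}")
```

The same two functions could sit behind any of the interfaces listed under @3, whether a terminal script, a plug-in or a Pandoc filter. Only the code that reads and writes documents would change.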


Plato. 2004. The Republic. Indianapolis, Cambridge: Hackett Publishing Company.
Libgen | Aaaaarg | Project Gutenberg | This server