Epidoc and literary artifacts



It was with a fair amount of interest that I read via AWOL that Corpus Scriptorum Ecclesiasticorum Latinorum (CSEL) texts are now available in XML (TEI/Epidoc) format through Github – just the sorts of texts I’m interested in adding to de commentariis.

But it turns out there’s a fair bit of work to do on the texts before they’re usable in a programmatic way. The format of the XML raises two questions for me. It’s always confused me that people talk about using “epidoc” (“Epigraphic documents in TEI XML”) to encode literary texts. Why is it used in this way, to encode documents it is apparently not designed to encode?

The second question follows on from this. I don’t know whether this is an artefact of using Epidoc or if it’s an artefact of the particular choices made to encode the CSEL. The standard numbering systems of the critical editions of these texts are effectively lost in the Epidoc versions of the text online, rendering them problematic for programmatic access to the data in the standard scholarly reference systems.

Different texts have different breakdowns, for example, Book/Poem/Line, Book/Line, Letter Number/Line, and so on depending on the particular text and the choices made by the editor of the critical edition. In the Perseus format (the “old” format?) the TEI documents have a header that tells my programs on De Commentariis the structure of the document breakdown, thus:

<encodingdesc> <refsdecl doctype="TEI.2">
    <state delim="." unit="book"></state>
    <state unit="chapter"></state>
    <state unit="section"></state>
</refsdecl> </encodingdesc>

This tells me that this particular text is encoded in book.chapter.section format, e.g. 5.3.2. Then the text body itself has those very book/chapter/section divisions in it:

<div1 type="book" n="1">
  <div2 type="chapter" n="1">
      <div3 type="section" n="1">
          <p>Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.</p>

This gives the document a structured, hierarchical view of the content. Everything contained within the div1 element with the attributes type=“book” and n=“1” is a part of Book 1, the div2 element inside that with type=“chapter” and n=“1” is 1.1, and inside that the div3 with type=“section” and n=“1” is 1.1.1. The abstract document structure (according to the standardised referencing established by the critical edition) is encoded directly onto the data structure. It’s an excellent XML structure that reflects directly the way the data is referenced, with enough flexibility to encode many different types of referencing schema, as long as it’s laid out in the metadata and the relationship is hierarchical. It’s easily navigable with standardised XML tools like xpath/xquery or simple XML DOM (document object model) manipulation.
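To make that concrete, here is a minimal sketch in Python of how a dotted reference can be resolved against this kind of structure. It is not the actual De Commentariis code: the `lookup` function, its `units` parameter and the tiny sample document are my own illustration, and the second section's text is only there to give the lookup something to find.

```python
import xml.etree.ElementTree as ET

# A small, closed-off sample in the Perseus style shown above (text abbreviated).
SAMPLE = """
<text><body>
<div1 type="book" n="1">
  <div2 type="chapter" n="1">
    <div3 type="section" n="1"><p>Gallia est omnis divisa in partes tres</p></div3>
    <div3 type="section" n="2"><p>Hi omnes lingua, institutis, legibus inter se differunt.</p></div3>
  </div2>
</div1>
</body></text>
"""

def lookup(root, reference, units=("book", "chapter", "section")):
    """Resolve a dotted reference such as "1.1.2" to the text it encloses.

    `units` plays the role of the refsdecl header: one unit name per level.
    """
    steps = []
    for depth, (unit, n) in enumerate(zip(units, reference.split(".")), start=1):
        # Each level of the reference maps onto one nested div element.
        steps.append(f"div{depth}[@type='{unit}'][@n='{n}']")
    node = root.find(".//" + "/".join(steps))
    if node is None:
        return None
    # Collapse whitespace so the result is a single clean string.
    return " ".join("".join(node.itertext()).split())

root = ET.fromstring(SAMPLE)
print(lookup(root, "1.1.2"))
```

The whole lookup is one path expression, because the reference and the element nesting have the same shape.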

On the other hand, this is not:

<p>Sancto episcopo Salonio Saluianus salutem in domino. <note type="chapter"> 1 </note> </p>
<p>Omnes admodum homines, qui pertinere ad humani officii <lb n="5"></lb>

In this style of format, the presentation of the text (the original page it was scanned from) is confused with the data structure, and the critical data structure information is presented in the form of an annotation attached to a particular line (rather than enclosing all the lines which belong to chapter 1). This style of document is incredibly difficult to use with standard tools like xpath. This is highlighted if we go down just a little further into the text:

tantum laudem aucupantes tam indignis rebus curam impen­ <lb></lb>
derent, non tam inlustrasse mihi ipsa ingenia quam damnasse<note type="chapter"> 3 </note> <lb n="10"></lb>
uideantur. nos autem, qui rerum magis quam uerborum ama­ <lb></lb>

Where does chapter 3 start? Clearly not half-way through impenderent and most likely not at the word break in damnasse uideantur. Is it at the comma after impenderent? At the full stop after uideantur? A human, familiar with the original text, might be able to decide: a simple algorithm inside a computer program, probably not.
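A rough sketch of what a simple program can do with this markup shows the problem. The best it can manage is to cut the text at the exact position of each chapter note. The function names are mine, the sample is the excerpt above wrapped in a single closed paragraph, and I am assuming the markers are direct children of that paragraph.

```python
import re
import xml.etree.ElementTree as ET

# The excerpt above, wrapped in one closed <p>; \u00ad is the soft hyphen
# used for the line-end word breaks in the source.
SAMPLE = """<p>tantum laudem aucupantes tam indignis rebus curam impen\u00ad <lb/>
derent, non tam inlustrasse mihi ipsa ingenia quam damnasse<note type="chapter"> 3 </note> <lb n="10"/>
uideantur. nos autem, qui rerum magis quam uerborum ama\u00ad <lb/></p>"""

def tidy(text):
    """Rejoin words broken at a soft hyphen and collapse whitespace."""
    text = re.sub("\u00ad\\s*", "", text)
    return " ".join(text.split())

def split_at_markers(paragraph):
    """Cut a flat paragraph at each chapter note, in document order.

    Text before the first marker is stored under the key None.
    """
    chunks = {None: [paragraph.text or ""]}
    current = None
    for child in paragraph:
        if child.tag == "note" and child.get("type") == "chapter":
            current = (child.text or "").strip()
            chunks.setdefault(current, [])
        # Text following any child element belongs to the current chunk.
        chunks[current].append(child.tail or "")
    return {label: tidy("".join(parts)) for label, parts in chunks.items()}

result = split_at_markers(ET.fromstring(SAMPLE))
print(result["3"])
```

Cut this way, “chapter 3” begins at uideantur, in the middle of a clause. Doing better means guessing at punctuation, which is exactly the judgement a human reader makes and a simple algorithm cannot.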

I bring this up – I know it may seem churlish, after all any open XML version of an ancient text has to be a good thing – because I feel that in the “official” digital classics circles there is a certain enthusiasm for recoding existing XML texts to the Epidoc format, but if this is the result, it’s a definite step backwards. Forgive me if I am wrong and this is merely the first step in getting from the presentation layer (the scan of the book) to the data layer (a properly structured XML version of the text). But I certainly hope this style of markup isn’t regarded as the standard way to proceed into the future.

Update: an interesting set of notes by @paolomonella on Epidoc and the difference between “documents” and “texts” is found here and there is a twitter conversation here.

de commentariis – recent updates



So far I have the following features in place:

  • Automatic permissions on verified email accounts. Google+ social logins automatically verified.
  • Edit your own existing commentary items by just clicking on them
  • “Read more” link on other long commentary items, see here for example (requires login)
  • Many styling and layout improvements.

Coming up soon: features to help instructors in the classical languages create and manage commentaries, and to let their student cohorts engage with the ancient texts through those commentaries. List of features planned.

de commentariis: crowd-sourced commentary for ancient texts



I’d just like to alert people to a new digital classics resource I’ve been working on during the evenings and weekends these past three weeks or so.


The tool is about the creation of “crowd sourced” / “social” commentaries on ancient texts. I hate both of those terms in scare quotes (I don’t like buzzwords like that) but I can’t think of a better term. Read literally, the domain name means “a network on commentaries”. What’s not to like? Click the link above and find out!



De Commentariis uses data from the Perseus project’s online open-source data repository. Because of this the number of texts – especially Greek ones – is severely limited at the moment, but I hope to get more as the texts improve in the Perseus repository and I overcome my own technical limitations in extracting the data effectively. I’ve got some “suggested texts” linked from the home page, but you can list and view all the available texts (some that say they are available aren’t in a great state though, so please be aware of that limitation! I’m a Livy scholar and there’s next to no Livy in it!).


DeCommentariis.Net example commentary.

In order to see the texts you must register an account.[fn1] You can sign up with either Google, Twitter, or Facebook credentials (OAuth); or just register a simple account on the site, fill in the fields and put a good password in place. Once you have done either, you’ll be sent an automatic verification email containing a link. Click the link and you should be able to use the site.

After you register, send me an email, or just reply to the verification email, and I will add the “make commentary” permission to your account (on my to-do list: automate those permissions). Until I do that you can’t enter any commentary items. (Update: if you verify your email address – Google+ social logins are automatically verified – permissions are added automatically).

The site is running on pretty limited resources at the moment so be a bit forgiving if it gets slow under all your eagerness to log on and check it out.

I would love to hear your feedback.

[fn1]: Your browser will warn you about an “insecure” security certificate. I have to use a “self-signed” certificate for the moment, because at this point I’m not about to pay a $200+ per year bribe to a security signing authority for a signed security certificate. The alternative is to let you send your password unencrypted to my server and that’s just silly. Therefore, there’s a self-signed certificate. Update: proper security cert installed.

The Ship of State? A question.



Who, exactly, was Livy borrowing from when he wrote 24.8.12-13? He would surely have already had this concept in his mind. The source of the metaphor is supposedly Plato’s Republic VI, but I wonder if Livy would have read that? If he’s reading Polybius I suppose he may have been taught Greek philosophy at some stage. He probably would have read Horace Carm 1.14 O navis referent in mare te novi, traditionally titled “To the Ship of State”, but the ship in question could well have been a woman or Horace’s own life, although Quintilian Inst 8.6.44 seems entirely sure the ship in the poem, and its struggles to reach the safe harbour of pace atque concordia (Quintilian’s words), are an allegory for the state. I’ll note, though, that Fraenkel, E, Horace 1957 Oxford:Clarendon Press p. 155 fn 4, appears to take umbrage at the notion that Horace’s ode was about Horace’s own life, and usually I prefer to believe Fraenkel’s lucid and learned interpretations of Horace.

Meanwhile, in book 24, Livy has Q. Fabius Maximus use this direct nautical metaphor in a speech about who should be elected the consuls for the next year (214):

quilibet nautarum vectorumque tranquillo mari gubernare potest; ubi saeva orta tempestas est ac turbato mari rapitur vento navis, tum viro et gubernatore opus est. non tranquillo navigamus, sed iam aliquot procellis submersi paene sumus; itaque quis ad gubernacula sedeat summa cura providendum ac praecavendum vobis est. (Livy 24.8.12-13)

While the sea is tranquil any one of the sailors or passengers is able to helm (the ship); when a savage storm has arisen and the ship is ravaged by winds on a turbulent sea, then the job is for a (real) man and a (proper) pilot. We do not sail in tranquil weather, but recently we have been nearly sunk by several hurricanes; and thus (the question of) who would sit at the helm you ought to guard and give attention to with the highest of care.

The “several hurricanes” are events such as the battles of Trasimene and, of course, Cannae.

I am finding it odd that something of Cicero’s doesn’t seem to pop up here initially: we go straight from Plato to the poets. I should expect that there’s a rich literature on this metaphor that has somehow bypassed my research so far. Time to correct that.

Pac Rim 28 | 28th Pacific Rim Roman Literature Seminar



Pac Rim 28 | 28th Pacific Rim Roman Literature Seminar.

I can personally vouch for the Pac Rim Roman Lit Seminar; it was a great event last year and everyone I speak to who has been usually recommends it too. It’s a ‘seminar’ format, so single sessions in which everyone sits. You get a lot of really useful feedback on your paper too.

Dates: 6th July 2014 to 9th July 2014 (n.b. Sunday 6th July is the opening night reception and papers will begin on Monday 7th July)

Location: La Trobe University City Campus, 360 Collins St Melbourne Victoria 3000

Ancient and modern scholars alike have described, represented, deciphered and constructed Rome in a multiplicity of ways. Both now and in the past, writers have attempted to make sense of Rome’s identity/identities as an urban landscape, as a political entity, as a producer and consumer of culture, as an idea and as an empire. Rome is cast in a myriad of ways in literary texts: an ideal society, a fallen state, a reinvigorated civilisation, a mirror or an historical parallel, and scholars increasingly recognise that even Roman texts which nominally set their action in entirely different time periods and geographical locations or in the realms of mythology cannot escape dealing with and therefore theorising Rome itself. As a concept ‘Rome’ is flexible and mutable, and in the hands of skilled writers the boundaries of this concept might be reinforced, questioned or challenged.

This conference invites papers that examine the different ways that the idea of Rome has been, and still is, theorised in literary texts. This theme may be interpreted widely to include papers on how Rome is theorised as a literary artefact in scholarship and/or in literature. Papers on the wide range of areas which intersect with Latin literary study are invited; these include (but are not limited to) literary theory, philosophy, politics, geography and reception studies.

Papers on this theme of either 20 or 45 minutes duration are invited. 20-minute papers will be delivered in sessions of 30 minutes each and 45-minute papers in sessions of 60 minutes, to give adequate time for discussion. Depending upon the response to this call it may be necessary to limit the number of 45-minute papers to ensure that the conference does not go over time.

See the website link above for more information on submitting a paper.

Mendeley versus Papers: research software smackdown



Regular readers of my blog may already know of the struggles I’ve had with Papers, the bibliographic database and research tool. That last link goes to what is, by a huge margin, my single most popular blog page. That is because the Wikipedia entry for Papers links to it. But if you need verification of my negative appraisal of Papers in those posts, or in this one, just have a look at the comments. Anyway, I took the decision a couple of weeks ago to stop trying to make Papers work for me at all and try another tool in its place. Ditching it and moving over to Mendeley was relatively straightforward for me.

The Mendeley web application interface

Mendeley comes in three major components.

The first part is the web application, where you sign up for an account. The account is free, and you get up to 2 gigabytes of storage space for your research database. If you need more you can purchase a plan to get more space. I’ve got several hundred papers in my database and it uses 600 megabytes, less than half the allocated free space. Technically, all you actually need is the web application. The Mendeley web app also has this social networking aspect, but I think these features are actually rubbish in a general sense.

The continual focus by Papers on extending its ‘social networking’, rather than fixing the serious data reliability issues and extending its core research and citation features, was one of the reasons I decided to cut it loose in the end. On this account I don’t care for similar features in Mendeley. If I want a social network of academic research interests, there’s always academia.edu. That is, besides regular networking at conferences, and participating on relevant mailing lists. Mendeley uses the social network to retrieve articles from those which other people upload into their databases. Yet the usefulness of this feature is going to depend on how many Mendeley users there are in your research area.

The second part of Mendeley is “Mendeley Desktop”, a free download from the site. It’s your local application that you run as a native app on your Mac or PC. It downloads your research database from your online account. It uploads any new papers that you add to the desktop app to the database in the web application. There are Desktop versions for Windows (XP and later) and Linux as well as Mac OSX. The Desktop app can also import your Papers library into Mendeley if you are converting. It directly imports your Papers2 database. I am not sure if it can import a Papers3 database. Papers3 does its darnedest to hide the database from you: you may have to export your Papers3 database to a .bib file and allow Mendeley to import that. Mendeley Desktop also has some neat features and some drawbacks compared to Papers (see below).

The third part of Mendeley is an iOS app. The Mendeley iOS app is free, unlike the Papers iOS app. For Papers, you have to buy the iOS app as a separate item to the Mac or Windows app. The Mendeley iOS app, just like the Desktop program and basic levels of web storage, is completely free. Like the Desktop app, the Mendeley iOS app syncs to the central web-based data repository. Papers2 tries to cross-sync its iOS app to the desktop via your wifi network, an inferior solution. The Papers3 iOS app syncs to your desktop via Dropbox, or iCloud. The Mendeley iOS app lets you carry around your research database on your iPhone or iPad or both. This is great for reading research articles on the train or bus, during lunch, or just sitting around under a shady tree in beautiful Queensland weather (did I mention I go to the University with the most beautiful campus in all Australia?). The Papers iOS apps have this functionality too of course, but at a cost. There is also the matter of the two different iOS apps depending whether you use Papers2 or Papers3.

There are some nice features that you gain from switching from Papers over to Mendeley:

  1. Mendeley automatically syncs its database to a nominated .bib file for BibTeX or BibLaTeX so you can always have one up to date with your research data. This is important for people like me who use plain-text tools like Pandoc and LaTeX to create and edit their articles. Having to remember when I last performed a manual export of the .bib file from Papers was a pain in the neck.

  2. Mendeley generates citation keys in a much nicer format. The default is a straight author-date format (Mcphee2014). This way you don’t have to remember those awful random appendices that Papers tacked onto the end of its cite keys. And Mendeley doesn’t generate the colon between the author and the year (Mcphee:2014zkwel). To convert from the Papers format to the one used by Mendeley, I had to do a bulk ‘regular expressions’ search and replace on documents I had already created. But that didn’t take long (because I use simple marked-up plain text as my main document format). Now it’s much nicer to insert references into my documents, as it’s easy to recall the citation key.

  3. It’s free if you have less than 2GB of PDFs (I mentioned this already but it bears repeating).

  4. I feel that Mendeley’s duplicate paper detection and merge is superior to Papers’. But Papers has an author merge and journal merge feature that Mendeley doesn’t. This is pretty neat when you get several variants of author or journal names. In Mendeley you instead have to edit the offending documents one by one so that all the relevant authors and journals match, which is not as nice as Papers’ method of dealing with duplicate authors and journals.

  5. I far prefer the central-server sync scheme used by Mendeley to the Dropbox or iCloud style database file sharing, or inter-device wi-fi sync that Papers uses. The Papers developers clearly have struggled with these latter mechanisms (and cross-sync can be hellish to do successfully at the best of times). Furthermore, Mendeley Desktop’s local configuration and data store is sqlite, a standard lightweight application storage database. This means that standard tools exist which allow a geek like me to hack into my local Mendeley database if needs be. I have found this feature useful to clean up the horrible citation keys that Mendeley imported from my Papers database. But if this last point sounds like gobbledegook to you, just remember that Mendeley’s storage of your precious research data is more reliable than Papers.
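The bulk search and replace mentioned in item 2 can be sketched along these lines. This is an illustration rather than the exact expression I ran. It assumes Pandoc-style citations (a key preceded by @) and that the Papers suffix is always a run of lowercase letters after a four-digit year, as in the one example above. The function name is made up for the sketch.

```python
import re

# Papers-style key: @Author:YEARsuffix, e.g. @Mcphee:2014zkwel
# Assumption: the suffix is one or more lowercase letters.
PAPERS_KEY = re.compile(r"@([A-Za-z][A-Za-z-]*):(\d{4})[a-z]+\b")

def convert_keys(text):
    """Rewrite Papers-style citation keys to plain author-date keys."""
    return PAPERS_KEY.sub(r"@\1\2", text)

print(convert_keys("As argued in [@Mcphee:2014zkwel p. 1; @Smith:1999ab]."))
```

One caveat: two works by the same author in the same year collapse to the same key under this rewrite, so those few need checking by hand against whatever key the reference manager actually assigned.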

What you do lose when you switch from Papers to Mendeley is the internal search hook into the online article databases (e.g. JSTOR, Web of Science, PubMed, arXiv, etc). Papers itself can search research databases and import the selected results directly. With Mendeley, you have to go to each database that you use, one at a time, and use their various web search facilities. Then you have to import each result into Mendeley with the supplied browser bookmarklet. For people used to Papers’ integrated search, this is an ugly throwback. Yet the Mendeley website lists “Search across external databases” in the feature comparison matrix as “Almost there!” With luck, this is an important feature that Mendeley won’t lack for too long.

Mendeley can auto-import PDFs that you save into a configurable directory. When I last checked this feature out a few years ago, it didn’t read JSTOR metadata in the PDFs correctly. You had to do a tedious clean-up of the resultant data by hand in the Desktop app. If this applied to you, it negated the feature and created dispiriting extra manual work. Later versions may have fixed this defect, but I have not yet tried it with the current version.

Mendeley does have a search tool for searching papers that other users have imported into Mendeley. This is helpful if you are in a field that has a lot of Mendeley users. But if you are not, then you won’t find many results. I tried searching for something obvious in my field and got only two pages of results, most of which I already had in my library. Any one of the relevant online databases would have given back hundreds of results. So you need a large pool of researchers in your field for this to be a great feature.

There are also some other minor drawbacks to using Mendeley.

  1. In the desktop application, online, and in the iOS app the columns you can view and sort in your research database are limited. They are not at all flexible or in any way configurable. For example, you can’t view and sort by citation key. You can filter by publication or by author name using a side-bar on the left. You can search your own collection though, and that’s pretty flexible.

  2. The reference manager, which inserts citations into documents and builds your bibliography automatically, is only available for Word. The documentation implies that Open Office and LaTeX options are also available, although Word was the only option on the menu that I saw. I don’t use Word or Open Office for my research publications (and you should not either, word processors suck!). I use Pandoc so I guess I’m plumb out of luck. You can insert citations in several different portable flavours with Papers’ Citation.app. These include Pandoc and Multimarkdown, as well as Papers’ own format. Papers has more options for citations, if only I could have gotten it to be reliable. Both products use the CSL format for citation formatting (as does, for example, the citeproc tool which Pandoc relies on). Mendeley needs to add Pandoc and Multimarkdown citation insertion support. But Mendeley’s sensible citation key generation, combined with Pandoc’s simple reference style, makes manual insertion of citations pretty easy: @Mcphee2014 p. 1.

In Mendeley, author names that have apostrophes in them, such as “O’Dwyer”, generate invalid citation keys in the .bib file (e.g. “O’Dwyer1999”). You have to perform a manual edit of the citation key in the desktop app to fix it to something valid, e.g. “ODwyer1999”. This is a known bug in Mendeley; let’s hope they fix it soon.
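Until it is fixed, a small script can at least find the offending keys in the exported .bib file so you know which entries to edit in the desktop app. This is a minimal sketch: the function name and the sample entries are made up, and the set of “safe” characters is my own conservative assumption rather than an official BibTeX rule.

```python
import re

# Matches the key of each entry, e.g. "@article{Mcphee2014," -> "Mcphee2014".
ENTRY_KEY = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)
# Assumption: a conservative character set that is safe in citation keys.
SAFE_KEY = re.compile(r"^[A-Za-z0-9_:.+/-]+$")

def suspicious_keys(bibtex_source):
    """Return citation keys containing characters outside the safe set."""
    return [key for key in ENTRY_KEY.findall(bibtex_source)
            if not SAFE_KEY.match(key)]

# Made-up entries for illustration only.
SAMPLE_BIB = """@article{O'Dwyer1999,
  author = {O'Dwyer, A.},
  year = {1999}
}
@book{Mcphee2014,
  title = {Example}
}
"""

print(suspicious_keys(SAMPLE_BIB))
```

It only reports the keys; since Mendeley regenerates the .bib file on sync, the fix itself still has to be made in the desktop app.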

However, putting up with those drawbacks beats losing research data to database corruptions! If Papers didn’t have such a large range of fatal data reliability bugs, its many extra features would make it the more interesting product. Trust in your research database’s reliability has to be absolute for any researcher. The Papers team have left their users in the lurch on this score. Promising to fix it in future updates just doesn’t cut it. Such fatal bugs should never be in a public release. And once detected post-release, an emergency patch should be available within hours. Anything less shows a fundamental misunderstanding of software engineering principles. Mendeley is not as flashy or as feature-rich as Papers, and lacks many advanced features, but gets the basics right. Also, the Papers developers shut down public threads on their support site, to keep negative comments from being visible. This is a terrible, non-open way to approach support issues! Without a public forum, users can’t solve each other’s problems. They have to rely on official support channels only, which can take weeks to answer the simplest of queries. Or they use unreliable unofficial channels. Without public forums, critical bug reports, generated by their hasty release of a poor-quality beta version of Papers3 (that they had the gall to charge money for), overwhelmed the support staff. The desire to control what their users were saying about their product resulted in a major loss of reputation.

I will sum up with an analogy. Mendeley is like a basic model car that is unremarkable in features and gizmos and only comes in one color. But it gets you from A to B with pretty good fuel economy and in a reliable fashion too (imagine a 1980s Japanese sedan). In contrast, Papers is like a nice-looking car with tons of nice styling and loads of gizmos and advanced features as standard. But every third morning it won’t start without a complete oil change and full service. Once a year it tends to dump its gearbox on the freeway while you are in the middle of driving it (imagine a 1970s Fiat, Rover or Leyland). Thus, while I’d love to have a beautiful, stylish car with all the bells and whistles, the tow truck and mechanic’s bills (and the time wasted) are killing me, and preventing me from getting to work on time, and sometimes at all. So … no. Mendeley it is.

Athenian Democracy and War



Dr. David Pritchard, Senior Lecturer in Greek History in my department at the University of Queensland, talks to Dr Anastasia Bakogianni of Open University about his research dealing with the Athenian Democracy and its relationship with Athenian war-making, for Classics Confidential’s very worthwhile video series. Visit Classics Confidential – The Dark Side of Democracy, with David Pritchard