Some of you may be aware that I’ve started programming for iOS in recent weeks. What I’ve been doing so far, a native iOS interface to the Perseus Online Latin Word Tool, has really been a warm-up (to build up my Objective-C and Cocoa Touch chops) for the thing that I really want to build. I won’t go too far into that because it would be too boring to explain in detail. Let’s just call it “The Livy Electronic Reader”. Think of it as an iPad app that allows you to build your own translation and commentary of Livy (or some subsection of it) and, as an ancillary, publish the data out to a shared Dropbox directory (or maybe iCloud, or possibly a shared publishing mechanism, perhaps something like a “Livy wiki”). My plan, once I’ve done enough for Livy, is to perhaps extend it to other authors, e.g. Caesar and Tacitus. I picked Livy because that’s my research interest. I’m really building a tool for myself to use for my PhD.
However, if anyone can answer the following questions about Perseus, its data format and URL scheme, or knows where I can find answers, I’d be much obliged.
The first question I have is: why does the XML interface behave inconsistently in the data it returns? Can it be made consistent? More importantly, can it be made predictable and therefore computable?
Here are some examples of what I mean.
First, something that behaves reasonably predictably. These first links are to Caesar’s de Bello Gallico:
- This link: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0002:book=1 returns all of book 1 (well, I didn’t check all the way to the bottom of the document, but it looks right to me).
- However, if you try this link: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0002:book=1:chapter=1 you will see 1.1 of that same work.
- Can you see the pattern developing here? Try this one: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0002:book=1:chapter=1:section=5 and it shows you 1.1.5, as you would predict.
Clearly, the “document name” is “1999.02.0002” and by adding arguments :book=n :chapter=n and :section=n you can select more or less of the content as you wish. Perfect! Give me a reference to Caesar and I can retrieve the text in an easily transformable XML format.
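To make that pattern concrete, here is a minimal sketch in Python (not the Objective-C I’d actually use in the app). The function name `perseus_chunk_url` and its parameters are my own invention for illustration; only the URL shape is taken from the links above, and it assumes the book/chapter/section pattern holds for the document in question.

```python
BASE_URL = "http://www.perseus.tufts.edu/hopper/xmlchunk"


def perseus_chunk_url(document, book=None, chapter=None, section=None):
    """Build an xmlchunk URL from a document name and an optional reference.

    Hypothetical helper: it only mirrors the URL pattern observed in the
    Caesar links and knows nothing about what Perseus actually returns.
    """
    # A narrower level only makes sense if the broader one is given too.
    if chapter is not None and book is None:
        raise ValueError("chapter given without book")
    if section is not None and chapter is None:
        raise ValueError("section given without chapter")

    parts = ["Perseus:text:" + document]
    for name, value in (("book", book), ("chapter", chapter), ("section", section)):
        if value is None:
            break
        parts.append("{}={}".format(name, value))

    return BASE_URL + "?doc=" + ":".join(parts)


if __name__ == "__main__":
    # Caesar 1.1.5, as in the third link above.
    print(perseus_chunk_url("1999.02.0002", book=1, chapter=1, section=5))
```

Values are passed through as-is, so a non-numeric chapter label (if a document uses one) can be given as a string.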
In contrast to the logical behaviour above, consider the following links to Weissenborn and Muller’s 1898 edition of Livy’s text.
- The first link I tried, I expected to behave like the first Caesar link above: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=1 … however, it returns only 1.1.1 (^ actually I’ll get to exactly what the text is in a second).
- This link: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=1:chapter=1 behaves as one expects and returns 1.1 (^)
However, if you look at those two links, you’ll see the text is not the same text. That’s because what I labelled above as “1.1.1” is really book 1, praefatio 1. You can access it directly with this URL:
- http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=1:chapter=pr (the entire preface).
OK, so maybe the Livy text is thrown by the presence of the special “preface” chapter.
- Let’s try book 2: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=2 … nope, that’s definitely only 2.1.1
- http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=2:chapter=1 … and this one is all of 2.1
- We can also select a specific section: http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0169:book=2:chapter=1:section=8 (2.1.8).
Additionally, if you want, say, book 22 chapter 1, you might predict that this could work:
But sadly, no. That’s not even a valid document. I guess the later books are in different editions, and thus to get to book 22, it’s an entirely different document URL.
Let’s try to get the whole book contents:
- http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus:text:1999.02.0170:book=22 … Ah, nope, it’s just 22.1.1 again.
Predictability is one of the greatest virtues of a URI scheme, and this seems to break it. The data inside the documents suggests the text is broken into book, chapter and section, and the URL retrieval scheme suggests it can be retrieved as such, but the behaviour differs depending on the document (Livy or Caesar).
So it looks like in my app I’ll have to build a static tree of the document URLs rather than computing them on the fly, which is usually a much better way of doing things.
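For what it’s worth, the static approach might start out something like this rough sketch (again in Python rather than Objective-C). The table `LIVY_DOCUMENTS` and both function names are hypothetical, and the table deliberately contains only the three book-to-document pairings that appear in the links in this post; everything else would have to be filled in by hand. It also assumes that chapter-level URLs behave for every Livy document the way they do for books 1 and 2, which I have not verified.

```python
# Book number -> Perseus document name. Only the pairings seen in the links
# above are listed; this is NOT a complete map of Livy on Perseus.
LIVY_DOCUMENTS = {
    1: "1999.02.0169",
    2: "1999.02.0169",
    22: "1999.02.0170",
}


def livy_document_for(book):
    """Look up which document holds a given book of Livy (hypothetical helper)."""
    if book not in LIVY_DOCUMENTS:
        raise LookupError("no known Perseus document for Livy book {}".format(book))
    return LIVY_DOCUMENTS[book]


def livy_chapter_url(book, chapter):
    """Build a chapter-level URL, since book-level URLs return only one section."""
    document = livy_document_for(book)
    return (
        "http://www.perseus.tufts.edu/hopper/xmlchunk"
        "?doc=Perseus:text:{}:book={}:chapter={}".format(document, book, chapter)
    )


if __name__ == "__main__":
    print(livy_chapter_url(2, 1))
```

The lookup fails loudly for any book not yet in the table, which seems safer than guessing a document name and getting back an invalid document.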
Does anyone have any insight to this behaviour of Perseus? Suggestions? Comments?