inlustre monumentum est

~ An Antipodean View on Classical Greece, Rome & the Mediterranean.

Category Archives: Greek Classics

How to retrieve ancient text data from Perseus

10 Wednesday Apr 2013

Posted by scot mcphee in Digital Classics, Greek Classics, Latin Classics, Software & Tools

≈ 1 Comment

Tags

data, digital humanities, digital resources, Perseus, software systems, x=x

In my last post I described a problem with the Perseus URL schema: it is not entirely predictable, and therefore not computable, from one body of text to another (e.g. from Livy to Caesar). By that I mean the way the URLs are formed, what constitutes a ‘body of text’, what you might expect to see returned from a request, and how all of that varies with each textual work.

Update: Schema will now include a ‘urn’ attribute

Warning: this is a long and somewhat technical post about using the Perseus CTS API to fetch classical texts as XML data

This stuff is important for software developers and “digital classicists” (that is, classicists who work with computer-information systems for analysing information about the classical world).

On the Digital Classics mailing list, some helpful hints emerged in response to my queries. The first is that the Perseus XML interface I was using (the one behind the helpful “XML” button at the bottom of each passage in the HTML version that you typically use with your web browser) is probably on its last legs.

CTS Overview

The more up-to-date (but still in beta) version is Perseus CTS, where “CTS” stands for Canonical Text Services. CTS is built on work done by the Homer Multitext Project.

CTS appears to have three main functional components:

  • A catalogue service (actually called “getCapabilities”)
  • A reference validation and exploration service
  • A service that retrieves text

Some commentary on its limitations

What it is missing is a search service. The catalogue is huge. It lists every available Greek and Roman text in the Perseus database and includes details of all editions and translations of each text. It’s available at http://www.perseus.tufts.edu/hopper/CTS?request=GetCapabilities but I’m deliberately not linking that URL, because you shouldn’t click on it just yet. It’s 2.1 MB of XML. Your browser may not especially like it. Mine only manages to load it properly half the time.

When you do manage to download it and save it on your local disk (highly recommended), you’ll see it’s a pretty comprehensive catalogue of the data. Unordered. With no links to the texts in either the reference validation or text retrieval services, and nothing obvious as a field that gives you the unique identifier needed.

What the references are constructed from

The reference validation service assumes you already know the reference you want to validate (and whose sub-components you want to discover). But you need that first-level peek into the initial reference. Perseus uses the Thesaurus Linguae Graecae referencing system for Greek texts, and the Packard Humanities Institute PHI Latin Texts system for Latin texts. Both principally organise their respective corpora around authors, assigning each their own index number. Thus, Homer is ‘tlg0012’ and Livy ‘phi0914’.

The references are formatted into a type of reference called a URN.

How to create the references

Now I’m going to tell you how to construct a functional reference ID for the CTS system.

First thing, load the catalogue URL into your browser. I’m not going to link it but cut and paste this one into your browser: http://www.perseus.tufts.edu/hopper/CTS?request=GetCapabilities – if you know how to use Wget or Curl use that instead.

Save the file to a convenient location on your disk. I called mine “CTS.xml”.
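If you would rather script the download than use a browser, here is a minimal Python sketch. It only assumes the catalogue URL quoted above still answers; the function name `download_catalogue` and the 60-second timeout are hypothetical choices for illustration, not part of any Perseus tooling.

```python
import urllib.request
from pathlib import Path

# Catalogue URL exactly as quoted in this post (assumption: it is still live).
CATALOGUE_URL = "http://www.perseus.tufts.edu/hopper/CTS?request=GetCapabilities"


def download_catalogue(dest="CTS.xml", url=CATALOGUE_URL, timeout=60):
    """Save the catalogue to disk once; later calls reuse the local copy."""
    path = Path(dest)
    if path.exists():
        return path  # already on disk, so do not fetch 2.1 MB again
    with urllib.request.urlopen(url, timeout=timeout) as response:
        path.write_bytes(response.read())
    return path
```

Calling `download_catalogue()` writes “CTS.xml” into the current directory the first time and simply returns the existing path after that.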

Open the file in a text editor. Notepad won’t cut it. Word most certainly will not (it’s not even a text editor!). On the Mac, I recommend BBEdit. [Update: it's been pointed out on the mailing list that Oxygen XML editor is an ideal tool. I use this tool at work and have it on my Mac at home. An Academic licence is $99, a full one nearly $500. Unless you do extensive work in XML I would not recommend buying it. On Windows, Internet Explorer is probably the default program for an XML file. It, or Safari on the Mac, will suffice to read the document. Google's Chrome also works pretty well. Browsers will also "pretty print" the XML to make it easier to view.]

Use your editor’s search capability to find the author you want.

The ‘textgroup’ (normally the author) identifies the first level

You’ll find that the author’s work is contained in an XML element called “textgroup”. Here’s the text group for Livy, along with the groupname element identifying it:

<textgroup projid="latinLit:phi0914">
  <groupname xml:lang="en">Titus Livius (Livy)</groupname>
  ... (thousands of lines omitted)
</textgroup>

Pay careful attention to the ‘projid’ attribute of the textgroup. This helps form the root of the URN used to identify the text in Perseus. The URN always starts with ‘urn:cts:’. Add the projid to that, like this:

urn:cts:latinLit:phi0914

Check it in the reference validation service

That URN covers all texts, editions and translations of Livy in the Perseus database. Here’s a link to the reference validation service: http://www.perseus.tufts.edu/hopper/CTS?request=GetValidReff&urn=urn:cts:latinLit:phi0914. If you open that link, you’ll see, in XML, a list of all the available URNs for every version, edition and translation of Livy in the database. But unfortunately, there is no descriptive information about what each version, edition or translation is!

We still need the catalogue file. Go back to the catalogue file.

The ‘work’ identifies the next level of reference

Search for a book. In my case, let’s look for “Book 1” of Livy. You’ll see the catalogue file is unordered. In the version I looked at, the Livy books started at Book 11 (What? One of the missing books is miraculously in the Perseus database, I hear you say? Unfortunately, it’s just the periocha of Book 11). The unordered nature of the database makes it especially annoying: you have to search, and you can’t browse.

Anyway the entry for Book 1 looks something like this:

<textgroup projid="latinLit:phi0914">
  <groupname xml:lang="en">Titus Livius (Livy)</groupname>
  <!-- ... (thousands of lines omitted) -->
  <work projid="latinLit:phi0011" xml:lang="lat">
    <title xmlns="http://purl.org/dc/elements/1.1/" xml:lang="en">
    The History of Rome, Book 1</title>
    <!-- ... (thousands of lines omitted) -->
  </work>
</textgroup>

See how the Book is contained in an XML element called “work”? Note the “projid” attribute of the work. In this case, we don’t need the “latinLit:” part; the interesting part of the id is the “phi0011”: that’s the ID for Book 1 of Livy. We add it to the URN we’ve been constructing as follows:

urn:cts:latinLit:phi0914.phi0011
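The search-by-hand step can also be scripted. The sketch below is an illustration under stated assumptions: it runs against a tiny stand-in document assembled from the fragments quoted above (the `TextInventory` root element is an assumption about the real file), it ignores XML namespaces altogether, and the helper names `local` and `find_works` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the real catalogue, built from the fragments quoted in this post.
# Assumption: the root element name; only textgroup/groupname/work/title matter here.
SAMPLE_CATALOGUE = """
<TextInventory>
  <textgroup projid="latinLit:phi0914">
    <groupname xml:lang="en">Titus Livius (Livy)</groupname>
    <work projid="latinLit:phi0011" xml:lang="lat">
      <title xmlns="http://purl.org/dc/elements/1.1/" xml:lang="en">
      The History of Rome, Book 1</title>
    </work>
  </textgroup>
</TextInventory>
"""


def local(tag):
    """Drop any '{namespace}' prefix so matching works whatever the file declares."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def find_works(root, author_fragment):
    """Yield (textgroup projid, work projid, title) where the groupname matches."""
    needle = author_fragment.lower()
    for group in root.iter():
        if local(group.tag) != "textgroup":
            continue
        names = [c.text or "" for c in group if local(c.tag) == "groupname"]
        if not any(needle in name.lower() for name in names):
            continue
        for work in group:
            if local(work.tag) != "work":
                continue
            titles = [c.text or "" for c in work if local(c.tag) == "title"]
            # Collapse the line breaks and indentation inside the title text.
            title = " ".join(titles[0].split()) if titles else ""
            yield group.get("projid"), work.get("projid"), title


# For the downloaded file, use ET.parse("CTS.xml").getroot() instead.
root = ET.fromstring(SAMPLE_CATALOGUE)
for group_id, work_id, title in find_works(root, "livy"):
    print(group_id, work_id, title)
```

Matching on the local tag name is a deliberate shortcut: it avoids having to know which default namespace the catalogue declares.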

The ‘edition’ and/or ‘translation’ identifies a specific version of the work

While that’s supposed to be a valid reference to Livy’s Book 1, Perseus contains at least two Latin editions of the text and three English translations. These are listed inside the “work” element in either “edition” or “translation” elements, like so (for brevity I have omitted some lines that give data about the citation system of the edition):

<work projid="latinLit:phi0011" xml:lang="lat">
  <title xmlns="http://purl.org/dc/elements/1.1/" xml:lang="en">
   The History of Rome, Book 1</title>
  <edition projid="latinLit:perseus-lat1">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Titi Livi ab urbe condita libri 
     editionem priman curavit Guilelmus Weissenborn editio altera auam
     curavit Mauritius Mueller Pars I. Libri I-X. Editio Stereotypica.
     Titus Livius. W. Weissenborn. H. J. M&amp;#252;ller. Leipzig. 
     Teubner. 1898. 1.
    </description>
    <!-- some lines omitted -->         
  </edition>
  <translation projid="latinLit:perseus-eng1">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Livy. Books I and II With An
     English Translation. Cambridge. Cambridge, Mass., Harvard 
     University Press; London, William Heinemann, Ltd. 1919.
    </description>
    <!-- some lines omitted -->         
  </translation>
  <edition projid="latinLit:perseus-lat2">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Livy. Books I and II With An
     English Translation. Cambridge. Cambridge, Mass., Harvard 
     University Press; London, William Heinemann, Ltd. 1919.
    </description>
    <!-- some lines omitted -->         
  </edition>
  <edition projid="latinLit:perseus-lat3">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Livy. Ab urbe condita. Robert
     Seymour Conway. Charles Flamstead Walters. Oxford. Oxford 
     University Press. 1914. 1.</description>
    <!-- some lines omitted -->         
    <memberof collection="Perseus:collection:Greco-Roman"></memberof>
  </edition>
  <translation projid="latinLit:perseus-eng2">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Livy. History of Rome by Titus
     Livius, the first eight Books. literally translated, with notes 
     and illustrations, by. D. Spillan. York Street, Covent Garden,
     London. Henry G. Bohn. John Child and son, printers. 1857. 1.
    </description>
    <!-- some lines omitted -->         
  </translation>
  <translation projid="latinLit:perseus-eng3">
    <label xml:lang="en">The History of Rome, Book 1</label>
    <description xmlns="" xml:lang="en">Perseus:bib:oclc,2311635, Livy.
     History of Rome. English. Translation by. Rev. Canon Roberts. New
     York, New York. E. P. Dutton and Co. 1912. 1. Livy. History of 
     Rome. English Translation. Rev. Canon Roberts. New York, New York.
     E.P. Dutton and Co. 1912. 2.</description>
    <!-- some lines omitted -->         
  </translation>
</work>

Now, assuming we’re after the Teubner edition of the text (the first one), we can use that edition’s ‘projid’ attribute as before. Stripping the ‘latinLit:’ prefix from it and adding the rest to the URN we’ve been building up, we get:

urn:cts:latinLit:phi0914.phi0011.perseus-lat1

This is the complete reference to the Weissenborn & Mueller edition of Livy’s Book 1 published by Teubner.
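The three steps so far (textgroup, work, edition) boil down to some string handling. Here is a minimal Python sketch of that recipe; the function names `strip_namespace` and `build_urn` are hypothetical, and the rules are only the ones worked out in this post for Livy, so other authors may need adjustments.

```python
def strip_namespace(projid):
    """'latinLit:phi0011' -> 'phi0011'; a value without a colon is returned as-is."""
    return projid.split(":", 1)[-1]


def build_urn(textgroup_projid, work_projid=None, version_projid=None):
    """Assemble a CTS URN from catalogue 'projid' values, as described above.

    Assumption: the namespace (e.g. 'latinLit') is taken from the textgroup's
    projid, and the work and edition/translation ids are appended with dots.
    """
    namespace, _, group = textgroup_projid.partition(":")
    if not group:
        raise ValueError(f"textgroup projid needs a namespace: {textgroup_projid!r}")
    if version_projid and not work_projid:
        raise ValueError("an edition or translation needs a work")
    parts = [group]
    if work_projid:
        parts.append(strip_namespace(work_projid))
    if version_projid:
        parts.append(strip_namespace(version_projid))
    return f"urn:cts:{namespace}:" + ".".join(parts)


print(build_urn("latinLit:phi0914", "latinLit:phi0011", "latinLit:perseus-lat1"))
```

With the Livy values from the catalogue, this prints the same URN that was built by hand above.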

Check it in the reference service

We can hit up the reference validation service with that URN as follows: http://www.perseus.tufts.edu/hopper/CTS?request=GetValidReff&urn=urn:cts:latinLit:phi0914.phi0011.perseus-lat1 – you will see a complete collection of URNs for the distinct parts of Book 1 in the Teubner edition of the text.

URNs for specific passages

This URN is all of the preface that’s found at the start of Book 1:

urn:cts:latinLit:phi0914.phi0011.perseus-lat1:pr

This URN is all of Chapter 1 of Book 1 (not including the preface):

urn:cts:latinLit:phi0914.phi0011.perseus-lat1:1

You can also get parts of chapters. Here is 1.4.2 (Book 1, chapter 4, section 2):

urn:cts:latinLit:phi0914.phi0011.perseus-lat1:4.2

Fetch the text chunk you want

These URNs are passed to the ‘urn’ parameter of the Perseus text retrieval service like this: http://www.perseus.tufts.edu/hopper/CTS?request=GetPassage&urn=urn:cts:latinLit:phi0914.phi0011.perseus-lat1:pr (that’s the preface).
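All three services share one endpoint and differ only in the ‘request’ and ‘urn’ parameters, so the URLs can be generated. A minimal Python sketch follows; the names `CTS_ENDPOINT`, `cts_url` and `fetch` are hypothetical, and the endpoint is simply the one quoted in this post, which may no longer respond.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint as quoted in this post (assumption: unchanged since it was written).
CTS_ENDPOINT = "http://www.perseus.tufts.edu/hopper/CTS"


def cts_url(request, urn=None):
    """Build a request URL for GetCapabilities, GetValidReff or GetPassage."""
    params = {"request": request}
    if urn is not None:
        params["urn"] = urn
    # safe=":" keeps the colons in the URN readable instead of encoding them as %3A.
    return f"{CTS_ENDPOINT}?{urlencode(params, safe=':')}"


def fetch(request, urn=None, timeout=60):
    """Return the raw XML bytes of a response (this performs a network call)."""
    with urllib.request.urlopen(cts_url(request, urn), timeout=timeout) as response:
        return response.read()


print(cts_url("GetPassage", "urn:cts:latinLit:phi0914.phi0011.perseus-lat1:pr"))
```

The printed URL is the preface request shown above; swapping in "GetValidReff" gives the reference validation URL for the same URN.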

Anatomy of the URN format used by Perseus

    urn:cts:latinLit:phi0914.phi0011.perseus-lat1:4.2
    {1}:{2}:   {3}  :   {4} . {5}   .  {6}       :{7}
  • {1} It’s a urn. This part is fixed.
  • {2} The urn is part of the ‘cts’ namespace. This part is fixed.
  • {3} The Latin Literature namespace. Would be ‘greekLit’ for Greek texts, and possibly other values.
  • {4} The textgroup’s identifier. It’s normally either the TLG or PHI author index value. In the catalogue it’s contained in the ‘projid’ attribute of the ‘textgroup’ element, stripped of the namespace.
  • {5} The work’s identifier. This may map to an author’s title or to an individual book in a larger collection of texts. This also apparently comes from either TLG or PHI indices (I’ve not verified this fact for sure). In the catalogue it’s contained in the ‘projid’ attribute of the ‘work’ element, stripped of the namespace.
  • {6} The edition of the work. This may also be a translation. This is a Perseus-specific value. In the catalogue it’s contained in the ‘projid’ attribute of the ‘edition’ or ‘translation’ element, stripped of the namespace.
  • {7} The text reference. This will be specific to the work and edition you are referencing. You can get a simple unadorned list of what’s available by querying the reference validation service with the URN up to this point as the argument.

Note how the textgroup, work and edition use dots for separators but otherwise the data element delimiter is a colon.
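Going the other way, a URN can be split back into the seven parts listed above. This Python sketch covers only the shape described in this post (colons between fields, dots inside the textgroup/work/edition field); the `CtsUrn` type and `parse_urn` function are hypothetical names, and anything the post does not describe is rejected rather than guessed at.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # {3} e.g. 'latinLit' or 'greekLit'
    textgroup: str            # {4} e.g. 'phi0914'
    work: Optional[str]       # {5} e.g. 'phi0011'
    version: Optional[str]    # {6} edition or translation, e.g. 'perseus-lat1'
    passage: Optional[str]    # {7} e.g. '4.2' or 'pr'


def parse_urn(urn):
    """Split a URN of the shape shown above into its named parts."""
    fields = urn.split(":")
    # {1} and {2} are fixed; the passage field {7} is optional.
    if len(fields) not in (4, 5) or fields[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN of the shape described here: {urn!r}")
    namespace = fields[2]
    passage = fields[4] if len(fields) == 5 else None
    ids = fields[3].split(".")
    if not 1 <= len(ids) <= 3 or not all(ids):
        raise ValueError(f"expected textgroup[.work[.version]] in {urn!r}")
    ids += [None] * (3 - len(ids))  # pad the missing work/version with None
    return CtsUrn(namespace, ids[0], ids[1], ids[2], passage)


print(parse_urn("urn:cts:latinLit:phi0914.phi0011.perseus-lat1:4.2"))
```

Note that the dots inside the passage reference (‘4.2’) are left alone, because only the fourth colon-separated field is split on dots.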

Commentary

There are still problems:

  • You cannot get all of book 1 in a single hit (at least for Livy).
  • If you want book 2, you have to repeat this process (it’s phi0012)
    • So, Chapter 1 of book 2 of the Teubner text looks like this URN: urn:cts:latinLit:phi0914.phi0012.perseus-lat1:1
    • Rinse and repeat for other books/editions
  • Entirely different authors and works may have different results or slightly different algorithms for building URNs.
  • The catalogue elements ‘textgroup’, ‘work’, ‘edition’ and ‘translation’ should each have a child element, ‘urn’, that builds this URN for you, so that such explanations as I’ve attempted are unnecessary.
  • The reference checking service needs to include a modicum of descriptive information about the URNs that are returned.
  • There needs to be a search service that stitches all this together.
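Until the catalogue carries a ‘urn’ of its own, the missing values can be computed locally. The Python sketch below walks a catalogue and emits a URN, a kind and a label for every edition and translation. It runs on a small stand-in document built from the fragments quoted earlier (the `TextInventory` root element is an assumption about the real file), it ignores XML namespaces, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the downloaded catalogue; for the real thing use
# ET.parse("CTS.xml").getroot(). The root element name is an assumption.
SAMPLE_CATALOGUE = """
<TextInventory>
  <textgroup projid="latinLit:phi0914">
    <groupname xml:lang="en">Titus Livius (Livy)</groupname>
    <work projid="latinLit:phi0011" xml:lang="lat">
      <edition projid="latinLit:perseus-lat1">
        <label xml:lang="en">The History of Rome, Book 1</label>
      </edition>
      <translation projid="latinLit:perseus-eng1">
        <label xml:lang="en">The History of Rome, Book 1</label>
      </translation>
    </work>
  </textgroup>
</TextInventory>
"""


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def children(element, *names):
    """Direct children whose local tag name is one of `names`."""
    return [child for child in element if local(child.tag) in names]


def short_id(element):
    """The element's projid with the 'latinLit:'-style prefix removed."""
    return element.get("projid", "").split(":", 1)[-1]


def version_urns(root):
    """Yield (urn, 'edition' or 'translation', label) for every version found."""
    for group in root.iter():
        if local(group.tag) != "textgroup":
            continue
        namespace = group.get("projid", "").split(":", 1)[0]
        for work in children(group, "work"):
            for version in children(work, "edition", "translation"):
                urn = (f"urn:cts:{namespace}:"
                       f"{short_id(group)}.{short_id(work)}.{short_id(version)}")
                labels = children(version, "label")
                label = " ".join((labels[0].text or "").split()) if labels else ""
                yield urn, local(version.tag), label


for urn, kind, label in version_urns(ET.fromstring(SAMPLE_CATALOGUE)):
    print(urn, kind, label)
```

Run over the full catalogue, the same loop would give a flat, greppable list of URNs with their labels, which goes some way towards the missing search service.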

I hope someone can find this of use.

Digital Classics and the data of ineffable mystery

31 Wednesday Oct 2012

Posted by scot mcphee in Digital Classics, English Literature, Greek Classics, Latin Classics, Literature, Science & Tech, Software & Tools

≈ 1 Comment

Tags

data, digital humanities, literature

I found this article by Stephen Marche, titled Literature is not Data: Against Digital Humanities, in the Los Angeles Review of Books, very thought-provoking as far as polemic goes. Of course literature isn’t just mere “data”; but I also think that data about literature can still give you insight into it. One of the comments, by “mad scientist”, sums up the biggest problem with this critique when it says:

… simply to insist — again — on the ineffable mystery of literature isn’t particularly interesting.

Literature, like all art, does have an element of “ineffable mystery” but that’s not the only thing it has.

Anyway, the entire polemic seems to me to be misplaced. It might be a new feeling for academics of English literature to be relying on databases and software tools, but I suspect most modern Classicists simply couldn’t live without their Perseus or Brepolis access. Perhaps that is because Classicists are also nearly always Classical Historians, and many of us have a close relationship with Archaeology and Archaeologists. Many of us are Archaeologists first and foremost (I’m not, however). Those of us trained in the Internet Age are completely normalised to the idea of databases and digital resources. Many of us have pocket Latin and Greek dictionaries in the form of smartphone applications.

But I think, in the Classical field, it goes to something deeper. Our field has always had an element of this: lonely scholars slaving over commentaries, compiling dictionaries or creating concordances. I certainly do not envy those who came before us and built up databases of texts with an index for every unique word stem used in them! That, to me, sounds like such an amazingly stultifying job description that I’m glad I live in an age when all that prior hard work can be digitised and automated and made available for my daily use at the touch of a button!

But there’s also a great insight that I think is yet to be fully realised. For example, the creation and classification of stemmata codicum, so important to us in understanding how the literature has been transmitted to us through the ages, may be an area that will benefit from future computational insights. Another could be understanding the relationship of texts and authors, and the identification of insertions and errata yet another. These are things which were once done by hand; now the use of computers can speed them up and let scholars do the important work of humanist analysis and understanding rather than the mere donkey-work of collating word-frequency tables and tracing the transmission of stylistic markers in different works. Where the understanding of texts intersects with the understanding of history, the use of computational analysis, like that of definitive archaeological data before it, will also help us to sharpen our focus and broaden our horizons.

I for one welcome our new computer overlords.

A new Classics blog: futurusessay

14 Sunday Oct 2012

Posted by scot mcphee in Academia, Classics, Digital Classics, Greek Classics, Greek History, Latin Classics, Personal, Roman History

≈ Comments Off

Tags

blog, futurusessay, uq

A PhD colleague and friend in the Classics department at UQ has started a Classics blog called futurusessay – a nice play on the Latin for ‘about to be’. He blogs under the moniker ‘Futurus’. Go there and read it!

Getty Villa (review)

29 Saturday Sep 2012

Posted by scot mcphee in Archaeology, Art & Art History, Classical History, Classics, Greek Classics, Greek History, Latin Classics, Roman History

≈ Comments Off

Tags

antiquities, getty villa, los angeles, museums, review

I reviewed the Getty Villa on Yelp. Although I have given it 4 out of 5 stars, I have two critiques of its collection from a professional standpoint, namely:

I think the Villa itself could be put to better use than as a merely beautiful container for the objects. The villa, being a replica Roman villa, could be better used to explain Roman social customs. The first thing to point out is that the owner of the original villa was the Roman equivalent of J. Paul Getty: a very rich man. The structure of the Roman familia could be discussed: the roles of the paterfamilias, his wife and children, and the household slaves. It could go into the daily routine of the Roman household, etc. It could also be used to explain how Greek models of cultured life penetrated Roman life, for example, in the form of the peristyle garden. It could also have at least one interior room with the actual interior decoration of a Roman villa, rather than the heavily Georgian-period block colour models that it follows.

Last, I am not sure of the layout of the collection. Museum studies isn’t my area of expertise, but on reflection I am sure that the thematic grouping of the objects could be improved. For example, in amongst the portraits (divided into men and women) there is a jumble of portrait heads and funerary monuments, Greek and Roman, with no explanation of the difference between burial practices and their evolution over time, or of the social role of the portrait busts and monumental statues. I also had minor issues with some inscription translations put onto the cards.

Does anyone think these are unfair criticisms?

Amphorae VI (2012)

14 Saturday Jul 2012

Posted by scot mcphee in Academia, Archaeology, Egyptian History, Greek Classics, Greek History, Latin Classics, Reception, Roman History

≈ Comments Off

Tags

conference, postgraduate

I just got back from Amphorae VI, which this year was held at Auckland University: three days of excellent postgraduate papers. Big kudos to organisers Lawrence Xu and Nicola Wright and their team of volunteers! As well as hearing some excellent presentations, I got good feedback from several people on my own paper Treachery Worse Than Punic: Livy’s Landscape and Hannibal’s Invasion of Italy, which I will use to hopefully improve it further. I also met and hung out with friends new and old; it’s great to discuss research in informal settings like this. It’s maintained a consistently good quality of papers for six years now! Next year Amphorae VII will be at Sydney University.

Music from the Iliad

23 Wednesday May 2012

Posted by scot mcphee in Digital Classics, Greek Classics

≈ Comments Off

Tags

Iliad, music, x=x

No, I don’t mean music that’s in the Iliad, I mean music that’s generated from the text of the Iliad.

Henry Francis Lynam writes to the Digital Classicist mailing list:

Hi folks,

As the long days of summer approach, I thought a bit of musical entertainment might be in order. I’ve mapped the accents in the Iliad to a 12-note scale to produce some digital music. This uses a Python script to parse the Perseus Ancient Greek files in beta code and extract the accents. It uses EasyABC to produce the score. You can listen to the results at:

https://www.dropbox.com/s/h759nyjc6fa4o0c/Iliad.mp3

The score is available at:

https://www.dropbox.com/s/36cen8d8m72p7m2/Iliad.pdf

Henry.

CFP: International PhD Student Conference Laetae segetes III

19 Thursday Apr 2012

Posted by scot mcphee in Greek Classics, Latin Classics, Medieval History

≈ Comments Off

Tags

call for papers, conference

-----Original Message-----
From: phil.muni.cz [mailto:radova@phil.muni.cz] 
Sent: Tuesday, 17 April 2012 11:11
To: xxxx
Subject: Call for papers: International PhD Student Conference Laetae segetes III

Dear Colleagues,

The Department of Classical Studies, Faculty of Arts, Masaryk University, Brno, Czech Republic would like to formally announce International PhD Student Conference Laetae segetes III, at which beginning researchers can present the fruits of their work. This event is a continuation of similar colloquiums held in 2005 and 2007; on these occasions, young scholars from Central European universities submitted their contributions, the majority of which were published in the conference proceedings – an online version of the proceedings is available on the website http://www.phil.muni.cz/wuks/home/publikace

CONFERENCE DATE AND PLACE: November 13–16, 2012, Brno, Czech Republic.

ABSTRACTS

Abstracts of papers to be presented in English, German, Italian, or French are invited for consideration by the Conference Academic Committee. Please submit your abstract (up to 200 words) in the attached submission form until August 31, 2012 via e-mail to the following address: radova@phil.muni.cz or marie.okacova@mail.muni.cz. Acceptance notification will be sent to you till September 13, 2012.

PRESENTATIONS

Individual 15–20-minute paper presentations will be followed by 5 minutes of discussion.

PROGRAMME

Parallel sessions and panel discussions will be scheduled over four days; papers will be grouped by sessions (Ancient Greek and Latin literature; Classical languages; Latin Middle Ages and Byzantology; Neo-Latin and Modern Greek studies). The conference programme will be available on the website http://www.phil.muni.cz/wuks/

REGISTRATION

Standard registration fee is 45 EUR/1 100 CZK.

Payment should be made by bank transfer until October 13, 2012. Registration can be done via University Shopping Centre, where you get a confirmation of your registration: https://is.muni.cz/obchod/baleni/58520?lang=en

The participation fee includes: conference proceedings, reception meal (as will be specified in the conference programme) and refreshment during coffee breaks.

Participation fee does NOT include: hotel booking and payment, and excursion. The organizing committee will book rooms for the conference participants only at the University Hotel (Garni); single room: ca 33 EUR per night; double room: ca 40 EUR per night (two persons) – the stated prices are valid from 1 January, 2012.

PUBLICATION

All papers will be considered for publication in refereed conference proceedings that will be launched in 2013.

On behalf of the conference organizing committee, with kind regards, Irena Radová and Marie Okáčová Conference Coordinators

Department of Classical Studies
Faculty of Arts, Masaryk University
Arna Nováka 1
602 00 Brno
Czech Republic
Tel.: 00420 549 49 3850
Fax : 00420 549 49 37 41
website: http://www.phil.muni.cz/wuks/

Ancient and Modern Olympics blog

18 Wednesday Apr 2012

Posted by scot mcphee in Ancient Religion, Archaeology, Greek Classics, Greek History, Latin Classics, Roman History

≈ Comments Off

Tags

blog, olympics, sport

An interesting blog about the Ancient Olympics: Ancient and Modern Olympics, which uses evidence from inscriptions, wills, pottery and so forth to illustrate various aspects of the Olympics.

(Via the CLASSICISTS mailing list.)

CFP: Greek Myths on the Map (Bristol July 2013)

18 Wednesday Apr 2012

Posted by scot mcphee in Ancient Religion, Greek Classics, Latin Classics

≈ Comments Off

Tags

call for papers, conference, myth

Greek Myths on the Map
The Sixth Bristol Myth Conference
31st July – 2nd August, 2013

Greek myths were inextricably connected to the physical
environments in which they were set. This connection is
strikingly evident in the use of myths to explain and
communicate the significance of physical and human geography.
Polybius boldly asserts that “in the present day, now that all
places have become accessible by land or sea, it is no longer
appropriate to use poets and writers of myth as witnesses of the
unknown” (4.40.2). Yet mythology was never entirely banished:
myths were incorporated into geographical descriptions
throughout antiquity and across a broad spectrum of genres,
even as activities such as exploration, conquest and scientific
endeavour altered how the world was understood and perceived.
This conference will examine the various practical and
conceptual roles Greek mythology played in attempts to
describe, represent and explain the physical and human
geography of the ancient world.

We invite proposals for papers on topics related to this theme.
Questions that papers might address include: What motivates
writers to incorporate mythical narratives into geographical
descriptions? What can myths communicate about the
environment that purely geographical description cannot? Do
diverse and changing perceptions of the physical world affect the
ways in which stories about the mythological past are told? How
do mythical geographies relate to physical and conceptual
geographies? In what ways do political, religious or social forces
impact on the interplay between mythical and geographical
thought?

Please send abstracts (c. 250 words) for proposed 25-minute
papers to clasmyth-conference@bristol.ac.uk by Monday, 17th
September, 2012. Informal enquiries may be addressed to the
conference organizers, Jessica Priestley and Greta Hawes, at the same
address.

Formatting Poetry, v.2 | the CAMPVS

02 Monday Apr 2012

Posted by scot mcphee in Digital Classics, English Literature, Greek Classics, Latin Classics, Literature

≈ Comments Off

Tags

css, html, poetry, x=x

A great and simple way to markup poetry with simple CSS in basic HTML can be found at the CAMPVS blog. Formatting Poetry, v.2 | the CAMPVS.
