BU ASL Corpus as ELAN

ASL Avatar at DePaul University

The ASL corpus from the National Center for Sign Language and Gesture Resources (NCSLGR) at Boston University [Neidle and Vogler 2012] is now available in ELAN format. This makes the only publicly available ASL corpus accessible through the tools and interface of ELAN.

The name of each new .eaf file combines the original SignStream file name with the name of one of its associated media files. For example, the .eaf file containing the converted data from the original SignStream file accident.xml, which annotates the video file 1065_small_2.mov, is named accident_1065_small_2.eaf.
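The naming rule can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the code used for the conversion; the function name eaf_name is hypothetical.

```python
from pathlib import Path


def eaf_name(signstream_file: str, media_file: str) -> str:
    """Build an .eaf file name from a SignStream file name and a media file name.

    Hypothetical helper: it mirrors the naming convention described in the
    text and is not part of ELAN or SignStream.
    """
    # Path.stem drops only the final extension, so spaces in names are kept.
    signstream_stem = Path(signstream_file).stem
    media_stem = Path(media_file).stem
    return f"{signstream_stem}_{media_stem}.eaf"


print(eaf_name("accident.xml", "1065_small_2.mov"))  # accident_1065_small_2.eaf
```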

For information on installing ELAN, see the ELAN download page. An excellent online user's guide is available at www.mpi.nl/corpus/html/elan_ug/index.html.

Installation


  1. Download

  2. Unzip both files into a common folder. The directory structure should look like this:

              mycorpus/
                  elanBUcorpus/
                  video/

  3. If you have not done so already, download and install ELAN.

  4. Open ELAN and select any file in the elanBUcorpus folder.
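Before opening ELAN, the layout from step 2 can be checked with a short script. The sketch below assumes the common folder is named mycorpus, as in the example above; the function name missing_subfolders is hypothetical.

```python
from pathlib import Path

# Subfolder names taken from the directory structure shown in step 2.
EXPECTED_SUBFOLDERS = ("elanBUcorpus", "video")


def missing_subfolders(corpus_root):
    """Return the expected subfolders that are absent from corpus_root."""
    root = Path(corpus_root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "mycorpus" is the example folder name; replace it with your own path.
    missing = missing_subfolders("mycorpus")
    if missing:
        print("Missing subfolders:", ", ".join(missing))
    else:
        print("Directory structure matches the expected layout.")
```

An empty result means both subfolders sit side by side in the common folder.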

Status


In a conversion project of this magnitude, not all data can be converted completely. The following is the current state of the conversion process.

SignStream file               Conversion status
accident.xml                  Conversion complete; compatible with ELAN
ali.xml                       Videos not properly synced and of two lengths:
                              muhammed_ali_1052_small_0.mov is 5:14.47;
                              muhammed_ali_1052_small_2.mov and
                              muhammed_ali_1052_small_3.mov are 2:37.23
biker.xml                     Conversion complete; compatible with ELAN
boston-la.xml                 Conversion complete; compatible with ELAN
close call.xml                Conversion complete; compatible with ELAN
dorm prank.xml                Conversion complete; compatible with ELAN
DSP Dead Dog Story.xml        Conversion complete; compatible with ELAN
DSP Immigration Story.xml     Conversion complete; compatible with ELAN
DSP Intro to a Story.xml      Conversion complete; compatible with ELAN
DSP Ski Trip Story.xml        Video compromised, image distorted/missing
football.xml                  Conversion complete; compatible with ELAN
lapd.xml                      Conversion complete; compatible with ELAN
ncslg10a.xml - ncslg10t.xml   850 sentences, each in a separate .eaf file.
                              Not all sentences have been tested in the
                              new format, but those that have are fully
                              compatible with ELAN.
roadtrip1.xml                 Conversion complete; compatible with ELAN
roadtrip2.xml                 Annotations are out of sync with beginning
                              of video by a little more than 1 second.
                              Second half of annotations is missing.
scarystory.xml                Conversion complete; compatible with ELAN
siblings.xml                  Conversion complete; compatible with ELAN
speeding.xml                  Conversion complete; compatible with ELAN
three pigs.xml                Split over two .eaf files:
                              three pigs_ben_story_443_3.eaf and
                              three pigs_ben_story_445_0.eaf
whitewater.xml                Conversion complete; compatible with ELAN

Notes


  • The directory structure must follow the model shown under Installation exactly; otherwise you will have to locate the videos yourself.

  • Please direct questions or comments to Rosalee Wolfe wolfeNoSpam@cs.depaul.edu
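To see which videos a given .eaf file expects, which bears on the first note above, the media entries in its header can be listed. The sketch below relies on the standard EAF layout, in which the HEADER element holds MEDIA_DESCRIPTOR elements carrying MEDIA_URL and RELATIVE_MEDIA_URL attributes. Whether every converted file fills in both attributes is an assumption, and the function name referenced_media is hypothetical.

```python
import xml.etree.ElementTree as ET


def referenced_media(eaf_source):
    """Return (MEDIA_URL, RELATIVE_MEDIA_URL) pairs from an .eaf header.

    eaf_source may be a file path or an open file object. An attribute
    that is absent from a descriptor is returned as None.
    """
    root = ET.parse(eaf_source).getroot()
    return [
        (descriptor.get("MEDIA_URL"), descriptor.get("RELATIVE_MEDIA_URL"))
        for descriptor in root.iterfind("HEADER/MEDIA_DESCRIPTOR")
    ]


# Example call, assuming the layout from the Installation section:
#   referenced_media("mycorpus/elanBUcorpus/accident_1065_small_2.eaf")
```

If the listed paths do not resolve to files in the video folder, ELAN will not find the videos on its own.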


Christian Vogler's SignStream XML parser made this project possible. For more information, see:

Vogler, Christian. (N.D.) SignStream-XMLParser. Retrieved August 3, 2013 from ASLLRP: http://www.bu.edu/asllrp/ncslgr-for-download/signstream-parser.zip

Also see

Carol Neidle and Christian Vogler [2012] "A New Web Interface to Facilitate Access to Corpora: Development of the ASLLRP Data Access Interface," Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, LREC 2012, Istanbul, Turkey.