Source of this document:
http://hdl.handle.net/11041/sldr000735/export-en.txt
French version:
http://hdl.handle.net/11041/sldr000735/export-fr.txt
Dated 15 April 2015
Notes on the Grindmill songs databases
http://hdl.handle.net/11041/sldr000735
Thematic server: http://ccrss.org/database/
PROJECT STATUS
The Grindmill songs team (Asha Ogale, Rajani Khaladka, Jitendra Maid, Bernard Bel) is currently working on:
• Translating song texts to English and French
• Editing Marathi text and checking the consistency of spellings (transcriptions of an unwritten language)
• Checking the consistency of repeated words such as names of places, castes etc.
• Converting Devanagari text to Roman Devanagari
• Extracting sound files classified by identification numbers
I have made a personal commitment to Hema Rairkar and Guy Poitevin to preserve and publish this monumental cultural resource, which they started collecting 30 years ago. Exporting files in Unicode format is the first step towards constructing a comprehensive interface for academic and general audiences worldwide.
Bernard Bel
bernarbel(arobase)gmail.com
http://en.wikipedia.org/wiki/User:Belbernard
------------------
FOR DOCUMENTALISTS
The DATABASES folder contains all grindmill songs databases in formats suited for long-term preservation and reuse:
• Tabulated text (TAB)
• Comma-separated values (CSV, compliant with RFC 4180)
• XML
For instance,
• "SONGS.fp5" is the source file for song texts (Filemaker™ Pro 5-6, not available)
• "SONGS_export.tab" is its export in tabulated text
• "SONGS_export.xml" is its export in XML
• "SONGS_export.csv" is its export in CSV
• "SONGS_export.txt" is a plain-text export of transcriptions, translations and references of sound recordings
Samples of exports (containing the first 75 records) are also available, for example "SONGS_export_sample.xml". These are helpful for setting up import procedures, as the full files may be very large.
All export files are encoded in UTF-8 (8-bit Unicode text encoding).
Export files are directly accessible via their persistent identifiers, for example:
• http://hdl.handle.net/11041/sldr000735/CLASSIFICATION_export.xml
• http://hdl.handle.net/11041/sldr000735/LOCATIONS_export.tab
• http://hdl.handle.net/11041/sldr000735/PERFORMERS_export.csv
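As an illustration, a downloaded export might be loaded in Python as sketched below. This is only a sketch: the function names and the sample rows (including the date values) are hypothetical, and it assumes that TAB exports use no quoting while CSV exports follow RFC 4180 quoting.

```python
import csv
import io

def read_export(text, kind="tab"):
    """Parse the text of a TAB or CSV export into a list of rows (lists of strings)."""
    # newline="" lets the csv module handle line endings itself.
    source = io.StringIO(text, newline="")
    if kind == "tab":
        # Assumption: tabulated exports contain no quoting at all.
        reader = csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
    else:
        # The default dialect handles RFC 4180 style quoting.
        reader = csv.reader(source)
    return [row for row in reader if row]

def load_export(path, kind="tab"):
    """Read an export file from disk; "utf-8-sig" tolerates a leading byte-order mark."""
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return read_export(handle.read(), kind)

# Hypothetical sample data, not taken from the real export files.
sample = "1\tkāvaḷā karī kōṭa\t2015-04-15\n2\tasturī yēḍī jāta\t2015-04-15\n"
rows = read_export(sample)
```

The sample files (first 75 records) are a convenient input for trying this out before loading a full export.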
Export files contain text in both Devanagari and Roman Devanagari, with translations in English and French. Spelling in Devanagari is the most reliable because all source material is in Marathi. However, because transcoding is automatic, spellings in Roman Devanagari (ISO 15919, strict nasalization) are fully consistent with the original Devanagari. Queries and studies may therefore be performed on the Roman-transcoded versions of the texts.
Example:
• कावळा करी कोट एवढ बाभळीच्या बुडी / अस्तुरी येडी जात माया पुरुषाला थोडी
• kāvaḷā karī kōṭa ēvaḍha bābhaḷīcyā buḍī / asturī yēḍī jāta māyā puruṣālā thōḍī
• The crow builds a nest at the foot of a huge acacia / Womankind is naïve! Man has little affection
• Le corbeau fait son nid au pied d'un énorme acacia / Sotte race de femme ! L'homme a peu de tendresse
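A query on the Roman-transcoded text could look like the following sketch. It assumes that the text has been loaded as Python strings; the function names are hypothetical. Unicode normalization (NFC) is applied so that a search term typed with combining diacritics still matches the precomposed letters.

```python
import unicodedata

def normalise(text):
    """Bring text to NFC and fold case so that comparisons are reliable."""
    return unicodedata.normalize("NFC", text).casefold()

def find_songs(songs, word):
    """Return the songs whose Roman text contains `word` as a whole word."""
    target = normalise(word)
    # The "/" separating the two lines of a song is treated as a word boundary.
    return [s for s in songs if target in normalise(s).replace("/", " ").split()]

songs = [
    "kāvaḷā karī kōṭa ēvaḍha bābhaḷīcyā buḍī / asturī yēḍī jāta māyā puruṣālā thōḍī",
]
matches = find_songs(songs, "kāvaḷā")
```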
We use UK English for the spelling of translations and comments.
The database contains 109997 songs, of which 4595 have been recorded. Recordings are available in this archive:
http://hdl.handle.net/11041/sldr000717
Links to the recordings (both AIFF and MP3 formats) are contained in fields 'recording_AIFF_URL' and 'recording_MP3_URL' of the SONGS and RECORDINGS export files. These links are based on persistent identifiers. All links will become valid after completing the extraction of sound files.
For instance:
• http://hdl.handle.net/11041/sldr000717/uvs-01_07.mp3
• http://hdl.handle.net/11041/sldr000717/uvs-05_03.aif
MP3 links are public and may be embedded in streaming-player code to create related pages.
Links to pictures (extracted from the database records) are included in the PERFORMERS and LOCATIONS tables.
For instance:
• http://hdl.handle.net/11041/sldr000717/PERFORMERS_194.png
• http://hdl.handle.net/11041/sldr000717/LOCATIONS_72-h2.png
Every record in an export file contains a modification date. This date should be checked carefully because files may be updated without notice while their persistent identifiers remain unchanged. Therefore, when reusing this material, make sure that modification dates are mentioned in publications.
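To find the date to cite, one might compute the most recent modification date among the records used. The sketch below assumes that records are available as lists of field values with the modification date as the last item; the date format is not specified in this document, so it is passed in by the caller, and the "%d/%m/%Y" format and sample rows shown here are purely hypothetical.

```python
from datetime import datetime

def latest_modification(rows, date_format):
    """Return the most recent modification date among `rows`, or None if there is none.

    Each row is a list of field values whose last item is the modification date.
    """
    dates = [
        datetime.strptime(row[-1], date_format).date()
        for row in rows
        if row and row[-1]
    ]
    return max(dates) if dates else None

# Hypothetical records; the real date format must be checked in the export files.
rows = [["1", "text", "15/04/2015"], ["2", "text", "02/03/2014"]]
latest = latest_modification(rows, "%d/%m/%Y")
```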
---------------
FOR TECHNICIANS
Since it is no longer possible to type Devanagari (ISCII) directly into a FileMaker 5-6 database, all recent corrections have been entered in UTF8 with TextEdit in a file named "SongCorrections.txt". These corrected texts replace the text exported from the database for the songs concerned. Similarly, corrections in Roman Devanagari have been entered in UTF8 with TextEdit in a file named "RomanCorrections.txt".
Devanagari texts (in Marathi) were stored using ISCII encoding in the source databases. (This standard was used by Apple for the encoding of all Indian languages.) I implemented a converter (in Bol Processor BP2, http://hdl.handle.net/11041/sldr000753) to transcode from Devanagari/ISCII to Roman Devanagari with the MyTymes font supplied by École française d'Extrême-Orient (EFEO). The export procedure further converts Devanagari/ISCII and Roman/MyTymes to UTF8, which makes all texts accessible to modern computers whatever the script.
ISCII to UTF8 conversion has been made easy thanks to the "iscii2utf8" PHP class designed by Sunish K. Kurup (sunish_mv@rediffmail.com).
Direct transliteration from Devanagari/UTF8 to Roman Devanagari is further required for corrected songs; we use Vinodh Rajan's ‘Aksharamukha’ (http://www.virtualvinodh.com/aksharamkh/aksharamukha.php) for this purpose. Strict nasalization for ISO 15919 is achieved by a set of additional rewrite rules.
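The actual rewrite rules are not listed in this document. As an illustration only, the sketch below shows one way such rules could be written in Python, assuming that strict nasalization means replacing the anusvara (ṁ) with the nasal of the same class as the following stop or nasal (for example ṁ + g gives ṅg). The function and table names are hypothetical.

```python
import re
import unicodedata

# (class nasal, first letters of the consonants of that class)
CLASSES = [
    ("\u1e45", "kg\u1e45"),            # velar: ṅ before k, kh, g, gh, ṅ
    ("\u00f1", "cj\u00f1"),            # palatal: ñ before c, ch, j, jh, ñ
    ("\u1e47", "\u1e6d\u1e0d\u1e47"),  # retroflex: ṇ before ṭ, ṭh, ḍ, ḍh, ṇ
    ("n", "tdn"),                      # dental: n before t, th, d, dh, n
    ("m", "pbm"),                      # labial: m before p, ph, b, bh, m
]
NASAL_FOR = {first: nasal for nasal, firsts in CLASSES for first in firsts}
ANUSVARA = "\u1e41"  # ṁ

# Match an anusvara only when it is followed by a class consonant.
PATTERN = re.compile(ANUSVARA + "(?=([" + "".join(NASAL_FOR) + "]))")

def strict_nasalize(text):
    """Rewrite ṁ as the class nasal of the following consonant; leave it elsewhere."""
    text = unicodedata.normalize("NFC", text)
    return PATTERN.sub(lambda m: NASAL_FOR[m.group(1)], text)
```

Before sibilants, semivowels or at the end of a word, the anusvara is left unchanged by this sketch.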
All HTML line breaks (the "<br>" tag and its variants) are converted to a single, uniform line-break tag.
Linefeeds (LF = ASCII 10) are kept unchanged in XML files, whereas they are replaced with an HTML line-break tag in TAB and CSV files.
Each record in the SONGS database contains at least a linefeed between the two lines of the song text in Devanagari, Roman Devanagari and translations.
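When reading a TAB or CSV export, the two lines of a song can be recovered by turning line-break tags back into linefeeds. The sketch below assumes that the tag is some spelling of the HTML "<br>" element (the exact spelling should be checked in the files); the function names are hypothetical.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def restore_linefeeds(field):
    """Replace HTML line-break tags in a TAB/CSV field with linefeeds."""
    return BR_TAG.sub("\n", field)

def song_lines(field):
    """Split a song text field into its lines, dropping empty ones."""
    restored = restore_linefeeds(field)
    return [line.strip() for line in restored.split("\n") if line.strip()]

lines = song_lines("kāvaḷā karī kōṭa ēvaḍha bābhaḷīcyā buḍī<br />asturī yēḍī jāta māyā puruṣālā thōḍī")
```

The same function works unchanged on text taken from the XML exports, where the linefeeds are already present.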
If a field is declared 'repeat' in the settings, linefeeds are interpreted as a repeat-value separator, thereby producing multiple instances of the element in the XML file. This is the case with 'photo_landscape_url' and 'photo_portrait_url' in LOCATIONS.
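Repeated elements can be collected as a list when reading the XML export. The sketch below uses Python's standard ElementTree parser; the element names "records", "record" and "location_id", the sample URLs and the date value are hypothetical, since the XML layout is not described here. Only 'photo_landscape_url' and 'date_modified' are field names taken from this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <record> per database record, one child element per field.
SAMPLE = """<records>
  <record>
    <location_id>72</location_id>
    <photo_landscape_url>http://example.org/a.png</photo_landscape_url>
    <photo_landscape_url>http://example.org/b.png</photo_landscape_url>
    <date_modified>2015-04-15</date_modified>
  </record>
</records>"""

def repeated_values(record, field):
    """Return the text of every instance of `field` in a record, in document order."""
    return [element.text for element in record.findall(field) if element.text]

root = ET.fromstring(SAMPLE)
record = root.find("record")
urls = repeated_values(record, "photo_landscape_url")
```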
In all exported files the first field is the primary key of the table. That field will be found under the same name in related tables. Check the fifth line of:
• http://hdl.handle.net/11041/sldr000735/SONGS-settings.txt
• http://hdl.handle.net/11041/sldr000735/PERFORMERS-settings.txt
• http://hdl.handle.net/11041/sldr000735/LOCATIONS-settings.txt
• http://hdl.handle.net/11041/sldr000735/RECORDINGS-settings.txt
• http://hdl.handle.net/11041/sldr000735/TUNES-settings.txt
• http://hdl.handle.net/11041/sldr000735/CLASSIFICATION-settings.txt
The last field of each record is its modification date: 'date_modified'.
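Because the primary key of a table appears under the same name in related tables, records can be linked with a simple lookup. The sketch below assumes that each table has been loaded as a list of rows whose first row holds the field names; the field names 'performer_id', 'song_id' and 'name' and the sample values are hypothetical, and the settings files listed above should be consulted for the real ones.

```python
def index_by_key(rows):
    """Map each primary-key value (first field) to its record, given rows with a header."""
    header, *records = rows
    return header[0], {rec[0]: dict(zip(header, rec)) for rec in records}

def attach_parent(child_rows, parent_rows):
    """Pair every child record with the parent record that shares the parent's key."""
    key_name, parents = index_by_key(parent_rows)
    header, *records = child_rows
    position = header.index(key_name)  # the key keeps its name in the related table
    return [(dict(zip(header, rec)), parents.get(rec[position])) for rec in records]

# Hypothetical tables, not taken from the real export files.
performers = [
    ["performer_id", "name", "date_modified"],
    ["194", "A", "2015-04-15"],
]
songs = [
    ["song_id", "performer_id", "date_modified"],
    ["1", "194", "2015-04-15"],
    ["2", "999", "2015-04-15"],
]
pairs = attach_parent(songs, performers)
```

A child record whose key has no match in the parent table is paired with None, which makes missing links easy to spot.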