SETI Research UK


Background

To date, no decipherment of an ancient or unknown language has been achieved by a cryptographer, crypto-palaeographer or linguist using robust scientific methods or without the aid of a crib: decipherment has typically relied on the insights and good guesses of hobbyists from unrelated disciplines. This illustrates how ‘courageous’ it would be to attempt a decipherment by relying on established algorithms and brute-force cryptanalysis techniques. Articles, books and even the film Contact portray post-detection signal decipherment as a task for ‘imported’ cryptographers. However, such expertise and methodologies rely on the premise that the data is an encoded form of a known ‘system’, which is unlikely to be the case here. It is therefore submitted that such a ‘signal’ is likely to present the antithesis of the expected norm for decryption techniques: a plaintext representation of an unknown system.

To enable a realistic attempt at deciphering an unknown language, the universal attributes and behavioural characteristics of language structure first need to be modelled and understood (Elliott et al. 2001; Elliott 2002). This will then provide the methodologies and algorithms needed to detect the hierarchical layers that comprise intelligent, complex communication. As part of this ‘toolkit’, it is submitted that all known systems (language parameters) need to be structurally analysed to ‘place’ each ‘system’ within a language matrix. This will need to include all known languages, whether ‘living’ (in current use) or ancient, and should also endeavour to incorporate as yet undeciphered scripts, to provide as complete a picture as possible. Once such a relational matrix exists, post-detection decipherment will be assisted by a structural ‘map’ with the potential to ‘place’ an alien communication alongside its nearest known ‘neighbour’, assisting subsequent categorisation of basic parameters as a precursor to decipherment.
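
The idea of ‘placing’ an unknown system next to its nearest known ‘neighbour’ can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two structural features (symbol inventory size and unigram entropy), the function names profile and nearest_neighbour, and the use of plain Euclidean distance are not taken from the cited work, and a real language matrix would need many more, properly scaled, parameters.

```python
import math
from collections import Counter

def profile(text):
    """Return (inventory size, unigram entropy in bits) for a symbol sequence.

    These two features are illustrative assumptions, not the parameters
    of any published language matrix.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return (len(counts), entropy)

def nearest_neighbour(unknown, known):
    """Return the name of the known system whose profile is closest.

    `known` maps a system name to a sample of its text. Distance is
    unscaled Euclidean distance, so inventory size tends to dominate.
    """
    target = profile(unknown)
    return min(known, key=lambda name: math.dist(profile(known[name]), target))
```

With toy samples such as {"binary": "0101100110", "letters": "abcdefghij"}, a two-symbol unknown sequence is placed next to "binary". The value of such a map depends entirely on which structural parameters are chosen, which is the modelling question raised above.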

Historically, the preliminary step in any attempt at deciphering an unknown language has been to compile a catalogue of all the different characters that occur in the script. This is important because the number of characters in a script’s symbol set provides a clue as to whether it is an alphabet, syllabary or logography: usually in the order of 20 to 30, 70 to 120 or 300 plus respectively. As Daniels & Bright (1996) observe: “Nearly all successful decipherments have involved clues through the script being a language that was familiar or very like a known language”. Where such clues do not exist, or the familiar clues are too few to provide a useful key, decipherment has generally relied on the discovery of bilingual or multilingual inscriptions, where the main task is then to correctly ‘map’ the known meaning to the unknown. Exemplars of such ‘keys’ are the ancient Egyptian Hieroglyphs, whose decipherment was aided by the bilingual inscriptions of the Rosetta Stone, and the Cuneiform script, where a trilingual inscription provided the crib.
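
The cataloguing step and the rule of thumb on symbol-set size can be sketched as follows. This is a minimal illustration that assumes the text has already been transcribed into one character per symbol; the function names are hypothetical, and the thresholds are simply the approximate ranges quoted above, so inventories that fall between them are reported as indeterminate.

```python
from collections import Counter

def catalogue_symbols(text):
    """Count every distinct non-whitespace symbol in a transcribed text."""
    return Counter(ch for ch in text if not ch.isspace())

def guess_script_type(inventory_size):
    """Map a symbol-set size onto the approximate ranges quoted in the text."""
    if 20 <= inventory_size <= 30:
        return "alphabet"
    if 70 <= inventory_size <= 120:
        return "syllabary"
    if inventory_size >= 300:
        return "logography"
    return "indeterminate"  # size falls between the quoted ranges

def classify(text):
    """Catalogue the symbols of a text and guess its script type."""
    return guess_script_type(len(catalogue_symbols(text)))
```

A short sample will under-count the true inventory, so any such guess is only as good as the size of the corpus behind it.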

In addition, the Hittite language was deciphered after a ‘good guess’ as to the nature of its related languages, and the Creto-Mycenaean inscriptions were deciphered on the assumption that the language was Greek. Even the decipherment of Linear B was ultimately assisted by the realisation of its similarities with Greek. Additional scripts deciphered on the basis of their relationship with a familiar language or related script were the Brahmi script of Ashokan India, the Cypriote syllabary and the Himyaritic script of Southern Arabia (Pope 1999; Daniels & Bright 1996).

These intuitions, cribs and ‘good guesses’ cannot be the foundations of either good practice or a sound theoretical base from which to devise strategies for subsequent detection and decipherment. Nevertheless, these past experiences remain relevant because some of the techniques used to aid disambiguation and to establish correspondences are well suited to the computational task at hand. In particular, contextual, morphological and distributional analysis of characters and words has been of great benefit in identifying and classifying vowels, consonants and determinatives (word classifiers, such as those in the Egyptian Hieroglyphs). The decipherment of Ugaritic, Linear B and the Turkic runes particularly benefited from such textual analysis before ultimately relying on familiar keys. It is therefore these techniques that I will adopt as candidate tools to assist in my research. Unfortunately, the successful decipherments documented have predominantly relied upon hobbyists using their intuitions on an ad hoc basis, devoid of any rigorous formal methods.
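
As one concrete example of distributional analysis, the sketch below separates likely vowels from consonants using only adjacency counts. It uses Sukhotin's algorithm, a technique brought in here for illustration and not named in the sources above; it assumes an alphabetic script already segmented into words, and relies on the heuristic that vowels and consonants tend to alternate.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return symbols classified as vowels by Sukhotin's algorithm."""
    # Symmetric adjacency counts; pairs of identical symbols are ignored
    # (this keeps the diagonal of the adjacency matrix at zero).
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts as a consonant, scored by its row sum.
    sums = {s: sum(adj[s].values()) for s in symbols}
    consonants = set(symbols)
    vowels = []
    while consonants:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(consonants), key=lambda s: sums[s])
        if sums[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        # Discount adjacencies with the newly found vowel.
        for c in consonants:
            sums[c] -= 2 * adj[best][c]
    return vowels
```

For the toy input ["banana", "bob"] the function returns the symbols a and o. On real corpora the result is only a hypothesis to be tested against other evidence, but it is reached by a repeatable procedure in place of intuition.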

Reliance on ad hoc intuition is NOT a strategy we can afford to perpetuate if we are to contemplate solving this problem and ultimately preparing ourselves for contact.

©2003 SETI Research UK