Rebuttal of Sproat, Farmer, et al.’s supposed “refutation” by Rajesh Rao

July 13, 2010

[Updated: July, 2010]

This article is reproduced here, with due acknowledgement, as it has a bearing on the Dravidian research under way here in Tamil Nadu.

In particular, Asko Parpola delivered lectures at Coimbatore and Chennai, but full details have not been provided to general readers, although these issues affect them socially and politically.

In 2004, Steve Farmer, Richard Sproat, and Michael Witzel published a paper in the Electronic Journal of Vedic Studies (entitled “The Collapse of the Indus-Script Thesis: The Myth of a Literate Harappan Civilization”) claiming that the Indus valley civilization was illiterate and that Indus writing was a collection of political or religious symbols.

The publication of our paper in Science elicited hostile reactions from them, ranging from off-the-cuff dismissive remarks such as “garbage in, garbage out” (Witzel) to ad-hominem attacks (labeling us “Dravidian nationalists”) and a vicious campaign on internet discussion groups and blogs to discredit our work. Their first knee-jerk reaction was to call the two artificial control datasets in our study “invented data sets” (Farmer). This was followed by Sproat and others on a blog claiming to have constructed “counterexamples” to our result. Sproat has even attempted to publicize his claims using an article in Computational Linguistics and a web page entitled “Why Rao et al.’s work proves nothing”(!), despite the fact that our work has now been published in journals like Science, PNAS, PLOS One, and IEEE Computer.

Here, we respond to their arguments in a point-by-point fashion. First, their arguments:

(1) Two datasets, used as controls in our work, are artificial.

(2) Counterexamples can be given, of non-linguistic systems, which produce conditional entropy plots like those presented in our Science paper.

(3) Conditional entropy cannot even differentiate between language families.

(4) The absence of writing material and long texts is “proof” that the Indus people were illiterate.

We view arguments (1)-(3) as arising from a misunderstanding of our approach and an overinterpretation of the conditional entropy result. Some of these arguments are made with a narrow computational linguistics point of view without considering other properties of the Indus script and the Indus civilization (see below). The last argument has been controverted by several other researchers as discussed below.

Here is the point-by-point rebuttal:

(1) As stated in our Science paper, the two artificial data sets (which Farmer et al. call “invented data sets”) simply represent controls, necessary in any scientific investigation, to delineate the limits of what is possible. The two controls in our work represent sequences with maximum and minimum flexibility, for a given number of tokens. Though this can be computed analytically, the data sets were generated to subject them to the same parameter estimation process as the other data sets. Our conclusions do not depend on the controls, but are based on comparisons with real world data: DNA and protein sequences, various natural languages, and FORTRAN computer code. All our real world examples are bounded by the maximum and the minimum provided by the controls, which thus serve as a check on the computation.
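To make the role of the two controls concrete, here is a minimal Python sketch. It is not the code used in the Science paper: it assumes a simple plug-in (maximum-likelihood) estimate of the bigram conditional entropy, whereas the paper describes its own parameter estimation process. The function name `conditional_entropy`, the 8-symbol alphabet and the sequence lengths are illustrative assumptions.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq):
    """Plug-in estimate of H(next | previous) in bits, from adjacent symbol pairs."""
    pairs = list(zip(seq, seq[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, _nxt), n in pair_counts.items():
        p_pair = n / total                       # estimate of P(prev, next)
        p_next_given_prev = n / prev_counts[prev]  # estimate of P(next | prev)
        h -= p_pair * math.log2(p_next_given_prev)
    return h


random.seed(0)
tokens = list("ABCDEFGH")  # hypothetical alphabet of 8 tokens

# Control with maximum flexibility: each symbol drawn independently and uniformly.
max_flex = [random.choice(tokens) for _ in range(20000)]

# Control with minimum flexibility: symbols always follow the same fixed order.
min_flex = [tokens[i % len(tokens)] for i in range(20000)]

print(conditional_entropy(max_flex))  # near the upper bound log2(8) = 3 bits
print(conditional_entropy(min_flex))  # zero: the next symbol is fully determined
```

Any real sequence over the same 8 tokens, passed through the same function, would give a value between these two bounds, which is the sense in which the controls serve as a check on the computation.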

(2) Counterexamples matter only if we claim that conditional entropy by itself is a sufficient criterion to distinguish between language and non-language. We do not make this claim in our Science paper. As clearly stated in the last sentence of the paper, our results provide evidence which, given the rich syntactic structure in the script (and other evidence as listed below), increases the probability that the script represents language.

The methodology, which is Bayesian in nature, can be summarized as follows. We begin with the fact that the Indus script exhibits the following properties:

  • The Indus texts are linearly written, like the vast majority of linguistic scripts (and unlike nonlinguistic systems such as medieval heraldry or traffic signs);
  • Indus symbols are often modified by the addition of specific sets of marks over, around, or inside a symbol. Multiple symbols are sometimes combined (“ligatured”) to form a single glyph. This is similar to later Indian scripts which use such ligatures and marks above, below, or around a symbol to modify the sound of a root consonant or vowel symbol;
  • The script obeys the Zipf-Mandelbrot law, a power-law distribution on ranked data, which is often considered a necessary (though not sufficient) condition for language (see our PLOS One paper);
  • The script exhibits rich syntactic structure such as the clear presence of beginners and enders, preferences of symbol clusters for particular positions within texts etc. (see References), not unlike linguistic sequences;
  • Indus texts that have been discovered in Mesopotamia and the Persian Gulf use the same signs as texts found in the Indus region but alter their ordering. These “foreign” texts have low likelihood values compared to Indus region texts (see our PNAS paper), suggesting that the script was versatile enough to represent different subject matter or a different language in foreign regions.

Given that the Indus script shares the above properties with linguistic scripts, we claim that the similarity in conditional entropy of the Indus script to other natural languages provides additional evidence in favor of the linguistic hypothesis.
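As an illustration of the Zipf-Mandelbrot property in the list above, the sketch below checks a ranked frequency list against the law p(r) ∝ 1/(r + b)^a. It is only a toy under stated assumptions: the corpus is synthetic (drawn from a law with made-up parameters a = 1.5, b = 2.0), the coarse grid search is a stand-in for whatever fitting procedure the PLOS One paper used, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def zipf_mandelbrot_pmf(n_types, a, b):
    """Probabilities p(r) proportional to 1 / (r + b)**a for ranks r = 1..n_types."""
    weights = [1.0 / (r + b) ** a for r in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ranked_frequencies(tokens):
    """Relative frequencies of the observed symbols, most frequent first."""
    counts = Counter(tokens)
    n = len(tokens)
    return [c / n for _, c in counts.most_common()]


def fit_zipf_mandelbrot(freqs, a_grid, b_grid):
    """Coarse grid search for (a, b) minimising squared error between log frequencies."""
    best = None
    for a in a_grid:
        for b in b_grid:
            model = zipf_mandelbrot_pmf(len(freqs), a, b)
            err = sum((math.log(f) - math.log(m)) ** 2 for f, m in zip(freqs, model))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]


# Synthetic corpus drawn from a known Zipf-Mandelbrot law (hypothetical parameters).
random.seed(1)
true_pmf = zipf_mandelbrot_pmf(50, a=1.5, b=2.0)
corpus = random.choices(range(50), weights=true_pmf, k=50000)

freqs = ranked_frequencies(corpus)
a_hat, b_hat = fit_zipf_mandelbrot(
    freqs,
    a_grid=[0.5 + 0.1 * i for i in range(21)],  # exponents 0.5 .. 2.5
    b_grid=[0.5 * i for i in range(11)],        # offsets 0.0 .. 5.0
)
print(a_hat, b_hat)  # recovered parameters, to be compared with a = 1.5, b = 2.0
```

A good fit of this kind is only a necessary condition, as the list notes: many nonlinguistic processes also produce power-law rank-frequency curves.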

We have recently extended the result in our Science paper to block entropies for sequences of up to 6 symbols (see IEEE Computer paper for details):



The language-like scaling behavior of block entropies in the above figure, in combination with the other properties of language enumerated above, could be viewed in a Bayesian framework as further evidence for the linguistic nature of the Indus script.

The above figure also addresses objections raised by some (e.g., Fernando Pereira) who felt conditional entropy (which considers only pairwise dependencies) was not a sufficiently rich measure.
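For readers unfamiliar with block entropies, the following Python sketch shows how the quantity is defined for blocks of 1 to 6 symbols. It assumes a plain plug-in estimate, which needs far more data than the Indus corpus offers for longer blocks; the estimator in the IEEE Computer paper is not reproduced here. The two toy sequences and the name `block_entropy` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def block_entropy(seq, n):
    """Plug-in estimate (bits) of the entropy of blocks of n consecutive symbols."""
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


random.seed(0)
alphabet = list("ABCD")  # hypothetical 4-symbol alphabet

# Symbols with no correlations at all (maximum-entropy behaviour).
unordered = [random.choice(alphabet) for _ in range(50000)]

# Symbols in a rigid, fixed cyclic order (minimum-entropy behaviour).
rigid = [alphabet[i % 4] for i in range(50000)]

for n in range(1, 7):
    print(n, round(block_entropy(unordered, n), 2), round(block_entropy(rigid, n), 2))
```

In this toy setting the uncorrelated sequence gains roughly 2 bits per added symbol, while the rigid sequence stays flat at about 2 bits for every block length. The scaling argument above concerns where a real sequence falls between these two extremes as the block length grows.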

Let us now consider the nonlinguistic systems that have been suggested:

  • Mark Liberman, Sproat, and Cosmo Shalizi in a blog constructed artificial examples of nonlinguistic systems whose conditional entropy was similar to the Indus script but their examples have no correlations between symbols – these examples do not exhibit the entropy scaling property exhibited by the Indus script and languages in the above figure, let alone other language-like properties like those exhibited by the Indus script.
  • Two natural nonlinguistic systems that have been suggested, medieval heraldry and traffic signs, are not even linear, nor do they exhibit other script-like properties such as those listed above.
  • The Vinca markings on pottery are linear but scholars have established that the symbols do not appear to follow any order – the system thus can be expected to fall in the maximum entropy range (MaxEnt) in the above figure.
  • The carvings of deities on Mesopotamian boundary stones are also linear but the ordering of symbols appears to be more rigid than in natural languages, following for example the hierarchical ordering of the deities. This system can thus be expected to fall closer to the minimum entropy (MinEnt) range in the above entropy scaling figure than to natural languages.

We therefore believe that the new result above from our IEEE Computer paper, showing that the block entropies of the Indus script scale in a manner similar to natural languages, when viewed in conjunction with the other language-like properties of the script as described above, adds further support to the linguistic hypothesis.

(3) Sproat has endeavored to produce a plot where languages belonging to different language families have similar conditional entropies, thereby claiming that the conditional entropy result “proves nothing.” This claim is once again based on an overinterpretation of the result in our Science paper. We specifically note on page 10 in the supplementary information that “answering the question of linguistic affinity of the Indus texts requires a more sophisticated approach, such as statistically inferring an underlying grammar for the Indus texts from available data and comparing the inferred rules with those of various known language families.” In other words, conditional entropy provides a quantitative measure of the amount of flexibility allowed in choosing the next symbol given a previous symbol. It is useful for characterizing the average amount of flexibility in sequences of different kinds. We do not make the claim that it can be used to distinguish between language families – this requires a more sophisticated measure.

(4) With regard to the length of texts, several West Asian writing systems such as Proto-Cuneiform, Proto-Sumerian, and the Uruk script have statistical regularities in sign frequencies and text lengths which are remarkably similar to the Indus script (details can be found in …). These writing systems are by all accounts linguistic. Furthermore, the lack of archaeological evidence for long texts in the Indus civilization does not automatically imply that they did not exist (“absence of evidence is not evidence of absence”). There is a long history of writing on perishable materials like cotton, palm leaves, and bark in the Indian subcontinent using equally perishable writing implements (see Parpola’s paper below). Writing on such material is unlikely to have survived the hostile environment of the Indus valley. Thus, long texts may have been written, but no archaeological remains are to be found.

As regards the argument for literacy from the point of view of cultural sophistication of the Indus people, we believe Iravatham Mahadevan has addressed this adequately in his op-ed piece below (see also Massimo Vidale’s entertaining article).


Ontogeny, Phylogeny and Epigeny: Or the revival of Race, racism and Racialism?

July 12, 2009

Acknowledgement: I am an independent researcher and freelancer. I gather data and information from others and, applying my own thinking, write on the basis of what I have understood. Therefore, apart from interpretation, I cannot claim anything of my own in forming any new hypothesis, theory or established law. Indian tradition has been to acknowledge the source of knowledge and wisdom, which is why every text states that so-and-so told this to so-and-so, that the knowledge has come down to me, and that I am passing it on to posterity for its betterment[1]. This practice of acknowledging sources is strikingly similar in ancient Sanskrit and Tamil literature[2], and readers can therefore find in it the concurrence, confluence and continuance of the tradition, heritage, culture and civilization of India. Thus, in the stratum and continuum of languages, in spite of their Aryan-Dravidian dichotomy, they fall in the same phylum without any epigenetic variance.

However, the non-Indian methodology has been to adopt an “everything is mine” attitude. Not only that: besides the dangerous practice of borrowing, copying or even outright plagiarism, they destroyed the sources to deny credit to the original people. This has happened many times in Indian history[3].

I know that under the guise of scientific principles, we are dealing with pseudo-scientific pursuits that were condemned and consigned to dustbins some 60 years ago[4]. The expressions anthropology, ethnology, philology, linguistics, etymology, lexicography, morphology, brachycephalic / mesocephalic / dolichocephalic indices, phrenology, prognathism, etc., and the connected concepts including blood theory (blue blood, plebeian blood), endogamy, exogamy, eugenics, miscegenation and cross-breeding, are not at all new to any Indologist, serious researcher or Sanskrit student in India or elsewhere[5]. However, it is clear that some scholars, groups of pundits or groups of chosen experts have decided to revive such race, racial and racialist hypotheses and theories to divide people, create misunderstanding and pit people against each other. I am sorry to note that the Oxford dictionaries themselves have started giving different meanings; I say this because I have been consciously using the dictionaries used by four generations of my family[6].

Introduction: It is not that a confused noise made by a number of voices[7], or any divine or evolutionary force, caused the owners of those voices to confuse the languages[8] during their hypothesis-building. A group of inter-disciplinary scholars have joined together and revived the 100-150-year-old linguistics, philology and race theories based on blood etc., in almost the same terminology, with certain changes, under the guise of modern and scientific data. When the god-believing racists and racialists joined in, ontology, teleology, autology, autogenesis, homogenesis, heterogenesis and other such expressions were also used. Thus, besides familiar expressions like etymology, morphology, phonology, phonetics, etc., ontogeny, phylogeny, epigeny and other expressions are used. They are discussed below.

Ontogenesis: In its biological connotation, onto + genesis is nothing but nature + genesis, i.e., naturally producing or produced; it is thus explained as the development of an individual organism or anatomical or behavioural feature from the earliest stage to maturity.

Thus, when it is borrowed, transported and transformed into linguistic studies, ontogenesis is the development of a language or a language group, produced by people with a specific individual organism or anatomical or behavioural feature, from the earliest stage to maturity.

Ontogeny deals with ontogenesis. Here, the meaning of ontology has to be analyzed in context, as it has differed from time to time even as used by the westerners themselves. The ontological argument was used by Archbishop Anselm of Canterbury and by Descartes, the French philosopher, to prove the existence of God. According to this argument, the very subjective notion of God, implanted in our minds by God Himself, is enough to justify God’s objective existence. Thus, they related it to the mind, or to something subjective emanating from the mind and thinking. Ontology was defined as the department of metaphysics concerned with the essence of things or being in the abstract[9] (1934). Now, however, the meaning of ontology is given as “the branch of metaphysics concerned with the nature of being”[10] (1999).

Phylogenesis: Phylum + genesis = class + origin and thus, again, in biology, phylogenesis is the evolutionary development and classification of a species or group of organisms [the Greek phule means race or tribe].

Phylum: The Greek etymology of phylum carries several meanings, depending on the context and on how the terms are used and understood.

From the different connotations of phul / pul and their combinations, it is known that phylum is the race, class, division or family originating from a particular phallic or phullon, or a combination thereof[11].

Phuletes = tribesman

Phullon = leaf, female sexual organ

Phulon = race

phule = tribe

pule = gate

puloros = gate keeper

phusallis = bladder

phallic = male sexual organ

Thus, its meaning is given as follows:

  1. Zoology: It is a principal taxonomic category that ranks above class and below kingdom, equivalent to the division in botany.
  2. Linguistics: It is a group of languages related to each other less closely than those forming a family.

Phylogeny: Generally, it is explained as the branch of biology concerned with phylogenesis. Phylogenesis is the evolutionary development and classification of a species or group of organisms. Here, the meaning of phylum as explained above has to be taken into consideration carefully.

Thus, again imported into linguistics, phylogenesis is the evolutionary development and classification of a language or group of languages that was spoken, or is still being spoken, by certain people or groups of people.

Epigeny: epi + geny = upon / above + forming / producing / originating, i.e., something is created / produced / originated upon / above another.

Epigenesis: In biology, it is the progressive development of an embryo from an undifferentiated egg-cell.

Thus, in linguistics, it is the progressive development of a language or group of languages spoken by a group or groups of people.

However, epigenetic has different connotations as follows:

  1. Biology: resulting from external factors rather than genetic influences.
  2. Biology: of or relating to epigenesis.
  3. Geology: formed later than the surrounding or underlying rocks.

So, extended to incorporate these concepts, it metamorphoses into the following cases:

  1. External factors: It is a process of progressive development of a language or group of languages, spoken by a group or groups of people, resulting from external factors.
  2. Genetic influence: It is a process of progressive development of a language or group of languages, spoken by a group or groups of people, resulting from genetic influences.
  3. Based on others: It is a process of progressive development of a language or group of languages, spoken by a group or groups of people, formed later than the surrounding or underlying linguistic influences.

Incidentally, the OED (1934) defines[12] epigenesis as “the formation of organic germ as a new product; theory of epigenesis, that the germ is brought into existence, not merely developed, in process of reproduction”.

Hypogeny, hypogenesis, hypogenetic processes: In contrast to the epigeny, epigenesis and epigenetic processes proposed, discussed and debated, hypogeny, hypogenesis and hypogenetic processes should also be considered in the case of language and language origins.

Autogeny, monogeny and heterogeny: In this context, the following processes should also be considered:

Autogeny: auto + geny = generated itself, arising from within or forming a thing itself and so on. Accordingly, the meanings of autogenous, autogenesis, autogenetic etc., should be construed and applied.

Monogeny: mono + geny = generated, produced or created from only one source / place. Accordingly, the meanings of monogenous, monogenesis, monogenetic etc., should be construed and applied.

Heterogeny: hetero + geny = generated, produced or created from more than one place or many places. Accordingly, the meanings of heterogenous, heterogenesis, heterogenetic etc., should be construed and applied.

As pointed out, these expressions are neither new nor newly coined; they were used consciously, with purpose and purport, to convey race, racial and racialist hypotheses and theories using pseudo-scientific concepts and precepts. Such race, racist and racialist hypotheses and theories led to the two world wars, killing millions of people.

Michael Witzel conferences 2009

Witzel, Darwin and Bible: The Asiatic Association[13] has announced the so-called “conferences” of Michael Witzel: “Dr. Michael Witzel, Wales Professor of Sanskrit at Harvard University (USA), will deliver a set of three conferences in India. A very proper event in the year of the commemoration of Charles Darwin (1800-1882), the famous scientist who was opposed by the Christian church for so long time. Prof. Witzel is well known scientist whose thesis on the script of the Indus Valley Civilization raised a lively debate among the radical Hindus. This is the calendar of his lectures:

  • 8 July 2009. The Madras Sanskrit College, Chennai organized by Indus Research Centre, Roja Muthaiah Research Library (Jubliee lectures).
  • 9 July 2009. Nehru Memorial Library, or Jawaharlal Nehru Institute of Advanced Study, New Delhi.
  • 10 July 2009. India International Centre, New Delhi.”

  • Dr. Michael Witzel, Wales Professor of Sanskrit at Harvard University (USA), will deliver a set of three conferences in India.

Delivering a set of conferences has been a tradition of the Christians, who later declare that they held such conferences with the heathens and won them over through debate.

  • A very proper event in the year of the commemoration of Charles Darwin (1800-1882), the famous scientist who was opposed by the Christian church for so long time.

One should note the language used here: such conferences are meant to commemorate Darwin, who was opposed by the Church, or the Church that opposed Darwin! If the Church opposed Darwin, it is intriguing that Witzel should carry out his planting of linguistic phylogenetic trees in India.

  • Prof. Witzel is well known scientist whose thesis on the script of the Indus Valley Civilization raised a lively debate among the radical Hindus.

How can the website use such an offensive expression? Why should his intellectual hypothesis have raised “a lively debate”? Has he been a rabble-rouser then? The radical Hindus include all types of so-called Hindutvawadis: fundamental fanatics, extreme activists and die-hard militants. That is why he was given police protection that equaled or exceeded the strength of the audience! Ironically, among the “radical Hindus” who attended were Iravatham Mahadevan, N. Mahalingam, Sankaranarayanan, Ramakrishnan, Prof. Dass, and many others. In fact, I. Mahadevan openly declared that he has a fundamental difference with Witzel over his assertion that the IVC script does not have any language system at all.

  • This is the calendar of his lectures

However, the calendar and program were changed, covered up and perhaps even kept secret, for security reasons!

The communal, parochial and fundamental psyche of the accusers: Many allegations and accusations have been made about Michael Witzel[14]. His friends and colleagues, like Steve Farmer and others, may support him and carry out propaganda blaming Indians as Hindutvawadis, right-wing ideologists and so on, but they themselves have exhibited more of these characteristics and qualities by vomiting out foul language, for which they cannot claim[15], “since god created the world through the power of language” (Genesis 1:28 of Bible), “we are also using”. When Steve pounces upon Kanchan Gupta ferociously, he should eat his own words about what he is writing about others. Without arguing honestly or intellectually, the same Steve Farmer prevented and blocked my postings in his and ran away[16]. Another of his collaborators, Francesco Brighenti, also used to adopt the same tactics. Now, Michael Witzel has done the same thing in India. Without answering, he simply moved away from place to place. If Steve can use expressions like “right-wing”, “closely associated with Hindutva right” and “Hindu radicals”[17] etc., they too can easily be identified and held responsible for such cognizable writings, speech and acts. Therefore, the accusers themselves now stand accused for the qualities they have blatantly exhibited. Under the cover of “Wales Sanskrit Professor”, “Harvard University” etc., these learned scholars have been blaspheming and denigrating Indians, and such an attitude is definitely unbecoming of the status they have. Indeed, the Chennai “Conferences” have exposed the “scholarship” of the “Wales Sanskrit Professor” at “Harvard University, USA”. Though the Professor gave his visiting cards to the questioner[18], he has not so far sent his papers or clarified the questions raised in front of selected Sanskrit professors, Pundits and scholars of Chennai and Madras University.

Taking Darwin to beat Darwin to withhold the Tower of Babel: In his presentation at the “Darwin” conference, he began with a brief overview of opinions about the origin of human language and the controversial question of Neanderthal speech. Moving quickly from the language of the ‘African Eve’ to the specific languages of the subcontinent, he gave a brief overview of the prehistoric and current South Asian language families as well as their development over the past c. 5000 years. The equivalents of phylogeny and epigenetics in linguistics were then dealt with, that is, the successful (Darwinian-style) phylogenetic reconstruction of language families (as ‘trees’), which is interfered with by the separate wave-like spread of certain features across linguistic boundaries, even across language families. A combination of both features led to the emergence of the current South Asian linguistic area (sprachbund). This development has made the structures of Indo-Aryan, Dravidian and Munda similar to each other, but it could not eliminate most of their individual characteristics.

  • Thus, just like “liberation theology”, parallel “Darwin model” is proposed to have a Black Jehovah, Black Mary, Black Joseph and of course a Black Jesus, Black Christ and then merging to Black Jesus Christ!
  • As the Marxists had already proposed accepting the invasion while sticking to migration, the migration or movement of language-speaking people was considered.
  • He was also using the expression “Father Heaven”[19] instead of the usual “Father” in his linguistic interpretation to show the “unity” of the Indo-Aryan language group.
  • Now, the migration, settlement, stratification and solidification of language strata would be identified to interpret the language continuum as para-X, pro-X, epi-X, hypo-X or hybrid-X, where X could be any language as hypothesized.

Refutation of such hypothesized ideology: The formation of languages on the basis of phylogenetic and epigenetic hypotheses is refuted as follows:

1. The monogenic origin of humanity refutes the anthropological studies, and heterogenic origins question the linguistic studies that hold that there was one first language from which all other languages evolved. However, both have been the creation of ontological exercises according to hidden theological concepts.

2. Chronologically, it was accepted that Sanskrit was the ancient language, and the family was then named Indo-Germanic, with Sanskrit as its ancient language. Then, from “Germano-centric”, the name was converted to “Euro-centric”: first Indo-Aryan and then Indo-European.

3. Whatever the phylogenetic and epigenetic processes and movements experimenting with the languages of chimpanzees and Neanderthals, Ramapithecus always confronted others.

4. With geological, economic and other factors, people move, migrate and settle, and accordingly there is not only material stratification but also ontological stratification.

5. But as the stratigraphical pattern at a particular place and time could not be correlated and corresponded with any other pattern on the earth in a continent, country or at the same site say 100 metres away, any attempt of asserting that the pattern could have happened exactly like this is quite incorrect.

6. As horizontal and vertical excavations have not been done even at the historical sites to tell exactly what happened in all the histories of the ancient civilizations, so in the ontological, phylogenetic and epigenetic strata no such vertical or horizontal excavations can be conducted to analyze them and come to any specific conclusions.

7. We have to wait and see how many chimpanzees and Neanderthal men are converting into Rigvedic phyla to learn and speak Sanskrit that is not Paninian or Kalidasa!

Alan Bersin and Michael Witzel

Fund-raising Witzel in India and global economic meltdown: Really, India has become so rich that it could fund the Wales Professor of Sanskrit at Harvard University! So far India hired out lascars and ships to the Europeans and supplied slaves to European farms, skilled workers to their factories, etc., from 1600 to 1900, but now skilled mental workers of both soft and hard categories are sent. Thanks to the epigenetic processes instead of phylogenetics!

One American friend clarifies, “I knew that Witzel was in financial difficulty and that Harvard was unhappy with his antics. Note that Alan Bersin who was California Education Secretary at the time Witzel launched his campaign was also on the Board of Overseers of the Harvard Corporation (which owns Harvard University). The main function of the Board is fund raising. So Witzel must have promised Bersin that his California campaign would raise funds for Harvard, especially for the beleaguered Sanskrit Department which Witzel had made a mess of when he was Chairman. (Please see attachment.) He tried to raise funds for himself also. He advertised his services in Pakistan (in the Internet version of The Dawn) as an anti-Hindu lobbyist. Note that most years Witzel teaches summer courses in elementary Sanskrit that does not go much beyond teaching the Devanagari script. His campaign in India this summer suggests (my guess only) that the Sanskrit Department has no money for summer programs. This could be a reason that he is in India. Harvard has lost nearly $8 billion in the financial meltdown or nearly 30% of its assets. It has postponed some important projects including a new science center in Boston. The last thing the Harvard administration wants or needs is a madcap like Witzel going around antagonizing an affluent and highly educated community like Americans of Indian origin.”

As the “white man’s burden” increased politically and economically, they wanted to exploit the coloured and the blacks some 200 years ago, and they try to exploit them now as well. However, the Whites have Yellowphobia and Hindophobia, and so, to check, counter and contain such surges, they apply the traditional Middle East affiliation against the Chinese and Indians by creating problems then and there. Incidentally, while Witzel has been in India, Indians have received news that:

  • China is laying a road through POK to reach a port on the Arabian Sea.
  • However, converted Chinese Mohammedans started rioting, killing non-Mohammedans there in the Chinese province.
  • A Chinese ship is coming to anchor at Calicut port.
  • Indians are killed in Afghanistan, as the Taliban attacked kafiri-workshops.
  • The US President is visiting African countries as a Black, with much publicity given.

Thus, the Michael mania is overshadowed by this Michael mania and the poor Hindu radicals were simply carried away by such manias.



