Manipulating Google Scholar Citations and Google Scholar Metrics


http://digibug.ugr.es/bitstream/10481/20469/2/scholar_en.pdf








Manipulating Google Scholar Citations and Google Scholar Metrics:

simple, easy and tempting

Emilio Delgado López-Cózar (1), Nicolás Robinson-García (1) and Daniel Torres-Salinas (2)

EC3: Evaluación de la Ciencia y de la Comunicación Científica

(1) Universidad de Granada

(2) Universidad de Navarra

edelgado@ugr.es; elrobin@ugr.es; torressalinas@gmail.com

ABSTRACT

The launch of Google Scholar Citations and Google Scholar Metrics may provoke a revolution in the research evaluation field, as it places tools for bibliometric measurement within every researcher's reach. To alert the research community to how easily the data and bibliometric indicators offered by Google's products can be manipulated, we present an experiment in which we manipulate the Google Scholar Citations profiles of a research group by creating false documents that cite their documents and, consequently, the journals in which they have published, thereby modifying their H-index. For this purpose we created six documents authored by a fake author and uploaded them to a researcher's personal website under the University of Granada's domain. The experiment resulted in an increase of 774 citations across 129 papers (six citations per paper), raising the H-index of both authors and journals. We analyse the malicious effect this type of practice can have on Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several reflections on the effects these malpractices may have and on the lack of control mechanisms these tools offer.

KEYWORDS

Google Citations / Google Scholar Metrics/ Scientific Journals / Scientific fraud / Citation

analysis / Bibliometrics / H Index / Evaluation / Researchers

Recommended citation

Delgado López-Cózar, Emilio; Robinson-García, Nicolás; Torres Salinas, Daniel (2012).

Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting.

EC3 Working Papers 6: 29 May, 2012

1. INTRODUCTION

If the launch of Google Scholar in 2004 (a novel search engine focused on retrieving any type of academic material along with its citations) meant a revolution in the scientific information market by allowing universal and free access to all documents available on the web, the launch of Google Scholar Citations (hereafter GS Citations), a tool for measuring researchers' output and impact (Cabezas-Clavijo and Torres-Salinas, 2012), and Google Scholar Metrics (hereafter GS Metrics), a scientific index of journals ranked according to their impact (Cabezas-Clavijo and Delgado López-Cózar, 2012), may well be a historical milestone for the globalization and democratisation of research evaluation (Butler, 2011). As well as challenging the traditional bibliographic databases and bibliometric indexes offered by Thomson Reuters (Web of Science and JCR) and Elsevier (Scopus and SJR), ending their monopoly and becoming a serious competitor, Google Scholar's new products project a future landscape with ethical and sociological dilemmas that may entail serious consequences in the world of science and research evaluation.

Delgado López-Cózar, Robinson-García & Torres-Salinas. Manipulating Google Scholar …

Leaving aside the technical and methodological problems of the Google Scholar products, which are currently under study (Jacsó, 2008, 2011; Wouters and Costas, 2012; Aguillo, 2012; Cabezas-Clavijo and Delgado López-Cózar, 2012; Torres-Salinas, Ruiz-Pérez and Delgado López-Cózar, 2009) and which will presumably be solved in the near future, their arrival puts an end to all kinds of scientific control or filtering of researchers' activity, posing a new challenge for the bibliometric community. Because Google Scholar automatically retrieves, indexes and stores any type of scientific material uploaded by an author without any prior external control (repositories are only a technical filter, as they do not review the content), it allows unprincipled people to manipulate their output, with a direct impact on their bibliometric performance.

Because this type of behaviour, in which researchers modify their output and impact through intentional and unrestrained self-citation, is not uncommon, we consider it necessary to analyse thoroughly Google's capacity to detect the manipulation of data.

This study continues the line of research started by Labbé (2010), who transformed a fake researcher called Ike Antkare ('I can't care') into the most prolific researcher in history. In this case, however, we examine the most dangerous aspects of gaming tools aimed at evaluating researchers and the malicious effects they can have on researchers' behaviour. Our aim is therefore to demonstrate how easily anyone can manipulate Google Scholar's tools. Unlike Labbé, though, we will not emphasize the technical aspects of such gaming but its sociological dimension, focusing on the enormous temptation these tools can represent for researchers and journal editors eager to increase their impact. To do so, we show how the bibliometric profiles of researchers and journals can be modified simultaneously in the easiest way possible: by uploading fake documents to a personal website citing the whole production of a research group. No software is needed to create the fake documents: one only needs to copy and paste the same text over and over again and upload the resulting documents to a webpage under an institutional domain. We will also analyse Google's capacity to detect retracted documents and delete their bibliographic records along with the citations they make.

This type of study, in which false documents are created in order to expose defects, biases or errors committed by authors, has been used many times in the scientific literature, especially in the research evaluation field. The reader is referred to the works of Peters & Ceci (1990), Epstein (1990), Sokal (1996, 1997) or Baxt et al. (1998), which demonstrated the deficiencies of peer review as an objective, reliable, valid, efficient and error-free quality control tool for content published in scientific journals; or to Scigen (1), a programme created by three MIT students for generating random papers in the Computer Science field, including graphs, figures and references. All of these works sparked intense debate within the research community.

This paper is structured as follows. First we describe the methodology followed: how the false documents were created and where they were uploaded. Then we show the effect they had on the bibliometric profiles of the researchers who received the citations, and we simulate the effect these citations would have had on the journals affected if GS Metrics were updated regularly. We analyse the technical effects and the dangers these tools entail for research evaluation. Finally, we conclude by emphasizing their strengths and offering some final remarks.

(1) http://pdos.csail.mit.edu/scigen/

2. MANIPULATING DATA: THE GOOGLE SCHOLAR EXPERIMENT

In order to analyse GS Citations' capacity to discriminate academic works from non-academic ones, and to test how difficult it is to manipulate output and citations in Google Scholar and its bibliometric tools (GS Citations and GS Metrics), we created, in the easiest possible way, false documents referencing the whole research production of the EC3 research group (Science and Scientific Communication Evaluation), available at http://ec3.ugr.es. In this way we intend to show how anyone can manipulate their output and citations in GS Citations.

Figure 1. Fake documents authored by the non-existent researcher MA Pantani-Contador

Following the example set by Labbé (2010), we created a false researcher named Marco Alberto Pantani-Contador, a reference to the great fraud the Italian cyclist turned out to be and to the accidental causes that deprived the Spanish cyclist of a Tour victory. Pantani-Contador authored six documents (figure 1), which were not intended to pass as research papers but as working papers. In a process that took less than half a day's work, we drafted a short text, copied and pasted some more from the EC3 research group's website, added several graphs and figures, translated the result automatically into English using Google Translate and divided it into six documents. Each document referenced 129 papers authored by at least one member of the EC3 research group, according to their website (http://ec3.ugr.es). That is, we expected a total increase of 774 citations.
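The expected total follows directly from the design of the experiment: every fake document cites every paper once. A minimal sketch of that arithmetic, where the constant and function names are illustrative and only the two figures come from the text:

```python
# Illustrative arithmetic only; the two constants are the figures reported in the text.
FAKE_DOCUMENTS = 6        # documents attributed to Pantani-Contador
PAPERS_REFERENCED = 129   # EC3 papers cited in every fake document


def expected_new_citations(documents: int, papers: int) -> int:
    """Each fake document cites every paper exactly once."""
    return documents * papers


total = expected_new_citations(FAKE_DOCUMENTS, PAPERS_REFERENCED)
per_paper = total // PAPERS_REFERENCED

print(f"Expected new citations: {total} ({per_paper} per paper)")
```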

Afterwards, we created a simple webpage under the University of Granada domain, listing references to the false papers and linking to their full text, so that Google Scholar would index the content. We excluded other services such as institutional or subject-based repositories, as they are not obliged to undertake any bibliographic control other than a formal one (Delgado López-Cózar, 2012) and they fell outside the aims of this study.

The false documents were uploaded on 17 April 2012. Presumably because they were hosted on a personal website and not in a repository, Google indexed them nearly a month later, on 12 May 2012. At that point the members of the research group used as a case study, including the three co-authors of this paper, received an alert from GS Citations reporting that a certain MA Pantani-Contador had cited their works. The citation explosion was thrilling, especially for the youngest researchers, whose citation counts were multiplied by six, notably increasing the size of their profiles.

Figure 2. Citations increase for the authors of this paper

(Before / After = before and after the experiment)

Emilio Delgado López-Cózar

             WHOLE PERIOD                 SINCE 2007
             Before   After   Change      Before   After   Change
Citations    862      1297    +435        560      995     +435
H-Index      15       17      +2          10       15      +5
i10-Index    20       40      +20         11       33      +22

Nicolás Robinson-García

             WHOLE PERIOD                 SINCE 2007
             Before   After   Change      Before   After   Change
Citations    4        29      +25         4        29      +25
H-Index      1        4       +3          1        4       +3
i10-Index    0        0       0           0        0       0

Daniel Torres-Salinas

             WHOLE PERIOD                 SINCE 2007
             Before   After   Change      Before   After   Change
Citations    227      409     +182        226      408     +182
H-Index      9        11      +2          9        11      +2
i10-Index    7        17      +10         7        17      +10
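The before and after counts in Figure 2 can be turned into growth factors with a few lines of code. The sketch below uses only the whole-period citation counts from the figure; the dictionary and function names are illustrative.

```python
# (before, after) citation counts for the whole period, as reported in Figure 2.
FIGURE_2_CITATIONS = {
    "Delgado López-Cózar": (862, 1297),
    "Robinson-García": (4, 29),
    "Torres-Salinas": (227, 409),
}


def growth_factor(before: int, after: int) -> float:
    """Ratio of citations after the experiment to citations before it."""
    return after / before


factors = {
    author: round(growth_factor(before, after), 2)
    for author, (before, after) in FIGURE_2_CITATIONS.items()
}

for author, factor in factors.items():
    print(f"{author}: x{factor}")
```

The smaller the starting count, the larger the factor, since every profile receives the same six citations per paper.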

Figure 2 shows the increase in citations the authors experienced. Obviously, the number of citations per author varies with the number of publications of each member of the research group used as a case study, as well as with the real citations received during the study period. Thus, the greatest increase is for the least-cited author, Robinson-García, whose citation count is multiplied by 7.25, while Torres-Salinas nearly doubles his and Delgado López-Cózar's grows by a factor of 1.5. We also note the effect on the H-index of each researcher. While the most significant increase is seen in the least prolific profile, the variation for the other two is much more moderate, illustrating the stability of the indicator. Note how in Torres-Salinas' case, where the number of citations nearly doubles, the H-index only increases by two. On the other hand, the i10-index is much more sensitive to changes. In Torres-Salinas' case it goes from 7 to 17, and in Delgado López-Cózar's case it triples for the last five years, going from 11 to 33.
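The different sensitivity of the two indicators can be reproduced on a toy profile. The sketch below uses the standard definitions (the H-index is the largest h such that h papers have at least h citations each; the i10-index is the number of papers with at least 10 citations). The citation counts are hypothetical and are not the data of any author in this study.

```python
from typing import List


def h_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: List[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical profile of an experienced researcher (assumed values).
before = [120, 80, 60, 45, 40, 33, 30, 25, 22, 20, 9, 8, 7, 6, 5, 4]
# Six fake documents each cite every paper once.
after = [count + 6 for count in before]

print("H-index:  ", h_index(before), "->", h_index(after))
print("i10-index:", i10_index(before), "->", i10_index(after))
```

In this toy case every low-cited paper crosses the fixed 10-citation bar, whereas the H-index threshold rises with each paper that is added, which is why it moves less.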

Figure 3. Effects of the manipulation of citations on one of the authors (panels: before the experiment; after the experiment)

It is also interesting to analyse the effect this citation increase may have on the H-index of journals indexed in GS Metrics. For this, we considered the two journals in which the members of the research group have published the most papers and which are therefore the most susceptible to manipulation: El Profesional de la Información, with 30 papers, and Revista Española de Documentación Científica, with 33 papers. Table 1 shows the H-indexes of both journals according to Google and the value they would reach if the citations made by Pantani-Contador were included. We must warn the reader that this tool, unlike the rest of Google's products, is not automatically updated, and that the data displayed date from the day of its launch, that is, 1 April 2012 (Cabezas-Clavijo and Delgado López-Cózar, 2012). We observe that El Profesional de la Información would be the more affected of the two, as seven papers would surpass the 12-citation threshold, increasing its H-index and moving it up the ranking of Spanish-language journals from position 20 to position 5 if the index were updated today. Revista Española de Documentación Científica would change its position only slightly, as just one article surpasses the 9-citation threshold that determines its H-index. Even so, owing to the high number of journals with the same H-index, it would go up from position 74 to position 54.

Table 1. Effect of the manipulation of citations on journals

Journal                                        H-Index        Articles above the      Manipulated
                                               (GS Metrics)   H-Index threshold       H-Index
El Profesional de la Información               12             7                       19
Revista Española de Documentación Científica   9              1                       10

After demonstrating the vulnerability of Google's products to false documents and showing the effect at researcher and journal level, on 17 May 2012 we deleted the false documents and the webpage in order to see whether Google Scholar would delete the records and the citations credited in GS Citations. However, to date (29 May), 17 days after they were removed from the Internet, no changes whatsoever have been made. The records of the documents authored by our fake researcher are still available when searching for his output and, although the links are broken, a version of the documents saved by Google remains.

3. TECHNICAL CONSIDERATIONS

The results of our experiment show how easy and simple it is to modify the citation profiles offered by Google. This exposes the dangers these tools pose in the hands of editors and researchers tempted to engage in "citation engineering" and modify their H-index by excessively self-citing their papers or, in a more refined way, by directing citations only to the hot zone of their publications, that is, those which can influence this indicator. In the case of El Profesional de la Información, 16 documents with between 10 and 12 citations in the period analysed by GS Metrics (2007-2011) could modify the journal's position by receiving just 1 to 3 more citations.
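The "hot zone" can be made concrete with a short sketch: given the citation counts of a journal's papers, it lists those that would count towards the next H-index value after receiving at most three extra citations. The function names, the three-citation margin as a parameter, and the citation counts are all assumptions for illustration; they are not GS Metrics data.

```python
from typing import Dict, List


def h_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def hot_zone(citations: List[int], max_extra: int = 3) -> Dict[int, int]:
    """Map paper position -> extra citations needed to reach h + 1,
    keeping only papers that need between 1 and max_extra citations."""
    target = h_index(citations) + 1
    return {
        position: target - count
        for position, count in enumerate(citations)
        if 0 < target - count <= max_extra
    }


# Hypothetical journal with an H-index of 12 (assumed values).
journal = [40, 35, 30, 28, 25, 22, 20, 18, 16, 15, 14, 13, 12, 12, 11, 10, 10, 7, 3]

zone = hot_zone(journal)
print("Current H-index:", h_index(journal))
print("Hot zone (position -> citations needed):", zone)

# A single well-placed citation is enough to raise the index in this toy case.
boosted = journal.copy()
boosted[12] += 1
print("H-index after one targeted citation:", h_index(boosted))
```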

Returning to more technical issues, we must first emphasize how easy it is to manipulate not just output, as previously shown by Labbé (2010), but also citations. This raises serious concerns about Google Scholar's inability to discriminate false documents from genuine ones. Although Google Scholar is only meant to index and retrieve all kinds of academic material in its widest sense, the addition of GS Citations and GS Metrics, which are evaluation tools, must be accompanied by monitoring tools and stricter criteria for indexing documents. Google Scholar offers access to a wide range of document types, which makes it a much more attractive database, not just because of its "magic formula" for retrieving information but because of the richness of the data it handles. However, leaving an environment as controlled as that of journals leads to many dangers in the research evaluation world.

On the other hand, it is interesting to observe the stability of the H-index for experienced researchers, even when the number of citations is doubled. This may bring a sense of relief; unfortunately, however, there are many ways of manipulating this indicator through self-citation (Bartneck and Kokkelmans, 2011). Also, regarding journals and the likely updating of GS Metrics, which was added to Google Scholar's homepage a few days ago, devious editors could easily modify their journals' H-index. We also observe how marked the variation of the i10-index is, especially for experienced researchers.

Regarding the effect these malpractices may have on the rankings presented by Google, it would obviously be significant, especially for journals with small figures, for which the slightest variation can have a great impact on their performance.

The impossibility of editing citations in GS Citations, flagging the wrong ones and indicating those which have not been detected, highlights this shortcoming. We therefore warn, as has been done before (Cabezas-Clavijo and Torres-Salinas, 2012), of the dangers of using these tools for bibliometric purposes. The last part of the experiment will be to see whether the records of the deleted documents are erased from Google Scholar, along with the citations they make. This has not yet happened and, if it does not, it will underline an important limitation that the general search engine also has: the impossibility of exercising our "right to be forgotten" (Gómez, 2011).

Figure 4. Results from Google Scholar

That said, it is important to emphasize the visibility these tools offer and the transparency they allow, which make it easier for the community to detect these practices, as we witnessed in the course of this experiment. Many of the co-authors affected by the malpractices of the devious Pantani-Contador detected his reproachable behaviour and enquired about it.

On the other hand, it is interesting to see how papers produced from the same template are indexed differently by Google. This shows, once again, its lack of normalization: the six false documents uploaded appear under varying names (figure 4).

4. FINAL THOUGHTS AND CONCLUSIONS

Although we have previously argued in favour of Google Scholar as a research evaluation tool, playing down its biases and its technical and methodological issues (Cabezas-Clavijo and Delgado López-Cózar, 2012), in this paper we alert the research community to how easy it is to manipulate its data and bibliometric indicators. Switching from a controlled environment where the production, dissemination and evaluation of scientific knowledge is monitored (even accepting all the shortcomings of peer review) to an environment that lacks any kind of control other than researchers' conscience is a radical novelty that entails many dangers (Table 2).

Table 2. Control measures in the traditional model vs. Google Scholar's products

Traditional model                            Google Scholar's tools
Databases select journals to be indexed      It indexes any document belonging to an academic domain
Journals select papers to be published       Any indexed document type emits and receives citations
There is a control between citing and
cited documents
Fraudulent behaviours are pursued            It is not possible to report fraudulent behaviours or citation errors

Placing in the hands of researchers, who are only human, tools that allow them to manipulate output and citations may have unforeseen consequences or render these tools useless. The lack of control that characterises these tools is their strength but also their weak point. It is so easy to manipulate GS Citations that anyone can emulate Ike Antkare and become the most productive and influential researcher in their specialty. As for editors, if GS Metrics is finally incorporated, they may be tempted to use unethical techniques to increase the impact of their journals.

These free and accessible products not only awaken the Narcissus within researchers (Wouters and Costas, 2012), but can also unleash malpractices aimed at manipulating the orientation and meaning of numbers, as a consequence of the ever-growing pressure to publish fuelled by each country's research evaluation exercises. There are many cases of fraud in which editors use editorial policies to manipulate researchers' behaviour in order to increase the impact factor, as described by Falagas and Alexiou (2008). Many journals are excluded from the Web of Science every year because of their fraudulent behaviour (http://adminapps.webofknowledge.com/JCR/static_html/notices/notices.htm). There are many examples, such as the one reported by Dimitrov et al. (2010): the resounding case of the journal Acta Crystallographica A, which surprised everyone by increasing its impact factor from 2.38 to 49.93 in a year. It turned out that, of the 5966 citations received in 2009 by the 72 papers published in 2008, 5624 went to a single article, which was in fact responsible for this anomalous behaviour. Another example can be found in Opatrný (2008).
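Taking the figures quoted above at face value, the weight of that single article can be made explicit. This is a back-of-the-envelope check on the numbers in the text, not a recomputation of the impact factor:

```latex
\[
\frac{5624}{5966} \approx 0.943,
\qquad
\frac{5966 - 5624}{72 - 1} = \frac{342}{71} \approx 4.8
\]
```

That is, roughly 94% of the citations went to one article, leaving fewer than five citations on average for each of the other 71 papers.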

Currently there are no control or filtering systems for preventing fraud other than researchers' ethical values. In this sense, we must point out the role of institutions such as the Committee on Publication Ethics (http://publicationethics.org/) and other similar organizations devoted to pursuing fraud within the traditional research communication model, that is, journals. We may be witnessing a new revolution in the scientific communication model, and it may be just a matter of time before similar organizations start working in this new environment. For our part, we conclude our experiment and patiently await the retraction of our non-existent researcher by Google, following our example, and the deletion of the fake citations from our profiles. Google's effort in creating new evaluation tools heralds many changes in the research evaluation world, not just because these tools are free of charge, but because of their great coverage, immediacy and ease of use. We will just have to wait and see which path Google follows in its attempt to put a stop to those numbers that are devouring science (Monastersky, 2005).

SUPPLEMENTARY MATERIAL

More information is available at http://www.ugr.es/~elrobin/pantani.html.

REFERENCES

Aguillo, I. (2012). Is Google Scholar useful for bibliometrics? A webometric analysis.

Scientometrics 91; 2: 343-351.

Bartneck, C.; Kokkelmans, S. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics 87; 1: 85-98.

Baxt, W. G.; Waeckerle, J. F.; Berlin, J. A.; Callaham, M. L. (1998). Who reviews the reviewers? Feasibility of using a fictitious manuscript to evaluate peer review performance. Annals of Emergency Medicine 32; 3: 310-317.

Butler, D. (2011). Computing giants launch free science metrics. Nature 476; 18: doi:10.1038/476018a

Cabezas-Clavijo, Á.; Delgado López-Cózar, E. (2012). El impacto de las revistas según Google, ¿un divertimento o un producto científico aceptable? EC3 Working Papers 1. Available at: http://eprints.rclis.org/handle/10760/16836

Delgado López-Cózar, E. (2012). Los repositorios en Google Scholar Metrics o ¿qué hace un tipo documental como tú en un lugar como ese? EC3 Working Papers 4: 3 April 2012.

Dimitrov, J. D.; Kaberi, S. R.; Bayry, J. (2010). Metrics: journal's impact factor skewed by a single paper. Nature 466: 179.

Epstein, W. M. (1990). Confirmational response bias among social work journals. Science,

Technology & Human Values 15; 1: 9–38.

Falagas, M. E.; Alexiou, V. G. (2008). The top-ten in journal impact factor manipulation. Archivum Immunologiae et Therapiae Experimentalis 56; 4: 223-226.

Gómez, R. G. (2011). Quiero que Internet se olvide de mí. El País, 7 January 2011.

Jacsó, P. (2008). The pros and cons of computing the h-index using Google Scholar.

Online Information Review 32; 3: 437-452.

Jacsó, P. (2011). Google Scholar duped and deduped – the aura of "robometrics". Online Information Review 35; 1: 154-160.

Labbé, C. (2010). Ike Antkare, one of the greatest stars in the scientific firmament. ISSI

Newsletter 6; 1: 48-52.

Monastersky, R. (2005). The Number That's Devouring Science. The Chronicle of Higher Education 52; 8: A12. Available at: http://chronicle.com/free/v52/i08/08a01201.htm

Opatrný, T. (2008). Playing the system to give low-impact journal more clout. Nature

455: 167.

Peters, D. P.; Ceci, S. J. (1990). Peer-review practices of psychological journals – the fate

of accepted, published articles, submitted again. Behavioral and Brain Sciences, 5; 2:

187-195.

Sokal, A. D.; Bricmont, J. (1997). Impostures Intellectuelles. Editions Odile Jacob.

Wouters, P.; Costas, R. (2012). Users, narcissism and control – tracking the impact of scholarly publications in the 21st century. SURFfoundation. Available at: http://www.surf.nl/nl/publicaties/Documents/Users%20narcissism%20and%20control.pdf

Critical article on the H-Index How to become a successful scientist. » Survival Blog for Scientists



Papers

When you are doing research, you tend to collect a lot of papers. I remember that at the end of my PhD, when I moved to another continent to do a postdoc, I dumped a huge box of photocopies in my parents’ basement. A few years ago, I had collected two cupboards full of photocopies. It was getting seriously out of hand. Then, of course, journals started putting everything online as PDFs and the same process started all over again but this time filling up hard disk folders instead. I used to have subject-based folders, which sort of worked until something fit within 2 or 3 or 4 of my subjects. Searching for some old paper you had read a few years back became more and more nightmarish. Then somebody showed me Papers.
Papers is a program that acts like your iTunes library for PDFs. It also connects to online search engines such as Web of Science (WoS), Scopus, PubMed, arXiv, JSTOR, Google scholar, and half a dozen other ones. That means that you can search from within Papers, retrieve, and store PDFs directly into your library but more importantly, Papers keeps a record of bibliometric data such as title, authors, abstract, doi, keywords, notes, etc. You can set up collections of papers by topic either by hand or by using smart folders that use automated searches of the bibliometric data. I like it that you can nest topic folders so you can either view PDFs from a sub-topic or go one level up to see all PDFs about that topic. My only complaint is the lack of Boolean and wildcard searches.
However, the best thing by far is the quick search function, which makes a real-time selection that narrows down as you type in more keywords. For example, you once read this paper, it was about X, one of the authors was Y, maybe it was published in Z. Once you type in X and Y, usually the selected list has narrowed down enough that you can see journal Z and a few seconds later the paper you were looking for. This is amazingly good! Last but not least, you can export all the data into an EndNote XML library (or BibTeX  etc.) and use it to make reference lists.
There is one tiny little problem: it only works on a Mac (or your iPhone but that’s not really that useful). The reason for this is that OS 10.5 has a PDF API built in (a subroutine library that can be used by any software) whereas MS is trying to push its own version of PDF. Personally, I was so hacked off with Windows Vista last year and so enamoured with Papers that I switched to the Mac. It’s the best decision I have made in a long while: I haven’t sworn at my computer since!


Every scientist should have a Researcher ID

Unique author identification is a longstanding issue in scientific publishing. Currently there are a number of systems under development that promise a variety of functionalities. I am not going to give here an extensive overview of this wide range of systems, an up to date article can be found here. While a universally recognized standard such as the ISO standard International Standard Name Identifier (ISNI) system will undoubtedly be useful as a way to categorize any type of authors, artists and scientists, the practical use of an author identifier will be strongly related to the availability of linked information such as lists of publications.
Writing from my own experience I would like to discuss a particular unique author identification system which has developed into a fully functional tool: ResearcherID. The ResearcherID system has been developed by Thomson Reuters as a feature to their Web of Knowledge database. Although it can be argued that the commercial nature of this database limits its use as a standard, the system has a very clear advantage for scientific research and assessment as the resulting profile is made available in the public domain. Since summer 2011, ResearcherID has achieved arguably the most important functionality of an author identification system, namely full integration with a complete database of publications and citation metrics.
The information obtained from Web of Science can be assembled by a researcher who makes a ResearcherID profile. A limiting factor here is the requirement of access to the services of Thomson Reuters, although it is possible to upload a RIS-formatted file. Most importantly, it is possible to link your ID to all your papers including those with variations in last name and/or initials. Information assembled by the researcher can be accessed through a personal profile webpage which includes an up-to-date publication record synchronized weekly with Web of Science, and a graphical representation of citations per year and h-index. This information is now publicly available, i.e. does not require a subscription to Thomson Reuters services. Here is an example of my own ResearcherID page. Authors sharing the same name, such as James Smith, can be easily distinguished once they have registered their own unique details. These ResearcherID profiles are fed back into Web of Science where they are available as Distinct Author Sets.
So why are not all scientists yet on ResearcherID? Perhaps relatively few scientists are aware of this option, or maybe some are not inclined to cooperate with a commercial company or do not have access to the database. For people with a commonly occurring or otherwise ambiguous name, ResearcherID is probably the best way at the moment for disambiguation of their publication record in one of the major databases. As ResearcherID is now as complete as Web of Science, it can be used for job interviews or grant applications. In my opinion every scientist should get their ResearcherID as soon as possible.
More information about ResearcherID and how it links to other unique author systems can be found here. Other unique author identifier systems which are being developed are the Scopus Author Identifier and the public domain ORCID. I would be interested to hear about other experiences with these systems and what you believe will be the best option in the long run.

Do we need a WYSIWYG editor for Tex, LaTex, and AmsTex? How to become a successful scientist. » Survival Blog for Scientists


Do we need a WYSIWYG editor for Tex, LaTex, and AmsTex?

Posted in Technical (ms word, tex), Tips, useful software
I still remember how impressed we physicists were in the 1980s when we discovered TeX. The program was written by Donald Knuth. TeX is so good and complete that all new developments are mere front ends and user interfaces to it, of which the LaTeX and AMS-TeX macro packages are the most popular. Newer distributions deal with newer hardware, new fonts and better font management, and pdf creation, but the foundation is still TeX.
Those scientists, like chemists and biologists, who use an occasional mathematical formula can do without TeX. All kinds of handy add-ins allow incorporating math formulas in standard office documents. However, if your paper has many math formulas the TeX way is the only solution. In the rest of this post I will limit myself to LaTeX.
Compiling
A typical LaTeX cycle starts from an ASCII source file (extension usually .tex) that is compiled by a “latex” program into a dvi (device independent) file, which can subsequently be viewed or printed. The learning curve for LaTeX is quite long. Opponents of the TeX approach always complain about the lack of a WYSIWYG (What You See Is What You Get) editor. They are used to MS Word, or the like, with a powerful Graphical User Interface (GUI). I am not going to start a discussion here about whether WYSIWYG and GUIs are the best way. I agree with the Unix world that a graphical user interface and a WYSIWYG approach are in general inefficient for experienced users.
Math from scratch?
A typical LaTeX source file looks horrible to inexperienced users. To give an example:
\[ \int_0^\infty \sum_{l=0}^\infty\frac{ A_l ({\bf x})}{2 \pi}\]
will generate the formula:
[image: the rendered formula]
Some of my students really develop their math directly in LaTeX. I find this very difficult. What I often do is write the math with a pen, correct it, and correct it, and correct it, and only then put it in LaTeX.
If we had a good WYSIWYG editor we would be able to develop math from scratch directly into a usable tex file.
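For readers who have never seen a complete source file, here is a minimal sketch of one that wraps the formula above. The file name example.tex is made up for illustration, and I write \mathbf where the snippet above uses the older {\bf ...} form; both should work in the standard article class.

```latex
% example.tex (hypothetical file name, for illustration only)
% Compile with:  latex example.tex      -> example.dvi
% or with:       pdflatex example.tex   -> example.pdf
\documentclass{article}

\begin{document}

The formula from the text, set as displayed math:
\[
  \int_0^\infty \sum_{l=0}^\infty \frac{A_l(\mathbf{x})}{2\pi}
\]

\end{document}
```

Everything outside the \[ ... \] pair is the fixed scaffolding that every LaTeX file needs. This is also what a WYSIWYG editor would have to read and write cleanly if co-authors are to switch freely between editors.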
Requirements for a WYSIWYG LaTex editor
  1. The GUI should be user friendly, and compiling and printing should be transparent to the user. The help system should focus on how to use the editor, not on explaining LaTeX, because enough good documentation exists for that.
  2. An acceptable WYSIWYG editor would have to be backward and forward compatible: that is to say, it should be able to import LaTeX files from any old or new LaTeX version.
  3. It should be able to export clean LaTeX files, that is, without relying on macros that are not part of the standard LaTeX distribution.
  4. If not open source, the licensing should be reasonable.
Requirements 2 and 3 will allow authors to switch between any editors they like. Some co-authors might want to use the WYSIWYG interface, others might want to use the raw ASCII interface.
The only two WYSIWYG editors for LaTeX I know of are the open source Lyx and the commercial Windows program Scientific Word, sold by the software company MacKichan.
Lyx
Lyx is open source with a Unix taste, and as it is free, users should hold back with their complaints. Well, Lyx is horrible. At least on Windows. In my case the install procedure hung time after time on missing LaTeX packages. The documentation is awful, unclear, scattered, inconsistent, ugly... The printing of a document is terrible. Apparently it helps when you install Cygwin, a Unix environment for Windows. So Lyx violates my requirement 1. It also violates requirement 2: I tested it with at least ten bona fide LaTeX files that had been accepted by scientific journals in the past. In all cases Lyx could not handle them and told me there were fatal errors in them. The Lyx people advise writing papers specially for the Lyx system, and indeed Lyx stores the LaTeX information in a non-LaTeX file. Horrible.
Scientific Word
Scientific Word is much better than Lyx. Requirement 1 is fulfilled and requirement 2 is also fulfilled. Requirement 3 is only partly fulfilled. Rather than removing all SciWord stuff when cleanly exporting, it comments its own directives out in the source file. In addition it rearranges the original LaTeX file. Moreover, users who import a file, do not change it, and export it again will discover that it does not compile anymore because the proprietary file tcilatex.tex is needed. This is the wrong way. The SciWord developers should have developed a standard LaTeX package, perhaps called tcilatex, and made it part of any LaTeX distribution.
I have used SciWord a lot, but I am about to abandon it because of its licensing conditions. It is way too expensive: $525 for academic use and $180 for students. Happily one of my affiliations has a site-wide license. However, the licensing scheme is cumbersome. A license is tied to one computer and is valid for one year. I use it on four computers. So every three months one license expires, without any warning being issued. It always happens to me on a weekend or at a conference, and then I am without a working program.
My Solution
My solution is the fountain pen again. I write my math. After it is done I use the WinEdt ASCII editor with the MiKTeX distribution as backend. It is fast and robust.

Elsevier is going the wrong way How to become a successful scientist. » Survival Blog for Scientists


Elsevier is going the wrong way

Posted in Getting published, Web 2.0
Summary
Reed Elsevier’s subsidiary Elsevier has introduced, as an experiment, a new way of publishing science. The “paper” is now basically a website, in which the idea of a linear text is abandoned. The web interface gives access to text fragments, graphs, supplementary material, and an interview with an author, through hyperlinked tabs and mundane hyperlinks. In my opinion this development is a step backward and scientists should avoid publishing their material this way.
Elsevier’s solution to a non-existing problem
Scientists agree that way too many papers are being published. In addition, commercial publishers keep on launching new journals in an already overcrowded market. The desktop-publishing innovation has radically improved the productivity of scientists. There are many factors that hamper the progress of science, but the alleged inadequacy of present-day science publishing is not one of them.
Elsevier and the Internet
For about ten years, until 2002, I was an editor of the Elsevier journal Physics Letters. I have good memories of that time. Capable journal publishers with full scientific training and a journal with beautiful typography. In those days Elsevier was lagging behind as far as using the Internet for its communication with authors, referees and editors was concerned. I remember vividly that I warned them regularly to take the world wide web seriously. I am sure many of my fellow Elsevier editors voiced the same concern. I couldn’t understand why Elsevier did not react. This lack of perception on my side shows my ignorance about how multinational companies cope with major technological advances: they do not reinvent the wheel, no, they just buy it. I do not know how many web-developing companies Reed Elsevier acquired, but it must have been quite a number. All the back issues of their journals were scanned and put online. As a result Elsevier has a reasonable web performance. My respect for the management of Reed Elsevier.
I say reasonable, because it is still not great. The interface of the Elsevier portal ScienceDirect is clumsy and ugly. And if you really want to see a distasteful page, it is here: the official Elsevier web page for the press.
Elsevier and Dutch reporters
Every Dutch journalist gets a bitter taste in his mouth when the discussion comes to Elsevier. One of Reed Elsevier’s subsidiaries, Dagbladunie, once owned a few high-quality Dutch newspapers with a return on investment of about 13%. In 1995 the management of Reed Elsevier considered this return too low. The publishing company was used to higher returns from science publishing, so it sold the Dutch newspapers. From then on these newspapers did not fare well financially. Amongst other mishaps they suffered from a raid by the private equity firm APAX.
Problem 1 for science publishers: open access
The open access movement is gaining ground. The general public is getting interested in the issue. Why should public scientific libraries pay a fortune to get access to papers reporting the results of research financed by the taxpayer? It is useful to make a distinction between journals, like Nature, which are published by purely commercial publishing houses and journals, like Science, that are published by learned societies or other not-for-profit publishers. I must add that some learned societies, for instance the Optical Society of America, are in their behavior no longer distinguishable from purely commercial enterprises.
The non-commercial publishers have far fewer problems with open access than the commercial ones. However, all publishers realize that full open access will become a fact of life.
Problem 2 for science publishers: open standards
Standardization is always a hot issue in a market economy. The market leader likes to impose his standards on the market. From the moment the company has succeeded, it will start to change its standard continuously, making it very cumbersome for its competitors to gain market share. These lagging companies will complain, ask governments to interfere, and continuously try to influence public opinion. But as soon as the situation is reversed, and one of the plaintiffs gets dominance over the market, he will start to behave in exactly the same way in protecting his own standard.
Adobe is an admirable company. It has been in computer graphics and typography right from the start. It has produced PostScript, a computer language to drive laser printers. PostScript was a revolution in desktop publishing. As PostScript was not open source, companies making laser printers had to pay Adobe a license fee to be able to use PostScript in the firmware of the laser printer. These licensing fees made Adobe big.
But Adobe also introduced, in 1993, the portable document format (pdf). An absolute blessing for science. This format can easily deal with graphs, figures, mathematical formulas, chemical formulas etc. Its linear structure and the independence of its pages make for very fast web viewing. Although Adobe still holds patents for the pdf standard, it is now officially an open standard. A pdf file is, by the way, the only file format that can be reasonably protected with encryption and passwords. In this respect it is superior to all Microsoft Office products.
All scientific papers are these days available as pdf files. A “reprint” of a scientific paper is identical to a pdf file copy. A publishing company that starts a new journal will have to supply its articles in pdf format. Submitting authors regularly have to submit their paper, including figures, list of references, and supplementary material, as a pdf file. Referee reports are sent as pdf files. This free exchange of scientific information through pdf files is an ideal situation for scientists. But it is a nightmare for companies like Elsevier. Monopoly seekers would like to control the scientific market, impose their own standard, and get rid of the pdf standard. In the present experiment Elsevier still supplies the pdf version of the paper. But for how long? This new effort of Elsevier, if successful (which God forbid), would mean you cannot send reprints around any longer. You will have to supply your colleague with a web address of a commercial company, in the future very likely with paid access, to be able to read the “web-paper”.
Context of discovery
Scientists are human beings. Some get their inspiration in church, and others while watching a ball game, or by going to a conference. One can write novels about the life of a scientist. Richard Feynman wrote amusing books about his life. By reading such literature one can learn a lot about the psyche of a scientist and the sociology of the scientific community. These activities are part of what is called the context of discovery. The proof of Fermat’s last theorem made Andrew Wiles famous and a cult figure, featuring in many tv programs. All these accounts will never make it into a physics, chemistry or biology journal. And happily so. But Elsevier’s experiment is an attempt to compromise the hard core of science. Any scientist can give an interview. The next thing is a scientist talking about his religious feelings as an explanatory introduction to his paper.
Context of justification
The body of accepted knowledge, that is the content of scientific papers once it has been reproduced, has survived many challenges, and has finally been widely accepted as true, belongs to the context of justification. Progress in science concerns the increase of this knowledge. Brilliant scientific discoveries are part of this context. An interview with a scientist is not.
Didactics
If a proof of a theory is known and accepted it often can be simplified. Initial mathematical derivations can take tens of pages, and after a couple of years simplified proofs can be produced that take only a page or two. This simplification can be part of the context of justification. But didactics, defined as expressing the same thing in a simpler form without adding any new science, is not part of the context of justification. If the content of a paper could indeed have been presented and explained better than is done in the paper itself, the authors wrote the wrong paper. Explaining the content of a scientific paper for lay people is not part of the context of justification and should be kept separate.
Mathematics has no video
I recently bought a new wireless router. I like the brand Linksys, but this company has been acquired by Cisco, so now the brand is Linksys-Cisco. The contribution of Cisco is certainly that the “improved” and more “timely” manual has become of terrible quality. The manual is extremely modern, so it is not just a simple pdf file that I could read and use to install my router. No, the manual has been modernized: it is a video. I had to run this video maybe twenty times. The reason was that something in the video was unclear, at least to me. So I had to go through the whole video over and over again. Oh, I would have loved to have a linear text. With a linear text I would just have gone to the specific location in the text and read it, maybe several times. That would have been done in seconds, instead of rerunning the video, which cost me half an hour.
The ultimate dull linear text is a pure mathematical treatise. This is a sequence of lemmas and proofs. No interview with the author. No video.
What is wrong with linear text?
A virtue of a linear text is its extreme inflexibility. The first sentence is supposed to be the first sentence and the whole text is a serial line of arguments and presentations. Inflexibility can be a great virtue. On toll roads in France there are almost no exits. This inflexibility makes transport along these toll roads very effective. In my country, the Netherlands, every village demands its own highway exit and gets support from Parliament. The result is maximum flexibility but zero speed.
Technical problems
Present day (x)html rendering is still poor. This is easily seen in the experimental text of Elsevier. I will give just one example. The web text of the Cell paper uses tens of times the chemical formula “Ca2+”, with the charge on the baseline, whereas the pdf version tells us that it should be “Ca²⁺”. As you can see in this post, the superscript is possible in html, but then ugly varying line spacings are introduced.
The text lines in Elsevier’s web texts are much too long. This makes reading tiresome. Narrowing the window does not help, as the Elsevier developers have prevented the text from wrapping. In addition the text is not fully justified but ragged right. It is well known that fully justified text can be read more quickly.
Elsevier is afraid of open discussion
I am not at all saying that the context of discovery is not important for science. In this respect I like the forum discussions and comments in which scientists participate. Elsevier started its new experiment, “the article of the future”, and says it welcomes feedback. Reactions can be given in two ways: through a web form or via email. But these are old-fashioned one-way communication channels. Why not open up a forum and allow people to discuss openly? The company that claims to have invented the article of the future communicates with the community in a previous-century way.