3/8/2024 - Technology and Innovation

Everything we do on the Internet will be the input of an AI.

By Felipe Morales

We are often surprised by the answers that AIs (Artificial Intelligences) provide, especially generative ones. We ask ourselves again and again how they produce those answers and on the basis of what information. In fact, at this very moment it occurs to me that I could stop writing and turn to one of these AIs to see what it tells me, but NO, I will keep thinking and reflecting like a simple human.

Continuing with the thread of the story, it is worth noting that we find answers to ALMOST all the queries (prompts) that we write to an AI. I say ALMOST because it will only answer what does not go against good manners, does not imply harm to third parties or their safety, and does not touch on several other topics that are off limits for moral and ethical reasons. But how does it really manage to respond to ALMOST everything? Evidently it has information and somehow processes it. This is where experts mention the use of hard sciences such as mathematics, probability, statistics and others.

Examples of Prompts

To be more concrete and specific: how can an AI tell, for example, whether a given query or prompt should be classified as positive, negative or neutral? Let's look at concrete examples, written as queries in the context of a class lecture on any subject:

AI model used

I obtained all the above ratings online from cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, based on the model described at https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. The latter is a RoBERTa-based model trained on about 124 million tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. That page links to the original Twitter-based RoBERTa model and to the original TweetEval reference paper. The model is suitable for English, and there are also models trained on multiple languages.
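As a rough illustration of how a three-way sentiment classifier of this kind can turn its internal scores into a rating, here is a minimal sketch in Python. It is not the model's actual code. The raw scores are invented for the example, the label order is an assumption, and the names `softmax` and `classify` are hypothetical helpers. The idea is only that the model produces one score per class, the scores are converted into probabilities, and the most probable class becomes the answer.

```python
import math

# Assumed label order for this sketch; a real model defines its own mapping.
LABELS = ["negative", "neutral", "positive"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the highest score avoids overflow without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Hypothetical raw scores for one sentence (NOT real model output).
example_scores = [-1.2, 0.3, 2.1]
label, prob = classify(example_scores)
print(label, round(prob, 3))
```

In a real system the raw scores would come from the trained model rather than being typed in by hand; everything else in the sketch is ordinary probability arithmetic.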


It is worth mentioning that "TweetEval is an initiative that aims to evaluate and improve natural language processing (NLP) systems in the analysis of tweets. It consists of a series of evaluation tasks covering different aspects of tweet analysis, such as emotion identification, irony detection and sentiment analysis. These types of evaluations are important for advancing the development of NLP algorithms and models that are able to effectively understand and analyze the language used in social networks."

Some of what we do on the Internet

The paragraphs above lead us to conclude that at least this AI model was trained on some 124 million tweets, and that those tweets came from what millions of people wrote, reacted to, gave opinions on and/or commented on. In other words, what Twitter users posted from their accounts during the period mentioned above was used, in one way or another, to train these models. This does not necessarily imply a breach of the product's terms of use, since those terms very likely include a section covering different possible uses of the content.

Our contribution to AI

Taking the reasoning further, we see that each individual tweet contributes to AI training: every opinion or tweet expressed individually becomes part of a large volume of information that can be used to train any AI. This is where the importance of each individual opinion lies. There is little we can do about what we have already posted; what matters is what we do from now on, knowing this.

This is just one documented example of how individual information becomes part of a whole, and of the many uses that can then be made of it. Perhaps today we are not aware of the implications of each of our actions on social networks.


This leads us to the inexorable conclusion that "Everything we do on the Internet will be the input of some AI", for better or for worse.

It is worth mentioning that each of us uses the Internet in very different ways, and that we upload not only text but also images, audio and other kinds of content. And yes, all of this will also be fed to some AI.

Felipe Morales

I was born in Benito Juárez, Province of Buenos Aires. I live in La Plata. Camila's father.

Graduated in Computer Science at the Universidad Nacional de La Plata, Facultad de Ciencias Exactas.
Teacher at UNLP, UNAJ, UNAB, IPAP, ISFDyT N. 210, Superprof, Acamica, CoderHouse, among others. Volunteer at PMIBA La Plata Community.
CTO at Julasoft S.A.
Directorate of Municipal Economic and Financial Information Systems. Ministry of Economy. Province of Buenos Aires.
Experience in software development and computer systems, teaching, project management and operations management. Programming, Artificial Intelligence and Databases.

