Artificial Intelligence

How to use the AI Llava model on the computer and the benefits (for us and society)

Llava is a large multimodal model with 'visual' capabilities. This is how it works

by Alessandro Longo

Le illusioni ottiche dell’intelligenza artificiale, Llava e le foto fake. Lezione 10

3' min read

3' min read

Our personal image analyser with artificial intelligence, put on our computer, at our complete disposal. This is what we would get by loading Llava, a large multimodal model with 'visual' capabilities, locally, i.e. on a computer.

Because Llava on the computer

Loading...

We tested it on the Lm Studio programme. It is true that multimodality has made great strides in recent months and is now available free of charge with OpenAi's new Gpt 4o, even on mobile apps. There are, however, several advantages to bringing such a system onto a computer, in addition to educational purposes, i.e. to study how these models work (they underpin a growing number of services, so practising with them is a good idea for the future).

Meanwhile, Gpt 4o is also available free of charge, but is limited in its functions for users who do not pay for a subscription. Some users also have an interest in keeping data and images they want to feed to the AI private on their PCs, instead of having them travel on the cloud.

Le illusioni ottiche dell’intelligenza artificiale, Llava e le foto fake. Lezione 10

With a programme like Lm Studio, this can be done quite easily. It is also a good way in general to try Llava, which when it came out (in October 2023) was also conveniently accessible for free via the web. The various online services offering it now, however, unlike last year, are overloaded and force long queues.

Let us add that Llava is open source (unlike popular AI models such as Gpt or Google's Gemini).

How to put Llava on your computer.

After downloading LM Studio, click on the magnifying glass on the left and type in Llava. Three results will come up. We download a 'model file' (the heaviest supported by our machine, as will be evident from the message on LM Studio) and the 'vision adapter'.

Then we click on AI Chat (the little cloud on the left) and add an image to the chat box (by clicking on the picture box).

What to do with Llava on a PC

We can do various things on the uploaded image.

Conversation: simple and short statements about the content of the picture. Detailed description: detailed and lengthy descriptions of the image content. Complex reasoning: plausible and logical reasons for the content of the image. This is the most complex and requires the model to follow a step-by-step logical reasoning process.

For example, we can ask him to describe a scene and then ask him if the image he sees is plausible or if it could be an image created with AI. Ask him where he thinks a certain picture was taken, to identify a place or monument. What food he sees inside the image of a refrigerator. Ask him about possible explanations for a behaviour we see in the picture (a dog barking at a car).

One possible use is in schools. Students can use it as a tool - in support of the teacher - to improve their visual reasoning skills.

Users with disabilities, such as the visually impaired, may also have particular advantages in having an image described by artificial intelligence.

It is still early days, but the potential of these models is enormous. Just think that Llava gave birth to Llava-Med, which is specialised with a corpus of medical information, to answer questions on images (x-rays for example), to help doctors or other personnel in making diagnoses. It is still experimental, but the path is promising. There are also large companies, such as Google, working on specialised artificial intelligence models for medical use, capable of analysing test images or answering questions related to a problem or possible therapy.

Many experts envisage its use especially in areas where there is a shortage of healthcare personnel. These are the same areas where a good internet connection may be lacking. This confirms the usefulness of models capable of running on computers, without a network, for various uses such as technical assistance or support for maintenance and operations on facilities. For more critical uses, such as medical ones, the open source nature of a model like Llava is an added value. It favours code transparency, protecting against possible errors and distortions that can lead to incorrect, potentially lethal results in this field.

The presence of open models is also an antidote against the concentration of power in the hands of AI big tech, in an area of public interest such as healthcare

Last point: for very sensitive images such as medical images, the privacy that can be achieved by using a model locally, instead of via the cloud, is an advantage.

We can get a first taste of this possible revolution we are heading towards by installing Llava, or other similar models, on Lm Studio on the home PC.

Copyright reserved ©

Brand connect

Loading...

Newsletter

Notizie e approfondimenti sugli avvenimenti politici, economici e finanziari.

Iscriviti