One task I find takes quite a long time is writing captions for my images, so I decided to see if AI could do this for me. (Short answer: no.)

Using AI for Image Captioning

(Above image was AI generated in Photoshop using the prompt 'An AI writing a caption for a photo'.)

AI seems to be a Marmite thing where people either love it or hate it, though I am pretty ambivalent about it. I have noticed that the AI summaries in Google's search results now often seem to be quite good, whereas when they were first introduced they were nearly always full of incorrect information. So it does seem to me like AI could potentially be useful.

For testing I used a visible light photo and an ultraviolet photo of a flower (full size originals without any border / watermark):

Vis

Malva sylvestris (Common Mallow) flowers

The caption I wrote for this image was:

Purple flowers of a Common Mallow (Malva sylvestris) plant. The flowers have 5 light purple coloured petals with darker purple veins running down them. They are hermaphrodite, with the pollen bearing anthers surrounding the stigmata at the centre of the flower.

UV

Malva sylvestris (Common Mallow) flowers

The caption I wrote for this image was:

Flowers of a Common Mallow (Malva sylvestris) plant photographed in reflected ultraviolet light. The flowers have a similar tonality to that in visible light, with lighter petals that have darker stripes running down them. The base of the flower is darker in UV, and the pollen is also much darker.

You can compare my captions above to what the AI generated for the same images below. I just added the image and a prompt asking it to caption the image. Each test used a fresh context window, so the AI wasn't taking account of any previous prompts / answers.
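For anyone wanting to script this kind of test, here is a minimal sketch of sending one image and one prompt to a locally running Ollama server over its REST API. It assumes the server is listening on the default port 11434 and that the model tag is one you have already pulled. The function names are hypothetical, and this is not necessarily how the tests in this post were run. A call to /api/generate that does not pass a `context` value is independent of earlier calls, which matches the fresh-context approach described above.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(image_bytes: bytes, prompt: str, model: str) -> dict:
    """Build the JSON body for a single, non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are passed as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a stream of chunks.
        "stream": False,
    }


def request_caption(image_path: str, prompt: str, model: str) -> str:
    """Send the image and prompt to Ollama and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_caption_request(f.read(), prompt, model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying a request body makes urllib send this as a POST.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a hypothetical file name, `request_caption("mallow-vis.jpg", "Write a caption for this image of Malva sylvestris", "llama3.2-vision")` would return the model's reply as a string. A text-only model given the same request may simply answer the prompt without the image, as the llama3 results below show.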

Llama3 via Ollama

I started by using Ollama with llama3, not realising that llama3 has no 'vision' ability, so it was responding only to my text prompt and ignoring the image. Still, I am including what it gave me here anyway.

UV:

My prompt:

Write a botanical description for this image of Malva sylvestris flowers photographed in ultraviolet light. Include information on how the flowers' appearance differs from when viewed in visible light.

The response:

Botanical Description:

Malva sylvestris, commonly known as Common Mallow or Wild Mallow, is an herbaceous perennial plant native to Europe and Western Asia. The UV-illuminated image showcases the flowers' unique morphology under this specialized light.

Flower Morphology:

In visible light (400-700 nanometers), Malva sylvestris flowers typically display a dull greenish-yellow color with five petals, each approximately 1-2 cm in length. However, when exposed to UV radiation (360-400 nanometers), the flowers' appearance undergoes significant changes.

UV-Revealed Characteristics:

In this image, the Malva sylvestris flowers exhibit a striking, iridescent sheen, with petals displaying a range of hues from pinkish-purple to blue-violet. This coloration is a result of the presence of specific UV-absorbing compounds, such as flavonoids and anthocyanins, which are not visible in visible light.

Key Features:

  • Petals: The UV-illuminated petals appear more vibrant and elongated than their visible counterparts, with a subtle sheen that suggests the presence of microscopic ridges or hairs.
  • Sepals: The sepals, which are typically greenish-yellow in visible light, now display a deep blue-violet coloration, possibly due to the concentration of UV-absorbing pigments.
  • Stigma and Stamen: The stigma appears more prominent and bright yellow under UV illumination, while the anthers have transformed into a vibrant orange-red hue.

Significance:

The dramatic change in flower appearance when exposed to UV light highlights the importance of considering the specific lighting conditions under which flowers are viewed. This phenomenon may be relevant for plant-insect interactions, as some insects possess UV-sensitive vision and may perceive plants differently than we do.

By studying the effects of UV light on floral morphology, botanists can gain a deeper understanding of the complex relationships between plants, pollinators, and environmental factors.

The response is completely wrong, but this is not surprising since it couldn't actually see the image. Still, it would be more helpful if it just said that it can't see the provided image. It is also odd that it decided the petals appear 'more... elongated', as if a change in lighting would affect the shape of the petals.

Trying with a more basic prompt:

Write a caption for this image of Malva sylvestris.

I got the response:

Sweet serendipity! This enchanting image captures the beauty of Malva sylvestris, also known as Common Mallow or Hedge Mallow. With its delicate white petals and vibrant green foliage, this wildflower is a treat for the eyes. Native to Europe, Asia, and North Africa, Malva sylvestris thrives in open areas and along hedgerows, spreading its charm far and wide.

This is not really suitable for an image caption; it is more like flowery (excuse the pun) marketing language.

Vis:

While I did test this with the visible light image as well, there's no point including the results here since the model can't see the image.

llava-mistral via Ollama

Next I tried a vision-based model, llava-mistral.

Vis:

My prompt:

Write a caption for this image of Malva sylvestris

The response:

I apologize for any confusion. The image provided does not appear to be related to Malva sylvestris (Common Mallow).

Malva sylvestris, or Common Mallow, is a flowering plant with showy, five-petaled flowers that are typically pink or purple in color. The calyx is greenish-yellow and consists of five sepals that are fused together. The corolla is composed of five petals, which are often slightly irregularly shaped.

The center of the flower features a prominent cluster of yellow or cream-colored stamens. Flowers are borne on long, slender stems (peduncles) that can reach up to 10 cm in height.

If you have any specific questions about Malva sylvestris or its botanical characteristics, please feel free to ask! I'd be happy to help you with more information based on your interests or requirements.

Please note that the image provided doesn't show any Common Mallow flowers. If you would like an accurate botanical description of a specific type of flowering plant, please provide the corresponding image.

I'm not sure why it didn't think the image was of M. sylvestris.

llama3.2-vision via Ollama

Then I tried with a version of llama3 that has been adjusted to add vision capability.

Vis:

My prompt:

Write a caption for this image of Malva sylvestris

The response:

The image presents a close-up view of a cluster of purple flowers, likely from the mallow family, with a blurred background. The flowers are small and delicate, with five petals each and a white center. They are arranged in a tight cluster, with some of the flowers facing towards the camera and others facing away. The stems are green and leafy, with small leaves scattered throughout the cluster.

  • The flowers are small and delicate, with five petals each and a white center.
  • They are arranged in a tight cluster, with some of the flowers facing towards the camera and others facing away.
  • The stems are green and leafy, with small leaves scattered throughout the cluster.
  • The background is blurred, but it appears to be a natural setting, possibly a garden or a field.
  • The overall effect of the image is one of beauty and serenity, with the delicate flowers and green stems creating a sense of calm and tranquility.

The image is a beautiful and peaceful representation of nature, with the delicate flowers and green stems creating a sense of calm and serenity. The blurred background adds to the sense of depth and distance, making the flowers appear even more delicate and fragile. Overall, the image is a lovely and calming visual representation of the natural world.

Again, we are getting a description of the image that is much more 'marketing speak' than what we would expect for a standard image caption. It is also overly long; a caption is normally just a couple of lines.
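Since length is one of the complaints here, a rough check like the one below could flag responses that run well past caption length. This is only a sketch under stated assumptions: the three-sentence default is taken from the hand-written captions earlier in this post, which are three sentences each, the sentence splitting is deliberately naive (an abbreviation such as 'M. sylvestris' would be counted as a sentence break), and both function names are hypothetical.

```python
import re


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments so an empty string counts as zero sentences.
    return len([p for p in parts if p])


def is_caption_length(text: str, max_sentences: int = 3) -> bool:
    """Return True when the text has no more than max_sentences sentences."""
    return count_sentences(text) <= max_sentences
```

A check like this says nothing about whether the content is accurate; it would only separate short, caption-sized replies from multi-paragraph ones.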

My prompt:

Write a botanical description for this image of Malva sylvestris flowers

The response:

The image depicts a cluster of Malva sylvestris flowers, also known as common mallow or mallows. The flowers are characterized by their:

  • Color: The flowers are a deep purple color with a lighter purple stripe in the center.
  • Shape: The flowers are shaped like a five-petaled flower, with each petal having a rounded tip.
  • Size: The flowers are relatively small, measuring around 2-3 cm in diameter.
  • Stamen: The stamen are long and thin, with a white pollen.
  • Pistil: The pistil is long and thin, with a white stigma.
  • Leaves: The leaves are small and oval-shaped, with a pointed tip.
  • Stem: The stem is thin and wiry, with a slightly hairy texture.

Overall, the Malva sylvestris flowers in the image are small, delicate, and have a distinctive purple color with a lighter purple stripe in the center. They have a five-petaled shape, with long and thin stamen and pistil. The leaves are small and oval-shaped, and the stem is thin and wiry.

While this isn't too bad, it's still not really quite what I was looking for. Personally I wouldn't describe the flowers as having 'a lighter purple stripe in the center'.

UV:

My prompt:

Write a botanical description for this image of Malva sylvestris flowers photographed in ultraviolet light. Include information on how the flowers' appearance differs from when viewed in visible light

The response:

Botanical Description: Malva sylvestris in Ultraviolet Light

Flower Appearance

  • The flowers exhibit a unique, almost iridescent appearance in ultraviolet light.
  • The petals display a deep, rich purple color with a subtle sheen.
  • The ultraviolet light accentuates the delicate, almost translucent quality of the petals.

Petal Structure

  • The petals are thin and delicate, with a slightly wavy edge.
  • The texture is almost velvety, with a subtle sheen in ultraviolet light.

Stamen and Filaments

  • The stamen are long and thin, with a bright yellow color that stands out against the purple petals.
  • The filaments are equally long and thin, with a slightly darker yellow color.

Overall Appearance

  • The flowers appear more delicate and refined in ultraviolet light, with a subtle, almost ethereal quality.
  • The deep purple color of the petals is accentuated, making the flowers appear more vibrant and intense.

Comparison to Visible Light

  • In visible light, the flowers appear more subdued, with a more muted purple color.
  • The petals appear slightly thicker and less delicate, with a less pronounced sheen.

Conclusion

  • The Malva sylvestris flowers in ultraviolet light exhibit a unique and captivating appearance, with a deep, rich purple color and a subtle, almost iridescent sheen.
  • The delicate, almost translucent quality of the petals is accentuated, making the flowers appear more refined and ethereal.

Again, very flowery language. And it doesn't really reflect what we see in the image: the petals don't appear 'almost iridescent' to me, and the stamens definitely don't have 'a bright yellow color'.

My prompt:

Write a caption for this image of Malva sylvestris

The response:

The image features a close-up of a cluster of small, light purple flowers with five petals each, arranged in a tight cluster. The flowers are surrounded by dark green leaves, and the background is a blurred, dark green color.

  • The flowers are small and have five petals each.
  • They are light purple in color.
  • They are arranged in a tight cluster.
  • The leaves are dark green and surround the flowers.
  • They are small and delicate.
  • They add a pop of color to the image.
  • The background is dark green and blurred.
  • It provides a subtle contrast to the light purple flowers.
  • It helps to focus the viewer's attention on the flowers.

Overall, the image is a beautiful and intimate portrayal of a small cluster of flowers. The use of light purple and dark green colors creates a sense of contrast and visual interest, while the blurred background helps to focus the viewer's attention on the flowers themselves.

I'm not sure why it thought the background was dark green. I assume the model is just trained to think leaves are green and stamens are yellow, and so hallucinates these colors even when they are not in the image.

Florence2, GPT4o-mini, Qwen2-VL combined

This was a lot more complicated to set up than adding a model in Ollama, but all the technical work in getting the different models to talk to each other was already done. It uses 'image_agent' from Robert Martin-Short, which you can download from rmartinshort/image_agent: Use Langraph to create a basic multimodal AI agent.

The accompanying article, which describes how the code works in more detail, is: A Multimodal AI Assistant: Combining Local and Cloud Models.

Vis:

My prompt:

Write a caption for this image of Malva sylvestris

The response:

A bunch of purple flowers are in the middle of the picture. There are green leaves around the flowers. The flowers have small white stamens on them.

The best so far, though it does read more like 'alt' text for an image than a caption.

My prompt:

Write a botanical description for this image of Malva sylvestris flowers

The response:

Malva sylvestris, commonly known as common mallow, features distinctive, five-petaled flowers that are typically purple or pink with darker veining. The flowers are about 2-3 cm in diameter and grow in clusters. The leaves are palmate and lobed, contributing to the plant's overall lush appearance

Again, pretty good, though I'm not sure about 'the plant's overall lush appearance'.

UV:

My prompt:

Write a botanical description for this image of Malva sylvestris flowers photographed in ultraviolet light. Include information on how the flowers' appearance differs from when viewed in visible light

The response:

Malva sylvestris, commonly known as common mallow, features five-petaled flowers that are typically pink to purple with darker veins. In ultraviolet light, these flowers may appear more vibrant or exhibit patterns not visible in the human-visible spectrum, enhancing their attractiveness to pollinators.

Not too bad, but I feel the response is pretty generic and not really taking into account what is shown in the supplied image.

My prompt:

Write a caption for this image of Malva sylvestris

The response:

A bunch of purple flowers are in the middle of the picture. The flowers have five petals and a small center. The background is very dark. The picture was taken in the day time.

Again, not too bad, but it reads more like alt-text than an image caption. Still, I much prefer the style to the long, marketing-style spiel produced by llama3.

Conclusion

Maybe with a specially trained model, or a lot of time spent carefully testing and crafting prompts to get what you want out of a specific model, AI could be used for image captioning. But as things currently stand, I would say it is not suitable.

The combined 'image_agent' approach might be suitable for creating alt-text to help search engines understand your images. But I don't think it's worth going to the trouble of using AI for this yourself. I would expect that Google et al. will be implementing their own machine vision AI into their web crawlers soon, if they haven't already.
