We recently got the news that Google Bard now lets us upload images and do AI stuff with them.
This is great, and long overdue from ChatGPT: OpenAI bragged about image input a while back, but they still haven’t released it to the public. Remember the video with the napkin sketch turned into a website? Well, it looks like Google Bard can do that now.
How does it work? You upload an image and ask Bard to do things related to it.
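For readers who’d rather script this than click through a web UI: the video uses the Bard interface, which has no public API, but the same image-plus-prompt pattern can be sketched with Google’s google-generativeai Python SDK. Everything here (the model name, the API key, the file path, the question) is a placeholder assumption for illustration, not something from the video:

```python
# pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Hypothetical multimodal model name, used only for illustration;
# the video itself goes through the Bard web UI.
model = genai.GenerativeModel("gemini-pro-vision")

image = Image.open("chess_puzzle.png")  # placeholder path

# The whole pattern: one list containing the text question and the image.
response = model.generate_content(
    ["What is the best move for White in this position?", image]
)
print(response.text)
```

The shape of the call is the entire story: a question bundled with an image, and free-form text back.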
Does it work? Yes. Sometimes.
Does it tell you when it doesn’t work? NO.
Why is this a problem? Well, it answers very confidently and sounds like it knows what it’s talking about, so people will have a very hard time telling when it’s wrong.
Have a look at this hilarious video I made to see for yourselves.
Video transcript below
“Hello everyone, welcome to Cyber War. I’m Victor. Bard, Google’s language model, can now handle images, which is something ChatGPT was supposed to do. They announced it and even showed a video a long time ago, but nothing yet; it’s not available to the public, at least. So, this is the image I uploaded: ‘How to say tree in Europe’. It actually managed. Let’s see if it can actually tell the differences. Okay, pretty decent. Let’s see what else it can do in terms of images.
Okay, let’s give it something a bit more interesting. I uploaded a chess quiz. Let’s see how that goes.
So I gave it the chess board, and I’m curious how it handles it. Looks pretty decent.
Let’s see how good the optical character recognition is. So: rook, knight, bishop... no, it’s not right. This doesn’t really work. Let’s try the next thing. Now I’m curious if it recognizes this; it’s from an old news story, from 2014. Nope.
Let’s try this again; some of you might recognize it. This is from 2014, from the Netherlands, when they were holding a funeral for the victims of that airplane that was shot down over Ukraine. Yeah, it got it, it got it correct.
Okay, so it can do the map thing. It can tell that an image is a chess board, but it doesn’t really work out what the position is. It can, however, recognize historical events; it recognizes the photos.
The abstract reasoning test, that should be interesting. Okay, so it took the whole image as one thing; it didn’t understand that this part is the question and these are the answer options. So it made up rules about where the arrows point. It’s too smart for its own good. Maybe I can explain it to it... okay, it’s really not good at this.
So: up, down, right, left, so the next one should be down, and there should be six arrows, so it’s probably A; I would go for A. Yeah, it made a total mishmash out of it. It doesn’t read the image properly, so the OCR doesn’t see it correctly. Which is... okay.
Hey, but at least it can have a conversation about the stuff it reads with optical character recognition. Let’s look at that. So this is a Eurostat table about gross domestic product and AIC per capita. For the audience, and for me: AIC is... actualized income something? No, I don’t think that’s what it is here. Anyway, it’s just making stuff up, because it didn’t read the table. It read GDP and AIC correctly, but where did it see the United States? I don’t get it. What is this? So guys, this is hallucinating just as much as ChatGPT. It seems to be a competition over who puts out more... I don’t want to say it... unrealistic text. That’s it, so I won’t call it %$!@#!@#.
Okay, let’s go back to something more creative, because with clear and correct data I don’t think it works. Let’s see. I gave it a photo of a dish called mici: mici with mustard and fries. It’s like a sausage without the skin, grilled. Mititei, yeah, same thing, Mititei = Mici. Okay, peace. Now let’s get the recipe. Okay, it looks like it can do what ChatGPT does, like give me recipes. Other than that, let me think of something else to ask it. One second.
Okay, I’m going to test if it follows the news. I uploaded a photo of the recently attacked Crimea Bridge, the Kerch Bridge. It thinks this is a pancake. Okay, so it thinks the Crimea Bridge is a crepe. Then it thinks it’s a tortilla. Okay, this is funny. So it doesn’t follow the news. Yeah, it’s totally unaware of what’s going on.
Maybe you like sports.
Alright, alright. For the last one, let’s play with the optical character recognition again. So this is a document written in Romanian, but using the old Cyrillic, Church Slavonic alphabet, and it actually... no, it didn’t do that. What it did was recognize the image. So it can’t do optical character recognition on that text; it just found the image in its database, and its description of it isn’t accurate either.
Yeah, so it’s completely hallucinating, which is great.
Okay, this was great. Thank you all for watching. It’s not exactly a disappointment, but I was almost certain of it: it’s hard to get something really useful out of this, except for creating large disinformation campaigns, large-scale search engine optimization spam, a lot of useless content that we will be bombarded with. And you won’t be able to tell who has a point and who’s BSing you without extra effort. Alright, have a great day everyone, and please subscribe if you haven’t.”
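A side note on the OCR failures above: for plain printed text, a dedicated OCR library is still a stronger baseline than a chatbot. Here’s a minimal sketch using pytesseract, just for comparison; the file name is a placeholder and none of this appears in the video:

```python
# pip install pytesseract pillow
# (the Tesseract binary itself must also be installed)
from PIL import Image
import pytesseract

# Placeholder path: any screenshot of printed text,
# e.g. the Eurostat GDP/AIC table from the video.
image = Image.open("eurostat_table.png")

# Plain OCR: returns the recognized text as one string.
text = pytesseract.image_to_string(image)
print(text)
```

For the Romanian document in the old Cyrillic script, a stock setup won’t help either; Tesseract would need a language pack trained for that script, passed via its lang argument.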
Another funny benefit of all this: now I can ask ChatGPT to summarize the transcript. Here’s what it said:
From the transcription, Victor conducted a series of tests to evaluate the capabilities of Google’s Bard AI, which now handles images. Here’s a summary of his findings:
Language Identification and Translation: The model correctly identified and translated the phrase “How to say tree in Europe.”
Chess Board Analysis: The model was able to recognize a chessboard image but did not interpret the board’s state or perform any analysis of the game.
Optical Character Recognition (OCR): Victor found the OCR capabilities to be inconsistent. In some cases, it was unable to accurately read and interpret text from images.
Historical Event Recognition: Bard AI correctly identified an image related to an airplane attack in Ukraine from 2014.
Abstract Reasoning Test: The model struggled with interpreting an abstract reasoning test presented in an image. It did not correctly differentiate the question from the answer and didn’t provide a proper solution.
Statistical Data Interpretation: When presented with a Eurostat table image concerning gross domestic product (GDP), the model failed to accurately interpret the data. Victor even used the term “hallucinating” to describe the incorrect extrapolations made by the model.
Food and Recipe Interpretation: When presented with an image of a dish (Mici with mustard and potato fries), the model identified the dish and provided a corresponding recipe, similar to what ChatGPT might do.
News Event Recognition: The model failed to identify a recent news event – an attack on the Crimea Bridge. It incorrectly identified the image as a crepe and a tortilla.
Old Text Recognition: The model failed to perform OCR on an old document written in Romanian using the old church Slavonic alphabet. Instead, it recognized the image from its database and inaccurately described it.
Overall, Victor’s tests suggest that while Bard’s image feature has some utility, its performance is inconsistent. It showed aptitude in language translation and historical event recognition, but it struggled with tasks like abstract reasoning, accurate OCR, and correctly recognizing recent news events or interpreting old texts. Victor expressed concerns that while the technology has potential, its current shortcomings could contribute to disinformation campaigns and generate irrelevant content.
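For completeness: the summary above came from pasting the transcript into the ChatGPT web UI. Scripted, the same request looks roughly like the following minimal sketch using the openai Python package; the model name and file path are placeholder assumptions:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("transcript.txt") as f:  # placeholder path
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model name is an assumption
    messages=[
        {"role": "system",
         "content": "Summarize this video transcript as a list of findings."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```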