Thought Ladder 5: Latest AI news from Meta, Google and Hugging Face

AI competition is heating up among the top tech companies

Omer Khalid (PhD)
May 12, 2023

TL;DR

In the last few weeks, there have been several major developments in the AI space, so here is a quick recap of the most important ones.

Meta released its Segment Anything and ImageBind models
Google released Bard, its experimental chatbot platform similar to ChatGPT
Hugging Face released StarCoder, a next-generation LLM for code generation

Each of these represents a major advance in machine learning and AI.

Deep Dive

Meta

Meta recently announced several AI contributions, namely the “Segment Anything Model” (SAM) and the “ImageBind” model, both for computer vision. Both releases reflect the phenomenal progress the field has been making lately.

SAM can cut out any object, in any image, with a single click. It is built and launched as a promptable segmentation system that works on new images and objects without additional training.

SAM in action, showcasing its object selection capabilities. Source: Meta

This is an amazing development in image segmentation: objects can be cut out without any additional training, which opens up many use cases for SAM, especially in computer vision.

For example, it can be used in urban topography to cut out buildings, bridges and so on, or in retail marketing to showcase the latest products in customers’ own homes, and the cut-out objects can then act as inputs to other models. It is also built in a decoupled fashion: a one-time image encoder runs separately from a lightweight mask decoder that takes in prompts via the web browser. Read the full paper and try the demo here.
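To make the decoupled design a bit more concrete, here is a minimal toy sketch in Python. It is not Meta's code: `ToySegmenter`, `embed` and `mask_from_click` are hypothetical names, the "encoder" just copies pixels, and the "decoder" is a simple flood fill. The only point is the shape of the design: the expensive step runs once per image and is cached, while each click only triggers the cheap step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass
class ToySegmenter:
    """Hypothetical stand-in for an encoder/decoder split (not SAM itself)."""

    encoder_calls: int = 0
    _cache: Dict[str, Image] = field(default_factory=dict)

    def embed(self, image_id: str, image: Image) -> Image:
        """Heavy step: runs once per image, result is cached."""
        if image_id not in self._cache:
            self.encoder_calls += 1
            # Assumption: a real encoder would be a large neural network.
            # Here the "embedding" is just a copy of the pixels.
            self._cache[image_id] = [row[:] for row in image]
        return self._cache[image_id]

    def mask_from_click(
        self,
        image_id: str,
        image: Image,
        click: Tuple[int, int],
        tolerance: int = 10,
    ) -> List[List[bool]]:
        """Light step: runs per prompt and reuses the cached embedding."""
        emb = self.embed(image_id, image)
        height, width = len(emb), len(emb[0])
        row, col = click
        target = emb[row][col]
        mask = [[False] * width for _ in range(height)]
        stack = [(row, col)]
        # Flood fill: grow the mask over connected pixels similar to the click.
        while stack:
            y, x = stack.pop()
            if not (0 <= y < height and 0 <= x < width) or mask[y][x]:
                continue
            if abs(emb[y][x] - target) > tolerance:
                continue
            mask[y][x] = True
            stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
        return mask


# Usage: two clicks on the same image trigger only one encoder run.
image = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [0, 0, 0, 0],
]
segmenter = ToySegmenter()
bright_object = segmenter.mask_from_click("demo", image, click=(0, 2))
background = segmenter.mask_from_click("demo", image, click=(2, 0))
print(segmenter.encoder_calls)
```

In a browser setting, this split is what would let the heavy encoding happen once up front while each click stays fast.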

ImageBind, on the other hand, is a model that learns from information across six modalities: text, image/video, audio, and sensor data recording depth, thermal readings and motion (from inertial measurement units).

Models optimised for a single modality already existed, but incorporating multiple modalities into one model enables machines to analyse different forms of information together, which is in many ways what humans do. This is yet another step towards human-level machine intelligence (HLMI), which I previously wrote about.
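One way to picture what "multiple modalities in one model" buys you is a shared embedding space, where inputs of different kinds are mapped to vectors that can be compared directly. The sketch below is an illustration under that assumption, not ImageBind's actual code: the vectors are made up by hand, and `cosine` and `retrieve` are hypothetical helper names.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the name of the candidate closest to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up embeddings. In a real system each modality would have its own
# encoder producing vectors in the same space; these numbers are invented.
TEXT = {
    "a dog barking": [0.9, 0.1, 0.0],
    "ocean waves": [0.0, 0.2, 0.9],
}
AUDIO = {
    "bark.wav": [0.8, 0.2, 0.1],
    "surf.wav": [0.1, 0.1, 0.95],
}

# Cross-modal retrieval: a text query finds the best matching audio clip.
for phrase, vector in TEXT.items():
    print(phrase, "->", retrieve(vector, AUDIO))
```

Because everything lives in one space, the same `retrieve` call would work in the other direction too, for example using an audio vector to look up the closest text.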

Google

Source: Google

Google has finally begun to catch up with ChatGPT with Bard, its own conversational generative AI chatbot, which was originally built on the LaMDA family of language models. Bard allows you to search using images (it was previously text only) and connects with other Google apps, something that with ChatGPT was previously done through Chrome extensions.

Bard is now powered by Google’s PaLM 2 language model, which is being rolled out across several Google products (such as Gmail, Docs and Sheets) and can in many ways be considered a true competitor to OpenAI’s GPT-4.

Some interesting facts about PaLM 2 (more details):

builds on the original PaLM work released back in 2022
stronger in logic and reasoning thanks to broad training data that includes scientific papers and web pages with mathematical expressions
trained on multilingual text spanning more than 100 languages, which helps it with the harder problem of understanding the textual nuances found in every language
pre-trained on large, publicly available source code datasets, so it does well in popular programming languages such as Python and JavaScript

Source: Google

Hugging Face

HF had previously released StarCoderBase, one of the biggest open-source code LLMs, trained on permissively licensed data from GitHub covering 80+ programming languages.

HF then took StarCoderBase, a model with billions of parameters, and fine-tuned it further on Python code; the result is called StarCoder. HF benchmarked the model against OpenAI’s code LLM (code-cushman-001) and found that it matched or surpassed its performance.
