Generative AI pricing: 3 major considerations + an AI glossary

SADA Says | Cloud Computing Blog

By Simon Margolis | Associate CTO, AI & ML

At SADA, we’re working aggressively with generative AI, and we’re excited about the potential that this technology has to transform the way we work. Google recently made some big announcements in this space, but there’s still a lot of uncertainty about the competitive landscape and economics of generative AI. 

Making decisions around generative AI pricing is part of a smart and comprehensive FinOps strategy. Let’s start by demystifying some of the main questions about generative AI pricing.

1. Tokens vs. characters

Generative AI APIs are billed based on the volume of input they receive and the volume of output they generate. Different providers measure these volumes differently, with some measuring tokens and others measuring characters, but the concepts remain the same.

It’s important to understand the difference between a token and a character in order to properly compare pricing across providers. Per the OpenAI pricing documentation, a token is a unit of measure representing approximately 4 characters. A character is a single letter, number, or symbol. For example, the word “hello” has five characters, so by this rule of thumb it counts as slightly more than one token.
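
As a rough illustration of the rule of thumb above, the following Python sketch estimates a token count from a character count. The function name and the fixed ratio of 4 are assumptions for illustration only; real tokenizers split text differently, so billed token counts will vary.

```python
# Rule of thumb from the OpenAI documentation cited above: ~4 characters per token.
# This is an approximation, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Estimate how many tokens a piece of text represents (hypothetical helper)."""
    return len(text) / chars_per_token


print(estimate_tokens("hello"))  # 5 characters / 4 = 1.25
```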

Google’s generative AI service is priced based on the number of characters that are input and generated. OpenAI’s generative AI service is priced based on the number of tokens consumed and generated. This is important to understand when comparing costs between these platforms.

2. Comparing models: Google’s PaLM 2 text-bison-001 vs. OpenAI’s GPT-4

Following are some examples of pricing between two models: Google’s PaLM 2 text-bison-001 and OpenAI’s GPT-4. For the sake of simplicity, we’re focusing on the input costs only. 

At Google, input and output costs are identical, whereas, at OpenAI, input costs are typically less expensive than outputs (though notably, ChatGPT has one price for both input and output; more on that later). We’ve also simplified the comparison by using a single measurement: characters. For OpenAI’s APIs, we’re using their documented reference of ~4 characters per token.

Service | Price per 1k characters
Google text-bison-001 | $0.001
OpenAI GPT-4 (8k context) | $0.0075
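
The OpenAI figure in the table can be reproduced with a simple conversion. This sketch assumes a GPT-4 (8k context) input price of $0.03 per 1k tokens, which is the value implied by the table at ~4 characters per token; check the provider's current pricing page before relying on it. The function name is hypothetical.

```python
CHARS_PER_TOKEN = 4  # documented approximation; actual tokenization varies


def price_per_1k_chars(price_per_1k_tokens: float,
                       chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Convert a per-1k-token price into an approximate per-1k-character price."""
    return price_per_1k_tokens / chars_per_token


# Assumed input price, implied by the table ($0.0075 x 4 characters per token).
gpt4_input_per_1k_tokens = 0.03

print(f"${price_per_1k_chars(gpt4_input_per_1k_tokens):.4f} per 1k characters")
```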

As you can see, Google’s generative AI service is priced significantly lower than OpenAI’s. For text generation, Google’s “bison” version of the PaLM 2 model is ~7.5x less expensive than GPT-4, the OpenAI model in the same class. A price difference of this size can have a significant impact on teams doing everything from testing and development to viral-app production once Google is no longer 100% discounting the PaLM API.
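
To see what this gap means for a budget, here is a minimal sketch that applies the table's input prices to a hypothetical workload. The 10 million character volume and the function name are illustrative assumptions, and output costs are ignored, as in the comparison above.

```python
# Input prices per 1k characters, taken from the table above.
PRICE_PER_1K_CHARS = {
    "Google text-bison-001": 0.001,
    "OpenAI GPT-4 (8k context)": 0.0075,
}


def input_cost(num_chars: int, price_per_1k_chars: float) -> float:
    """Cost in dollars of sending num_chars input characters (hypothetical helper)."""
    return num_chars / 1000 * price_per_1k_chars


monthly_chars = 10_000_000  # hypothetical monthly input volume

costs = {name: input_cost(monthly_chars, price)
         for name, price in PRICE_PER_1K_CHARS.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")

ratio = costs["OpenAI GPT-4 (8k context)"] / costs["Google text-bison-001"]
print(f"Ratio: {ratio:.1f}x")
```

Because both prices scale linearly with volume, the ratio stays the same at any workload size; only the absolute dollar gap grows.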

3. Comparing models: Google’s PaLM 2 chat-bison-001 vs. OpenAI’s gpt-3.5-turbo

A slightly simpler example is to compare the two chat APIs: Google’s PaLM 2 chat-bison-001 and OpenAI’s gpt-3.5-turbo.

Service | Price per 1k characters
Google chat-bison-001 | $0.0005
OpenAI gpt-3.5-turbo | $0.0005

When it comes to the chat API, Google’s offering comes in at almost exactly the same cost as OpenAI’s. At face value, this may seem to indicate that the cost of adopting either API for chat would be identical, but that misses a key consideration–these APIs don’t exist in a vacuum. Rather, they are part of rich application environments consisting of data, application services, and–in the case of Google Cloud–the capacity to invoke a pipeline of multiple, purpose-built, additional AI APIs. These holistic application environments have been Google Cloud’s core competency for over a decade.

The (very near) future of generative AI

It’s still too early to say who will win the battle for the generative AI market. However, the early signs suggest that Google is in a strong position. While not the first to market with a text or chat generative AI API, Google has been developing AI services for many years, ranging from sentiment analysis to generative text across search, Gmail, and Google Docs. 

Generative AI has already begun to transform the way we work, and we’re thrilled to be at the forefront of this revolution alongside innovators who will define this space for years to come. 

Learn more about SADA’s generative AI services and schedule a consultation on how to best incorporate generative AI into your business. We’ll be happy to hear from you. 

Bonus: Generative AI glossary

Getting up to speed on all things related to generative AI? We’ve put together this handy glossary to help you stay current with all the new terminology.

Artificial Intelligence (AI): The ability of machines to perform tasks that are typically associated with human intelligence.

Algorithm: A set of instructions that a computer follows to solve a problem.

Attention: A mechanism in neural networks that allows them to focus on specific parts of an input sequence.

Autoencoder: A type of neural network that can learn to reconstruct its input.

Backpropagation: An algorithm used to train neural networks by propagating the error gradient backward through the network’s layers.

Batch size: The number of examples that are processed at once during training.

Bias: The tendency of AI algorithms to produce results that are systematically different for certain groups.

BLEU score: A measure of the similarity between machine-generated text and one or more reference texts, based on overlapping word sequences.

Boilerplate: A set of common phrases or words that are used in a particular context.

Brute force: A method of solving a problem by systematically trying all possible solutions.

Character: A single letter, number, or symbol.

Chunk: A group of tokens that are processed together by a neural network.

Codex: A collection of texts or documents.

Confusion matrix: A table that compares a model’s predicted classes with the true classes, showing how often each kind of correct and incorrect classification occurs.

Convolution: A mathematical operation that is used to extract features from an image.

Cost function: A measure of how far a model’s predictions are from the desired outputs; lower values indicate better performance.

Cross-entropy: A loss function that is used in classification tasks.

Data augmentation: A technique that is used to increase the size of a dataset by artificially creating new examples.

Dataset: A collection of data that is used to train a model.

Decoder: A neural network that takes as input a sequence of tokens and outputs a sequence of tokens.

Denoising autoencoder: An autoencoder that is trained to reconstruct the original, clean input from a corrupted version of it.

Discriminator: A neural network that is used to distinguish between real and fake data.

Distributional semantics: A way of representing the meaning of words based on how they are used in context.

Embedding: A vector representation of a word or phrase.

Encoder: A neural network that takes as input a sequence of tokens and outputs a vector representation of the input.

Epoch: One complete pass through a dataset during training.

Error: The difference between a model’s predictions and the true values.

Embedding layer: A layer in a neural network that maps words or phrases to vectors.

Feature: A characteristic of an object or event.

Feature extraction: The process of identifying features from data.

Feature vector: A vector of numerical features that represents an object or example.

Generative AI: A type of artificial intelligence that can create new content, such as text, images, and music. It does this by learning from a large amount of data and then using that data to generate new examples that are similar to the data it has seen.

Gradient descent: An iterative algorithm for finding a (local) minimum of a function by repeatedly stepping in the direction opposite to the gradient.

Gradient descent with momentum: A variant of gradient descent that uses a moving average of the gradients to improve convergence.

Gradient tape: A mechanism that records the operations performed during a computation so that the gradients of a loss function with respect to the model’s parameters can be computed automatically.

Hallucination: False, inaccurate, or misleading output from generative AI. This can happen when a model is trained on a dataset that contains incorrect or misleading information, though it can also occur when the training data is accurate.

Hyperparameter: A setting that controls the training of a model and is chosen before training rather than learned from the data, such as the learning rate or batch size.

Image captioning: The task of generating a natural language description of an image.

Imitation learning: A type of machine learning in which an agent learns to behave by observing a human or another agent.

Inference: The process of using a model to make predictions on new data.

Instance: A single example in a dataset.

Loss function: A function that measures the error of a model.

Machine learning: A field of computer science that gives computers the ability to learn without being explicitly programmed.

Manifold learning: A type of dimensionality reduction that maps data points to a lower-dimensional manifold.

Maximum likelihood: A method for estimating the parameters of a model.

Mean squared error (MSE): A loss function that measures the squared difference between a model’s predictions and the true values.

Mel-spectrogram: A representation of an audio signal that is used in speech recognition.

Model: A mathematical representation of a system.

Neural network: A type of machine learning model that is inspired by the structure of the human brain.

Normalization: A process of rescaling data to a common range or distribution, for example so that it has a mean of 0 and a standard deviation of 1.

One-hot encoding: A way of representing categorical data as vectors.

Optimizer: A function that is used to update the parameters of a model during training.

Overfitting: A problem that occurs when a model learns the training data too well and is unable to generalize to new data.

Parameter: A value, such as a weight in a neural network, that is learned during training and determines the behavior of a model.

Perplexity: A measure of how well a model is able to predict the next word in a sequence.

Phrase: A group of words that are used together to form a meaningful unit.

Prompt: A piece of text that is used to generate a response from a model.

Recurrent neural network (RNN): A type of neural network that can process sequences of data.

Regularization: A technique that is used to prevent overfitting.

Reinforcement learning: A type of machine learning in which an agent learns to take actions in an environment in order to maximize a reward.

Representation: A way of encoding data so that it can be used by a model.

Retrieval-based model: A type of model that produces a response by retrieving similar existing text from a dataset rather than generating new text from scratch.

Sequence-to-sequence model: A type of neural network that can translate sequences of data from one form to another.

Skip-gram: A method for training word embeddings in which a model learns to predict the surrounding context words from a given word in a corpus of text.

Softmax: A function that converts a set of raw scores into probabilities that are positive and sum to 1.

Speech recognition: The task of converting spoken language into text.

Stochastic gradient descent: A variant of gradient descent that uses a random subset of the data to update the model’s parameters.

Supervised learning: A type of machine learning in which the model is trained on labeled data.

Token: A single unit of text processed by a model, such as a word, part of a word, or punctuation mark. In English, a token represents approximately 4 characters.

Training set: A set of data that is used to train a model.

Transfer learning: A technique that is used to transfer knowledge from one task to another.

Unsupervised learning: A type of machine learning in which the model is not trained on labeled data.

Value function: A function that represents the expected cumulative reward from being in a particular state, or from taking a particular action in that state.

Word embedding: A vector representation of a word.

Word2vec: A technique for learning word embeddings.

Zero-shot learning: A type of machine learning in which the model must handle classes or tasks for which it saw no examples during training.
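
A few of the numerical terms in this glossary (softmax, one-hot encoding, cross-entropy, and mean squared error) are easier to grasp with a small example. The following Python sketch is illustrative only: the function names and sample values are assumptions, and production code would normally use a library such as NumPy or a deep learning framework.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Represent a category as a vector with a single 1.0 at its index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def cross_entropy(target, predicted):
    """Classification loss: penalizes low probability on the true class."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between predictions and true values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


probs = softmax([2.0, 1.0, 0.1])   # hypothetical model scores for 3 classes
target = one_hot(0, 3)             # the true class is class 0
loss = cross_entropy(target, probs)

print([round(p, 3) for p in probs])
print(round(loss, 3))
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0
```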
