Artificial Intelligence

In this category we show you all our Artificial Intelligence related blog posts.


Author: Juan Angel Giraldo

As programmers or data professionals, we always use tools to automate our clients’ processes or specific flows within our work, but what happens when our code-writing tasks also become repetitive? In this post, we will study the concept of code generators for data tasks, their variations, and the application of these ideas to data product development. 

Generative programs 

Automatic programming is a programming technique through which computer programs can be created using different abstraction mechanisms. These procedures allow us to naturally extend routine tasks in software development to abstract, high-level programs. In this way, programs can be created that support us in the process of “manufacturing” other software components in an automated and systematic way.  

These generative programs serve different purposes and are implemented with various methods and tools. For example, a code generator can be used to build database access code or external service access layers.  

Some of the ways these generative programs are implemented are:  

– Code Templates: predefined code snippets that can be reused and customized for specific functionalities or patterns.  

– Code Scaffolding (based on code templates)  

– Domain Specific Languages (DSL)  

– Model Driven Development tools (MDD)  

– Metaprogramming (run-time modification) 

Code generators using templates 

This exploration will focus on the technique associated with code templates. In this case we present four use cases where this technique is useful to speed up development times in projects where the objective is to develop data-driven products.  

The strategy for creating programs with code templates is as follows:  

– Establish a file template: You develop the template from which you want to create more programs systematically.  

– Create a config file (optional): The template must have space to be configured according to parameters set when it is designed. The assignment of values to these parameters can be done directly in the code generator program or modularized in a config file.  

– Create the code generation program: This program acts as a factory to create new programs from the established template and the determined parameters. This program receives the configuration values and returns a program that is ready to run. 

Dynamic DAG generation for Airflow 

Data engineering and task orchestration  

Occasionally, crafting DAGs manually proves impractical. Perhaps you’re dealing with hundreds or thousands of DAGs, each performing analogous tasks that differ in only a few parameters. In such a scenario, dynamically generating DAGs emerges as the more logical approach. 

File template 

from airflow.decorators import dag
from airflow.operators.bash import BashOperator
from pendulum import datetime


@dag(
    dag_id=dag_id_to_replace,
    start_date=datetime(2023, 7, 1),
    schedule=schedule_to_replace,
    catchup=False,
)
def dag_from_config():
    BashOperator(
        task_id="say_hello",
        bash_command=bash_command_to_replace,
        env={"ENVVAR": env_var_to_replace},
    )


dag_from_config()

Config file 

{
    "dag_id": "dag_file_1",
    "schedule": "'@daily'",
    "bash_command": "'echo $ENVVAR'",
    "env_var": "'Hello! :)'"
}

 Code generator 

import json
import os
import shutil
import fileinput

config_filepath = "include/dag-config/"
dag_template_filename = "include/dag-template.py"

for filename in os.listdir(config_filepath):
    f = open(config_filepath + filename)
    config = json.load(f)

    new_filename = "dags/" + config["dag_id"] + ".py"
    shutil.copyfile(dag_template_filename, new_filename)

    for line in fileinput.input(new_filename, inplace=True):
        line = line.replace("dag_id_to_replace", "'" + config["dag_id"] + "'")
        line = line.replace("schedule_to_replace", config["schedule"])
        line = line.replace("bash_command_to_replace", config["bash_command"])
        line = line.replace("env_var_to_replace", config["env_var"])
        print(line, end="")

In this case, the parameter substitution logic was set up manually in the generator program. However, to make the templates more flexible and improve the consistency of the generator programs, we can use tools such as Jinja.
Jinja templating is a popular template engine for Python. It allows you to embed Python-like expressions and statements within templates, which are then rendered into dynamic content. Jinja templates contain placeholders, or variables, enclosed within double curly braces {{ }}, along with control structures like loops and conditionals.
When the template is rendered, these placeholders are replaced with actual values based on the context provided. This enables dynamic content generation for web pages, emails, configuration files, and more, making Jinja templating widely used in web development, automation, and content generation tasks.
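As a minimal, self-contained illustration of these ideas (the template string and the values passed to render are illustrative, not taken from the project), a Jinja template with placeholders and a loop can be rendered like this:

from jinja2 import Template

# Placeholders in double curly braces plus a for-loop, rendered with a context dictionary.
template = Template(
    "dag_id = '{{ dag_id }}'\n"
    "{% for task in tasks %}"
    "task_{{ loop.index }} = '{{ task }}'\n"
    "{% endfor %}"
)
print(template.render(dag_id="dag_file_1", tasks=["extract", "transform", "load"]))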

Data quality checks for dataframes

Data engineering and data quality  

An essential part of our work as data stewards is ensuring consistency and quality are maintained when moving and transforming data. This task is extensive and sometimes repetitive, depending on the source of the data and the processing. We can design a template to automate a first approximation; for example, the following piece of code performs standard checks on a dataframe: 

File Template 

import pandas as pd

data = pd.read_csv('{{ input_file }}')

# Data validation functions
def check_completeness(column_name):
    if data[column_name].isnull().any():
        print(f"Validation failed: {column_name} contains missing values.")
    else:
        print(f"Validation passed: {column_name} is complete.")

def check_duplicates(column_name):
    if data[column_name].duplicated().any():
        print(f"Validation failed: {column_name} contains duplicate values.")
    else:
        print(f"Validation passed: {column_name} has no duplicates.")

def check_numeric_range(column_name, min_value, max_value):
    if (data[column_name] < min_value).any() or (data[column_name] > max_value).any():
        print(f"Validation failed: {column_name} values are outside the range ({min_value}, {max_value}).")
    else:
        print(f"Validation passed: {column_name} values are within the range.")

def check_string_length(column_name, max_length):
    if data[column_name].str.len().max() > max_length:
        print(f"Validation failed: {column_name} contains values exceeding the maximum length ({max_length}).")
    else:
        print(f"Validation passed: {column_name} values are within the length limit.")

# Perform validations
{% for column in columns_to_validate %}
check_completeness('{{ column }}')
check_duplicates('{{ column }}')
check_numeric_range('{{ column }}', 0, 100)
check_string_length('{{ column }}', 50)
{% endfor %}

 

Code generator 

from jinja2 import Template

# Load the validation template
with open("Template_file.template") as f:
    validation_template = f.read()

# Define parameters for template rendering
validation_params = {
    'input_file': 'data_to_validate.csv',
    'columns_to_validate': ['column1', 'column2', 'column3']
}

# Render the template with parameters
rendered_validation_code = Template(validation_template).render(validation_params)

print(rendered_validation_code)

 

 

Pipeline Scaffolding 

Data engineering and Data pipelines 

The task of designing a pipeline to feed a statistical analysis or machine learning model can vary in its steps and needs. However, certain steps can be standardized and follow a pattern like the one below: 

File template 

import pandas as pd
from mymodules import create_engine  # universal database connection

# Extract
{% if extract_from_csv %}
data = pd.read_csv('{{ source_file }}')
{% elif extract_from_database %}
{% if database_engine == 'postgresql' %}
engine = create_engine('{{ source_database_url }}', database="postgresql")
{% elif database_engine == 'mysql' %}
engine = create_engine('{{ source_database_url }}', database="mysql")
{% endif %}
query = '{{ source_query }}'
data = pd.read_sql(query, con=engine)
{% endif %}

# Transform
###

# Load
{% if target_database_engine == 'postgresql' %}
engine = create_engine('{{ target_database_url }}', database="postgresql")
{% elif target_database_engine == 'mysql' %}
engine = create_engine('{{ target_database_url }}', database="mysql")
{% endif %}
data.to_sql('{{ target_table }}', con=engine, index=False, if_exists='replace')

Code generator 

from jinja2 import Template

# Load the ETL template
with open("Template_file.template") as f:
    etl_template = f.read()

# Define parameters for template rendering
etl_params = {
    'extract_from_csv': False,
    'extract_from_database': True,
    'database_engine': 'postgresql',
    'source_database_url': 'postgresql://username:password@localhost:5432/source_database',
    'source_query': "SELECT * FROM source_table WHERE date > '2022-01-01'",
    'target_database_engine': 'postgresql',
    'target_database_url': 'postgresql://username:password@localhost:5432/target_database',
    'target_table': 'target_table'
}

# Render the template with parameters
rendered_etl_code = Template(etl_template).render(etl_params)

print(rendered_etl_code)

 

 

 

Documentation Scaffolding 

Finally, another task that can save us time and repetition effort is the creation of documentation for our projects. A simple version of this template could be: 

Template file 

# {{ script_name }} Documentation

## Purpose
{{ purpose }}

## Usage
```bash
python {{ script_name }} {{ usage_arguments }}
```

## Dependencies
- Python 3.x
- Dependencies: {{ dependencies }}

## Configuration
- Describe any configuration settings or environment variables.

## Example
- Provide an example of how to run the script with sample inputs.

## Contributing
- Explain how others can contribute to the development of the script.

## License
- Specify the license information for the script.

 

Code generator 

from jinja2 import Template 

  

def get_dependencies(): 

    try: 

        with open(‘requirements.txt’, ‘r’) as file: 

            return file.read().strip() 

    except FileNotFoundError: 

        return ‘No dependencies found’ 

  

def generate_documentation(script_name, purpose, usage_arguments): 

with open(“Template_file.template”) as f: 

documentation_template = f.read() 

  

    # Define parameters for template rendering 

    documentation_params = { 

        ‘script_name’: script_name, 

        ‘purpose’: purpose, 

        ‘usage_arguments’: usage_arguments, 

        ‘dependencies’: get_dependencies() 

    } 

  

    # Render the template with parameters 

    rendered_documentation = Template(documentation_template).render(documentation_params) 

  

    print(rendered_documentation) 

  

if __name__ == “__main__”: 

    generate_documentation( 

        script_name=’example_script.py’, 

        purpose=’This script performs a specific task.’, 

        usage_arguments=’input_file.txt output_file.txt’ 

    ) 

CONCLUSION

Although simple in implementation, these demonstrations bring to the table a whole world of possibilities for process standardization in the data-driven product or service creation space. The use of these techniques allows us to take ownership of the procedures and mold them to the needs of the project or our organization. 

References 

– https://en.wikipedia.org/wiki/Automatic_programming (Automatic programming) 

– https://docs.astronomer.io/learn/dynamically-generating-dags (Dynamically generate DAGs in Airflow) 

 


Juan Angel Giraldo – Data Engineer


Authors: Laura Mantilla – Santiago Ferreira

The purpose of this article is to show the results and process of our work, whose objective was to create an abstractive question-answering system. This means that we wanted to feed an AI with a specific topic, and this AI would be able to answer questions about that topic eloquently, responding in a way that does not rely on predefined answers but creates an answer depending on the text it was given. 

To do this, we will first show some fundamental concepts and tools necessary to understand our process: large language models, Fireworks AI, LangChain, embedding generation models, Pinecone and Retrieval Augmented Generation. After this, we will show the methodology used to combine these pieces into our system and workflow. Lastly, we will show our results and conclusions. 

We have reached a Minimum Viable Product (MVP) without requiring substantial computational power or financial resources.  

Foundations (Key Concepts) 

Large Language Models 

A Large Language Model (LLM) is an artificial intelligence (AI) model designed to understand and generate human-like text. These models are trained on vast amounts of textual data from the internet, books, articles, and other sources. The training process involves exposing the model to a diverse range of language patterns, structures, and contexts to learn to predict what comes next in a given sequence of words. LLMs have several uses, for example, chatbots, content creation, and language transformation; in programming, these models can help complete or correct code. (Karpathy, A. (2023)).
Some examples of LLMs are Meta’s LLaMA, Google’s PaLM 2, and OpenAI’s GPT-3 and GPT-4, which are the models that ChatGPT runs on.
To create such models, it is necessary to use a great deal of data. Andrej Karpathy, a founding member of OpenAI, emphasises that the pre-training phase of a GPT assistant’s training stages takes most of the computational effort and time. This process can take months and thousands of GPUs to complete. After this process ends, you will have the base model; you’ll then have to use supervised fine-tuning, reward modelling and reinforcement learning to match your specific needs. (Microsoft Developer & Karpathy, A. (2023)).

infographic image of gpt assistant training pipeline

Image1: Taken from Karpathy, A. (2023), State of GPT 

FIREWORKS AI

Fireworks.ai is a powerful platform designed to facilitate the utilisation of large language models (LLMs) for solving complex challenges. This platform is a valuable resource offering the tools to run, fine-tune, and share LLMs for optimal problem-solving. (LangChain Blog. (2023)). 

This platform provides access to high-performance open-source models (OSS), efficient LLM inference, and state-of-the-art foundational models that can be fine-tuned to meet specific requirements. (LangChain Blog. (2023)). 

Integrating Fireworks.ai models into the LangChain Playground simplifies access to the best-performing open-source and fine-tuned models, enabling developers to create innovative LLM workflows. (LangChain Blog. (2023)). 

LangChain  

LangChain is an open-source framework that makes developing abstractive question-answering (QA) systems easy. The framework provides a variety of components that can be used to create QA systems of different types. 

LangChain can be used to create abstractive question-answering systems in the following ways: [4] 

  • To generate documents: LangChain can be used to generate text documents, which can be used as answers to open or challenging questions. 
  • To search for information: LangChain can be used to search for information from various sources, including the web, databases, and documents. This information can be used to support the answers generated by LangChain. 
  • To measure similarity: LangChain can measure the similarity between generated answers and found information, ensuring that answers are accurate and relevant. 

Embedding Generation Models 

Embedding generation models are essential in modern natural language processing (NLP) and information retrieval systems. These models are designed to transform textual data, such as sentences or documents, into high-dimensional vector representations.
These vectors capture the semantic information, context, and meaning of the text, allowing for similarity comparisons, clustering, and various forms of analysis. This functionality is crucial for applications like search engines, recommender systems, and abstractive question-answering systems. (Hugging Face. (2022)).
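As a small, hedged illustration, an embedding model from the sentence-transformers library on Hugging Face (the model name and sentences below are placeholders, not the ones used in the project) can turn text into vectors and compare them:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sentences = ["What does Equinox AI Lab do?", "Equinox is an AI, data science and design company."]
embeddings = model.encode(sentences)               # one high-dimensional vector per sentence
print(embeddings.shape)
print(util.cos_sim(embeddings[0], embeddings[1]))  # semantic similarity between the two texts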

PINECONE

Pinecone is a text vector database. A text vector database is a collection of words and phrases represented as numerical vectors. These vectors are used to measure the similarity between words and phrases. (Pinecone. (Extracted 2023) I). 

In addition, this vector database uses a variety of distance metrics to measure the similarity between words and phrases. Some of the most common distance metrics that Pinecone uses are: (Pinecone. (Extracted 2023) II). 

  • Euclidean: This metric measures the distance between two points in a plane. It is one of the most commonly used distance metrics. 
  • Cosine: This metric is often used to find similarities between different documents. The advantage is that the scores are normalised to the [-1,1] range. 
  • Dot product: This metric is used to multiply two vectors. It can be used to tell us how similar the two vectors are. The more positive the answer is, the closer the two vectors are in terms of their directions. 
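As an illustrative sketch, and assuming the pinecone-client 2.x API that was current when this was written (the API key, environment, index name, dimension and vectors are placeholders), creating an index with one of these metrics and inserting a vector looks roughly like this:

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
pinecone.create_index("example-index", dimension=384, metric="cosine")  # one of the metrics above

index = pinecone.Index("example-index")
index.upsert(vectors=[("doc-1", [0.1] * 384, {"source": "example"})])   # id, vector, metadata
print(index.query(vector=[0.1] * 384, top_k=1))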

Abstractive question answering or Retrieval Augmented Generation (RAG)

Using “out of the box” LLM models for context-dependent tasks will result in poor model performance, because these models are “stuck in time” and lack domain-specific knowledge. One way to tackle this problem is by fine-tuning an existing model, but this can be complicated if you lack the computing resources or sufficient data. Another way is to implement Retrieval-Augmented Generation (RAG). (Proser, Z. (Extracted 2023)) & (Riedal, S. et al. (2020)).
RAG is an AI framework for retrieving facts from an external knowledge base, allowing an improvement of the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.
RAG consists of two phases: retrieval and content generation. In the first, snippets of relevant information for the user’s prompt are retrieved by search algorithms. After this, an augmented prompt is created by appending the external knowledge to the user’s prompt. In the subsequent generative phase, the augmented prompt is fed into the LLM input, allowing it to synthesise an answer based on the model’s external knowledge and internal representation. This answer can be more engaging to the user and can be exposed, for example, in a chatbot interface. (Martineau, K. (2023)).
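A minimal sketch of the augmentation step (plain Python, with illustrative names and strings only) makes the idea concrete:

def build_augmented_prompt(question, retrieved_snippets):
    """Append external knowledge to the user's prompt (the 'augmentation' step of RAG)."""
    context = "\n".join(f"- {snippet}" for snippet in retrieved_snippets)
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_augmented_prompt("What is RAG?", ["RAG retrieves facts from an external knowledge base."]))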

Methodology for Abstractive Question-Answering System: Integrating ETL, Pinecone, and Generative Model 

In developing our abstractive question-answering system, we have followed a comprehensive methodology that combines data extraction, transformation, and loading (ETL), the efficiency of Pinecone for document retrieval, and the power of generative models to produce coherent abstract answers. Here is a detailed overview of our methodology:

1. ETL Phase: Data Preparation for Efficient Retrieval 

The ability to efficiently extract, transform, and load data is essential for natural language processing projects. We will explore the ETL (Extraction, Transformation, and Loading) process behind this project and how to gather relevant information from the web, normalise it, and prepare it for efficient search and retrieval.

     1.1 Extraction: Navigating the Web for Relevant Data 

The first stage of our journey takes us to data extraction. Using tools like Selenium and Beautiful Soup in Python, we scour the web for information related to our abstractive question-answering system.  

We define a primary URL and employ a CSS selector to identify and extract pertinent links. Each link is organised into a list of dictionaries with a unique key and the corresponding URL. We then access each extracted URL and extract the relevant text content using another CSS selector. 

The data is stored in a list of dictionaries that includes a unique title and the extracted text from each page. 

    1.2 Transformation: Preparing Data for Processing

In the transformation phase, we bring the extracted data into a suitable form for further processing, cleaning and normalising the text content to ensure consistency and uniformity. Using custom functions, we create text files (.txt) from the extracted data and organise them in a specific folder. We normalise the text by removing special characters, converting it to lowercase, and eliminating duplicate spaces. Additionally, we use the NFKD technique to normalise accented characters. The normalised data is written to .txt files in a format that facilitates further processing.

  1.3 Loading: Preparing Data for Search and Retrieval 

The loading phase is critical to convert the normalised text files into actionable documents and prepare the data for search and retrieval. We use the LangChain library to load text documents from the .txt files into a suitable structure for processing. Long documents are split into smaller fragments to facilitate search and processing. Next, we generate embedding vectors for each text fragment using a pre-trained model from Hugging Face. Finally, we created a search index in Pinecone that enables efficient searches based on vector similarity. 
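A sketch of this loading phase, assuming the LangChain and pinecone-client module paths available in 2023 (the folder, embedding model and index name below are placeholders), could look like this:

import pinecone
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

# Load the normalised .txt files, split them into smaller fragments,
# embed each fragment and index the vectors in Pinecone.
documents = DirectoryLoader("normalised_texts/", glob="*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(documents)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Pinecone.from_documents(chunks, embeddings, index_name="qa-knowledge-base")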

2. Retrieving Relevant Documents with Pinecone 

Pinecone is the powerhouse for retrieving relevant documents that lay the groundwork for generating precise and abstract answers. We leverage its retrieval capabilities to find documents that closely match the queries, and here’s how it’s implemented:

     2.1 Retrieval Mechanism 

The heart of our retrieval mechanism lies in a Python function explicitly designed for this purpose. This function bridges user queries and Pinecone’s vector database, ensuring that we retrieve the most contextually relevant documents.
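A hedged sketch of such a function, assuming the vector store created in the loading phase above (names are illustrative, not the exact ones in our codebase), could look like this:

def retrieve_relevant_documents(query, vectorstore, top_k=3):
    """Return the text of the top_k documents most similar to the query."""
    retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": top_k})
    documents = retriever.get_relevant_documents(query)
    return "\n".join(doc.page_content for doc in documents)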

    2.2 Querying Pinecone 

When a user poses a question, the query is given to Pinecone. The Pinecone index, previously prepared during the loading phase, is now ready to perform efficient searches based on vector similarity. 

    2.3 Retrieval Workflow  

Here’s an overview of how the retrieval process unfolds: 

  • User Query: A user submits a question to our system, seeking an abstract answer. 
  • Pinecone as a Retriever: We configure Pinecone as a retriever, specifying a search type for similarity matching. This setting ensures that the retrieved documents closely match the query. 
  • Top-k Retrieval: To offer a comprehensive response, we define the top-k parameter, determining the number of top-relevant documents we aim to retrieve. Pinecone excels in providing accurate results even when dealing with extensive datasets. 
  • Query Execution: The query is executed in Pinecone, and the retriever swiftly identifies the most contextually similar documents, ensuring high precision. 
  • Extracting Text Content: Once the most relevant documents are identified, we extract and compile the text content from these documents. These text snippets are invaluable for the subsequent step of generating abstract answers. 

3. Abstract Answer Generation: Transforming Retrieved Information into Coherent Responses 

From the description of RAG in the key concepts section, we learned that this process consists of two parts: the retrieval, described in the previous section, and the generation. We can use this method to answer the user question with the documents retrieved from the Pinecone database.
In the RAG article published by Meta, they propose using the Bidirectional and Auto-Regressive Transformers (BART) model as the seq2seq generator component of the system. In our case, we use the Llama-2-chat model, which is a fine-tuned version of the model that is optimised for dialogue and comes in a range of parameter sizes (7B, 13B, and 70B).
Llama2 can be loaded via Hugging Face (which requires authentication and verifying Meta granted access to the models) or by using the llama.cpp chain from LangChain.
Either way, these models are too big to be used only with CPU, and to use, for example, Google Colab, you must use quantisation to fit the model into the free tier GPU (T4 GPU). As a solution for this, we found that Fireworks AI contains all the different versions of Llama2, and in the free plan, it allows us to make 10 requests/min, which was perfect for our prototype development. (Briggs, J.(2023)) ,(fireworks.ai. (Extracted 2023)) & (LangChain. (Extracted 2023).
Also, as we already explained, Fireworks API can be used with the LangChain API, as it will be described in the next section.

  3.1. LangChain LLM chain 

One of the chains that LangChain offers is the LLMChain. An LLMChain is a simple chain that adds some functionality around language models. It is used widely throughout LangChain, including in other chains and agents. 

An LLMChain consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and memory key values, if available), passes the formatted string to LLM and returns the LLM output. 

Calling the chain would look like: 

llm_chain = LLMChain(prompt=prompt_template, llm=gen_model) 

Note: The prompt template is the one we’ll explain in the following section, and gen_model is the model loaded from Fireworks AI.

     3.2. Langchain prompts 

Langchain contains a class called PromptTemplate, which, as its name states, is a prompt template for a language model. It consists of a string template that accepts a set of parameters from the user that can be used to generate a prompt for a language model. The template can be formatted using f-strings (default) or jinja2 syntax.
Llama2 can receive a prompt where you can specify the behaviour of the model and an instruction for the model like summarise, answer or another task. The format of this prompt will be:

"[INST]<<SYS>>\n System prompt \n<</SYS>>\n\n Instruction [/INST]" 

Taking advantage of the f-strings capability inside the LangChain prompts, the context will be inserted in the system prompt and the query will be inserted in the instruction, that way, generating our prompt will look like  

prompt_template = PromptTemplate(template=template, input_variables=['context', 'question'])

With the system prompt and the instruction, the generated prompt would look like:

"[INST]<<SYS>>\n You are a helpful assistant that works at the company \"Equinox AI Lab\", a company specialised in Artificial Intelligence, Data Science and Design. \n    You can use this information as help to answer but answer as if you already know it: {context} \n<</SYS>>\n\n Please provide an answer to the following question based on the context you know: {question} [/INST]"

    3.3. Fireworks.ai Llama2 model 

Fireworks.ai supports all versions of the Llama2 model, so we use the biggest chat variant model, the llama-v2-70b-chat. To load it into our app, we need the model name and input some arguments for the model, like the number of tokens to generate or the temperature (to avoid hallucinations from the model; this is a number close or equal to 0).

gen_model = Fireworks(model=model_name, model_kwargs={"temperature": temperature})
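Putting the pieces together, a hypothetical end-to-end call (the question string is only a placeholder, and retrieve_relevant_documents refers to the retrieval sketch shown earlier) might look like:

# Hypothetical glue code tying retrieval and generation together.
user_question = "What services does Equinox AI Lab offer?"
context = retrieve_relevant_documents(user_question, vectorstore, top_k=3)
answer = llm_chain.run(context=context, question=user_question)
print(answer)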

Workflow 

Based on the methodology we have applied, our abstractive question-answering system is divided into three distinct phases: 

  • First Phase: Building the Knowledge Base  

In this initial stage, we extract relevant information from documents loaded with LangChain. Using the documents as outlined in our methodology, we pass them through a specialised model to generate embeddings. These embeddings are then stored in a vector database managed by Pinecone. 

  • Second Phase: Information Retrieval  

When a user enters a question into the system, the second phase comes into play. Through a retrieval function, we conduct a similarity-based search within our Pinecone vector database, allowing us to identify the most relevant documents in response to the user’s query. 

  • Third Phase: System Response Generation  

Finally, in the third phase, the system takes the retrieved documents and the user’s provided question. Subsequently, this information is fed into our language generation model. The model leverages these data to generate an eloquent and appropriate response presented to the user. 

question answering system workflow image

Image2: Workflow 

Results (MVP) 

Following the previously mentioned workflow, we implemented a prototype of an application that provides answers to questions in an abstractive way. This application was deployed using Streamlit to assess the results achieved through our applied methodology. 

Once our knowledge base was prepared, users could input questions into the system. Additionally, we provided the flexibility to adjust various hyperparameters: 

  • Top_k: This controls the number of context passages provided to the large language model for answer generation. In other words, it determines how many documents we want to retrieve from our information retrieval system. 
  • Temperature: Users could select a value for this hyperparameter, influencing response generation. A lower value ensures more coherent and context-aligned responses, while a higher value produces more diverse responses. 
  • LLM Model: Users had the option to choose the generative language model to be used for generating responses to their questions. We used the model “llama-v2” trained with 70 billion parameters in this case. 
question answering system equinox

Image3: MVP 

This prototype allows users to experiment and evaluate the system’s performance, as well as customise their preferences to obtain answers that better suit their needs.

question answering system mvp with text

Image4: MVP Results

CONCLUSION

In conclusion, the methodology and workflow for building an abstractive question-answering system demonstrate a well-structured and efficient approach to the development of advanced AI-powered systems. The combination of ETL processes for data preparation, Pinecone for precise information retrieval, and generative language models for eloquent responses offers a holistic solution for addressing complex user queries.
It is crucial to note that for the system to perform optimally, one must carefully evaluate the various methods of document retrieval to ensure the highest level of accuracy and relevance in responses. As we look to the future, further research and development in this field should focus on enhancing these retrieval techniques, expanding the system’s knowledge base, and continuously fine-tuning generative models for even more contextually aware and precise answers. The adaptability of this approach makes it applicable across various domains and industries, further underlining its versatility and immense potential impact.

References 

Karpathy, A. (2023) State of GPT. https://karpathy.ai/stateofgpt.pdf   

Microsoft Developer (Producer) Karpathy, A. (Speaker). (2023). https://www.youtube.com/watch?v=bZQun8Y4L2A 

LangChain Blog. (2023). Bringing Free OSS Models to the Playground with Fireworks AI. https://blog.langchain.dev/bringing-free-oss-models-to-the-playground-with-fireworks-ai/#:~:text=Fireworks.ai%20provides%20a%20platform,foundation%20models%20for%20fine%2Dtuning  

LangChain. (Extracted 2023). Introduction. https://python.langchain.com/docs/get_started/introduction   

Hugging Face. (2022). Getting Started With Embeddings. https://huggingface.co/blog/getting-started-with-embeddings   

Pinecone. (Extracted 2023). Overview. https://docs.pinecone.io/docs/overview   

Pinecone. (Extracted 2023). Understanding indexes. https://docs.pinecone.io/docs/indexes   

Proser, Z. (Extracted 2023). Retrieval Augmented Generation (RAG): Reducing Hallucinations in GenAI Applications. https://www.pinecone.io/learn/retrieval-augmented-generation/   

Riedal, S. et al. (2020). Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models. https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/   

Martineau, K. (2023). What is retrieval-augmented generation? https://research.ibm.com/blog/retrieval-augmented-generation-RAG   

Briggs, J. (Producer & Director). (2023). Better Llama 2 with Retrieval Augmented Generation (RAG). https://www.youtube.com/watch?v=ypzmPwLH_Q4  

fireworks.ai. (Extracted 2023). Pricing.  https://readme.fireworks.ai/page/pricing 

LangChain. (Extracted 2023). Llama.cpp. https://python.langchain.com/docs/integrations/llms/llamacpp 

 


Laura Mantilla – Data Engineer


Santiago Ferreira – Data Scientist


Authors: Catherine Cabrera – Deivid Toloza – Yulisa Niño


Funny, crazy, colourful and even unthinkable images have taken over the internet in recent months, from new and exotic Pokemons, celebrity meetings that no one thought possible, and Donald Trump being arrested, to Pope Francis dressed outside his usual aesthetic. The development of artificial intelligence for the generation of graphic content has made advances that have the world wondering whether an image is real or was generated with a model such as DALL-E, Craiyon, Midjourney or Stable Diffusion by Hugging Face, the most popular so far. But beyond entertainment, how could these technological advances be used? Equinox, in its goal to build artificial intelligence solutions that deliver value, initiated a research and development project focused on generative artificial intelligence to explore the possibility of creating a model with an architecture that involves a lower computational cost compared to those mentioned above.  

 The main objective of this project was to create a smaller-scale model that could generate graphic pieces with Equinox’s visual identity having descriptive text as input. Thus, the first approach was to use the pre-trained models of stable diffusion offered by Hugging Face.   

 As a bit of context, diffusion models are generative models that work by successively adding Gaussian noise to the training data and then learning to recover the data by reversing this noise process. After training, the diffusion model can be used to generate data by simply passing randomly sampled noise through the learned denoising process.    

Diffusion process explained with images

Latent diffusion models are a special class of diffusion models and were created to perform the diffusion process in a lower dimensional space called latent space. In latent diffusion, the model is trained to perform the same process of adding noise and learning to remove this noise in a lower dimension. [1] The main components of a latent diffusion model are [2]:  

  1. Variational Autoencoder (VAE): Consisting of two parts, an encoder that takes an image as input and converts it into a low-dimensional latent representation, and a decoder that takes the latent representation and converts it back into an image. The compression-decompression performed by the encoder-decoder is not lossless. Stable diffusion can be performed without the VAE, but the reason for its use is to reduce the computation time to generate high-resolution images.  
  2. Language model: It takes the text associated with an image, processes it and encodes it into tensors that represent it numerically while preserving its meaning. 
  3. U-Net: Latent diffusion uses a U-Net to gradually subtract the noise in the latent space over several steps until the desired output is reached. With each step, the amount of noise added to the latent is reduced until the final noise-free output is reached. The U-Net model takes two inputs: a. Latents with noise: the latents produced by the VAE encoder (in case an initial image is provided) with noise added in the diffusion process; it can also take pure noise as input in case we want to create a new random image based only on a textual description. b. Text embeddings: the result of the language model. 
latent diffusion architecture diagram

Figure 2: Latent diffusion model architecture and components. Taken from: https://arxiv.org/pdf/2112.10752.pdf?ref=louisbouchard.ai 

Thus, the latent diffusion process is as follows: using the VAE encoder, the images are converted to a low-dimensional latent representation. In this latent space, noise is added to the latents in the diffusion process. Then, the U-Net receives these noisy latents and their text embeddings as input, predicts the noise of the latent, subtracts it, and obtains the noise-free latent as output. Finally, these noise-free latents pass through the VAE decoder to obtain images again. 
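As a toy illustration of the forward (noising) half of this process, assuming a linear noise schedule and a random stand-in for a VAE latent (none of these values come from the project), the standard formula x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise can be computed like this:

import torch

torch.manual_seed(0)
latent = torch.randn(1, 4, 32, 32)        # stand-in for a VAE-encoded image
betas = torch.linspace(1e-4, 0.02, 1000)  # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

t = 500
noise = torch.randn_like(latent)
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
noisy_latent = alpha_bars[t].sqrt() * latent + (1 - alpha_bars[t]).sqrt() * noise
print(noisy_latent.shape)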

Building our dataset 

Data is the core of every machine learning application. For our text-to-image generation model, labeled images were crucial. We needed images that resemble the style, colours and identity of Equinox. With these criteria, nine thousand and twenty-three images were extracted from the internet using RPA techniques, which, together with our internal repository of a thousand and five images, constituted our dataset with a final number of ten thousand and twenty-eight files. 

  • LABELING

Labeling is a fundamental step for the construction of a text to image generation model. It provides the model with the ability to map text descriptions to corresponding images, and to generate accurate images for text inputs when using it for inference. We developed a system of labels with the purpose of making the model learn to distinguish fundamental features of an image, such as the background, the layout and size of the figures in it, the image saturation, whether there are logos or not, among others. 

Because most of the extracted images contained text, we decided to use an Optical Character Recognition (OCR) Python library (pytesseract) to extract those texts and created a metadata JSON file with the results, with the intention of including them as part of the label. 
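A sketch of that OCR step (the folder and output file names are placeholders) could look like this:

import json
import os

import pytesseract
from PIL import Image

image_dir = "dataset_images/"  # placeholder folder with the extracted images
metadata = {}
for filename in os.listdir(image_dir):
    if filename.lower().endswith((".png", ".jpg", ".jpeg")):
        text = pytesseract.image_to_string(Image.open(os.path.join(image_dir, filename)))
        metadata[filename] = text.strip()

with open("ocr_metadata.json", "w") as f:
    json.dump(metadata, f, ensure_ascii=False, indent=2)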

Due to the poor quality of the obtained texts, a Python script was created to allow the team to do manual labeling in a semi-automated way. We had a hard time labeling the data with a verbal description of the images using our label system, plus a transcription of the text they contained. The task was so vast that we never really finished it. 

However, we used the raw images to train our variational autoencoder, which we will delve into in the next section. 

Building a Variational Autoencoder

Let’s start this section with a premise: compressing data means that the compressing algorithm learns its features. There are two main kinds of such algorithms: lossy compression algorithms like JPEG, where some of the data is removed during compression, and lossless compression algorithms, like the one used by PNG, where no data is removed in the compression step and all information is restored after decompressing. [4] 

This results in lighter files for JPEG, but higher quality in PNG files, which is why designers prefer to work with this type of raster image file. Every time you open an image file you call the decompressing part of the algorithm used to store it. 

In machine learning we like to call this data compression process dimensional reduction, a process where we reduce the number of features that describe some data.[5] 

We can think of an Autoencoder as a compressing algorithm that uses neural networks for both the encoding and the decoding steps and adds middle layers to represent the encoded data. We call these middle layers the latent space. The goal of this network is to learn the encoding-decoding scheme that loses the least information.  

autoencoder architecture diagram

Figure 3. Illustration of the autoencoder architecture. Taken from: https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73 

 This is achieved through an iterative optimization process where we feed the network with the images in our dataset, and then compare the outputs with the initial images using a loss function, to then use backpropagation to update the weights of the network. 

Loss functions, as the name might intuitively say, are mathematical functions that calculate the difference between the predictions of the model and the actual values of the images we fed it with. We say values because our images are represented as vectors. We tested with Mean Squared Error (MSE) and Binary Cross Entropy (BCE), two popular loss functions in this world of machine learning. 

Backpropagation refers to an algorithm that calculates the gradient of the loss function (a vector of partial derivatives that indicates the error or cost of the output layer), and then propagates the gradients backwards, computing them layer by layer using the chain rule of calculus, which provides information about how the weights need to be adjusted in order to minimize the loss.[6] 

backpropagation representation diagram

Figure 4. Visual representation of backpropagation in a neural network Taken from: https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd 
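A toy sketch of this optimisation loop (the architecture, sizes and stand-in batches below are illustrative only, not the network we actually trained):

import torch
from torch import nn

autoencoder = nn.Sequential(
    nn.Linear(3 * 64 * 64, 128), nn.ReLU(),     # encoder
    nn.Linear(128, 3 * 64 * 64), nn.Sigmoid(),  # decoder
)
criterion = nn.MSELoss()  # BCE was the other loss function we tested
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

fake_batches = [torch.rand(8, 3 * 64 * 64) for _ in range(2)]  # stand-in for a real DataLoader
for batch in fake_batches:
    optimizer.zero_grad()
    reconstruction = autoencoder(batch)
    loss = criterion(reconstruction, batch)  # compare the output with the original images
    loss.backward()                          # backpropagation: gradients computed layer by layer
    optimizer.step()                         # weights adjusted to minimise the loss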

Now, that’s for autoencoders but, what does it mean when we add the word variational? 
You see, when we train an autoencoder in the end we have just an encoder and a decoder, but not a way to generate new content, which is the purpose of all this. Variational autoencoders use a probabilistic approach. “Instead of encoding an input as a single point, we encode it as a distribution over the latent space” [5]. Then, we sample a point from the latent space that comes from that distribution and use that as input for our diffusion pipeline. 

two latents vae examples

Figure 5. Images with their latent representations (input in the diffusion pipeline) 
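A minimal sketch of this "encode as a distribution" idea, using the standard reparameterisation trick (the feature and latent sizes are illustrative, not those of our final model):

import torch
from torch import nn

class TinyVAEHead(nn.Module):
    def __init__(self, in_features=128, latent_dim=16):
        super().__init__()
        self.mu = nn.Linear(in_features, latent_dim)       # mean of the latent distribution
        self.logvar = nn.Linear(in_features, latent_dim)   # log-variance of the latent distribution

    def forward(self, features):
        mu, logvar = self.mu(features), self.logvar(features)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # sample a point from the encoded distribution
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
        return z, kl.mean()

z, kl_term = TinyVAEHead()(torch.rand(4, 128))
print(z.shape, kl_term.item())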

In our approach, we developed a simple architecture consisting of a neural network with variations of convolutional and linear layers for the encoder, decoder and latent space. A series of experiments were conducted to achieve the final version of the variational autoencoder. In the next figure we present some of the results of some of those versions. 

example of vae experiments with a family photo

Figure 6. Experiments timeline in VAE architecture development.  

Final architecture looks like this: 

code photo of vae final architecture

Figure 7. Final version summary 

We played with the input sizes to achieve different latent dimensions in order to make the output of the sampling in the latent space suitable as input for the U-Net.

Utilizing U-Net with Attention Mechanism in Low Scale Latent Diffusion Models

Within the context of the low-scale latent diffusion model, the U-Net architecture, enhanced by an attention mechanism, played a pivotal role in processing and synthesizing information, and it was a key component to achieve image generation. In this section, experiments within the U-Net framework will be discussed.  

As part of the original framework, the integration of a U-Net architecture with cross-attention blocks to facilitate image generation was explored. To achieve this, a diffusion process was implemented with a linear scheduler. This process was seamlessly integrated with a U-Net architecture featuring four layers following an encoder-decoder structure, originally configured to function with image inputs of 32×32 or 64×64 dimensions, the size of the latent space of the previously built VAE. The architecture of this U-Net is detailed in Figure 8. 

original u net architecture diagram

Figure 8. Original U-Net architecture

An integral aspect of this architecture was the presence of skip connections, used to establish a link between corresponding encoder and decoder layers, ensuring the exchange of high-level information. [7]. Additionally, a cross-attention mechanism was employed to elevate the model’s grasp of contextual information, which includes both image features and associated textual metadata. This mechanism involved query, key, and value operations, enabling the generation of diverse image outputs corresponding to the provided textual descriptions. [8] 

Unfortunately, image generation was not successful with the original architecture, so multiple experiments were evaluated that expanded the attention blocks to other encoder-decoder blocks, trying to increase the grasp of contextual information and thereby improve image generation. Additionally, different hyperparameters, including the learning rate, the timesteps of the diffusion process and the beta range of the U-Net, were modified. Even though multiple architectures and combinations were explored in this experiment (Figure 9 shows an example), these modifications did not translate into better image generation, making it necessary to explore new U-Net architectures. 

modified u net architecture diagram

Figure 9. Modified U-Net with multiple attention blocks 

One architecture explored applied residual connections between different layers, in both the downsampling and upsampling paths. [9] These connections facilitate the flow of information between layers, as they capture the difference between input and output feature maps within each block, enabling the network to emphasise the learning of fine-grained details and allowing the model to retain and reuse important features. [10] Experimentally, this showed an improvement in generated images; however, further modifications were still needed to enhance image generation. 

Finally, a Context U-Net was implemented, which distinguishes itself from prior U-Net architectures through its unique approach to data processing. The Context U-Net incorporates context labels and timestep information into the data flow by using embedding layers to transform these contextual factors into features that can be combined with the input image (Figure 10-A). This contextual integration occurs in both the down-sampling and up-sampling paths, enhancing the model’s ability to capture complex, context-dependent features. Furthermore, this U-Net also uses residual connections in conjunction with embeddings derived from context and timesteps, ensuring the preservation of both low-level and high-level features (Figure 10-B). A final crucial aspect of this U-Net was the addition of scaled additional noise before the output got passed to the next timestep. This adjustment ensured that the U-Net could maintain the normal distribution of noise required for generating images effectively (Figure 10-C). [11] 

u nets explained in a diagram

Figure 10. A) Context U-Net architecture. B) Context and time embeddings. C) Residual noise added. Modified from: https://learn.deeplearning.ai/diffusion-models/ 

This U-Net was able to generate adequate images ranging from 16×16 up to 64×64 in resolution with slight changes in its architecture. The choice of specific sizes was tailored to align with the expected latent space of the VAE. The larger the size, the better the quality of the generated images, as they can capture more intricate details and variations (Figure 11). What is particularly noteworthy is that these image generations were thoughtfully conditioned with text, underscoring the model’s potential to achieve a low-scale latent diffusion process.  

racoon prompt generation example

Figure 11. Images generated at different sizes using the prompt: “Racoon” 

Furthermore, the integration of the U-Net with the VAE allowed the generation of images up to 800×800 pixels and also showcased its versatility in generating contextually conditioned images, demonstrating the model’s potential to facilitate a low-scale latent diffusion process. Even though image generation is still to be improved, the synergy between the U-Net and VAE not only allows for the generation of images using few computational resources but also opens new possibilities in combining deep learning techniques for complex generative tasks.

U-Net text conditioning with the contribution of language models.

The data collected for this project are pairs of images and descriptive text. When processed with the help of a data loader, the texts are grouped in tuples according to the value defined for the batch size (which defines the number of samples to be processed before updating the internal parameters of the neural network). This tuple of texts will condition the U-Net once it has been processed by a language model. 

Pre-trained language models are machine learning models designed to understand, process and generate text. These models are pre-trained on large text sets to learn natural language representations that capture linguistic structures, relationships, and meanings. Text embeddings are a fundamental part of the applications of language models. Text embeddings are numerical representations that allow words or phrases to be used in machine learning models, capturing meaning and semantic context.  

Initially, the language model selected to obtain the text embeddings of the descriptions associated with the images was CLIP (Contrastive Language-Image Pretraining), due to its versatility in understanding both pictures and texts and its use in multiple natural language processing and computer vision tasks. However, during the experiments carried out, the use of CLIP required cutting the descriptive texts to a maximum of 35-38 words per description, since in its configuration the context_length parameter sets the maximum number of tokens that can be processed to obtain the text embeddings at 77. Tokens are the result of dividing a text into smaller units that can be words, subwords or even characters; CLIP tokenises by words, counting spaces as tokens. 
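As an illustration of that limit, and assuming OpenAI's clip package (the description string is a placeholder), tokenisation always produces a fixed-length sequence of 77 tokens and fails on longer texts:

import clip  # OpenAI's CLIP package

description = "a flat vector illustration with purple gradients and a small logo"
tokens = clip.tokenize([description])  # LongTensor of shape (1, 77); longer texts raise an error
print(tokens.shape)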

Thus, it was necessary to explore other alternatives that would allow the use of longer texts and, thinking of text processing that consumes fewer resources, TF-IDF vectorization was the first option. This is a technique used in natural language processing to convert text documents into numerical representations: it assigns a numeric value to each word in a document based on how frequent it is in that document (TF) and how rare it is in the document set (IDF). This technique was applied using the Scikit-Learn library and its TF-IDF vectorization tools, where each document was a description, and the collection of descriptions of all images was the document set. Other pre-trained language models, such as BERT and RoBERTa, were used because they had no maximum number of tokens allowed and because of their efficiency in batch processing of long texts. 
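A minimal sketch of that TF-IDF alternative with scikit-learn (the descriptions below are placeholder captions, not entries from our dataset):

from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = [
    "purple gradient background with white geometric figures",
    "dark background with a small logo and white text",
]  # each description is one document; the collection is the document set

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(descriptions)  # shape: (n_descriptions, vocabulary_size)
print(tfidf_matrix.shape)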

The result of applying these language models to the texts to train the U-Net with attention mechanisms was not as expected, since the generated images did not present any of the characteristics of the images that composed the subset of training data. Only by replacing this U-Net with a Context U-Net was it possible to evidence text conditioning in image generation.  

Using the Context U-Net and CLIP as a text model, it was proven that image generation is successful for descriptive texts of different lengths, taking into consideration the limit of 77 tokens set by CLIP. Below are the images used in the training and generated with descriptive texts of different lengths. 

length comparison racoon example images

Figure 12: Comparison of images generated from texts of different lengths using CLIP as the language model

Then, the same experiment was performed with BERT and RoBERTa as language models, evaluating their results with a different image, with texts in the image, and a description of 73 words and 145 tokens.

comparison of bert and roberta with blake lively image

Figure 13: Comparison of images generated from the same text using BERT and RoBERTa as text models

In the final step, RoBERTa was selected as the language model due to its performance in text processing. A final experiment was carried out, integrating the VAE constructed and trained for this project, using a latent dimension of 3 channels and an image resolution of 32×32. The results are the following: 

final racoon image of the integrated model

Figure 14: Image generated using the VAE developed in this project, RoBERTa and the Context U-Net

The exploration of different pre-trained language models allowed the use of long texts to describe the images and their processing in batches, which was fundamental within the proposed architecture. The detailed visual description of the data and obtaining meaningful text embeddings was successful thanks to these multi-purpose models, which will continue to be a fundamental component in future enhancements.

What’s next? 

Considering the goals achieved with this work, promising avenues remain for enhancing the low-scale latent diffusion. One such avenue involves a deeper exploration of the U-Net architecture itself, aiming to elevate its versatility and increase its quality for image generation. Further refinements in training its components (VAE, U-Net) can enhance the quality of generated images. Additionally, adopting novel scheduling strategies in the diffusion process presents an opportunity to raise the bar regarding image quality. Moreover, expanding the latent space within the VAE is a compelling path to enrich the image generation process further. These challenges represent exciting directions in which the journey of improvement and innovation can unfold.

References 

[1]: Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B. (2022, April 13). High-Resolution Image Synthesis with Latent Diffusion Models. Available at: https://arxiv.org/pdf/2112.10752.pdf?ref=louisbouchard.ai 

 [2]: Agrawal, A. (2022, November 9) Stable diffusion using Hugging Face. Toward Data Science. https://towardsdatascience.com/stable-diffusion-using-hugging-face-501d8dbdd8  

 [4]: Adobe Creative Cloud, Comparison JPEG vs PNG (no date). Available at: 
https://www.adobe.com/creativecloud/file-types/image/comparison/jpeg-vs-png.html 

[5]: Rocca, J., Rocca, B. (2019, Sept 23), Understanding Variational Autoencoders (VAEs),  
Building, step by step, the reasoning that leads to VAEs. Available at:  
https://towardsdatascience.com/understanding-variational-autoencoders 

[6]: Kostadinov, S. (2019, August 8), Understanding Backpropagation Algorithm. Learn the nuts and 
bolts of a neural network’s most important ingredient. Available at:  
https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd 

[7]: Papers with code – an overview of skip connections (no date) An Overview of Skip Connections | Papers With Code. Available at: https://paperswithcode.com/methods/category/skip-connections 

[8]: Parikh, J. (2023, April 28). U-Nets with attention – Jehill Parikh – Medium. Medium. https://jehillparikh.medium.com/u-nets-with-attention-c8d7e9bf2416 

[9]: dtransposed. (2023, February 6). Diffusion Models – Live Coding Tutorial 2.0 [Video]. YouTube. https://www.youtube.com/watch?v=S_il77Ttrmg 

[10]: He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1512.03385 

[11]: DeepLearning.AI. Short Courses. (2023) How Diffusion Models Work Available at: https://learn.deeplearning.ai/diffusion-models/  

 


Catherine Cabrera


Deivid Toloza

Yulisa Niño


Author: Juan Camilo Sarmiento

The TV and film industry, like many other industries, has been full of technological advances, including AI. Most recently, a trending topic has been Generative AI in Hollywood, which represents a paradigm shift that is transforming the essence of storytelling and filmmaking. This also comes with ethical dilemmas, as we have been able to observe from the concerns of workers in this industry during the latest writers’ strike. As AI continues to evolve, so will the industry, pushing the people inside it to adapt and use it to their advantage rather than be replaced by it. 

In this article, we will explore some of the advances Artificial Intelligence has brought to Hollywood, how they can affect the industry’s workers, and the ethics behind the use of AI in the industry.

From recommender systems to AI-generated content 

The use of AI in Hollywood isn’t new. As a consumer of TV and film, you have probably interacted many times with recommender systems, which suggest series or films you are likely to watch based on content you have previously watched and liked. One example is Netflix’s recommendation system, which the company describes as the core of its product: it provides members with personalised suggestions that reduce the time and frustration of finding great content to watch (Netflix, 2023).
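
To make the idea concrete, below is a toy item-based recommender sketch in Python (using NumPy). It only illustrates the general principle of suggesting titles similar to what a user has already liked, not Netflix’s actual system; the titles and ratings matrix are made up.

import numpy as np

titles = ["Drama A", "Drama B", "Sci-Fi A", "Sci-Fi B"]
# Rows are users, columns are titles, values are ratings (0 = not watched).
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Item-item similarity computed from the rating columns.
sim = np.array([[cosine(ratings[:, i], ratings[:, j])
                 for j in range(len(titles))] for i in range(len(titles))])

def recommend(user_ratings, top_n=2):
    scores = sim @ user_ratings            # weight each title by similarity to liked ones
    scores[user_ratings > 0] = -np.inf     # hide titles the user has already watched
    return [titles[i] for i in np.argsort(scores)[::-1][:top_n]]

print(recommend(np.array([5, 0, 0, 0])))   # a fan of "Drama A" is shown "Drama B" first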

But recommender systems are not only for consumers; they have also been used and studied on the production side of the industry. Data from social media, box office, streaming and TV have been used to train models that predict the risk of producing a movie (The Verge, 2023). In 2020, for example, Warner Bros signed a deal with Cinelytic, intending to use its recommendations to assist decision-making. It is worth noting that Cinelytic’s CEO stressed that AI was only an assistive tool, as it cannot make any creative decisions.

Beyond that, we have seen rapid adoption of generative AI tools, thanks to their low barrier to entry and their explosion in popularity this year (2023) as the tech industry’s hottest new thing. As it trends across all sectors, it is already being used in the creative industry, with applications like the ones we will see next.

Use cases of Gen AI in the creative industry 

In the production of TV shows and movies, AI applications have evolved beyond image processing tasks such as upscaling. Generative AI in Hollywood can now be found in areas like scriptwriting and storytelling, editing and post-production, visual effects and animation, video generation and voice cloning, among others.

  • Writing and storytelling

Writing can be a challenging task, and scripts are the foundation of the film industry. They are an essential tool for decision-making, serving as a tangible representation of a film’s potential and viability, and writing one requires the ability to weave words into engaging narratives, create complex characters and build intricate worlds.

For writing and storytelling, tools such as AI movie script generators can be used.

In the case of writing, large language models (LLMs) are the main tool for generating new text. By now, you may have heard of ChatGPT, a chat platform in which an LLM answers your questions and helps you with different tasks. An example of ChatGPT being used for a movie script is The Safe Zone, released by 28 Squared Studios and Moon Ventures, the first-ever film written and directed by AI.

In fact, not only was the script generated by ChatGPT, but a shot list was also generated to specify camera angles, lighting and other production aspects.
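
As a rough illustration of how an LLM can draft script material, the snippet below uses the open-source gpt2 model through the Hugging Face transformers pipeline. The model, prompt and generation settings are placeholders, not the production setup behind The Safe Zone; real productions rely on far more capable models such as ChatGPT.

from transformers import pipeline

# Small open-source model used purely for illustration.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "INT. SPACE STATION - NIGHT\n"
    "Two engineers discover the life-support AI has sealed off the safe zone.\n"
    "DIALOGUE:\n"
)
draft = generator(prompt, max_new_tokens=120, do_sample=True, temperature=0.9)
print(draft[0]["generated_text"])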

  • Video Generation and editing 

For some years now, a company named Runway has built various AI tools that studios have used to edit parts of movies or to generate films.

An example of a movie where generative AI in Hollywood was used is Everything Everywhere All at Once. In this movie, machine learning algorithms were used not only to automate editing processes like rotoscoping for green screens, but generative AI tools like Stable Diffusion were also used to create some of the scenes. Stable Diffusion allowed the filmmakers to achieve a high level of visual complexity and an otherworldly feel, and the film’s multiple parallel universes and timelines were achieved by seamlessly blending different images and videos.
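
For context on what using Stable Diffusion looks like in practice, here is a minimal sketch that generates a single concept frame with the open-source model via the Hugging Face diffusers library. It illustrates the kind of tool involved, not the filmmakers’ actual pipeline; the model ID and prompt are examples, and a GPU is assumed.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a GPU; drop the half precision and .to("cuda") for CPU

prompt = "two rocks with googly eyes on a cliff in an otherworldly desert, cinematic lighting"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("concept_frame.png")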

Speaking more about video generation tools, Runway created Gen-1, a tool that generates videos from existing ones, and Gen-2, which can create video from text, from text and images, or from text and videos. Using Gen-2, the production company Waymark created the short film The Frost, which is entirely generated by AI.

  • Voice Cloning 

Another growing application of generative AI in Hollywood is voice cloning, which, as its name suggests, uses machine learning to analyse and replicate a specific individual’s speech patterns, intonation and vocal characteristics. It allows the creation of synthetic voices that mimic the nuances and uniqueness of a person’s voice.

Voice cloning is helpful in multiple industries, for example, education, medical studies, video game development, music, TV and film. Inside the TV and film industry, it has many possible applications, as it can be used for advertising, dubbing and localisation, animation, or bringing back voices from actors who have passed away.  

An example of voice cloning in film and TV is Disney+’s The Mandalorian, where the character Luke Skywalker was set to appear in the final episode of the second season. Since Mark Hamill, who played Luke in the original films, was 68 years old at the time, the tool Respeecher was used to recreate his younger voice.

Ethics of Gen AI in the film industry 

The increasing role of generative AI in the Hollywood filmmaking industry raises important ethical considerations that the industry and its stakeholders must address. These include authorship and the impact of AI on creatives, representation and bias, privacy and consent, and the regulation of generative AI systems.

During the latest writers’ strike, one point of concern was the use of generative AI, as it can affect the jobs of creatives who are not part of the big companies in the industry. Given that complete scripts can now be written with AI, authorship is another concern. The creative process has long been associated with human ingenuity, and attributing creative works to AI can lead to debates over copyright, intellectual property and the recognition of human talent.

Regarding representation and bias, we must consider that these models are trained on vast amounts of data, and those datasets can contain biased data*. In the film industry, this can lead to unintended stereotyping, discrimination and underrepresentation. It is also tied to privacy and consent, as the likeness of actors and creatives can be used unethically to create misleading content or invade an individual’s privacy.

*To learn more about cognitive bias, read this article: Handy resources to avoid cognitive biases in Data Science.

Having said all this, the most important point is that, as AI is a growing technology, a set of rules or guidelines for its use must be established, since only some of the developers and users of these technologies consider its ethical implications.

Conclusion 

The impact of generative AI on Hollywood and the entertainment industry is undeniably transformative. While it offers immense potential for creative enhancement and efficiency, it also brings a myriad of ethical considerations that must be thoughtfully addressed.

It is crucial to emphasise that AI is primarily an assistive tool in the creative process, not a replacement for human ingenuity. As the technology evolves, it will be used more in every industry; in film specifically, it will open up new ideas, characters and stories to tell. Overall, the film industry should embrace AI as a tool for innovation while preserving the essence of storytelling and human creativity, and, as in every industry, we should stay informed about the technology and its use so that we do not misuse it and can make a positive change in the world.

References 

Hollywood’s writers are on strike. Here’s why that matters. 

All About the Writers Strike: What Does the WGA Want and Why Are They Fighting So Hard for it? 

What’s the Latest on the Writers’ Strike? 

Is the Hollywood writers’ strike over? The provisional deal explained 

‘Bargaining for our very existence’: why the battle over AI is being fought in Hollywood 

‘Embrace it or risk obsolescence’: how will AI jobs affect Hollywood? 

AI & YOU #19: AI’s Toolbox in Hollywood: From Voice Cloning to Digital Re-aging 

AI & YOU #18: AI is Coming for Hollywood and the Industry Should Be Worried 

HOW AI VIDEO TOOLS ARE CHANGING THE FILM INDUSTRY 2023 

‘This Is an Existential Threat’: Will AI Really Eliminate Actors and Ruin Hollywood? Insiders Sound Off 

Hollywood is replacing artists with AI. Its future is bleak. (from 2020) 

The Hollywood writers strike is over. What’s next for the writers? 

Welcome to the new surreal. How AI-generated video is changing film. 

How AI is bringing film stars back from the dead 

YouTube: The A.I. Dilemma – March 9, 2023 

Recommendations (Netflix) 

How Netflix’s Recommendations System Works 

Hollywood is quietly using AI to help decide which movies to make 

Warner Bros. signs AI startup that claims to predict film success 

https://www.cinelytic.com/ 

Cinelytic CEO on How A.I. Is Changing the Film Industry 

Generative AI Explodes: 77.8M Users In Just Two Years, Double The Rate of Tablets 

Everything Everywhere All At Once: How AI is Revolutionizing Filmmaking 

The Making of Everything Everywhere All at Once’s Rock World 

Runway AI: Tech Behind Everything Everywhere All At Once 

‘Hollywood 2.0’: How the Rise of AI Tools Like Runway Are Changing Filmmaking 

https://www.thefrostpart.one/ 

https://www.respeecher.com/voice-cloning-film-tv 

Respeecher synthesized a younger Luke Skywalker’s voice for Disney+’s The Mandalorian 

https://www.youtube.com/watch?v=o8rlVrA6XZc 


Juan Camilo Sarmiento

chinese tech ecosystem image

Author: Johnatan Zamora – Data Scientist

When we start a conversation about technology and innovation, we usually think of countries like the United States, Germany, or the United Kingdom as powerhouses in the field. Rarely do we consider countries beyond America or Europe, such as India or China, as leaders in technology and innovation. One reason might be the distance and limited information we have about these countries, coupled with our frequent exposure to technology and innovation initiatives from the United States and Europe.

Recently, I decided to pursue my master’s degree in Artificial Intelligence outside Colombia (my country) and chose China as my destination. The country has progressively become a global economic powerhouse, and much of this progress has been driven by technology and innovation from local companies. This caught my attention because big names in the technology industry, like Instagram, Meta, WhatsApp, and Amazon, are unknown or far less popular within China.

This doesn’t mean these services don’t exist in Chinese society; on the contrary, they do, but in their “Chinese versions”, built by local technology companies that we may not know in America or Europe but that are giants in their own local market. Today, I would like to introduce you to five Chinese technology companies revolutionising the industry in China and beyond.

Some history

The global economy is defined as the set of national economies and non-state organisations within each nation that are linked by international economic relations with other nations or non-governmental organisations. The global economy is one of the foundations of our modern world.

More than 200 years ago, the Industrial Revolution gave way to factories and mass production and introduced a model in which goods and services produced in one country could be sold to others. Mining, oil, port and other companies began to appear.

In the middle of the 20th century, with the rise of integrated circuits, the first technology companies emerged, focused on building computers and other devices for businesses and homes; this happened in the countries that were the powers of the time, such as the United States, Germany and Japan.

Entering the 21st century, and at the gates of a technological revolution led by Artificial Intelligence, new technology companies emerged. Names such as Amazon, Microsoft, Google, and Tesla became significant players in the global technology landscape.

What all these companies have in common is investment in innovation and original ideas aimed at exploiting the latest technological advances. These differentiators have helped them become the most successful and innovative technology companies in the world so far.

China joins the game

Over the years, China has slowly become a technological powerhouse and could be the only competition for the dominant tech companies.

The economic evolution of the country, which in less than forty years has transformed its agriculture-based economy into one dominated by industrial production and infrastructure, has had the positive side effect of increased investment in technology and innovation.

This is reflected in the business evolution that happened in the last decades, where the major players in the local economy are characterised by belonging to the technology sector.

Big Technology Companies

Below, we can see the most representative technology companies within the Chinese landscape.

Alibaba Group

alibaba group logos

Alibaba Group is a multinational company dedicated to e-commerce, retail, and technology. It is one of China’s oldest and most significant technology companies, providing B2B, B2C, and C2C services inside and outside China. Additionally, it offers cloud services within the country, competing with external cloud services such as Amazon AWS or Microsoft Azure.

Currently, its businesses are divided into six main groups:

  • Cloud Intelligence: Focused on its entire cloud business and projects involving Artificial Intelligence.
  • Taobao and Tmall: Similar to what Amazon is in America and Europe, but focused on the Chinese market.
  • Local Services: Manages financial businesses within China. The most famous example is Ant Financial Services, a fintech services group that resembles the functions of a bank but is not one. Among these services are loans, insurance, and credit cards, among the most relevant. This branch also operates one of the country’s two major electronic payment gateways, Alipay.
  • Cainiao Smart Logistics Network: Handles all the business of Cainiao, the country’s largest and most important logistics company.
  • Alibaba International Digital Commerce: Manages everything related to digital commerce outside China. Alibaba has an international presence, with services gradually gaining popularity outside of China, such as AliExpress, an e-commerce platform similar to eBay that has become very popular in Europe and America in recent years.
  • Digital Media Entertainment: It is the most recent branch of the multinational, oriented towards exploring businesses such as content creation, events, and online distribution of audiovisual content.

TENCENT

tencent ecosystem logos

Tencent is another company that laid the foundation for China’s technological advance. It is widely recognised for owning the country’s most commonly used messaging apps, QQ and WeChat. The latter is considered a ‘super app’ because, in addition to being a communication application, it integrates a payment gateway that competes with Alipay. WeChat also allows interaction with third-party applications known as ‘mini-programs.’ We could describe it as WhatsApp with extended features.

Tencent excels in the field of social media and video games. In the gaming sector, it owns game development studios like Riot Games, the studio responsible for League of Legends, one of today’s most popular video games. Tencent also owns the majority of Epic Games, a game development studio known for its significant titles and for being the primary developer of the Unreal Engine, a game development engine widely used by developers due to its features and affordability.

Within China, Tencent competes with Alibaba in the retail sector through JD (Jingdong), another popular e-commerce platform. Additionally, Tencent is involved in other sectors, including finance, where it owns WeBank, a neobank operating entirely online.

BAIDU

Baidu is another well-established Chinese technology multinational, with internet-related businesses and Artificial Intelligence services as its primary niche. In a very superficial comparison, we could describe it as the Chinese Google, because its search engine, Baidu, is the most used in China, as is its location service, Baidu Maps. Like the American company, Baidu offers cloud storage, an online encyclopedia, translation, video, and music services.

In the Artificial Intelligence field, Baidu has made significant efforts towards industry development. Its most prominent contribution is its deep learning framework, PaddlePaddle (Parallel Distributed Deep Learning). This framework could be compared to PyTorch or TensorFlow but is more industry-oriented, and it includes toolkits that make it easier for people with less experience in the field to approach AI.

Lastly, in recent years, Baidu has focused on perfecting its autopilot functionality to offer intelligent vehicles in the future.

ByteDance

bytedance logo

Focused on the technological services sector, this company owns one of the most popular and controversial applications of recent times due to its problems with privacy and content: TikTok for the international audience and Douyin for the Chinese public. Within China, it also owns Toutiao, an intelligent search engine that competes directly with Baidu. Additionally, it has game development studios that might be less well-known in America or Europe but have a substantial player base in Southeast Asia.

Meituan

This last company is built on a local shopping platform that offers products and services such as entertainment recommendations, restaurants, and places, as well as coupons and promotions for establishments. It could be described as a hybrid between TripAdvisor, Groupon, and Deliveroo.

In recent years, Meituan’s efforts in Artificial Intelligence can be seen in collaborations with Intel to optimise TensorFlow, which enhanced the distributed training of its recommendation systems. Additionally, it has acquired various Chinese startups focused on Generative Artificial Intelligence and Large Language Model (LLM) development.

To Conclude…

Due to the limited information we have about the Asian giant, we still perceive China merely as the world’s factory and a country that stands out economically due to its manufacturing industry and low-cost labour.

However, this perspective has been changing over the past few decades. While it is true that a significant part of China’s economy revolves around goods production, the service sector has experienced a meteoric rise, propelled by state investment in technology and contributions from major private corporations. Automation and Artificial Intelligence have become leading protagonists in recent years.

The Chinese industry is investing in intelligent solutions that stand out and break the mould, such as the focus on autonomous vehicles with specific functions (Smart Deliveries) or collaborations like the one between JDI (the world’s leading electric car and battery manufacturer) and Tencent to enhance their autopilot systems.

Indeed, in the coming years, it won’t be surprising to hear about technological advancements from the Asian giant, which is increasingly approaching the level of Western technology companies.

References 

https://www.aspi.org.au/report/mapping-more-chinas-tech-giants

https://www.fungyuco.com/blog/china-top5-techcom/

https://www.forbes.com/sites/forbeschina/2019/07/07/2019-forbes-china-50-most-innovative-companies-full-list/?sh=4af7098b2837

Why Investing in Tencent, can be Lucrative for Growth Investors

10 facts about Alibaba Group


Johnatan Zamora – Data Scientist

generative ai for videogames image

Author: Jorge Salgado – Data Engineer

Video game development is a multidisciplinary challenge, since it typically requires knowledge in different areas: programming, music, design, and art, among others. Art, in particular, is especially challenging for some people, since doing it proficiently requires extensive training. In the following article, we will explore how we can use generative AI to create art assets for video games.

To illustrate this point, the following graph shows a timeline of YouTube searches in the UK, comparing the number of searches for “Blender tutorial”, “Unity tutorial” and “Pixel Art tutorial” over the last year. As we can see, although Unity is the most recognised game engine in the market, most people are looking up how to create art assets with Blender.

graphic of google trends for various game development topics

Google Trends searches for various game development topics. Made with: https://trends.google.com/trends/ 
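
If you want to reproduce a comparison like this programmatically, the sketch below uses pytrends, an unofficial Google Trends client for Python. The keyword list, region, time window and YouTube-search filter mirror the chart above, but they are assumptions you can change.

from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-GB")
pytrends.build_payload(
    kw_list=["Blender tutorial", "Unity tutorial", "Pixel Art tutorial"],
    timeframe="today 12-m",   # last 12 months
    geo="GB",                 # United Kingdom
    gprop="youtube",          # YouTube search instead of web search
)
interest = pytrends.interest_over_time()
print(interest.tail())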

Generative Adversarial Networks

So how can generative AI help us create assets for video games? One popular approach is Generative Adversarial Networks (or GANs): a deep learning architecture made up of two neural networks with opposing objectives, a Generator and a Discriminator.

The Discriminator’s goal is to correctly classify images as real or fake, while the Generator’s goal is to produce images that trick the Discriminator into thinking they are real. After the Discriminator labels each image, it is told the correct answer and updates itself from the outcome; the Generator, in turn, updates itself based on whether it managed to fool the Discriminator, effectively receiving feedback from it.

This is an unsupervised learning setup, meaning you only need a dataset of real, unlabelled assets to train the Discriminator.
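
To ground the idea, here is a minimal GAN training sketch in PyTorch. The tiny fully connected networks, the flattened 64x64 image size and the hyperparameters are illustrative only; a real asset generator would use convolutional networks and a curated sprite dataset.

import torch
import torch.nn as nn

latent_dim = 100
img_dim = 64 * 64 * 3  # flattened 64x64 RGB sprites (assumed dataset format)

# Generator: maps a random noise vector to a flattened image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)

# Discriminator: maps a flattened image to a single real-vs-fake score.
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(real_images):
    """One adversarial update, given a batch of real images of shape (batch, img_dim)."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the Discriminator to separate real images from generated ones.
    fake_images = generator(torch.randn(batch, latent_dim)).detach()
    d_loss = (loss_fn(discriminator(real_images), real_labels)
              + loss_fn(discriminator(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the Generator so the Discriminator predicts "real" for its images.
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, latent_dim))), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()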

Conditional GAN

In the basic GAN described above, we have no way to influence the Generator’s output. This is where conditional GANs come in handy: we label the real images, so the Discriminator learns the label alongside the image.

The point is to pass the label to both the Discriminator and the Generator, so the Discriminator classifies an image as real only if the image and the label match, which incentivises the Generator to create the right type of image for each label.

As an example, if we train a conditional GAN on a set of images of digits, the Generator could produce a very convincing “6”, but if the pair is labelled “3”, the Discriminator will deem it fake. If we instead pass the label “6” as the condition, the Generator will learn to produce images of the number six, since that is what the Discriminator rewards.
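
A sketch of that conditioning trick is shown below, again in PyTorch: the class label is embedded and concatenated both to the Generator’s noise vector and to the image the Discriminator sees, so only matching image/label pairs get rewarded as real. The sizes and the digit labels are illustrative.

import torch
import torch.nn as nn

latent_dim, num_classes, img_dim = 100, 10, 28 * 28  # e.g. digit images

label_embedding = nn.Embedding(num_classes, num_classes)

generator = nn.Sequential(
    nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(img_dim + num_classes, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

def generate(labels):
    """Generate one image per requested class, e.g. labels = torch.tensor([6, 6, 3])."""
    noise = torch.randn(labels.size(0), latent_dim)
    return generator(torch.cat([noise, label_embedding(labels)], dim=1))

def discriminate(images, labels):
    """Score image/label pairs: only realistic images matching their label should score as real."""
    return discriminator(torch.cat([images, label_embedding(labels)], dim=1))

Training proceeds exactly as in the unconditional sketch above, except that both networks always receive the label alongside the image.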

RESULTS EXAMPLES

diagram of stable diffusion process
midjourney output example

Midjourney output example. Retrieved from: https://aituts.com/midjourney-pixel-art/ 

Tools need a hand to wield them

GANs are not the only (or necessarily the best) tool for generating video game art assets, and no tool made so far can replace real artists. These technologies are not meant to replace artists or diminish the value of human creativity, but rather to expedite certain aspects of asset generation. By leveraging generative AI, game designers can streamline their workflow, generate preliminary assets, and focus on the more intricate and unique aspects of artistry that require their specialised skills and creativity. Designers who lack the skills to create scenarios or music can complement their abilities with these tools, prototyping some version of what they envision, and artists and musicians can use them for inspiration before committing to a piece.



Jorge Salgado – Data Engineer
