CIO Bulletin
Byung-gon Chun, CEO, Leads FriendliAI in Revolutionizing Generative AI Deployment with PeriFlow Cloud
FriendliAI is committed to driving innovation by providing cutting-edge inference serving engines tailored for generative AI, including LLMs. At the heart of their mission is empowering clients to deploy their generative AI models effortlessly and cost-effectively, all while minimizing their environmental impact. In a world where efficient use of generative AI models is paramount, FriendliAI believes in democratizing this capability, making it accessible to all rather than a select few. Their services specialize in the automated, efficient deployment of generative AI models. FriendliAI's primary objective is to simplify the complexities of serving these models, enabling a wider range of companies to harness the potential of generative AI for their innovative endeavors.
PeriFlow: Revolutionizing Generative AI Serving
FriendliAI presents PeriFlow, the fastest generative AI serving engine currently available in the market. PeriFlow is designed to optimize GPU utilization and reduce costs, offering a powerful and versatile solution for serving generative AI models.
Unmatched Speed at Minimal Costs: PeriFlow stands out with its high-speed performance, ensuring efficient model serving at low operational costs. By implementing advanced multi-level optimization, scheduling, and batching techniques, PeriFlow redefines the landscape of generative AI model deployment.
Comprehensive Model Support: Benefit from PeriFlow's extensive support for various Large Language Models (LLMs) and diverse workloads. The engine is based on cutting-edge research and experience, making it a reliable choice for models such as ChatGPT, GPT-3, PaLM, OPT, BLOOM, and LLaMA, among others.
Patented Batching Technology: PeriFlow incorporates patented batching technology, safeguarded in the United States and Korea, enhancing its capability to efficiently serve generative AI models. This innovation contributes to a reduction in both costs and the complexity associated with model deployment.
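The details of the patented technique are not spelled out in this article, but the general class it belongs to is dynamic, iteration-level batching: the engine re-forms the batch at every token step, admitting newly arrived requests and retiring finished ones, so a short request never waits for the longest request in its batch. The sketch below is illustrative only, with a stubbed `decode_step` standing in for the model's forward pass; it is not FriendliAI's implementation.

```python
from collections import deque

class Request:
    """A decoding request: tracks generated tokens until done."""
    def __init__(self, rid, max_new_tokens):
        self.rid = rid
        self.remaining = max_new_tokens
        self.output = []

def decode_step(batch):
    # Stand-in for one model forward pass that produces one token
    # per request in the batch (token ids here are hypothetical).
    return {req.rid: len(req.output) for req in batch}

def continuous_batching(queue, max_batch):
    """Re-form the batch every iteration: admit waiting requests,
    retire finished ones, so no request waits on the whole batch."""
    waiting = deque(queue)
    running, finished = [], []
    while waiting or running:
        # Admit new requests up to the batch-size limit.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        tokens = decode_step(running)
        still = []
        for req in running:
            req.output.append(tokens[req.rid])
            req.remaining -= 1
            (finished if req.remaining == 0 else still).append(req)
        running = still
    return finished
```

Note how a one-token request admitted alongside a three-token request completes and frees its batch slot after a single iteration, which is the key to the cost and latency reductions this style of scheduling enables.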
Versatility in Supported Models, Decoding Options, and Data Types: PeriFlow supports an array of generative AI models, decoding options, and data types. From GPT and GPT-J to T5 and UL2, the engine accommodates models like GPT-NeoX, MPT, LLaMA, Dolly, OPT, BLOOM, CodeGen, FLAN, and more. With decoding options such as greedy, top-k, top-p, beam search, and stochastic beam search, PeriFlow ensures flexibility in model deployment. Supported data types include fp32, fp16, bf16, and int8.
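To make the decoding options concrete, here is a minimal, framework-free sketch of top-k and top-p (nucleus) filtering over a toy logits vector; greedy decoding is simply top-k with k=1. All function names are illustrative, and PeriFlow exposes these as configuration options rather than user-written code.

```python
import math
import random

def softmax(logits):
    # Convert raw logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalize.
    kept = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p (the "nucleus"), then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample(dist, rng):
    # Draw one token id from a {token_id: probability} distribution.
    r = rng.random()
    cum = 0.0
    for tok, pr in dist.items():
        cum += pr
        if r <= cum:
            return tok
    return tok  # guard against floating-point rounding
```

Beam search and stochastic beam search extend this idea by tracking several candidate sequences per step instead of sampling a single token.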
Unrivaled Performance Metrics: Experience unparalleled performance as PeriFlow outperforms NVIDIA Triton+FasterTransformer in both latency and throughput when serving LLMs ranging from 1.3B to 341B parameters. For instance, achieve a remarkable 10x throughput improvement for a GPT-3 175B model at the same level of latency.
PeriFlow by FriendliAI is poised to reshape the landscape of generative AI serving, making high-speed, cost-efficient deployment accessible for a diverse range of applications.
PeriFlow Cloud: Seamless Generative AI Model Deployment
FriendliAI introduces PeriFlow Cloud, a state-of-the-art platform tailored to simplify the deployment process of generative AI models. This platform leverages PeriFlow, FriendliAI's flagship Large Language Model (LLM) serving engine, renowned for its efficiency and versatility.
A Seamless Five-Step Deployment Process
Step 01: Creating Deployments
The deployment journey begins on the PeriFlow web interface, where users can initiate new deployments effortlessly. Each deployment is dedicated to managing the inference of a specific AI model, ensuring a focused and streamlined deployment process.
Step 02: Model Selection
PeriFlow Cloud offers users the flexibility to choose between uploading their own checkpoints or selecting from a variety of pre-existing models provided by PeriFlow. This user-friendly approach caters to diverse AI model requirements, allowing for a customized deployment experience.
Step 03: Cloud Resource Configuration
Tailoring deployments to specific project needs is made simple through the configuration of cloud resources. PeriFlow Cloud presents multiple virtual machine types across various regions, enabling users to align resources with the demands of their projects seamlessly.
Step 04: AI Model Interaction
The interactive playground provides users with a real-time testing ground to interact with their AI model. This feature facilitates dynamic testing, allowing users to gauge and refine their AI model's performance in a controlled environment.
Step 05: Deployment Monitoring
PeriFlow Cloud takes the burden off users by automating the monitoring process for deployments. Users can effortlessly track and analyze the performance of their AI models, benefiting from the supercharged engine of PeriFlow to ensure optimal results.
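Once a deployment is live, applications typically reach it over HTTP. The endpoint URL, payload fields, and authentication scheme below are hypothetical placeholders, not PeriFlow Cloud's documented API; consult the platform's own reference before integrating.

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with the URL of your deployment.
ENDPOINT = "https://example.invalid/v1/completions"

def build_request(prompt, max_tokens=64, top_p=0.9):
    """Assemble an HTTP request for a deployed text-generation model.
    The payload field names are illustrative assumptions."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer <YOUR_TOKEN>"},
        method="POST",
    )

# Sending the request requires a live deployment:
# with urllib.request.urlopen(build_request("Hello")) as resp:
#     print(json.load(resp))
```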
PeriFlow Cloud by FriendliAI promises a hassle-free and efficient experience in deploying generative AI models. With its intuitive interface and the robust PeriFlow engine, users can explore the full potential of their AI models seamlessly. Try PeriFlow Cloud today and revolutionize your approach to generative AI deployment.
About | Byung-gon Chun
Byung-gon Chun, the Founder and CEO of the company, is a Professor in the Computer Science and Engineering Department at Seoul National University, where he is currently on sabbatical leave. Over an illustrious career, he has also served as a Visiting Research Scientist at Facebook, a Principal Scientist at Microsoft, and a Research Scientist at both Yahoo! and Intel.
His academic journey includes earning a Ph.D. in Computer Science from the University of California, Berkeley, and obtaining an M.S. in Computer Science from Stanford University. With a wealth of experience and expertise, Byung-gon Chun brings a unique blend of academic and industry insights to the field of computer science and technology.