
Ollama or vLLM? How to choose the right LLM serving tool for your use case
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
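To make the comparison concrete, here is a minimal sketch that talks to either server through the OpenAI-compatible chat endpoint both projects expose; the ports are the documented defaults, and the model name is a placeholder for whatever you have pulled or served.

```python
# Both servers speak the OpenAI chat completions API, so switching between
# them is mostly a matter of changing the base URL and model name.
from openai import OpenAI

# Default local endpoints: Ollama on 11434, vLLM on 8000.
OLLAMA_URL = "http://localhost:11434/v1"
VLLM_URL = "http://localhost:8000/v1"

client = OpenAI(base_url=OLLAMA_URL, api_key="not-needed")  # key is ignored locally

response = client.chat.completions.create(
    model="llama3.1",  # placeholder; use the model you pulled or served
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```

Swapping the base URL (and the model name) is all it takes to move the same client code from a local Ollama experiment to a vLLM deployment.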
Learn how to build a Models-as-a-Service platform with this simple demo. (Part 3 of 4)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.
Learn how RamaLama's integration with libkrun and microVMs enhances isolation, security, and resource efficiency for AI model deployments.
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.
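The article's exact setup isn't reproduced here, but a minimal sketch of enabling EAGLE-3 style speculative decoding with vLLM's offline LLM class looks roughly like this; the speculative_config keys and the draft model name follow recent vLLM examples and are assumptions that may vary by release.

```python
from vllm import LLM, SamplingParams

# Sketch: pair a target model with an EAGLE-3 draft model. The
# speculative_config keys below follow recent vLLM examples but may
# differ between releases; check your version's documentation.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle3",
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",  # assumed draft model
        "num_speculative_tokens": 4,
    },
)

outputs = llm.generate(
    ["Explain speculative decoding in two sentences."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```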
Explore the architecture of a Models-as-a-Service (MaaS) platform and how enterprises can create a secure and scalable environment for AI models. (Part 2 of 4)
Discover how to communicate with vLLM using the OpenAI spec as implemented by the SwiftOpenAI and MacPaw/OpenAI open source projects.
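Under the hood, those Swift clients issue the same OpenAI-style HTTP request; a quick sketch with Python's requests makes the wire format concrete (the URL and model name are placeholders for a local vLLM server).

```python
import requests

# The OpenAI chat completions wire format that SwiftOpenAI and
# MacPaw/OpenAI implement, sent here as plain HTTP for clarity.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # a local vLLM server
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder
        "messages": [{"role": "user", "content": "Hello from an iOS client!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```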
Discover how model compression slashes LLM deployment costs for technical practitioners, covering quantization, pruning, distillation, and speculative decoding.
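A back-of-the-envelope calculation shows why quantization is usually the first lever; the parameter count below is illustrative and ignores KV cache and activation memory.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory, ignoring KV cache and activation overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"8B model @ {label}: ~{weight_memory_gb(8, bits):.0f} GB of weights")
# FP16 ~16 GB, INT8 ~8 GB, INT4 ~4 GB: quantization alone can cut the
# footprint 2-4x before pruning or distillation enter the picture.
```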
This article introduces Models-as-a-Service (MaaS) for enterprises, outlining the challenges, benefits, key technologies, and workflows. (Part 1 of 4)
Learn how to evaluate the performance of your LLM deployments with the open source GuideLLM toolkit to optimize cost, reliability, and user experience.
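GuideLLM's own CLI flags are best taken from its documentation; as a stand-in, this toy sketch computes the kind of latency and throughput summary such a benchmark reports, over hypothetical per-request timings (it is not GuideLLM's API).

```python
import statistics

# Hypothetical per-request end-to-end latencies (seconds) and output token
# counts collected during a load test against an LLM endpoint.
latencies = [0.81, 0.95, 1.10, 1.02, 0.88, 1.45, 0.99, 1.20, 0.91, 2.10]
output_tokens = [128] * len(latencies)

p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile cut point
throughput = sum(output_tokens) / sum(latencies)  # tokens/s, single stream

print(f"p50 latency: {p50:.2f}s, p95 latency: {p95:.2f}s")
print(f"aggregate throughput: {throughput:.0f} tokens/s")
```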
RamaLama's new multimodal feature integrates vision-language models with containers. Discover how it helps developers download and serve multimodal AI models.
Integrate Red Hat AI Inference Server with LangChain to build agentic document processing workflows. This article presents a use case and Python code.
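As a rough sketch of the pattern, LangChain's OpenAI-compatible chat model can be pointed at the inference server's endpoint; the URL, token, and model name below are placeholders for your deployment.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Point LangChain's OpenAI-compatible chat model at the inference server.
# URL, token, and model name are placeholders for your deployment.
llm = ChatOpenAI(
    base_url="https://my-inference-server.example.com/v1",
    api_key="YOUR_TOKEN",
    model="meta-llama/Llama-3.1-8B-Instruct",
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the invoice number and total from the document."),
    ("user", "{document}"),
])

chain = prompt | llm
result = chain.invoke({"document": "Invoice #4711 ... Total: $1,250.00"})
print(result.content)
```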
Explore Red Hat Summit 2025 with Dan Russo and Repo, the Red Hat Developer mascot!
Discover how to deploy compressed, fine-tuned models for efficient inference with the new Axolotl and LLM Compressor integration.
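A sketch of one-shot quantization following llm-compressor's published examples; import paths and argument names shift between releases, so treat this as directional rather than exact.

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # top-level import in newer releases

# One-shot W4A16 quantization of a fine-tuned checkpoint, following
# llm-compressor's examples; argument names may differ by release.
oneshot(
    model="./my-axolotl-finetune",  # hypothetical path to the fine-tuned model
    dataset="open_platypus",        # calibration data
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="./my-finetune-w4a16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```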
Learn how to run vLLM on CPUs with OpenShift using Kubernetes APIs and dive into performance experiments for LLM benchmarking in this beginner-friendly guide.
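For flavor, here is a hedged sketch of creating such a deployment with the official Kubernetes Python client; the image tag, model, and vLLM flags are placeholders (the CPU backend is typically built from vLLM's Dockerfile.cpu rather than pulled prebuilt).

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

# Image tag, model, and flags are placeholders for your own CPU build.
container = client.V1Container(
    name="vllm-cpu",
    image="image-registry.example.com/vllm-cpu:latest",  # hypothetical image
    args=["--model", "Qwen/Qwen2.5-0.5B-Instruct"],
    env=[client.V1EnvVar(name="VLLM_CPU_KVCACHE_SPACE", value="4")],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "8", "memory": "16Gi"},
        limits={"cpu": "8", "memory": "16Gi"},
    ),
    ports=[client.V1ContainerPort(container_port=8000)],
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm-cpu"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "vllm-cpu"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "vllm-cpu"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```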
Discover why Kafka is the foundation behind modular, scalable, and controllable AI automation.
Learn how to use a service mesh to secure, observe, and control AI models at scale without code changes, simplifying zero-trust deployments.
Enhance your Node.js AI applications with distributed tracing. Discover how to use Jaeger and OpenTelemetry for insights into Llama Stack interactions.
Learn how to deploy a Whisper model on Red Hat AI Inference Server within a RHEL 9 environment using Podman containers and NVIDIA GPUs for speech recognition.
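Assuming the server exposes the OpenAI-style transcription endpoint that recent vLLM releases provide for Whisper models, a client call can be sketched like this; the URL, model name, and audio file are placeholders.

```python
from openai import OpenAI

# Red Hat AI Inference Server is vLLM-based; recent vLLM releases expose an
# OpenAI-style transcription endpoint for Whisper models. URL, model name,
# and audio file are placeholders for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio,
    )

print(transcript.text)
```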
Learn to build a chatbot that leverages vLLM for generative AI inference. This guide provides source code and steps to connect to a Llama Stack server using the Swift SDK.
Deploy AI at the edge with Red Hat OpenShift AI. Learn to set up OpenShift AI, configure storage, train models, and serve them using KServe's RawDeployment mode.
Dive into the world of containers and Kubernetes with Podman Desktop, an open source tool that empowers your container development workflow and lets you seamlessly deploy applications to local and remote Kubernetes environments. For developers, operations teams, and anyone looking to simplify building and deploying containers, Podman Desktop provides an intuitive interface compatible with container engines such as Podman, Docker, Lima, and more.
Learn about Podman AI Lab and how you can start using it today for testing and building AI-enabled applications. As an extension for Podman Desktop, the container and cloud-native tool for application developers and administrators, the AI Lab is your one-stop shop for popular generative AI use cases like summarizers, chatbots, and RAG applications. In addition, from the model catalog, you can easily download and start AI models as local services on your machine. We'll cover all this and more, so be sure to try out Podman AI Lab today!
Learn to harness the power of natural language processing by creating LLM tools with Apache Camel's low-code UI. Engage with this interactive tutorial in the Developer Sandbox for a hands-on experience.