Dive Deep: Memory Augmented Large Language Models are Computationally Universal

Attention and Memory are all you need

We have come a long way since the paper Attention Is All You Need was published. Could we simulate any algorithm using LLMs and memory? Can the combination of an LLM and memory be Turing complete? The paper Memory Augmented Large Language Models are Computationally Universal presents a detailed study of enhancing the computational capabilities of transformer-based large language models (LLMs) through external memory augmentation.

LLMs are rapidly advancing computational boundaries, and recent research from Google Brain and the University of Alberta suggests these limits can be pushed significantly further. The study examines whether an LLM, when combined with memory, can simulate any algorithm and achieve Turing completeness.

This exploration of the computational potential of LLMs focuses on integrating them with an associative read-write memory. The research, using Flan-U-PaLM 540B, demonstrates that augmenting an LLM with such a memory enhances its computational capabilities without altering the model's original weights. Instead, the approach relies on a simple stored instruction computer programmed with specific prompts, suggesting a substantial leap in the functional scope of LLMs.

The paper delves into the following areas:

  1. Computational Universality of LLMs with External Memory: The paper demonstrates that transformer-based LLMs, which are inherently limited by their finite-automaton nature, can achieve computational universality when augmented with an external read-write memory. This setup allows them to process arbitrarily large inputs and simulate any algorithm.

  2. Evolution and Limitations of LLMs: The introduction discusses the evolution of LLMs, including popular models such as GPT-2, GPT-3, InstructGPT, and ChatGPT. It highlights advancements such as in-context learning and chain-of-thought prompting. Despite these advancements, it points out that current LLMs can only process inputs of bounded length, making them comparable to finite automata.

  3. Stored Instruction Computer Architecture: The study introduces a stored instruction computer in which the LLM functions as the central processing unit (CPU) and an external associative memory acts as random access memory (RAM). This setup enables general computation and convenient programmability, with the associative memory mapping unique keys to values. Each individual interaction between the LLM and the memory remains a finite-state computation.

  4. Compute Cycle Process: The paper describes the compute cycle, in which the system retrieves a prompt from memory, processes it with the LLM, and updates the memory based on the LLM's output. This cycle repeats, with the LLM effectively functioning as the CPU, processing instructions and operands to produce results that update the memory (the first sketch after this list illustrates such a loop).

  5. Universal Turing Machine Simulation: The paper explains the concept of a universal Turing machine, which can simulate the execution of any other computing machine. The study then shows how the stored instruction computer, with specific prompt programming, can simulate a particular universal Turing machine, U15,2 (15 states, 2 tape symbols), by correctly executing conditional assignments and evaluations.

  6. Detailed Mechanism of Simulation: A detailed explanation is provided of how the prompt program works, including how memory locations track the Turing machine head's position and how post-processing and assignment of result strings are managed. The instruction strings are designed to mimic the logic of the corresponding Turing machine states, ensuring an accurate simulation (a toy version of this per-state logic appears in the second sketch after this list).

  7. Verification of Simulation Accuracy: The paper argues that each compute cycle of the stored instruction computer's main loop is equivalent to a compute cycle of the Turing machine U15,2. This is verified by demonstrating that, for every state-symbol pair of the Turing machine, the LLM produces the correct result string under the specified transitions (the third sketch after this list outlines such a case-by-case check).

  8. Specifics of the Language Model Used: The study uses the Flan-U-PaLM 540B model, which has been refined with additional instruction fine-tuning; deterministic computation is ensured by decoding at temperature zero.

  9. Challenges and Reflections: The paper concludes with reflections on the study, noting the brittleness of the language model's behavior and the difficulty of engineering prompts that achieve the desired results. It mentions challenges in interpreting conditionals and in simulating smaller Turing machines, indicating potential areas for improvement in LLMs.
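
To make points 3 and 4 concrete, here is a minimal sketch, in Python, of what such an outer loop could look like. Everything here is an assumption for illustration: `call_llm` is a hypothetical stand-in for querying Flan-U-PaLM at temperature zero, and the `key := value` result format is invented; the paper uses its own prompt program and post-processing. The sketch only shows the division of labour: the LLM performs one finite-state step per cycle, while all unbounded storage lives in the associative memory.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for querying Flan-U-PaLM 540B at temperature 0."""
    raise NotImplementedError("plug a real language model in here")

def parse_assignments(output: str) -> dict[str, str]:
    """Toy post-processor: read `key := value` lines from the LLM's output.
    The real prompt program defines its own result-string syntax."""
    assignments = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\w+)\s*:=\s*(.*)", line)
        if match:
            assignments[match.group(1)] = match.group(2)
    return assignments

def run(memory: dict[str, str], max_cycles: int = 10_000) -> dict[str, str]:
    """Stored instruction computer: the LLM acts as the CPU, the associative
    memory as RAM. Each cycle fetches the prompt addressed by the program
    counter, executes it with the LLM, and writes the results back."""
    for _ in range(max_cycles):
        pc = memory["pc"]                          # key of the next instruction
        output = call_llm(memory[pc])              # one finite-state step by the LLM
        memory.update(parse_assignments(output))   # write results (including a new `pc`)
        if memory.get("halt") == "true":           # a designated cell signals termination
            break
    return memory
```

Note that each cycle touches only finitely many memory cells, so the LLM never has to read an unboundedly long input; unbounded computation comes from iterating the loop.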
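
For points 5 and 6, the key idea is that each (state, symbol) pair of the Turing machine corresponds to one conditional assignment: write a symbol, move the head, and jump to the next instruction. The paper encodes the transitions of U15,2 as instruction strings; the transition table below is not U15,2's (it is a made-up two-state machine, purely for illustration), but the control loop has the same shape as one compute cycle of the simulation.

```python
from collections import defaultdict

# Hypothetical toy transition table, NOT the table of U15,2:
# TRANSITIONS[(state, symbol)] = (symbol_to_write, head_move, next_state)
TRANSITIONS = {
    ("A", "0"): ("1", +1, "B"),
    ("A", "1"): ("0", +1, "B"),
    ("B", "0"): ("1", -1, "A"),
    ("B", "1"): ("1", -1, "halt"),
}

def simulate(tape: dict[int, str], state: str = "A", head: int = 0, max_steps: int = 100):
    """Each iteration mirrors one compute cycle of the stored instruction
    computer: look up the current (state, symbol) pair and perform the
    conditional assignment (write, move, change state) that the
    corresponding instruction string encodes."""
    cells = defaultdict(lambda: "0", tape)   # unwritten cells read as the blank symbol "0"
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = TRANSITIONS[(state, cells[head])]
        cells[head] = write                  # memory cell holding the square under the head
        head += move                         # memory cell tracking the head position
    return dict(cells), state, head

# Example: starting from tape "11", this toy machine halts after two steps.
print(simulate({0: "1", 1: "1"}))
```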
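
Point 7's argument is a finite case analysis: because the Turing machine has only finitely many state-symbol pairs, it suffices to check that the LLM produces the correct result string for each one. The sketch below shows that checking structure only; `build_prompt`, `expected_result`, and `call_llm` are hypothetical hooks standing in for the paper's actual prompt program, expected transitions, and deterministic (temperature-zero) model queries.

```python
def verify_transitions(transitions, build_prompt, expected_result, call_llm):
    """Case-by-case check that the LLM reproduces every transition of the
    simulated Turing machine. An empty return value means every compute
    cycle of the main loop matches the machine's compute cycle."""
    failures = []
    for (state, symbol), action in transitions.items():
        produced = call_llm(build_prompt(state, symbol))    # deterministic, temperature 0
        expected = expected_result(state, symbol, action)   # result string the table demands
        if produced.strip() != expected.strip():
            failures.append((state, symbol, expected, produced))
    return failures
```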

Overall, the study showcases a novel approach to enhancing the computational capabilities of LLMs through external memory augmentation, enabling them to simulate complex computational models like Turing machines.