Simulatrex ✨ – Revolutionizing Social Simulations with Large Language Models.
An LLM-based multi-agent framework for conducting large-scale social simulations
Simulatrex is an open-source Large Language Model (LLM) based simulation framework tailored for social sciences and market research.
Below, we look at why more accurate simulations are needed and what Simulatrex can do.
Why LLM-based simulations?
Simulations are already widely used in social sciences to apply theories and to understand underlying implications.
Currently, when people talk about simulations in sociology or economics, for example, they usually mean one of these techniques:
Agent-Based Modeling (ABM):
Uses autonomous agents that follow specified rules and interact with one another or with their environment. The interactions can lead to emergent phenomena at the system level.
System Dynamics:
Focuses on the feedback loops and time delays to model the behavior of complex systems over time. It uses stocks (accumulations) and flows (transitions) for representation.
Cellular Automata:
Spatially explicit models where grid cells evolve based on a set of rules and states of neighboring cells.
Monte Carlo Simulation:
Uses randomness to solve problems that might be deterministic. It repeatedly runs simulations using random inputs within predefined constraints and then analyzes the distribution of results.
If you apply ABM, for example, you need to define the behavior of each agent, and decisions are made by a statistical model that is defined by the same group that defined the inputs and outputs. As a result, simulations that involve human behavior still lack accuracy.
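To make this concrete, here is a minimal rule-based ABM sketch in Python. It is a toy threshold-adoption model and is not taken from Simulatrex; the decision rule, the threshold range, and the function name `run_threshold_model` are illustrative assumptions. Note that the modeler hard-codes the rule every agent follows.

```python
import random


def run_threshold_model(num_agents=100, initial_adopters=5, steps=20, seed=42):
    """Toy ABM: an agent adopts once the share of adopters reaches its threshold."""
    rng = random.Random(seed)
    # The modeler decides the rule: each agent has a fixed adoption threshold.
    thresholds = [rng.uniform(0.0, 0.5) for _ in range(num_agents)]
    adopted = [i < initial_adopters for i in range(num_agents)]

    history = [sum(adopted)]
    for _ in range(steps):
        share = sum(adopted) / num_agents
        # Synchronous update: every agent reacts to the same observed share.
        adopted = [a or share >= t for a, t in zip(adopted, thresholds)]
        history.append(sum(adopted))
    return history


history = run_threshold_model()
print(history)  # number of adopters after each step
```

Whatever emerges at the system level here is bounded by the rule that was written down in advance, which is the limitation described above.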
I claim that there is a pressing need for more accurate, real-world-like simulations that are better at mimicking human behavior.
In a recent paper, “Turning large language models into cognitive models” (Binz and Schulz, 2023) [1], the authors demonstrated that large language models can be turned into cognitive models by finetuning their final layer. This suggests that adapted LLMs can be used to mimic human behavior.
In a related line of work, Park et al. ran a simulation with agents that use gpt-3.5-turbo as their cognitive model, described in their well-known paper “Generative Agents: Interactive Simulacra of Human Behavior” [2]. The agents are equipped with different personality traits and act in a virtual environment (a virtual town). They can perceive, plan, act, and reflect on their actions, and they can hold dialogues with other agents.
After running the simulation with 25 agents for two days, the agents showed emergent behavior. Every memory, action, and conversation of each agent was recorded so that their decision-making could be evaluated and understood. The authors also had the agents interviewed and assessed by a panel of participants and found that the system produces realistic actions. They believe that generative agents can be used in many interactive tools, from design software to social platforms to virtual settings.
The bigger picture
Simulations are a way to model the real world and to explore what would happen if certain conditions held. In other words, they are a way to predict the future. They are sometimes called the third scientific method, alongside empirical experiments and analytical theory.
Predicting the future has been an intriguing endeavor for humanity for millennia. Countless academics tried to build robust world models or engines to predict stock market prices or historical events.
As it turns out, the results were unimpressive and far from real-world outcomes. When predictions in a large-scale social environment do turn out to be “true”, the outcome can often be explained by chance.
Accurately predicting the outcomes of large, complex social systems is an unsolved problem. If it became possible, it could accelerate our civilization in ways that are hard to imagine. It would allow us to use forecasts to turn events in our favor (whatever that means in a macrocosm). However, I must emphasize that it would also be a very dangerous "weapon" and should be used with the utmost caution.
We are already using micro-simulations in a wide range of applications, such as in science to validate theories or in economic research to examine policy implications for specific target groups. These micro-simulations enable rapid prototyping to verify hypotheses.
With Simulatrex we want to make rapid social simulations accessible to every professional out in the world. We aim for:
Accessibility
We allow everyone to run simulations, fast. That means the setup should be simple, intuitive, and explainable. As outlined above, social simulations of complex environments are currently hard to conduct, and it can take days to specify the starting conditions. By fine-tuning our own models, we want to sharply reduce the setup time.
Performance
By parallelizing processes, we aim for maximum speed so that simulations take hours instead of days or weeks.
Dynamic without limits
Our world is constantly changing, so modeling dynamic environments is a prerequisite for a social simulation framework. Simulatrex supports this with a novel event engine that releases an event at a scheduled time or when a trigger described in natural language fires.
These form the underlying paradigms of Simulatrex. Let's see how to run a trial simulation.
Conducting a social policy simulation
Let’s assume we want to simulate the following scenario:
Policy Impact on Local Community Dynamics
A community comprising residents from various backgrounds, local businesses, and government institutions. The community is marked by regular events, public meetings, and interactions among its members.
To get started, we need to define our objectives, spin up an environment, describe our agents, and define our events.
Simulatrex uses a JSON structure to describe our simulation config.
1. Defining our objectives
First of all, we need to define our objectives. These are the criteria we want to evaluate our simulation against.
Following our policy example here is a proposal:
Understand the immediate reactions of community stakeholders to the new EV policy.
Assess the long-term implications of the policy on local businesses.
For each objective, we can define a description (✅), a metric, and a target.
Regarding Objective 1:
Metric: Initial sentiment analysis post-announcement
Target: 70 % positive sentiment
Regarding Objective 2:
Metric: Business adaptability and feedback after 6 months
Target: 60 % businesses adapting positively
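As a rough illustration of how a metric and a target fit together, the sketch below checks a share of positive sentiment labels against the 70 % target of Objective 1. The `Objective` class, the `positive_share` helper, and the sample labels are all assumptions made for this example; they are not part of Simulatrex, and the labels are not simulation results.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    id: str
    description: str
    metric: str
    target_share: float  # e.g. 0.70 for "70 % positive sentiment"


def positive_share(labels):
    """Fraction of labels equal to 'positive' (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return sum(label == "positive" for label in labels) / len(labels)


objective = Objective(
    id="O1",
    description="Understand the immediate reactions of community stakeholders to the new EV policy.",
    metric="Initial sentiment analysis post-announcement",
    target_share=0.70,
)

# Made-up labels for illustration only, not simulation results.
labels = ["positive", "positive", "negative", "positive", "neutral"]
share = positive_share(labels)
target_met = share >= objective.target_share
print(f"{objective.id}: {share:.0%} positive, target met: {target_met}")
```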
2. Set up the environment
Next, we need to spin up the environment. This can look like:
- Type: Dynamic
- Context: Urban community with a diverse population
- Description: Community with residents of varied backgrounds, businesses, and government entities; marked by events and interactions.
- Entities: Residents, Local Businesses, Government Institutions
- Time Configuration:
  - Start_Time: "2023-06-01T00:00:00"
  - End_Time: "2023-12-31T23:59:59"
  - Multiplier: 1000
We differentiate between static (every condition is fixed) and dynamic environments.
We describe the context of the environment, give it a description, and introduce the entities that our agents belong to.
For simplicity, we use real timestamps for the start and end times and introduce a time factor. A time factor of 1000 means that one second in the simulation corresponds to one millisecond (1/1000 s) in the real world.
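The arithmetic behind the time factor can be checked with a few lines of Python. This is a back-of-the-envelope sketch using the timestamps and multiplier from the example above; it only converts simulated time to nominal wall-clock time and says nothing about how long the language model calls themselves take.

```python
from datetime import datetime

start = datetime.fromisoformat("2023-06-01T00:00:00")
end = datetime.fromisoformat("2023-12-31T23:59:59")
time_multiplier = 1000

# Length of the simulated period in seconds.
simulated_seconds = (end - start).total_seconds()
# Nominal real-world duration once the multiplier is applied.
real_seconds = simulated_seconds / time_multiplier

print(f"simulated span: {simulated_seconds / 86400:.1f} days")
print(f"nominal wall-clock time: {real_seconds / 3600:.2f} hours")
```

With these values, roughly 214 simulated days map to a little over five hours of nominal real time.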
3. Introduce our agents
Agents are the atomic units in our environment and are supposed to mimic human behavior by following certain initial conditions and described personalities.
Let’s look at an example agent:
Alice: Female, 35, Caucasian
- Type: LLM_AGENT
- Cognitive Model: gpt-4
- Language: English
- Persona: Local Resident
- Personality Description: Community organizer who collaborates with local government
- Traits: Engaged, Proactive, Community-focused
- Interests: Community events, Public welfare, Education
- Knowledge Base: Local bylaws, Community history, Public services
- Skills: Communication, Organization
- Behavior Patterns: Attends community events, Volunteers
- Past Experiences: Organized community fairs, Led resident meetings
- Societal Role: Community Organizer
- Affiliations: Local Residents' Association
- Current State: Community Organizer
- Core Memories: Organized community fairs, Led resident meetings
- Initial Conditions: Awareness at 0.9
- Group Affiliations: G1
Each agent gets its own name, gender, age, and ethnicity. Furthermore, you can define the overall personal identity as well as core and past memories. Each agent can have relations with other agents and can be part of a group.
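To give a feel for how such an identity could feed a language model, the sketch below renders a few of Alice's fields as a plain-text persona description. The function `describe_identity` and the wording of the text are hypothetical; this is not the prompt template Simulatrex uses.

```python
def describe_identity(identity):
    """Render a few identity fields as a plain-text persona description."""
    lines = [
        f"You are {identity['name']}, a {identity['age']}-year-old {identity['persona'].lower()}.",
        f"Personality: {identity['personality_description']}",
        f"Traits: {', '.join(identity['traits'])}",
        f"Core memories: {', '.join(identity['core_memories']) or 'none'}",
    ]
    return "\n".join(lines)


# Subset of the example agent's fields from the list above.
alice = {
    "name": "Alice",
    "age": 35,
    "persona": "Local Resident",
    "personality_description": "Community organizer who collaborates with local government",
    "traits": ["Engaged", "Proactive", "Community-focused"],
    "core_memories": ["Organized community fairs", "Led resident meetings"],
}

persona_text = describe_identity(alice)
print(persona_text)
```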
4. Define the event engine
The event engine is also a crucial part of our simulation framework. Events are described like:
Policy announcement from City Hall
- Type: Announcement
- Content: Announcement of a new policy for electric vehicle usage
- Impact: 0.8
- Scheduled Time: 2023-07-01 at 09:00
Each event consists of a type (which is flexible and simulation-specific), the content, an impact factor (between 0 and 1), and a scheduled time expressed in simulation time.
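The time-based part of such an event engine can be sketched with a priority queue. The `EventQueue` class below is an assumption made for illustration, not the Simulatrex implementation, and it covers only scheduled events, not triggers described in natural language. Event ids are assumed to be unique.

```python
import heapq
from datetime import datetime


class EventQueue:
    """Releases events once the simulated clock reaches their scheduled time."""

    def __init__(self, events):
        # Heap entries are (scheduled time, event id, event); the id breaks ties.
        self._heap = [
            (datetime.fromisoformat(e["scheduled_time"]), e["id"], e) for e in events
        ]
        heapq.heapify(self._heap)

    def release_due(self, now):
        """Pop and return all events scheduled at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


events = [
    {"id": "E2", "type": "public_feedback", "impact": 0.6,
     "scheduled_time": "2023-07-15T18:00:00"},
    {"id": "E1", "type": "announcement", "impact": 0.8,
     "scheduled_time": "2023-07-01T09:00:00"},
]

queue = EventQueue(events)
released = queue.release_due(datetime.fromisoformat("2023-07-02T00:00:00"))
print([e["id"] for e in released])  # only the announcement is due by July 2
```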
5. Run the simulation
After defining these entities along with the groups, the relationships between agents, and the evaluation (based on the objectives), we arrive at our config:
{
  "version": "0.1",
  "simulation": {
    "title": "Policy Impact on Local Community Dynamics",
    "environment": {
      "type": "DYNAMIC",
      "context": "Urban community with a diverse population",
      "description": "A community comprising residents from various backgrounds, local businesses, and government institutions. The community is marked by regular events, public meetings, and interactions among its members.",
      "entities": ["Residents", "Local Businesses", "Government Institutions"],
      "time_config": {
        "start_time": "2023-06-01T00:00:00",
        "end_time": "2023-12-31T23:59:59",
        "time_multiplier": 1000
      }
    },
    "agents": [
      {
        "id": "1",
        "type": "LLM_AGENT",
        "cognitive_model": "gpt-4",
        "identity": {
          "name": "Alice",
          "age": 35,
          "gender": "Female",
          "ethnicity": "Caucasian",
          "language": "English",
          "persona": "Local Resident",
          "personality_description": "Community organizer who frequently collaborates with local government.",
          "traits": ["Engaged", "Proactive", "Community-focused"],
          "interests": ["Community events", "Public welfare", "Education"],
          "knowledge_base": [
            "Local bylaws",
            "Community history",
            "Public services"
          ],
          "skills": ["Communication", "Organization"],
          "behavior_patterns": ["Attends community events", "Volunteers"],
          "past_experiences": [
            "Organized community fairs",
            "Led resident meetings"
          ],
          "societal_role": "Community Organizer",
          "affiliations": ["Local Residents' Association"],
          "current_state": "Community Organizer",
          "core_memories": [
            "Organized community fairs",
            "Led resident meetings"
          ]
        },
        "initial_conditions": { "awareness": 0.9 },
        "relationships": [],
        "group_affiliations": ["G1"]
      },
      {
        "id": "2",
        "type": "LLM_AGENT",
        "cognitive_model": "gpt-4",
        "identity": {
          "name": "Downtown Deli",
          "age": 10,
          "gender": "Not Applicable",
          "ethnicity": "Not Applicable",
          "language": "English",
          "persona": "Local Business",
          "personality_description": "A popular eatery that actively engages with community initiatives.",
          "traits": ["Customer-focused", "Community-minded"],
          "interests": ["Local events", "Sustainable business"],
          "knowledge_base": [
            "Local clientele",
            "Business regulations",
            "Supply chains"
          ],
          "skills": ["Catering", "Event hosting"],
          "behavior_patterns": [],
          "past_experiences": [
            "Hosted community events",
            "Supported local causes"
          ],
          "societal_role": "Local Business Owner",
          "affiliations": ["Local Business Association"],
          "current_state": "Local Business Owner",
          "core_memories": []
        },
        "initial_conditions": { "awareness": 0.7 },
        "relationships": [],
        "group_affiliations": ["G2"]
      },
      ...
    ],
    "groups": [
      {
        "id": "G1",
        "type": "residents_association",
        "member_agent_ids": ["1"],
        "metadata": {
          "name": "Local Residents' Association",
          "description": "A group focused on resident welfare and community initiatives."
        }
      },
      {
        "id": "G2",
        "type": "business_association",
        "member_agent_ids": ["2"],
        "metadata": {
          "name": "Local Business Association",
          "description": "An association representing the interests of local businesses."
        }
      },
      {
        "id": "G3",
        "type": "local_government",
        "member_agent_ids": ["3"],
        "metadata": {
          "name": "City Hall",
          "description": "The administrative and decision-making body of the city."
        }
      }
    ],
    "events": [
      {
        "id": "E1",
        "type": "announcement",
        "source": "City Hall",
        "content": "Announcement of a new policy aimed at encouraging electric vehicle usage within the city.",
        "impact": 0.8,
        "scheduled_time": "2023-07-01T09:00:00"
      },
      {
        "id": "E2",
        "type": "public_feedback",
        "source": "Local Residents' Association",
        "content": "A public meeting is organized to discuss the potential effects of the new policy on residents.",
        "impact": 0.6,
        "scheduled_time": "2023-07-15T18:00:00"
      }
    ],
    "evaluation": {
      "objectives": [
        {
          "id": "O1",
          "description": "Understand the immediate reactions of community stakeholders to the new policy.",
          "metric": "Initial sentiment analysis post-announcement",
          "target": "70% positive sentiment"
        },
        {
          "id": "O2",
          "description": "Assess the long-term implications of the policy on local businesses.",
          "metric": "Business adaptability and feedback after 6 months",
          "target": "60% businesses adapting positively"
        }
      ],
      "metrics": [
        "Overall sentiment",
        "Policy adaptability",
        "Infrastructure and support needs"
      ]
    }
  }
}
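Because agents and groups refer to each other by id, a quick consistency check before running can save a failed run. The sketch below is a hypothetical helper, not part of Simulatrex; it uses a trimmed-down stand-in for the config with only the keys the check needs. In the listing above further agents are elided with "...", so agent "3" referenced by group G3 is not shown, and the stand-in reproduces that situation.

```python
def find_unknown_group_members(simulation):
    """Return (group id, agent id) pairs that reference undefined agents."""
    known_ids = {agent["id"] for agent in simulation["agents"]}
    return [
        (group["id"], agent_id)
        for group in simulation["groups"]
        for agent_id in group["member_agent_ids"]
        if agent_id not in known_ids
    ]


# Trimmed-down stand-in for the "simulation" object of the config.
simulation = {
    "agents": [{"id": "1"}, {"id": "2"}],
    "groups": [
        {"id": "G1", "member_agent_ids": ["1"]},
        {"id": "G2", "member_agent_ids": ["2"]},
        {"id": "G3", "member_agent_ids": ["3"]},
    ],
}

missing = find_unknown_group_members(simulation)
print(missing)  # pairs of (group id, unknown agent id)
```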
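We only need two lines of Python to run the simulation. Because run() is awaited, they have to execute inside an async function or in an environment with top-level await, such as a notebook: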
engine = SimulationEngine("config.json")
await engine.run()
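In a plain Python script, the awaited call needs an event loop. One way to provide it is `asyncio.run`, sketched below. The `SimulationEngine` class here is a stand-in defined only so the snippet runs on its own; in practice you would import the real class from Simulatrex, and what its `run()` returns is not specified in this article.

```python
import asyncio


class SimulationEngine:
    """Stand-in for illustration only; the real class comes from Simulatrex."""

    def __init__(self, config_path):
        self.config_path = config_path

    async def run(self):
        await asyncio.sleep(0)  # placeholder for the actual simulation loop
        return f"finished {self.config_path}"


async def main():
    engine = SimulationEngine("config.json")
    return await engine.run()


# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```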
6. Evaluation of the simulation results
To evaluate the results of our social simulation, we have the following outputs:
- A complete log of every single step of the simulation
- A complete log of all the agents' outputs based on our prompt templates
- The option to interview our agents in a chat-like Q&A
- A text-based evaluation against the initial objectives, in our case the initial sentiment analysis and the business adaptability rate
In the future, we aim to provide comparative analyses between our simulations and studies conducted with human participants, or potentially engage in joint research initiatives.
What else is Simulatrex capable of?
Simulatrex is capable of running every single social simulation that follows the previous pattern.
To spark your imagination, here are a few more examples:
Consumer Prices Simulation:
Objective: Predict consumer reactions to a new product.
Setup: Create a conversational simulation where the LLM emulates potential consumers. Marketers can present the new product's features, price, and marketing strategy.
Possible outcome: The LLM provides feedback, concerns, and potential buying intentions, simulating a focus group's response without convening an actual group.
Brand Crisis Management Simulation:
Objective: Prepare for potential PR crises by modeling various scenarios.
Setup: Use the story-based scenario approach, asking the LLM to generate narratives about possible brand crises based on current market situations.
Possible outcome: Extract strategies and insights on handling different crisis scenarios, giving PR teams a rehearsal space for damage control.
Consumer Trend Forecasting:
Objective: Anticipate upcoming consumer trends based on current data.
Setup: Feed the LLM with current market research, trend reports, and consumer behavior data. Use the dynamic Q&A simulation to ask the model about potential trends in the next 1-5 years.
Possible outcome: A set of predicted trends, their potential impacts, and strategies businesses can adopt to capitalize on them.
You can find a selection of curated examples ready to run here.
Current Challenges
Defining optimal prompts
Prompts play a pivotal role in the simulation because they drive the outputs of the cognitive models. Although they are continually refined, there is room for improvement, particularly through systematic comparison.
Non-determinism of Large Language models
Language models like GPT-4 expose a parameter called 'temperature' that controls how random their generated outputs are. Setting the temperature to 0 makes the outputs largely deterministic, but it also removes the variability that a dynamic simulation explicitly needs. How to better control randomness remains an open question.
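The effect of temperature can be illustrated with the standard temperature-scaled softmax, which divides the model's raw scores (logits) by the temperature before normalizing them into probabilities. The sketch below is generic and not Simulatrex code; the logits are made up, and how a given provider handles a temperature of exactly 0 is up to that provider.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is the greedy (argmax) limit")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches 0, almost all probability mass moves to the highest-scoring token, which is why low temperatures trade variability for repeatability.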
Inter-agent communication
Agents can converse 1:1 with each other. Further work needs to be done to allow group conversations.
Improved evaluation methods
The current evaluation methods are a work in progress, and we hope to find a more illustrative evaluation technique.
Next steps
We hope that Simulatrex will help to accelerate social simulations. Our vision is to create a way for organizations to run rapid simulations en masse for whatever use case they have in mind, whether testing a new product on their target audience in silico or mitigating risks by conducting crisis simulations.
We will continue addressing the challenges and aim for a more conversational approach to launching simulations. Ultimately, we want to make simulations accessible and enjoyable for all professionals.
Let’s shape the future of social simulations together.
Thank you for reading.
Try out Simulatrex here: https://github.com/simulatrex/simulatrex.
A special thanks to:
Yaroslav Shipilov and his article on Alien CoT: eliciting LLM use of non-human reasoning
Johannes Hagemann for helping to shape this idea
Kevin Liu and his article about LLM simulations
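[1] Binz and Schulz, “Turning large language models into cognitive models”, 2023. https://arxiv.org/pdf/2306.03917.pdf
[2] Park et al., “Generative Agents: Interactive Simulacra of Human Behavior”, 2023. https://arxiv.org/pdf/2304.03442.pdf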