Skip to content

Memory Systems

Experimental Preview

This feature is currently in experimental preview. The implementation is scaffolded and undergoing active development. APIs and behaviors may change.

Overview

Crystalyse implements sophisticated memory systems that enable agents to maintain context, learn from interactions, and build upon previous discoveries. The memory architecture is designed specifically for materials design research workflows.

Memory Architecture

Hierarchical Memory Structure

┌────────────────────────────────────────┐
│           User Memory                  │
│    (Preferences, History, Projects)    │
├────────────────────────────────────────┤
│          Session Memory                │
│    (Current Context, Discoveries)      │
├────────────────────────────────────────┤
│         Discovery Memory               │
│    (Important Findings, Insights)      │
├────────────────────────────────────────┤
│          Working Memory                │
│      (Immediate Context, Cache)        │
└────────────────────────────────────────┘

Memory Types

1. Working Memory

Short-term memory for immediate context: - Current material composition being analysed - Recent tool outputs (SMACT, Chemeleon, MACE) - Temporary calculations and energy values - Active materials design hypotheses

Characteristics: - High-speed access - Limited capacity - Cleared after each session - Optimised for performance

2. Session Memory

Medium-term memory for ongoing conversations: - Conversation history - Analysis progression - User queries and responses - Contextual relationships

Characteristics: - Persists during session - Enables contextual understanding - Supports follow-up questions - Tracks analysis flow

3. Discovery Memory

Long-term storage for important findings: - Significant materials discoveries - Validated crystal structures - Stable material compositions - Structure-property relationships

Characteristics: - Permanent storage - Cross-session accessibility - Searchable and indexed - Quality-filtered content

4. User Memory

Personalised memory for each user: - Analysis preferences - Project history - Custom configurations - Frequently analysed materials

Characteristics: - User-specific storage - Privacy-protected - Enables personalisation - Tracks usage patterns

Memory Implementation

Storage Backends

Crystalyse supports multiple storage options:

# In-memory storage (default)
memory = InMemoryStorage()

# Redis for distributed systems
memory = RedisMemoryStorage(
    host="localhost",
    port=6379,
    db=0
)

# PostgreSQL for persistent storage
memory = PostgreSQLMemoryStorage(
    connection_string="postgresql://..."
)

# File-based for simple deployments
memory = FileMemoryStorage(
    base_path="~/.crystalyse/memory"
)

Memory Manager

The central memory management system:

from crystalyse.memory import MemoryManager

manager = MemoryManager(
    working_memory=InMemoryStorage(),
    session_memory=RedisMemoryStorage(),
    discovery_memory=PostgreSQLMemoryStorage(),
    user_memory=FileMemoryStorage()
)

Memory Operations

Storing Information

# Store in working memory
manager.working.store(
    key="current_material",
    value={
        "smiles": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
        "name": "Ibuprofen",
        "properties": {...}
    },
    ttl=300  # 5 minutes
)

# Store discovery
manager.discoveries.store(
    discovery={
        "type": "sar_relationship",
        "description": "Higher formation energy stability in perovskite structure",
        "materials": ["CaTiO3", "BaTiO3"],
        "confidence": 0.85
    }
)

Retrieving Information

# Get from working memory
current = manager.working.get("current_material")

# Search discoveries
discoveries = manager.discoveries.search(
    query="anti-inflammatory",
    filters={"confidence": {"$gte": 0.8}}
)

# Get session context
context = manager.session.get_context(
    session_id="session_123",
    last_n_messages=10
)

Memory Queries

Advanced querying capabilities:

# Semantic search in discoveries
results = manager.discoveries.semantic_search(
    "materials with high ionic conductivity",
    top_k=5
)

# Find similar analyses
similar = manager.user.find_similar_analyses(
    material="LiFePO4",
    user_id="user_123"
)

Context Management

Building Context

The memory system builds context intelligently:

context = manager.build_context(
    current_query="What about its metabolites?",
    session_id="session_123"
)

# Context includes:
# - Current material (from working memory)
# - Recent conversation (from session memory)
# - Relevant discoveries (from discovery memory)
# - User preferences (from user memory)

Context Windows

Manage context size for optimal performance:

# Configure context window
manager.configure_context(
    max_tokens=4000,
    prioritisation="recency",  # or "relevance"
    include_discoveries=True
)

# Prune old context
manager.session.prune_context(
    session_id="session_123",
    keep_last_n=20
)

Discovery System

Automatic Discovery Detection

The system automatically identifies important findings:

# Configure discovery detection
manager.configure_discovery_detection(
    min_confidence=0.7,
    categories=[
        "structure_property_relationship",
        "novel_material",
        "unexpected_formation_energy",
        "safety_concern"
    ]
)

Discovery Validation

Discoveries are validated before storage:

class DiscoveryValidator:
    def validate(self, discovery):
        # Check scientific validity
        if not self.is_chemically_valid(discovery):
            return False

        # Check novelty
        if self.exists_in_literature(discovery):
            discovery.novelty = "known"

        # Assign confidence score
        discovery.confidence = self.calculate_confidence(discovery)

        return discovery.confidence > threshold

Memory Optimisation

Caching Strategies

Efficient caching for performance:

# Configure caching
manager.configure_cache(
    strategy="lru",  # Least Recently Used
    max_size=1000,
    ttl=3600  # 1 hour
)

# Cache materials calculations
@manager.cache(key_prefix="mat_props")
def calculate_properties(smiles):
    # Expensive calculation
    return properties

Memory Compression

Reduce storage requirements:

# Enable compression
manager.enable_compression(
    algorithm="zstd",
    level=3,
    min_size=1024  # Only compress entries > 1KB
)

Indexing

Optimise search performance:

# Create indices
manager.discoveries.create_index("material_formula")
manager.discoveries.create_index("discovery_type")
manager.discoveries.create_text_index("description")

Privacy and Security

Data Isolation

User data is strictly isolated:

# Each user has isolated memory space
user_manager = manager.for_user("user_123")

# No cross-user data access
user_manager.discoveries.search(...)  # Only user's discoveries

Encryption

Sensitive data encryption:

# Enable encryption at rest
manager.enable_encryption(
    key=encryption_key,
    algorithm="AES-256-GCM"
)

Data Retention

Configurable retention policies:

# Set retention policies
manager.set_retention_policy(
    working_memory={"hours": 1},
    session_memory={"days": 7},
    discovery_memory={"days": 365},
    user_memory={"days": 730}
)

Integration with Agents

Automatic Memory Management

Agents automatically manage memory:

agent = CrystaLyseAgent(memory_manager=manager)

# Agent automatically:
# - Stores queries in session memory
# - Detects and stores discoveries
# - Builds context from all memory types
# - Manages working memory lifecycle

Custom Memory Handlers

Extend memory behaviour:

class CustomMemoryHandler:
    def on_discovery(self, discovery):
        # Custom processing
        if discovery.type == "battery_cathode":
            notify_research_team(discovery)

    def on_session_end(self, session):
        # Generate session summary
        summary = generate_summary(session)
        store_summary(summary)

agent.register_memory_handler(CustomMemoryHandler())

Best Practices

1. Memory Hygiene

  • Clear working memory between unrelated tasks
  • Prune session memory periodically
  • Validate discoveries before storage
  • Archive old user data

2. Performance Optimisation

  • Use appropriate storage backends
  • Enable caching for repeated queries
  • Index frequently searched fields
  • Monitor memory usage

3. Data Management

# Regular maintenance
manager.maintenance.run_cleanup()
manager.maintenance.optimise_indices()
manager.maintenance.validate_integrity()

4. Backup and Recovery

# Backup critical data
manager.backup(
    types=["discoveries", "user"],
    destination="s3://backups/crystalyse/"
)

# Restore from backup
manager.restore(
    source="s3://backups/crystalyse/20240115/",
    types=["discoveries"]
)

Monitoring and Analytics

Memory Metrics

Track memory system performance:

metrics = manager.get_metrics()
print(f"Total memories: {metrics.total_count}")
print(f"Storage used: {metrics.storage_gb} GB")
print(f"Query latency: {metrics.avg_latency_ms} ms")
print(f"Cache hit rate: {metrics.cache_hit_rate}%")

Usage Analytics

Understand memory patterns:

analytics = manager.get_analytics()
# Most accessed materials
# Common discovery types
# Peak usage times
# User engagement metrics

Next Steps