
AI Lens - An Agentic System for Automated Editorial Workflows

Radu Boncea

AI Lens is an AI-driven platform that monitors the latest advancements in science and technology and generates daily articles from the most relevant updates. The project’s primary goal is to show that AI agent workflows can fully automate the editorial process, from source discovery to publication. This research note describes the components of the AI Lens system, their interdependencies, and the workflow that ties them together end to end.

You can check our demonstrator here.

Core Components of AI Lens

1. Source discovery and monitoring

The first step in the AI Lens workflow is discovering and monitoring reliable sources of information. A dedicated AI agent handles this step with the following functions:

  • Web searching and classification: The agent searches the web for relevant science and technology outlets, employing classification techniques to evaluate the credibility and relevance of potential sources.
  • Follow decision: Based on the classification, the agent decides whether to follow the source. Criteria include the availability of structured updates (e.g., RSS or Atom feeds), how well the articles align with topics decided by humans, and whether they address the frontier in technology and science.
  • Dynamic monitoring: The system continuously monitors the approved sources, ensuring a real-time feed of the latest developments in science and technology.
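
The follow decision described above can be sketched as a simple gate. This is a minimal illustration, not the project's implementation: the `SourceProfile` fields, the 0 to 1 scores, and the threshold values are all assumptions made for the example, and in AI Lens the scores would come from the classifying agent rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical summary of what the classifier learned about a source."""
    url: str
    has_feed: bool          # an RSS or Atom feed was found
    topic_alignment: float  # 0..1, match with the human-chosen topics
    frontier_score: float   # 0..1, how often the source covers frontier work


def should_follow(source: SourceProfile,
                  min_alignment: float = 0.6,
                  min_frontier: float = 0.5) -> bool:
    """Follow only sources that expose a feed and clear both thresholds."""
    if not source.has_feed:
        return False
    return (source.topic_alignment >= min_alignment
            and source.frontier_score >= min_frontier)


# Made-up candidates, for illustration only.
candidates = [
    SourceProfile("https://example.org/science", True, 0.82, 0.71),
    SourceProfile("https://example.org/lifestyle", True, 0.20, 0.10),
    SourceProfile("https://example.org/no-feed", False, 0.90, 0.90),
]
followed = [s.url for s in candidates if should_follow(s)]
print(followed)
```

Treating feed availability as a hard requirement and the two scores as thresholds is one possible reading of the criteria; a weighted score would work equally well.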

2. Scraping and indexing

Once sources are identified, the next step is scraping information from these monitored outlets. This component involves:

  • Content scraping: Extracting updates as they appear on the monitored sources.
  • Embedding and vector storage: Scraped content is converted into vector embeddings and stored in ChromaDB, which serves as the vector store for efficient retrieval and organization.
  • Classification and summarization: Each article is classified into a relevant category and summarized by a specialized AI agent.
  • Topic bucketing: Articles are grouped into “buckets” based on specific topics, ensuring coherent organization of related content.
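
To make topic bucketing concrete, here is a minimal sketch that assigns a summary to the closest bucket by cosine similarity. It is deliberately self-contained: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `BUCKETS` dictionary replaces the vector store, and the similarity threshold is an assumption.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Two bucket descriptions, shortened for the example.
BUCKETS = {
    "Quantum Computing": "quantum algorithms quantum entanglement quantum error correction",
    "Renewable Energy": "solar power wind energy energy storage smart grids",
}


def assign_bucket(summary: str, min_similarity: float = 0.1) -> Optional[str]:
    """Return the best-matching bucket name, or None if nothing is close."""
    vec = embed(summary)
    scores = {name: cosine(vec, embed(desc)) for name, desc in BUCKETS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_similarity else None


print(assign_bucket("new quantum error correction scheme"))
print(assign_bucket("recipe for sourdough bread"))
```

Returning `None` for summaries that match no bucket leaves room for a separate step that collects unmatched articles.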

3. Bucket monitoring

An essential feature of the system is monitoring the status of the topic buckets. The AI agent performs continuous qualitative evaluations to determine when a bucket is ready for further exploration:

  • Qualitative assessment: The agent assesses whether the bucket’s content is sufficient to create a compelling article. Metrics include the presence of novel advancements and whether the information reflects the research or technological frontier.
  • Bucket readiness decision: When a bucket is deemed ready, it is flagged for the next step: the research and editorial process.

Bucket Name | Bucket Description
Space Exploration | Mars missions, exoplanets, space telescopes, astrophysics, cosmology, black holes, dark matter, dark energy, star formation, galaxy formation, gravitational waves, the Big Bang theory, and quantum mechanics in space.
Artificial Intelligence | Machine learning, deep learning, neural networks, natural language processing, computer vision, robotics, self-driving cars, and AI ethics.
Quantum Computing | Quantum supremacy, quantum algorithms, quantum entanglement, quantum teleportation, quantum cryptography, and quantum error correction.
Biotechnology | CRISPR, gene editing, gene therapy, stem cells, cloning, synthetic biology, and biotech startups.
Cybersecurity | Ethical hacking, ransomware, malware, zero-day vulnerabilities, encryption, network security, IoT security, and cybersecurity policies.
Renewable Energy | Solar power, wind energy, hydropower, geothermal energy, bioenergy, energy storage, and smart grids.
Healthcare Innovations | Telemedicine, wearable health tech, digital health platforms, AI in diagnostics, personalized medicine, and health data privacy.
Blockchain and Cryptocurrencies | Decentralized finance (DeFi), smart contracts, NFT marketplaces, blockchain scalability, and cryptocurrency regulation.
Robotics and Automation | Industrial robots, service robots, autonomous systems, robotic process automation, human-robot interaction, and swarm robotics.
Augmented and Virtual Reality | Mixed reality applications, AR in retail, VR gaming, virtual collaboration tools, and immersive experiences in training.
Internet of Things (IoT) | Smart homes, smart cities, industrial IoT, IoT security challenges, edge computing, and sensor networks.

Examples of buckets or topics monitored in our demonstrator. The system can propose new buckets if it identifies a sufficient number of articles that do not align with existing topics.
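
The readiness check for a bucket can be sketched as follows. This is an illustration under stated assumptions: the minimum article count is a cheap pre-filter added for the example, and `keyword_judge` is a placeholder for the agent's qualitative assessment, which in the real system would be an LLM call rather than a keyword match.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Bucket:
    name: str
    summaries: List[str] = field(default_factory=list)
    ready: bool = False


def update_readiness(bucket: Bucket,
                     judge: Callable[[str, List[str]], bool],
                     min_articles: int = 3) -> bool:
    """Flag the bucket as ready once it has enough material and the judge agrees.

    The size check is an assumed pre-filter; the judge stands in for the
    agent's qualitative assessment.
    """
    if len(bucket.summaries) < min_articles:
        bucket.ready = False
    else:
        bucket.ready = judge(bucket.name, bucket.summaries)
    return bucket.ready


def keyword_judge(name: str, summaries: List[str]) -> bool:
    """Placeholder judge: looks for novelty cues instead of calling an LLM."""
    cues = ("first", "new", "breakthrough")
    hits = sum(any(cue in s.lower() for cue in cues) for s in summaries)
    return hits >= 2


# Made-up summaries, for illustration only.
bucket = Bucket("Quantum Computing", [
    "First demonstration of a logical qubit below threshold",
    "New decoder speeds up error correction",
    "Survey of existing qubit platforms",
])
print(update_readiness(bucket, keyword_judge))
```

Passing the judge in as a callable keeps the readiness logic testable without a model in the loop.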

4. Research and editorial process

The editorial process represents the system’s core intellectual contribution. Here, the AI agent performs several advanced tasks:

  • Article planning: The agent creates a detailed plan for writing an article, adhering to the bucket’s topic and scope.
  • Additional research: The agent searches the internet for further references and contextual information.
  • Draft writing: Using the bucket content and additional research, the agent writes a coherent draft.
  • Review and refinement: The draft undergoes an automated review process to ensure accuracy, readability, and relevance.
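
The four editorial steps can be chained as in the sketch below. It assumes two injected callables, `llm` (prompt in, text out) and `search` (query in, snippets out), both hypothetical; the prompt wording, the "APPROVE" convention, and the revision limit are also assumptions made for the example. The stub functions exist only so the sketch runs without a model.

```python
from typing import Callable, List

LLM = Callable[[str], str]           # hypothetical: prompt in, text out
Search = Callable[[str], List[str]]  # hypothetical: query in, snippets out


def write_article(topic: str, summaries: List[str], llm: LLM, search: Search,
                  max_revisions: int = 2) -> str:
    """Plan, research, draft, then review until approved or out of revisions."""
    material = "\n".join(summaries)
    plan = llm(f"PLAN an article on {topic} using:\n{material}")
    references = search(topic)
    draft = llm("DRAFT following this plan:\n" + plan
                + "\nReferences:\n" + "\n".join(references))
    for _ in range(max_revisions):
        verdict = llm(f"REVIEW for accuracy, readability, relevance:\n{draft}")
        if verdict.strip().upper().startswith("APPROVE"):
            break
        draft = llm(f"REVISE using this feedback:\n{verdict}\nDraft:\n{draft}")
    return draft


def fake_llm(prompt: str) -> str:
    """Stub model: approves only the second version of the draft."""
    if prompt.startswith("PLAN"):
        return "1. Intro 2. Findings"
    if prompt.startswith("DRAFT"):
        return "Draft v1"
    if prompt.startswith("REVIEW"):
        return "APPROVE" if "v2" in prompt else "Tighten the intro"
    return "Draft v2"  # REVISE prompts


def fake_search(query: str) -> List[str]:
    """Stub search returning a single placeholder snippet."""
    return [f"Background on {query}"]


article = write_article("Quantum Computing", ["Summary A", "Summary B"],
                        fake_llm, fake_search)
print(article)
```

Bounding the review loop prevents the agent from revising indefinitely when the reviewer never approves.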

5. Publication decision

The final component of the workflow is deciding when and how to publish the article. This involves:

  • Category assignment: The agent determines the appropriate category for the article based on its content.
  • Tagging: Relevant tags are assigned to optimize discoverability.
  • Publication timing: A strategic decision is made on the timing of the article’s release to maximize its impact and relevance.
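
As a rough illustration of the publication decision, the sketch below assembles a payload with a category, normalized tags, and a scheduled time. The field names (`title`, `body`, `category`, `tags`, `scheduledAt`) and the fixed daily 08:00 UTC slot are assumptions for the example, not the project's schema or timing strategy. The `{"data": ...}` envelope follows the request shape used by Strapi's REST API in v4 and later.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def next_publish_slot(now: datetime, hour_utc: int = 8) -> datetime:
    """Return the next occurrence of hour_utc:00 UTC strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    slot = now.astimezone(timezone.utc).replace(
        hour=hour_utc, minute=0, second=0, microsecond=0)
    if slot <= now:
        slot += timedelta(days=1)
    return slot


def build_payload(title: str, body: str, category: str,
                  tags: List[str], now: datetime) -> Dict:
    """Wrap the article fields in a {"data": ...} envelope."""
    return {"data": {
        "title": title,
        "body": body,
        "category": category,
        "tags": sorted({t.lower() for t in tags}),  # de-duplicate, normalize case
        "scheduledAt": next_publish_slot(now).isoformat(),
    }}


payload = build_payload(
    "Placeholder title", "Placeholder body", "Artificial Intelligence",
    ["AI", "ai", "Robotics"],
    now=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```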

Technologies used

1. Document indexing

  • ChromaDB: Utilized as the vector store for embedding and organizing content efficiently.
  • Embedding model: mxbai-embed-large-v1, accessed via the Ollama interface, ensures high-quality vector embeddings for effective content retrieval.

2. Scraping and data management

  • Scraping scripts: Implemented in Python for robust and scalable data extraction.
  • Relational database: PostgreSQL is used for managing structured data, ensuring reliability and scalability.
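
A scraping script for an RSS source might look like the sketch below, which uses only the Python standard library. The sample feed is made up, and the `seen_links` set is a stand-in for a lookup against the relational database; neither reflects the project's actual scripts or schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Made-up RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Science News</title>
  <item>
    <title>New exoplanet found</title>
    <link>https://example.org/exoplanet</link>
    <pubDate>Mon, 06 Jan 2025 10:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Solid-state battery update</title>
    <link>https://example.org/battery</link>
    <pubDate>Tue, 07 Jan 2025 09:00:00 GMT</pubDate>
  </item>
</channel></rss>"""


def parse_rss(xml_text: str, seen_links: Set[str]) -> List[Dict[str, str]]:
    """Return items whose link has not been seen yet, and record them as seen."""
    root = ET.fromstring(xml_text)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link or link in seen_links:
            continue  # skip malformed items and already-scraped articles
        seen_links.add(link)
        fresh.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return fresh


seen = {"https://example.org/battery"}  # pretend this one is already stored
for entry in parse_rss(SAMPLE_FEED, seen):
    print(entry["title"], entry["link"])
```

Deduplicating on the link keeps repeated polling of the same feed from producing duplicate records.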

3. AI agents and orchestration

  • Framework: ELL.SO, a lightweight prompt engineering library, is employed for implementing the AI agents and managing their orchestration.
  • GPT-4o: Used for drafting and writing articles, ensuring a high-quality editorial process.
  • Llama3.2 7B: Powers summarization, classification, and bucket monitoring, enabling efficient content organization and readiness assessment. This is a local model served by Ollama.

4. Content repository

  • Strapi: The articles generated by the system are pushed to a Strapi repository for efficient content management and publication.

Future directions

To further enhance the AI Lens system, potential areas of improvement and expansion include:

  • Enhanced Source Discovery: Currently, the system scrapes content primarily from RSS feeds. A potential improvement is to monitor relevant social media accounts, such as Twitter and LinkedIn, to identify and analyze posts related to advancements in science and technology. This approach would involve deploying AI agents to extract and classify relevant updates, identify influential accounts, and conduct targeted research based on high-value posts, thereby extending the scope of information sources.

  • Integration with Open-Access Scientific Archives: At present, the AI agents conduct only limited internet searches, within a closed loop, during their research process. To improve the reliability and accuracy of the retrieved information, the agents could instead query open-access scientific archives such as arXiv. By focusing on peer-reviewed or preprint scientific articles, this strategy would give the agents more accurate and credible sources, ensuring high-quality inputs for the editorial process.

  • Adaptive Learning: Continuous refinement of the agent’s qualitative assessment capabilities based on feedback loops.

  • Broader Applicability: Extending the workflow to other domains like healthcare, finance, or legal reporting.
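
As an illustration of the proposed arXiv integration, the sketch below builds a query URL for the public arXiv API and parses the Atom response. It is not part of the current system. The sample response is made up (including its placeholder identifier), and the sketch performs no network request; fetching the URL is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def build_query_url(terms: List[str], max_results: int = 5) -> str:
    """Build an arXiv API URL matching all terms, newest submissions first."""
    params = {
        "search_query": " AND ".join(f"all:{t}" for t in terms),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return ARXIV_API + "?" + urlencode(params)


def parse_entries(atom_xml: str) -> List[Dict[str, str]]:
    """Extract id and whitespace-normalized title from each Atom entry."""
    root = ET.fromstring(atom_xml)
    return [{
        "id": entry.findtext(ATOM + "id", default="").strip(),
        "title": " ".join(entry.findtext(ATOM + "title", default="").split()),
    } for entry in root.iter(ATOM + "entry")]


# Made-up response with a placeholder identifier, for illustration only.
SAMPLE_RESPONSE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A made-up
      placeholder title</title>
  </entry>
</feed>"""

print(build_query_url(["quantum", "error"], max_results=3))
print(parse_entries(SAMPLE_RESPONSE))
```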