The Collection phase is the information-gathering component of the intelligence cycle, where organizations systematically acquire the raw data needed to address their intelligence requirements. Collection draws on a variety of sources, including open-source intelligence, human intelligence, signals intelligence, and geospatial data. Without a well-designed collection strategy, even the most sophisticated analysis capabilities are limited by insufficient or low-quality inputs, which can lead to flawed conclusions and misguided decisions. The effectiveness of this phase also depends on prioritizing and filtering relevant information from the overwhelming volume of available data, so that analysts can focus on what supports decision-making and operational success. Done well, collection improves the overall quality of intelligence and strengthens an organization’s ability to adapt and respond to evolving threats and opportunities.
The Collection Phase
The Collection phase implements the intelligence requirements established during Planning and Direction, transforming abstract information needs into concrete data-gathering activities. As the second phase of the intelligence cycle, Collection serves as the bridge between what organizations need to know and the raw materials that can provide those insights.
Effective collection balances multiple factors:
Breadth: Ensuring coverage across relevant threat types, actors, and methodologies to avoid blind spots.
Depth: Gathering sufficient detail to enable meaningful analysis rather than surface-level observations.
Quality: Prioritizing reliable, accurate, and timely information over quantity alone.
Efficiency: Optimizing resource utilization by focusing collection efforts on high-value sources.
The collection strategy should be directly informed by intelligence requirements, with a clear line of sight between each requirement and the sources designated to fulfill it. This alignment ensures that collection activities remain purposeful and resource-efficient.
Collection Categories
Cyber threat intelligence collection typically falls into three main categories, each with distinct characteristics, strengths, and limitations.
Technical Collection
Technical collection involves gathering machine-readable threat data that can be directly ingested by security systems or analyzed for patterns and indicators of compromise. These sources provide the technical observables that form the foundation of tactical intelligence.
Commercial Intelligence Feeds: Subscription-based services providing curated collections of indicators of compromise (IoCs), including malicious IP addresses, domains, file hashes, and network signatures. These feeds often include contextual information about related threat actors or campaigns.
Open-Source Feeds: Community-maintained collections of indicators, often focused on specific threat types or industries. Examples include AlienVault OTX, the Malware Information Sharing Platform (MISP), and various security researcher repositories.
Internal Security Systems: Data generated by an organization’s own security infrastructure, including SIEM logs, EDR telemetry, network traffic captures, and firewall events. These sources are particularly valuable as they provide direct visibility into the organization’s security posture.
Honeypots and Sensors: Purposely deployed decoy systems designed to attract and monitor adversary activity, providing early warning of emerging tactics or targeted campaigns.
Malware Repositories: Collections of malicious code samples that can be analyzed to understand functionality, attribution, and mitigation strategies. These may be internal collections or shared through information exchange communities.
Open-Source Intelligence (OSINT)
OSINT leverages publicly available information to build a broader understanding of the threat landscape, adversary motivations, and emerging trends. While technical collection focuses on indicators, OSINT provides essential context and strategic insights.
Security Research Publications: Vendor reports, academic research papers, and conference presentations providing in-depth analysis of threat actors, campaigns, or vulnerabilities. These resources often offer valuable insights into adversary tradecraft and attribution.
Social Media: Monitoring relevant security communities, threat actor communications, vulnerability discussions, and exploit sharing across platforms like Twitter, LinkedIn, or specialized forums. These sources can provide early warning of emerging threats.
Forums and Underground Communities: Observing hacker discussions, exploit marketplaces, and dark web forums to track adversary capabilities, motivations, and targeting interests. This requires careful access management and operational security.
Code Repositories: Analyzing public repositories like GitHub for emerging exploit development, offensive security tools, or malicious code. This allows tracking of technical innovations before they appear in active campaigns.
News and Media: Following breach reports, industry developments, and geopolitical events that might influence the cyber threat landscape or trigger new campaigns.
Human Intelligence (HUMINT)
HUMINT focuses on insights from people rather than technical systems or publications. While traditional intelligence operations define HUMINT more narrowly, in cyber threat intelligence it encompasses a range of human knowledge-sharing mechanisms.
Information Sharing Communities: Industry-specific Information Sharing and Analysis Centers (ISACs), professional networks, and trust groups where practitioners exchange threat information and defensive strategies.
Vendor Briefings: Threat updates and emerging trend discussions provided by security vendors, often including information not yet publicly disclosed.
Government Advisories: National CERT bulletins, law enforcement alerts, and regulatory guidance providing information about significant threats, especially nation-state activities.
Internal Expertise: Knowledge from an organization’s security teams, business units, and partners about observed threats, vulnerabilities, or suspicious activities.
Partner Organizations: Shared experiences and threat observations from supply chain partners, industry peers, or affiliated organizations facing similar threats.
Collection Management
Effective collection requires systematic management across the entire process, from source identification to data ingestion.
Source Evaluation
Each collection source should be evaluated based on several key factors:
Reliability: The source’s historical accuracy and consistency in providing correct information.
Relevance: Alignment between the source’s focus and the organization’s intelligence requirements.
Timeliness: How quickly the source provides information about new threats or developments.
Uniqueness: Whether the source provides information not available elsewhere.
Accessibility: The technical, financial, and operational feasibility of accessing the source.
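One way to make these five factors comparable across sources is a weighted score. The sketch below is illustrative only: the 1 to 5 rating scale, the weights, and the source names are assumptions, not values prescribed by any standard.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); each organization would choose its own.
WEIGHTS = {
    "reliability": 0.30,
    "relevance": 0.30,
    "timeliness": 0.15,
    "uniqueness": 0.15,
    "accessibility": 0.10,
}


@dataclass
class SourceRating:
    name: str
    reliability: int    # each factor is rated from 1 (poor) to 5 (excellent)
    relevance: int
    timeliness: int
    uniqueness: int
    accessibility: int


def weighted_score(rating: SourceRating) -> float:
    """Return a weighted average on the same 1-5 scale as the inputs."""
    total = sum(getattr(rating, factor) * weight for factor, weight in WEIGHTS.items())
    return round(total, 2)


# Illustrative ratings for two made-up sources.
ratings = [
    SourceRating("open-feed-b", 3, 3, 5, 2, 5),
    SourceRating("commercial-feed-a", 4, 5, 4, 3, 2),
]

# Highest-scoring source first.
ranked = sorted(ratings, key=weighted_score, reverse=True)
```

A score like this is a starting point for discussion rather than a verdict; the ratings themselves still depend on analyst judgment.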
Coverage Mapping
Organizations should maintain a mapping between intelligence requirements and collection sources to identify potential gaps:
Requirement Coverage Assessment: Periodically evaluate whether each intelligence requirement has sufficient collection sources aligned to it.
Source Diversity Analysis: Ensure critical requirements don’t rely on a single source type or provider.
Gap Identification: Proactively identify areas where collection capabilities may be inadequate.
Redundancy Planning: Establish alternative sources for critical intelligence areas to avoid single points of failure.
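The mapping described above can be kept as simple structured data and checked automatically. The following is a minimal sketch; the requirement IDs, source records, and the thresholds of two sources and two source types are all hypothetical.

```python
from collections import defaultdict

# Hypothetical requirement IDs and source records, for illustration only.
requirements = ["PIR-1", "PIR-2", "PIR-3", "PIR-4"]
sources = [
    {"name": "commercial-feed-a", "type": "technical", "supports": ["PIR-1", "PIR-2"]},
    {"name": "sector-isac", "type": "humint", "supports": ["PIR-1"]},
    {"name": "open-feed-b", "type": "technical", "supports": ["PIR-2"]},
    {"name": "vendor-blog-scraper", "type": "osint", "supports": ["PIR-4"]},
]


def coverage_report(requirements, sources, min_sources=2, min_types=2):
    """Classify each requirement by how well collection sources cover it."""
    by_requirement = defaultdict(list)
    for source in sources:
        for req in source["supports"]:
            by_requirement[req].append(source)

    report = {}
    for req in requirements:
        mapped = by_requirement.get(req, [])
        source_types = {s["type"] for s in mapped}
        if not mapped:
            report[req] = "gap"                       # nothing collects against it
        elif len(mapped) < min_sources:
            report[req] = "single point of failure"   # no redundancy
        elif len(source_types) < min_types:
            report[req] = "low diversity"             # redundant, but one source type
        else:
            report[req] = "covered"
    return report


report = coverage_report(requirements, sources)
```

With the sample data, PIR-1 is covered, PIR-2 has two sources of the same type, PIR-3 has no source at all, and PIR-4 depends on a single source.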
Collection Prioritization
With limited resources, organizations must prioritize collection efforts:
Alignment with PIRs: Sources that support Priority Intelligence Requirements receive higher priority.
Threat Severity: Collection related to high-impact threats receives greater attention.
Source Productivity: Sources with a history of providing valuable intelligence receive continued investment.
Resource Efficiency: Sources requiring lower effort relative to their intelligence value are prioritized.
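These four criteria can be combined into a sort order. The sketch below assumes one possible policy: PIR alignment dominates, threat severity breaks ties, and intelligence value per analyst hour breaks any remaining ties. The field names and figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    supports_pir: bool               # aligned with a Priority Intelligence Requirement
    threat_severity: int             # 1 (low) to 5 (critical), an assumed scale
    actionable_last_quarter: int     # items of intelligence that led to an action
    analyst_hours_per_month: float   # effort needed to operate the source


def priority_key(candidate: Candidate):
    """Build a sort key: PIR alignment, then severity, then value per hour."""
    value_per_hour = candidate.actionable_last_quarter / max(
        candidate.analyst_hours_per_month, 1.0
    )
    return (candidate.supports_pir, candidate.threat_severity, value_per_hour)


candidates = [
    Candidate("sector-isac", True, 4, 12, 6.0),
    Candidate("generic-blocklist", False, 5, 90, 2.0),
    Candidate("commercial-feed-a", True, 4, 40, 10.0),
    Candidate("vendor-blog-scraper", True, 2, 5, 1.0),
]

# Highest priority first.
ordered = [c.name for c in sorted(candidates, key=priority_key, reverse=True)]
```

Because the key is compared element by element, a highly productive source that supports no PIR still ranks below every PIR-aligned source. Organizations that prefer a softer trade-off could replace the tuple with a weighted sum.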
Quality and Source Evaluation
Maintaining collection quality requires systematic evaluation of sources and their outputs.
Source Authentication
Organizations should implement mechanisms to validate source credibility:
Provider Verification: Confirming the identity and reputation of intelligence providers.
Technical Verification: Using technical measures, such as checking TLS certificates or digital signatures on feed content, to validate source authenticity.
Cross-Source Validation: Comparing information across multiple sources to identify inconsistencies.
Historical Performance: Tracking the accuracy of information from each source over time.
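Cross-source validation can begin with a simple corroboration count: which indicators are reported by more than one source, and which rest on a single report. The sketch below assumes observations arrive as (source, indicator) pairs; the source names and indicator values are placeholders.

```python
from collections import defaultdict


def corroboration(observations):
    """Map each indicator to the set of sources that reported it."""
    seen = defaultdict(set)
    for source, indicator in observations:
        # Lowercase and trim so that trivially different spellings match.
        seen[indicator.strip().lower()].add(source)
    return dict(seen)


observations = [
    ("commercial-feed-a", "198.51.100.7"),
    ("open-feed-b", "198.51.100.7"),
    ("open-feed-b", "Bad.Example.com"),
    ("sector-isac", "bad.example.com"),
    ("commercial-feed-a", "203.0.113.9"),
]

by_indicator = corroboration(observations)

# Indicators that only one source has reported deserve extra scrutiny.
single_source = sorted(i for i, srcs in by_indicator.items() if len(srcs) == 1)
```

Agreement between sources is not proof of accuracy, since feeds often republish one another, so this check works best alongside historical performance tracking.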
Quality Control Measures
Several practices help maintain collection quality:
False Positive Tracking: Monitoring erroneous indicators or intelligence from each source.
Update Frequency Assessment: Evaluating how regularly sources refresh their information.
Coverage Completeness: Assessing whether sources provide comprehensive information or only partial data.
Contextual Richness: Evaluating whether sources provide sufficient context beyond raw indicators.
Collection Planning
Effective collection requires deliberate planning rather than ad-hoc information gathering.
Collection Requirements Definition
Each intelligence requirement should be translated into specific collection needs:
Data Types: Defining exactly what information types are needed (e.g., IP addresses, malware samples, actor profiles).
Source Selection: Identifying which sources can provide the required information types.
Collection Parameters: Establishing timeframes, volume expectations, and priority levels.
Technical Specifications: Defining format requirements and integration considerations.
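A collection requirement translated this way can be captured as a small structured record. The following sketch shows one possible shape; every field name, default value, and the allowed priority levels are assumptions made for illustration.

```python
from dataclasses import dataclass

ALLOWED_PRIORITIES = {"low", "medium", "high"}  # hypothetical priority levels


@dataclass
class CollectionRequirement:
    requirement_id: str               # the intelligence requirement this serves
    data_types: list[str]             # e.g. IP addresses, malware samples
    sources: list[str]                # sources selected to provide the data
    priority: str = "medium"          # collection parameter: priority level
    max_age_hours: int = 24           # collection parameter: acceptable data age
    expected_daily_volume: int = 0    # collection parameter: volume expectation
    output_format: str = "stix-2.1"   # technical specification: format

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record is usable."""
        problems = []
        if not self.data_types:
            problems.append("no data types defined")
        if not self.sources:
            problems.append("no sources selected")
        if self.priority not in ALLOWED_PRIORITIES:
            problems.append(f"unknown priority: {self.priority}")
        if self.max_age_hours <= 0:
            problems.append("max_age_hours must be positive")
        return problems


ransomware_req = CollectionRequirement(
    requirement_id="PIR-1",
    data_types=["ipv4-addr", "file-hash"],
    sources=["commercial-feed-a", "sector-isac"],
    priority="high",
)
```

Keeping requirements in a form like this makes it straightforward to check that none is missing sources or data types before collection begins.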
Collection Workflow Design
Organizations should establish clear workflows for all collection activities:
Ingestion Processes: How data will be retrieved from each source and brought into intelligence systems.
Initial Validation: Preliminary checks to ensure basic data quality and relevance.
Enrichment Requirements: Additional context needed to make raw data more valuable.
Storage and Retention: Where collected data will be stored and for how long.
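The four workflow stages can be sketched as small functions chained together. This example assumes a plain-text feed of IP addresses, an in-memory dictionary standing in for a real datastore, and a 90-day retention period; all of these are illustrative choices.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # hypothetical retention period
INTERNAL_NETS = [
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def ingest(raw_lines, source):
    """Ingestion: turn raw feed lines into records, skipping blanks and comments."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            records.append({"value": line, "source": source})
    return records


def validate(records):
    """Initial validation: keep well-formed IP addresses outside internal ranges."""
    kept = []
    for record in records:
        try:
            ip = ipaddress.ip_address(record["value"])
        except ValueError:
            continue  # malformed entry
        if any(ip in net for net in INTERNAL_NETS):
            continue  # internal address, not useful as an external indicator
        kept.append(record)
    return kept


def enrich(records, collected_at):
    """Enrichment: add IP version, collection time, and an expiry time."""
    return [
        {
            **record,
            "ip_version": ipaddress.ip_address(record["value"]).version,
            "collected_at": collected_at,
            "expires_at": collected_at + RETENTION,
        }
        for record in records
    ]


def store(records, database):
    """Storage: keep the latest record per indicator value."""
    for record in records:
        database[record["value"]] = record
    return database


def purge_expired(database, now):
    """Retention: return only the records that have not yet expired."""
    return {k: v for k, v in database.items() if v["expires_at"] > now}


raw = ["# example feed", "198.51.100.7", "10.0.0.5", "not-an-ip", "", "2001:db8::1"]
collected_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
db = store(enrich(validate(ingest(raw, "open-feed-b")), collected_at), {})
```

Passing the collection time in as a parameter, rather than reading the clock inside `enrich`, keeps the pipeline easy to test and replay.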
Collection Challenges
Organizations frequently encounter obstacles in the Collection phase.
Information Overload
The volume of available threat data can easily overwhelm collection and analysis capabilities:
Indicator Deluge: Commercial feeds may provide millions of indicators with limited context.
Alert Fatigue: Security systems generate more alerts than teams can effectively process.
Signal vs. Noise: Distinguishing meaningful threats from background activity becomes increasingly difficult.
Processing Backlogs: Collection can outpace processing and analysis capabilities.
Access Limitations
Not all valuable intelligence sources are readily accessible:
Closed Communities: Some information sharing groups require sponsorship or specific credentials.
Technical Barriers: Some data formats or feeds may require specialized tools or knowledge.
Financial Constraints: Commercial intelligence services can be expensive, especially for smaller organizations.
Legal and Policy Restrictions: Certain collection activities may be constrained by legal considerations or organizational policies.
Quality Concerns
The reliability and accuracy of collected information vary significantly:
False Positives: Incorrect indicators can waste investigative resources or disrupt legitimate activity.
Outdated Information: Threat intelligence rapidly loses value if not refreshed regularly.
Contextual Gaps: Raw indicators without context provide limited actionable value.
Source Bias: Intelligence providers may have commercial or operational biases that affect their reporting.
Best Practices
These practices help organizations build effective collection capabilities.
Diversify Collection Sources
Relying on a single source type creates significant blind spots:
Multi-Source Strategy: Implement a mix of technical feeds, OSINT, and human intelligence sources.
Cross-Domain Coverage: Ensure collection spans tactical, operational, and strategic intelligence needs.
Vendor Diversity: Avoid over-dependence on a single commercial intelligence provider.
Internal-External Balance: Complement external intelligence with internal threat data.
Automate Routine Collection
Automation frees analyst time for more complex tasks:
Scheduled Retrieval: Implement automated polling of intelligence feeds and sources.
Format Standardization: Automatically normalize different data formats into a consistent structure.
Initial Filtering: Apply basic relevance filters to reduce noise before human review.
Integration with Security Infrastructure: Automatically deliver collected indicators to security controls such as SIEM, EDR, and firewall systems.
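Format standardization and initial filtering are the easiest of these steps to illustrate. The sketch below normalizes two made-up feed layouts, one CSV and one JSON, into a single record structure, merges duplicates, and filters by indicator type. The column names, JSON keys, and type labels are assumptions and do not reflect any real provider's format.

```python
import csv
import io
import json


def normalize(ioc_type, value, source):
    """Produce the common record shape used by every parser below."""
    return {"type": ioc_type.strip().lower(), "value": value.strip().lower(), "source": source}


def from_csv(text, source):
    """Parse a hypothetical CSV feed with 'indicator' and 'type' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [normalize(row["type"], row["indicator"], source) for row in reader]


def from_json(text, source):
    """Parse a hypothetical JSON feed shaped like {"items": [{"ioc": ..., "kind": ...}]}."""
    return [normalize(item["kind"], item["ioc"], source) for item in json.loads(text)["items"]]


def dedupe(records):
    """Merge records that share (type, value), keeping every reporting source."""
    merged = {}
    for record in records:
        key = (record["type"], record["value"])
        entry = merged.setdefault(
            key, {"type": record["type"], "value": record["value"], "sources": set()}
        )
        entry["sources"].add(record["source"])
    return list(merged.values())


def relevant(records, wanted_types):
    """Initial filtering: keep only the indicator types the requirements call for."""
    return [r for r in records if r["type"] in wanted_types]


csv_feed = "indicator,type\nBad.Example.com,domain\n198.51.100.7,ipv4\n"
json_feed = json.dumps(
    {"items": [
        {"ioc": "bad.example.com", "kind": "Domain"},
        {"ioc": "phish@example.org", "kind": "email"},
    ]}
)

combined = dedupe(from_csv(csv_feed, "feed-csv") + from_json(json_feed, "feed-json"))
filtered = relevant(combined, {"domain", "ipv4"})
```

Scheduled retrieval would wrap parsers like these in a job that runs at a fixed interval; that part is omitted here because it depends on each provider's API.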
Regularly Audit Collection Sources
Collection sources should be periodically evaluated:
Effectiveness Reviews: Assess how frequently each source provides actionable intelligence.
False Positive Analysis: Track inaccurate information by source to identify quality issues.
Requirements Alignment Check: Verify that sources continue to address current intelligence requirements.
Cost-Benefit Analysis: Evaluate the return on investment for commercial intelligence services.
Tools and Technologies
Several technologies support effective collection activities.
Threat Feed Aggregators
Platforms that centralize and normalize multiple intelligence feeds:
TAXII/STIX Clients: Tools that leverage standardized threat intelligence formats and transport protocols.
Commercial TIP Platforms: Threat Intelligence Platforms that integrate multiple commercial and open-source feeds.
Custom Aggregation Tools: Organization-specific solutions designed to consolidate internal and external intelligence.
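To give a sense of what a STIX client consumes, the sketch below reads a small STIX 2.1 bundle with the standard library and pulls values out of indicator patterns. It is deliberately limited: the regular expression only understands patterns made of a single equality comparison, and anything more complex is skipped. A production tool would rely on a dedicated STIX parsing library. The bundle contents and identifiers are made up.

```python
import json
import re

# Matches only simple patterns such as [ipv4-addr:value = '198.51.100.7'].
SIMPLE_PATTERN = re.compile(r"\[\s*([\w-]+:[\w.'-]+)\s*=\s*'([^']+)'\s*\]")


def extract_indicators(bundle_text):
    """Return object path and value for each simple STIX indicator in a bundle."""
    bundle = json.loads(bundle_text)
    results = []
    for obj in bundle.get("objects", []):
        if obj.get("type") != "indicator" or obj.get("pattern_type") != "stix":
            continue  # not an indicator, or not written in the STIX pattern language
        match = SIMPLE_PATTERN.fullmatch(obj["pattern"].strip())
        if match:
            results.append({"id": obj["id"], "path": match.group(1), "value": match.group(2)})
    return results


# A made-up bundle with placeholder identifiers.
example_bundle = json.dumps({
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000000",
    "objects": [
        {
            "type": "indicator",
            "spec_version": "2.1",
            "id": "indicator--11111111-1111-4111-8111-111111111111",
            "created": "2024-01-01T00:00:00.000Z",
            "modified": "2024-01-01T00:00:00.000Z",
            "pattern": "[ipv4-addr:value = '198.51.100.7']",
            "pattern_type": "stix",
            "valid_from": "2024-01-01T00:00:00Z",
        },
        {
            "type": "indicator",
            "spec_version": "2.1",
            "id": "indicator--22222222-2222-4222-8222-222222222222",
            "created": "2024-01-01T00:00:00.000Z",
            "modified": "2024-01-01T00:00:00.000Z",
            "pattern": "[domain-name:value = 'bad.example.com'] AND [ipv4-addr:value = '203.0.113.9']",
            "pattern_type": "stix",
            "valid_from": "2024-01-01T00:00:00Z",
        },
        {
            "type": "malware",
            "spec_version": "2.1",
            "id": "malware--33333333-3333-4333-8333-333333333333",
            "created": "2024-01-01T00:00:00.000Z",
            "modified": "2024-01-01T00:00:00.000Z",
            "name": "example-family",
            "is_family": True,
        },
    ],
})

indicators = extract_indicators(example_bundle)
```

Only the first object is returned: the second uses a compound pattern that the simplified expression does not handle, and the third is not an indicator.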
OSINT Platforms
Tools designed to systematically gather and organize open-source intelligence:
Social Media Monitoring: Platforms that track security-relevant conversations across social networks.
Web Scrapers: Automated tools that extract information from security blogs, news sites, and research publications.
Dark Web Monitoring: Specialized services that access and monitor underground forums and marketplaces.
Collection Automation Tools
Solutions that reduce manual effort in collection activities:
API Integrators: Tools that programmatically connect to intelligence provider APIs.
Security Orchestration Platforms: Systems that automate intelligence collection workflows.
Data Transformation Tools: Utilities that convert between different indicator formats and structures.
Measuring Collection Effectiveness
Organizations should establish metrics to evaluate their collection capabilities.
Source Diversity
Metrics that assess the breadth of collection sources:
Source Type Distribution: Percentage of intelligence derived from different collection categories.
Provider Concentration: Degree of dependence on individual intelligence providers.
Coverage Across Requirements: How evenly collection sources map to intelligence requirements.
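The first two metrics can be computed directly from collected records. In the sketch below, provider concentration is measured with a Herfindahl-style index (the sum of squared provider shares). That choice is an assumption borrowed from market-concentration analysis, not something this series prescribes, and the record fields are hypothetical.

```python
from collections import Counter


def type_distribution(records):
    """Percentage of collected records per collection category."""
    counts = Counter(r["source_type"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


def provider_concentration(records):
    """Sum of squared provider shares: 1.0 means a single provider supplies everything."""
    counts = Counter(r["provider"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return round(sum((n / total) ** 2 for n in counts.values()), 3)


# Ten illustrative records: six technical, three OSINT, one HUMINT.
records = (
    [{"source_type": "technical", "provider": "provider-a"}] * 6
    + [{"source_type": "osint", "provider": "provider-b"}] * 3
    + [{"source_type": "humint", "provider": "provider-c"}] * 1
)

distribution = type_distribution(records)
concentration = provider_concentration(records)
```

For the sample data the distribution is 60% technical, 30% OSINT, and 10% HUMINT, with a concentration of 0.46. Values closer to 1.0 indicate heavier dependence on a single provider.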
Collection Quality
Measures of the reliability and usefulness of collected intelligence:
False Positive Rate: Percentage of collected indicators that prove incorrect.
Unique Intelligence Ratio: Proportion of collected intelligence not available from other sources.
Actionability Score: Assessment of how directly collected intelligence can inform security actions.
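The first two quality measures reduce to simple ratios. The sketch below assumes that analysts record how many reviewed indicators turned out to be false positives, and that indicators from each source are available as sets. The actionability score is left out because it depends on analyst judgment.

```python
def false_positive_rate(false_positives, reviewed):
    """Percentage of reviewed indicators that proved incorrect."""
    return 100 * false_positives / reviewed if reviewed else 0.0


def unique_intelligence_ratio(source_indicators, other_indicators):
    """Share of a source's indicators that no other source provides (0.0 to 1.0)."""
    source_set = set(source_indicators)
    if not source_set:
        return 0.0
    return len(source_set - set(other_indicators)) / len(source_set)


# Illustrative values only.
feed_a = {"198.51.100.7", "203.0.113.9", "bad.example.com", "evil.example.net"}
all_other_feeds = {"198.51.100.7", "bad.example.com", "other.example.org"}

fp_rate = false_positive_rate(false_positives=12, reviewed=400)
uniqueness = unique_intelligence_ratio(feed_a, all_other_feeds)
```

Here 12 false positives out of 400 reviewed indicators gives a rate of 3%, and two of the feed's four indicators appear nowhere else, giving a uniqueness ratio of 0.5.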
Collection Efficiency
Metrics focused on resource optimization:
Collection Cost Per Actionable Indicator: Financial efficiency of intelligence gathering.
Time from Emergence to Collection: How quickly new threats are incorporated into collection.
Automation Percentage: Proportion of collection activities performed without manual intervention.
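These efficiency metrics are also straightforward to compute once the inputs are tracked. The sketch below assumes that "emergence" can be approximated by a known timestamp such as the first public report, which is often the hardest input to obtain. The figures are invented for illustration.

```python
from datetime import datetime
from statistics import median


def cost_per_actionable(total_cost, actionable_count):
    """Spend divided by actionable indicators; None when nothing was actionable."""
    return total_cost / actionable_count if actionable_count else None


def median_collection_lag_hours(events):
    """Median hours between a threat's emergence and its collection."""
    lags = [(collected - emerged).total_seconds() / 3600 for emerged, collected in events]
    return median(lags)


def automation_percentage(tasks):
    """Percentage of collection tasks that run without manual intervention."""
    if not tasks:
        return 0.0
    automated = sum(1 for task in tasks if task["automated"])
    return 100 * automated / len(tasks)


# Each pair is (emerged, collected); the timestamps are illustrative.
events = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 6, 0)),    # 6 hours
    (datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 3, 0, 0)),    # 24 hours
    (datetime(2024, 1, 5, 12, 0), datetime(2024, 1, 5, 14, 0)),  # 2 hours
]
tasks = [
    {"name": "poll commercial feed", "automated": True},
    {"name": "scrape security blogs", "automated": True},
    {"name": "monitor code repositories", "automated": True},
    {"name": "review ISAC mailing list", "automated": False},
]

unit_cost = cost_per_actionable(total_cost=12000, actionable_count=300)
lag_hours = median_collection_lag_hours(events)
automated_share = automation_percentage(tasks)
```

The median is used for collection lag so that a single very late collection does not dominate the figure; an organization more concerned with worst cases might track a high percentile instead.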
Case Study: Collection in Action
Consider a financial services organization implementing a comprehensive collection strategy:
- Requirements Mapping: The organization begins by mapping its highest-priority intelligence requirements to appropriate collection sources, identifying critical gaps in visibility of emerging ransomware tactics.
- Source Diversification: To address the gaps, the security team implements a multi-layered approach:
  - Subscribes to a financial sector-specific threat feed focusing on ransomware campaigns
  - Joins the financial services ISAC to gain peer insights and early warnings
  - Deploys honeypots that mimic their internal banking applications
  - Establishes an internal collection process for suspicious emails and employee reports
- Automation Implementation: The team automates routine collection tasks:
  - Builds API connections to ingest commercial threat feeds directly into their TIP
  - Creates scheduled scraping of security blogs covering ransomware trends
  - Implements automated monitoring of relevant GitHub repositories for exploit development
- Quality Control Process: To maintain collection quality, they establish:
  - Weekly false positive reviews for each intelligence source
  - Monthly value assessments comparing collection costs against security incidents prevented
  - Quarterly requirement reviews to ensure continued alignment with business priorities
- Continuous Refinement: Based on operational feedback, the collection strategy evolves:
  - Decommissioning low-value sources that consistently provide duplicative information
  - Expanding collection in areas where intelligence has prevented actual attacks
  - Adjusting collection frequency based on the volatility of different threat categories
Conclusion
The Collection phase transforms intelligence requirements into concrete data-gathering activities, providing the raw materials that fuel the entire intelligence cycle. Effective collection balances breadth, depth, quality, and efficiency while remaining tightly aligned with organizational priorities.
By implementing a diverse, well-managed collection strategy, organizations can ensure they have visibility into relevant threats without drowning in irrelevant data. The key to success lies not in collecting everything possible, but in collecting the right information, from the right sources, at the right time.
In the next article, we’ll explore the Processing phase, examining how organizations transform the raw data gathered during Collection into a format suitable for meaningful analysis.