Open-source intelligence (OSINT) self-audit for researchers

What the internet knows about you
Being publicly visible online is part of academic life. Drawing on open-source intelligence (OSINT) methods, this guide helps you discover what information about you is publicly available and take practical steps to reduce unwanted exposure. Finding a lot of information about yourself is normal; the goal is not to disappear, but to be deliberate about what’s visible.
Note: External tools mentioned in this guide are provided as examples only and have not been assessed or endorsed by the university. For tools managed by U of T (such as 1Password and M365 Copilot Chat), links point to official university resources.
If you only do three things
Secure your accounts
- About 15 minutes.
- This is your foundation; if your email is compromised, other safeguards are far less effective.
Search for yourself online
- About 30 minutes.
- You can’t protect what you don’t know is visible.
Remove personal data from people-finder sites and prune social media
- 1–2 hours over a few sessions.
- This action usually reduces the most risk.
The remaining steps build on these foundations. Consider working through them over a few weeks.
Why this matters to researchers
Your digital presence (publications, profiles, lab websites, conference listings) supports the reach and impact of your work. That same visibility can expose you to risks such as harassment, impersonation, phishing, or misuse of your identity. Researchers routinely assess risk in fieldwork and ethics review; the same deliberate approach applies to your online presence.
The Canadian Centre for Cyber Security (CCCS) recommends that individuals understand and regularly monitor their digital footprint (ITSAP.00.133).
Why this matters?
If your primary accounts (especially email) are compromised, everything else becomes harder to contain. An attacker with access to your email can reset passwords, intercept MFA codes, and impersonate you to colleagues and funders. Cleaning up public traces has limited value if your email is accessible to someone else.
What can I do?
- Use a password manager (like the university-managed tool or a personal one) to identify and fix weak or reused passwords.
- Enable multi-factor authentication (MFA), also known as two-step sign-in/verification, on personal accounts (email, social media) and keep your authenticator device (e.g., your phone) secure. Several options are available for managing your second factor:
- U of T accounts are protected through UTORMFA (Duo). Since Duo Mobile is already on your phone, you can use it as a time-based one-time password (TOTP) authenticator for your personal accounts as well.
- 1Password can generate and autofill TOTP codes, keeping your passwords and second factors in one secure vault.
- For stronger protection, consider a hardware security key (e.g., YubiKey), which can serve as a passkey or second factor on supported services. 1Password also supports passkey storage. The CCCS provides a helpful overview of MFA and why it matters (ITSAP.30.030).
- Store recovery codes securely. Keep them in your password manager’s “notes” section or printed and kept in a locked location. If your recovery codes are downloaded as a PDF, you can attach the file directly to the relevant login item in 1Password, then delete the original from your device. This keeps the codes encrypted within your vault and associated with the account they belong to. Remember, these codes are the only way back into your account if you lose your authenticator device.
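To show why the authenticator secret and recovery codes deserve the same care as a password, here is a minimal sketch of how a TOTP code is derived, following the public RFC 6238 algorithm with HMAC-SHA1. The function name totp and its parameters are illustrative assumptions, not part of Duo Mobile or 1Password. Anyone who holds the shared secret can compute the same codes.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at_time=None, step=30, digits=6):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) from a Base32 secret.

    Illustrative only: real authenticator apps also handle secure storage
    of the secret and clock drift.
    """
    if at_time is None:
        at_time = time.time()
    # Authenticator secrets are usually exchanged as Base32 text.
    cleaned = secret_b32.upper().replace(" ", "")
    cleaned += "=" * (-len(cleaned) % 8)  # restore any stripped padding
    key = base64.b32decode(cleaned)
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(at_time) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)
```

The code depends only on the secret and the current time, which is why the same secret can be enrolled in more than one authenticator and why a leaked secret defeats the second factor.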
Why this matters?
You can’t protect what you can’t see. A quick scan reveals where your identity shows up, sometimes without your knowledge. This is the “discovery” phase.
What can I do?
- Search for yourself on various search engines (e.g., Google, DuckDuckGo, Yandex, Bing). Try your name, name + “U of T,” and your name in quotes (e.g., “First Last”).
- Reporters Without Borders provides a helpful guide to discovering how much of your private and professional life is findable through search engines.
- Search for your usernames. If you use a consistent handle across platforms (e.g., FLast-research), a username enumeration tool (such as WhatsMyName or Sherlock) can reveal where else that handle is registered, sometimes on platforms you’ve forgotten about.
- Check data breaches. Use a service like Have I Been Pwned to see if your university or personal email addresses have been exposed in known data breaches.
- Your password manager (like 1Password Watchtower) likely has this built in.
- Reverse image search your professional headshot, profile photos, and lab images to see where they appear online (e.g., Google Lens, TinEye).
- Use a protected AI tool, such as a large language model (LLM). If you have access to a university-approved, private LLM instance (e.g., M365 Copilot Chat, ChatGPT Edu), use it to synthesize your online footprint.
- Ask it:
- “Summarize the public-facing professional activities and collaborations of ‘Dr. First Last’ at ‘U of T’.”
- “I am conducting a security audit. Using the public text from my lab website [paste text or give URL] as a case study, what types of information here would be most valuable for an attacker building an OSINT profile? Please categorize the risks (e.g., ‘Personnel Names,’ ‘Project Details,’ ‘Mentioned Software’) and explain why each is a risk.”
- Note: Commercial LLMs have safety guardrails that may block security-related prompts, even when directed at yourself. Reframing your prompt (e.g., as a “personal security audit”) is unlikely to reliably bypass these filters, as most models are designed to resist persona-based workarounds. If you encounter this, try a more indirect approach: paste publicly available text (e.g., from your lab website or bio) and ask the LLM to categorise the types of information present, without framing it as a security exercise. Alternatively, use the LLM for synthesis rather than threat analysis: ask it to summarise everything publicly known about a research group, then review the output yourself for sensitive details.
- Consider what’s discoverable beyond text. Public recordings of lectures or conference talks, media appearances, and podcast episodes can be used to clone your voice or generate synthetic video. If you find recordings you didn’t consent to, document them (Step 3) and request removal.
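The name searches described above can be prepared in one go. The sketch below builds the suggested query variants and turns each into a search URL you can open in a browser. The function names are hypothetical, and the URL patterns are assumptions based on how these search engines commonly accept queries; nothing here sends a request.

```python
from urllib.parse import quote_plus


def build_queries(first, last, affiliations):
    """Return the query variants suggested above: plain, quoted, quoted + affiliation."""
    full = f"{first} {last}"
    queries = [full, f'"{full}"']
    for affiliation in affiliations:
        queries.append(f'"{full}" "{affiliation}"')
    return queries


def search_urls(query):
    """Build search URLs for one query (assumed URL patterns, no network access)."""
    q = quote_plus(query)
    return {
        "Google": f"https://www.google.com/search?q={q}",
        "Bing": f"https://www.bing.com/search?q={q}",
        "DuckDuckGo": f"https://duckduckgo.com/?q={q}",
    }


if __name__ == "__main__":
    for query in build_queries("First", "Last", ["U of T", "University of Toronto"]):
        for engine, url in search_urls(query).items():
            print(f"{engine}: {url}")
```

Keeping the list of queries in a small script also makes it easy to re-run exactly the same searches at your next review.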
Why this matters?
If you find something hostile, misleading, or simply information that shouldn’t be public, you’ll need a record of it. Online content can change or disappear without warning, and having evidence is essential for reporting or requesting removal.
What can I do?
- Create an “evidence” folder in a secure location.
- Take full-page screenshots. Don’t just clip a small section. Be sure to capture the full URL, the date, and the time in your screenshot.
- Save the page. Use your browser’s “Save Page As…” (or “Print to PDF”) feature to capture a local copy of the webpage.
- Archive the page. For public pages, use the Wayback Machine’s “Save Page Now” feature to create a timestamped, third-party snapshot. This creates independent evidence that you do not control, which strengthens any formal complaint or report.
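If you collect more than a few items, a simple manifest helps keep the evidence folder organized. The sketch below is one possible approach, not a forensic standard: for each saved screenshot or PDF it records the source URL, a UTC timestamp, and a SHA-256 hash so you can later show the file has not changed. The file names and the manifest format (one JSON object per line) are assumptions chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_evidence(manifest, url, saved_file, note=""):
    """Append one record describing a saved capture to the manifest file."""
    saved_file = Path(saved_file)
    entry = {
        "url": url,
        "file": saved_file.name,
        "sha256": sha256_of(saved_file),
        "logged_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "note": note,
    }
    with Path(manifest).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

For example, log_evidence("evidence/manifest.jsonl", "https://example.org/post", "evidence/post.pdf", "full-page PDF") would add one line to the manifest. The timestamp records when the entry was logged, so log each capture right after saving it.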
Why this matters?
Now that you’ve mapped what’s visible, you can start to remove or reduce it. Think of this as shrinking the amount of personal data available to anyone looking, whether a data broker, a harasser, or someone attempting to impersonate you.
What can I do?
- Prune social media.
- Set personal accounts (Facebook, Instagram) to “Private.”
- On public/professional accounts (X/Twitter, Bluesky, LinkedIn), remove your personal phone number, full birthdate, and any location tags you don’t need. On LinkedIn specifically, review your privacy settings: by default, your profile is visible to logged-out users, your network connections are visible to other connections, and LinkedIn broadcasts activity updates (such as profile changes or new connections) to your network.
- Review and untag yourself from photos, especially on friends’ and colleagues’ public accounts. Consider asking them to remove photos that show your location, lab space, or daily routine.
- Contact data brokers. “People-finder” sites automatically scrape public records, social media, and web pages to build profiles containing your address, phone number, relatives’ names, and approximate age. This is a commercial industry: your data is their product. Under Canadian federal privacy law (PIPEDA), you have the right to know what personal information an organisation holds about you and to request correction of inaccurate information. To have your data removed, use each broker’s formal opt-out process.
- Big Ass Data Broker Opt-Out List (BADBOOL) provides a prioritised list of data brokers with opt-out instructions for each.
- Consumer Reports has several articles on how to delete your information from people-search sites and on the benefits and gaps of using paid services to do it for you.
- Use Google’s tools. Use Google’s “Results about you” tool to request the removal of personal information (like your home address or phone number) from search results.
- Request cache removal. If a site has removed your info but it still shows up in Google, use the “Remove outdated content” tool to ask Google to re-index the page.
- Use a VPN, especially on untrusted networks. A virtual private network (VPN) encrypts your internet traffic between your device and the VPN server and masks your IP address, which could otherwise reveal your approximate location and internet provider. This is relevant not only on public Wi-Fi at conferences and airports, but on any network: due to carrier peering arrangements, Canadian domestic internet traffic is routinely routed through US infrastructure where it may be subject to foreign surveillance (Clement & Obar, 2015). The Canadian Centre for Cyber Security recommends using a VPN whenever connecting to public or unfamiliar networks (ITSAP.80.009).
- The university provides UTORvpn for connecting to U of T systems and encrypting your traffic.
- For situations where you prefer not to route traffic through university infrastructure, such as fieldwork, journalism-adjacent research, or contexts where institutional affiliation itself is sensitive, consider a reputable personal VPN such as Mullvad VPN or Proton VPN, both of which accept anonymous payment and maintain independently audited no-logs policies. As a practical bonus, using a VPN while conducting your OSINT searches (Step 2) helps prevent those searches from being linked back to your IP address.
Why this matters?
Cleaning up your past is only half the battle. Adopting safer habits stops you from creating new, unnecessary risks.
What can I do?
- Err on the side of caution. If you are unsure whether to post something personal or make a profile public, default to keeping it private. It is always easier to share more later than to take back something you’ve already exposed.
- Practice “data minimalism.” When signing up for any new service, list, or account, provide only the minimum information required. Skip optional fields. If a service asks for your birthdate or phone number and doesn’t need it, leave it blank or provide a non-identifying alternative.
- Check privacy settings first. Before you actively use a new social media app or service, go into its settings and lock down privacy controls.
- Think before you tag. Be mindful of geotagging posts or tagging colleagues and students, as it connects identities and locations.
- Use institutional info strategically. Use your U of T email and address for official professional profiles (like ORCiD) but consider a non-identifiable alias or email for public forums or platforms where you don’t need your official affiliation.
- Be mindful of collaboration platforms. Slack workspaces, Discord servers, and Microsoft Teams channels associated with research groups can be semi-public or fully public. Before posting, check whether the channel is visible to people outside your immediate group.
Why this matters?
Your digital footprint is not static. This isn’t a “one-and-done” task. New publications, conference rosters, and data broker scrapes will add new information about you.
What can I do?
- Set up Google Alerts for your full name (in quotes: “First Last”), your name + U of T (“First Last” “University of Toronto”), and your lab’s name.
- Turn on breach monitoring in your password manager.
- Do a quarterly “social media check-in.” Platform settings change. Once a quarter, quickly review your privacy settings, check your “tagged photos” on social media, and see who your new followers are.
- Review your “active sessions” monthly. In the security settings of your primary accounts (Google, Microsoft, X/Twitter), check the list of logged-in devices and locations. If you see a device or location you don’t recognize, end that session immediately, change your password, and review your MFA settings.
- Manually review search results. Once every 6–12 months, re-run the searches from Step 2. Google Alerts are not perfect and will miss things.
Why this matters?
The files you share (PDFs, Word documents, images, spreadsheets) contain hidden data called metadata that can reveal your name, device name, GPS coordinates, file paths, and software versions. For images, this metadata is called EXIF data.
What can I do?
- Inspect documents before publishing.
- In Microsoft Word: Go to File > Info > Check for Issues > Inspect Document.
- In Adobe Acrobat: Use the “Remove Hidden Information” tool.
- Scrub images. Before uploading a photo to a website, remove its metadata.
- On Windows: Right-click the file > Properties > Details > Remove Properties and Personal Information.
- On Mac: Use a third-party tool, or export a new copy of the image (File > Export), which often strips the metadata.
- When in doubt, “flatten” it. Print the document to PDF or copy-paste the text only into a brand new, clean file.
- For more thorough, open-source options: Advanced users can use tools like MAT2 or ExifTool for command-line removal across many file types, or ExifCleaner for a simple drag-and-drop desktop app.
- Do not upload confidential or sensitive files to websites that claim to remove metadata for you. This is especially important for sensitive research. If your documents reveal internal file paths (e.g., C:\Users\YourName\GrantProposal\DefenceContract\), device names, or GPS-tagged images from a field site, that metadata could compromise the confidentiality of your work or even your physical safety.
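To see what a Word document says about you before you share it, you can read its built-in properties locally, without uploading the file anywhere. A .docx file is a ZIP archive, and author details are normally stored in docProps/core.xml. The sketch below uses only the Python standard library; the function name and the selection of fields are illustrative assumptions, and it does not cover comments, tracked changes, or embedded images.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the core properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Properties most likely to identify a person or a timeline.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "title": "dc:title",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def docx_core_properties(path):
    """Return identifying core properties found in a .docx file (empty dict if none)."""
    with zipfile.ZipFile(path) as archive:
        try:
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # the archive has no core properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

If the result still shows your name or a device-specific value after you run Word's Inspect Document, the file has not been fully cleaned.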
Why this matters?
As a PI, your lab website is a key digital asset and a prime OSINT source. It’s often self-managed and can be a major target for both information gathering and technical attacks.
What can I do?
- Review your lab’s “People” page. Consider whether you need to list every student’s full name and direct email. A central contact form or role-based email (e.g., info@your-lab.ca) protects students’ personal information while keeping your lab accessible. Check with current and former lab members before publishing their details.
- Check website photos. Are you showing whiteboards with project details? Are you showing old photos of students who have since left? Get consent.
- Check your domain registration. If you have a custom lab domain (e.g., your-lab.com), check its WHOIS record using a lookup tool. If it shows your home address or personal phone number, contact your domain registrar (the company you purchased the domain from, e.g., GoDaddy, Namecheap, Hover) and enable “WHOIS Privacy” or “Privacy Protection.” Most registrars offer this as a free or low-cost add-on, and it replaces your personal details with the registrar’s proxy information.
- Review “Projects” and “Research” pages. Are you sharing internal project codenames or sensitive, unpublished methodologies?
- Check for public calendars. A lab’s public Google Calendar can reveal the PI’s travel schedule, lab meeting times, and project deadlines. If you share a calendar publicly, review who can see it and consider limiting visibility to “free/busy” only.
- Audit old job postings. Postings often contain a wealth of OSINT: specific software used, internal goals, and team structure. Remove old postings.
- Keep the platform updated. An out-of-date content management system (CMS) such as WordPress or Drupal, or an out-of-date plugin, is a public advertisement for attackers. Set a monthly reminder to apply updates and delete any plugins you no longer use.
- Audit user accounts. On your website’s admin panel, delete old user accounts (e.g., from past students) and ensure no one is using a default username like “admin.” Limit “Administrator” roles to only those who truly need them.
- Keep in mind that much of the information you remove from your lab website may still appear on lab members’ LinkedIn profiles: project names, software tools, supervisory relationships, and even unpublished research directions. Securing your lab website addresses what you control directly, but consider discussing LinkedIn and social media hygiene with your group as part of regular onboarding or lab meetings. See Step 4 and Step 10 for individual-level guidance.
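As a quick check of the “People” page, you can list the email addresses that appear in a page's HTML, including ones hidden behind simple HTML entity encoding. This is a rough sketch under stated assumptions: the regular expression is deliberately simple and will miss unusual addresses, the function name is hypothetical, and you supply the HTML yourself (for example, from your browser's “Save Page As…”).

```python
import re
from html import unescape

# Simple pattern for common address shapes; not a full email validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_html(html):
    """Return the unique email addresses found in an HTML string, lowercased and sorted."""
    text = unescape(html)  # decode entities such as &#64; for "@"
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


if __name__ == "__main__":
    sample = '<a href="mailto:Jane.Doe@example.edu">Jane</a> or info&#64;your-lab.ca'
    print(emails_in_html(sample))
```

Anything this finds is also trivially available to automated scrapers, which is the practical argument for a role-based address or contact form.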
Why this matters?
Your code repositories are part of your public footprint. Even if you think of them as technical workspaces, they can reveal your institutional email address (through git commit metadata), your work schedule (through commit timestamps), internal project names (through directory structures and branches), and your collaborators (through contributor logs). Beyond this personal exposure, repositories can also accidentally leak credentials (“secrets”) like API keys or passwords that give attackers direct access to your data or cloud services.
What can I do?
- Review your git commit metadata. Run git log --format='%ae' | sort -u in your repository to see which email addresses are embedded in your commit history. If your personal email is there, configure git to use your institutional address going forward (git config user.email). Note that existing commits retain the old address in the repository history.
- Never hard-code secrets. Don’t paste a password or API key directly into your code. Use environment variables or a secrets manager.
- Use a .gitignore file. This is a simple text file that tells Git to ignore files (like your local secrets.env file) so they are never uploaded.
- Scan your code. Use a tool to check your existing code for any leaked secrets.
- Built-in tools: GitHub and GitLab have “secret scanning” features.
- Open-source tools: You can run command-line tools like Gitleaks or TruffleHog locally to scan your code before you push it to a public repository.
- If you find a leaked key: rotate it immediately. Deleting the key from your current code is not enough; the key is already in your repository’s history and may have been copied. Log in to the affected service, generate a new key, update your code, and then purge the old key from your Git history using a tool like git filter-repo or BFG Repo-Cleaner.
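Two of the habits above can be sketched in a few lines: reading a secret from an environment variable instead of hard-coding it, and running a rough local scan for strings that look like credentials. The names get_secret and scan_text and the two patterns are illustrative assumptions. The patterns are far less thorough than Gitleaks or TruffleHog and are not a substitute for them.

```python
import os
import re

# Two deliberately simple patterns; dedicated scanners ship many more rules.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hard-coded assignment": re.compile(
        r"""(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*["'][^"']{8,}["']"""
    ),
}


def get_secret(name):
    """Read a secret from the environment so it never appears in the code itself."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def scan_text(text):
    """Return (line number, pattern label) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

A line such as key = os.environ["API_KEY"] is not flagged, because the value lives outside the repository. A line that assigns a quoted string to a name like API_KEY is flagged. Treat any hit as a prompt to rotate the key, not only to delete the line.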
Why this matters?
Your academic profiles (like DiscoverResearch, ORCiD, Google Scholar, ResearchGate) and conference materials are public and often link all your work together. Proactively curating these profiles helps control your public narrative and ensures the information people find is accurate and professional.
What can I do?
- Take control of your U of T profile. Use institutional tools like ‘U of T DiscoverResearch’ to ensure your public-facing academic profile is accurate and complete. This is often the first result people find.
- Curate your LinkedIn profile deliberately. LinkedIn is one of the richest OSINT sources available because it links your institutional affiliation, role, research interests, collaborators, employment history, and professional network in a single, publicly searchable location. Review what is visible to people outside your network, including your connections list, activity feed, and any recommendations or endorsements that reveal project details. Consider whether your headline and summary need to include your specific lab name or research focus, or whether a broader description serves you just as well.
- Check and link external profiles. Review your ORCiD, Google Scholar, and any other academic profiles. Link them to your U of T profile and ensure they use your institutional email and address, not your personal ones.
- Post a “public CV.” The full CV you send for grants (with personal phone, address, etc.) should not be the same one you post publicly on your website. Post a “public” version with only your institutional contact info.
- Be aware of conference lists. These are frequently made public or shared with sponsors and can be combined with travel dates and social media activity to build a detailed picture of your movements. Opt out of public lists where possible and consider the information you share in conference registration forms.
- Review what Google Scholar auto-generates. Google Scholar profiles are often auto-created and may attribute incorrect publications to you or reveal co-author networks you’d prefer not to highlight. Claim your profile and curate it deliberately.
Why this matters?
The audit is the first step. Knowing what to do when you find something alarming (like your data being sold, or a hostile post) is the most important part.
What can I do?
- Don’t panic. This is normal. Platforms and data brokers are designed to collect and share as much data as possible by default. You are not being careless; you are seeing these systems work as intended. Completing this audit is the first step toward managing your digital presence deliberately.
- Document it. (See Step 3).
- Report it. Use the platform’s “Report” button (for harassment) or follow the data broker’s opt-out process (for data).
- Know who to contact. For serious issues (compromised accounts, threats, impersonation), contact the university’s Incident Response team. For harassment or personal safety concerns, you should reach out to Campus Safety.
- Talk to someone. Don’t handle harassment alone. Talk to your supervisor, a trusted colleague, or university support services.
