SMAT

Data Guide

Navigating SMAT data fields.
The data that SMAT offers, whether through the API or the premium dashboard tools, is mostly left exactly as the platforms themselves provided it. This means that we share everything that is publicly available, but also that the data is not normalized across the different platforms.
Data is delivered as JSON blobs. An API response looks like this:
SMAT API JSON blob from a search for "Bannon" on Gettr

Data Fields

Elasticsearch Fields

To get to the actual data, navigate into hits.hits; each result sits under a numeric index in that nested structure.
When we add a post to our database, a number of fields are generated that sit one level above the actual data. For example, in result "0" the meta-fields run until "_source", which is where the actual data begins. These meta-fields are generated by Elasticsearch and are not part of the data from the site itself. Learn more about these fields in Elasticsearch's guide.
Red line highlighting some meta-fields in SMAT data
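As a minimal sketch of the navigation described above, the snippet below walks hits.hits in an already-parsed response and separates the Elasticsearch meta-fields from the platform data under "_source". The sample response is invented for illustration. Only hits.hits and "_source" come from this guide; "_index", "_id" and "_score" are standard Elasticsearch hit fields, and the exact set returned by SMAT may differ.

```python
from typing import Any, Dict, Iterator, Tuple

# Invented sample in the shape of an Elasticsearch search response.
# Values are placeholders, not real SMAT data.
sample_response: Dict[str, Any] = {
    "hits": {
        "hits": [
            {
                "_index": "example-index",
                "_id": "abc123",
                "_score": 1.0,
                "_source": {"txt": "example post", "uinf": {"username": "example_user"}},
            }
        ]
    }
}


def iter_hits(response: Dict[str, Any]) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Yield (meta_fields, source) for every hit in the response."""
    for hit in response.get("hits", {}).get("hits", []):
        # Everything except "_source" was generated by Elasticsearch.
        meta = {key: value for key, value in hit.items() if key != "_source"}
        yield meta, hit.get("_source", {})


results = list(iter_hits(sample_response))
meta, source = results[0]
print(sorted(meta))   # ['_id', '_index', '_score']
print(source["txt"])  # example post
```

Keeping the meta-fields in a separate dictionary makes it harder to mistake an Elasticsearch-generated value for something the platform itself published.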

Data Fields Overview

Each dataset is described in detail in the following sections, with more fields laid out, but this chart can serve as a quick-start guide to a few key fields. The "Key Fields" sub-header in each data source lists only a few of the interesting fields, not all of them, so do explore the data further as well.
  • Platform: The plain-text, human-readable name of the platform, not the name used for the platform in the API.
  • Platform Endpoint: The name of the platform to use when making an API request (listed as "API Site Parameter" in the chart).
  • Username: The handle (typically the user slug) of the account that posted the message, which is not necessarily the author of the content it contains (for example, if the post is a forward). This is the account's @ handle, not its full name.
  • Post content: The actual body of an individual post or message.
| Platform | API Site Parameter | Username Field | Content Field |
| --- | --- | --- | --- |
| Telegram | telegram | userinfo.username | message |
| Gab | gab | account.acct | content |
| TikTok (videos) | tiktok_video | author | desc |
| TikTok (comments) | tiktok_comment | author | text |
| Parler | parler | username | body |
| Gettr | gettr | uinf.username | txt |
| Truth Social | truth_social | account.acct or account.username | content_cleaned |
| 4chan | 4chan | name | htmlparsedcom |
| Scored | win | author | content |
| 8kun | 8kun | name | htmlparsedcom |
| Kiwi Farms | kiwifarms | author_username | post_text |
| Bitchute (comments) | bitchute_comment | fullname | content |
| Bitchute (videos) | bitchute_video | channel_slug | N/A but look at meta.description or meta.title |
| Rumble (comments) | rumble_comment | username | text |
| Rumble (videos) | rumble_video | channel_id | N/A but look at full_description and metadata.name |
| VK | vk | author | text |
| LBRY (comments) | lbry_comment | channel_name | comment |
| LBRY (videos) | lbry_video | signing_channel.value.title | N/A but look at value.title and value.description |
| Poal | poal | user | content |
| WimKin | wimkin | author_username | content |
| Minds | minds | user.username or user.name | body |
| MeWe | mewe | username | content |
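Because the data is not normalized across platforms, a small lookup built from the quick-start chart above can map each platform's username and content fields onto common keys. The following is a sketch only: it assumes that a dotted name such as uinf.username denotes a nested object inside "_source", it covers just a handful of platforms, and the helper names (FIELD_MAP, get_path, normalize) and sample records are invented for illustration.

```python
from typing import Any, Dict, Optional

# (username field, content field) per API site parameter, copied from the
# quick-start chart. Only a few platforms are listed to keep the sketch short.
FIELD_MAP = {
    "telegram": ("userinfo.username", "message"),
    "gab": ("account.acct", "content"),
    "gettr": ("uinf.username", "txt"),
    "parler": ("username", "body"),
    "4chan": ("name", "htmlparsedcom"),
}


def get_path(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as 'uinf.username' through nested dicts."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def normalize(site: str, source: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one platform-specific _source record onto common keys."""
    username_path, content_path = FIELD_MAP[site]
    return {
        "site": site,
        "username": get_path(source, username_path),
        "content": get_path(source, content_path),
    }


# Invented records for illustration only.
gettr_post = {"txt": "hello", "uinf": {"username": "example_user"}}
gab_post = {"content": "hi there", "account": {"acct": "another_user"}}

print(normalize("gettr", gettr_post))
print(normalize("gab", gab_post))
```

Missing fields come back as None rather than raising, which suits data that is passed through largely as the platforms delivered it.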

Data Sources

Telegram

Description
Collected Data
Key Fields
Additional Resources
Telegram is a freemium, cross-platform, cloud-based messaging application and network. Telegram’s operational center is based in Dubai, UAE. Telegram’s data schema consists of channels, which users can join and where they can post messages, images, videos or other media. Users and channels can also forward content across channels or private groups. Channels can be private or public, depending on whether they require an invite to join. Telegram is infamous for its use by far-right and neo-Nazi groups in the United States, as a source of news related to the Russian invasion of Ukraine, and for its use by authoritarian regimes such as the Myanmar Tatmadaw. Whether Telegram data is transported and stored in an end-to-end encrypted manner is disputed.
SMAT targets specific Telegram channels for collection because of the vast volume of Telegram users and content. SMAT relies on subject matter experts and community volunteers to source hundreds of channels for crawling, in addition to automated methods, which together yield over 65,000 crawled channels. SMAT crawls user metadata, messages, all media, and channel metadata from the seed crawling set. Categories of these channels include Russian state and Russian-affiliated propaganda, white nationalist groups, and European far-right groups like Querdenker.
Platform endpoint: telegram
  • Username : userinfo.username
  • The name of the channel: channelusername
  • The title of the channel: channeltitle
  • The self-description of the channel: channelabout
  • Post content : message
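Since collection is organized around channels, a common first step is to group messages by channel. The sketch below does this with the key fields listed above, on invented "_source" records. It assumes channelusername and message sit at the top level of each record, as the field names suggest.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Invented _source records using the Telegram key fields listed above.
telegram_sources: List[Dict[str, Any]] = [
    {"channelusername": "channel_a", "channeltitle": "Channel A", "message": "first"},
    {"channelusername": "channel_b", "channeltitle": "Channel B", "message": "second"},
    {"channelusername": "channel_a", "channeltitle": "Channel A", "message": "third"},
]


def messages_by_channel(sources: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group message bodies under the channel they were posted in."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source in sources:
        channel = source.get("channelusername") or "unknown"
        grouped[channel].append(source.get("message", ""))
    return dict(grouped)


by_channel = messages_by_channel(telegram_sources)
print(by_channel)
```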
Telegram just dropped on SMAT!
Our blog post release
Example API JSON

Gab

Description
Collected Data
Key Fields
Additional Resources
Gab is an American alt-tech social networking and microblogging site that is infamous for its far-right users. Through its user base, Gab is associated with QAnon, conservatives, neo-Nazis, white supremacists, white nationalists and anti-Semites. Typically, Gab users are looking for an experience similar to traditional social networks like Twitter, but with less content moderation. Like other alt-tech sites, Gab justifies its existence on the notion that it allows more free speech than other platforms. The Gab founder is notorious as a transphobic, white nationalist Christian.
SMAT aims to perform a full crawl of Gab. SMAT’s Gab collection includes the data that was leaked through a SQL vulnerability in 2021. SMAT collects posts and comments from Gab.
Platform endpoint: gab
  • Username : account.acct
  • Post content : content
  • Did account donate : account.is_donor
  • Did account invest : account.is_investor
  • Is account pro : account.is_pro
  • URL of post : url
  • User bio : account.note
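The donor, investor and pro fields make it possible to see which posting accounts financially support the platform. The sketch below assumes these are boolean flags nested under account, as the dotted names suggest, and that a missing flag can be treated as false. The records and the support_flags helper are invented for illustration.

```python
from typing import Any, Dict, List

SUPPORT_FLAGS = ("is_donor", "is_investor", "is_pro")


def support_flags(source: Dict[str, Any]) -> List[str]:
    """Return the names of the support flags that are truthy on the account."""
    account = source.get("account") or {}
    return [flag for flag in SUPPORT_FLAGS if account.get(flag)]


# Invented records for illustration only.
posts = [
    {"content": "post one", "account": {"acct": "user_a", "is_pro": True, "is_donor": False}},
    {"content": "post two", "account": {"acct": "user_b", "is_investor": True, "is_donor": True}},
    {"content": "post three", "account": {"acct": "user_c"}},
]

flagged = {post["account"]["acct"]: support_flags(post) for post in posts}
print(flagged)
```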
Example API JSON

TikTok

Description
Collected Data
Key Fields
Additional Resources
TikTok is a video-sharing social media platform based in China but popular worldwide. SMAT focuses on it because of its emerging use for posts related to harassment, white supremacy, conspiracy theories, and war crimes.
SMAT crawls outwards based on hundreds of seed hashtags with connections to harmful content. This is a limited crawl of specifically harmful content, not of the entire platform.
Platform endpoint : tiktok_video
  • Username : author
  • Video description : desc
  • Field containing hashtags : textExtra
    • one numbered entry per hashtag
      • Hashtag : hashtagName
  • Note: the challenges field also contains hashtags, embedded in numbered entries
  • Text in sticker on video : stickerText
    • each sticker is embedded within a numbered entry
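The nested hashtag layout above can be flattened with a few lines of code. This is a sketch under assumptions: the numbered entries under textExtra may arrive either as a JSON array or as an object keyed by "0", "1", and so on, so both shapes are handled, and entries with an empty or missing hashtagName are skipped. The sample record and helper names are invented for illustration.

```python
from typing import Any, Dict, List


def numbered_entries(container: Any) -> List[Dict[str, Any]]:
    """Return entries from a list, or from a dict keyed by "0", "1", ..."""
    if isinstance(container, dict):
        return [container[key] for key in sorted(container, key=lambda k: int(k))]
    if isinstance(container, list):
        return container
    return []


def hashtags(source: Dict[str, Any]) -> List[str]:
    """Collect non-empty hashtagName values from textExtra."""
    names = []
    for entry in numbered_entries(source.get("textExtra")):
        name = entry.get("hashtagName")
        if name:
            names.append(name)
    return names


# Invented record for illustration only.
video = {
    "author": "example_user",
    "desc": "example description #one #two",
    "textExtra": [{"hashtagName": "one"}, {"hashtagName": "two"}, {"hashtagName": ""}],
}
print(hashtags(video))  # ['one', 'two']
```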
Example API JSON

Parler

Description
Collected Data
Key Fields
Additional Resources
Parler is arguably the most infamous American alt-tech microblogging social network. Parler has deep ties to conservatives and the Republican Party in the United States. Like other alt-tech microblogging platforms, Parler advertises itself on the premise of maximal free speech and therefore performs little content moderation. As a result, Parler is reported to contain QAnon, white nationalist and neo-Nazi content, as well as organized calls to violence. Parler is known as one of the primary social networking sites used to coordinate the January 6th insurrection in the United States.
SMAT aims to crawl all posts, comments and user profiles on Parler.
Platform endpoint : parler
  • Username : username
  • Post content : body
  • User type (see Stanford article above) : badges
  • Post URL : shareLink
  • Full name : name
  • Trolling field : trolling
  • Shared URLs in post : urls.long
  • Account verified : verified
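The urls.long field is useful for seeing which external sites are shared most often. The sketch below tallies link domains across posts. It assumes that urls is a list of objects that each carry a long key, which is one plausible reading of the dotted name and should be checked against real responses. The sample posts are invented.

```python
from collections import Counter
from typing import Any, Dict, List
from urllib.parse import urlparse


def shared_domains(source: Dict[str, Any]) -> List[str]:
    """Return the lower-cased domain of every long URL shared in a post."""
    domains = []
    for link in source.get("urls") or []:
        long_url = link.get("long")
        if long_url:
            domains.append(urlparse(long_url).netloc.lower())
    return domains


# Invented records for illustration only.
posts = [
    {"username": "user_a", "body": "first", "urls": [{"long": "https://example.com/a"}]},
    {
        "username": "user_b",
        "body": "second",
        "urls": [{"long": "https://example.org/b"}, {"long": "https://EXAMPLE.com/c"}],
    },
    {"username": "user_c", "body": "no links"},
]

domain_counts = Counter(domain for post in posts for domain in shared_domains(post))
print(domain_counts.most_common())
```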
Example API JSON

Gettr

Description
Collected Data
Key Fields
Resources
Gettr is an alt-tech social network and microblogging site. It was originally founded by a former Donald Trump aide and essentially operates like a clone of its traditional counterpart, Twitter. Gettr has ties to the Chinese businessman and dissident Guo Wengui, as well as to Steve Bannon. Gettr is mostly full of right-wing content and is reported to contain extreme racism, antisemitism, child sexual abuse material and terrorist propaganda. As with other alt-tech sites motivated by a free-speech narrative, Gettr appears to have very little content moderation.
SMAT aims to perform a full crawl of Gettr. SMAT collects posts, comments and user profiles from Gettr.
Platform endpoint : gettr
  • Username : uinf.username
  • Post content : txt
  • Self chosen location free text box : uinf.location
Example API JSON

Truth Social

Description
Collected Data
Key Fields
Resources
Truth Social is an alt-tech microblogging social network created by Trump Media & Technology Group, an American media and tech company founded by former president Donald Trump. The site operates like other microblogging social media platforms, although it was originally mobile-only. The site claims that it allows “free expression”, but it also claims to block accounts and content it considers “harmful” or “inappropriate”, including death threats or parody names. SMAT has observed neo-Nazi, white nationalist and QAnon content on the platform, as well as calls to violence.
SMAT aims to crawl all posts, comments and user profiles on Truth Social.
Platform endpoint : truth_social
  • Username : account.acct or account.username
  • Post content : content_cleaned
  • Hashtags : tags
  • URLs shared in post : url
Example API JSON

4chan

Description
Collected Data
Key Fields
Resources
4chan is an infamous imageboard website. Users primarily participate in threaded discussions in response to an original post containing an image. Threads are categorized into “boards”: many threads belong to one board, which is essentially a forum room. The most infamous board, “/pol/” or “politically incorrect”, is where the majority of internet attacks and threats of real-world violence are posted. The site is anonymous by default and users, especially on “/pol/”, profess a free-speech maximalist ideology.
SMAT aims to collect all original posts and replies made on 4chan.
Platform endpoint : 4chan
  • Username (mostly anonymous) : name
  • Post content : htmlparsedcom
  • Self-selected country code : country
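Because usernames are mostly anonymous, the self-selected country code is one of the few per-post attributes that can be aggregated. The sketch below counts posts per country on invented records. It assumes country holds a short code string and may be absent from some posts.

```python
from collections import Counter
from typing import Any, Dict, List

# Invented _source records for illustration only.
posts: List[Dict[str, Any]] = [
    {"name": "Anonymous", "country": "US", "htmlparsedcom": "first post"},
    {"name": "Anonymous", "country": "DE", "htmlparsedcom": "second post"},
    {"name": "Anonymous", "country": "US", "htmlparsedcom": "third post"},
    {"name": "Anonymous", "htmlparsedcom": "post without a country"},
]


def country_counts(sources: List[Dict[str, Any]]) -> Counter:
    """Count posts per self-selected country code."""
    return Counter(source.get("country") or "unknown" for source in sources)


counts = country_counts(posts)
print(counts.most_common())
```

Keep in mind that the code is chosen by the poster, so it describes what users selected rather than a verified location.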
Example API JSON

Scored/.win network

Description
Collected Data
Key Fields
Resources
The .win network of communities sites is an alt-tech threaded conversation forum that operates almost identically to its traditional counterpart, Reddit. The sites first came into existence when Reddit banned r/The_Donald and users stood up a standalone site using the communities format. Other example subforums on the site include “FarPeopleHate”, “Incel”, and “RonDeSantis”.
SMAT aims to collect all posts and comments made on all communities subforums.
Platform endpoint : win
  • Username : author
  • Post content : content or html_parsed_html
Example API JSON

8kun

Description
Collected Data
Key Fields
Resources
8kun, previously called 8chan, is an imageboard site where anonymous users respond in a threaded format to an original post. Like its predecessor 4chan, 8kun threads are categorized into various “boards” which essentially correspond to a room in a traditional internet forum format. The site is associated with several white supremacist, neo-Nazi, alt-right, and anti-Semitic hate groups globally. The users are linked to several mass shootings and terrorist events. Additionally, the site is infamous as the original source of the QAnon conspiracy theory which has deep ties to the January 6th insurrection in the United States.
SMAT aims to collect all original posts and replies made on 8kun.
Platform endpoint : 8kun
  • Username : name
  • Post content : htmlparsedcom
  • Board (like a sub-forum or channel) : board
Example API JSON

Kiwi Farms

Description
Collected Data
Key Fields
Resources
Kiwi Farms is an American internet forum commonly associated with extreme doxxing, organized harassment, and real-life stalking. The volume and depth of harassment campaigns on Kiwi Farms have been associated with the suicides of three victims. The site’s primary purpose is extreme bullying and cyberstalking. Because the data contains a high volume of personally identifiable information, we do not display the source in SMAT’s public API / UI.
SMAT attempts to collect all posts and comments from Kiwi Farms.
Platform endpoint : kiwifarms
  • Username : author_username
  • Post content : post_text
  • Post URL: post_url
  • Subforum : subforum
  • Page title : clean_title
Example API JSON

Rumble

Description
Collected Data
Key Fields
Resources
Rumble is an alt-tech video hosting platform popular among the American far-right. The Rumble platform received investment from Peter Thiel in May 2021. In December of 2021, Trump Media & Technology Group stated that Rumble would operate part of Truth Social.
SMAT aims to crawl all comments, video metadata and user profiles on Rumble.
Platform endpoint : rumble_video
  • Channel Username : channel_id or username
  • Video description : full_description
  • Post URL: canonical
  • Video title: metadata.name
Platform endpoint : rumble_comment
  • Username : username
  • Post content : text
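Video records have no single content field, so one way to get searchable text from either Rumble endpoint is to use text for comments and to join the title and description for videos. The sketch below assumes metadata.name is nested under a metadata object and full_description sits at the top level. The rumble_text helper and the sample records are invented for illustration.

```python
from typing import Any, Dict


def rumble_text(endpoint: str, source: Dict[str, Any]) -> str:
    """Return the best available text for a Rumble comment or video record."""
    if endpoint == "rumble_comment":
        return source.get("text") or ""
    if endpoint == "rumble_video":
        title = (source.get("metadata") or {}).get("name") or ""
        description = source.get("full_description") or ""
        # Videos have no post body, so fall back to title plus description.
        return "\n".join(part for part in (title, description) if part)
    raise ValueError(f"unexpected endpoint: {endpoint}")


# Invented records for illustration only.
comment = {"username": "example_user", "text": "example comment"}
video = {
    "channel_id": "example_channel",
    "full_description": "Example description",
    "metadata": {"name": "Example title"},
}

print(rumble_text("rumble_comment", comment))
print(rumble_text("rumble_video", video))
```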
Example API JSON

VK

Description
Collected Data
Key Fields
Resources
Originally named VKontakte, VK is a Russian social networking site based in Saint Petersburg, Russia. VK is considered a Facebook “clone”, except that its primary user base is located in Russia. As of March 2022, VK was the most popular website in Russia. VK has been shown to enforce loosely against policy-violating content, and it hosts neo-Nazi groups in Russia and the United States.
SMAT crawls targeted groups and users on VK and collects all of the posts, comments, images and videos posted on the maintained seed list. Our targeted VK crawling list consists of VK groups mentioned in our Telegram collections, users who are associated with Russian soldiers committing war crimes in Ukraine, and additional users or groups hand curated by subject matter experts.
Platform endpoint : vk
  • Username : wall_owner
  • Post content : text
  • URLs in post : link.url (note other interesting fields in "link" json)
Example API JSON

Bitchute

Description
Collected Data
Key Fields
Resources
BitChute is an alt-tech video sharing platform. It is known to host content pertaining to harmful conspiracies like QAnon, hate speech, and neo-Nazi propaganda. As with other alt-tech sites that idealize “free speech”, the site is full of videos and comments containing racist slurs, Nazi imagery and calls for violence.
SMAT begins with a seed set of video links and attempts to crawl the entire site using spidering. SMAT collects video metadata and comments made on those videos. We are not currently collecting and storing the videos themselves.
Platform endpoint : bitchute_video
  • Username : creator
  • Channel name : meta.channel_id
  • Video title : title
  • Video description : meta.description
Platform endpoint : bitchute_comment
  • Username : creator
  • Username full : fullname
  • Post content : content
Example API JSON

LBRY/Odysee

Description
Collected Data
Key Fields
Resources
LBRY is a blockchain-based, peer-to-peer file-sharing and payment network. The primary use case for LBRY is to serve as the protocol backing social networks and video sharing platforms. The creators also run Odysee, an alt-tech video sharing network that runs on the LBRY network. Odysee is reported to contain COVID-19 misinformation, neo-Nazi propaganda, antisemitic conspiracy theories and other white nationalist content.
SMAT focuses collection on videos, users and comments posted on LBRY beginning with a seed set and aiming at spidering to the entire network.
Platform endpoint : lbry_video
  • Username : name or normalized_name
  • Channel username : signingchannel.name or signingchannel.normalized_name
  • Video title : title
  • Video description : value.description
  • Post URL : permanenturl or short_url
Platform endpoint : lbry_comment
  • Username : channel_name
  • Post content : comment
  • Video comment is responding to : video_canonical_url
Example API JSON
(note this is someone responding "holohoax" to Lauren Southern doing genocide denial about mass graves of indigenous children in "Canada")

Wimkin

Description
Collected Data
Key Fields
Resources
Wimkin is an alt-tech social network that presents the promotion of free speech as its main selling point. The site is known to allow calls to violence, videos and recordings of violent acts, and threats. Wimkin was pulled from major app stores. The site itself is known to host QAnon, far-right militia and white nationalist groups.
SMAT aims to collect all posts, comments and user profiles on Wimkin.
Platform endpoint : wimkin
  • Username : author_username
  • Username full : author
  • Post content : content
Example API JSON

Poal

Description
Collected Data
Key Fields
Resources
Poal is an alt-tech threaded forum site, similar to its traditional counterpart, Reddit. Poal promotes a “free speech” approach to its community, which in practice means little content moderation. Poal contains harmful and harassing content such as neo-Nazi and antisemitic hate speech and white nationalist propaganda.
SMAT aims to crawl all posts, comments and user profiles on Poal.
Platform endpoint : poal
  • Username : user
  • Post content : content
  • Subforum : sub
Example API JSON

Minds

Description
Collected Data
Key Fields
Resources
Minds is a peer-to-peer, blockchain-based social network that reportedly migrated to the Ethereum network. Minds exists as an alternative to mainstream social networking platforms. Minds is known for white supremacist and far-right content, owing to its lack of enforcement against hate and harassment and its free-speech-maximization stance.
SMAT aims to crawl all posts, comments and user profiles on Minds.
Platform endpoint : minds
  • Username : user.username
  • Username full : user.name
  • Post content : body
  • Access id : access_id
  • Cryptocurrency information : wire_threshold
    • Support tier : support_tier
      • has_tokens
      • has_usd
      • description
      • name
  • Is content monetized : monetized
  • P2P Boosted : p2p_boosted
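The cryptocurrency fields are nested two levels deep, so reading them safely takes a little care. The sketch below assumes the layout implied by the list above: wire_threshold contains support_tier, which in turn holds name, description, has_tokens and has_usd. It also assumes any of these levels may be missing or null. The helper name and sample record are invented for illustration.

```python
from typing import Any, Dict, Optional


def support_tier_summary(source: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Summarize the support tier attached to a post, if there is one."""
    threshold = source.get("wire_threshold") or {}
    tier = threshold.get("support_tier") or {}
    if not tier:
        return None
    return {
        "name": tier.get("name"),
        "description": tier.get("description"),
        "accepts_tokens": bool(tier.get("has_tokens")),
        "accepts_usd": bool(tier.get("has_usd")),
    }


# Invented record for illustration only.
post = {
    "user": {"username": "example_user", "name": "Example User"},
    "body": "example post",
    "monetized": True,
    "wire_threshold": {
        "support_tier": {
            "name": "Example tier",
            "description": "Example description",
            "has_tokens": True,
            "has_usd": False,
        }
    },
}

print(support_tier_summary(post))
print(support_tier_summary({"body": "no tier", "wire_threshold": None}))  # None
```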
Example API JSON

MeWe

Description
Collected Data
Key Fields
Resources
MeWe is an alt-tech social networking site that operates as a Facebook clone. MeWe has little content moderation and as a result contains harmful content like QAnon, “Stop the Steal”, and COVID-19 misinformation. MeWe has also been shown to host US domestic terrorist groups like the Boogaloo movement.
SMAT aims to crawl all posts, comments and user profiles on MeWe.
Platform endpoint : mewe
  • Username : username
  • Post content : content
  • External URLs : external_urls
  • Hashtags : hashtags
Example API JSON