Skip to contents

Bluesky Social is a decentralized social network built on the AT Protocol. For users, this means it is possible to programmatically access a broad range of public data from the network. Bluesky organizes information as a series of records, each defined by a structured schema known as a lexicon. These lexicons (e.g., app.bsky.*) specify the types of data (such as posts, profiles, or social relationships) and the methods for querying them. The bskyr package provides an R interface to these endpoints, abstracting the underlying API and returning data in a tidy format suitable for analysis.

This vignette provides an overview of how to collect public data from Bluesky using the bskyr package. It focuses on common data collection tasks such as retrieving user profiles, gathering posts, exploring threads, accessing social relationships, and examining likes or reposts.

Setup and Authentication

Begin by loading the bskyr package:

Before gathering data, you need to authenticate with Bluesky. (This requires having a Bluesky account and creating an App Password in your account settings.) In this vignette, we assume you have already set your Bluesky username and App Password via set_bluesky_user() and set_bluesky_pass() (or by storing them in your .Renviron). If so, you can create an authenticated session with:.

auth <- bs_auth(user = bs_get_user(), pass = bs_get_pass())

This authentication is used internally by most functions, though explicitly creating a session at the beginning of your script is recommended to avoid repeated logins. Note that this is not necessary to do for every function call, as your authentication is automatically cached for the session and refreshed if it gets stale.

Working with your own data

Bluesky’s API distinguishes between data you can access about your own account versus public data about others. A few endpoints only allow fetching your personal data (for managing your account) and not other users’. For example, you can retrieve your preferences with bs_get_preferences(), but you cannot retrieve another user’s preferences. Similarly, you can only get your list of blocks or mutes, and your notifications. .

For instance, to fetch your current preference settings (such as content filtering preferences):

Other self-focused functions include retrieving your blocked accounts (bs_get_blocks()), your muted lists (bs_get_muted_lists()), and your notifications (bs_get_notifications()). These functions correspond to lexicons in the app.bsky.notification.* or app.bsky.graph.* namespaces, and they only return data for the authenticated user. .

In the remainder of this vignette, we’ll focus on gathering public data about other users and content on Bluesky. All such data is accessible via the Bluesky API (with your authentication), provided the information is public.

Public Data about Other Users

Most bskyr functions are designed to gather data about other users or content they’ve created. We will demonstrate these using a sample account: chriskenny.bsky.social. (This is the Bluesky handle of the bskyr package author, Christopher T. Kenny.) Using this account as an example, we’ll show how to retrieve various types of information:.

  • Profile information (user metadata)
  • User feeds / posts (the content they have posted)
  • Threads and replies (conversation threads involving that user’s posts)
  • Followers and follows (the user’s social graph)
  • Likes and other interactions (posts the user liked, and who liked the user’s posts)

Each subsection below focuses on one of these data types, with example code and notes on the relevant lexicons.

Accessing Profile Information

A basic starting point is retrieving a user’s profile. The bs_get_profile() function queries the app.bsky.actor.getProfile lexicon to fetch profile details for one or more users. For example, to get the profile of chriskenny.bsky.social, use bs_get_profile().

profile <- bs_get_profile('chriskenny.bsky.social')
profile

This returns metadata such as the user’s handle, display name, description, and follower counts. To retrieve multiple profiles:.

bs_get_profile(actors = c('chriskenny.bsky.social', 'simko.bsky.social'))

Gathering Posts from a User

To access posts authored by a user, use bs_get_author_feed(), which queries the app.bsky.feed.getAuthorFeed lexicon:

feed <- bs_get_author_feed('chriskenny.bsky.social')
feed |>
  dplyr::select(uri, like_count, reply_count)

Each row in the feed tibble represents a post made by the user. Key columns include the post content (text), the number of likes and replies, and potentially other metadata like repost count, timestamps (created_at), and unique post IDs (uri and cid). By default, bs_get_author_feed() returns the most recent batch of posts (the Bluesky API typically returns up to 50 at a time). .

If the user has more posts, the cursor returned can be used to paginate:

more_posts <- bs_get_author_feed('chriskenny.bsky.social', cursor = attr(feed, 'cursor')$cursor)

Retrieving Threads and Replies

Social interactions on Bluesky often happen in threads: a post and its replies (and replies to those replies, and so on). The bskyr package provides tools to retrieve entire conversation threads so you can analyze the context of a post. .

To get a full thread for a particular post, use bs_get_post_thread(). This function calls the app.bsky.feed.getPostThread lexicon, which returns the specified post, its ancestors (if it was a reply itself), and all of its replies (and replies to those replies, up to a specified depth). In other words, it fetches the whole conversation tree. .

Suppose we have a specific post by me that we want to examine in context. If we know the post’s URI or URL, we can retrieve the thread. For example, using a post URL (as you might copy from the Bluesky app or web):.

thread <- bs_get_post_thread('at://did:plc:ic6zqvuw5ulmfpjiwnhsr2ns/app.bsky.feed.post/3k7qmjev5lr2s')
thread

This returns the conversation context around a specific post. If the post URI is known, bs_get_posts() can be used to retrieve it directly:.

post <- bs_get_posts('https://bsky.app/profile/chriskenny.bsky.social/post/3loagm2phgk2t')
post$record[[1]] |>
  dplyr::select(`$type`, text, created_at)

Accessing Social Graph Data

Bluesky, like other social platforms, allows users to follow each other. The app.bsky.graph.* lexicons provide access to this social graph. In bskyr, you can list a user’s followers and who a user is following (their “follows”) with dedicated functions.

Both functions query the corresponding lexicons app.bsky.graph.getFollowers and app.bsky.graph.getFollows and return tibbles of user profiles. .

For example, to get all followers of chriskenny.bsky.social:

followers <- bs_get_followers('chriskenny.bsky.social')
followers

Each row represents one follower, including their handle, display name, and bio (description). The followers tibble may also include each follower’s did and possibly counts or mutual follow info (e.g., whether you follow them, if you are authenticated). By default, a certain number of followers are returned (Bluesky’s API uses pagination via cursors here as well). If the user has a lot of followers, you can use the cursor and limit arguments in bs_get_followers() to page through all results. .

Similarly, to get the list of accounts that chriskenny.bsky.social follows (his “following” list):

follows <- bs_get_follows('chriskenny.bsky.social')
follows

This returns a tibble of the users that I am following. The structure is the same as for followers (each row is a user profile). You could, for instance, compare the two lists to find overlaps, or combine them to analyze mutual connections. These functions rely on the Bluesky graph endpoints and are great for mapping out the network around a user.

Likes and Reposts

Finally, let’s look at how to gather data on post interactions like likes (and similarly, reposts). Bluesky treats likes and reposts as separate record types, and the API provides endpoints to query them. Posts liked by a user: To fetch the posts that a given user has liked (the content of their “Likes” tab on their profile), use bs_get_likes(). This function hits the app.bsky.feed.getActorLikes lexicon and returns a tibble of posts. Essentially, it’s the reverse of bs_get_author_feed(), instead of posts the user authored, it gives posts the user liked. For example:.

liked_posts <- bs_get_likes('bskyr.bsky.social')
liked_posts |>
  dplyr::select(author_handle, record_text)

Each row here is a post that was liked by chriskenny.bsky.social, showing who the author of that post is (author.handle) and the post content (text). The full data frame would also include details like the post’s URI, timestamps, and engagement counts. This is a great way to retrieve a user’s liked content for further analysis (for instance, to see what topics or users they engage with). .

Who liked a specific post: Conversely, if you want to see which users have liked a particular post, you can use bs_get_post_likes(). This corresponds to the app.bsky.feed.getLikes lexicon and returns a tibble of actors. You need to specify the post by its URI or a bsky.app URL. For example, if we want to find out who liked one of my posts (identified by a known URI):.

bs_get_post_likes('at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.post/3lnghukd7vk22')

This would return a tibble of users (handles, display names, etc.) who have liked that post. It’s similar in structure to the followers list we saw earlier. By examining this, you could see the audience engaging with a particular piece of content. Reposts: Bluesky also has a concept of “reposts” (analogous to retweets), where a user rebroadcasts someone else’s post. To get the users who reposted a given post, you can use bs_get_reposts(), which calls the app.bsky.feed.getRepostedBy endpoint. Its usage is just like bs_get_post_likes(), you provide a post URI or URL, and it returns a tibble of users who reposted that post. For brevity, we won’t show a full example, but it works in much the same way.

bs_get_reposts('at://did:plc:wpe35pganb6d4pg4ekmfy6u5/app.bsky.feed.post/3lnghukd7vk22')

would give the list of accounts that reposted the specified post.

Conclusion

By combining these functions, you can gather a rich set of data from Bluesky Social. For instance, you might collect a user’s profile and posts, then fetch all replies to those posts, the list of users who follow them, and the posts they’ve liked. This opens up many possibilities for analysis: social network analysis (using follows data), content analysis (using posts and threads), and engagement analysis (using likes and reposts).

In this vignette, we demonstrated how to use bskyr to gather various types of public data from the Bluesky Social network. We covered retrieving user profiles, fetching a user’s posts, exploring conversation threads, listing followers and follows, and accessing likes (and other interactions). All of these tasks are made possible by Bluesky’s open AT Protocol and its lexicon-defined API endpoints, which bskyr conveniently wraps in R functions. .

With these tools, R users can treat Bluesky as a data source for research or development. You can pull data on social connections, user behavior, and content trends from Bluesky and then apply the vast ecosystem of R packages for data analysis and visualization. As the Bluesky network grows and its API evolves, bskyr will aim to keep up, providing an efficient bridge between the AT Protocol and R.


DISCLAIMER: This vignette has been written with help from ChatGPT 4o. It has been reviewed for correctness and edited for clarity by the package author. Please note any issues at https://github.com/christopherkenny/bskyr/issues.