[Announcement] NEAR Lake Framework - a brand-new approach to building indexers

Greetings from the Data Platform Team! We are happy and proud to announce the MVP release of NEAR Lake Framework, a brand-new approach to building indexers.

What are indexers used for?

Blockchains are great at applying requested changes to account state in a decentralized manner. However, in order to observe those changes, you need to actively pull information from the network. You could do that through JSON-RPC, but polling is inefficient. Instead, we created an abstraction that streams information to your application block by block, and it is called an Indexer.

Let me explain with an example. Let’s say you sell e-books via a contract on the blockchain. Once a book is bought, you want to send the file to the buyer via email. While you can ask for a customer email to be included in the transaction, you can’t send those emails directly from the chain. What you can do is have an off-chain helper that holds the e-book files and implements the email sending. You just need to empower that off-chain part with an indexer that analyzes incoming blocks to find the confirmation that an e-book was bought, and triggers the rest of the off-chain pipeline that sends the email.

What’s wrong with the current NEAR Indexer Framework?

There is nothing wrong with NEAR Indexer Framework. It was designed as a minimal wrapper around the nearcore node, and it still provides a great facility for implementing indexers that are part of the decentralized NEAR network.

However, for end users who built and ran their own indexers, becoming a node operator was unexpected and undesirable. Running a node requires a lot of resources, and maintaining it is time-consuming.

So if your project required an indexer, you had no choice but to become a node operator and deal with all the pains: block syncing, regular maintenance, keeping up to date with nearcore releases, etc.

Main disadvantages of using NEAR Indexer Framework:

  • You need a lot of resources in terms of hardware and costs
  • You constantly need to follow nearcore releases (if you miss a protocol upgrade, your indexer will get stuck)
  • If your indexer got stuck for some reason and you didn’t notice, you could miss data and have to restore from blockchain data backups to speed up your node’s syncing
  • The syncing process itself is slow and painful
  • It is almost impossible to debug locally against testnet or mainnet

What is so good about NEAR Lake Framework?

INFO stats: #61265200 Downloading blocks 97.93% (1089267)

If you feel the pain after seeing the line above, you’ll love this section!

NEAR Lake Framework is a microframework on top of S3 storage that makes it easy to build your own indexers! The S3 storage is populated by the NEAR Lake Indexer, which is maintained by the Data Platform team at Pagoda Inc.

In short, it resolves all the problems listed above:

  • The MVP version of NEAR Lake Framework consumes ~145 MB of RAM, and we’re going to improve that further
  • It doesn’t depend on nearcore and is not a NEAR node, so you don’t need to upgrade it every time nearcore cuts a release
  • No syncing process is involved anymore, so even if your indexer gets stuck you can restart it from any block and run immediately
  • There is no huge data folder anymore, so you don’t need to pay for expensive 1 TB SSD drives
  • Want to debug your indexer against mainnet locally? No problem, do whatever you need

How to use it?

To answer this question, we have prepared a video tutorial with a simple example to give you an overview and some practical ideas.

Source code on GitHub

How does it work?

The project consists of two pieces. The first is NEAR Lake, an old-school indexer built on top of NEAR Indexer Framework. We run this indexer ourselves; it indexes all the blockchain data and stores it in AWS S3 buckets (one for testnet, one for mainnet).

The buckets are configured as requester-pays, which enables you to consume the data from the AWS S3 buckets and pay AWS yourself for the access.
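To make this concrete, here is a small sketch of how a block’s objects are addressed in the Lake buckets. The zero-padded key scheme and the `block.json`/`shard_N.json` file names reflect the public bucket layout as we understand it; treat the exact padding width and shard count as assumptions, not a spec.

```rust
/// Build the S3 object keys for one block in a NEAR Lake bucket.
/// Assumption: block heights are zero-padded to 12 digits, with one
/// `block.json` plus one `shard_N.json` per shard.
fn block_keys(height: u64, num_shards: u64) -> Vec<String> {
    let mut keys = vec![format!("{:012}/block.json", height)];
    for shard in 0..num_shards {
        keys.push(format!("{:012}/shard_{}.json", height, shard));
    }
    keys
}

fn main() {
    // Fetching objects like these is exactly what you pay AWS for
    // under the requester-pays configuration.
    for key in block_keys(61_265_200, 4) {
        println!("{key}"); // first line: 000061265200/block.json
    }
}
```

Because each block is just a set of plain objects keyed by height, any client with AWS credentials can fetch exactly the range it needs.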

The second piece is NEAR Lake Framework, a Rust library that allows you to build your own indexer much like NEAR Indexer Framework does, except that it reads block data from the AWS S3 bucket instead of an embedded NEAR node, so the data stream is available immediately on start! You only need to provide AWS credentials so that the read access can be billed to your account.
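Conceptually, this is why there is no sync step: streaming is just fetching numbered objects from storage, starting at whatever height you choose. The sketch below models that idea with a self-contained stand-in; `ObjectStore` and `stream_blocks` are invented for illustration and are not the real near-lake-framework API (which uses the AWS SDK under the hood).

```rust
use std::collections::HashMap;

/// Stand-in for S3: anything that can return an object body by key.
/// (Invented for this sketch, not part of near-lake-framework.)
trait ObjectStore {
    fn get(&self, key: &str) -> Option<String>;
}

/// An in-memory "bucket" for demonstration.
impl ObjectStore for HashMap<String, String> {
    fn get(&self, key: &str) -> Option<String> {
        HashMap::get(self, key).cloned()
    }
}

/// Fetch up to `count` consecutive blocks starting at `start_height`.
/// Because blocks are plain objects keyed by height, an indexer can
/// restart from any height with no chain sync at all.
fn stream_blocks(store: &impl ObjectStore, start_height: u64, count: u64) -> Vec<String> {
    (start_height..start_height + count)
        .filter_map(|h| store.get(&format!("{:012}/block.json", h)))
        .collect()
}

fn main() {
    let mut bucket = HashMap::new();
    bucket.insert("000000000100/block.json".to_string(), "{\"height\":100}".to_string());
    bucket.insert("000000000101/block.json".to_string(), "{\"height\":101}".to_string());
    // Start from an arbitrary height, immediately:
    let blocks = stream_blocks(&bucket, 100, 2);
    assert_eq!(blocks.len(), 2);
}
```

The real library hands you richer, deserialized block/shard messages rather than raw strings, but the access pattern is the same: pick a start height and consume forward.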

By the way, we are considering a JavaScript implementation of NEAR Lake Framework as our next step. Let us know if you are interested, so we can keep you updated! Subscribe to our newsletter here.

Running your own indexer has become easier than ever before. We encourage you to create indexers, and to migrate your existing indexers to NEAR Lake Framework. Leave your questions or feedback in this thread. If you have issues building or migrating to NEAR Lake, open issues in the repositories or ask on StackOverflow with the tag “nearprotocol”.



To give some context, my team at Fayyr has been running indexers for over 10 months now. We have 1 dev and 2 prod indexers running on 3 separate VMs, each with a 500 GB SSD, an 8-core CPU, and 32 GB of RAM. This costs us a little over $1,100/month.

When @khorolets reached out to me with the Lake framework proposal, I was bouncing off the walls. Instead of running 3 separate VMs with the specs listed above, we’re now running two VMs, each with 1 GB of RAM, a 25 GB SSD, and a 1-core CPU. One runs production and the other is capable of running both a production and a development indexer. Instead of spending $1,100/month on virtual machines, we’re now spending $10. This is a 99.09% decrease in price!

Not only was the migration flawless and extremely quick, but we never have to worry about nearcore releases or updating our indexers again. The difference in speed at which blocks are streamed is not noticeable and having this framework makes running indexers accessible to the average developer.

Huge props to the team for putting this together. I’ve had no issues so far and look forward to seeing what advancements come in the future.


This looks like a really great option. We are evaluating this as part of adding support for NEAR into the Verida Vault mobile app to support private data storage, identity and crypto transactions for NEAR apps.

As an aside, at Verida we have built a highly performant decentralized database network for Web3. This would be ideally suited to replace the S3 model with a decentralized model.

Basically, the indexer data would be written to a decentralized database on the Verida network. All data would be signed (i.e., the NEAR Foundation signs data that originates from their indexer). I expect it would be faster than S3. It is also a real database, so it supports indexes, queries, etc.

Any third party could be granted read access to the database and sync it (either to their own decentralized database or to a private database).

It would also be possible to wrap a token payment around it, i.e., it could cost 10 NEAR/month to access the decentralized database containing NEAR indexer data.

Let me know if there’s any interest in exploring this further.


Wondering if this service is available?
NEAR Lake Framework comes in handy when we need to subscribe to certain events, but in our case we need to retrieve accounts’ transaction history, and the NEAR public PostgreSQL database is kinda slow for production usage.

We can’t find a very good solution for that other than saving all data on our SSD for now.

Basically, the indexer data would be written to a decentralized database on the Verida network. All data would be signed (i.e., the NEAR Foundation signs data that originates from their indexer). I expect it would be faster than S3. It is also a real database, so it supports indexes, queries, etc.

It would be great to see a prototype of that! I imagine you could start with using the current near-lake-framework to put the data from S3 to Verida network and then adjust near-lake-framework to use Verida network as a backend. Would you be interested to work on this project and present your results?

Wondering if this service is available?

Do you refer to the Verida service?

NEAR Lake Framework comes in handy when we need to subscribe to certain events, but in our case we need to retrieve accounts’ transaction history, and the NEAR public PostgreSQL database is kinda slow for production usage.

Does your use case lean towards analytical queries (aggregation operations) or transactional ones (recent operations)? The Data Platform team is working on providing access to Redshift, where users will pay for what they use and won’t be restricted by the resources we are able to cover free of charge.

Yeah, I was referring to the Verida service, and you’re right, the use case is more analytical: we need to get users’ entire transaction history.

It would be super awesome to have all the data accessible on Redshift. Appreciate the effort, and looking forward to the release!


Guys, can you help us integrate this indexer into a DApp? We want to use this solution for nativonft.app

Centralized solutions cost less, no surprise about this 🙂
