⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first

Overview

TypeStream

Nowadays, almost every developer is working with increasingly complex and varying types of data. While tooling for this problem already exists, current solutions are heavy to use, targeted towards big enterprises and put little to no emphasis on developer experience.

TypeStream allows you to get started within seconds, iterate blazingly fast over type-safe transformation code and work with common data storage services either locally or in the cloud.

Getting started

Make sure you have Node.js (at least 16.0.0) installed and scaffold a new project using:

$ npm init typestream -- --get-started

Opening the project

Note: Right now, we only officially support Visual Studio Code as some important TypeStream features like zero-setup debugging require editor-specific configuration.

To get started developing your project, open the created folder in VS Code. At this point, you will probably be asked whether you want to use the workspace TypeScript version: press "Allow" to continue. If you don't see the prompt, you can also configure this manually.

Working on a pipe

Pipes are at the core of what TypeStream does as they contain the data transformation code of your project. Since you've specified the --get-started flag while creating the project, you should already see a pipe under src/pipes/transform-product.ts. Feel free to read through it to get a general idea of what it contains.

To try out the pipe and experiment with changes, you can start TypeStream in watch mode. To do that, open up an integrated terminal (this is necessary for debugging support) and run the following command:

$ npx tyst watch <pipe-name>

Make sure to replace <pipe-name> with the name of the pipe you want to work on. If you're following the getting started guide, that's going to be transform-product.

If everything's working correctly, TypeStream should now download a number of sample files and then attempt to process them using the pipe. Since you're in watch mode, TypeStream will start over whenever you save the file, allowing you to quickly experiment with changes to your transformation.

At this point, feel free to play around with the code and give all of TypeStream's different features a try, some of which are documented in the example file, others right here in the README.

If you get stuck with anything, want to suggest a new feature, or share general feedback, please don't hesitate to reach out to us by creating an issue — we'd love to hear from you! ❤️

Features

Iterate blazingly fast over your transformation code

When writing software, seeing directly how your changes affect the output is key to efficient and fun development. We have therefore designed TypeStream so that it lets you see your transformed data anywhere in your pipeline and updates it every time you save your code. If there are errors in your transformation, you get an aggregated overview of the complete sample of data points you're testing on.
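
To make the idea of an aggregated error overview more concrete, here is a minimal sketch that runs a transformation over a sample and groups the failures by error message. It is not TypeStream's implementation; summarizeErrors, ErrorSummary and the sample data are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper, not part of TypeStream: run a transformation over a
// sample and group the failures by error message.
type ErrorSummary = { message: string; count: number; firstIndex: number }

function summarizeErrors<T>(
  sample: T[],
  transform: (item: T) => unknown,
): ErrorSummary[] {
  const byMessage = new Map<string, ErrorSummary>()
  sample.forEach((item, index) => {
    try {
      transform(item)
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      const existing = byMessage.get(message)
      if (existing) existing.count += 1
      else byMessage.set(message, { message, count: 1, firstIndex: index })
    }
  })
  // Most frequent errors first
  return Array.from(byMessage.values()).sort((a, b) => b.count - a.count)
}

type RawPrice = { price?: string | null }
const sampleProducts: RawPrice[] = [{ price: '10.5' }, { price: null }, { price: '3' }, {}]

const errorSummary = summarizeErrors(sampleProducts, item => {
  if (typeof item.price !== 'string') throw new Error('price is not a string')
  return parseFloat(item.price)
})
```

Here two of the four sample items fail for the same reason, so errorSummary contains a single entry with a count of 2 that points at the first offending index.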

Step into edge cases, right when they are happening

When working with a lot of data, it's impossible to know every edge case upfront. That's why you'll hit a breakpoint right when an edge case breaks your transformation code, so you can see what the outlier data looks like. You can also set your own breakpoints anywhere in your transformation code and step through one data sample at a time.

Automatic type inference

Everyone who has used a strictly typed language before will love features like advanced IntelliSense and catching bugs at compile time. Using typed(), you can infer the type of any variable in your pipe based on a statistically relevant sample.
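
The following is a rough sketch of what inferring a type from a sample can look like. It is an assumption-laden illustration, not how typed() works internally: inferType, inferObject and the Json types are hypothetical, and the sketch does not quote keys that are not valid identifiers.

```typescript
// Hypothetical sketch, not TypeStream's implementation of typed().
type JsonObject = { [key: string]: Json }
type Json = null | boolean | number | string | Json[] | JsonObject

// Describe a set of sample values as a TypeScript type expression.
function inferType(samples: Json[]): string {
  const primitives = new Set<string>()
  const arrayItems: Json[] = []
  const objects: JsonObject[] = []
  let sawArray = false

  for (const value of samples) {
    if (value === null) {
      primitives.add('null')
    } else if (Array.isArray(value)) {
      sawArray = true
      arrayItems.push(...value)
    } else if (typeof value === 'object') {
      objects.push(value)
    } else {
      primitives.add(typeof value)
    }
  }

  const parts = Array.from(primitives).sort()
  if (sawArray) {
    const item = arrayItems.length > 0 ? inferType(arrayItems) : 'unknown'
    parts.push(item.indexOf(' | ') >= 0 ? `(${item})[]` : `${item}[]`)
  }
  if (objects.length > 0) parts.push(inferObject(objects))
  return parts.join(' | ') || 'never'
}

// Merge all sampled objects into one shape; keys missing from some samples
// become optional fields.
function inferObject(objects: JsonObject[]): string {
  const keys = new Set<string>()
  for (const object of objects) Object.keys(object).forEach(key => keys.add(key))
  const fields = Array.from(keys).sort().map(key => {
    const present = objects.filter(object => key in object)
    const optional = present.length < objects.length ? '?' : ''
    return `${key}${optional}: ${inferType(present.map(object => object[key]))}`
  })
  return `{ ${fields.join('; ')} }`
}

const rawProducts: Json[] = [
  { name: 'Lamp', price: 19.99, tags: ['home'] },
  { name: 'Desk', price: null },
]
const inferred = inferType(rawProducts)
// inferred === '{ name: string; price: null | number; tags?: string[] }'
```

The larger and more representative the sample, the more edge cases (nullable or missing fields) end up in the inferred type.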

Data source agnostic

Want to read and write data from your local file system, Google Cloud Storage, S3, BigQuery or Redshift? All at once? No problem! TypeStream’s modular resource system allows you to read from and write to most common storage systems.

Multi-step pipelines

To keep things more maintainable, or to aggregate multiple streams of data into one, you can push into a resource in one pipe and consume it in the next.

Concepts

The three core concepts to understand when working with TypeStream are resources, documents and pipes. To make each of them more tangible, we will work with an example use case. If you want to get a more hands-on feeling for them, you can also use the getting started guide.

Suppose you have raw product data from two different e-commerce platforms, say Amazon and eBay. Your goal is to take the raw data from each provider, transform it into a common format and put it into a common storage so you can work with it.

Resources

One resource holds many documents that are all described by the same concept and have a similar structure. Each resource also has metadata that describes where its data can be retrieved from. Thus, for all of your raw Amazon and eBay products you could define your resources as follows:

const amazonProducts = new S3Resource('raw-amazon-product', {
  region: 'eu-central-1',
  bucket: 'business-data',
  pathPrefix: 'amazon-products/2022/',
})

const ebayProducts = new CloudStorageResource('raw-ebay-product', {
  cloudStorageProject: 'typestream',
  bucket: 'business-data',
  pathPrefix: 'ebay-products/2022/',
})

// Used to write the transformed data into
const allProducts = new FileResource('transformed-product', {
  basePath: '/Users/typestream/data',
  recursive: true,
})

Note that for each type of storage there will be a different resource class with different kinds of parameters required. As of now, TypeStream supports the following resources:

  • Google Cloud Storage
  • AWS S3
  • BigQuery
  • AWS Redshift (coming soon...)
  • Local file system

The standard authentication method for both GCP and AWS is authentication via default credentials. Refer to each platform's documentation on how to set these up.

Alternatively, you can provide explicit authentication for a project through environment variables. If they are set, default credentials will be ignored entirely. You can set the environment variables by putting their values in the generated .env file of your project.

Documents

Documents are the containers of the data you’re working with. While you will never have to create a document yourself because TypeStream takes care of this under the hood, it makes sense to understand their properties.

Each document has data which will usually be in the form of a Buffer. You can call the read() method of the document to retrieve the data in raw form, or helpers like asJson(), asHtml() or asText() to automatically parse the data into the respective format. If the document can't be parsed into the requested format (e.g. it doesn't contain valid JSON), an error will be thrown.

const buffer = await doc.read() // Buffer
const json = await doc.asJson() // any
const html = await doc.asHtml() // HTMLElement (node-html-parser)
const text = await doc.asText() // string

You can also work with the document’s metadata without ever calling read() on it. What this looks like is dependent on what kind of resource the document belongs to. Metadata could for example hold information about the MIME-type of a Google Cloud Storage object or the path of a file in the local file system.

if (doc.metadata.contentType === 'application/json') console.log('Found JSON!')

Pipes

Pipes are the essential building blocks when working with TypeStream. You can think of them as connectors between resources.

Each pipe has an origin resource from which it will consume data. When defining the pipe, you can transform the data of a document and then publish it to one or more target resources.

Working with the example from above, you could write a pipe that reads the documents from amazonProducts, transforms them in any desired way and publishes them to the allProducts resource.

export default definePipe(amazonProducts, async ctx => {
  const rawProduct = typed('RawProduct', await ctx.doc.asJson())

  // Your transformation code goes here...
  // (this placeholder assumes the raw product has a name field)
  const transformedData = { name: rawProduct.name }

  ctx.publish({
    resource: allProducts,
    data: transformedData,
    metadata: { name: transformedData.name },
  })
})

You can now write a second pipe for your ebayProducts resource and also publish them into allProducts. When hosted via TypeStream Cloud, these pipes will listen for new objects being added to your resources and process them automatically.
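
To illustrate how two pipes can feed the same target resource, here is a small in-memory model of the concept. Everything in it (InMemoryResource, Doc, PipeContext, publish, runPipe, and the sample field names) is hypothetical and only mirrors the ideas described above; it is not the TypeStream API.

```typescript
// Hypothetical in-memory model; the names below are not TypeStream APIs.
type Doc<T> = { data: T; metadata: Record<string, string> }

class InMemoryResource<T> {
  readonly docs: Doc<T>[] = []
  constructor(readonly name: string) {}
}

// Append a document to the target resource.
function publish<Out>(
  target: InMemoryResource<Out>,
  data: Out,
  metadata: Record<string, string> = {},
): void {
  target.docs.push({ data, metadata })
}

type PipeContext<In> = { doc: Doc<In>; publish: typeof publish }

// Feed every document of the origin resource through the handler.
function runPipe<In>(
  origin: InMemoryResource<In>,
  handler: (ctx: PipeContext<In>) => void,
): void {
  for (const doc of origin.docs) handler({ doc, publish })
}

type Product = { name: string; priceCents: number }
const amazon = new InMemoryResource<{ title: string; price: string }>('raw-amazon-product')
const ebay = new InMemoryResource<{ itemName: string; cents: number }>('raw-ebay-product')
const combined = new InMemoryResource<Product>('transformed-product')

amazon.docs.push({ data: { title: 'Lamp', price: '19.99' }, metadata: {} })
ebay.docs.push({ data: { itemName: 'Desk', cents: 12900 }, metadata: {} })

// First pipe: Amazon format -> common format
runPipe(amazon, ctx => {
  const { title, price } = ctx.doc.data
  const priceCents = Math.round(parseFloat(price) * 100)
  ctx.publish(combined, { name: title, priceCents }, { source: 'amazon' })
})

// Second pipe: eBay format -> common format
runPipe(ebay, ctx => {
  const { itemName, cents } = ctx.doc.data
  ctx.publish(combined, { name: itemName, priceCents: cents }, { source: 'ebay' })
})
```

After both pipes have run, combined holds one document per source, all in the same shape. A further pipe could then use combined as its origin, which is the multi-step pattern mentioned in the feature list.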

Transformation utilities

When transforming a lot of data, you can easily find yourself repeating the same steps time and again. To mitigate this problem, TypeStream comes with a few simple utilities. Each of them is further documented in the TypeStream library.

dump()

While using tyst watch on a pipe, dump() can be used to store all intermediate results in a single file. This helps you quickly understand how changes in the transformation code affect the output. Every time you save your pipe, dump() overwrites the file with the new intermediate results.

const intermediateResult = {
  /** ...your data here*/
}
dump(intermediateResult)

pick()

pick() can be used to conveniently select a few keys from a messy object. If the object is typed, the keys you choose are autocompleted and type-checked.

const messyObject = { key1: 1, key2: 2, key3: 3, key4: 4, key5: 5 }
const prunedObject = pick(messyObject, ['key1', 'key3'])
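
For readers curious how such a helper can stay type-safe, here is a minimal sketch of one possible implementation. pickSketch is a hypothetical stand-in written for illustration; TypeStream's own pick() may be implemented differently.

```typescript
// Hypothetical re-implementation for illustration; TypeStream's own pick()
// may differ.
function pickSketch<T extends object, K extends keyof T>(
  source: T,
  keys: readonly K[],
): Pick<T, K> {
  const result = {} as Pick<T, K>
  for (const key of keys) {
    // Only copy keys that actually exist on the source object
    if (key in source) result[key] = source[key]
  }
  return result
}

const messy = { key1: 1, key2: 2, key3: 3, key4: 4, key5: 5 }
const pruned = pickSketch(messy, ['key1', 'key3'])
// pruned has the type { key1: number; key3: number }.
// pickSketch(messy, ['key9']) would be rejected by the compiler.
```

Constraining K to keyof T is what gives the editor enough information to autocomplete the key names and to flag keys that do not exist.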

Hydration utilities

When extracting data from server-side rendered applications, automatically extracting the hydration data from an HTML response can save a lot of time and frustration.

// From a raw HTML string
const assignments = extractJsonAssignments(htmlString)
// From an already parsed HTML element
const documentAssignments = extractJsonAssignmentsFromDocument(htmlElement)
const jsonScripts = extractJsonScriptsFromDocument(htmlElement)
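
As a rough idea of what extracting hydration data involves, the sketch below collects the contents of JSON script tags from an HTML string using a regular expression. It is a simplified, hypothetical stand-in (extractJsonScriptsSketch works on a string, whereas the utilities above take a string or a parsed element) and it makes no claim about how TypeStream implements this.

```typescript
// Hypothetical string-based sketch; TypeStream's utilities may behave
// differently and the real extractJsonScriptsFromDocument() takes a parsed
// HTML element instead of a string.
function extractJsonScriptsSketch(html: string): unknown[] {
  // Matches <script type="application/json"> and application/ld+json tags
  const pattern =
    /<script\b[^>]*type=["']application\/(?:ld\+)?json["'][^>]*>([\s\S]*?)<\/script>/gi
  const results: unknown[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]))
    } catch {
      // Skip script tags whose content is not valid JSON
    }
  }
  return results
}

const page = `
<html><body>
<script id="__NEXT_DATA__" type="application/json">{"props":{"product":{"name":"Lamp"}}}</script>
<script>console.log("not JSON")</script>
<script type="application/ld+json">{"@type":"Product","name":"Lamp"}</script>
</body></html>`

const scripts = extractJsonScriptsSketch(page)
// scripts contains two parsed objects; the plain <script> tag is ignored.
```

A regular expression is enough for a sketch like this, but a real HTML parser is the more robust choice for arbitrary pages.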

Array utilities

Utilities to write more readable code when dealing with arrays.

products.sort(basedOn(_ => _.price, 'desc'))
products.sort(basedOnKey('price', 'desc'))
products.sort(
  basedOnMultiple([
    ['price', 'desc'],
    ['discount', 'asc'],
  ]),
)

sumOf(products.map(product => product.price))
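
To show what helpers like these boil down to, here is a minimal sketch of a comparator factory and a sum helper. basedOnSketch and sumOfSketch are hypothetical re-implementations limited to numeric values; the real utilities may accept more input types and behave differently.

```typescript
// Hypothetical re-implementations for illustration only.
type Direction = 'asc' | 'desc'

// Build a comparator for Array.prototype.sort from a numeric selector.
function basedOnSketch<T>(
  selector: (item: T) => number,
  direction: Direction = 'asc',
): (a: T, b: T) => number {
  const sign = direction === 'asc' ? 1 : -1
  return (a, b) => sign * (selector(a) - selector(b))
}

function sumOfSketch(values: number[]): number {
  return values.reduce((total, value) => total + value, 0)
}

type Product = { label: string; price: number }
const products: Product[] = [
  { label: 'Lamp', price: 20 },
  { label: 'Desk', price: 130 },
  { label: 'Pen', price: 2 },
]

// slice() keeps the original array untouched, since sort() mutates in place
const byPriceDesc = products.slice().sort(basedOnSketch((p: Product) => p.price, 'desc'))
const total = sumOfSketch(products.map(p => p.price))
// byPriceDesc is ordered Desk, Lamp, Pen and total is 152.
```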