Open-source dataset mapper

Overview

RDM Dataset Mapper

RDM is an open-source dataset mapper. It aims to help developers transfer data from external sources into their databases.

⚠️ This is a work in progress. Do NOT use it in production unless you know what you're doing!

Installation

The project is currently available only on GitHub. To install it, clone the repository and install its dependencies with the package manager of your choice (I'm using yarn here).

git clone https://github.com/lsmacedo/rdm
cd rdm
yarn install

Usage

Although RDM is not yet available for general use, it's already possible to test it with dataset files and HTTP requests. The required steps are described below:

  1. Create a new project by running yarn rdm-init
  2. Update the contents of the generated rdm.json file
  3. cd to the project directory
  4. Execute the data transfer with yarn rdm-apply

RDM File

The RDM File holds the configuration for a data migration. It specifies both the input (dataset) and the output (database).

There is no documentation yet, because the file specification keeps changing during the initial stages of the project. Check one of the examples for reference on how to configure an RDM File.

Comments
  • Update templates

    Should not require {{}} for static values.

    Suggestions:

    "type": "'video'",
    "active": true (or "true"),
    "order": 0 (or "0"),
    "name": "_.path + '/' + _.name",
    "duration": "_.duration / 60"
    

    Create a regex for identifying the pattern table.column. Should probably trim fields before working with them.
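
The regex idea above could look roughly like the following TypeScript sketch. The names `TABLE_COLUMN_PATTERN` and `parseTableColumn` are hypothetical and not taken from the RDM code base, and the identifier rules (letters, digits and underscores) are an assumption.

```typescript
// Hypothetical helper, not part of RDM's actual code.
// Matches "table.column" where both parts are plain identifiers.
const TABLE_COLUMN_PATTERN = /^([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)$/;

interface TableColumn {
  table: string;
  column: string;
}

// Trims the field first, then returns its two parts, or null when the
// field is not a table.column reference (for example a quoted static value).
function parseTableColumn(field: string): TableColumn | null {
  const match = TABLE_COLUMN_PATTERN.exec(field.trim());
  if (match === null) {
    return null;
  }
  return { table: match[1], column: match[2] };
}
```

With this pattern, `"  track.name "` resolves to table `track` and column `name`, while `"'video'"` and `"_.duration / 60"` return `null` and would need to be handled as expressions.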

    opened by lsmacedo 0
  • Allow iterating over object keys

    Example object:

    "bpi": {
      "USD": {
        "code": "USD",
        "symbol": "$",
        "rate": "29,911.4932",
        "description": "United States Dollar",
        "rate_float": 29911.4932
      },
      "GBP": {
        "code": "GBP",
        "symbol": "£",
        "rate": "23,800.9640",
        "description": "British Pound Sterling",
        "rate_float": 23800.964
      },
      "EUR": {
        "code": "EUR",
        "symbol": "€",
        "rate": "27,997.2174",
        "description": "Euro",
        "rate_float": 27997.2174
      }
    }
    

    We should be able to iterate through each key as a row and get their values.

    Suggestion:

    "alias": {
      "_currency": "_.bpi__*__"
    },
    "tables": {
      "currency": {
        "set": {
          "name": "_currency.code",
          "label": "_currency.description",
          "rate": "_currency.rate"
        }
      }
    }
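
As a rough illustration of what iterating over keys could mean, the TypeScript sketch below turns every key of an object into one row. This is only one possible reading of the `__*__` wildcard; `rowsFromObjectKeys` and the extra `key` property are hypothetical.

```typescript
// Sketch only: one possible interpretation of the "__*__" wildcard.
type Row = Record<string, unknown>;

// Turns each key of `obj` into one row that carries the key itself
// (under the hypothetical property "key") plus the fields of its value.
function rowsFromObjectKeys(obj: Record<string, Row>): Row[] {
  return Object.entries(obj).map(([key, value]) => ({ key, ...value }));
}

// Shortened version of the example object above.
const bpi: Record<string, Row> = {
  USD: { code: "USD", description: "United States Dollar", rate_float: 29911.4932 },
  GBP: { code: "GBP", description: "British Pound Sterling", rate_float: 23800.964 },
};

// One row per currency, in the order the keys were defined.
const currencyRows = rowsFromObjectKeys(bpi);
```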
    
    opened by lsmacedo 0
  • Stop creating temporary table

    It is not necessary to create a temporary table. We can instead directly select values from a CTE.

    Example:

    with "cte__" ("id", "name") as (
      values (1, 'one'),
      (2, 'two'),
      (3, 'three')
    )
    select "id", "name" from "cte__";
    
    opened by lsmacedo 0
  • Add support for other column types

    Some data parsing might be necessary to insert into columns that are not of type text (not yet tested). Ideally, RDM should be able to run a database introspection to determine the correct column types. Focus on getting text, boolean, integer, float and date working for this issue.

    SELECT
        column_name,
        data_type
    FROM
        information_schema.columns
    WHERE
        table_name = 'table_name';
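
Once the `data_type` of each column is known, string values from the dataset could be converted before insertion. The TypeScript sketch below is a hypothetical helper, not RDM's implementation; it assumes the type names PostgreSQL reports in `information_schema.columns` and covers only the types named in this issue.

```typescript
// Hypothetical helper, not RDM's implementation. The data_type names
// are the ones PostgreSQL reports in information_schema.columns.
type ParsedValue = string | number | boolean | Date | null;

function parseForColumn(raw: string | null, dataType: string): ParsedValue {
  if (raw === null) return null;
  const text = raw.trim();
  switch (dataType) {
    case "boolean":
      if (text === "true") return true;
      if (text === "false") return false;
      throw new Error(`Not a boolean: ${raw}`);
    case "integer":
      if (!/^-?\d+$/.test(text)) throw new Error(`Not an integer: ${raw}`);
      return Number(text);
    case "real":
    case "double precision":
    case "numeric": {
      const parsed = Number(text);
      if (text === "" || Number.isNaN(parsed)) throw new Error(`Not a number: ${raw}`);
      return parsed;
    }
    case "date": {
      const parsed = new Date(text);
      if (Number.isNaN(parsed.getTime())) throw new Error(`Not a date: ${raw}`);
      return parsed;
    }
    default:
      // text, character varying and unrecognized types stay as strings
      return raw;
  }
}
```

Failing loudly on values that do not fit the column type is a design choice of this sketch; skipping the row or inserting NULL would be equally plausible behaviours.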
    
    opened by lsmacedo 0
  • Configure database url in RDM Object

    Developers should be able to load each dataset into a different database.

    Suggestion:

    ...
    "database": {
      "url": "{{env.SOUNTR_DB_URL}}"
    },
    ...
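
A minimal TypeScript sketch of how a `{{env.NAME}}` placeholder in the URL might be resolved follows. The function name `resolveEnvPlaceholders` and the decision to throw on a missing variable are assumptions, not RDM behaviour.

```typescript
// Sketch only: `resolveEnvPlaceholders` is a hypothetical name.
// Replaces every {{env.NAME}} placeholder with the matching value.
function resolveEnvPlaceholders(
  template: string,
  env: Record<string, string | undefined>
): string {
  const placeholder = /\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return template.replace(placeholder, (_match: string, variable: string) => {
    const value = env[variable];
    if (value === undefined) {
      // Assumption: a missing variable is an error rather than an empty string.
      throw new Error(`Missing environment variable: ${variable}`);
    }
    return value;
  });
}
```

In a Node.js process the second argument would typically be `process.env`.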
    
    opened by lsmacedo 0
  • Create basic command line interface

    Commands suggestion:

    • Create a new project: rdm init [--name <name> | -n <name>] [--input-type <type> | -it <type>]
    • Execute data transfer: rdm apply

    Initializing an RDM project creates a new directory containing an RDM File.

    opened by lsmacedo 0
  • Add environment variables support to remote mapping

    Allow setting headers from the RDM Object. Suggestion:

    ...
    "headers": {
      "Authorization": "'Bearer ' + env.API_AUTH"
    },
    ...
    
    opened by lsmacedo 0
  • Implement merge strategies

    • One for upserting data based on unique identifiers from table
    • One for only updating existing data
      • Option to fail if it doesn't exist
    • One for only inserting data if it doesn't exist
      • Option to fail if it exists
      • Option to clear all before inserting
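
To make the three strategies concrete, here is a TypeScript sketch that builds one SQL statement per strategy. Everything in it is an assumption: the names `MergeStrategy` and `buildMergeSql` are hypothetical, PostgreSQL syntax (`ON CONFLICT`) is assumed, `uniqueColumns` is expected to be a subset of `columns`, and at least one non-unique column must be present.

```typescript
// Hypothetical names throughout; PostgreSQL syntax is assumed.
type MergeStrategy = "upsert" | "update-only" | "insert-only";

function buildMergeSql(
  strategy: MergeStrategy,
  table: string,
  columns: string[],
  uniqueColumns: string[]
): string {
  // Quote identifiers and number the placeholders by column position.
  const quote = (identifier: string) => `"${identifier.replace(/"/g, '""')}"`;
  const placeholderFor = (column: string) => `$${columns.indexOf(column) + 1}`;
  const otherColumns = columns.filter((column) => !uniqueColumns.includes(column));

  const columnList = columns.map(quote).join(", ");
  const valueList = columns.map(placeholderFor).join(", ");
  const conflictTarget = uniqueColumns.map(quote).join(", ");
  const insert = `INSERT INTO ${quote(table)} (${columnList}) VALUES (${valueList})`;

  switch (strategy) {
    case "upsert": {
      const updates = otherColumns
        .map((column) => `${quote(column)} = EXCLUDED.${quote(column)}`)
        .join(", ");
      return `${insert} ON CONFLICT (${conflictTarget}) DO UPDATE SET ${updates}`;
    }
    case "insert-only":
      // Silently skips rows that already exist.
      return `${insert} ON CONFLICT (${conflictTarget}) DO NOTHING`;
    case "update-only": {
      const assignments = otherColumns
        .map((column) => `${quote(column)} = ${placeholderFor(column)}`)
        .join(", ");
      const conditions = uniqueColumns
        .map((column) => `${quote(column)} = ${placeholderFor(column)}`)
        .join(" AND ");
      return `UPDATE ${quote(table)} SET ${assignments} WHERE ${conditions}`;
    }
  }
}
```

The "fail if it exists" option could then be a plain `INSERT` without the `ON CONFLICT` clause, which raises a unique violation when the table has a matching unique constraint. "Fail if it doesn't exist" would require checking the number of rows affected by the `UPDATE`.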
    opened by lsmacedo 0
  • Add support for alias inside RDM object

    Going to be useful when mapping data from json files.

    Suggestion:

    ...
    "alias": {
      "$track": "_.track__items__track"
    },
    "fields": {
      ...
      "track.name": "$track.name",
      ...
    },
    ...
    
    opened by lsmacedo 0
  • Parse field strings to a new type

    • Create type RdmField (props: name, entity, isTemplate)
    • Parse field strings from RDM file to RdmFields
    • Remove util functions to get entity/field from string
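
The type described above might be sketched in TypeScript as follows. Only the three property names come from this issue; their exact meaning is assumed here: `entity` is taken to be the part before the first dot, `name` the remainder, and `isTemplate` is true when the string contains `{{ }}`. The function `parseRdmField` is hypothetical.

```typescript
// Property names come from the issue; their semantics are assumptions.
interface RdmField {
  name: string; // column or property name, e.g. "name"
  entity: string | null; // table or alias before the dot, e.g. "track"
  isTemplate: boolean; // assumed: true when the string contains "{{ }}"
}

// Hypothetical parser from a raw RDM File string to an RdmField.
function parseRdmField(raw: string): RdmField {
  const text = raw.trim();
  const isTemplate = /\{\{.*\}\}/.test(text);
  const dot = text.indexOf(".");
  if (isTemplate || dot === -1) {
    // Templates and bare names are kept whole, with no entity.
    return { name: text, entity: null, isTemplate };
  }
  return { name: text.slice(dot + 1), entity: text.slice(0, dot), isTemplate };
}
```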
    opened by lsmacedo 0
  • Apply with command line arguments

    Suggestion:

    rdm apply --args="id=x"
    

    Or

    rdm apply --id x
    

    Code:

    "type": "http",
    "url": "http://localhost:3000/wh",
    "params": {
      "id": "args.id"
    }
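
Both suggested forms could be handled by a small parser such as the TypeScript sketch below. `parseApplyArgs` is a hypothetical name, and separating several pairs inside `--args` with commas is an assumption that the suggestion does not specify.

```typescript
// Hypothetical parser covering both suggested forms:
//   rdm apply --args="id=x"   (the shell passes this as --args=id=x)
//   rdm apply --id x
function parseApplyArgs(argv: string[]): Record<string, string> {
  const parsed: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const token = argv[i];
    if (token.startsWith("--args=")) {
      // Assumption: multiple pairs are separated by commas.
      for (const pair of token.slice("--args=".length).split(",")) {
        const separator = pair.indexOf("=");
        if (separator > 0) {
          parsed[pair.slice(0, separator)] = pair.slice(separator + 1);
        }
      }
    } else if (token.startsWith("--") && i + 1 < argv.length) {
      parsed[token.slice(2)] = argv[i + 1];
      i++; // skip the value that was just consumed
    }
  }
  return parsed;
}
```

The resulting record is what an expression such as `args.id` in the RDM File would read from.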
    
    opened by lsmacedo 0
  • Fix insert with nested json arrays and option failIfExists

    Should not try to insert multiple rows for the same record.

    For example, given the following object:

    {
      "track": "Track",
      "artist": ["Artist 1", "Artist 2"]
    }
    

    Return the following from flattenObjectIntoArrayOfRows:

    ----
    track: Track
    artist: Artist 1
    ----
    artist: Artist 2
    ----
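
The expected output above can be reproduced with the TypeScript sketch below. It is not the project's `flattenObjectIntoArrayOfRows`; the function `flattenSketch` is hypothetical and handles only one level of nesting (scalar values and arrays of scalars).

```typescript
// Sketch only, not RDM's flattenObjectIntoArrayOfRows.
type Scalar = string | number | boolean | null;
type FlatRow = Record<string, Scalar>;

// Scalars go into the first row only; the n-th item of every array
// goes into the n-th row, so a record is never repeated across rows.
function flattenSketch(record: Record<string, Scalar | Scalar[]>): FlatRow[] {
  const rows: FlatRow[] = [{}];
  for (const [key, value] of Object.entries(record)) {
    if (!Array.isArray(value)) {
      rows[0][key] = value;
      continue;
    }
    value.forEach((item, index) => {
      while (rows.length <= index) {
        rows.push({});
      }
      rows[index][key] = item;
    });
  }
  return rows;
}

const flattenedRows = flattenSketch({
  track: "Track",
  artist: ["Artist 1", "Artist 2"],
});
```

Here `flattenedRows` holds two rows: the first with `track` and the first artist, the second with only the second artist.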
    
    opened by lsmacedo 0
  • Create database/tables/columns/constraints from dataset

    Use case: someone has a dataset, for example a JSON file, and wants to export it to a database in order to run queries against it.

    The user configures the RDM File as usual, and the CLI asks whether any missing databases, tables, columns or unique constraints should be created.

    opened by lsmacedo 0
  • Automatically identify unique keys

    Use some database introspection to identify the unique keys from each table. When inserting/upserting, we should be able to identify which of them are being used.

    Make sure it also works for partial unique indexes.

    opened by lsmacedo 0
  • Add conditionals to field templates

    Allow the following:

    ...
    "album": {
      "set": {
        "is_single": "_.type == 'single'",
        "is_album": "_.tracks_count > 4"
      }
    }
    ...
    
    opened by lsmacedo 0
  • Allow fetching nested data from remote datasets

    There are scenarios where we have one endpoint to list data and another to get more detailed data for one specific id.

    Suggestion:

    • [x] Multiple inputs
    • [ ] Join responses.
    opened by lsmacedo 0
Owner
Lucas Silveira