🧙 Mage is an open-source data management platform that helps you clean data and prepare it for training AI/ML models.

Overview

Intro

Mage is an open-source data management platform that helps you clean data and prepare it for training AI/ML models.

What does this do?

The current version of Mage includes a data cleaning UI tool that can run locally on your laptop or can be hosted in your own cloud environment.

Why should I use it?

Using a data cleaning tool enables you to quickly visualize data quality issues, easily fix them, and create repeatable data cleaning pipelines that can be used in production environments (e.g. online re-training, inference, etc.).

Table of contents

  1. Quick start
  2. Features
  3. Roadmap
  4. Contributing
  5. Community

Quick start

Install library

$ pip install git+https://github.com/mage-ai/mage-ai.git

Launch tool

Load your data, connect it to Mage, and launch the tool locally.

From anywhere you can execute Python code (e.g. terminal, Jupyter notebook, etc.), run the following:

import mage_ai
import pandas as pd


df = pd.read_csv('/path_to_data')
mage_ai.connect_data(df, name='name_of_dataset')
mage_ai.launch()

Open http://localhost:5000 in your browser to access the tool locally.

To stop the tool, run this command: mage_ai.kill()

Cleaning data

After building a data cleaning pipeline from the UI, you can clean your data anywhere you can execute Python code:

import mage_ai
import pandas as pd


df = pd.read_csv('/path_to_data')
mage_ai.clean(df, pipeline_uuid='name_of_cleaning_pipeline') #=> returns cleaned dataframe

More resources

  • Here is a step-by-step guide on how to use the tool.
  • Check out the tutorials to quickly become a master of magic.

Features

  1. Data visualizations
  2. Reports
  3. Cleaning actions
  4. Data cleaning suggestions

Data visualizations

Inspect your data using different charts (e.g. time series, bar chart, box plot, etc.).

Here’s a list of available charts.

dataset visualizations

Reports

Quickly diagnose data quality issues with summary reports.

Here’s a list of available reports.

dataset reports
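
To give a sense of what such a report typically covers, here is a minimal pandas sketch of common per-column quality metrics; it is illustrative only, not how Mage computes its reports:

import pandas as pd

# Illustrative only: the kinds of per-column metrics a summary report surfaces.
def summary_report(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({
        'missing_pct': df.isna().mean() * 100,  # percent of missing values
        'unique_values': df.nunique(),          # cardinality
        'dtype': df.dtypes.astype(str),
    })

# df = pd.read_csv('/path_to_data')
# print(summary_report(df))
# print('duplicate rows:', df.duplicated().sum())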

Cleaning actions

Easily add common cleaning functions to your pipeline with a few clicks. Cleaning actions include imputing missing values, reformatting strings, removing duplicates, and many more.

If a cleaning action you need doesn’t exist in the library, you can write and save custom cleaning functions in the UI.

Here’s a list of available cleaning actions.

cleaning actions
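
For illustration, a custom cleaning function written in plain pandas might look like the sketch below; the column names are hypothetical and the exact function signature Mage's UI expects may differ:

import pandas as pd

# Hypothetical custom cleaning function; 'email' and 'age' are example columns.
def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()                          # remove duplicate rows
    df['email'] = df['email'].str.strip().str.lower()  # reformat strings
    df['age'] = df['age'].fillna(df['age'].median())   # impute missing values
    return df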

Data cleaning suggestions

The tool will automatically suggest different ways to clean your data and improve quality metrics.

Here’s a list of available suggestions.

suggested cleaning actions

Roadmap

Big features being worked on or in the design phase.

  1. Encoding actions (e.g. one-hot encoding, label hasher, ordinal encoding, embeddings, etc.); a one-hot encoding sketch follows this list
  2. Data quality monitoring and alerting
  3. Apply cleaning actions to columns and values that match a condition
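
For roadmap item 1, the sketch below shows what one-hot encoding typically looks like in pandas; it is purely illustrative, since Mage's encoding actions are still being designed:

import pandas as pd

# Illustrative one-hot encoding with pandas; not Mage's (future) encoding action API.
df = pd.DataFrame({'color': ['red', 'green', 'red', 'blue']})
encoded = pd.get_dummies(df, columns=['color'], prefix='color')
print(encoded)  # one indicator column per category: color_blue, color_green, color_red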

Here’s a detailed list of features and bugs that are in progress or upcoming.

Contributing

We welcome all contributions to Mage, from small UI enhancements to brand-new cleaning actions. We love seeing community members level up and give people power-ups!

Check out the contributing guide to get started by setting up your development environment and exploring the code base.

Got questions? Live chat with us in Slack.

Anything you contribute, the Mage team and community will maintain. We’re in it together!

Community

We love the community of Magers (/ˈmājər/): a group of mages who help each other realize their full potential!

To live chat with the Mage team and community, please join the free Mage Slack channel.

For real-time news and fun memes, check out the Mage Twitter.

To report bugs or add your awesome code for others to enjoy, visit GitHub.

License

WIP

Comments
  • Add mongodb data loader

    Add mongodb data loader

    Summary

    Adds a text field for editing a custom MongoDB query (pymongo); if it is not edited, the query defaults to collection.find().

    Adds the ability to edit the MongoDB connection.
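
    For context, a minimal pymongo sketch of the default behaviour described above; the connection string, database, and collection names below are placeholders:

    # Hypothetical sketch of the default query (collection.find()); names are placeholders.
    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient('mongodb://localhost:27017')
    collection = client['my_database']['my_collection']

    docs = list(collection.find())  # the default query when no custom query is entered
    df = pd.DataFrame(docs)         # DataFrame handed to downstream transform blocks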

    Tests

    Tested that the returned DataFrame is passed to the data transform step.

    cc:

    opened by DanePham 10
  • [nt] Ace code editor

    [nt] Ace code editor

    Summary

    Add a cooler and more feature-rich code editor

    • Switch current code editor to use Ace
    • Add line numbers and syntax highlighting
    • Implement a Next.js workaround, with instructions for adding more functionality to the editor.

    Tests

    1. Typed words and ran commands using the new editor for Custom Code & Filter actions.

    Create code for custom actions

    2. Testing it on the Suggestions List code button

    Able to edit from the suggestions and apply

    3. Update the instructions in the export pipelines

    Read only

    cc: @johnson-mage

    opened by nathaniel-mage 10
  • Mage can't find template files

    Mage can't find template files

    I have this problem whenever I try to create a new block (except for transformers and sensors, for some reason): Jinja can't load the template. The template files are present in C:\Users\benja\anaconda3\envs\mage-test\Lib\site-packages\mage_ai\data_preparation\templates, in the folders data_exporters, data_loaders, etc., which I think should be the path they are loaded from.

    I'm on Windows 11, and the browser is Chrome (though this shouldn't matter).

    Error:

    Traceback (most recent call last):
    
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\tornado\web.py", line 1702, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\server\api\blocks.py", line 96, in post
        upstream_block_uuids=payload.get('upstream_blocks', []),
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\models\block\__init__.py", line 338, in create
        pipeline_type=pipeline.type if pipeline is not None else None,
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 82, in load_template
        pipeline_type=pipeline_type,
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 61, in fetch_template_source
        template_source = __fetch_data_loader_templates(config, pipeline_type=pipeline_type)
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\mage_ai\data_preparation\templates\template.py", line 107, in __fetch_data_loader_templates
        template_env.get_template(template_path).render(
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\environment.py", line 997, in get_template
        return self._load_template(name, globals)
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\environment.py", line 958, in _load_template
        template = self.loader.load(self, name, self.make_globals(globals))
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 125, in load
        source, filename, uptodate = self.get_source(environment, name)
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 194, in get_source
        pieces = split_template_path(template)
      File "C:\Users\benja\anaconda3\envs\mage-test\lib\site-packages\jinja2\loaders.py", line 35, in split_template_path
        raise TemplateNotFound(template)
    jinja2.exceptions.TemplateNotFound: data_loaders\default.jinja
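
    For what it's worth, a minimal standalone sketch (not Mage code) of why a backslash-separated template path trips Jinja's loader on Windows, where os.path.sep is a backslash:

    import os
    from jinja2 import Environment, FileSystemLoader, TemplateNotFound

    env = Environment(loader=FileSystemLoader('templates'))

    # On Windows, os.path.join produces 'data_loaders\\default.jinja'; Jinja splits
    # template names on '/', so the piece containing a backslash is rejected and
    # TemplateNotFound is raised, matching the traceback above.
    template_path = os.path.join('data_loaders', 'default.jinja')
    try:
        env.get_template(template_path)
    except TemplateNotFound as exc:
        print('TemplateNotFound:', exc)

    # A forward-slash path ('data_loaders/default.jinja') avoids the issue on all platforms.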
    
    bug 
    opened by Travior 9
  • [nt]  column metric additions

    [nt] column metric additions

    Summary

    Improve the column reports page

    • Refactor cell and tables for undefined values
    • Add progress bar to percentage metrics
    • Add more quality metrics to the page.

    Tests

    Tested on localhost.

    Long table

    Short Table

    cc: @johnson-mage

    Note: this includes significant design changes, as it removes the border from the data table cells.

    opened by nathaniel-mage 9
  • [nt] Add correlation table for high correlations

    [nt] Add correlation table for high correlations

    Summary

    Display a table of high correlation values

    • On the column detail visualization view, add a table for "Columns with high correlation" next to the correlations heatmap, listing all columns with an absolute correlation of 0.5 or more (see the sketch below).
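
    A rough pandas sketch of the selection logic described above (not the actual frontend code):

    import pandas as pd

    # Return column pairs whose absolute Pearson correlation meets the threshold.
    def high_correlations(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
        corr = df.corr(numeric_only=True)
        pairs = (
            corr.where(lambda m: m.abs() >= threshold)  # keep strong correlations only
                .stack()                                # -> MultiIndex of (col_a, col_b)
                .reset_index(name='correlation')
        )
        # Drop self-correlations and mirrored (a, b)/(b, a) duplicates.
        pairs = pairs[pairs['level_0'] < pairs['level_1']]
        return pairs.rename(columns={'level_0': 'column_a', 'level_1': 'column_b'})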

    Tests

    Table with a highly correlated column

    Table with a low correlation

    Table with no correlation

    cc: @johnson-mage

    opened by nathaniel-mage 9
  • [dy] Add server code

    [dy] Add server code

    Summary

    Add code for API endpoints, DB models, and client library methods. The code for the models is a bit messy, so I'm going to try to improve that a bit, but it works for now.

    TODO:

    • [ ] Add unit tests
    • [ ] Add iFrame

    Tests

    Tested in a local Jupyter notebook; screenshots of the notebook and the requests are attached.

    cc: @wangxiaoyou1993

    opened by dy46 9
  • [sp] add box plot

    [sp] add box plot

    Summary

    • see above

    Tests

    Tested "primary", "secondary", and "danger" props. Sample data distribution displays correctly.


    One-sided distributions also display correctly.


    Added tooltips (the cursor looks off in the GIF, but it works correctly).

    cc: @johnson-mage @dy46

    opened by shrey-mage 7
  • [sg] Filter pipeline runs by status

    [sg] Filter pipeline runs by status

    Summary

    Added a query parameter called "status" to the pipeline runs URI, e.g. /api/pipeline_runs?_limit=30&_offset=0&status=failed. Also added a filter for filtering results by "status" in process_pipeline_runs().
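
    As a usage sketch (the host and port below are assumptions, not part of this PR):

    import requests

    # Fetch pipeline runs filtered by the new "status" query parameter.
    resp = requests.get(
        'http://localhost:6789/api/pipeline_runs',
        params={'_limit': 30, '_offset': 0, 'status': 'failed'},
    )
    resp.raise_for_status()
    print(resp.json())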

    Tests

    Needs further testing on all possible status values

    cc: @tommydangerous @wangxiaoyou1993

    opened by soumojit 6
  • Cannot Create New Pipeline

    Cannot Create New Pipeline

    I have a project running with:

    • mage start
    • mage version 0.3.4
    • windows machine using WSL

    I seem to be unable to create a new pipeline. I can press the new pipeline button on the UI and refresh the page so it appears, but then I get the error:

    Traceback (most recent call last):
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/tornado/web.py", line 1702, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/server/server.py", line 111, in get
        pipeline = Pipeline.get(pipeline_uuid)
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 137, in get
        return Pipeline(uuid)
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 38, in __init__
        self.load_config_from_yaml()
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 247, in load_config_from_yaml
        self.load_config(self.get_config_from_yaml())
      File "/mnt/c/Users/path_to/venv/lib/python3.7/site-packages/mage_ai/data_preparation/models/pipeline.py", line 269, in load_config
        blocks = [build_shared_args_kwargs(c, Block) for c in self.block_configs]
    TypeError: 'NoneType' object is not iterable
    
    opened by connorjoleary 6
  • [sk] Updated templates + clients to use new IO configuration framework.

    [sk] Updated templates + clients to use new IO configuration framework.

    Summary

    Integrated new configuration into templates and clients.

    Tests

    Local testing was performed by querying data warehouses and databases using the new framework. Unit tests were updated.

    cc: @wangxiaoyou1993

    opened by skunichetty 6
  • Collinearity Identification Speed Up

    Collinearity Identification Speed Up

    Summary

    This implementation uses the correlation matrix formula to compute the VIFs. This change should speed up the computation by a factor of roughly $p^2$, where $p$ is the number of parameters.

    However, the suggestion is now "less" informative, and just says that there is a collinearity problem with the returned columns.

    The issue with the current implementation is that it is not column order invariant. This is particularly concerning in cases such as the test below, where $z$ and $x$ are functions of some of the other variables, but the other variables are removed because they are first in the ordering.

    Rather than removing columns when VIF > 5, maybe provide users with the matrix of variance decomposition proportions, which gives greater insight into the structure of the relationships between the variables.

    @tommydangerous, as an idea, expand suggestions to allow for the inclusion of graphs. Then, you could include a heat map of variance decomposition proportions, and cast a spell of illumination on the correlation structure for users.
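
    For reference, a minimal sketch of the correlation-matrix route to VIFs mentioned in the summary (the VIF of column $j$ is the $j$-th diagonal entry of the inverse correlation matrix); this is illustrative, not necessarily the exact code in this PR:

    import numpy as np
    import pandas as pd

    def vif_from_correlation(df: pd.DataFrame) -> pd.Series:
        corr = df.corr().to_numpy()            # p x p correlation matrix
        vifs = np.diag(np.linalg.inv(corr))    # VIF_j = [R^{-1}]_{jj}
        return pd.Series(vifs, index=df.columns, name='VIF')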

    Tests

    import numpy as np
    import pandas as pd
    # RemoveCollinearColumns is assumed to be imported from Mage's cleaning rules.

    rng = np.random.default_rng()

    t = rng.normal(loc=4., scale=5., size=1000)
    u = rng.normal(loc=4., scale=5., size=1000)
    v = rng.normal(loc=3., scale=9., size=1000)
    w = rng.normal(loc=7., scale=1., size=1000)

    # x and z are linear combinations of other columns plus noise, so they are collinear.
    x = 2.*u + 7.*v + rng.normal(loc=7., scale=1., size=1000)
    z = w + 6.*v + rng.normal(loc=1., scale=6., size=1000)
    df = pd.DataFrame(np.stack([t, u, v, w, x, z], axis=-1),
                      columns=["t", "u", "v", "w", "x", "z"])

    rcc = RemoveCollinearColumns(df, df.dtypes, [])
    print(rcc.evaluate())
    opened by ChadGueli 6
  • Bump json5 from 1.0.1 to 1.0.2 in /mage_ai/frontend

    Bump json5 from 1.0.1 to 1.0.2 in /mage_ai/frontend

    Bumps json5 from 1.0.1 to 1.0.2.

    Release notes

    Sourced from json5's releases.

    v1.0.2

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295). This has been backported to v1. (#298)
    Changelog

    Sourced from json5's changelog.

    Unreleased [code, diff]

    v2.2.3 [code, diff]

    v2.2.2 [code, diff]

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).

    v2.2.1 [code, diff]

    • Fix: Removed dependence on minimist to patch CVE-2021-44906. (#266)

    v2.2.0 [code, diff]

    • New: Accurate and documented TypeScript declarations are now included. There is no need to install @types/json5. (#236, #244)

    v2.1.3 [code, diff]

    • Fix: An out of memory bug when parsing numbers has been fixed. (#228, #229)

    v2.1.2 [code, diff]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 0
  • Run a pipeline without the need for decorators (data_loaders, transformers, data_exporters)

    Run a pipeline without the need for decorators (data_loaders, transformers, data_exporters)

    I'm testing Mage to see if it can be a good fit for a project I'm working on. I already have a pipeline in notebooks, using Spark Structured Streaming. I tried to just copy the notebooks to Mage and use a scratchpad, and it works fine, but then it's not possible to run it as a pipeline. I then tried to use a data loader, transformer, and data exporter, but I get some strange errors.

    The pipeline reads from a Kafka topic using Spark Structured Streaming, so at first I thought I could just wrap the Spark read method in a decorator and it would work. I created a streaming pipeline, but then I can't write the data loader in Spark and have to use the Kafka YAML file. So instead I tried to use a batch pipeline. Here is a super simple example:

    from pyspark.sql import SparkSession
    
    if 'data_loader' not in globals():
        from mage_ai.data_preparation.decorators import data_loader
    if 'test' not in globals():
        from mage_ai.data_preparation.decorators import test
    
    spark = SparkSession.builder.appName("foo").getOrCreate()
    
    
    @data_loader
    def load_data(*args, **kwargs):
        return (spark.readStream.format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9093")
                .option("subscribe", "example")
                .option("startingOffsets", "earliest")
                .load())
    
    
    @test
    def test_output(df, *args) -> None:
        """
        Template code for testing the output of the block.
        """
        assert df is not None, 'The output is undefined'
    
    

    But I get the following error

    ---------------------------------------------------------------------------
    AnalysisException                         Traceback (most recent call last)
    Cell In[7], line 71
         68     else:
         69         return find(lambda val: val is not None, output)
    ---> 71 df = execute_custom_code()
         73 # Post processing code below (source: output_display.py)
         76 def __custom_output():
    
    Cell In[7], line 55, in execute_custom_code()
         50     block.run_upstream_blocks()
         52 global_vars = {'env': 'dev', 'execution_date': datetime.datetime(2023, 1, 2, 21, 0, 21, 934826), 'event': {}} or dict()
    ---> 55 block_output = block.execute_sync(
         56     custom_code=code,
         57     global_vars=global_vars,
         58     analyze_outputs=True,
         59     update_status=True,
         60     test_execution=True,
         61 )
         62 if False:
         63     block.run_tests(custom_code=code, update_tests=False)
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:575, in Block.execute_sync(self, analyze_outputs, build_block_output_stdout, custom_code, execution_partition, global_vars, logger, run_all_blocks, test_execution, update_status, store_variables, verify_output, input_from_output, runtime_arguments, dynamic_block_index, dynamic_block_uuid, dynamic_upstream_block_uuids)
        568     if logger is not None:
        569         logger.exception(
        570             f'Failed to execute block {self.uuid}',
        571             block_type=self.type,
        572             block_uuid=self.uuid,
        573             error=err,
        574         )
    --> 575     raise err
        576 finally:
        577     if update_status:
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:544, in Block.execute_sync(self, analyze_outputs, build_block_output_stdout, custom_code, execution_partition, global_vars, logger, run_all_blocks, test_execution, update_status, store_variables, verify_output, input_from_output, runtime_arguments, dynamic_block_index, dynamic_block_uuid, dynamic_upstream_block_uuids)
        542 if store_variables and self.pipeline.type != PipelineType.INTEGRATION:
        543     try:
    --> 544         self.store_variables(
        545             variable_mapping,
        546             execution_partition=execution_partition,
        547             override_outputs=True,
        548             spark=(global_vars or dict()).get('spark'),
        549             dynamic_block_uuid=dynamic_block_uuid,
        550         )
        551     except ValueError as e:
        552         if str(e) == 'Circular reference detected':
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/block/__init__.py:1210, in Block.store_variables(self, variable_mapping, execution_partition, override, override_outputs, spark, dynamic_block_uuid)
       1208     if spark is not None and type(data) is pd.DataFrame:
       1209         data = spark.createDataFrame(data)
    -> 1210     self.pipeline.variable_manager.add_variable(
       1211         self.pipeline.uuid,
       1212         uuid_to_use,
       1213         uuid,
       1214         data,
       1215         partition=execution_partition,
       1216     )
       1218 for uuid in removed_variables:
       1219     self.pipeline.variable_manager.delete_variable(
       1220         self.pipeline.uuid,
       1221         uuid_to_use,
       1222         uuid,
       1223     )
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/variable_manager.py:72, in VariableManager.add_variable(self, pipeline_uuid, block_uuid, variable_uuid, data, partition, variable_type)
         70 variable.delete()
         71 variable.variable_type = variable_type
    ---> 72 variable.write_data(data)
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/variable.py:153, in Variable.write_data(self, data)
        151     self.__write_parquet(data)
        152 elif self.variable_type == VariableType.SPARK_DATAFRAME:
    --> 153     self.__write_spark_parquet(data)
        154 elif self.variable_type == VariableType.GEO_DATAFRAME:
        155     self.__write_geo_dataframe(data)
    
    File /usr/local/lib/python3.10/site-packages/mage_ai/data_preparation/models/variable.py:279, in Variable.__write_spark_parquet(self, data)
        277 def __write_spark_parquet(self, data) -> None:
        278     (
    --> 279         data.write
        280         .option('header', 'True')
        281         .mode('overwrite')
        282         .csv(self.variable_path)
        283     )
    
    File /usr/local/lib/python3.10/site-packages/pyspark/sql/dataframe.py:338, in DataFrame.write(self)
        326 @property
        327 def write(self) -> DataFrameWriter:
        328     """
        329     Interface for saving the content of the non-streaming :class:`DataFrame` out into external
        330     storage.
       (...)
        336     :class:`DataFrameWriter`
        337     """
    --> 338     return DataFrameWriter(self)
    
    File /usr/local/lib/python3.10/site-packages/pyspark/sql/readwriter.py:731, in DataFrameWriter.__init__(self, df)
        729 self._df = df
        730 self._spark = df.sparkSession
    --> 731 self._jwrite = df._jdf.write()
    
    File /usr/local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
       1315 command = proto.CALL_COMMAND_NAME +\
       1316     self.command_header +\
       1317     args_command +\
       1318     proto.END_COMMAND_PART
       1320 answer = self.gateway_client.send_command(command)
    -> 1321 return_value = get_return_value(
       1322     answer, self.gateway_client, self.target_id, self.name)
       1324 for temp_arg in temp_args:
       1325     temp_arg._detach()
    
    File /usr/local/lib/python3.10/site-packages/pyspark/sql/utils.py:196, in capture_sql_exception.<locals>.deco(*a, **kw)
        192 converted = convert_exception(e.java_exception)
        193 if not isinstance(converted, UnknownException):
        194     # Hide where the exception came from that shows a non-Pythonic
        195     # JVM exception message.
    --> 196     raise converted from None
        197 else:
        198     raise
    
    AnalysisException: 'write' can not be called on streaming Dataset/DataFrame
    
    

    Any idea why this isn't working?

    Will it be possible in the future to run a pipeline without the data loader, transformer and data exporter?
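
    For context, a minimal PySpark sketch (not Mage code) of the distinction the traceback points at: batch DataFrames expose .write, while streaming DataFrames must be persisted through .writeStream, which is why a code path that calls df.write on the streaming load above raises AnalysisException.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('streaming-vs-batch').getOrCreate()

    # The built-in "rate" source gives a streaming DataFrame without needing Kafka.
    stream_df = spark.readStream.format('rate').option('rowsPerSecond', 1).load()
    print(stream_df.isStreaming)  # True

    # stream_df.write.parquet('/tmp/out')  # would raise AnalysisException, as in the issue

    # Streaming DataFrames are written with writeStream instead:
    query = stream_df.writeStream.format('memory').queryName('rate_sink').start()
    query.stop()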

    opened by albertfreist 1
  • As a user, I want to filter and/or search all my pipeline runs

    As a user, I want to filter and/or search all my pipeline runs

    Is your feature request related to a problem? Please describe.
    I have thousands of pipeline runs for a single pipeline. I want to filter and search for a specific one using values in the pipeline run’s variables.

    Example: filter or search by execution date range

    opened by tommydangerous 0
  • Can you provide support for user authentication so that we can set up different groups of users?

    Can you provide support for user authentication so that we can set up different groups of users?

    Is your feature request related to a problem? Please describe.
    We are planning to set up a local instance of Mage and are wondering if it is possible to support user authentication. We would like different users to be able to have their own unique account settings.

    Describe the solution you'd like
    User authentication.

    Describe alternatives you've considered
    Grouping on a shared account.

    Additional context
    N/A

    opened by c0090555 3
  • [Docs] Add initial documentation setup and configuration

    [Docs] Add initial documentation setup and configuration

    Summary

    This PR enables the initial documentation setup for Mage, currently available at docs.mage.ai


    🚀 Setup

    Simply merge in this PR and your documentation will be connected!

    👩‍💻 Development

    Install the Mintlify CLI to preview the documentation changes locally. To install, use the following command

    npm i mintlify -g
    

    Run the following command at the root of your documentation (where mint.json is)

    mintlify dev
    

    😎 Publishing Changes

    Changes will be deployed to production automatically after pushing to the default branch.

    You can also preview changes using PRs, which generates a preview link of the docs.

    Troubleshooting

    • Mintlify dev isn't running - Run mintlify install; it'll re-install dependencies.
    • Mintlify dev is updating really slowly - Run mintlify clear to clear the cache.
    opened by hanywang2 0
  • As a user, I want to be able to run complex pipelines where each task runs in its own Docker container

    As a user, I want to be able to run complex pipelines where each task runs in its own Docker container

    Is your feature request related to a problem? Please describe.
    Yes, the problem is that in complex production environments, different tasks within a pipeline often require different programming languages and/or Python environments. This can make it difficult to manage and execute these tasks in an efficient and scalable way.

    Describe the solution you'd like
    I would like the ability to define pipelines where tasks can be executed in separate Docker containers, potentially using multiple Docker images. This would allow for better isolation and management of the different environments and languages required for each task.

    Describe alternatives you've considered
    One alternative that I have considered is using Argo to run tasks as Docker containers. However, using YAML to define pipelines can become unwieldy for larger projects and organizations. In the past, I have worked on production projects where we used Airflow with the KubernetesPodOperator, which provided a more scalable solution.

    Additional context
    If Mage is intended as a replacement for Airflow, it is important that it support the most common setup used by mature organizations in production environments. This includes the ability to run each task in its own isolated environment using Docker containers.

    opened by asydorchuk 3