Is your feature request related to a problem? Please describe. Hi, I'm thinking about bringing a new type of training data (graph data) to the SQLFlow database. This is significant since many real world data are non-euclidean such as graphs, and people who uses SQLFlow may encounter such data in their tasks. Deep learning models such as GCN and GAT are powerful to solve graph related problems, and they would be helpful if we include them in the library in the future. However, before we bring these models to the SQLFlow, it would be convenient to have a pre-load graph dataset such as cora in the SQLFlow database so that these models can be trained and tested easily.

This is a rough idea, and the following are some my thoughts on the solutions. It would be good if we could discuss it a bit and any suggestions are appreciated!

Describe the solution you'd like

Part I. Database schema

If we want to use graph related DL models to solve real world problems, there are two things that need to be provided: features, which are the information contained within each node in the graph, and adjacency matrix which represents the graph structure in the format of a matrix (this could be calculated by an edge list). features should be a 2-D tensor with shape (N,D) where N is the number of nodes and D is the dimension of each node's feature vector. adjacency matrix is a 2-D sparse tensor with shape (N,N).

Thus, I'm considering to have two tables in the database, which would be enough to maintain all the information we need in a graph.

Node Table: store the information of each node in one table.

Node Table

id  | name | features | label
-----------------------------------------
105 | node1 | "0 0 1" or [0, 0, 1] | "L1"
106 | node2 | "0 1 0" or [0, 1, 0] | "L2"

The features may be in the form of arrays or vectors, and I guess storing them with type TEXT or JSON would be efficient.

Edge Table: store the graph structures in the form of edges in one table.

Edge Table

id | from_node_id | to_node_id | weight
---------------------------------------
1  | 105          | 106        | 1.0
2  | 106          | 105        | 2.5

From my perspective, these two tables are efficient and powerful to handle most of the graph data. If you find some corner cases that make this design vulnerable, please comment below.

Part II. Loading data

(I'm not familiar with how SQLFlow pass data into Python, so I skip the process of getting data from the two tables above.) The difficult part of loading data is to build the adjacency matrix of the graph. Here are two solutions that I find to be good:

Scipy: Use the Scipy package to get the adjacency matrix. If we have a edge vector (or list of lists) edges with shape (E, 2) where E is the number of edges, we could build the adjacency matrix using following python script:

import numpy as np
import scipy as sp
# coo_matrix((data, (i, j)), [shape=(M, N)])
adjacency = sp.coo_matrix((np.ones(len(edges)),
                    (edges[:, 0], edges[:, 1])),
                    shape=(features.shape[0], features.shape[0]), dtype="int64")

features.shape[0] is the number of nodes (N) and the adjacency matrix adjacency has shape (N, N). The adjacency matrix is a Scipy sparse matrix in the format COO (COO is a fast format for constructing sparse matrices).

NetworkX: Use the NetworkX package to generate a graph, and then build the adjacency matrix automatically. This process can be done with following python script:

import networkx as nx
import scipy as sp
# use cora dataset as an example, and we are creating undirected graph. Use nx.DiGraph() for directed graph.
g = nx.Graph(edges) # edges must be list of lists, and each sublist (A,B) represents the edge between node A and B.
adjacency = nx.adjacency_matrix(G)

adjacency is a Scipy sparse matrix, we could convert it to the COO format by adjacency.tocoo() which will get the adjacency matrix same as above. NetworkX is a brilliant python library that can do graph and network analysis, and it has many other useful tools we may use in the future development. We could decide which method we should apply for loading the data from the database.

Additional Notes Here are some details about the dataset that I would love to add to the database:

cora dataset: The Cora dataset is about a citation network of scientific papers. It consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

If there is anything that you find valuable or needs to be improved, please let me know.Thanks!

Description run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

demo in tutorial activepower_clustering

I set mysql database locally with sql script

so I have the same data with demo.

then run: %%sqlflow SELECT * FROM sql_flow_test.activepower_train TO TRAIN sqlflow_models.DeepEmbeddingClusterModel WITH model.n_clusters=3, model.pretrain_epochs=10, model.train_max_iters=800, model.train_lr=0.01, model.pretrain_lr=1, train.batch_size=256 COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48 INTO sqlflow_models.my_activepower_train_model;

run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

==== `2021/05/20 10:02:13 SQLFlow Step Execute:

SELECT * FROM sql_flow_test.activepower_train

TO TRAIN sqlflow_models.DeepEmbeddingClusterModel

WITH

model.n_clusters=3,

model.pretrain_epochs=10,

model.train_max_iters=800,

model.train_lr=0.01,

model.pretrain_lr=1,

train.batch_size=256

COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48

INTO sqlflow_models.my_activepower_train_model;

Start training using keras model...

2021-05-20 10:02:38.341380 Start pre_train.

2021-05-20 10:02:38.341461 Start preparing training dataset to save into memory.

message:<message:"runSQLProgram error: failed: exit status 1\n==========Generated Code:==========\n# -- coding: utf-8 --\nimport copy\nimport traceback\nimport tensorflow as tf\nimport runtime\nfrom runtime.tensorflow.train import train\nfrom runtime.tensorflow.get_tf_version import tf_is_version2\nfrom tensorflow.estimator import (DNNClassifier,\n DNNRegressor,\n LinearClassifier,\n LinearRegressor,\n BoostedTreesClassifier,\n BoostedTreesRegressor,\n DNNLinearCombinedClassifier,\n DNNLinearCombinedRegressor)\nif tf_is_version2():\n from tensorflow.keras.optimizers import Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop, SGD\n from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge\nelse:\n from tensorflow.train import AdadeltaOptimizer, AdagradOptimizer, AdamOptimizer, FtrlOptimizer, RMSPropOptimizer, GradientDescentOptimizer, MomentumOptimizer\n from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge\ntry:\n import sqlflow_models\nexcept Exception as e:\n print("failed to import sqlflow_models: %s", e)\n traceback.print_exc()\n\nfeature_column_names = [\n"dates",\n\n"m1",\n\n"m2",\n\n"m3",\n\n"m4",\n\n"m5",\n\n"m6",\n\n"m7",\n\n"m8",\n\n"m9",\n\n"m10",\n\n"m11",\n\n"m12",\n\n"m13",\n\n"m14",\n\n"m15",\n\n"m16",\n\n"m17",\n\n"m18",\n\n"m19",\n\n"m20",\n\n"m21",\n\n"m22",\n\n"m23",\n\n"m24",\n\n"m25",\n\n"m26",\n\n"m27",\n\n"m28",\n\n"m29",\n\n"m30",\n\n"m31",\n\n"m32",\n\n"m33",\n\n"m34",\n\n"m35",\n\n"m36",\n\n"m37",\n\n"m38",\n\n"m39",\n\n"m40",\n\n"m41",\n\n"m42",\n\n"m43",\n\n"m44",\n\n"m45",\n\n"m46",\n\n"m47",\n\n"m48",\n\n"class",\n]\n\n# feature_column_names_map is used to determine the order of feature columns of each target:\n# e.g. when using DNNLinearCombinedClassifer.\n# feature_column_names_map will be saved to a single file when using PAI.\nfeature_column_names_map = dict()\n\nfeature_column_names_map["feature_columns"] = ["dates","m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12","m13","m14","m15","m16","m17","m18","m19","m20","m21","m22","m23","m24","m25","m26","m27","m28","m29","m30","m31","m32","m33","m34","m35","m36","m37","m38","m39","m40","m41","m42","m43","m44","m45","m46","m47","m48","class",]\n\n\nfeature_metas = dict()\n\n\nfeature_metas["dates"] = {\n "feature_name": "dates",\n "dtype": "string",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m1"] = {\n "feature_name": "m1",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m2"] = {\n "feature_name": "m2",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m3"] = {\n "feature_name": "m3",\n "dtype": "float32",\n "delimiter": "",\n "format": ""

,\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m4"] = {\n "feature_name": "m4",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m5"] = {\n "feature_name": "m5",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m6"] = {\n "feature_name": "m6",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m7"] = {\n "feature_name": "m7",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m8"] = {\n "feature_name": "m8",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m9"] = {\n "feature_name": "m9",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m10"] = {\n "feature_name": "m10",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m11"] = {\n "feature_name": "m11",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m12"] = {\n "feature_name": "m12",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m13"] = {\n "feature_name": "m13",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m14"] = {\n "feature_name": "m14",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m15"] = {\n "feature_name": "m15",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m16"] = {\n "feature_name": "m16",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m17"] = {\n "feature_name": "m17",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m18"] = {\n "feature_name": "m18",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "

dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m19"] = {\n "feature_name": "m19",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m20"] = {\n "feature_name": "m20",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m21"] = {\n "feature_name": "m21",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m22"] = {\n "feature_name": "m22",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m23"] = {\n "feature_name": "m23",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m24"] = {\n "feature_name": "m24",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m25"] = {\n "feature_name": "m25",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m26"] = {\n "feature_name": "m26",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m27"] = {\n "feature_name": "m27",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m28"] = {\n "feature_name": "m28",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m29"] = {\n "feature_name": "m29",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m30"] = {\n "feature_name": "m30",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m31"] = {\n "feature_name": "m31",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m32"] = {\n "feature_name": "m32",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m33"] = {\n "feature_name": "m33",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\n

feature_metas["m34"] = {\n "feature_name": "m34",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m35"] = {\n "feature_name": "m35",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m36"] = {\n "feature_name": "m36",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m37"] = {\n "feature_name": "m37",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m38"] = {\n "feature_name": "m38",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m39"] = {\n "feature_name": "m39",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m40"] = {\n "feature_name": "m40",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m41"] = {\n "feature_name": "m41",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m42"] = {\n "feature_name": "m42",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m43"] = {\n "feature_name": "m43",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m44"] = {\n "feature_name": "m44",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m45"] = {\n "feature_name": "m45",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m46"] = {\n "feature_name": "m46",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m47"] = {\n "feature_name": "m47",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m48"] = {\n "feature_name": "m48",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["class"] = {\n "feature_name": "class"

,\n "dtype": "int64",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\n\n\nlabel_meta = {\n "feature_name": "",\n "dtype": "int64",\n "delimiter": "",\n "shape": [1],\n "is_sparse": "false" == "true"\n}\n\nmodel_params=dict()\n\nmodel_params["n_clusters"]=3\n\nmodel_params["pretrain_epochs"]=10\n\nmodel_params["pretrain_lr"]=1\n\nmodel_params["train_lr"]=0.010000\n\nmodel_params["train_max_iters"]=800\n\n\n# Construct optimizer objects to pass to model initializer.\n# The original model_params is serializable (do not have tf.xxx objects).\nmodel_params_constructed = copy.deepcopy(model_params)\nfor optimizer_arg in ["optimizer", "dnn_optimizer", "linear_optimizer"]:\n if optimizer_arg in model_params_constructed:\n model_params_constructed[optimizer_arg] = eval(model_params_constructed[optimizer_arg])\n\nif "loss" in model_params_constructed:\n model_params_constructed["loss"] = eval(model_params_constructed["loss"])\n\n# feature_columns_code will be used to save the training informations together\n# with the saved model.\nfeature_columns_code = """{"feature_columns": [tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(key="dates", vocabulary_list=["2/26","3/5","3/17","3/23","4/10","5/5","1/13","1/20","2/24","4/24","6/23","1/6","1/9","6/3","2/10","3/16","2/18","4/11","4/22","5/3","6/22","1/10","1/30","6/4","6/11","6/17","6/26","1/28","5/20","5/21","2/11","3/1","3/8","4/26","5/6","5/9","1/21","2/4","5/17","4/21","4/25","5/29","6/27","2/23","3/3","4/18","4/23","5/12","2/2","3/14","3/22","4/2","4/20","5/8","5/23","5/25","1/26","1/27","6/1","4/13","1/22","3/13","4/6","4/19","2/7","3/27","4/28","6/2","6/8","1/4","1/25","3/9","3/12","3/21","4/5","6/6","2/15","3/6","2/13","3/29","2/8","2/12","2/28","3/2","3/31","4/16","5/14","1/8","1/17","5/7","5/24","2/21","2/25","3/18","4/3","6/7","6/10","6/29","2/16","3/7","6/14","6/16","6/24","1/2","3/20","4/12","5/31","2/5","4/1","5/22","6/9","6/15","6/19","6/25","6/30","1/16","3/24","3/30","6/28","1/15","1/23","1/31","2/17","3/25","4/7","1/7","1/14","5/30","6/12","3/11","3/28","5/10","1/1","1/3","2/27","3/19","4/14","4/30","5/28","6/18","1/12","1/18","6/20","4/9","5/4","5/11","5/13","5/16","1/29","2/1","2/9","2/20","5/18","5/19","1/5","1/24","5/26","6/21","3/15","5/15","4/8","4/15","5/27","1/11","4/4","2/19","3/4","3/26","4/27","4/29","6/13","1/19","2/6","3/10","4/17","5/2","2/14","2/22","6/5","2/3","5/1"]), dimension=128, combiner="sum"),\ntf.feature_column.numeric_column("m1", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m2", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m3", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m4", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m5", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m6", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m7", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m8", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m9", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m10", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m11", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m12", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m13", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_c

olumn("m14", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m15", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m16", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m17", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m18", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m19", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m20", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m21", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m22", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m23", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m24", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m25", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m26", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m27", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m28", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m29", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m30", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m31", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m32", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m33", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m34", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m35", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m36", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m37", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m38", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m39", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m40", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m41", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m42", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m43", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m44", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m45", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m46", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m47", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m48", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("class", shape=[1], dtype=tf.dtypes.int64)]}"""\nfeature_columns = eval(feature_columns_code)\n\ntrain_max_steps = 0\ntrain_max_steps = None if train_max_steps == 0 else train_max_steps\n\ntrain(datasource="mysql://yuepf:yuepf123456@tcp(192.195.253.130:3306)/?maxAllowedPacket=0",\n estimator_string="""sqlflow_models.DeepEmbeddingClusterModel""",\n select="""\nSELECT * FROM sql_flow_test.activepower_train\n""",\n validation_select="""""",\n feature_columns=feature_columns,\n feature_column_names=feature_column_names,\n feature_metas=feature_metas,\n label_meta=label_meta,\n model_params=model_params_constructed,\n validation_metrics="Accuracy".split(","),\n save="model_save",\n batch_size=256,\n epoch=1,\n validation_steps=1,\n verbose=0,\n max_steps=train_max_steps,\n validation_start_delay_secs=0,\n validation_throttle_secs=0,\n save_checkpoints_steps=100,\n log_every_n_iter=10,\n load_pretrained_model="false" == "true",\n is_pai="false" == "true",\n pai_table="",\n pai_val_table="",\n feature_columns_code=feature_columns_code,\n model_params_code_map=model_params,\n model_repo_image="",\n or

iginal_sql='''\nSELECT * FROM sql_flow_test.activepower_train\nTO TRAIN sqlflow_models.DeepEmbeddingClusterModel\nWITH\n model.n_clusters=3,\n model.pretrain_epochs=10,\n model.train_max_iters=800,\n model.train_lr=0.01,\n model.pretrain_lr=1,\n train.batch_size=256\nCOLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48\nINTO sqlflow_models.my_activepower_train_model;\n''',\n feature_column_names_map=feature_column_names_map)\n\n==========Output==========\nTraceback (most recent call last):\n File "", line 823, in \n File "/opt/sqlflow/python/runtime/tensorflow/train.py", line 116, in train\n load_pretrained_model, model_meta, is_pai)\n File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 144, in keras_train_and_save_legacy\n validation_steps, has_none_optimizer)\n File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 155, in keras_train_compiled\n classifier.sqlflow_train_loop(train_dataset)\n File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 246, in sqlflow_train_loop\n self.pre_train(x)\n File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 153, in pre_train\n y = x.cache().map(map_func=_concate_generate)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1900, in map\n MapDataset(self, map_func, preserve_cardinality=False))\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3416, in init\n use_legacy_function=use_legacy_function)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2695, in init\n self._function = wrapper_fn._get_concrete_function_internal()\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1854, in _get_concrete_function_internal\n *args, **kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected\n graph_function, _, _ = self._maybe_define_function(args, kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function\n graph_function = self._create_graph_function(args, kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function\n capture_by_value=self._capture_by_value),\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func\n func_outputs = python_func(*func_args, **func_kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in wrapper_fn\n ret = _wrapper_helper(*args)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2634, in _wrapper_helper\n ret = autograph.tf_convert(func, ag_ctx)(*nested_args)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper\n raise e.ag_error_metadata.to_exception(e)\nAttributeError: in converted code:\n\n /usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py:150 _concate_generate *\n concate_y = tf.stack([dataset_element[feature.key] for feature in self._feature_columns], axis=1)\n\n AttributeError: 'EmbeddingColumn' object has no attribute 'key'\n\n" >

workflow step failed: runSQLProgram error: failed: exit status 1

==========Generated Code:==========

-- coding: utf-8 --

import copy

import traceback

import tensorflow as tf

import runtime

from runtime.tensorflow.train import train

from runtime.tensorflow.get_tf_version import tf_is_version2

from tensorflow.estimator import (DNNClassifier,

                              DNNRegressor,

                              LinearClassifier,

                              LinearRegressor,

                              BoostedTreesClassifier,

                              BoostedTreesRegressor,

                              DNNLinearCombinedClassifier,

                              DNNLinearCombinedRegressor)

if tf_is_version2():

from tensorflow.keras.optimizers import Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop, SGD

from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge

else:

from tensorflow.train import AdadeltaOptimizer, AdagradOptimizer, AdamOptimizer, FtrlOptimizer, RMSPropOptimizer, GradientDescentOptimizer, MomentumOptimizer

from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge

try:

import sqlflow_models

except Exception as e:

print("failed to import sqlflow_models: %s", e)

traceback.print_exc()

feature_column_names = [

"dates",

"m1",

"m2",

"m3",

"m4",

"m5",

"m6",

"m7",

"m8",

"m9",

"m10",

"m11",

"m12",

"m13",

"m14",

"m15",

"m16",

"m17",

"m18",

"m19",

"m20",

"m21",

"m22",

"m23",

"m24",

"m25",

"m26",

"m27",

"m28",

"m29",

"m30",

"m31",

"m32",

"m33",

"m34",

"m35",

"m36",

"m37",

"m38",

"m39",

"m40",

"m41",

"m42",

"m43",

"m44",

"m45",

"m46",

"m47",

"m48",

"class",

]

feature_column_names_map is used to determine the order of feature columns of each target:

e.g. when using DNNLinearCombinedClassifer.

feature_column_names_map will be saved to a single file when using PAI.

feature_column_names_map = dict()

feature_column_names_map["feature_columns"] = ["dates","m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12","m13","m14","m15","m16","m17","m18","m19","m20","m21","m22","m23","m24","m25","m26","m27","m28","m29","m30","m31","m32","m33","m34","m35","m36","m37","m38","m39","m40","m41","m42","m43","m44","m45","m46","m47","m48","class",]

feature_metas = dict()

feature_metas["dates"] = {

"feature_name": "dates",

"dtype": "string",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m1"] = {

"feature_name": "m1",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m2"] = {

"feature_name": "m2",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m3"] = {

"feature_name": "m3",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m4"] = {

"feature_name": "m4",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m5"] = {

"feature_name": "m5",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m6"] = {

"feature_name": "m6",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m7"] = {

"feature_name": "m7",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m8"] = {

"feature_name": "m8",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m9"] = {

"feature_name": "m9",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m10"] = {

"feature_name": "m10",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m11"] = {

"feature_name": "m11",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m12"] = {

"feature_name": "m12",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m13"] = {

"feature_name": "m13",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m14"] = {

"feature_name": "m14",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m15"] = {

"feature_name": "m15",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m16"] = {

"feature_name": "m16",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m17"] = {

"feature_name": "m17",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m18"] = {

"feature_name": "m18",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m19"] = {

"feature_name": "m19",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m20"] = {

"feature_name": "m20",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m21"] = {

"feature_name": "m21",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m22"] = {

"feature_name": "m22",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m23"] = {

"feature_name": "m23",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m24"] = {

"feature_name": "m24",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m25"] = {

"feature_name": "m25",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m26"] = {

"feature_name": "m26",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m27"] = {

"feature_name": "m27",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m28"] = {

"feature_name": "m28",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m29"] = {

"feature_name": "m29",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m30"] = {

"feature_name": "m30",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m31"] = {

"feature_name": "m31",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m32"] = {

"feature_name": "m32",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m33"] = {

"feature_name": "m33",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m34"] = {

"feature_name": "m34",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m35"] = {

"feature_name": "m35",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m36"] = {

"feature_name": "m36",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m37"] = {

"feature_name": "m37",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m38"] = {

"feature_name": "m38",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m39"] = {

"feature_name": "m39",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m40"] = {

"feature_name": "m40",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m41"] = {

"feature_name": "m41",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m42"] = {

"feature_name": "m42",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m43"] = {

"feature_name": "m43",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m44"] = {

"feature_name": "m44",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m45"] = {

"feature_name": "m45",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m46"] = {

"feature_name": "m46",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m47"] = {

"feature_name": "m47",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["m48"] = {

"feature_name": "m48",

"dtype": "float32",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

feature_metas["class"] = {

"feature_name": "class",

"dtype": "int64",

"delimiter": "",

"format": "",

"shape": [1],

"is_sparse": "false" == "true",

"dtype_weight": "int64",

"delimiter_kv": ""

}

label_meta = {

"feature_name": "",

"dtype": "int64",

"delimiter": "",

"shape": [1],

"is_sparse": "false" == "true"

}

model_params=dict()

model_params["n_clusters"]=3

model_params["pretrain_epochs"]=10

model_params["pretrain_lr"]=1

model_params["train_lr"]=0.010000

model_params["train_max_iters"]=800

Construct optimizer objects to pass to model initializer.

The original model_params is serializable (do not have tf.xxx objects).

model_params_constructed = copy.deepcopy(model_params)

for optimizer_arg in ["optimizer", "dnn_optimizer", "linear_optimizer"]:

if optimizer_arg in model_params_constructed:

    model_params_constructed[optimizer_arg] = eval(model_params_constructed[optimizer_arg])

if "loss" in model_params_constructed:

model_params_constructed["loss"] = eval(model_params_constructed["loss"])

feature_columns_code will be used to save the training informations together

with the saved model.

feature_columns_code = """{"feature_columns": [tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(key="dates", vocabulary_list=["2/26","3/5","3/17","3/23","4/10","5/5","1/13","1/20","2/24","4/24","6/23","1/6","1/9","6/3","2/10","3/16","2/18","4/11","4/22","5/3","6/22","1/10","1/30","6/4","6/11","6/17","6/26","1/28","5/20","5/21","2/11","3/1","3/8","4/26","5/6","5/9","1/21","2/4","5/17","4/21","4/25","5/29","6/27","2/23","3/3","4/18","4/23","5/12","2/2","3/14","3/22","4/2","4/20","5/8","5/23","5/25","1/26","1/27","6/1","4/13","1/22","3/13","4/6","4/19","2/7","3/27","4/28","6/2","6/8","1/4","1/25","3/9","3/12","3/21","4/5","6/6","2/15","3/6","2/13","3/29","2/8","2/12","2/28","3/2","3/31","4/16","5/14","1/8","1/17","5/7","5/24","2/21","2/25","3/18","4/3","6/7","6/10","6/29","2/16","3/7","6/14","6/16","6/24","1/2","3/20","4/12","5/31","2/5","4/1","5/22","6/9","6/15","6/19","6/25","6/30","1/16","3/24","3/30","6/28","1/15","1/23","1/31","2/17","3/25","4/7","1/7","1/14","5/30","6/12","3/11","3/28","5/10","1/1","1/3","2/27","3/19","4/14","4/30","5/28","6/18","1/12","1/18","6/20","4/9","5/4","5/11","5/13","5/16","1/29","2/1","2/9","2/20","5/18","5/19","1/5","1/24","5/26","6/21","3/15","5/15","4/8","4/15","5/27","1/11","4/4","2/19","3/4","3/26","4/27","4/29","6/13","1/19","2/6","3/10","4/17","5/2","2/14","2/22","6/5","2/3","5/1"]), dimension=128, combiner="sum"),

tf.feature_column.numeric_column("m1", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m2", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m3", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m4", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m5", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m6", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m7", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m8", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m9", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m10", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m11", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m12", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m13", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m14", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m15", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m16", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m17", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m18", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m19", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m20", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m21", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m22", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m23", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m24", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m25", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m26", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m27", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m28", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m29", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m30", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m31", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m32", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m33", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m34", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m35", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m36", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m37", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m38", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m39", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m40", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m41", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m42", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m43", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m44", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m45", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m46", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m47", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m48", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("class", shape=[1], dtype=tf.dtypes.int64)]}"""

feature_columns = eval(feature_columns_code)

train_max_steps = 0

train_max_steps = None if train_max_steps == 0 else train_max_steps

train(datasource="mysql://yuepf:yuepf123456@tcp(192.195.253.130:3306)/?maxAllowedPacket=0",

  estimator_string="""sqlflow_models.DeepEmbeddingClusterModel""",

  select="""

SELECT * FROM sql_flow_test.activepower_train

""",

  validation_select="""""",

  feature_columns=feature_columns,

  feature_column_names=feature_column_names,

  feature_metas=feature_metas,

  label_meta=label_meta,

  model_params=model_params_constructed,

  validation_metrics="Accuracy".split(","),

  save="model_save",

  batch_size=256,

  epoch=1,

  validation_steps=1,

  verbose=0,

  max_steps=train_max_steps,

  validation_start_delay_secs=0,

  validation_throttle_secs=0,

  save_checkpoints_steps=100,

  log_every_n_iter=10,

  load_pretrained_model="false" == "true",

  is_pai="false" == "true",

  pai_table="",

  pai_val_table="",

  feature_columns_code=feature_columns_code,

  model_params_code_map=model_params,

  model_repo_image="",

  original_sql='''

SELECT * FROM sql_flow_test.activepower_train

TO TRAIN sqlflow_models.DeepEmbeddingClusterModel

WITH

model.n_clusters=3,

model.pretrain_epochs=10,

model.train_max_iters=800,

model.train_lr=0.01,

model.pretrain_lr=1,

train.batch_size=256

COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48

INTO sqlflow_models.my_activepower_train_model;

''',

  feature_column_names_map=feature_column_names_map)

==========Output==========

Traceback (most recent call last):

File "", line 823, in

File "/opt/sqlflow/python/runtime/tensorflow/train.py", line 116, in train

load_pretrained_model, model_meta, is_pai)

File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 144, in keras_train_and_save_legacy

validation_steps, has_none_optimizer)

File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 155, in keras_train_compiled

classifier.sqlflow_train_loop(train_dataset)

File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 246, in sqlflow_train_loop

self.pre_train(x)

File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 153, in pre_train

y = x.cache().map(map_func=_concate_generate)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1900, in map

MapDataset(self, map_func, preserve_cardinality=False))

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3416, in init

use_legacy_function=use_legacy_function)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2695, in init

self._function = wrapper_fn._get_concrete_function_internal()

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1854, in _get_concrete_function_internal

*args, **kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected

graph_function, _, _ = self._maybe_define_function(args, kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function

graph_function = self._create_graph_function(args, kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function

capture_by_value=self._capture_by_value),

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func

func_outputs = python_func(*func_args, **func_kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in wrapper_fn

ret = _wrapper_helper(*args)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2634, in _wrapper_helper

ret = autograph.tf_convert(func, ag_ctx)(*nested_args)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper

raise e.ag_error_metadata.to_exception(e)

AttributeError: in converted code:

/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py:150 _concate_generate  *

    concate_y = tf.stack([dataset_element[feature.key] for feature in self._feature_columns], axis=1)

AttributeError: 'EmbeddingColumn' object has no attribute 'key'`

Environment (Please complete the following information):

OS:Windos 10
Browser:Chrome
docker destop
Docker Engine : v20.10.6
Kubernetes: v1.19.7

[Proposal] Design proposal for adding graph data to SQLFlow database
Is your feature request related to a problem? Please describe. Hi, I'm thinking about bringing a new type of training data (graph data) to the SQLFlow database. This is significant since many real world data are non-euclidean such as graphs, and people who uses SQLFlow may encounter such data in their tasks. Deep learning models such as GCN and GAT are powerful to solve graph related problems, and they would be helpful if we include them in the library in the future. However, before we bring these models to the SQLFlow, it would be convenient to have a pre-load graph dataset such as cora in the SQLFlow database so that these models can be trained and tested easily.

This is a rough idea, and the following are some my thoughts on the solutions. It would be good if we could discuss it a bit and any suggestions are appreciated!

Describe the solution you'd like

Part I. Database schema

If we want to use graph related DL models to solve real world problems, there are two things that need to be provided: features, which are the information contained within each node in the graph, and adjacency matrix which represents the graph structure in the format of a matrix (this could be calculated by an edge list). features should be a 2-D tensor with shape (N,D) where N is the number of nodes and D is the dimension of each node's feature vector. adjacency matrix is a 2-D sparse tensor with shape (N,N).

Thus, I'm considering to have two tables in the database, which would be enough to maintain all the information we need in a graph.

Node Table: store the information of each node in one table.

Node Table id | name | features | label ----------------------------------------- 105 | node1 | "0 0 1" or [0, 0, 1] | "L1" 106 | node2 | "0 1 0" or [0, 1, 0] | "L2"

The features may be in the form of arrays or vectors, and I guess storing them with type TEXT or JSON would be efficient.

Edge Table: store the graph structures in the form of edges in one table.

Edge Table id | from_node_id | to_node_id | weight --------------------------------------- 1 | 105 | 106 | 1.0 2 | 106 | 105 | 2.5

From my perspective, these two tables are efficient and powerful to handle most of the graph data. If you find some corner cases that make this design vulnerable, please comment below.

Part II. Loading data

(I'm not familiar with how SQLFlow pass data into Python, so I skip the process of getting data from the two tables above.) The difficult part of loading data is to build the adjacency matrix of the graph. Here are two solutions that I find to be good:

Scipy: Use the Scipy package to get the adjacency matrix. If we have a edge vector (or list of lists) edges with shape (E, 2) where E is the number of edges, we could build the adjacency matrix using following python script:

import numpy as np import scipy as sp # coo_matrix((data, (i, j)), [shape=(M, N)]) adjacency = sp.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(features.shape[0], features.shape[0]), dtype="int64")

features.shape[0] is the number of nodes (N) and the adjacency matrix adjacency has shape (N, N). The adjacency matrix is a Scipy sparse matrix in the format COO (COO is a fast format for constructing sparse matrices).

NetworkX: Use the NetworkX package to generate a graph, and then build the adjacency matrix automatically. This process can be done with following python script:

import networkx as nx import scipy as sp # use cora dataset as an example, and we are creating undirected graph. Use nx.DiGraph() for directed graph. g = nx.Graph(edges) # edges must be list of lists, and each sublist (A,B) represents the edge between node A and B. adjacency = nx.adjacency_matrix(G)

adjacency is a Scipy sparse matrix, we could convert it to the COO format by adjacency.tocoo() which will get the adjacency matrix same as above. NetworkX is a brilliant python library that can do graph and network analysis, and it has many other useful tools we may use in the future development. We could decide which method we should apply for loading the data from the database.

Additional Notes Here are some details about the dataset that I would love to add to the database:

cora dataset: The Cora dataset is about a citation network of scientific papers. It consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

If there is anything that you find valuable or needs to be improved, please let me know.Thanks!
Rethink syntax extension
During our efforts to deploy SQLFlow in some real cases, we tried to extend the syntax of some SQL dialects, including MySQL and Hive. In these efforts, we learned an art:

Try reusing existing SQL reserved words in the extended syntax.

Why this art? Let us look at an example in https://github.com/sql-machine-learning/sqlflow/issues/473. Because TRAIN isn't a reserved word, users could name a field by TRAIN, and this might confuse our parser. The solution is to either replace TRAIN by a reserved word, where we couldn't find one that expresses the meaning of "train", or, extend TRAIN into TO TRAIN, where, please be aware that TO is a reserved word.

It seems that other clause extensions after TO TRAIN, TO PREDICT, and TO EXPLAIN can use arbitrary words, as they are not going to be parsed by a SQL (dialect) parser, but the SQLFlow parser. However, it is not that simple. Consider that a table might have a field named label, and this field happens to be a label when we train a model. The SQL statement would look like

SELECT a, b, label FROM tbl TO TRAIN Model LABEL label

This would confuse the parser. Our current workaround is this. However, a complete solutions seems that we don't use OUTPUT to replace LABEL. OUTPUT is a SQL keyword, so users are not supposed to name a field by OUTPUT. (This might not be true always, but, at least, it seems that the probability of OUTPUT output is smaller than that of LABEL label.)

Please vote for the following syntax changes:
Start to jupyter iris-dnn examples, then encounter this problem

Description When I start to open a web browser, open iris-dnn.ipynb file and run the code... %%sqlflow describe iris.train;

But I encounter this problem: /miniconda/envs/sqlflow-dev/lib/python3.6/site-packages/grpc/_channel.py in _next(self) 363 raise StopIteration() 364 else: --> 365 raise self 366 367 def _response_ready():

_Rendezvous: <_Rendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses" debug_error_string = "{"created":"@1572500632.931119638","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3876,"referenced_errors":[{"created":"@1572500571.009073496","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":395,"grpc_status":14}]}"

Who can help me? Thanks reaaaally much!
Switch the connection parameters while sqlflow is running

Is your feature request related to a problem? Please describe. In the production environment, we can't specify the data source connection parameters when SQLFlow starts. Instead, we know the connection parameters only after the user logs on. Therefore, we should support this connection parameter change after the startup is complete.
Proposal: Managing external & internal parsers
Problem

SQLFlow calls Java HiveQL/ODPS parser to help to parse the SQL program. As shown in the following dependency graph, each parser is wrapped as a gRPC server which takes a string and returns the parsed result.

However, some parsers like ODPS parser can't be open-sourced. This leads to the circular dependencies between the internal code and open-sourced code, i.e. the open-sourced Java gRPC server needs to call ODPS parser while the close-sourced ODPS parser needs to return ParseResult.

Solution

We remove the dependencies from gRPC server to ODPS parser via dynamic loading. To be specific, the open-sourced code creates ODPS parser instance by the following.

Object parser = Class.forName("org.sqlflow.parser.internal.odpsParser....").newInstance(); ((ParserInterface)parser).parse(sql);

By doing so, we only have a one-way dependency from internal repo to the GitHub repo.

Implementation Details

GitHub repo:

Each parser should be an extension of a ParserInterface.

interface ParseInterface { ParseResult parse(String sql); }

Internal repo:

The building process starts from building the GitHub repo into a .jar file then add it to the Maven project.

The deployment of the internal .jar file should be the same as the open-sourced .jar file, since they share the same entry point.

cc @typhoonzero @weiguoz
SQLFlow Product Roadmap 2019

I am trying to summarize the milestones as follows. @sql-machine-learning/sqlflow team please comment.

Engineering time: 6 fulltime months could mean 3 people spend 2 months full time working on the project.

| Release | ETA | Features | Engineering Time | |---------|-------|--------------------------------------------------------|---------------------| | Alpha | 04/20 | ~~Open source SQLFlow Core~~ | 10+ fulltime months | | | 05/20 | ~~Support customized model with Keras~~ | 1+ fulltime months | | | 05/20 | ~~Open source Go Hive driver~~ | 3+ fulltime months | | | 06/20 | ~~Open source Go MaxCompute driver~~ | 3+ fulltime months | | 0.1.0 | 07/01 | Hive: local training & prediction | 2+ fulltime months | | | 07/20 | Support distributed training & prediction | 3+ fulltime months | | 0.2.0 | 08/01 | MaxCompute: local training & predicting | 3+ fulltime months | | 0.3.0 | 10/15 | Elastic scheduling training & prediction | 3+ fulltime months | | 0.4.0 | 11/20 | SQLFlow cloud release on AliCloud | 2+ fulltime months | | | | | | | | ? | Support third party submitter | | | | ? | Support GPU or TensorFlow-GPU | | | | ? | ML on image/audio/video | | | | ? | Calcite parser | | | | ? | Open source gosparksql | | | | ? | SparkSQL as data source: local training & predicting | | | | ? | SQL Server as data source: local training & predicting | |
Make SQLFlow parser work together with Apache Calcite
Currently, we use sql/sql.y, which, compiles into sql/parser.go, to parse a SQL statement. This parser has some limitations:

It understands our extended SQL syntax for the SELECT statement, for example,

SELECT * FROM a_table TRAIN DNNClassifier INTO a_model;

but cannot understand any other SQL statements, for example, USE my_database;

It understands the very basic syntax of standard SELECT, for example, it doesn't allow us to write a nested SELECT with the TRAIN and INTO suffix.

To make a parser that understands SQL completely and the extended SELECT statement, I am considering a hybrid of Calcite and our parser. The following example might help to explain how the hybrid parser works.

Throw the above example statementSELECT ... TRAIN ... or even a more complex one like SELECT ... FROM (SELECT ...) TRAIN ..., to Calcite parser. The Calcite parser should error because it doesn't understand TRAIN. If it happens that the error message contains information about the location of TRAIN in the input string, then we can split the input string at the location, and the left part is the SELECT or nested-SELECT, and the right part consists of TRAIN ....

Throw the left part to Calcite parser again. This time the parser should be OK with it.

Throw the right part to our parser (modified to ignore the SELECT ... prefix), and it should be able to parse the TRAIN clause and provides information for SQLFlow code generator to create a submitter.py, which takes the left part that passed the syntax check of Calcite. The submitter then sends the left part to the SQL engine (MySQL or Hive) and reads the output for training or prediction.
Determine the ordering of records for a SELECT statement in MySQL
In SQL, a select statement does not guarantee consistent ordering between runs.

Say a user uses

SELECT * FROM iris.iris PREDICT iris.predict.class USING my_dnn_model;

to predict class and store the prediction into table iris.predict, he/she can not directly compare it with ground truth iris.iris.class because the row ordering can be different.

So in order to produce useful prediction, we need to figure out a way to solve the correspondence.

ref: SQL doesn't guarantee ordering
Potential issue when running tests concurrently
Currently main_test.go runs end2end tests by first populating the data to database. This would normally work well for local DB inside docker container. However, for tests related to ODPS/Maxcompute, we are using a real database using SQL in iris_sql.go which looks like the following:

DROP TABLE IF EXISTS gomaxcompute_driver_w7u.sqlflow_test_iris_train; CREATE TABLE gomaxcompute_driver_w7u.sqlflow_test_iris_train ...

Since the table name is fixed, if the multiple test runs are executed concurrently, there's a chance that either the table is being dropped or being created, which might fail other test runs that rely on this particular table. I've seen this happen several times when developing locally or testing on Travis CI.
Data sharding problem in distributed XGBoost training using rabit
I am not sure how XGBoost distributed training using rabit shards input data. Supposing that the whole dataset is D, there are 2 usually methods to shard input data in distributed training:

Method 1: we shard dataset D beforehand into N pieces, where N is the number of workers. Then, we distribute each piece of dataset to each worker. In this way, each worker reads unique data.

Method 2: all workers can read the whole dataset D, and the each worker only pick up 1/N data inside the whole dataset D, say, what tf.data.Dataset.shard does.

In the doc of XGBoost and rabit, I have not found whether method 1 or 2 is used. But in the implementation of XGBoost, I found that maybe XGBoost uses method 2.

Link to my issue of XGBoost repo: https://github.com/dmlc/xgboost/issues/5694

File '/tmp/sqlflow227199809/input.sql' cannot be read

Description It seems that SQLFlow cannot read temporary file saved on server.

Reproduction Steps After restart the kernel of notebook, the following exception will be thrown out.

_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "thirdPartyParse failed: java.io.IOException: File '/tmp/sqlflow227199809/input.sql' cannot be read
	at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:294)
	at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851)
	at org.sqlflow.parser.ParserAdaptorCmd.main(ParserAdaptorCmd.java:42)
 exit status 255"
	debug_error_string = "{"created":"@1574340457.693353044","description":"Error received from peer ipv4:10.82.128.7:8005","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"thirdPartyParse failed: java.io.IOException: File '/tmp/sqlflow227199809/input.sql' cannot be read\n\tat org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:294)\n\tat org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851)\n\tat org.sqlflow.parser.ParserAdaptorCmd.main(ParserAdaptorCmd.java:42)\n exit status 255","grpc_status":2}"
>

Expected Behavior What you expected to happen.

Screenshots

OS:
Browser:
Version:

Additional Notes

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.
Add SECURITY.md

Hello 👋

I run a security community that finds and fixes vulnerabilities in OSS. A researcher (@whokilleddb) has found a potential issue, which I would be eager to share with you.

Could you add a SECURITY.md file with an e-mail address for me to send further details to? GitHub recommends a security policy to ensure issues are responsibly disclosed, and it would help direct researchers in the future.

Looking forward to hearing from you 👍

(cc @huntr-helper)
The link provided by the documentation is unable to recogniz
document URL : https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/run/workflow_mode.md step： Setup Kubernetes and Argo

kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/stable/manifests/install.yaml

this URL unable to recognize，Need to update to

kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/v3.0.3/manifests/install.yaml
run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''
Description run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

demo in tutorial activepower_clustering

I set mysql database locally with sql script

so I have the same data with demo.

then run: %%sqlflow SELECT * FROM sql_flow_test.activepower_train TO TRAIN sqlflow_models.DeepEmbeddingClusterModel WITH model.n_clusters=3, model.pretrain_epochs=10, model.train_max_iters=800, model.train_lr=0.01, model.pretrain_lr=1, train.batch_size=256 COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48 INTO sqlflow_models.my_activepower_train_model;

run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

==== `2021/05/20 10:02:13 SQLFlow Step Execute:

SELECT * FROM sql_flow_test.activepower_train

TO TRAIN sqlflow_models.DeepEmbeddingClusterModel

WITH

model.n_clusters=3,

model.pretrain_epochs=10,

model.train_max_iters=800,

model.train_lr=0.01,

model.pretrain_lr=1,

train.batch_size=256

COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48

INTO sqlflow_models.my_activepower_train_model;

Start training using keras model...

2021-05-20 10:02:38.341380 Start pre_train.

2021-05-20 10:02:38.341461 Start preparing training dataset to save into memory.

message:<message:"runSQLProgram error: failed: exit status 1\n==========Generated Code:==========\n# -- coding: utf-8 --\nimport copy\nimport traceback\nimport tensorflow as tf\nimport runtime\nfrom runtime.tensorflow.train import train\nfrom runtime.tensorflow.get_tf_version import tf_is_version2\nfrom tensorflow.estimator import (DNNClassifier,\n DNNRegressor,\n LinearClassifier,\n LinearRegressor,\n BoostedTreesClassifier,\n BoostedTreesRegressor,\n DNNLinearCombinedClassifier,\n DNNLinearCombinedRegressor)\nif tf_is_version2():\n from tensorflow.keras.optimizers import Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop, SGD\n from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge\nelse:\n from tensorflow.train import AdadeltaOptimizer, AdagradOptimizer, AdamOptimizer, FtrlOptimizer, RMSPropOptimizer, GradientDescentOptimizer, MomentumOptimizer\n from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge\ntry:\n import sqlflow_models\nexcept Exception as e:\n print("failed to import sqlflow_models: %s", e)\n traceback.print_exc()\n\nfeature_column_names = [\n"dates",\n\n"m1",\n\n"m2",\n\n"m3",\n\n"m4",\n\n"m5",\n\n"m6",\n\n"m7",\n\n"m8",\n\n"m9",\n\n"m10",\n\n"m11",\n\n"m12",\n\n"m13",\n\n"m14",\n\n"m15",\n\n"m16",\n\n"m17",\n\n"m18",\n\n"m19",\n\n"m20",\n\n"m21",\n\n"m22",\n\n"m23",\n\n"m24",\n\n"m25",\n\n"m26",\n\n"m27",\n\n"m28",\n\n"m29",\n\n"m30",\n\n"m31",\n\n"m32",\n\n"m33",\n\n"m34",\n\n"m35",\n\n"m36",\n\n"m37",\n\n"m38",\n\n"m39",\n\n"m40",\n\n"m41",\n\n"m42",\n\n"m43",\n\n"m44",\n\n"m45",\n\n"m46",\n\n"m47",\n\n"m48",\n\n"class",\n]\n\n# feature_column_names_map is used to determine the order of feature columns of each target:\n# e.g. when using DNNLinearCombinedClassifer.\n# feature_column_names_map will be saved to a single file when using PAI.\nfeature_column_names_map = dict()\n\nfeature_column_names_map["feature_columns"] = ["dates","m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12","m13","m14","m15","m16","m17","m18","m19","m20","m21","m22","m23","m24","m25","m26","m27","m28","m29","m30","m31","m32","m33","m34","m35","m36","m37","m38","m39","m40","m41","m42","m43","m44","m45","m46","m47","m48","class",]\n\n\nfeature_metas = dict()\n\n\nfeature_metas["dates"] = {\n "feature_name": "dates",\n "dtype": "string",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m1"] = {\n "feature_name": "m1",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m2"] = {\n "feature_name": "m2",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m3"] = {\n "feature_name": "m3",\n "dtype": "float32",\n "delimiter": "",\n "format": ""

,\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m4"] = {\n "feature_name": "m4",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m5"] = {\n "feature_name": "m5",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m6"] = {\n "feature_name": "m6",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m7"] = {\n "feature_name": "m7",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m8"] = {\n "feature_name": "m8",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m9"] = {\n "feature_name": "m9",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m10"] = {\n "feature_name": "m10",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m11"] = {\n "feature_name": "m11",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m12"] = {\n "feature_name": "m12",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m13"] = {\n "feature_name": "m13",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m14"] = {\n "feature_name": "m14",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m15"] = {\n "feature_name": "m15",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m16"] = {\n "feature_name": "m16",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m17"] = {\n "feature_name": "m17",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m18"] = {\n "feature_name": "m18",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "

dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m19"] = {\n "feature_name": "m19",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m20"] = {\n "feature_name": "m20",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m21"] = {\n "feature_name": "m21",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m22"] = {\n "feature_name": "m22",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m23"] = {\n "feature_name": "m23",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m24"] = {\n "feature_name": "m24",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m25"] = {\n "feature_name": "m25",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m26"] = {\n "feature_name": "m26",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m27"] = {\n "feature_name": "m27",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m28"] = {\n "feature_name": "m28",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m29"] = {\n "feature_name": "m29",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m30"] = {\n "feature_name": "m30",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m31"] = {\n "feature_name": "m31",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m32"] = {\n "feature_name": "m32",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m33"] = {\n "feature_name": "m33",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\n

feature_metas["m34"] = {\n "feature_name": "m34",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m35"] = {\n "feature_name": "m35",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m36"] = {\n "feature_name": "m36",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m37"] = {\n "feature_name": "m37",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m38"] = {\n "feature_name": "m38",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m39"] = {\n "feature_name": "m39",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m40"] = {\n "feature_name": "m40",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m41"] = {\n "feature_name": "m41",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m42"] = {\n "feature_name": "m42",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m43"] = {\n "feature_name": "m43",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m44"] = {\n "feature_name": "m44",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m45"] = {\n "feature_name": "m45",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m46"] = {\n "feature_name": "m46",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m47"] = {\n "feature_name": "m47",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["m48"] = {\n "feature_name": "m48",\n "dtype": "float32",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\nfeature_metas["class"] = {\n "feature_name": "class"

,\n "dtype": "int64",\n "delimiter": "",\n "format": "",\n "shape": [1],\n "is_sparse": "false" == "true",\n "dtype_weight": "int64",\n "delimiter_kv": ""\n}\n\n\n\nlabel_meta = {\n "feature_name": "",\n "dtype": "int64",\n "delimiter": "",\n "shape": [1],\n "is_sparse": "false" == "true"\n}\n\nmodel_params=dict()\n\nmodel_params["n_clusters"]=3\n\nmodel_params["pretrain_epochs"]=10\n\nmodel_params["pretrain_lr"]=1\n\nmodel_params["train_lr"]=0.010000\n\nmodel_params["train_max_iters"]=800\n\n\n# Construct optimizer objects to pass to model initializer.\n# The original model_params is serializable (do not have tf.xxx objects).\nmodel_params_constructed = copy.deepcopy(model_params)\nfor optimizer_arg in ["optimizer", "dnn_optimizer", "linear_optimizer"]:\n if optimizer_arg in model_params_constructed:\n model_params_constructed[optimizer_arg] = eval(model_params_constructed[optimizer_arg])\n\nif "loss" in model_params_constructed:\n model_params_constructed["loss"] = eval(model_params_constructed["loss"])\n\n# feature_columns_code will be used to save the training informations together\n# with the saved model.\nfeature_columns_code = """{"feature_columns": [tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(key="dates", vocabulary_list=["2/26","3/5","3/17","3/23","4/10","5/5","1/13","1/20","2/24","4/24","6/23","1/6","1/9","6/3","2/10","3/16","2/18","4/11","4/22","5/3","6/22","1/10","1/30","6/4","6/11","6/17","6/26","1/28","5/20","5/21","2/11","3/1","3/8","4/26","5/6","5/9","1/21","2/4","5/17","4/21","4/25","5/29","6/27","2/23","3/3","4/18","4/23","5/12","2/2","3/14","3/22","4/2","4/20","5/8","5/23","5/25","1/26","1/27","6/1","4/13","1/22","3/13","4/6","4/19","2/7","3/27","4/28","6/2","6/8","1/4","1/25","3/9","3/12","3/21","4/5","6/6","2/15","3/6","2/13","3/29","2/8","2/12","2/28","3/2","3/31","4/16","5/14","1/8","1/17","5/7","5/24","2/21","2/25","3/18","4/3","6/7","6/10","6/29","2/16","3/7","6/14","6/16","6/24","1/2","3/20","4/12","5/31","2/5","4/1","5/22","6/9","6/15","6/19","6/25","6/30","1/16","3/24","3/30","6/28","1/15","1/23","1/31","2/17","3/25","4/7","1/7","1/14","5/30","6/12","3/11","3/28","5/10","1/1","1/3","2/27","3/19","4/14","4/30","5/28","6/18","1/12","1/18","6/20","4/9","5/4","5/11","5/13","5/16","1/29","2/1","2/9","2/20","5/18","5/19","1/5","1/24","5/26","6/21","3/15","5/15","4/8","4/15","5/27","1/11","4/4","2/19","3/4","3/26","4/27","4/29","6/13","1/19","2/6","3/10","4/17","5/2","2/14","2/22","6/5","2/3","5/1"]), dimension=128, combiner="sum"),\ntf.feature_column.numeric_column("m1", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m2", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m3", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m4", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m5", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m6", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m7", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m8", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m9", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m10", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m11", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m12", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m13", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_c

olumn("m14", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m15", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m16", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m17", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m18", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m19", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m20", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m21", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m22", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m23", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m24", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m25", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m26", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m27", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m28", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m29", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m30", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m31", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m32", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m33", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m34", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m35", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m36", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m37", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m38", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m39", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m40", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m41", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m42", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m43", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m44", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m45", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m46", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m47", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("m48", shape=[1], dtype=tf.dtypes.float32),\ntf.feature_column.numeric_column("class", shape=[1], dtype=tf.dtypes.int64)]}"""\nfeature_columns = eval(feature_columns_code)\n\ntrain_max_steps = 0\ntrain_max_steps = None if train_max_steps == 0 else train_max_steps\n\ntrain(datasource="mysql://yuepf:yuepf123456@tcp(192.195.253.130:3306)/?maxAllowedPacket=0",\n estimator_string="""sqlflow_models.DeepEmbeddingClusterModel""",\n select="""\nSELECT * FROM sql_flow_test.activepower_train\n""",\n validation_select="""""",\n feature_columns=feature_columns,\n feature_column_names=feature_column_names,\n feature_metas=feature_metas,\n label_meta=label_meta,\n model_params=model_params_constructed,\n validation_metrics="Accuracy".split(","),\n save="model_save",\n batch_size=256,\n epoch=1,\n validation_steps=1,\n verbose=0,\n max_steps=train_max_steps,\n validation_start_delay_secs=0,\n validation_throttle_secs=0,\n save_checkpoints_steps=100,\n log_every_n_iter=10,\n load_pretrained_model="false" == "true",\n is_pai="false" == "true",\n pai_table="",\n pai_val_table="",\n feature_columns_code=feature_columns_code,\n model_params_code_map=model_params,\n model_repo_image="",\n or

iginal_sql='''\nSELECT * FROM sql_flow_test.activepower_train\nTO TRAIN sqlflow_models.DeepEmbeddingClusterModel\nWITH\n model.n_clusters=3,\n model.pretrain_epochs=10,\n model.train_max_iters=800,\n model.train_lr=0.01,\n model.pretrain_lr=1,\n train.batch_size=256\nCOLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48\nINTO sqlflow_models.my_activepower_train_model;\n''',\n feature_column_names_map=feature_column_names_map)\n\n==========Output==========\nTraceback (most recent call last):\n File "", line 823, in \n File "/opt/sqlflow/python/runtime/tensorflow/train.py", line 116, in train\n load_pretrained_model, model_meta, is_pai)\n File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 144, in keras_train_and_save_legacy\n validation_steps, has_none_optimizer)\n File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 155, in keras_train_compiled\n classifier.sqlflow_train_loop(train_dataset)\n File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 246, in sqlflow_train_loop\n self.pre_train(x)\n File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 153, in pre_train\n y = x.cache().map(map_func=_concate_generate)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1900, in map\n MapDataset(self, map_func, preserve_cardinality=False))\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3416, in init\n use_legacy_function=use_legacy_function)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2695, in init\n self._function = wrapper_fn._get_concrete_function_internal()\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1854, in _get_concrete_function_internal\n *args, **kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected\n graph_function, _, _ = self._maybe_define_function(args, kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function\n graph_function = self._create_graph_function(args, kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function\n capture_by_value=self._capture_by_value),\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func\n func_outputs = python_func(*func_args, **func_kwargs)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in wrapper_fn\n ret = _wrapper_helper(*args)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2634, in _wrapper_helper\n ret = autograph.tf_convert(func, ag_ctx)(*nested_args)\n File "/usr/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper\n raise e.ag_error_metadata.to_exception(e)\nAttributeError: in converted code:\n\n /usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py:150 _concate_generate *\n concate_y = tf.stack([dataset_element[feature.key] for feature in self._feature_columns], axis=1)\n\n AttributeError: 'EmbeddingColumn' object has no attribute 'key'\n\n" >

workflow step failed: runSQLProgram error: failed: exit status 1

==========Generated Code:==========

-- coding: utf-8 --

import copy

import traceback

import tensorflow as tf

import runtime

from runtime.tensorflow.train import train

from runtime.tensorflow.get_tf_version import tf_is_version2

from tensorflow.estimator import (DNNClassifier,

DNNRegressor, LinearClassifier, LinearRegressor, BoostedTreesClassifier, BoostedTreesRegressor, DNNLinearCombinedClassifier, DNNLinearCombinedRegressor)

if tf_is_version2():

from tensorflow.keras.optimizers import Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop, SGD from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge

else:

from tensorflow.train import AdadeltaOptimizer, AdagradOptimizer, AdamOptimizer, FtrlOptimizer, RMSPropOptimizer, GradientDescentOptimizer, MomentumOptimizer from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy, CategoricalHinge, CosineSimilarity, Hinge, Huber, KLDivergence, LogCosh, MeanAbsoluteError, MeanAbsolutePercentageError, MeanSquaredError, MeanSquaredLogarithmicError, Poisson, SparseCategoricalCrossentropy, SquaredHinge

try:

import sqlflow_models

except Exception as e:

print("failed to import sqlflow_models: %s", e) traceback.print_exc()

feature_column_names = [

"dates",

"m1",

"m2",

"m3",

"m4",

"m5",

"m6",

"m7",

"m8",

"m9",

"m10",

"m11",

"m12",

"m13",

"m14",

"m15",

"m16",

"m17",

"m18",

"m19",

"m20",

"m21",

"m22",

"m23",

"m24",

"m25",

"m26",

"m27",

"m28",

"m29",

"m30",

"m31",

"m32",

"m33",

"m34",

"m35",

"m36",

"m37",

"m38",

"m39",

"m40",

"m41",

"m42",

"m43",

"m44",

"m45",

"m46",

"m47",

"m48",

"class",

]

feature_column_names_map is used to determine the order of feature columns of each target:

e.g. when using DNNLinearCombinedClassifer.

feature_column_names_map will be saved to a single file when using PAI.

feature_column_names_map = dict()

feature_column_names_map["feature_columns"] = ["dates","m1","m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12","m13","m14","m15","m16","m17","m18","m19","m20","m21","m22","m23","m24","m25","m26","m27","m28","m29","m30","m31","m32","m33","m34","m35","m36","m37","m38","m39","m40","m41","m42","m43","m44","m45","m46","m47","m48","class",]

feature_metas = dict()

feature_metas["dates"] = {

"feature_name": "dates", "dtype": "string", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m1"] = {

"feature_name": "m1", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m2"] = {

"feature_name": "m2", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m3"] = {

"feature_name": "m3", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m4"] = {

"feature_name": "m4", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m5"] = {

"feature_name": "m5", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m6"] = {

"feature_name": "m6", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m7"] = {

"feature_name": "m7", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m8"] = {

"feature_name": "m8", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m9"] = {

"feature_name": "m9", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m10"] = {

"feature_name": "m10", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m11"] = {

"feature_name": "m11", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m12"] = {

"feature_name": "m12", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m13"] = {

"feature_name": "m13", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m14"] = {

"feature_name": "m14", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m15"] = {

"feature_name": "m15", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m16"] = {

"feature_name": "m16", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m17"] = {

"feature_name": "m17", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m18"] = {

"feature_name": "m18", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m19"] = {

"feature_name": "m19", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m20"] = {

"feature_name": "m20", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m21"] = {

"feature_name": "m21", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m22"] = {

"feature_name": "m22", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m23"] = {

"feature_name": "m23", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m24"] = {

"feature_name": "m24", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m25"] = {

"feature_name": "m25", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m26"] = {

"feature_name": "m26", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m27"] = {

"feature_name": "m27", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m28"] = {

"feature_name": "m28", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m29"] = {

"feature_name": "m29", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m30"] = {

"feature_name": "m30", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m31"] = {

"feature_name": "m31", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m32"] = {

"feature_name": "m32", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m33"] = {

"feature_name": "m33", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m34"] = {

"feature_name": "m34", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m35"] = {

"feature_name": "m35", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m36"] = {

"feature_name": "m36", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m37"] = {

"feature_name": "m37", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m38"] = {

"feature_name": "m38", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m39"] = {

"feature_name": "m39", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m40"] = {

"feature_name": "m40", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m41"] = {

"feature_name": "m41", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m42"] = {

"feature_name": "m42", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m43"] = {

"feature_name": "m43", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m44"] = {

"feature_name": "m44", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m45"] = {

"feature_name": "m45", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m46"] = {

"feature_name": "m46", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m47"] = {

"feature_name": "m47", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["m48"] = {

"feature_name": "m48", "dtype": "float32", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

feature_metas["class"] = {

"feature_name": "class", "dtype": "int64", "delimiter": "", "format": "", "shape": [1], "is_sparse": "false" == "true", "dtype_weight": "int64", "delimiter_kv": ""

}

label_meta = {

"feature_name": "", "dtype": "int64", "delimiter": "", "shape": [1], "is_sparse": "false" == "true"

}

model_params=dict()

model_params["n_clusters"]=3

model_params["pretrain_epochs"]=10

model_params["pretrain_lr"]=1

model_params["train_lr"]=0.010000

model_params["train_max_iters"]=800

Construct optimizer objects to pass to model initializer.

The original model_params is serializable (do not have tf.xxx objects).

model_params_constructed = copy.deepcopy(model_params)

for optimizer_arg in ["optimizer", "dnn_optimizer", "linear_optimizer"]:

if optimizer_arg in model_params_constructed: model_params_constructed[optimizer_arg] = eval(model_params_constructed[optimizer_arg])

if "loss" in model_params_constructed:

model_params_constructed["loss"] = eval(model_params_constructed["loss"])

feature_columns_code will be used to save the training informations together

with the saved model.

feature_columns_code = """{"feature_columns": [tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_vocabulary_list(key="dates", vocabulary_list=["2/26","3/5","3/17","3/23","4/10","5/5","1/13","1/20","2/24","4/24","6/23","1/6","1/9","6/3","2/10","3/16","2/18","4/11","4/22","5/3","6/22","1/10","1/30","6/4","6/11","6/17","6/26","1/28","5/20","5/21","2/11","3/1","3/8","4/26","5/6","5/9","1/21","2/4","5/17","4/21","4/25","5/29","6/27","2/23","3/3","4/18","4/23","5/12","2/2","3/14","3/22","4/2","4/20","5/8","5/23","5/25","1/26","1/27","6/1","4/13","1/22","3/13","4/6","4/19","2/7","3/27","4/28","6/2","6/8","1/4","1/25","3/9","3/12","3/21","4/5","6/6","2/15","3/6","2/13","3/29","2/8","2/12","2/28","3/2","3/31","4/16","5/14","1/8","1/17","5/7","5/24","2/21","2/25","3/18","4/3","6/7","6/10","6/29","2/16","3/7","6/14","6/16","6/24","1/2","3/20","4/12","5/31","2/5","4/1","5/22","6/9","6/15","6/19","6/25","6/30","1/16","3/24","3/30","6/28","1/15","1/23","1/31","2/17","3/25","4/7","1/7","1/14","5/30","6/12","3/11","3/28","5/10","1/1","1/3","2/27","3/19","4/14","4/30","5/28","6/18","1/12","1/18","6/20","4/9","5/4","5/11","5/13","5/16","1/29","2/1","2/9","2/20","5/18","5/19","1/5","1/24","5/26","6/21","3/15","5/15","4/8","4/15","5/27","1/11","4/4","2/19","3/4","3/26","4/27","4/29","6/13","1/19","2/6","3/10","4/17","5/2","2/14","2/22","6/5","2/3","5/1"]), dimension=128, combiner="sum"),

tf.feature_column.numeric_column("m1", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m2", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m3", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m4", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m5", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m6", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m7", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m8", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m9", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m10", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m11", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m12", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m13", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m14", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m15", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m16", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m17", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m18", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m19", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m20", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m21", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m22", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m23", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m24", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m25", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m26", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m27", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m28", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m29", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m30", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m31", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m32", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m33", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m34", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m35", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m36", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m37", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m38", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m39", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m40", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m41", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m42", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m43", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m44", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m45", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m46", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m47", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("m48", shape=[1], dtype=tf.dtypes.float32),

tf.feature_column.numeric_column("class", shape=[1], dtype=tf.dtypes.int64)]}"""

feature_columns = eval(feature_columns_code)

train_max_steps = 0

train_max_steps = None if train_max_steps == 0 else train_max_steps

train(datasource="mysql://yuepf:yuepf123456@tcp(192.195.253.130:3306)/?maxAllowedPacket=0",

estimator_string="""sqlflow_models.DeepEmbeddingClusterModel""", select="""

SELECT * FROM sql_flow_test.activepower_train

""",

validation_select="""""", feature_columns=feature_columns, feature_column_names=feature_column_names, feature_metas=feature_metas, label_meta=label_meta, model_params=model_params_constructed, validation_metrics="Accuracy".split(","), save="model_save", batch_size=256, epoch=1, validation_steps=1, verbose=0, max_steps=train_max_steps, validation_start_delay_secs=0, validation_throttle_secs=0, save_checkpoints_steps=100, log_every_n_iter=10, load_pretrained_model="false" == "true", is_pai="false" == "true", pai_table="", pai_val_table="", feature_columns_code=feature_columns_code, model_params_code_map=model_params, model_repo_image="", original_sql='''

SELECT * FROM sql_flow_test.activepower_train

TO TRAIN sqlflow_models.DeepEmbeddingClusterModel

WITH

model.n_clusters=3,

model.pretrain_epochs=10,

model.train_max_iters=800,

model.train_lr=0.01,

model.pretrain_lr=1,

train.batch_size=256

COLUMN m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21,m22,m23,m24,m25,m26,m27,m28,m29,m30,m31,m32,m33,m34,m35,m36,m37,m38,m39,m40,m41,m42,m43,m44,m45,m46,m47,m48

INTO sqlflow_models.my_activepower_train_model;

''',

feature_column_names_map=feature_column_names_map)

==========Output==========

Traceback (most recent call last):

File "", line 823, in

File "/opt/sqlflow/python/runtime/tensorflow/train.py", line 116, in train

load_pretrained_model, model_meta, is_pai)

File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 144, in keras_train_and_save_legacy

validation_steps, has_none_optimizer)

File "/opt/sqlflow/python/runtime/tensorflow/train_keras.py", line 155, in keras_train_compiled

classifier.sqlflow_train_loop(train_dataset)

File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 246, in sqlflow_train_loop

self.pre_train(x)

File "/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py", line 153, in pre_train

y = x.cache().map(map_func=_concate_generate)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1900, in map

MapDataset(self, map_func, preserve_cardinality=False))

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3416, in init

use_legacy_function=use_legacy_function)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2695, in init

self._function = wrapper_fn._get_concrete_function_internal()

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1854, in _get_concrete_function_internal

*args, **kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1848, in _get_concrete_function_internal_garbage_collected

graph_function, _, _ = self._maybe_define_function(args, kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2150, in _maybe_define_function

graph_function = self._create_graph_function(args, kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2041, in _create_graph_function

capture_by_value=self._capture_by_value),

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func

func_outputs = python_func(*func_args, **func_kwargs)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in wrapper_fn

ret = _wrapper_helper(*args)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2634, in _wrapper_helper

ret = autograph.tf_convert(func, ag_ctx)(*nested_args)

File "/usr/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper

raise e.ag_error_metadata.to_exception(e)

AttributeError: in converted code:

/usr/lib/python3.6/site-packages/sqlflow_models/deep_embedding_cluster.py:150 _concate_generate * concate_y = tf.stack([dataset_element[feature.key] for feature in self._feature_columns], axis=1) AttributeError: 'EmbeddingColumn' object has no attribute 'key'`

Environment (Please complete the following information):

OS:Windos 10

Browser:Chrome

docker destop

Docker Engine : v20.10.6

Kubernetes: v1.19.7
SQLFLow integrate with Flink

Will SQLFlow integrate with Flink in the future: Pre-process data with Flink and perform TensorFlow deep learning with SQL If the community has such a plan, our team can participate in contributing code Our team is very involved in the Flink community and is interested in AI-related integration

SQLFlow is a compiler that compiles a SQL program to a workflow that runs on Kubernetes.

SQLFlow

What is SQLFlow

Motivation

Quick Overview

How to use SQLFlow

Contributing Guidelines

Roadmap

Feedback

License

Published

Owner

SQLFlow

Comments

[Proposal] Design proposal for adding graph data to SQLFlow database

Part I. Database schema

Part II. Loading data

Rethink syntax extension

Start to jupyter iris-dnn examples, then encounter this problem

Switch the connection parameters while sqlflow is running

Proposal: Managing external & internal parsers

Problem

Solution

Implementation Details

SQLFlow Product Roadmap 2019

Make SQLFlow parser work together with Apache Calcite

Determine the ordering of records for a SELECT statement in MySQL

Potential issue when running tests concurrently

Data sharding problem in distributed XGBoost training using rabit

File '/tmp/sqlflow227199809/input.sql' cannot be read

CVE-2007-4559 Patch

Patching CVE-2007-4559

Add SECURITY.md

The link provided by the documentation is unable to recogniz

run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

run activepower clustering demo in tutorial recieve ' AttributeError: 'EmbeddingColumn' object has no attribute 'key''

-- coding: utf-8 --

feature_column_names_map is used to determine the order of feature columns of each target:

e.g. when using DNNLinearCombinedClassifer.

feature_column_names_map will be saved to a single file when using PAI.

Construct optimizer objects to pass to model initializer.

The original model_params is serializable (do not have tf.xxx objects).

feature_columns_code will be used to save the training informations together

with the saved model.

SQLFLow integrate with Flink

Related tags

write APIs using direct SQL queries with no hassle, let's rethink about SQL

Parses a file and associate SQL queries to a map. Useful for separating SQL from code logic

Go-sql-reader - Go utility to read the externalised sql with predefined tags

Manage SQL databases, users and grant using kubernetes manifests

GORM SQLChaos manipulates DML at program runtime based on gorm

A tool I made to quickly store bug bounty program scopes in a local sqlite3 database

Go package for sharding databases ( Supports every ORM or raw SQL )

Prep finds all SQL statements in a Go package and instruments db connection with prepared statements

pggen - generate type safe Go methods from Postgres SQL queries

🐳 A most popular sql audit platform for mysql

sqlx is a library which provides a set of extensions on go's standard database/sql library

A tool to run queries in defined frequency and expose the count as prometheus metrics. Supports MongoDB and SQL

Dumpling is a fast, easy-to-use tool written by Go for dumping data from the database(MySQL, TiDB...) to local/cloud(S3, GCP...) in multifarious formats(SQL, CSV...).

Universal command-line interface for SQL databases

auto generate sql from gorm model struct

a golang library for sql builder

Fluent SQL generation for golang

SQL Optimizer And Rewriter

Additions to Go's database/sql for super fast performance and convenience.