Sketch library
Sketch is an AI code-writing assistant for pandas users that understands the context of your data, which greatly improves the relevance of its suggestions. Sketch is usable in seconds and doesn't require adding a plugin to your IDE.
Here we follow a "standard" (hypothetical) data-analysis workflow, showing a natural-language interface that successfully navigates many tasks in the data stack landscape.
1. Data Cataloguing:
· General tagging (e.g. PII identification)
· Metadata generation (names and descriptions)
2. Data Engineering:
· Data cleaning and masking (compliance)
· Derived feature creation and extraction
3. Data Analysis:
· Data questions
· Data visualization
How to use
It's as simple as importing sketch and then using the .sketch extension on any pandas dataframe.
import sketch
Now every pandas dataframe you have will have an extension registered to it. Access this new extension by appending .sketch to your dataframe's name (for example, df.sketch).
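To make the idea of a "registered extension" concrete, here is a minimal sketch of how pandas lets a library attach its own attribute to every dataframe. It illustrates the mechanism only and is not Sketch's own code: the accessor name demo, the class DemoAccessor, and the method column_types are all hypothetical names invented for this example. The only real API used is pandas' register_dataframe_accessor.

```python
import pandas as pd


# "demo" is a hypothetical accessor name chosen for this illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the dataframe the accessor was reached from.
        self._df = pandas_obj

    def column_types(self):
        """Return a {column name: dtype name} mapping for the wrapped dataframe."""
        return {col: str(dtype) for col, dtype in self._df.dtypes.items()}


df = pd.DataFrame({"units": [1, 2, 3], "price": [9.5, 3.0, 4.25]})

# Every dataframe now exposes the accessor, much like df.sketch after importing sketch.
types = df.demo.column_types()
print(types)
```

Once the decorated class has been defined, the accessor is available on every dataframe in the session, which is why a single import is enough to make .sketch appear.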
1. .sketch.ask
Ask is the basic question-answering system in sketch. It returns a text answer based on the summary statistics and description of the data.
Use ask to get an understanding of the data, get better column names, ask hypotheticals (how would I go about doing X with this data), and more.
df.sketch.ask("Which columns are integer type?")
2. .sketch.howto
Howto is the basic "code-writing" prompt in sketch. It returns a code block that you should be able to copy, paste, and use as a starting point (or possibly an ending point!) for any question you have about the data. Ask it how to clean the data, normalize it, create new features, plot, and even build models!
df.sketch.howto("Plot the sales versus time")
3. .sketch.apply
Apply is a more advanced prompt that is most useful for data generation. Use it to parse fields, generate new features, and more. It is built directly on lambdaprompt. To use it, you will need to set up a free account with OpenAI and set an environment variable with your API key:
OPENAI_API_KEY=YOUR_API_KEY
import pandas as pd
df['review_keywords'] = df.sketch.apply("Keywords for the review [{{ review_text }}] of product [{{ product_name }}] (comma separated):")
states = pd.DataFrame({'State': ['Colorado', 'Kansas', 'California', 'New York']})
states['capital'] = states.sketch.apply("What is the capital of [{{ State }}]?")
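The {{ column }} placeholders in these prompts are filled in once per row, so each row produces its own prompt. The snippet below is a rough, standard-library illustration of that idea and not how Sketch or lambdaprompt implements it. The helper render_prompt and the sample rows are hypothetical, and plain dictionaries stand in for dataframe rows.

```python
import re

# Matches placeholders such as {{ review_text }}; spaces inside the braces are optional.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template, row):
    """Fill each {{ column }} placeholder with the matching value from one row."""
    def substitute(match):
        column = match.group(1)
        if column not in row:
            raise KeyError(f"template refers to unknown column: {column}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical rows standing in for a dataframe with a 'State' column.
rows = [{"State": "Colorado"}, {"State": "Kansas"}]
template = "What is the capital of [{{ State }}]?"

# One prompt per row; a real apply would send each prompt to the language model.
prompts = [render_prompt(template, row) for row in rows]
for prompt in prompts:
    print(prompt)
```

Because one prompt is produced for every row, the number of model calls (and the cost, when you use your own API key) grows with the length of the dataframe.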
Sketch currently uses prompts.approx.dev to help run with minimal setup.
You can also directly use a few pre-built Hugging Face models (right now MPT-7B and StarCoder), which will run entirely locally (once you download the model weights from HF). Do this by setting 3 environment variables:
import os
os.environ['LAMBDAPROMPT_BACKEND'] = 'StarCoder'
os.environ['SKETCH_USE_REMOTE_LAMBDAPROMPT'] = 'False'
os.environ['HF_ACCESS_TOKEN'] = 'your_hugging_face_token'
You can also call OpenAI directly (instead of going through the prompts.approx.dev endpoint) by using your own API key. To do this, set 2 environment variables:
(1) SKETCH_USE_REMOTE_LAMBDAPROMPT=False
(2) OPENAI_API_KEY=YOUR_API_KEY
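If you prefer to set these two variables from Python rather than from the shell, something like the following should work. It is a sketch under two assumptions: that the variables need to be in place before sketch is imported, and that a quick check for missing values is worth having. The helper missing_settings is a hypothetical name, and YOUR_API_KEY is a placeholder, not a real key.

```python
import os


def missing_settings(required, env=os.environ):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Assumption: set these before importing sketch so the settings are already in place.
os.environ["SKETCH_USE_REMOTE_LAMBDAPROMPT"] = "False"
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder, replace with your own key

required = ["SKETCH_USE_REMOTE_LAMBDAPROMPT", "OPENAI_API_KEY"]
missing = missing_settings(required)
if missing:
    raise RuntimeError(f"Set these environment variables first: {missing}")

# import sketch  # import only after the configuration above
```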
How it works
Sketch uses efficient approximation algorithms (data sketches) to quickly summarize your data, and feeds that information into language models. Right now, it does this by summarizing the columns and writing these summary statistics as additional context to be used by the code-writing prompt. In the future we hope to feed these sketches directly into custom-made "data + language" foundation models to get more accurate results.
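To give a feel for what "summarize the columns and pass the summary as context" could look like, here is a small standard-library sketch. It is an illustration under stated assumptions, not Sketch's implementation: the text above does not say which sketch algorithms or summary format the library uses. The example uses a k-minimum-values (KMV) estimator for the approximate distinct count purely as one well-known data sketch, and the function names, the summary layout, and the sample table are all hypothetical.

```python
import hashlib
import heapq


def approx_distinct(values, k=256):
    """Estimate the number of distinct values with a k-minimum-values sketch."""
    smallest = []    # max-heap (stored negated) of the k smallest hashes seen so far
    members = set()  # the same hashes, kept for duplicate detection
    for value in values:
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64  # stable hash mapped into [0, 1)
        if h in members:
            continue
        if len(smallest) < k:
            heapq.heappush(smallest, -h)
            members.add(h)
        elif h < -smallest[0]:
            evicted = -heapq.heappushpop(smallest, -h)
            members.add(h)
            members.discard(evicted)
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct values: the count is exact
    return int((k - 1) / -smallest[0])  # KMV estimate from the k-th smallest hash


def summarize_column(name, values):
    """Describe one column as a single line of text (hypothetical format)."""
    present = [v for v in values if v is not None]
    parts = [
        f"count={len(values)}",
        f"missing={len(values) - len(present)}",
        f"approx_distinct={approx_distinct(present)}",
    ]
    if present and all(isinstance(v, (int, float)) for v in present):
        parts.append(f"min={min(present)}")
        parts.append(f"max={max(present)}")
    return f"{name}: " + ", ".join(parts)


def build_context(table):
    """Join the per-column summaries into one block of prompt context."""
    return "\n".join(summarize_column(name, values) for name, values in table.items())


# Hypothetical table, stored as {column name: list of values}.
table = {
    "units": [3, 5, 5, None, 8],
    "region": ["north", "south", "north", "east", "east"],
}
context = build_context(table)
prompt = context + "\n\nQuestion: Which columns are integer type?"
print(prompt)
```

The point of the sketch is that the memory it needs depends on k and not on the number of rows, so the summary stays cheap to compute and short enough to fit in a prompt even for large tables.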
Summary
The sketch library looks very promising for bringing the power of AI into Jupyter Notebooks or an IDE. Even though a few issues cropped up whilst writing this article, we have to bear in mind that it is a newish library still under active development. It will be interesting to see where it goes over the coming months. As with any AI-based tool at present, caution is needed, especially when relying on the answers it generates. Even so, these systems can have numerous benefits, including jogging your memory if you forget a function call or creating quick plots without writing code.
ISME student doing an internship with Hunnarvi Technologies Pvt Ltd under the guidance of Nanobi Data and Analytics. Views are personal.