# Understanding Datasets

> Learn about datasets in Artemis Search and how they power intelligent searches

We have provided a limited sample dataset for you to explore. It contains \~19,000 company descriptions. If you are unable to find matches for your searches, we recommend uploading a larger dataset.

## What is a Dataset?

A Dataset stores the content that Artemis Search will search over. Datasets must be pandas dataframes exported as a parquet file with an `embedding` column containing OpenAI `text-embedding-3-large` embeddings and a `tag` column with associated string values. Artemis Search only searches through activated datasets.

## Key Concepts

* **Embeddings:** Vector representations of text, allowing for semantic similarity comparisons.
* **Tags:** Metadata associated with each embedding, returned as content in search results. **This is the only column returned in search results.**
* **Filter Columns:** Columns in the dataset that can be used to filter the search results.
* **Projects:** Datasets are linked to specific projects for organized search tasks.

## Dataset Activation

Within a project, you can have multiple datasets, but only one can be active at a time. The active dataset is the one used for search queries in that project. A project needs an active dataset to be operational and allow searches.

## Creating a Dataset

Datasets are simple to prepare but do require a few steps. Since there are many possible workflows for creating datasets, we present one example workflow below. We are happy to help prepare your dataset for you if you reach out to us at [pallavi@artemisar.com](mailto:pallavi@artemisar.com).
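Before uploading, it can be useful to check locally that a dataframe meets the column requirements described above. The following is a minimal sketch, not part of Artemis Search: `check_dataset` is a hypothetical helper name, and it assumes that every column other than `embedding` and `tag` is treated as a filter column.

```python
import pandas as pd

# Reserved column names that every dataset must contain.
REQUIRED_COLUMNS = ("embedding", "tag")


def check_dataset(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: validate required columns and return the filter columns."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Tags must be string values.
    if not df["tag"].map(lambda t: isinstance(t, str)).all():
        raise ValueError("every value in 'tag' must be a string")
    # Any remaining column can be used as a filter column.
    return [c for c in df.columns if c not in REQUIRED_COLUMNS]


# Toy dataframe with short illustrative vectors (real embeddings are much longer).
example = pd.DataFrame({
    "embedding": [[0.1, 0.2], [0.3, 0.4]],
    "tag": ["Acme", "Parker Industries"],
    "size": [1, 3],
})
print(check_dataset(example))  # ['size']
```

The helper only inspects column names and tag types; it does not verify that the vectors were produced by a particular embedding model.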
### Background

Suppose we have the following dataset stored in a pandas dataframe:

| company\_description                                | company\_name       | id |
| --------------------------------------------------- | ------------------- | -- |
| 'Acme is a startup that makes widgets'              | 'Acme'              | 1  |
| 'Wayne Enterprises is a startup that makes widgets' | 'Wayne Enterprises' | 2  |
| 'Parker Industries is a startup that makes widgets' | 'Parker Industries' | 3  |

### Example Workflow

Artemis Search datasets consist of embeddings of the text you want to search over, together with an associated string tag for each embedding. These tags can represent IDs, names, or any other unique information associated with the embedding.

In our case, it makes the most sense to embed `company_description` and use `id` as the tag: we want to search over the company descriptions, and the ids uniquely identify each company.

Ultimately, we need to end up with a pandas dataframe exported as a parquet file where one column is `embedding` and the other is `tag`. Given our initial dataframe, we need to transform the `company_description` column into embeddings and the `id` column into string tags, and then store the result as a parquet file.
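To make the target shape concrete before any API call is involved, the sketch below builds the example table and attaches placeholder vectors. The vectors are random numbers, not real embeddings, and the dimension of 4 is an assumption chosen for readability; real embeddings are much longer.

```python
import numpy as np
import pandas as pd

# Example table from the Background section.
df = pd.DataFrame({
    "company_description": [
        "Acme is a startup that makes widgets",
        "Wayne Enterprises is a startup that makes widgets",
        "Parker Industries is a startup that makes widgets",
    ],
    "company_name": ["Acme", "Wayne Enterprises", "Parker Industries"],
    "id": [1, 2, 3],
})

# Placeholder vectors standing in for real embeddings (illustrative only).
rng = np.random.default_rng(0)
placeholder_vectors = rng.random((len(df), 4))

# Target shape: one vector per row in 'embedding', one string per row in 'tag'.
target = pd.DataFrame({
    "embedding": [vector.tolist() for vector in placeholder_vectors],
    "tag": df["id"].astype(str),
})
print(target["tag"].tolist())  # ['1', '2', '3']
```

Each cell of `embedding` holds a whole vector as a list of floats, which is why the vectors are converted row by row instead of being assigned as a 2D array.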
The following script carries out these steps:

```python
import os

import pandas as pd
from openai import OpenAI

# Load the initial dataframe
df = pd.read_csv('company_descriptions.csv')

# Create embeddings (the API key is read from the API_KEY environment variable)
client = OpenAI(api_key=os.environ['API_KEY'])
text_to_embed = df['company_description'].tolist()
embedding_responses = client.embeddings.create(input=text_to_embed, model='text-embedding-3-large')

# Create the final dataframe: one embedding vector (a list of floats) per row
df['embedding'] = [item.embedding for item in embedding_responses.data]
df['tag'] = df['id'].astype(str)

# Save as a parquet file
df.to_parquet('data.parquet')
```

## Filter Columns

Filter columns are columns in the dataset that can be referenced in the `filter_query` parameter of a search to filter the search results. In practice, a filter column is any column in the dataset that is not named `embedding` or `tag`, since those two names are reserved.

### Example

Consider the following dataset stored in a pandas dataframe:

| embedding | tag                 | size |
| --------- | ------------------- | ---- |
| \[...]    | 'Acme'              | 1    |
| \[...]    | 'Parker Industries' | 3    |

The dataset has the required `embedding` and `tag` columns, plus a `size` column. `size` is a filter column: it is not searched over directly, but it can be used to filter the search results.

You can read more about filter queries [here](/search/search-parameters#filter-query).

## Next Steps

Now that you understand the basics of datasets in Artemis Search, learn how to:

* Create a dataset for your project with your custom data
* View, activate, edit, and delete your datasets