🛠 leak-utils: The Eleakxir Data Utility Toolkit

leak-utils is a powerful command-line tool built to help you manage, process, and optimize data leaks for use with the Eleakxir search engine. It provides a suite of utilities for data cleaning, format conversion, and file manipulation, all designed to ensure your data wells are efficient and standardized.

leak-utils is written in Go and leverages DuckDB for its high-performance in-memory processing, ensuring fast and reliable operations on large datasets.

🚀 Features

  • Parquet File Management: Clean and inspect existing .parquet files.
  • Format Conversion: Convert .csv, .txt, and .json files into the optimized .parquet format.
  • Schema Uniformity: Standardize and normalize your data to align with the Eleakxir data leak normalization rules. A consistent schema across all your files is crucial for efficient searching and consistent results.
  • High Performance: Built with Go and DuckDB for fast and efficient data processing.

⚙️ How to Use

The tool operates via a single executable with different commands, each corresponding to a specific action. You can find the executable in the leak-utils directory of the Eleakxir project.

Install

With Go

go install "github.com/anotherhadi/eleakxir/leak-utils@latest"

With Nix/NixOS

From anywhere (using the repo URL):

nix run "github:anotherhadi/eleakxir#leak-utils" -- action [--flags value]

Permanent Installation:

# add the flake to your flake.nix
{
  inputs = {
    eleakxir.url = "github:anotherhadi/eleakxir";
  };
}

# then add it to your packages
environment.systemPackages = with pkgs; [ # or home.packages
    eleakxir.packages.${pkgs.system}.leak-utils
];

Available Actions

cleanParquet

Optimizes and cleans an existing Parquet file, for example by changing columns or cleaning rows.

For the available options, see:

leak-utils cleanParquet --help

infoParquet

Displays metadata and schema information for a given Parquet file. Useful for inspecting file structure and column types.

csvToParquet

Converts a .csv file into a highly compressed and efficient .parquet file. This is the recommended way to prepare your data for Eleakxir.
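
The README does not show how csvToParquet works internally. As a rough illustration of the kind of DuckDB statement that performs such a conversion, the Go sketch below builds a COPY statement by hand. The helper names (sqlQuote, csvToParquetSQL), the file names, and the choice of ZSTD compression are assumptions made for this example, not the tool's actual code or defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlQuote wraps s in single quotes and doubles any embedded single quote,
// which is the standard SQL way to escape a string literal.
func sqlQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// csvToParquetSQL (hypothetical helper) returns a DuckDB statement that reads
// a CSV file with automatic schema detection and writes it out as Parquet.
// ZSTD compression is an assumption chosen for illustration.
func csvToParquetSQL(csvPath, parquetPath string) string {
	return fmt.Sprintf(
		"COPY (SELECT * FROM read_csv_auto(%s)) TO %s (FORMAT PARQUET, COMPRESSION ZSTD);",
		sqlQuote(csvPath), sqlQuote(parquetPath),
	)
}

func main() {
	// Prints the statement; it could then be run through the DuckDB CLI
	// or a Go DuckDB driver.
	fmt.Println(csvToParquetSQL("leak.csv", "leak.parquet"))
}
```

In practice, prefer the csvToParquet action itself, since it may apply normalization steps that a bare COPY statement does not.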

mergeFiles

Merges multiple files (of the same type) into a single, larger file. This is useful for combining smaller data leaks.

removeUrlSchemeFromUlp

Removes the URL scheme (such as https://) from ULP data. This prevents the colon (:) in the scheme from being mistakenly parsed as a column separator when processing ULP data in flat files like CSV or TXT.
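
To make the colon problem concrete, here is a minimal Go sketch, assuming a ULP line of the form url:login:password. The function stripURLScheme and the sample line are hypothetical and written for this example. They are not taken from the leak-utils source, whose actual rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// stripURLScheme (hypothetical helper) removes a leading "scheme://" prefix
// from a line. Lines without a plausible scheme are returned unchanged.
func stripURLScheme(line string) string {
	i := strings.Index(line, "://")
	if i <= 0 {
		return line
	}
	// Accept the prefix as a scheme only if it consists of letters, digits,
	// '+', '-' or '.'. This avoids cutting a line where "://" merely
	// appears later, for instance inside a password.
	for _, r := range line[:i] {
		ok := r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' ||
			r >= '0' && r <= '9' || r == '+' || r == '-' || r == '.'
		if !ok {
			return line
		}
	}
	return line[i+3:]
}

func main() {
	raw := "https://example.com/login:alice:hunter2"

	// Splitting the raw line on ':' yields an extra field for the scheme.
	fmt.Println(len(strings.Split(raw, ":"))) // 4

	// After stripping the scheme, the line splits into url, login, password.
	fmt.Println(len(strings.Split(stripURLScheme(raw), ":"))) // 3
}
```

Note that a URL with an explicit port (for example example.com:8080) would still contain an extra colon. This sketch does not handle that case.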

🤝 Contributing

Contributions to leak-utils are welcome! Feel free to open issues or submit pull requests for new features, bug fixes, or performance improvements.