Capture and embed PDF files with the OneNote API
This topic provides background and platform-specific code showing how to both embed and display a PDF file in a captured OneNote notebook page using the <img> and <object> tags with the Microsoft OneNote API.
Last modified: January 22, 2016
Applies to: OneNote service
In this article
Embed and Render PDF files in an Android app
Embed and Render PDF files in an iOS app
Embed and Render PDF files in a Windows Phone app
Embed and Render PDF files in a Windows Store app
Embed and Render PDF files on a page using REST
Note |
|---|
See this content on our new documentation site for consumer and enterprise OneNote APIs. |
When you're building a scanner app, or in other situations where the captured data arrives as a PDF file, you'll often need to include the file in two different ways on the same OneNote page. The first way is to embed the PDF file as an object, so the user can save and copy the file directly. The second way is to have the API render the pages of the PDF file and place the images on that same OneNote page. If you include the file data in a MIME part, and then refer to that part in both <object> and <img> tags in your Presentation HTML block, the Microsoft OneNote API does both. You only need to upload the PDF data once to use it in both ways.
The following multi-part POST code shows how to do this.
Content-Type:multipart/form-data; boundary=MyAppPartBoundary
Authorization:bearer tokenString

--MyAppPartBoundary
Content-Disposition:form-data; name="Presentation"
Content-type:text/html

<!DOCTYPE html>
<html>
<head>
<title>A page with an embedded and displayed PDF file</title>
</head>
<body>
<p>Attached is the lease agreement for the expanded offices!</p>
<object data-attachment="OfficeLease.pdf" data="name:OfficeLeasePartName" type="application/pdf" />
<p>Here's the contents of our new lease.</p>
<img data-render-src="name:OfficeLeasePartName" width="900"/>
</body>
</html>

--MyAppPartBoundary
Content-Disposition:form-data; name="OfficeLeasePartName"
Content-type:application/pdf

... PDF binary data ...

--MyAppPartBoundary--
The code has two MIME parts: the Presentation part that contains the page HTML, and the OfficeLeasePartName part that contains the PDF document.
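As a rough illustration of the request shape shown above, the following Java sketch assembles the same two MIME parts into a raw request body. The class name MultipartBodySketch and the methods buildBody and writePart are hypothetical names chosen for this example; they are not part of the OneNote API or its sample apps. Sending the request and setting the Authorization header are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the OneNote API or its samples.
final class MultipartBodySketch {
    static final String BOUNDARY = "MyAppPartBoundary";
    private static final String CRLF = "\r\n";

    // Writes one MIME part: boundary line, part headers, blank line, then the raw content.
    private static void writePart(ByteArrayOutputStream out, String name,
                                  String contentType, byte[] content) {
        String headers = "--" + BOUNDARY + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF;
        byte[] headerBytes = headers.getBytes(StandardCharsets.UTF_8);
        out.write(headerBytes, 0, headerBytes.length);
        // The content bytes are copied unchanged (no Base64 or other encoding).
        out.write(content, 0, content.length);
        byte[] lineEnd = CRLF.getBytes(StandardCharsets.UTF_8);
        out.write(lineEnd, 0, lineEnd.length);
    }

    // Builds a body with a "Presentation" HTML part followed by one PDF part.
    static byte[] buildBody(String presentationHtml, String pdfPartName, byte[] pdfBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePart(out, "Presentation", "text/html",
                presentationHtml.getBytes(StandardCharsets.UTF_8));
        writePart(out, pdfPartName, "application/pdf", pdfBytes);
        byte[] closing = ("--" + BOUNDARY + "--" + CRLF).getBytes(StandardCharsets.UTF_8);
        out.write(closing, 0, closing.length);
        return out.toByteArray();
    }
}
```

The part name passed as pdfPartName must match the name used after "name:" in the <object> and <img> tags of the Presentation HTML.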
Tip |
|---|
Some PDF files are too large for a single OneNote page. When that happens, the API will add as many of the page images as will fit, and then embed that file on the page. If your Presentation HTML already embeds the file using an <object> tag, it will only be embedded once. |
In the Presentation HTML, the <object> tag embeds the PDF as a file that the user can copy directly out of the notebook page. It requires three attributes:
data-attachment="embeddedFilename.pdf" sets the file name and extension displayed on the OneNote page.
data="name:multiPartBlockName" gives the part name in the request that contains the binary file contents. The OneNote API does not support passing a URL reference here.
type="standardMimeType" indicates the MIME type of the file. For PDF files, this should be application/pdf. This is used to select the file icon on the page, and also determines which application starts when the user activates (starts, double-clicks, etc.) the file on the device from OneNote.
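The three attributes can be put together with a small helper such as the one sketched below. ObjectTagSketch and objectTag are hypothetical names used only for this example, and the sketch assumes the values contain no characters that need HTML escaping.

```java
// Hypothetical helper for illustration only; not part of the OneNote API or its samples.
final class ObjectTagSketch {
    // Builds an <object> tag that embeds the file stored in the named MIME part.
    // Assumes the arguments contain no quotes or other characters that need escaping.
    static String objectTag(String displayFileName, String partName, String mimeType) {
        return "<object data-attachment=\"" + displayFileName + "\""
                + " data=\"name:" + partName + "\""
                + " type=\"" + mimeType + "\" />";
    }
}
```

For example, objectTag("OfficeLease.pdf", "OfficeLeasePartName", "application/pdf") produces the same <object> tag that appears in the request shown earlier.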
The <img> tag with the data-render-src attribute renders the pages of the PDF data as individual images, placing each one onto the page in order. Be aware that to display the pages of a PDF file, you must use the data-render-src="name:BlockName" form. Using the <img> tag to render a PDF file from an internet URL won't work; for example, <img data-render-src="http://www.example.com/ThisWillNotWork.pdf"/>.
The <img> tag uses the following attributes. If you don't specify either height or width, the API will use the page size from the PDF data, if those sizes are available:
data-render-src="name:multiPartBlockName" works like it does in the object tag, giving the part name where the PDF data resides. You can use the same block name for both the image and object tags.
height="1000" specifies the height, in pixels, of the rendered page images. This attribute is optional. This sets the total size of all the page images together. If you want each page to be 1000 pixels tall, and there are four pages, set the height to 4000 pixels.
width="900" gives the width, in pixels, of the rendered page images. Each image will be that wide. This is an optional attribute.
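Because the height attribute covers all of the rendered page images together, the value to send is the per-page height multiplied by the page count. The sketch below shows that arithmetic; RenderedPdfImageTagSketch, totalHeight, and imgTag are hypothetical names, and the page count is assumed to be known to the caller (the sketch does not read it from the PDF).

```java
// Hypothetical helper for illustration only; not part of the OneNote API or its samples.
final class RenderedPdfImageTagSketch {
    // The height attribute is the combined height of all page images,
    // so multiply the desired per-page height by the number of pages.
    static int totalHeight(int pixelsPerPage, int pageCount) {
        return pixelsPerPage * pageCount;
    }

    // Builds an <img> tag that renders the PDF stored in the named MIME part.
    static String imgTag(String partName, int width, int pixelsPerPage, int pageCount) {
        return "<img data-render-src=\"name:" + partName + "\""
                + " width=\"" + width + "\""
                + " height=\"" + totalHeight(pixelsPerPage, pageCount) + "\" />";
    }
}
```

With a four-page PDF and 1000 pixels per page, totalHeight(1000, 4) gives the 4000 pixels described above.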
Remember these limits when you're embedding files in a OneNote API capture. We're working to remove or expand these limits, but for now they are:
Total POST size limit is ~70 MB, including file and other data. Requests larger than that limit may make your app and its captures unreliable, so be careful.
MIME part size limit is 25 MB. Larger data blocks will be rejected by the API. This applies to both image and file-data parts, and the size includes the part headers.
Image limit is 30 per page. When using the src="internetURL" attribute, the API ignores <img> tags beyond the limit.
MIME parts limit is 6 per POST. That includes the Presentation HTML part.
Maximum number of <img/> and <object/> tags using data-render-src is 5 or 1, depending on the content: a page can have up to 5 rendered images, but only 1 PDF document can be displayed via an <img data-render-src="name:blockName"/> tag. Additional rendered images and embedded files are ignored.
File-type icons are predefined. The OneNote API recognizes a wide variety of common file types, and embeds the file using predefined icons. If the API doesn't recognize the file type, it uses a generic file icon.
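An app can check the size-related limits before sending a request. The following sketch is one possible way to do that; CaptureLimitsSketch and check are hypothetical names, and the sketch assumes that 1 MB means 1024 * 1024 bytes and treats the approximate 70 MB total as a hard cutoff, neither of which the limits above state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OneNote API or its samples.
final class CaptureLimitsSketch {
    // Assumption: 1 MB = 1024 * 1024 bytes. The "~70 MB" total is approximate in the text.
    static final long MAX_POST_BYTES = 70L * 1024 * 1024;
    static final long MAX_PART_BYTES = 25L * 1024 * 1024;
    static final int MAX_PARTS = 6; // includes the Presentation part

    // partSizes maps each MIME part name to its size in bytes, headers included.
    // Returns a description of every limit the request would exceed.
    static List<String> check(Map<String, Long> partSizes) {
        List<String> problems = new ArrayList<>();
        if (partSizes.size() > MAX_PARTS) {
            problems.add("too many MIME parts: " + partSizes.size());
        }
        long total = 0;
        for (Map.Entry<String, Long> part : partSizes.entrySet()) {
            total += part.getValue();
            if (part.getValue() > MAX_PART_BYTES) {
                problems.add("part too large: " + part.getKey());
            }
        }
        if (total > MAX_POST_BYTES) {
            problems.add("request too large: " + total + " bytes");
        }
        return problems;
    }
}
```

An empty result means none of the checked limits is exceeded; the image-count and rendered-PDF limits are not covered by this sketch.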
For more information about the <object> and <img> tags, see the https://msdn.microsoft.com/en-us/library/vs/alm/dn575442.aspx reference topic.
Important |
|---|
Before POST requests like the ones shown here can succeed, you need to Get a client ID for use with the OneNote API (or package ID for a Windows Store application), and your app has to Authenticate the user for the OneNote API. If you don't supply a valid OAuth token with your request, it will fail. |
Tip |
|---|
As with most code samples in documentation, these samples should not be considered production-ready code. Things like detailed user-input validation have been left out to make it easier to understand the code flow. Carefully review your code for potential code-quality and security issues before you publish your app. |
Embed and Render PDF files in an Android app
The following code builds a multi-part request that contains a "Presentation" part with HTML, and a second part with PDF file data. In this example, a document file is compiled into the app, and will be inserted into the page as an embedded file, and then below that each page will be rendered as a bitmapped image.
In this code, the binary PDF data is read from the app's assets by calling assetManager.open. The createPageWithAttachmentAndPdfRendering function is adapted from the SendPageCreateAsyncTask class in the Android sample on GitHub. For more information, see Get the OneNote API sample applications.
/**
 * Creates a page with a PDF document file attachment and with the same PDF rendered on the page
 * @return The response received from the OneNote Service API for the create page operation
 */
public ApiResponse createPageWithAttachmentAndPdfRendering() {
    String attachmentPartName = "pdfattachment1";
    InputStream is = null;
    try {
        this.postMultipartRequest(PAGES_ENDPOINT);
        String date = getDate();
        String requestHtml = "<html>" +
            "<head>" +
            "<title>A page with a file attachment (Android Sample)</title>" +
            "<meta name=\"created\" content=\"" + date + "\" />" +
            "</head>" +
            "<body>" +
            "<p>This page contains a pdf file attachment</p>" +
            "<object data-attachment=\"attachment.pdf\" data=\"name:" + attachmentPartName + "\" type=\"application/pdf\" />" +
            "<p>Here's the content of the PDF document :</p>" +
            "<img data-render-src=\"name:" + attachmentPartName + "\" alt=\"A beautiful logo\" width=\"1500\" />" +
            "</body>" +
            "</html>";
        is = assetManager.open("attachment.pdf");
        byte[] attachmentContents = readFromStreamToByteArray(is);
        // Add a part that contains the HTML content for this request and refers to another part for the file attachment
        this.addPart("Presentation", "text/html", requestHtml);
        // Add the content of the document file attachment in a separate part, using the same part name that the HTML references
        this.addBinaryPart(attachmentPartName, "application/pdf", attachmentContents);
        this.finishMultipart();
        ApiResponse response = this.getResponse();
        return response;
    } catch (Exception createPageException) {
        createPageException.printStackTrace();
        return null;
    } finally {
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Embed and Render PDF files in an iOS app
The following code builds a multi-part request that contains a "Presentation" part with HTML, and a second part with PDF file data. In this example, a document file is compiled into the app, and will be inserted into the page as an embedded file, and then below that each page will be rendered as a bitmapped image.
This code is adapted from the ONSCPSCreateExamples class in the iOS code sample on GitHub. For more information, see Get the OneNote API sample applications.
- (void)createPageWithAttachmentAndPdfRendering {
[self checkForAccessTokenExpiration];
NSString *attachmentPartName = @"pdfattachment1";
NSURL *attachmentURL = [[NSBundle mainBundle] URLForResource:@"attachment" withExtension:@"pdf"];
NSString *date = dateInISO8601Format();
NSData *fileData = [NSData dataWithContentsOfURL: attachmentURL];
NSString *simpleHtml = [NSString stringWithFormat:
@"<html>"
"<head>"
"<title>A simple page with an attachment from iOS</title>"
"<meta name=\"created\" content=\"%@\" />"
"</head>"
"<body>"
"<h1>This is a page with a PDF file attachment</h1>"
"<object data-attachment=\"attachment.pdf\" data=\"name:%@\" type=\"application/pdf\" />"
"<p>Here's the contents of the PDF document :</p>"
"<img data-render-src=\"name:%@\" alt=\"Hello World\" width=\"1500\" />"
"</body>"
"</html>", date, attachmentPartName, attachmentPartName];
NSData *presentation = [simpleHtml dataUsingEncoding:NSUTF8StringEncoding];
NSMutableURLRequest *request = [[AFHTTPRequestSerializer serializer] multipartFormRequestWithMethod:@"POST" URLString:PagesEndPoint parameters:nil constructingBodyWithBlock: ^(id <AFMultipartFormData>formData) {
[formData
appendPartWithHeaders:@{
@"Content-Disposition" : @"form-data; name=\"Presentation\"",
@"Content-Type" : @"text/html"}
body:presentation];
[formData
appendPartWithHeaders:@{
@"Content-Disposition" : [NSString stringWithFormat:@"form-data; name=\"%@\"", attachmentPartName],
@"Content-Type" : @"application/pdf"}
body:fileData];
}];
if (liveClient.session) {
[request setValue:[@"Bearer " stringByAppendingString:accessToken] forHTTPHeaderField:@"Authorization"];
}
[NSURLConnection connectionWithRequest:request delegate:self];
}
Embed and Render PDF files in a Windows Phone app
The following code builds a multi-part request that contains a "Presentation" part with HTML, and a second part with PDF file data. In this example, a document file is compiled into the app, and will be inserted into the page as an embedded file, and then below that each page will be rendered as a bitmapped image.
/// <summary>
/// Creates a OneNote page with a PDF document attached and rendered
/// </summary>
private async void btn_CreateWithAttachmentAndPdfRendering_Click(object sender, RoutedEventArgs e)
{
const string attachmentPartName = "pdfattachment1";
string date = GetDate();
string attachmentHtml = "<html>" +
"<head>" +
"<title>A page created with a file attachment (WinPhone8 Sample)</title>" +
"<meta name=\"created\" content=\"" + date + "\" />" +
"</head>" +
"<body>" +
"<h1>This is a page with a PDF file attachment</h1>" +
"<object data-attachment=\"attachment.pdf\" data=\"name:" + attachmentPartName + "\" type=\"application/pdf\" />" +
"<p>Here's the content of the PDF document :</p>" +
"<img data-render-src=\"name:" + attachmentPartName + "\" alt=\"Hello World\" width=\"1500\" />" +
"</body>" +
"</html>";
// Create the attachment part - make sure it is disposed after we've sent the message in order to close the stream.
Stream attachmentStream = GetAssetFileStream("Assets\\attachment.pdf");
using (var attachmentContent = new StreamContent(attachmentStream))
{
attachmentContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
HttpRequestMessage createMessage = new HttpRequestMessage(HttpMethod.Post, PagesEndpoint)
{
Content = new MultipartFormDataContent
{
{new StringContent(attachmentHtml, System.Text.Encoding.UTF8, "text/html"), "Presentation"},
{attachmentContent, attachmentPartName}
}
};
// Must send the request within the using block, or the attachment file stream will have been disposed.
await SendCreatePageRequest(createMessage);
}
}
Embed and Render PDF files in a Windows Store app
The following code builds a multi-part request that contains a "Presentation" part with HTML, and a second part with PDF file data. In this example, a document file is compiled into the app, and will be inserted into the page as an embedded file, and then below that each page will be rendered as a bitmapped image.
/// <summary>
/// Create a page with a PDF document attached and rendered
/// </summary>
/// <param name="debug">Determines whether to execute this method under the debugger</param>
/// <returns>The converted HTTP response message</returns>
async public Task<StandardResponse> CreatePageWithPDFAttachedAndRendered(bool debug)
{
if(debug)
{
Debugger.Launch();
Debugger.Break();
}
var client = new HttpClient();
string date = GetDate();
//Note: API only supports JSON return type
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
// This allows you to see what happens when an unauthenticated call is made.
if (IsAuthenticated)
{
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", _authClient.Session.AccessToken);
}
const string attachmentPartName = "pdfattachment1";
string attachmentRequestHtml = "<html>" +
"<head>" +
"<title>A page created with a PDF document attached and rendered</title>" +
"<meta name=\"created\" content=\"" + date + "\" />" +
"</head>" +
"<body>" +
"<h1>This is a page with a PDF file attachment</h1>" +
"<object data-attachment=\"attachment.pdf\" data=\"name:" + attachmentPartName + "\" type=\"application/pdf\" />" +
"<p>Here's the content of the PDF document :</p>" +
"<img data-render-src=\"name:" + attachmentPartName + "\" alt=\"Hello World\" width=\"1500\" />" +
"</body>" +
"</html>";
HttpResponseMessage response;
using (var attachmentContent = new StreamContent(await GetBinaryStream("assets\\attachment.pdf")))
{
attachmentContent.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
HttpRequestMessage createMessage = new HttpRequestMessage(HttpMethod.Post, PagesEndPoint)
{
Content = new MultipartFormDataContent
{
{new StringContent(attachmentRequestHtml, System.Text.Encoding.UTF8, "text/html"), "Presentation"},
{attachmentContent, attachmentPartName}
}
};
// Must send the request within the using block, or the binary stream will have been disposed.
response = await client.SendAsync(createMessage);
}
return await TranslateResponse(response);
}
Embed and Render PDF files on a page using REST
If your app is running on a mobile device, scanner, or camera, and you want to both embed and display PDF files in OneNote pages, upload the file data directly to the API from the device. You do this by embedding the file data in a multi-part message, with a separate part for each file.
In the HTML of your request's "Presentation" part, include an object tag to embed the file, and an img tag to display the contents as bitmapped images. In the object tag's data attribute and the img tag's data-render-src attribute, use the special syntax "name:RequestBlockPartId", where RequestBlockPartId is an alphanumeric identifier of a part in your multi-part request. This next example shows how to do that, using the string MyAppFileBlockName as the identifier.
Content-Type:multipart/form-data; boundary=MyAppPartBoundary
Authorization:bearer tokenString

--MyAppPartBoundary
Content-Disposition:form-data; name="Presentation"
Content-type:text/html

<!DOCTYPE html>
<html>
<head>
<title>A page with an embedded and displayed PDF file</title>
</head>
<body>
<p>Attached is the lease agreement for the expanded offices!</p>
<object data-attachment="OfficeLease.pdf" data="name:MyAppFileBlockName" type="application/pdf" />
<p>Here's the contents of our new lease.</p>
<img data-render-src="name:MyAppFileBlockName" width="900"/>
</body>
</html>

--MyAppPartBoundary
Content-Disposition:form-data; name="MyAppFileBlockName"
Content-type:application/pdf

... embedded PDF file binary data ...

--MyAppPartBoundary--
If you need to embed multiple PDF files this way, each part Id has to be unique within the POST request.
The PDF file data is the binary file data; don't use Base64 or otherwise encode it.
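One simple way to keep part IDs unique and alphanumeric is to append a running index to a fixed prefix, as in the sketch below. PartNameSketch and uniquePartNames are hypothetical names used only for this example; any scheme that yields distinct alphanumeric IDs within the POST request would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the OneNote API or its samples.
final class PartNameSketch {
    // Returns one part ID per file: the prefix followed by a 1-based index,
    // for example "pdfattachment1", "pdfattachment2".
    static List<String> uniquePartNames(String prefix, int fileCount) {
        if (!prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("prefix must be alphanumeric: " + prefix);
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= fileCount; i++) {
            names.add(prefix + i);
        }
        return names;
    }
}
```

Each generated ID is used twice: once as the name of the MIME part that carries the raw PDF bytes, and once after "name:" in the tag that refers to that part.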